Question: RNA STAR "empty"
k.witmer10 wrote:

Hi there,

I wanted to run RNA STAR to align my RNA-seq data. The genome I am working on is not in the directory, but I added a genome FASTA file as well as a GFF3 file when running the program. However, it somehow doesn't finish the analysis. This is what I get in Galaxy:

Oct 10 11:19:33 ..... started STAR run
Oct 10 11:19:33 ... starting to generate Genome files
Oct 10 11:19:34 ... starting to sort Suffix Array. This may take a long time...
Oct 10 11:19:35 ... sorting Suffix Array chunks and saving them to disk...

And then it stops, and the output file is empty...

Can you please let me know what I am doing wrong? Many thanks.

rna-seq galaxy
modified 13 months ago by Jennifer Hillman Jackson25k • written 13 months ago by k.witmer10
Jennifer Hillman Jackson25k wrote:

Hello,

Is your custom genome really large, for example the output from an assembly that has not been filtered, or perhaps just NGS reads? That could cause a failure due to the job exceeding resources.

A mismatch between the reference annotation and the reference genome could also be the source of the problem. The "chromosome" identifiers in each must be exactly the same (and present in that format in any BAM datasets).
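One way to check for such an identifier mismatch before running the tool is to compare the sequence names in the two files directly. The following is a minimal sketch, not part of Galaxy or STAR: the function names and the example file names are hypothetical, and it assumes a plain-text FASTA file and a tab-delimited GFF3 file whose first column holds the sequence identifier.

```python
def fasta_ids(lines):
    """Collect sequence identifiers from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(lines):
    """Collect the identifiers used in column 1 of GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            # An embedded sequence section follows; no more feature lines.
            break
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(fasta_lines, gff3_lines):
    """Return (identifiers only in the annotation, identifiers only in the genome)."""
    genome = fasta_ids(fasta_lines)
    annotation = gff3_seqids(gff3_lines)
    return sorted(annotation - genome), sorted(genome - annotation)


# Example use with hypothetical file names:
# with open("genome.fasta") as fa, open("annotation.gff3") as gff:
#     only_in_gff, only_in_fasta = compare_ids(fa, gff)
```

Any identifier reported as present only in the annotation would have no matching sequence in the genome. The comparison is exact and case-sensitive, so names such as `chr1` and `Chr1` are reported as different.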

These and other common reasons for problems are covered in FAQs here:

If you cannot determine the failure reason, and are working at https://usegalaxy.org or can reproduce the problem there, would you please send in a bug report from the error dataset? If this was not an error dataset (red) but rather an empty successful dataset (green), a shared history link can be sent to galaxy-bugs@lists.galaxyproject.org. Be sure to leave the inputs and outputs undeleted. In the comments, include a link to this post and any other info you would like to add. From there, we'll troubleshoot the problem through email using the bug report/shared history as a reference. All reported data is private to our small internal team and is never made public.

Thanks, Jen, Galaxy team

modified 13 months ago • written 13 months ago by Jennifer Hillman Jackson25k

Hi Jen,

Thanks for your help. The custom genome is not large; it is 18 Mb. I sorted it by chromosome (14 chromosomes, 2 mitochondrial genomes and some contigs), but I just saw that the GFF file is sorted slightly differently. Do you think this could be a problem? I am working at usegalaxy.org, and it didn't give me an error, so I shared the history link with you. Thanks!

written 13 months ago by k.witmer10

Hi Jen,

I tried again on usegalaxy.org with adjusted input datasets, but got the same outcome, i.e. the output is empty. Would you have any more suggestions, please?

written 13 months ago by k.witmer10

Hi Kathin, I am looking into your email and the comments about the smaller genome size/index building requirements. The tool might need another form option added. More feedback soon. Meanwhile, maybe try using HISAT2 instead? Thanks! Jen

written 13 months ago by Jennifer Hillman Jackson25k

Thanks Jen! I will use HISAT2 for now, but I wanted to try STAR as I am interested in antisense reads...

written 13 months ago by k.witmer10

I wrote back this morning but am also updating here:

Most people should use HISAT2 instead of RNA STAR.

The root of the problem was a smaller genome that requires a different indexing strategy than the tool currently supports. A ticket requesting an enhancement to the tool may be opened.
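For context, the STAR manual recommends scaling down the `--genomeSAindexNbases` option (default 14) for small genomes to min(14, log2(GenomeLength)/2 - 1). Whether this is the indexing setting meant here is an assumption, as the post does not name the parameter. A minimal sketch of that calculation follows; the function name is hypothetical, and rounding down to an integer is an assumed convention.

```python
import math


def suggested_sa_index_nbases(genome_length):
    """Return min(14, log2(genome_length) / 2 - 1), rounded down.

    genome_length is the total reference length in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Rounding down is an assumed convention; the formula itself
    # is the one given in the STAR manual.
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))
```

For an 18 Mb genome such as the one in this thread, `suggested_sa_index_nbases(18_000_000)` evaluates to 11, below the default of 14.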

written 13 months ago by Jennifer Hillman Jackson25k

Quick update: The genome indexing does not seem to be the core issue. Rather, when the reference annotation is included, no hits result when "Filter alignments containing non-canonical junctions" is set to "yes". There could be a reference genome mismatch problem (data is from a different genome version, or identifier mismatches) or a format problem with the annotation file. There could also simply be no overlaps that meet the other criteria set.

You could try running RNA STAR without annotation, then examine the overlaps between those hits, the annotation, and form settings. The same problem would come up if that annotation and same filters were used with HISAT2.
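As a rough way to examine the hits from a run without annotation, the splice junctions STAR reports can be tallied by intron motif. This is a sketch under stated assumptions: it expects a tab-delimited junction table in the layout of STAR's `SJ.out.tab` (column 5 is the intron motif code, with 0 meaning non-canonical; column 6 is 1 for annotated junctions), and the function name and file name are hypothetical.

```python
from collections import Counter

# Intron motif codes as used in column 5 of STAR's SJ.out.tab.
MOTIFS = {
    0: "non-canonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}


def tally_junctions(lines):
    """Count junctions by (motif, annotation status)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        motif = MOTIFS.get(int(fields[4]), "unknown")
        status = "annotated" if fields[5] == "1" else "novel"
        counts[(motif, status)] += 1
    return counts


# Example use with a hypothetical file name:
# with open("SJ.out.tab") as handle:
#     for key, n in sorted(tally_junctions(handle).items()):
#         print(key, n)
```

A large share of non-canonical junctions in such a tally would be consistent with the filter discarding many alignments, though it does not by itself explain a completely empty output.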

written 12 months ago by Jennifer Hillman Jackson25k