RNA STAR "empty"


13 months ago by

k.witmer • 10 wrote:

Hi there,

I wanted to run RNA STAR to align my RNA-seq data. The genome I am working on is not in the directory, so I added a genome FASTA file as well as a GFF3 file when running the program. However, it somehow doesn't finish the analysis. This is what I get in Galaxy:

Oct 10 11:19:33 ..... started STAR run
Oct 10 11:19:33 ... starting to generate Genome files
Oct 10 11:19:34 ... starting to sort Suffix Array. This may take a long time...
Oct 10 11:19:35 ... sorting Suffix Array chunks and saving them to disk...

Then it stops, and the output file is empty.

Can you please let me know what I am doing wrong? Many thanks.

rna-seq galaxy • 691 views


modified 13 months ago by Jennifer Hillman Jackson ♦ 25k • written 13 months ago by k.witmer • 10

13 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Is your custom genome really large, for example the output of an assembly that has not been filtered, or perhaps just NGS reads? That could cause a failure because the job exceeds the available resources.

A mismatch between the reference annotation and the reference genome could also be a source of a problem. The "chromosome" identifiers in each must be exactly the same (and present in that format in any BAM datasets).
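One way to check the identifiers before rerunning the job is to compare the sequence names in the FASTA headers with the first column of the GFF3 file. The following is only a minimal sketch: the function names (`fasta_ids`, `gff3_seqids`, `compare_ids`) and the example data are hypothetical, and it assumes the FASTA identifier is the text between `>` and the first whitespace.

```python
def fasta_ids(lines):
    """Collect sequence identifiers from FASTA header lines.

    Assumption: the identifier is the header text up to the first whitespace.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(lines):
    """Collect the seqid (first tab-separated column) of each GFF3 feature line."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(fasta_lines, gff3_lines):
    """Return (ids only in the annotation, ids only in the genome), both sorted."""
    genome = fasta_ids(fasta_lines)
    annotation = gff3_seqids(gff3_lines)
    return sorted(annotation - genome), sorted(genome - annotation)


if __name__ == "__main__":
    # Made-up example data: the mitochondrial sequence is named differently.
    fasta = [">chr1 example", "ACGT", ">chrM", "ACGT"]
    gff3 = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1",
        "MT\tsrc\tgene\t1\t4\t.\t+\t.\tID=g2",
    ]
    only_annotation, only_genome = compare_ids(fasta, gff3)
    print("In annotation only:", only_annotation)
    print("In genome only:", only_genome)
```

With real data, the two functions can be given open file handles instead of lists. Any name reported in the "annotation only" list points to features that cannot be placed on the supplied genome.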

These and other common reasons for problems are covered in FAQs here:

https://galaxyproject.org/support/#troubleshooting

If you cannot determine the failure reason, and are working at https://usegalaxy.org or can reproduce the problem there, would you please send in a bug report from the error dataset? If this was not an error dataset (red) but rather an empty successful dataset (green), a shared history link can be sent to galaxy-bugs@lists.galaxyproject.org. Be sure to leave the inputs and outputs undeleted. In the comments, include a link to this post and any other info you would like to add. From there, we'll troubleshoot the problem through email using the bug report/shared history as a reference. All reported data is private to our small internal team and is never made public.

Thanks, Jen, Galaxy team

ADD COMMENT • link modified 13 months ago • written 13 months ago by Jennifer Hillman Jackson ♦ 25k

Hi Jen,

Thanks for your help. The custom genome is not large; it is 18 Mb. I sorted it by chromosome (14 chromosomes, 2 mitochondrial genomes and some contigs), but I just saw that the GFF file is sorted slightly differently. Do you think this could be a problem? I am working at usegalaxy.org, and it didn't give me an error, so I shared the history link with you. Thanks!

ADD REPLY • link written 13 months ago by k.witmer • 10

Hi Jen,

I tried on usegalaxy.org again with adjusted input datasets, but got the same outcome, i.e. the output is empty. Would you have any more suggestions, please?

ADD REPLY • link written 13 months ago by k.witmer • 10

Hi Kathin, I am looking into your email and the comments about the smaller genome size/index building requirements. The tool might need another form option added. More feedback soon. Meanwhile, maybe try using HISAT2 instead? Thanks! Jen

ADD REPLY • link written 13 months ago by Jennifer Hillman Jackson ♦ 25k

Thanks Jen! I will use HISAT2 for now, but I wanted to try STAR because I am interested in antisense reads...

ADD REPLY • link written 13 months ago by k.witmer • 10

I wrote back this morning but also updating here:

Most people should use HISAT2 instead of RNASTAR.

The root of the problem was a smaller genome that requires a different indexing strategy than the tool currently supports. A ticket requesting an enhancement to the tool may be opened.
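For context, the post does not name the indexing setting involved. On the assumption that it is STAR's `--genomeSAindexNbases` option, the STAR manual recommends scaling it down for small genomes to min(14, log2(GenomeLength)/2 - 1). A rough sketch of that calculation follows; the function name `sa_index_nbases` is hypothetical.

```python
import math


def sa_index_nbases(genome_length):
    """Suggested --genomeSAindexNbases value per the STAR manual's rule of thumb:
    min(14, log2(GenomeLength)/2 - 1), rounded down to an integer."""
    if genome_length <= 0:
        raise ValueError("genome_length must be a positive number of bases")
    return min(14, int(math.log2(genome_length) / 2 - 1))


if __name__ == "__main__":
    # The genome in this thread is reported as about 18 Mb.
    print(sa_index_nbases(18_000_000))
```

For a genome of about 18 Mb this gives 11, below STAR's default of 14, which is why a form option for this value would matter when building an index from a small custom genome.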

ADD REPLY • link written 13 months ago by Jennifer Hillman Jackson ♦ 25k

Quick update: The genome indexing does not seem to be the core issue. Rather, when the reference annotation is included, no hits result when "Filter alignments containing non-canonical junctions" is set to "yes". There could be a reference genome mismatch problem (the data is from a different genome version, or the identifiers do not match) or a format problem with the annotation file. There could also simply be no overlaps that meet the other criteria set.

You could try running RNA STAR without annotation, then examine the overlaps between those hits, the annotation, and form settings. The same problem would come up if that annotation and same filters were used with HISAT2.
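As a rough illustration of examining those overlaps outside Galaxy, the sketch below counts how many hits overlap at least one annotated feature on the same sequence. It assumes the hits and features have already been reduced to `(seqid, start, end)` tuples with 1-based, inclusive coordinates; the function names and the example coordinates are made up.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed (inclusive) intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_overlapping_hits(hits, features):
    """Count hits that overlap at least one feature on the same sequence.

    Both arguments are iterables of (seqid, start, end) tuples.
    """
    by_seq = defaultdict(list)
    for seqid, start, end in features:
        by_seq[seqid].append((start, end))
    count = 0
    for seqid, start, end in hits:
        # .get avoids creating empty entries for sequences without features
        if any(overlaps(start, end, fs, fe) for fs, fe in by_seq.get(seqid, [])):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    hits = [("chr1", 10, 50), ("chr1", 200, 250), ("chr2", 5, 20)]
    features = [("chr1", 40, 100), ("chr2", 30, 60)]
    print(count_overlapping_hits(hits, features))
```

A count of zero across a whole dataset would suggest an identifier or genome-version mismatch rather than overly strict filter settings. This linear scan is fine for a small genome; interval tools would be a better fit for large ones.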

ADD REPLY • link written 12 months ago by Jennifer Hillman Jackson ♦ 25k
