Question: Problem with collectrnaseqmetrics (Galaxy)
3 months ago by
United States
dagreen0 wrote:

I am attempting to run CollectRnaSeqMetrics on a tophat run (with my gff3 annotations and fasta genome for reference). For context, I am working with Danaus plexippus (monarch butterfly). The program "successfully" executes, yet it tells me that all of my reads/bases are intergenic, which is clearly not true. When I visualize the tophat hits alongside my gff3 annotations, I see exactly what I expect, i.e. the reads overlap the gene models. This leads me to believe that intervals are being scrambled somewhere along the way. I "tidied" my original gff3 with genometools and converted the tidy gff3 to refFlat with gff3ToGenePred (UCSC). On quick inspection, the intervals in the refFlat file are unchanged from the original gff3 (which I can visualize). Does anyone have an idea of what may be happening here? Any and all help appreciated!

Thank you for asking the question. We can test and address the problems with the tool once the Main server is back up. I can let you know that there were prior issues with it (input formatting + dependency related).

The server is down for maintenance. More details will be added to this post during the day. Thanks, Jen, Galaxy team


Update: This is the downtime announcement. We missed posting it as a banner before the system was taken down. Very sorry for the confusion!

The server is back up.

Are you working on the Main server? If so, or if you can reproduce the problem there, please send an email with a link to the shared history. All inputs and results must be left undeleted (or changed back to active) so that we can review them and provide feedback. Please include a link to this Biostars post so the two can be associated. Also please note the number of the dataset(s) with the result that you find to be incorrect.

Thanks, Jen, Galaxy team

Hi Jen, I appreciate your response. Email sent to the reply address is bouncing back to me as "address not found". Is the address correct? Thanks, Delbert

Sorry, that was a typo. It is corrected above.

3 months ago by
United States
Jennifer Hillman Jackson wrote:

Hello Delbert,

The problem is due to the format of the annotation dataset (#5). It is in genePred format, but not refFlat format. See this FAQ (near end) to understand the difference.

Specifically, I noticed that dataset 5 needs one more column added between column 1 and column 2, so that column 3 becomes the "chromosome" and the remaining fields shift into the columns expected by refFlat format.
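
For anyone who wants to script the column fix described above, here is a minimal Python sketch. It is not what Galaxy or Picard runs internally. It assumes a tab-separated genePred file with at least 10 columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and writes 11-column refFlat lines by putting a gene-name column in front. The function name `genepred_to_refflat` is made up for this example. If the input is extended genePred and column 12 ("name2") is filled in, that value is used as the gene name; otherwise the transcript name is simply repeated, which has the same effect as adding a duplicate column between columns 1 and 2.

```python
def genepred_to_refflat(lines):
    """Yield refFlat lines built from genePred lines.

    Hypothetical helper: prepends a gene-name column so that the
    chromosome ends up in column 3, as refFlat expects.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            raise ValueError(
                f"expected at least 10 genePred columns, got {len(fields)}"
            )
        core = fields[:10]  # the 10 standard genePred columns
        # Assumption: extended genePred keeps a gene name in column 12.
        # Fall back to the transcript name when it is absent or empty.
        if len(fields) >= 12 and fields[11]:
            gene_name = fields[11]
        else:
            gene_name = core[0]
        yield "\t".join([gene_name] + core)


if __name__ == "__main__":
    import sys

    # Usage: python genepred_to_refflat.py < annotation.genePred > annotation.refFlat
    for out_line in genepred_to_refflat(sys.stdin):
        print(out_line)
```

After conversion, every line should have 11 columns, with the strand ("+" or "-") in column 4. That is a quick way to confirm the file now looks like refFlat before rerunning the tool.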

Try correcting the format and then rerun the tool. You may need to re-sort the input BAM dataset first (SortSam) and wrap the lines of the custom reference genome (NormalizeFasta). Both are required by many downstream tools for correct, successful results.
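
To show what "wrapping the lines" of a reference genome means, here is a small Python sketch. It is not the NormalizeFasta implementation, and the function name `wrap_fasta` and the width of 60 are arbitrary choices made for this example. It rewrites each sequence so that every line except the last has the same fixed length, which is the property many tools rely on for FASTA files.

```python
def wrap_fasta(lines, width=60):
    """Yield FASTA lines with each sequence re-wrapped to a fixed width."""
    seq = []  # sequence chunks collected for the current record

    def flush():
        # Join the collected chunks and emit them in fixed-width pieces.
        joined = "".join(seq)
        seq.clear()
        for i in range(0, len(joined), width):
            yield joined[i:i + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            # A new header: emit the previous record's sequence first.
            yield from flush()
            yield line
        else:
            seq.append(line)
    # Emit the sequence of the final record.
    yield from flush()


if __name__ == "__main__":
    import sys

    # Usage: python wrap_fasta.py < genome.fa > genome.wrapped.fa
    for out_line in wrap_fasta(sys.stdin):
        print(out_line)
```

Within Galaxy, the NormalizeFasta tool is the simpler route; the sketch is only meant to make clear what the tool changes.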

Hope this helps! Jen, Galaxy team

