Question: Picard 'CollectRnaSeqMetrics' run on hg19 Tophat BAM in Galaxy yields no exonic reads, all intergenic
0
gravatar for Jake1
23 months ago by
Jake110
USA
Jake110 wrote:

Greetings, I am trying to obtain basic alignment metrics for human 50bp-paired end RNA-seq files (fastq) I processed and aligned to hg19 with Tophat2 galaxy, and have so far failed using Picard 'CollectRnaSeqMetrics' in the main/public instance of Galaxy.

The output of this tool indicates all my reads are intergenic (literally no exonic mapping), which is not correct according to the bigwigs/Bams I can visualize in UCSC GB or IGV.

I get the same result (all intergenic reads) irrespective of whether I use straight Tophat2 output (bam), or Tophat2 output2 sorted with Picard Tools in Galaxy (sorted by coordinate bam).

Ditto whether I select the first '11 fields' as output for the RefFlat file imported to Galaxy from UCSC GB (hg19 RefSeq, per instructions) or try importing a gtf file of hg19 from UCSC GB.

Can anyone tell me where I am going wrong? This must be simple and obvious, but I don't see it. Help much appreciated.

Suspicions/questions you may have about parameters run for 'CollectRnaSeqMetrics' in Picard Tools:

1) reference genome: it's hg19 canonical, same as Tophat2 alignment that generated BAM in question

2) Gene annotations in refFlat form: I selected the first 11 fields from table browser in hg19 (tool instructions, but also tried 'all fields' and '.gtf' as refFlat output to Galaxy)

3) RNA-seq library strand specificity: 'none' (paired end)

4) assume sorted: No (but whether BAM is Picard coordinate -sorted or -unsorted doesn't change wholly intergenic mapping)

5) the level(s) at which to accumulate metrics: all reads

6) remaining parameters = default

PF_ALIGNED_BASES = 3629418122 INTERGENIC_BASES = 3629418122 CODING_BASES =0 INTRONIC_BASES =0

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/scratch -Xmx7680m -Xms256m CollectRnaseqMetrics, inputs & parameters

p.s. After filtering by quality (>20), I aligned my fastq files to hg19 canonical, and the resulting BAM file is 7GB (> 30k reads for each F and R fastq file).

help sincerely appreciated,

jake

rna-seq tophat galaxy • 1.5k views
ADD COMMENTlink modified 22 months ago by y.hoogstrate460 • written 23 months ago by Jake110
1

When in doubt, look at the BAM file in IGV and see if the results of picard seem reasonable compared to what you visually observe.

ADD REPLYlink written 23 months ago by Devon Ryan1.9k

Mr. Ryan, thank you for your response (A+ Biostar). I checked both in IGV and UCSC GB as you suggested, and having 0 exonic reads doesn't seem to be the problem. This has to be something with the reference file format, right? Frustrating.

0 exonic reads doesn't seem to be my problem

ADD REPLYlink modified 23 months ago • written 23 months ago by Jake110
1

Ah, it looks like you tried to make the "refFlat" file from the UCSC table browser. While one would expect that to work, it won't (I agree that this is highly annoying). Here is a link for the file you'll need. Upload that to Galaxy and I expect Picard will be happy.

ADD REPLYlink written 23 months ago by Devon Ryan1.9k
1

Thanks again. I broke-down and downloaded the new Picard Tools and am trying this command-line (https://broadinstitute.github.io/picard/index.html#QuickStart). I have 35 BAMs and am unclear yet whether I have to sort all these w/Picard so I'll still be running it 5 years from now.

The refFlat file I downloaded from UCSC GB did not work (error: " Wrong number of fields in refFlat file hg19gtf.txt at line 1"). However, the refFlat file you provided as a link did work. Thus= you remain King of the Galaxy.

ADD REPLYlink written 23 months ago by Jake110
1

They probably need to be sorted, but whether you do it with picard or something else doesn't matter. So if you have them from Galaxy already (in which case they're already sorted), then don't bother redoing that.

ADD REPLYlink written 23 months ago by Devon Ryan1.9k
1
gravatar for y.hoogstrate
22 months ago by
y.hoogstrate460
Netherlands
y.hoogstrate460 wrote:

Support for GTF and GFF3 have been merged just a while ago (https://github.com/galaxyproject/tools-devteam/pull/395). Hopefully this will be updated at main Galaxy soon :-)

ADD COMMENTlink written 22 months ago by y.hoogstrate460
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 169 users visited in the last hour