Greetings, I am trying to obtain basic alignment metrics for human 50bp-paired end RNA-seq files (fastq) I processed and aligned to hg19 with Tophat2 galaxy, and have so far failed using Picard 'CollectRnaSeqMetrics' in the main/public instance of Galaxy.
The output of this tool indicates all my reads are intergenic (literally no exonic mapping), which is not correct according to the bigwigs/Bams I can visualize in UCSC GB or IGV.
I get the same result (all intergenic reads) irrespective of whether I use straight Tophat2 output (bam), or Tophat2 output2 sorted with Picard Tools in Galaxy (sorted by coordinate bam).
Ditto whether I select the first '11 fields' as output for the RefFlat file imported to Galaxy from UCSC GB (hg19 RefSeq, per instructions) or try importing a gtf file of hg19 from UCSC GB.
Can anyone tell me where I am going wrong? This must be simple and obvious, but I don't see it. Help much appreciated.
Suspicions/questions you may have about parameters run for 'CollectRnaSeqMetrics' in Picard Tools:
1) reference genome: it's hg19 canonical, same as Tophat2 alignment that generated BAM in question
2) Gene annotations in refFlat form: I selected the first 11 fields from table browser in hg19 (tool instructions, but also tried 'all fields' and '.gtf' as refFlat output to Galaxy)
3) RNA-seq library strand specificity: 'none' (paired end)
4) assume sorted: No (but whether BAM is Picard coordinate -sorted or -unsorted doesn't change wholly intergenic mapping)
5) the level(s) at which to accumulate metrics: all reads
6) remaining parameters = default
PF_ALIGNED_BASES = 3629418122 INTERGENIC_BASES = 3629418122 CODING_BASES =0 INTRONIC_BASES =0
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/scratch -Xmx7680m -Xms256m
p.s. After filtering by quality (>20), I aligned my fastq files to hg19 canonical, and the resulting BAM file is 7GB (> 30k reads for each F and R fastq file).
help sincerely appreciated,