Problems with annotation file

Dear community,

I have successfully aligned my sequencing reads with Bowtie2. The problem came when I used Cufflinks to calculate FPKM values: for some reason, the program does not properly recognize the annotation file. One option I have considered is running the calculation without the annotation file and annotating afterwards. Is there a better approach?

Thank you very much!

rna-seq annotation gtf cufflinks input • 742 views

modified 2.2 years ago by Jennifer Hillman Jackson ♦ 25k • written 2.2 years ago by juliaquingo • 0

2.2 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Most mappers are more lenient with inputs than downstream tools.

Double-check the following:

Did Bowtie2 really make proper use of the annotation?
Were the input fastq data really in fastqsanger format? If not, all sorts of issues can come up, not just poor mapping rates. https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
Be aware that Bowtie2 does not add the XS (splice site) attribute. The Tuxedo tool suite (Cufflinks, Cuffmerge, Cuffdiff, etc.) makes significant use of this attribute to generate a full complement of statistics. If the data are spliced, use TopHat2 or HISAT2 instead. If you are truly working with a genome with unspliced transcripts, consider alternate RNA-seq tools designed for these types of genomes. See the Tool Shed for currently wrapped options (other tools could be wrapped): http://usegalaxy.org/toolshed
Is a datatype assigned to the reference annotation file at all and, if so, is it the correct one? https://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset
Is there a reference genome mismatch problem? Try running BAM-to-SAM on the Bowtie2 output with the "output header only" option, then compare the chromosome identifiers across all inputs (BAM/SAM inputs, reference annotation, and the reference genome used, including custom genomes/builds).
Consider using an alternate reference annotation dataset. iGenomes is the standard. Download the tar file locally, unpack it, and load just the genes.gtf file as a dataset. If the annotation is sourced elsewhere, it should meet the same content criteria as the iGenomes genes.gtf file (tss_id, p_id, and optionally gene_name). Otherwise, many statistics will not be calculated.
- http://support.illumina.com/sequencing/sequencing_software/igenome.html
- http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/#cuffdiff-input-files
Be aware that when using a Custom reference genome/build, the title lines (">") work much better with tools when the description content is removed and only the identifier is present, and when that identifier is an exact match for the chromosome names in the other inputs. This is related to the reference genome mismatch item above, but with more detail for the case where a custom genome is used. How to troubleshoot and correct custom genomes/builds within Galaxy:
- https://wiki.galaxyproject.org/Support#Custom_reference_genome
- https://wiki.galaxyproject.org/Learn/CustomGenomes#Troubleshooting
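
The chromosome identifier comparison described above can also be done outside Galaxy on downloaded copies of the files. What follows is only a minimal sketch: the function names are hypothetical, and it assumes a plain-text SAM header, a tab-separated GTF file, and a FASTA file, each already read into a string.

```python
def sam_header_names(header_text):
    """Collect sequence names from the SN: fields of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_names(gtf_text):
    """Collect the first column (sequence name) of each GTF record."""
    names = set()
    for line in gtf_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_names(fasta_text):
    """Collect identifiers from '>' lines, ignoring any description text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_names(left, right):
    """Return the names found in only one of the two sets, sorted."""
    return {
        "only_in_left": sorted(left - right),
        "only_in_right": sorted(right - left),
    }
```

Calling compare_names(sam_header_names(header), gtf_names(gtf)) lists identifiers present in one input but not the other. A typical mismatch is "chr1" in the alignments against "1" in the annotation.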
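
For an annotation sourced somewhere other than iGenomes, a quick way to see whether the tss_id, p_id, and gene_name attributes mentioned above are present is to count them per record. This is a rough sketch with a hypothetical function name. It assumes the usual GTF layout, with the attributes in the ninth tab-separated column written as key "value"; pairs.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def count_attributes(gtf_text, wanted=("tss_id", "p_id", "gene_name")):
    """Return (record count, {attribute: records that carry it})."""
    counts = {key: 0 for key in wanted}
    total = 0
    for line in gtf_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        total += 1
        present = {match.group(1) for match in ATTRIBUTE.finditer(columns[8])}
        for key in wanted:
            if key in present:
                counts[key] += 1
    return total, counts
```

A count of zero for tss_id or p_id across the whole file suggests the annotation does not meet the content criteria described above.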

Hopefully one or more of these helps to track down and resolve the problems.

Jen, Galaxy team
