Question: Problems with annotation file
22 months ago, juliaquingo wrote:

Dear community,

I have successfully aligned my sequencing reads with Bowtie2. The problem came when I used Cufflinks to calculate FPKM values. For some reason, the program does not properly recognize the annotation file. Thinking about possibilities, could running the calculation without the annotation file and annotating later be an option? Any better suggestions?

Thank you very much!

22 months ago by
United States
Jennifer Hillman Jackson wrote:


Most mappers are more lenient with inputs than downstream tools.

Double check for the following:

  1. Did Bowtie2 really make proper use of the annotation?

  2. Were the input fastq data really in fastqsanger format? If not, all sorts of issues can come up, not just poor mapping rates.

  3. Be aware that Bowtie2 does not add the XS attribute (the strand tag that spliced aligners attach to spliced alignments). The Tuxedo tool suite (Cufflinks, Cuffmerge, Cuffdiff, etc.) makes significant use of this attribute to generate a full complement of statistics. If the data are spliced, use TopHat2 or HISAT2 instead. If truly working with a genome with unspliced transcripts, consider alternate RNA-seq tools designed for these types of genomes. See the Tool Shed for currently wrapped options (other tools could be wrapped):

  4. Is a datatype assigned to the reference annotation file at all, and if so, is it the correct one?

  5. Is there a reference genome mismatch problem? Try running BAM-to-SAM on the Bowtie2 output with the "output header only" option, then compare the chromosome identifiers across all inputs (BAM/SAM inputs, reference annotation, and the reference genome used, including custom genomes/builds).

  6. Consider using an alternate reference annotation dataset. iGenomes is the standard. Download the tar file locally, unpack it, and load just the genes.gtf file as a dataset. If the annotation is sourced elsewhere, it should meet the same content criteria as the iGenomes genes.gtf file (tss_id, p_id, and optionally gene_name). Otherwise, many statistics will not be calculated.

  7. Be aware that when using a Custom reference genome/build, tools work much better when the description content is removed from the title line(s) (">") so that only the identifier is present, and that identifier is an exact match for the chromosome names in other inputs. This is somewhat related to the comment immediately above, but with more detail for when a custom genome is used. How to troubleshoot and correct custom genomes/builds within Galaxy:
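
If it helps, checks 5 to 7 can also be scripted outside Galaxy once the files are downloaded. The following is only a rough sketch in Python, not a Galaxy or Cufflinks tool: all function names are made up for illustration, and it assumes plain-text inputs (the SAM header produced by the "output header only" conversion, a tab-separated GTF, and a FASTA reference).

```python
def sam_header_names(lines):
    """Reference names from SAM @SQ header lines (the SN: field)."""
    names = set()
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(lines):
    """Column 1 (chromosome identifier) of each GTF record, skipping comments."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def fasta_ids(lines):
    """First word after '>' on each FASTA title line (description ignored)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_missing_attributes(lines, required=("tss_id", "p_id")):
    """Required attribute keys that never appear in GTF column 9."""
    seen = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and not line.startswith("#"):
            for attr in cols[8].split(";"):
                parts = attr.strip().split(None, 1)
                if parts:
                    seen.add(parts[0])
    return [key for key in required if key not in seen]


def not_in_reference(names, reference_names):
    """Identifiers used in one input that the reference does not contain."""
    return sorted(set(names) - set(reference_names))


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (hypothetical content).
    sam = ["@HD\tVN:1.0\n", "@SQ\tSN:chr1\tLN:1000\n"]
    gtf = ['1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n']
    fasta = [">chr1 some description text\n", "ACGT\n"]

    print("GTF names missing from BAM/SAM header:",
          not_in_reference(gtf_seqnames(gtf), sam_header_names(sam)))
    print("GTF names missing from FASTA:",
          not_in_reference(gtf_seqnames(gtf), fasta_ids(fasta)))
    print("GTF attributes never seen:", gtf_missing_attributes(gtf))
```

With real data, pass open file handles instead of the lists. Any identifier reported as missing (for example "1" in the annotation versus "chr1" in the genome) points to the mismatch described in check 5, and any attribute reported as never seen points to the content criteria in check 6.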

Hopefully one or more of these helps to track down and resolve the problems.

Jen, Galaxy team


