Question: Cufflinks returns 0 FPKM
2.0 years ago by
US, Tufts University
Widmer, Giovanni150 wrote:

Hi, I realize that many users have posted similar questions and that the Galaxy team has diligently addressed dozens of them. After reading whatever I could find on the subject, I am still in the dark about how to fix my annotation file, assuming that is what causes Cufflinks to return an FPKM of zero on every row. I understand that the genome file and the annotation file have to match, i.e., use the same chromosome designations. Below I'm copying the first lines of the genome file (from the pig, Sus scrofa), then the first row of the annotation file, and then the first rows of the TopHat BAM file converted to SAM.

Genome FASTA file:

susScr3_gold_GL878569.2 range=chr1:1-174389 5'pad=0 3'pad=0 strand=+ repeatMasking=none
AATCGACGCCACACGCAGGCCAGTTCCGAGCTGCATCTGGGACCTGCTCC ...

GTF annotation file:

chr1 susScr3_refGene start_codon 133903981 133903983 0.000000 + . gene_id "NM_214429"; transcript_id "NM_214429"; ...

TopHat SAM file:

D00780:57:C8BGWANXX:8:1305:3315:34658 0 susScr3_gold_AEMK01000005.1 1 255 101M * 0 0 GGGAGCAGCAGCTGCTTCTACAGTTTGTTTTGAAATGGCGTTAGATAAAATAAGGGAATAAATAGAGGGGGTAAGAGGCAGACTCTCCATCCCCGTGTCAC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 NH:i:1

My question is: how do I make the FASTA and GTF files compatible (assuming that's the issue)? Do I replace "chr1" with "susScr3_gold_GL878569.2 range=chr1"? BTW, I run into the same problem whether I download the genome and annotation from UCSC or from iGenomes.
Finally, I saw that usegalaxy.org has the pig genome built in, but Cufflinks doesn't seem to see a cached matching annotation file, hence I used one from my history. Is it possible to install the Sus scrofa annotation file?

many thanks,

Giovanni

2.0 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

Use the built-in susScr3 genome for the mapping. The chromosome identifiers in the TopHat output will then match those in either of the two GTF files, which avoids the formatting issues with the current custom reference genome.
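A quick way to see this kind of mismatch is to compare the sequence names in the GTF (column 1) with the reference names in the SAM (column 3). The following is only a minimal sketch: the helper names `gtf_seqnames` and `sam_refnames` are made up for illustration, the records are abbreviated versions of the ones quoted in the question (the read name `read1` and the shortened sequence are placeholders), and tab-separated input is assumed.

```python
def gtf_seqnames(lines):
    """Collect column 1 (seqname) from GTF lines, skipping comments and blanks."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_refnames(lines):
    """Collect column 3 (RNAME) from SAM lines, skipping headers and unmapped reads."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names


# Abbreviated, hypothetical records modeled on the ones in the question.
gtf_example = [
    'chr1\tsusScr3_refGene\tstart_codon\t133903981\t133903983\t0.000000\t+\t.\t'
    'gene_id "NM_214429"; transcript_id "NM_214429";',
]
sam_example = [
    "read1\t0\tsusScr3_gold_AEMK01000005.1\t1\t255\t6M\t*\t0\t0\tGGGAGC\tBBBBBF\tNM:i:0\tNH:i:1",
]

# Names present in both files; an empty result means the two files share no sequence names.
shared = gtf_seqnames(gtf_example) & sam_refnames(sam_example)
print("shared sequence names:", sorted(shared))
```

With the two records above, the GTF names "chr1" while the SAM names "susScr3_gold_AEMK01000005.1", so the intersection is empty. On real data you would pass the lines of the full files instead of the example lists.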

The iGenomes GTF is preferred. It has distinct gene_id and transcript_id values, plus other attributes that these tools make use of: p_id, tss_id, and gene_name.
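To check which of these attributes a given GTF actually carries, you can parse column 9 of a few rows. This is a rough sketch, not part of any Galaxy tool: `parse_attributes` and `missing_attributes` are hypothetical helper names, the regular expression assumes the usual `key "value";` layout, and the two example rows are abbreviated from GTF rows quoted in this thread.

```python
import re

# Attributes mentioned above as being used by the Cufflinks tools.
EXPECTED = ("gene_id", "transcript_id", "gene_name", "p_id", "tss_id")
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(gtf_line):
    """Return the key/value pairs from column 9 of one tab-separated GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return {}
    return dict(ATTR_RE.findall(fields[8]))


def missing_attributes(gtf_line, expected=EXPECTED):
    """List the expected attribute keys that are absent from this GTF line."""
    attrs = parse_attributes(gtf_line)
    return [key for key in expected if key not in attrs]


# Abbreviated example rows (only some of the original attributes are kept).
refgene_line = (
    'chr1\tsusScr3_refGene\tstart_codon\t133903981\t133903983\t0.000000\t+\t.\t'
    'gene_id "NM_214429"; transcript_id "NM_214429";'
)
ensembl_line = (
    '1\tensembl\texon\t159167\t159952\t.\t-\t.\t'
    'gene_id "ENSSSCG00000030218"; gene_name "ENSSSCG00000030218"; p_id "P15946"; '
    'transcript_id "ENSSSCT00000026690"; tss_id "TSS17319";'
)

print(missing_attributes(refgene_line))
print(missing_attributes(ensembl_line))

# A gene_id identical to the transcript_id means the file does not group transcripts by gene.
attrs = parse_attributes(refgene_line)
print(attrs["gene_id"] == attrs["transcript_id"])
```

For the refGene-style row, gene_name, p_id, and tss_id are reported as missing and gene_id equals transcript_id; the second row carries all five attributes.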

When running the tool, select the GTF from your history together with the natively indexed susScr3 reference genome. As implemented, the tool does not support indexed (built-in) reference annotation, so the GTF has to come from the history.

Best, Jen, Galaxy team

2.0 years ago by
US, Tufts University
Widmer, Giovanni150 wrote:

Hi Jen, thanks for the quick response. After re-running Cufflinks on usegalaxy.org with the built-in susScr3 reference genome and the annotation file from iGenomes, I don't see any gene IDs in the Cufflinks gene expression output file. Instead I see CUFF.1, CUFF.2, etc. under tracking ID and gene ID. Shouldn't I be seeing gene IDs?

The first row of my GTF file looks like this:

1 ensembl exon 159167 159952 . - . exon_id "ENSSSCE00000218515"; exon_number "8"; exon_version "1"; gene_biotype "protein_coding"; gene_id "ENSSSCG00000030218"; gene_name "ENSSSCG00000030218"; gene_source "ensembl"; gene_version "1"; p_id "P15946"; transcript_biotype "protein_coding"; transcript_id "ENSSSCT00000026690"; transcript_source "ensembl"; transcript_version "1"; tss_id "TSS17319";

My Cufflinks gene expression output:

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
CUFF.1 - - CUFF.1 - - GL892102-1:1-148 - - 3500.52 3222.63 3778.41 OK

and these are my cufflinks input parameters:

Input Parameter                             Value
SAM or BAM file of aligned RNA-Seq reads    22: TopHat on data 17: accepted_hits
Max Intron Length                           300000
Min Isoform Fraction                        0.1
Pre MRNA Fraction                           0.15
Use Reference Annotation                    Use reference annotation guide
Reference Annotation                        23: Galaxy146-[Susscrofa_genes.gtf].gtf
3prime overhang tolerance                   600
Intronic overhang tolerance                 50
Disable tiling of reference transcripts     No
Perform Bias Correction                     No
Use multi-read correct                      No
Apply length correction                     Cufflinks Effective Length Correction
Global model (for use in Trackster)         No dataset.
Set advanced Cufflinks options              No
Job Resource Parameters                     no

thanks,

Giovanni


The chromosome names in the GTF are not in UCSC format (the first row uses "1" rather than "chr1"), and the gene_name is an Ensembl identifier. Perhaps the wrong version was downloaded from iGenomes? If you are using the built-in susScr3 index, you want the iGenomes build associated with UCSC, which is also named susScr3.

http://support.illumina.com/sequencing/sequencing_software/igenome.html

The UCSC susScr3 build is the one to get:

Sus scrofa (Pig)
Ensembl: Sscrofa10.2, Sscrofa9
NCBI: Sscrofa10.2, Sscrofa10, Sscrofa9.2
UCSC: susScr3, susScr2
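Before re-running, it can help to confirm the symptom directly from the Cufflinks gene expression table: count how many rows only got a CUFF.* identifier and see which sequence names appear in the locus column, then compare those with column 1 of the GTF. The sketch below is illustrative only; `summarize_tracking` is a hypothetical helper, tab-separated input is assumed, and the example rows are the ones posted earlier in this thread.

```python
def summarize_tracking(lines):
    """Summarize a Cufflinks gene expression table given as tab-separated lines.

    Returns (rows with a CUFF.* gene_id, total rows, sequence names seen in 'locus').
    """
    header = lines[0].rstrip("\n").split("\t")
    id_col = header.index("gene_id")
    locus_col = header.index("locus")
    novel = 0
    total = 0
    seqnames = set()
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        if fields[id_col].startswith("CUFF."):
            novel += 1
        # A locus looks like "GL892102-1:1-148"; keep the part before the last colon.
        seqnames.add(fields[locus_col].rsplit(":", 1)[0])
    return novel, total, seqnames


# Header and first row of the output posted in this thread.
tracking_example = [
    "tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\ttss_id\t"
    "locus\tlength\tcoverage\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\tFPKM_status",
    "CUFF.1\t-\t-\tCUFF.1\t-\t-\tGL892102-1:1-148\t-\t-\t3500.52\t3222.63\t3778.41\tOK",
]
# Sequence name from column 1 of the GTF row posted in this thread.
gtf_names = {"1"}

novel, total, seqnames = summarize_tracking(tracking_example)
print(f"{novel} of {total} rows have only a CUFF.* gene_id")
print("names shared with the GTF:", sorted(seqnames & gtf_names))
```

If most rows carry CUFF.* identifiers and the locus names share nothing with the GTF, the annotation was most likely not matched to the reference genome used for mapping.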

Reply written 2.0 years ago by Jennifer Hillman Jackson