P_Ids And Cufflinks

Question: P_Ids And Cufflinks

8.0 years ago by

United Kingdom

David Matthews • 630 wrote:

Just a thought: I notice that in the Ensembl GTF file the protein IDs are listed as follows:

chr11 protein_coding CDS 129060 129388 . - 0 gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201"; protein_id "ENSP00000372234"

Is the p_id problem in Cufflinks because the Ensembl GTF file uses the attribute name protein_id and not p_id?

Cheers,
David

Dr David A. Matthews, Senior Lecturer in Virology, Department of Cellular and Molecular Medicine, University of Bristol
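To make the observation concrete, here is a minimal Python sketch that parses the attribute column of the record quoted above and reports which attribute names it carries. The helper name parse_gtf_attributes is made up for illustration, and the tab separators are an assumption (the forum post shows the fields separated by spaces).

```python
def parse_gtf_attributes(field):
    """Parse the ninth GTF column into a dict of attribute name -> value.

    Assumes attribute values do not themselves contain semicolons.
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing semicolon
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


# The record from the question, rebuilt with tab separators (assumed).
line = (
    'chr11\tprotein_coding\tCDS\t129060\t129388\t.\t-\t0\t'
    'gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; '
    'exon_number "1"; gene_name "AC069287.3"; '
    'transcript_name "AC069287.3-201"; protein_id "ENSP00000372234"'
)

attrs = parse_gtf_attributes(line.split("\t")[8])
print("protein_id present:", "protein_id" in attrs)
print("p_id present:", "p_id" in attrs)
```

For this record the sketch finds protein_id but no p_id, which is the mismatch the question is asking about.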

rna-seq cufflinks • 2.2k views


modified 8.0 years ago by Jeremy Goecks • 2.2k • written 8.0 years ago by David Matthews • 630

8.0 years ago by

Jeremy Goecks • 2.2k wrote:

Yes, this is likely the problem. From the Cuffdiff documentation:

--

Cuffdiff takes a GTF file of transcripts as input, along with two or more SAM files containing the fragment alignments for two or more samples. It produces a number of output files that contain test results for changes in expression at the level of transcripts, primary transcripts, and genes. It also tracks changes in the relative abundance of transcripts sharing a common transcription start site, and in the relative abundances of the primary transcripts of each gene. Tracking the former allows one to see changes in splicing, and the latter lets one see changes in relative promoter use within a gene.

If you have more than one replicate for a sample, supply the SAM files for the sample as a single comma-separated list. It is not necessary to have the same number of replicates for each sample.

Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use. These attributes are:

Attribute   Description
tss_id      The ID of this transcript's inferred start site. Determines which primary transcript this processed transcript is believed to come from.
p_id        The ID of the coding sequence this transcript contains. This attribute is attached to Cuffcompare output by Cuffcompare only when it is run with a reference annotation that includes CDS records. Further, differential CDS analysis is only performed when all isoforms of a gene have p_id attributes, because neither Cufflinks nor Cuffcompare attempts to assign an open reading frame to transcripts.

--

Does addressing this issue prompt Cuffdiff to output data to the differential coding file?

Also, you might try using gene annotation files from UCSC rather than Ensembl. Although Ensembl is mentioned in the documentation, the problems that you and others have encountered suggest that Cufflinks may have been developed using UCSC GTFs rather than Ensembl GTFs, and hence UCSC GTFs may work better.

J.
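The condition in the quoted documentation (differential CDS analysis only runs when all isoforms of a gene have p_id attributes) can be checked before running Cuffdiff. The following is a rough Python sketch under some assumptions: the function names are hypothetical, the GTF is tab-separated with attributes in the ninth column, a transcript counts as annotated if any of its records carries p_id, and the example records and IDs are invented purely for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column into a dict (values without quotes)."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def genes_lacking_p_id(gtf_lines):
    """Return {gene_id: [transcript_ids without any p_id attribute]}."""
    transcripts_by_gene = defaultdict(set)
    transcripts_with_p_id = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(columns[8])
        gene = attrs.get("gene_id")
        transcript = attrs.get("transcript_id")
        if gene is None or transcript is None:
            continue
        transcripts_by_gene[gene].add(transcript)
        if "p_id" in attrs:
            transcripts_with_p_id.add(transcript)
    missing = {}
    for gene, transcripts in transcripts_by_gene.items():
        unannotated = sorted(transcripts - transcripts_with_p_id)
        if unannotated:
            missing[gene] = unannotated
    return missing


# Invented records: gene G1 has one isoform without p_id, gene G2 is complete.
example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; p_id "P1";',
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "G2"; transcript_id "T3"; p_id "P2";',
]
print(genes_lacking_p_id(example))
```

To check a real annotation, pass an open file handle instead of the example list. Any gene that appears in the result would, going by the documentation quoted above, be left out of the differential CDS analysis.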

written 8.0 years ago by Jeremy Goecks • 2.2k

Can Cuffdiff accept BAM as well as SAM format?

written 8.0 years ago by Loraine, Ann • 150

Yes. Cufflinks can also accept BAM files. For the transcript annotation, Cuffdiff works exclusively with GTF files. J.
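As an illustration of the input convention quoted from the Cuffdiff documentation in the answer above (one GTF file, then one comma-separated list of alignment files per sample), here is a small Python sketch that assembles the argument list. The function name and all file names are hypothetical, no Cuffdiff options are included, and the sketch only builds the command without running it.

```python
def build_cuffdiff_command(gtf_path, samples):
    """Build a cuffdiff argument list.

    samples is a list of samples; each sample is a list of alignment
    files (its replicates), joined into one comma-separated argument.
    """
    if len(samples) < 2:
        raise ValueError("Cuffdiff compares two or more samples")
    command = ["cuffdiff", gtf_path]
    for replicates in samples:
        if not replicates:
            raise ValueError("each sample needs at least one alignment file")
        command.append(",".join(replicates))
    return command


# Hypothetical file names: two replicates for sample 1, one for sample 2.
command = build_cuffdiff_command(
    "transcripts.gtf",
    [["s1_rep1.bam", "s1_rep2.bam"], ["s2_rep1.bam"]],
)
print(" ".join(command))
```

The samples do not need the same number of replicates, which matches the documentation excerpt. The resulting list could be passed to subprocess.run once any options required by your installed version have been added.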