Question: Gene Name In Cufflink/Compare/Diff
7.9 years ago by
Matteo Bovolenta wrote:
Hi all, when I run an RNA-seq analysis using tophat, cufflinks, cuffcompare and cuffdiff, aligning my data to the RefSeq genes, the tables I obtain from cufflinks/cuffcompare/cuffdiff do not include the gene name, only the NM_ identifier. Does anyone know how I can obtain all the tables with the gene name? Thank you all very much for the support. Best regards, Matteo
rna-seq cuffdiff • 2.1k views
7.9 years ago by
vasu punj wrote:
This is a known issue with the GTF file from Ensembl.
Hi Matteo and Vasu, There are different ways to refer to genes. Names that start with NM_ are termed 'accession numbers,' and they are a valid way to refer to genes. Matteo, what you may want is the canonical gene name (e.g. Xkr4). If so, you'll want to use a gene annotation/reference file from UCSC; when you are getting the file, select the table with the word 'canonical' in it. E.g. for hg19/UCSC Genes, there is a table called knownCanonical that provides the canonical gene names. Thanks, J.
written 7.9 years ago by Jeremy Goecks
Hello Matteo,

The UCSC Genes table knownCanonical is one source for gene names, but another choice is the RefSeq Genes track's primary table, "refGene". When extracting from the UCSC Table Browser, use the output file type "all fields from selected table". The gene name will be in the column labeled "name2" (as defined in the extended genePred format at UCSC; it is not included in the BED or GTF output file types).

As the other users have pointed out, the goal is a valid GTF file for the Cuff* programs to use. Currently, some manipulation of the reference files available from many of the common sources is necessary to get the formatting correct. The Galaxy team is aware of the format issues with external files and is working on potential solutions.

Thank you for your patience while we work out the kinks with the newly added tool suite.

Best, Jen
Galaxy team
Jennifer Jackson http://usegalaxy.org http://galaxyproject.org
written 7.9 years ago by Jennifer Hillman Jackson
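One possible way to use the refGene table described above, sketched in shell with awk. The first function reads a tab-separated refGene dump and prints accession/gene-name pairs. It assumes the first line is a header row that starts with `#` and contains columns labeled `name` and `name2`. The second function appends the gene name to any tab-separated table that carries the NM_ accession in a given column. The function names and file names are made up for illustration, and the column number has to be checked against your own cufflinks/cuffcompare/cuffdiff output.

```shell
# Hypothetical helper: print "accession<TAB>gene name" for each row of a
# refGene dump saved as "all fields from selected table".
# Assumption: line 1 is a header such as "#bin<TAB>name<TAB>...<TAB>name2<TAB>...".
refgene_to_map() {
  awk -F '\t' '
    BEGIN { OFS = "\t" }
    NR == 1 {
      # Locate the columns by header label instead of hard-coding positions.
      for (i = 1; i <= NF; i++) {
        label = $i
        sub(/^#/, "", label)
        if (label == "name")  acc_col = i
        if (label == "name2") gene_col = i
      }
      next
    }
    # Only print rows once both columns were found in the header.
    acc_col && gene_col { print $acc_col, $gene_col }
  ' "$1"
}

# Hypothetical helper: append the gene name as a new last column.
#   $1 = map file written by refgene_to_map (must not be empty)
#   $2 = tab-separated table to annotate
#   $3 = number of the column in that table that holds the accession
# Rows whose accession is not in the map (including a header row) get "NA".
add_gene_names() {
  awk -F '\t' -v col="$3" '
    BEGIN { OFS = "\t" }
    NR == FNR { gene[$1] = $2; next }   # first file: load the map
    { print $0, (($col in gene) ? gene[$col] : "NA") }
  ' "$1" "$2"
}
```

For example, `refgene_to_map refGene.txt > nm_to_gene.txt` followed by `add_gene_names nm_to_gene.txt my_table.txt 1 > my_table_named.txt`, where `my_table.txt` stands for whichever output table holds the accession in its first column. Looking the columns up by header label keeps the sketch independent of the exact column order in the dump.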
For the Ensembl annotation, you can download the GTF file for your organism from Ensembl here: http://uswest.ensembl.org/info/data/ftp/index.html

To use it, you need to fix the chromosome names, because they do not match the names in the bowtie index (this depends on your organism; they do not match for mouse and rat at least). If you are on a Mac or a Unix machine, run this from the terminal (assuming your downloaded GTF file is named ensembl.gtf):

    awk -F "\t" '{OFS="\t"; $1 = "chr"$1; print}' ensembl.gtf | awk -F"\t" '{OFS="\t"; if($1=="chrMT") $1="chrM"; print}' > ensembl_cleaned.gtf

This changes the Ensembl chromosome names from 1, 2, 3, 4, X, MT to chr1, chr2, chr3, chr4, chrX, chrM to match the bowtie index IDs. The file is unsorted, so it won't work with SAM files, but it will work with the BAM files that tophat outputs. If you need to work with SAM files for some reason, this might work:

    sort -k 1,1 -k 4,4n infile > outfile

Sorted or unsorted, if you run the reformatted GTF file through cuffcompare against itself (use it as both the reference GTF and the 'test' GTF), the GTF file that cuffcompare outputs will have all of the CDS, TSS and related attributes in place when you use it as the reference for cuffdiff.

-rory
written 7.9 years ago by Rory Kirchner
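A single-pass variant of the renaming step above, as a sketch only. It applies the same `chr` prefix and the same MT to chrM rename in one awk call, and it additionally leaves blank lines and lines starting with `#` untouched. That matters only if your downloaded file begins with comment or header lines, which is an assumption about your file; the original command prefixes every line. The function name `ensembl_to_ucsc_chroms` is made up for illustration.

```shell
# Hypothetical helper: rewrite Ensembl-style chromosome names (1, 2, X, MT)
# in column 1 of a GTF file to UCSC-style names (chr1, chr2, chrX, chrM).
#   $1 = input GTF file; the result is written to standard output.
ensembl_to_ucsc_chroms() {
  awk -F '\t' '
    BEGIN { OFS = "\t" }
    # Pass comment/header lines and blank lines through unchanged.
    /^#/ || NF == 0 { print; next }
    {
      # MT is the one name that needs more than a plain "chr" prefix.
      $1 = ($1 == "MT") ? "chrM" : "chr" $1
      print
    }
  ' "$1"
}
```

Usage would mirror the original command: `ensembl_to_ucsc_chroms ensembl.gtf > ensembl_cleaned.gtf`. Whether the resulting names match your bowtie index still has to be checked for your organism.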
Hi, I have run some RNA-seq analysis and I am trying to get the Ensembl gene annotations to show in the cuffcompare files. I have done the following:
1. Ran cufflinks on the .bam files.
2. Got the .gtf file for hg19 from Ensembl. Based on the earlier email, I replaced the chromosome names 1, 2, 3, etc. with chr1, chr2, chr3, etc. Then I tried to run the processed .gtf file against itself through cuffcompare as recommended, and I am getting an error.
3. If I try to run cuffcompare on two of my cufflinks data files and use the processed .gtf file as is, I get the same error.
Any input on what I am doing wrong is appreciated. I will be happy to share the history if needed. Thanks and regards, Aarti
written 7.8 years ago by Aarti Desai
Hi Aarti, This is a bug and will be fixed in the next couple of days. Thanks, J.
written 7.8 years ago by Jeremy Goecks