Question: Cuffcompare .Tmap File
7.4 years ago, Aleks Schein wrote:
Dear all,

I am trying to run the Cufflinks installation in Galaxy on Solexa RNA-seq samples from HeLa cells. According to the manual, running Cuffcompare should produce a tmap file listing FMI values for the detected isoforms. However, my files have only "100" or "0" in the FMI field, and the FPKM column contains only zeros.

Is there something wrong with my input files or parameter settings? Or is it rather a specific issue with Galaxy's Cufflinks installation?

The data in question is available here: http://main.g2.bx.psu.edu/u/aleks/h/guided-assemblyadvanced

Thanks,
Aleks Schein
rna-seq cufflinks • 1.8k views
7.4 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Aleks,

Chromosome names must match exactly between all input files. Also, the SAM file and the GTF file must both be sorted the same way. This FAQ may be of interest: http://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq

If this is still a problem, please share the history with me directly, either using my email address or by generating the share link and emailing it to me (only). Use "Options -> Share or Publish", not just your session's browser URL.

Best,
Jen
Galaxy team
--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
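The name-matching requirement above can be checked before running Cufflinks. The following is a minimal sketch, not part of Galaxy or Cufflinks: it assumes the SAM file carries @SQ header lines, and the function names (sam_reference_names, gtf_seqnames, compare_names) are hypothetical helpers introduced only for illustration.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (sequence name) of every GTF record."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(sam_lines, gtf_lines):
    """Return (names only in the GTF, names only in the SAM header)."""
    sam = sam_reference_names(sam_lines)
    gtf = gtf_seqnames(gtf_lines)
    return sorted(gtf - sam), sorted(sam - gtf)
```

If either returned list is non-empty, the two files disagree on at least one chromosome name. This sketch does not check sort order.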
Hi Aleks,

Thanks for sending the data link; this helped to narrow down the root cause of the issue. The UCSC-sourced GTF file has the attributes gene_id and transcript_id set to the same value (both hold the transcript_id). As a result, Cufflinks interprets each transcript as a single gene, with no gene grouping (and thus no isoforms).

We have plans to develop a work-around. This would likely involve (for the refGene track in particular) swapping the value in UCSC's primary table refGene.name2 into the gene_id value of the refGene GTF file. This would generate accurate gene-level statistics when the file is used as input to Cufflinks. You could do the same swap (outside of Galaxy) if you want to give it a try and have the resources.

Very sorry for the current inconvenience,

Best,
Jen
Galaxy team
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
written 7.3 years ago by Jennifer Hillman Jackson
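The gene_id swap described above could be done outside of Galaxy roughly as follows. This is a minimal sketch under stated assumptions, not the Galaxy team's work-around: swap_gene_ids is a hypothetical helper, and transcript_to_gene is assumed to be a dictionary you have built yourself from the refGene table, mapping each transcript identifier to its name2 value.

```python
import re

# Match the quoted gene_id and transcript_id attributes in GTF column 9.
GENE_ID = re.compile(r'gene_id "[^"]*"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def swap_gene_ids(gtf_lines, transcript_to_gene):
    """Yield GTF lines with gene_id replaced by the mapped gene name.

    transcript_to_gene is an assumed, user-supplied dict
    (transcript identifier -> gene name). Lines whose transcript
    is not in the mapping are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        match = TRANSCRIPT_ID.search(line)
        gene = transcript_to_gene.get(match.group(1)) if match else None
        if gene is None:
            yield line
        else:
            # A function replacement avoids backslash handling in gene names.
            yield GENE_ID.sub(lambda m: 'gene_id "%s"' % gene, line, count=1)
```

After the swap, transcripts that share a gene name also share a gene_id, so they can be grouped as isoforms of one gene.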
Dear all,

I have a similar problem when using Cufflinks in Galaxy (net version). If I don't select a reference annotation, I get FPKM values, but without a reference I cannot get the transcript or gene name. The output looks like this:

test_id  gene_id  gene  locus  sample_1  sample_2  status  value_1  value_2  ln(fold_change)  test_stat  p_value  q_value  significant
TCONS_00000002  XLOC_000025  -  chr1:33860011-33860048  q1  q2  NOTEST  1.794e+06  0  -1.79769e+308  -1.79769e+308  0.0188163  1  no

There is no gene_id. However, if I use the reference annotation downloaded from Ensembl, I get the gene_ids, but their FPKM values are all "0":

tracking_id  class_code  nearest_ref_id  gene_id  gene_short_name  tss_id  locus  length  coverage  status  FPKM  FPKM_conf_lo  FPKM_conf_hi
ENSMUSG00000024232  -  -  ENSMUSG00000024232  Bambi  -  18:3507954-3516402  -  -  OK  0  0  0
ENSMUSG00000091539  -  -  ENSMUSG00000091539  Vmn1r238  -  18:3122454-3123465  -  -  OK  0  0  0

Any thoughts?
written 7.3 years ago by yao chen
Hello Yao,

The Ensembl-sourced reference annotation can often work with Cufflinks; however, it does need to be in GTF format (the file samples shown here are not in GTF format). You will also need to alter the chromosome names once the file is loaded into Galaxy. Specifically, Ensembl names human chromosomes "1", "2", "3", etc., and to make them match the Galaxy cached human reference genome exactly, a "chr" prefix needs to be added to create "chr1", "chr2", "chr3". A workflow that does this transformation is on the FAQ wiki here: http://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq5

Other issues with Ensembl GTF files have been known to pop up, so these data are not fully supported, and we still recommend using UCSC despite the missing gene_id information. But if you want to try, there is likely some sort of work-around that you could create on your own should a problem come up.

Hopefully this helps,
Jen
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
written 7.3 years ago by Jennifer Hillman Jackson
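For anyone doing the chromosome renaming outside of Galaxy, the "chr" prefix step might look like the sketch below. It is not the FAQ workflow itself: add_chr_prefix and its special parameter are hypothetical names introduced for illustration, and the input is assumed to be a tab-delimited GTF file.

```python
def add_chr_prefix(gtf_lines, prefix="chr", special=None):
    """Yield GTF lines with the sequence name (column 1) prefixed.

    special is an optional, user-supplied dict for names that need more
    than a prefix, for example {"MT": "chrM"} for the mitochondrial
    genome. Names that already carry the prefix are left unchanged.
    """
    special = special or {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            yield line  # pass comments and blank lines through
            continue
        name, sep, rest = line.partition("\t")
        if name in special:
            name = special[name]
        elif not name.startswith(prefix):
            name = prefix + name
        yield name + sep + rest
```

Only the first column is changed; the locus coordinates and the attribute column are left as they are.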