Question: Regarding A Cuffdiff Output
5.7 years ago by
Yona Kim wrote:
Dear Galaxy users,

I have a quick question about Cuffdiff analysis. I obtained two SRA files and converted them to FASTQ files, which I uploaded to Galaxy via the FTP server. I then ran FASTQ Groomer, Tophat, Cufflinks, Cuffcompare, and finally Cuffdiff. (The gene annotation was downloaded from the UCSC Table Browser in GTF format.)

I downloaded the gene differential expression testing file, one of the Cuffdiff outputs, and viewed it in Excel. However, it contains only zeros for value_1, value_2, log2, and test_stat, and only ones for p_value and q_value.

Is it likely that I obtained the wrong gene annotation file and that this caused the problem?

Thank you,
Yona Kim
Department of Genetics
Rutgers University - New Brunswick Campus
rna-seq cufflinks • 1.4k views
modified 5.6 years ago by Jennifer Hillman Jackson • written 5.7 years ago by Yona Kim
5.6 years ago by
United States
Jennifer Hillman Jackson wrote:
Hi Yona,

Yes, the GTF file is most likely the problem: it lacks certain attributes that Cuffdiff requires to perform these calculations. You will also want to double-check that the reference genome and the GTF file (wherever you source it next) are an exact match in both genome build and identifier format. If either does not match, you will not get the expected or full results that Cuffdiff can produce.

This wiki has some help: http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
See "Tools on the Main server: Example - RNA-seq analysis tools."

The links to the Cufflinks web site explain the attributes that Cuffdiff is looking for, point to the available iGenomes datasets (best to use if your genome is represented), and include a pointer to the tool's user group. Two iGenomes GTF files are also already available in Galaxy (hg19, mm9) under "Shared Data -> Data Libraries -> iGenomes". The link to our tutorial and FAQ has help on how the GTF files are used, along with troubleshooting advice.

Best,
Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
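For anyone who wants to run the two checks described above outside of Galaxy, here is a minimal Python sketch. It counts which attribute keys appear in column 9 of a GTF file and compares the GTF sequence names against the FASTA headers of the reference genome. The function names are hypothetical, and the attribute list (`gene_id`, `transcript_id`, plus the optional `tss_id` and `p_id`) is an assumption based on the Cufflinks manual, so treat the output as a hint rather than a definitive validation.

```python
import re
from collections import Counter

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

# Assumed attribute list; check the Cufflinks manual for your version.
EXPECTED_ATTRS = ("gene_id", "transcript_id", "tss_id", "p_id")


def summarize_gtf(lines):
    """Return (feature count, attribute-key counts, sequence names)."""
    attr_counts = Counter()
    seqnames = set()
    n_features = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        n_features += 1
        seqnames.add(fields[0])
        for key, _value in ATTR_RE.findall(fields[8]):
            attr_counts[key] += 1
    return n_features, attr_counts, seqnames


def read_fasta_names(lines):
    """Collect the first word of every FASTA header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def report(gtf_lines, fasta_lines):
    """Return a list of human-readable warnings (empty if none found)."""
    n, attrs, gtf_names = summarize_gtf(gtf_lines)
    warnings = []
    for key in EXPECTED_ATTRS:
        if attrs[key] < n:
            warnings.append(f"{key}: present on {attrs[key]} of {n} features")
    missing = gtf_names - read_fasta_names(fasta_lines)
    if missing:
        warnings.append(
            "sequence names in GTF but not in genome: " + ", ".join(sorted(missing))
        )
    return warnings
```

To use it, pass open file handles, for example `report(open("genes.gtf"), open("genome.fa"))` (both file names are placeholders). A mismatch such as `chr1` in the GTF versus `1` in the genome will show up in the last warning.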
5.6 years ago by
United States
Jennifer Hillman Jackson wrote:
Hi Yona Kim,

Are you using the iGenomes version of the GTF file, with the attributes Cuffdiff requires for generating all of the additional statistics? It appears that this is the case, but I just wanted to double-check. If you are not using it, you can find a copy to load and use on the public server (if this is where you are working) in Shared Data -> Data Libraries -> iGenomes. Otherwise, it can be found at the Cufflinks web site. These are the two attributes that are important to have, when available:
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_input

The results you are getting indicate that the data coverage is sparse, which aligns with your thoughts about this mapping not being as successful as prior runs. NOTEST and LOWDATA are explained here, with advice about parameter tuning:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
Follow the links to the Cufflinks FAQ to find:
http://cufflinks.cbcb.umd.edu/faq.html#notest

This could be the source of the problem. Making sure that the data was groomed correctly would be a good place to start. The comments from the first run will note the detected input type (but there can be some overlap), so also use the tool "FastQC" to help determine the proper settings for "FASTQ Groomer". If necessary, re-run from this step to see whether that improves the mapping.
http://wiki.galaxyproject.org/Support#Dataset_special_cases
See the second bullet under "FASTQ".

If your query data is short (less than around 40 bases), then tuning Tophat could also improve mapping; see the tool's web page for advice on mapping shorter sequences. Then test a few different parameter options to see what produces the best results for your particular datasets/samples. There is a balance between being too sensitive and too stringent, and this is a judgement call in most cases. Trimming the reads may help if quality is an issue ("FastQC" will also give information about this).

The RNA-seq example tutorial has an example of how to do basic QC:
https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Hopefully this gives you some new options to test that improve the result!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
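As a quick way to see how widespread the NOTEST and LOWDATA results mentioned above are, the following Python sketch tallies the `status` column of a downloaded gene differential expression file and counts how many rows have a non-zero value in either sample. The function name is hypothetical, and the column names (`status`, `value_1`, `value_2`) are assumed to match the tab-delimited header that Cuffdiff writes; adjust them if your version differs.

```python
import csv
from collections import Counter


def summarize_diff(lines):
    """Tally test statuses and count rows with any non-zero expression value.

    `lines` is any iterable of tab-delimited text lines, header first
    (for example, an open file handle).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    status_counts = Counter()
    nonzero_rows = 0
    for row in reader:
        status_counts[row["status"]] += 1
        if float(row["value_1"]) > 0 or float(row["value_2"]) > 0:
            nonzero_rows += 1
    return status_counts, nonzero_rows
```

If nearly every row is NOTEST or LOWDATA and the non-zero count is at or near zero, that points to a mapping or annotation mismatch rather than a true lack of differential expression.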