Question: same gene with multiple expression from cuffdiff
shuai.gao wrote (15 months ago):

Hello,

I ran TopHat, Cufflinks, and Cuffdiff on my RNA-seq data and got the Cuffdiff "gene differential expression" file. I noticed that some genes appear multiple times, each time with different values for siNTC vs. siGene: sometimes higher in siNTC, sometimes higher in siGene. Isn't each gene supposed to appear only once, with one value per experimental condition?

I have also been trying to validate some targets from the analysis, but had no luck. The targets I picked show more than a 5-fold expression difference according to the analysis, but RT-PCR shows very little effect, using the same RNA that was sequenced. For the gene that I knocked down with siRNA, the analysis shows about 95% knockdown efficiency and RT-PCR shows about 90%, so that looks normal. I am really confused by the analysis. I also tried HISAT2 and saw the same thing. Could anyone help me with this? Thanks a lot!

Shuai

Tags: rna-seq, tophat, cufflinks, galaxy

Jennifer Hillman Jackson (United States) wrote (15 months ago):

Hello,

Are you certain that the dataset with duplicated gene_id values is the gene differential expression output and not one of the other output datasets? As far as I know, duplicated gene_id values are not possible in that file. http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/#differential-expression-tests

The first two columns are unique in the gene differential expression output (test_id, gene_id). The third column "gene" may be a string of gene_name values sourced from the reference annotation (the original "known" GTF annotation alone or the output of Cuffmerge). It is possible for an individual gene_name to be associated with more than one gene_id/locus. Maybe that is the duplication you are seeing?

For the other differential expression outputs, a gene_id/gene may be associated with more than one feature (transcript, tss, cds) and can therefore appear on more than one line of output.
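As a rough way to check this yourself, here is a minimal Python sketch that lists every gene name that appears under more than one gene_id in a gene differential expression table. It assumes the downloaded file is tab-separated with a header row containing the columns gene_id and gene, and that a missing gene name is written as "-". The function name and the example rows are made up for illustration; they are not Cuffdiff output from your run.

```python
import csv
import io
from collections import defaultdict


def names_with_multiple_gene_ids(handle):
    """Return {gene_name: [gene_id, ...]} for names seen under more than one gene_id."""
    ids_by_name = defaultdict(set)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # The "gene" column may hold a comma-separated string of gene_name values.
        for name in row["gene"].split(","):
            # Assumption: "-" marks a missing gene name and should be skipped.
            if name and name != "-":
                ids_by_name[name].add(row["gene_id"])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Made-up rows in the expected column layout (values are illustrative only).
example = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\n"
    "XLOC_000001\tXLOC_000001\tGENEA\tchr1:100-900\tsiNTC\tsiGene\tOK\t10.0\t50.0\n"
    "XLOC_000002\tXLOC_000002\tGENEA\tchr1:5000-9000\tsiNTC\tsiGene\tOK\t40.0\t8.0\n"
    "XLOC_000003\tXLOC_000003\tGENEB\tchr2:100-900\tsiNTC\tsiGene\tOK\t20.0\t2.0\n"
)
duplicates = names_with_multiple_gene_ids(io.StringIO(example))
print(duplicates)
```

To run it on a real file, open the downloaded dataset and pass the file handle to the function instead of the in-memory example. Any name it reports is a gene_name shared by several gene_id/locus values, not a duplicated gene_id.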

If this doesn't address the question clearly enough, please share a few lines from your output dataset (quote the lines to preserve formatting), describe where you think the duplication is, and include the original name of the dataset as output by Cuffdiff (to confirm the right file is being examined). We can try to troubleshoot more from there.

RNA-seq tutorials: https://galaxyproject.org/learn/

Thanks, Jen, Galaxy team

shuai.gao wrote (15 months ago):

Hello Jen,

Thanks a lot for responding to my post.

The dataset I downloaded is the “Galaxy79-[Cuffdiff_on_data_28,_data_27,_and_others__gene_differential_expression_testing].tabular” file. As you mentioned, my case is “an individual gene_name associated with more than one gene_id/locus”. Please see the screenshot. [screenshot: pten example] There are many genes like this, which makes it hard to validate the results by qPCR.

Also, for many genes that show a decent change (4~6 fold) in Cuffdiff, when I looked at the accepted hits from TopHat and the assembled transcripts from Cufflinks, there isn’t much change. So something seems to go wrong at the Cuffdiff step. That being said, the gene that I knocked down in this experiment (gene B3) shows a consistent ~90% decrease (from Cuffdiff to PCR).

Thank you very much! Shuai


The annotation for gene_name is sourced from the input reference annotation given to the tool. Maybe you need a different annotation source?

Note: if the attributes p_id and tss_id are not in the reference annotation, or if the gene_id and transcript_id are the same value, this can result in much more confusion in the results due to the resulting under-clustering of transcripts into genes. http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/#cuffdiff-input-files

I cannot see your data, but you can check for this information yourself in the annotation. If you are looking for a good annotation source, iGenomes is the first choice (if your genome is available), as the format was worked out with the tool authors. If you are doing discovery and not just focusing on known genes, use Cuffmerge in your pipeline to consolidate the known annotation (sourced from iGenomes or an alternative) with the experimental transcripts (from Cufflinks) into genes/transcripts.
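One way to do that check is sketched below in Python. It counts how many GTF records lack the p_id or tss_id attribute and how many have identical gene_id and transcript_id values. It assumes a standard nine-column, tab-separated GTF whose last column holds attributes in the form key "value";. The function name and the two example records are hypothetical. Note that p_id is normally only expected on records from coding transcripts, so some missing p_id values are not a problem by themselves.

```python
import re

# Matches attribute pairs such as: gene_id "GENEA";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Count records that lack p_id / tss_id or have gene_id equal to transcript_id."""
    counts = {
        "records": 0,
        "missing_p_id": 0,
        "missing_tss_id": 0,
        "gene_id_equals_transcript_id": 0,
    }
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts["records"] += 1
        if "p_id" not in attrs:
            counts["missing_p_id"] += 1
        if "tss_id" not in attrs:
            counts["missing_tss_id"] += 1
        if "gene_id" in attrs and attrs["gene_id"] == attrs.get("transcript_id"):
            counts["gene_id_equals_transcript_id"] += 1
    return counts


# Two made-up records: one fully attributed, one with the problems described above.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENEA"; transcript_id "NM_0001"; '
    'gene_name "GENEA"; p_id "P1"; tss_id "TSS1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "NM_0002"; transcript_id "NM_0002";\n',
]
summary = summarize_gtf(example)
print(summary)
```

For a real annotation, pass an open file handle for the GTF instead of the example list. If nearly every record is counted under gene_id_equals_transcript_id or missing_tss_id, the annotation is probably the source of the confusing gene-level results.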

iGenomes data: https://support.illumina.com/sequencing/sequencing_software/igenome.html Download the tar file locally, uncompress it, then upload the genes.gtf dataset to Galaxy as the reference annotation.

(Reply written 15 months ago by Jennifer Hillman Jackson)