Question: Replicate Common Gene Names
kerrigab wrote (2.7 years ago):

This is somewhat Galaxy related, but more of a general question.

I did all of my mapping and initial analysis for scRNA-Seq in Galaxy using Tophat and Cufflinks and at this point, I've downloaded the data and generated an expression matrix.

However, I found that a lot of the Reference Annotation values from my .GTF file have the same common gene name (i.e. there are multiple rows in my matrix with the same common gene name). What is an appropriate way of handling this if my main interest is to look at gene expression for clustering? Should I simply sum all the rows with the same common gene name? Is there a more appropriate transformation to keep my data accurate?

If more information is needed, my samples are human, I used hg38, and the UCSC gene names for my reference.

Jennifer Hillman Jackson (United States) wrote (2.7 years ago):

Hello,

The different rows with the same gene name annotation represent individual transcripts associated with that gene.

Summing the per-transcript expression values directly from Cufflinks can bias the data. It is better to choose one transcript to represent each gene, or to use Cuffdiff, which outputs per-gene expression data that is summed appropriately: http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#fpkm-tracking-files
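If you go the "one transcript per gene" route, here is a minimal sketch of how that could look. The selection rule (keep the transcript with the highest mean expression across samples) is only an assumption for illustration, as are the function name and the IDs in the example; pick whatever rule fits your analysis.

```python
def pick_representative_rows(rows):
    """Keep one transcript per gene name.

    rows: iterable of (transcript_id, gene_name, [expression values per sample]).
    Returns {gene_name: (transcript_id, values)} for the transcript with the
    highest mean expression. This rule is an illustrative assumption, not
    something prescribed by Cufflinks. Ties keep the first transcript seen.
    """
    best = {}
    for transcript_id, gene_name, values in rows:
        mean_expr = sum(values) / len(values) if values else 0.0
        current = best.get(gene_name)
        if current is None or mean_expr > current[0]:
            best[gene_name] = (mean_expr, transcript_id, values)
    return {gene: (tid, vals) for gene, (_, tid, vals) in best.items()}


# Hypothetical matrix: two transcripts share the gene name GENEA.
example_rows = [
    ("tx1", "GENEA", [1.0, 2.0]),
    ("tx2", "GENEA", [5.0, 7.0]),
    ("tx3", "GENEB", [0.0, 3.0]),
]
representatives = pick_representative_rows(example_rows)
print(representatives)  # GENEA is represented by tx2, GENEB by tx3
```

The result has exactly one row per gene name, which is the shape most clustering tools expect.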

Please note that GTF reference annotation from the UCSC Table Browser does not contain all of the attributes these tools use to perform calculations (and to assign annotation). Using reference data from iGenomes, or another source that includes these values, is recommended. http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#cuffdiff-input-files https://support.illumina.com/sequencing/sequencing_software/igenome.html
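A quick way to see which attributes a given GTF actually carries is to collect the keys from its ninth column. The sketch below is a rough check, not a validator; the attribute names it looks for (gene_id, gene_name, tss_id, p_id) are my reading of the Cuffdiff input documentation linked above, and the example line is made up.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def gtf_attribute_keys(lines):
    """Return the set of attribute keys found in column 9 of GTF lines."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        keys.update(key for key, _ in ATTR_PATTERN.findall(fields[8]))
    return keys


# Made-up record for illustration only.
example_lines = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_name "GENEA"; tss_id "TSS1";\n',
]
found = gtf_attribute_keys(example_lines)
wanted = {"gene_id", "gene_name", "tss_id", "p_id"}  # assumed list, see note above
print("missing:", sorted(wanted - found))
```

To run it on a real file, pass an open file handle, for example `gtf_attribute_keys(open("genes.gtf"))`.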

To use iGenomes annotation: Download the hg38 tar file locally, unpack it, locate the genes.gtf file, then upload it into Galaxy for use with these tools.
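If you would rather not unpack the whole archive just to find the file, something like the following can list where genes.gtf sits inside the tar. It is a sketch using only Python's standard tarfile module; the archive filename in the usage comment is a placeholder, not the real download name.

```python
import posixpath
import tarfile


def find_genes_gtf(archive_path):
    """List archive members whose file name is exactly genes.gtf.

    Matching is done on the member name only, so symbolic links inside
    the archive are reported as well as regular files.
    """
    with tarfile.open(archive_path, "r:*") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if posixpath.basename(member.name) == "genes.gtf"
        ]


# Usage (placeholder filename, substitute the archive you downloaded):
#   for path in find_genes_gtf("hg38_igenomes.tar.gz"):
#       print(path)
```

Once you know the path, extract just that member and upload it to Galaxy as described above.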

Best, Jen, Galaxy team
