Replicate Common Gene Names

Question: Replicate Common Gene Names

kerrigab • 0 wrote:

This is somewhat Galaxy related, but more of a general question.

I did all of the mapping and initial analysis for my scRNA-Seq data in Galaxy using TopHat and Cufflinks. I have since downloaded the data and generated an expression matrix.

However, I found that many of the reference annotation entries from my GTF file share the same common gene name (i.e., there are multiple rows in my matrix with the same common gene name). What is an appropriate way of handling this if my main interest is to look at gene expression for clustering? Should I simply sum all the rows with the same common gene name? Is there a more appropriate transformation to keep my data accurate?

If more information is needed: my samples are human, I used hg38, and I used the UCSC gene names for my reference.

rna-seq • 812 views


modified 2.7 years ago by Jennifer Hillman Jackson ♦ 25k • written 2.7 years ago by kerrigab • 0

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The different rows with the same gene name annotation represent individual transcripts associated with that gene.

Summing the expression values directly from Cufflinks will create bias in the data. It is better to choose one transcript to represent each gene. Alternatively, you can use Cuffdiff, which outputs per-gene expression data that is summed appropriately: http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#fpkm-tracking-files
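As a rough illustration of the "one transcript per gene" option, here is a minimal Python sketch. It assumes the expression matrix has already been loaded as rows of (transcript ID, gene name, per-sample values). The selection rule (keep the transcript with the highest mean expression across samples) is only one possible criterion and is an assumption for this example, not something Cufflinks prescribes. The IDs, gene names, and values are made up.

```python
from statistics import mean


def pick_representative_transcripts(rows):
    """Keep one transcript per gene name.

    rows: iterable of (transcript_id, gene_name, [expression per sample]).
    Returns {gene_name: (transcript_id, values)}.
    Assumption: the transcript with the highest mean expression is kept.
    """
    best = {}
    for transcript_id, gene_name, values in rows:
        current = best.get(gene_name)
        if current is None or mean(values) > mean(current[1]):
            best[gene_name] = (transcript_id, values)
    return best


# Hypothetical toy matrix: two transcripts share the gene name GENE_A.
rows = [
    ("tx1", "GENE_A", [5.0, 7.0, 6.0]),
    ("tx2", "GENE_A", [0.5, 0.2, 0.1]),
    ("tx3", "GENE_B", [2.0, 2.5, 3.0]),
]

representatives = pick_representative_transcripts(rows)
for gene, (transcript_id, values) in sorted(representatives.items()):
    print(gene, transcript_id, values)
```

Whichever rule is used, it should be applied consistently across all samples so that each gene is represented by the same transcript in every column of the matrix.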

Please note that GTF reference annotation from the UCSC Table Browser does not contain all of the attributes these tools use to perform calculations (and to assign annotation). Using reference data from iGenomes or another source that includes these attributes is recommended. http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#cuffdiff-input-files https://support.illumina.com/sequencing/sequencing_software/igenome.html
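To see which attributes a given GTF actually carries before running these tools, a small check like the following Python sketch may help. The list of attribute names it looks for (gene_id, transcript_id, gene_name, tss_id, p_id) is an assumption based on the Cufflinks documentation linked above; adjust it to whatever your tools require. The two sample records are invented and only mimic a GTF that lacks the extra attributes.

```python
import re
from collections import Counter

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Assumed list of attributes of interest; not an authoritative requirement.
EXPECTED = ["gene_id", "transcript_id", "gene_name", "tss_id", "p_id"]


def count_gtf_attributes(lines):
    """Return (number of records, Counter of attribute keys seen per record)."""
    counts = Counter()
    n_records = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        n_records += 1
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        counts.update(keys)
    return n_records, counts


# Invented records carrying only gene_id and transcript_id.
sample = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "tx1"; transcript_id "tx1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "tx2"; transcript_id "tx2";\n',
]

n_records, counts = count_gtf_attributes(sample)
missing = [name for name in EXPECTED if counts[name] < n_records]
print("records:", n_records)
print("attributes missing from some records:", missing)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_gtf_attributes(open("genes.gtf"))`.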

To use iGenomes annotation: Download the hg38 tar file locally, unpack it, locate the genes.gtf file, then upload it into Galaxy for use with these tools.

Best, Jen, Galaxy team

modified 2.7 years ago • written 2.7 years ago by Jennifer Hillman Jackson ♦ 25k
