Multiple UCSC gene names in RStudio Data Table

Question: Multiple UCSC gene names in RStudio Data Table

nashedm • 10 (United States) wrote 3.4 years ago:

Hi there,

I am using the cummeRbund package in RStudio on my personal computer to analyze differentially expressed genes from Cuffdiff output files generated in Galaxy.

When extracting the list of genes that are differentially expressed, I initially get only the XLOC names. To get the UCSC names, I do the following:

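# Genes that differ significantly between Control and CORT at alpha = 0.05
SigGenesData_ContCORT <- getSig(cuff_data, x = "Control", y = "CORT", alpha = 0.05, level = "genes")

# Retrieve the gene set for those significant IDs
DiffGenes_ContCORT <- getGenes(cuff_data, SigGenesData_ContCORT)

# Table of tracking_id (XLOC) and gene_short_name
GeneIDs_ContCORT <- featureNames(DiffGenes_ContCORT)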

But I want the actual gene symbols, not just the UCSC IDs. My workaround is to download a file from the UCSC site that lists gene symbols by known UCSC ID. I then import this list into RStudio and merge it with my GeneIDs_ContCORT data table on the UCSC ID column, which is common to both tables.

This works fine with the exception of one problem: some genes have multiple UCSC IDs, so in my GeneIDs_ContCORT table a good number of genes have several IDs separated by commas. For example:

  tracking_id gene_short_name
1 XLOC_000525 uc007csi.1,uc007csj.1,uc007csk.1

So when I merge the tables, R doesn't match this to a UCSC ID from the downloaded list and just gives me "NA", because it reads those three IDs as a single string and can't find a match.

Is there a way I can instruct R to remove all but the first value in the gene_short_name column so that it can be properly matched? I.e. desired output:

  tracking_id gene_short_name
1 XLOC_000525 uc007csi.1
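One possible way to keep only the first ID, as a minimal base R sketch. The data frame below is a toy stand-in for GeneIDs_ContCORT; the second row (XLOC_000526, uc007abc.1) and the column name first_ucsc_id are made up for illustration.

```r
# Toy stand-in for the featureNames() table; the second row is hypothetical.
gene_ids <- data.frame(
  tracking_id     = c("XLOC_000525", "XLOC_000526"),
  gene_short_name = c("uc007csi.1,uc007csj.1,uc007csk.1", "uc007abc.1"),
  stringsAsFactors = FALSE
)

# Remove the first comma and everything after it, which leaves the first ID.
# Cells without a comma are returned unchanged.
gene_ids$first_ucsc_id <- sub(",.*$", "", gene_ids$gene_short_name)

print(gene_ids)
```

Keeping only the first ID is simple, but it silently discards the other transcripts, so a locus would go unmatched if its first ID happens to be missing from the lookup table.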

Alternatively, is there a way to merge the tables such that the multiple UCSC names in one cell are all read separately and matched in my merged table?
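For the second option, one approach is to split each cell into one row per UCSC ID before merging. This is a sketch in base R under stated assumptions: the lookup table ucsc_to_symbol, its column names, and the symbols GeneA and GeneB are hypothetical placeholders for the file downloaded from UCSC, and the second locus is made up.

```r
# Toy stand-in for the featureNames() table; the second row is hypothetical.
gene_ids <- data.frame(
  tracking_id     = c("XLOC_000525", "XLOC_000526"),
  gene_short_name = c("uc007csi.1,uc007csj.1,uc007csk.1", "uc007abc.1"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table standing in for the UCSC download.
ucsc_to_symbol <- data.frame(
  ucsc_id = c("uc007csi.1", "uc007csj.1", "uc007csk.1", "uc007abc.1"),
  symbol  = c("GeneA", "GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Split each cell on commas: a list with one character vector per row.
id_list <- strsplit(gene_ids$gene_short_name, ",", fixed = TRUE)

# Expand to one row per (tracking_id, UCSC ID) pair.
long_ids <- data.frame(
  tracking_id = rep(gene_ids$tracking_id, times = lengths(id_list)),
  ucsc_id     = trimws(unlist(id_list)),
  stringsAsFactors = FALSE
)

# Left join: every UCSC ID is kept, with NA where no symbol is found.
merged <- merge(long_ids, ucsc_to_symbol, by = "ucsc_id", all.x = TRUE)

# Collapse back to one row per locus, listing each distinct symbol once.
# Note: the formula interface of aggregate() drops rows whose symbol is NA.
symbols_by_locus <- aggregate(
  symbol ~ tracking_id,
  data = merged,
  FUN  = function(s) paste(unique(s), collapse = ",")
)

print(symbols_by_locus)
```

Because every ID is matched separately, a locus still gets a symbol as long as any one of its UCSC IDs appears in the lookup table.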

Any help would be appreciated.

Thanks

cummerbund ucsc gene name xloc r studio • 987 views


modified 3.3 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.4 years ago by nashedm • 10

Jennifer Hillman Jackson ♦ 25k (United States) wrote 3.3 years ago:

Hello,

It might be easiest to back up and use a reference GTF/GFF3 file that contains the attribute "gene_name" with Cuffmerge and then Cuffdiff. One source is iGenomes. This will avoid downstream complications.

Data files can be manipulated in many ways on the command line, or within Galaxy, or with RStudio, but the above is the most direct approach.
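Before rerunning Cuffmerge and Cuffdiff, it may help to confirm that the annotation file really carries the "gene_name" attribute mentioned above. The sketch below checks this in base R on two made-up GTF records; in practice the lines would come from something like readLines() on your own annotation file, and all IDs and names shown are placeholders.

```r
# Two made-up GTF records; only the first carries a gene_name attribute.
gtf_lines <- c(
  'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GeneA";',
  'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "G2"; transcript_id "T2";'
)

# GTF is tab-separated; the ninth column holds the attribute list.
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
attributes_col <- vapply(fields, function(f) f[9], character(1))

# TRUE for records that include a gene_name attribute.
has_gene_name <- grepl('gene_name "', attributes_col, fixed = TRUE)

print(table(has_gene_name))
```

If most records report FALSE, that annotation is unlikely to give Cuffdiff gene symbols to carry through to its output.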

Best, Jen, Galaxy team

written 3.3 years ago by Jennifer Hillman Jackson ♦ 25k
