Question: Finding Constitutive Exons Using Expression Data
4.8 years ago, 7plusorminus 3 wrote:
Hi,

I'm trying to find, for each gene across the entire human genome, which exons are the most constitutively expressed. To do this, I'd like to combine expression data (RNA-seq or microarray) with exon data (UCSC track). Then, for each gene, I'd like to pick the 1 or 2 exons with the highest expression levels (my proxy for constitutiveness).

An additional nicety would be to work in a preference for 5' exons. For example, say a gene has 3 exons and, according to the expression data, all 3 are equally expressed. I'd like to get the first 2 exons.

I've started learning Galaxy and was able to import BED files for UCSC exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray expression data. (I also tried importing the Burge RNA-seq track as BED but couldn't get it to work.) I did an inner join on genomic sequences to join the expression data with the exons and sorted them from most expressed to least.

But how do I sort within genes? That is, how do I get the top 2 exons per gene (the highest-expressed exons per gene) and, if more than 2 have equally high expression, how do I preferentially get the 5' exons?

I'm also open to ways to do this without Galaxy. I want to do this for an entire genome, so I figured it would be good to have a Galaxy workflow that I could then apply to other genomes as needed.

Thanks for any help.
4.8 years ago, Sébastien Vigneau wrote:
In reply to "[galaxy-user] Finding constitutive exons using expression data" (galaxy-user@lists.bx.psu.edu, Sun, 9 Feb 2014 16:43:14 -0500):

Hi 7plusorminus 3,

One possibility is to use the "group" tool with the "max" operation to get the most highly expressed exon for each gene. You can then use "subtract datasets" to remove those exons from the original dataset and run "group" again to get the second most highly expressed exons (which are now the highest remaining in each gene).

"Group" may also help you get the exons with the most proximal or distal start position; whether that corresponds to 5' or 3' depends on the orientation of the gene.

Alternatively, if you know how to use R, you can use the function "by" (here is a good explanation: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/).

Sébastien
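For anyone who wants to try the per-gene selection outside Galaxy, here is a minimal sketch in plain Python (not the R "by" approach). It assumes the exon/expression join has already been done and that each exon is a record with hypothetical fields "gene", "strand", "start", "end" and "expression"; those names and the toy values are made up for illustration and are not from any UCSC or Galaxy output. Ties are detected by exact equality of the expression value, so real data may need rounding first.

```python
from collections import defaultdict


def top_exons_per_gene(exons, n=2):
    """Return up to n exons per gene: highest expression first, ties go to the 5'-most exon."""
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon["gene"]].append(exon)

    def rank_key(exon):
        # The 5' end depends on strand: on "+" it is the smallest start,
        # on "-" it is the largest end coordinate.
        five_prime_rank = exon["start"] if exon["strand"] == "+" else -exon["end"]
        # Negate expression so that sorting ascending puts the highest value first.
        return (-exon["expression"], five_prime_rank)

    return {gene: sorted(rows, key=rank_key)[:n] for gene, rows in by_gene.items()}


# Toy input (hypothetical values): geneA has three equally expressed exons on "+",
# geneB has one clear winner and a two-way tie on "-".
exons = [
    {"gene": "geneA", "strand": "+", "start": 100, "end": 200, "expression": 5.0},
    {"gene": "geneA", "strand": "+", "start": 300, "end": 400, "expression": 5.0},
    {"gene": "geneA", "strand": "+", "start": 500, "end": 600, "expression": 5.0},
    {"gene": "geneB", "strand": "-", "start": 100, "end": 200, "expression": 2.0},
    {"gene": "geneB", "strand": "-", "start": 300, "end": 400, "expression": 9.0},
    {"gene": "geneB", "strand": "-", "start": 500, "end": 600, "expression": 2.0},
]

result = top_exons_per_gene(exons, n=2)
for gene, rows in result.items():
    print(gene, [(row["start"], row["end"], row["expression"]) for row in rows])
```

Sorting on a two-part key does the ranking and the 5' tie-break in one pass per gene, so no subtract-and-repeat step is needed. For geneA the two exons with the lowest start are kept; for geneB the tie for second place goes to the exon with the highest end, because the gene is on the minus strand.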