Question: Constitutive Exon Workflow?
7plusorminus 3 • 20 wrote:
Hi folks, I'm trying to find, across the entire human genome, which
exons of each gene are the most constitutively expressed. To do this,
I'd like to combine expression data (RNA-seq or microarray) with exon
data (a UCSC track). Then, for each gene, I'd like to pick the 1 or 2
exons with the highest expression levels (my proxy for
constitutiveness).
An additional nicety would be to somehow work in a preference for 5'
exons. For example, say a gene has 3 exons and, according to the
expression data, all 3 are equally expressed. In that case I'd like to
select the first 2 exons.
I've started learning Galaxy and was able to import BED files for UCSC
exons (as in the Galaxy 101 tutorial) and a BED file for Affy
microarray expression data. (I also tried importing the Burge RNA-seq
track as BED but couldn't get it to work.) I did an inner join on
genomic sequences to join the expression data with the exons, then
sorted the result from most to least expressed. But how do I sort
within genes? That is, how do I get the top 2 exons per gene (the
highest-expressing exons in each gene) and, if more than 2 exons have
equally high expression, how do I preferentially get the 5' exons?
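To make the selection rule concrete, here is a minimal Python sketch of the per-gene ranking done outside Galaxy. It is only an illustration under assumptions: the row layout (gene, chrom, start, end, strand, expression), the function name `top_exons_per_gene`, and the example values are all made up, and it assumes the exon/expression join has already produced one row per exon. Ties are broken toward the 5' end, which means the lowest start on the + strand and the highest end on the - strand.

```python
from collections import defaultdict


def top_exons_per_gene(rows, n=2):
    """Return the n highest-expressed exons per gene, preferring 5' exons on ties.

    rows: iterable of (gene, chrom, start, end, strand, expression) tuples.
    This layout is an assumption made for the sketch, not a Galaxy format.
    """
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[row[0]].append(row)

    def rank_key(exon):
        _, _, start, end, strand, expression = exon
        # On the + strand the 5' end is the lowest coordinate;
        # on the - strand it is the highest coordinate.
        five_prime_rank = start if strand == "+" else -end
        # Highest expression first, then most 5' exon first.
        return (-expression, five_prime_rank)

    return {gene: sorted(exons, key=rank_key)[:n]
            for gene, exons in by_gene.items()}


# Made-up example rows (hypothetical genes and values).
example_rows = [
    ("GENE_A", "chr1", 100, 200, "+", 7.5),
    ("GENE_A", "chr1", 300, 400, "+", 7.5),
    ("GENE_A", "chr1", 500, 600, "+", 7.5),
    ("GENE_B", "chr2", 1000, 1100, "-", 5.0),
    ("GENE_B", "chr2", 1200, 1300, "-", 5.0),
    ("GENE_B", "chr2", 1400, 1500, "-", 9.0),
]

selected = top_exons_per_gene(example_rows, n=2)
for gene, exons in selected.items():
    print(gene, [(start, end, expr) for _, _, start, end, _, expr in exons])
```

With real expression values exact ties will probably be rare, so the 5' preference may only kick in if the values are rounded or binned first; whether that is appropriate is a judgment call.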
I'm also open to ways of doing this without Galaxy. I want to do this
for an entire genome, so I figured it would be good to have a Galaxy
workflow that I could then apply to other genomes as needed.
Thanks for any help. jim