Question: Extract Alignment For A Set Of Genes
7.5 years ago by
Vincent Joseph Lynch wrote:
To Whom It May Concern,

Sorry to bother you with what is likely a fairly simple problem, but I have been trying to figure this out myself for several days and just can't work out how to do it.

I have a set of 8766 genes that I would like to test for positive selection using various other programs (HyPhy, for example). To do this I obviously need an alignment of these genes across various species, but I can't figure out how to get the alignment in FASTA format.

I have a BED12 file from UCSC with the data for the 8766 genes. I thought the easiest way was to use the "Stitch Gene blocks" option and then select locally cached alignments as the MAF source for the species I care about. However, because these 8766 genes have multiple transcripts, I end up with 23,581 regions.

Is there a way to merge the multiple regions for each gene into a single region for the longest transcript? Then I should have 8766 regions and can use "Stitch Gene blocks". (Unless there is a more economical way to do this.)

Thanks,
Vinny

Vincent J. Lynch, Associate Research Scientist
Department of Ecology and Evolutionary Biology & Yale Systems Biology Institute
Yale University

"There is a grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that whilst this planet has gone on cycling according to the fixed laws of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved." -C. Darwin, 1859
7.5 years ago by
United States
Jennifer Hillman Jackson wrote:
Hi Vinny,

One option is to filter for a single representative transcript in your BED file from UCSC as a first step, or to use that sort of list as a filter for your final result (if the data is still labeled by transcript IDs). If using the "UCSC Genes" track, the table is called "knownCanonical".

Another option is to look at the tools in "Operate on Genomic Intervals" and see if any meet your criteria. Merge or Cluster may be what you want. Note: this can result in gene models that are not represented by a single transcript in the primary query species.

If you have more questions, please let us know, and kindly keep the cc to galaxy-user so that the Galaxy team and community can offer input.

Best,
Jen
Galaxy team
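For anyone who would rather do the "one transcript per gene" filtering outside Galaxy, here is a minimal Python sketch of the idea. It is not a Galaxy or UCSC tool, and it rests on some assumptions: the function names are made up for illustration, "longest" is taken to mean the largest summed exon length (the blockSizes column of BED12), and you supply your own transcript-to-gene lookup (`tx_to_gene`, hypothetical here), since BED12 itself only carries the transcript name in column 4.

```python
from typing import Dict, Iterable, List


def exonic_length(bed12_fields: List[str]) -> int:
    """Sum the blockSizes column (field 11) of one BED12 record."""
    # blockSizes is comma-separated and often has a trailing comma.
    sizes = [s for s in bed12_fields[10].split(",") if s]
    return sum(int(s) for s in sizes)


def longest_transcript_per_gene(
    bed_lines: Iterable[str],
    tx_to_gene: Dict[str, str],
) -> List[str]:
    """Keep one BED12 line per gene: the transcript with the most exonic bases.

    tx_to_gene is a user-supplied mapping (an assumption of this sketch)
    from the BED name column (transcript ID) to a gene ID.
    Transcripts missing from the mapping are skipped; ties keep the
    first transcript seen.
    """
    best = {}  # gene ID -> (exonic length, original line)
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blanks and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a BED12 record
        gene = tx_to_gene.get(fields[3])
        if gene is None:
            continue
        length = exonic_length(fields)
        if gene not in best or length > best[gene][0]:
            best[gene] = (length, line)
    return [line for _, line in best.values()]
```

Writing the returned lines to a new BED file should give one region per gene, which can then be uploaded and used as the interval input for "Stitch Gene blocks". If you prefer genomic span over exon length, compare `int(fields[2]) - int(fields[1])` instead.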