Question: Isochore As Genome Coordinates?
5.5 years ago, Abdullah Al Mahmud wrote:
Hi,

In my account I have uploaded a file named iso_mm10.bed. The BED file contains the coordinates of 6018 isochores of the mouse genome (mm10). I want to extract the GC% of each isochore, along with the list of genes present in each isochore. I tried using extract features, geecee, and many other tools in Galaxy, but every time it either reported an error or said "no peak". I would be grateful if you could give me an idea of how to solve this problem.

Abdullah
--
Abdullah Al Mahmud, PhD
Postdoctoral fellow, University of Montreal
Lab. of Dr. Jacques Michaud
CHU Sainte-Justine Research Center, Montreal, Quebec, Canada
5.5 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Abdullah,

The tool geecee takes FASTA sequence as input. I am not sure whether you have only the BED coordinates of the regions of interest or already have the coordinates of the genes contained within these regions.

If you need the genes, one option is to extract a track from the UCSC Table Browser with the tool "Get Data: UCSC Main" to obtain transcripts in BED12 format. Tracks in the group "Genes and Gene Predictions" are most likely what you will want. You can read about the choices at UCSC, but common selections include UCSC Genes, RefSeq Genes, etc. You can get them all, then use tools in the group "Operate on Genomic Intervals" to limit the set to just those transcripts that fall within the isochore coordinate bounds.

For a list of associated gene identifiers: most gene tracks at UCSC have related tables that contain that sort of information. Do a separate extract operation to obtain a file that contains the gene and transcript identifiers, then join it with the transcripts you kept after the filtering above, to link in the gene names.

Once you have the transcript coordinates, FASTA sequence can be obtained in two ways. If you want to do the GC counts on the mRNA, use the transcript identifiers in the UCSC Table Browser again, choose sequence output (not BED), and this time extract "mRNA" when prompted (not genomic). If genomic sequence is fine, the tool "Fetch Sequences -> Extract Genomic DNA" can be used. Then use the FASTA sequences as input to the "geecee" tool; the problems you were having were most likely caused by giving the tool the wrong type of input.

This is a lot of steps, and how you decide to organize the data before running geecee will affect how the summary statistics are calculated. Really, any stretch of nucleotide FASTA sequence can be used as input. I do not know of an upper length bound, but there probably is one, so watch for that; if an error comes up, work with smaller regions.
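The interval filtering and identifier join described above are done in Galaxy with the "Operate on Genomic Intervals" tools and a join. For anyone working outside Galaxy, the same logic can be sketched in Python. This is an illustration under stated assumptions, not a Galaxy tool: coordinates are BED-style (0-based start, exclusive end), "fits within" is taken to mean the transcript lies entirely inside one isochore, isochores on the same chromosome are assumed not to overlap, and the transcript-to-gene mapping is a hypothetical dictionary standing in for the separately extracted UCSC table. The records in the usage lines are toy data.

```python
import bisect
from collections import defaultdict


def read_bed(lines, name_col=None):
    """Parse BED lines into (chrom, start, end, name) tuples.

    Start is 0-based and end is exclusive, as in BED. Blank, comment,
    'track' and 'browser' lines are skipped. If no name column is given
    (or it is missing), the name defaults to 'chrom:start-end'.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if name_col is not None and len(fields) > name_col:
            name = fields[name_col]
        else:
            name = f"{chrom}:{start}-{end}"
        records.append((chrom, start, end, name))
    return records


def genes_per_isochore(isochores, transcripts, tx_to_gene):
    """Map each isochore name to the set of genes whose transcripts lie fully inside it.

    Assumption: isochores on one chromosome do not overlap, so each
    transcript can only be contained in the isochore that starts at or
    before it and is closest to it.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in isochores:
        by_chrom[chrom].append((start, end, name))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _, _ in intervals]

    result = {name: set() for _, _, _, name in isochores}
    for chrom, tx_start, tx_end, tx_name in transcripts:
        if chrom not in by_chrom:
            continue
        # Index of the rightmost isochore whose start is <= the transcript start.
        i = bisect.bisect_right(starts[chrom], tx_start) - 1
        if i < 0:
            continue
        _, iso_end, iso_name = by_chrom[chrom][i]
        if tx_end <= iso_end:
            # Use the transcript ID itself when no gene name is known.
            result[iso_name].add(tx_to_gene.get(tx_name, tx_name))
    return result


# Toy usage (hypothetical isochores, transcripts and gene names).
iso = read_bed(["chr1\t0\t1000\tiso1\n", "chr1\t1000\t5000\tiso2\n"], name_col=3)
tx = read_bed(
    ["chr1\t100\t900\ttx1\n", "chr1\t900\t1200\ttx2\n", "chr1\t2000\t3000\ttx3\n"],
    name_col=3,
)
genes = genes_per_isochore(iso, tx, {"tx1": "GeneA", "tx3": "GeneC"})
print(genes)  # {'iso1': {'GeneA'}, 'iso2': {'GeneC'}}
```

Transcripts that straddle an isochore boundary (tx2 here) are dropped; if partial overlap should also count, the containment test would have to change to an overlap test.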
You could also just convert the FASTA sequence to tabular, add up the total bases, count the Gs, count the Cs, etc., then perform the calculation on your own. See also "Regional Variation -> Feature coverage", "Graph/Display Data", and "BEDTools"; each may be helpful, for different reasons. There are several tutorials that perform many of these same basic operations as part of an analysis or tool demo. Reviewing them will help you learn how to structure inputs, use particular tools, etc., if you would like the guidance. Under "Shared Pages", please see Galaxy 101 and Using Galaxy 2012 for the introductory tutorials.

Best,
Jen
Galaxy project
--
Jennifer Hillman-Jackson
Galaxy Support and Training
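The manual route mentioned in the answer (FASTA to tabular, then count bases) comes down to GC% = (G + C) / total bases × 100 for each sequence. A minimal Python sketch of that calculation, assuming plain FASTA text as input (for example, the output of Extract Genomic DNA). Two choices here are assumptions rather than anything geecee is known to do: sequences are uppercased so soft-masked (lowercase) bases are counted, and N and other ambiguity codes are left out of the total.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_stats(seq):
    """Return (total A/C/G/T bases, G+C count, GC percent) for one sequence.

    Bases other than A, C, G and T (e.g. N) are excluded from the total.
    """
    seq = seq.upper()  # count soft-masked (lowercase) bases as well
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    gc = counts["G"] + counts["C"]
    percent = 100.0 * gc / total if total else 0.0
    return total, gc, percent


# Toy usage: one record per isochore, printed as tab-separated columns.
fasta = [">iso1", "ACGTacgtNN", ">iso2", "GGCC", "AT"]
for name, seq in read_fasta(fasta):
    total, gc, pct = gc_stats(seq)
    print(f"{name}\t{total}\t{gc}\t{pct:.1f}")
# iso1	8	4	50.0
# iso2	6	4	66.7
```

To get one GC% value per isochore, extract one FASTA record per isochore interval; the tab-separated name/total/GC/percent columns are similar in spirit to working from the tabular conversion.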
