Question: Problems Accessing The Sequences Through Genome Browser
8.1 years ago, pande wrote:
Dear Galaxy,

I have some genomic intervals for the human genome and want to extract the corresponding intervals for the mouse, so I used the MAF pairwise alignment tool. However, when I look up the coordinates from the file that Galaxy generated for me in the genome browser, the associated sequences are not to be found at all. Here are the first few alignments from the sample:

##maf version=1
a score=5002
s hg18.chr1 1820410 11 + 247249719 GGATCCAG-------ATG
s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG

a score=20688
s hg18.chr1 2077163 10 + 247249719 TTTTCTTTTC
s mm9.chr4 969004 10 - 155630120 CTTTCTTACC

a score=15289
s hg18.chr1 2316453 90 + 247249719 CTCAGGTGAATTCCTCATGGCATCACAGCAGTGTTGAAA-----TAGGAGCAGATACG-TTACCTCCGC--TTGCCAGATAAGAAACTGGGACGCAGA
s mm9.chr4 1172933 98 - 155630120 TTACAGTGAATTCTGCCTGGGATCCGTGCAGCATTGGAAATGGCTAGGGGCAGATAGGGTCACCTTCACAGTTGCTAGATAAGAAACAGGGTCGCGGA

a score=20716
s hg18.chr1 2526225 31 + 247249719 CTTCCT-CTGGGCTTGGTCATCCTTCAAAGTC
s mm9.chr4 1369046 32 - 155630120 CTTCCTCCTGGTCCCAACCATCTGTCAGATCC

I just want the sequences from the mouse genome, extended up to 120 base pairs from the coordinate given in the MAF file. It only generates NNNNNNNNNNNNN.... I was wondering how Galaxy could retrieve the sequence when I can't see the same in the genome browser.

Kindly help.
Amit
8.1 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Pande,

Perhaps there is a problem interpreting the MAF format? MAF differs from BED or Interval in that each line gives a start coordinate and a size rather than a start and an end. It is also 1-based, which may or may not be a problem, since it is not clear how you are viewing or extracting the sequences. Here is the MAF FAQ from UCSC: http://genome.ucsc.edu/FAQ/FAQformat.html#format5

To work with MAF data in Galaxy, use the tools in Convert Formats.

Another option is to use the LiftOver tool. It is based on the same source data from UCSC and is a more direct method to convert coordinates between genomes. To do this:

1) load the human coordinates
2) liftOver human -> mouse (minmatch can be lowered to 0.10 for cross-species lifts, but if you find that you are getting too many "multiple matches", raise it back up. Sometimes being strict first, then progressively less strict on the regions that failed, will yield the best overall results)
3) use Text Manipulation: Compute an expression on every row to expand the mouse intervals, if you want to expand the ranges
4) use Fetch Sequences if you want the FASTA mouse genome sequence for the intervals

Please let us know if you continue to have problems. Sharing a history with the problem datasets/operations would be a great way to explain the issue.

Thanks!
Jen
Galaxy team
http://usegalaxy.org
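Step 3 above boils down to simple arithmetic on the start and end columns. As a rough sketch outside Galaxy (the function name, the 120 bp default, and the choice to extend in the direction of the strand are assumptions made for illustration, not what the Compute tool does internally), a 0-based, half-open forward-strand interval could be stretched to a fixed length like this:

```python
def extend_to_length(start, end, strand, chrom_size, length=120):
    """Stretch a 0-based, half-open interval to `length` bp (hypothetical helper).

    Assumption: the interval is anchored at its 5' end, so '+' intervals
    grow to the right and '-' intervals grow to the left. The result is
    clamped so it never runs off either end of the chromosome.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if end - start >= length:
        # Already long enough: leave the interval untouched.
        return start, end
    if strand == "+":
        return start, min(start + length, chrom_size)
    return max(end - length, 0), end


# Illustrative numbers only, not taken from the thread.
print(extend_to_length(1000, 1018, "+", 5000))
print(extend_to_length(1000, 1018, "-", 5000))
```

In Galaxy the same idea would be expressed as an expression on the start or end column; the clamping matters because an interval that runs past the end of a chromosome cannot be fetched.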
Correction: MAF format has a 0-based start (not 1-based). Very sorry for the confusion.

Jen

Reply written 8.1 years ago by Jennifer Hillman Jackson
It's possible that the problem is due to the fact that all the alignments are on the reverse strand of mouse. For a "-" strand line, MAF counts the start position from the opposite end of the chromosome, that is, along the reverse-complemented sequence, so the number cannot be entered into the genome browser as is. This is called "counting along the reverse strand" in lastz's description here: http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#adv_coords

Bob H

Reply written 8.1 years ago by Bob Harris
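To make the reverse-strand point concrete, here is a minimal Python sketch (not part of Galaxy or lastz; the function names are made up for illustration) that turns a MAF "s" line into a forward-strand position that can be pasted into a genome browser. It assumes the UCSC MAF convention: a zero-based start, and for "-" lines a start counted on the reverse-complemented source sequence.

```python
def parse_s_line(line):
    """Split a MAF 's' line into typed fields (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a MAF 's' line: %r" % line)
    _, src, start, size, strand, src_size, text = fields
    return src, int(start), int(size), strand, int(src_size), text


def to_forward_interval(start, size, strand, src_size):
    """Return (start, end) as 0-based, half-open forward-strand coordinates."""
    if strand == "+":
        return start, start + size
    if strand == "-":
        # The MAF start is measured from the far end of the chromosome.
        fwd_start = src_size - start - size
        return fwd_start, fwd_start + size
    raise ValueError("strand must be '+' or '-'")


def browser_position(line):
    """Format an 's' line as a 1-based, inclusive chrom:start-end string."""
    src, start, size, strand, src_size, _ = parse_s_line(line)
    fwd_start, fwd_end = to_forward_interval(start, size, strand, src_size)
    chrom = src.split(".", 1)[-1]  # "mm9.chr4" -> "chr4"
    return "%s:%d-%d" % (chrom, fwd_start + 1, fwd_end)


print(browser_position("s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG"))
```

For the first mouse line in the question this gives chr4:154861081-154861098 rather than a position near 769,022. The browser shows the forward strand there, so the bases will appear as the reverse complement of the text in the MAF line.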