Question: Stitch maf Blocks Returns All Gaps
smrupp16 (United States) wrote, 2.8 years ago:

I am trying to use Stitch Gene blocks on a pairwise alignment I generated using progressiveCactus, but no matter what I try, it only returns sequences composed entirely of gaps. I am using a bed file from UCSC, but since my reference genome was from Ensembl, I removed the "chr" and "chrUN_" prefixes from the chromosome/scaffold names.

Here are a few lines from the maf file:

##maf version=1 scoring=N/A
# hal (cobra:1,anole:1)Anc0;

a
s	anole.1	0	1000	+	263920458	GTGTATTCGAATGATATAAACAATAGAAATAAGCAGTAGAAAACATTTGATATaggacgaagttcacaacatctggggaatccaatatagaagtggggttcaagcagcatgatttcacaattcatgaagttagcccaacgatttgaaattcatacaacagcgagtcatgagtgtgtccaaatacatagaatatgaaaattcacaaatacatgaggagaaagagttcataaggaattcatagagatggcatggggaaggggcacatatgggttagtaagtctttggaggtataggatttcataagttccaggggtgggtgtggggaagagtgttcttctctttcataaaatcacaaaagtatgcatgaaacgtggagtgcacccgtctgtcaccccctggcagctgtagagggtgagggtccttggagaactatgtctcccccgcgagagagcgatccccccatccccaattcacagaaaggggctagggaaagccattccgcgagtcgcggggagaggaaaacccgggataaggcagctgcttggcttgaaaggagctgtcggtcccgttttgacagtcagctgatgaggtgccgaaaactggaccgattccggttctgctggttgtcttcgagtcaggagccttagaaagggttaagactaaaagaggagcgttctaggggtaattttgatagagatatgagccaatttatatcgaccgccatgttaatctatggcagggaatctcagccggagttaaaaggacgggagggagagagagcgattgcatgcgaaggaaggagtcagggaaagatacaaaataaccagatagagagtgttattgttaaaataaatgctacttttattggggggatttgttacatagttcagggaagaggaaaatgaaggaaaaaaggtcttttgataatagtctgttgcggtcccgctccgttcttttggagagtaatctatggaactctctgctgcgttttcaaatcaat

a
s	anole.1	1000	154	+	263920458	ttgaaagcgggggcgacggttatcatnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnctgtgtcaccttaaaggggtataaattTGTCATAGGCTTTTGCGGACTTCAGCACAGTTTTTCAGAAGTACAG

a
s	anole.1	1154	1000	+	263920458	AAAAGTAAGAAATGACTGGAGAAAGCTGCATATTTCTCAATCCAGCAAGTCTGCACTGgactactttgactgagaaattacttctcccattttgatatattctacactttggcccagatccttgttttagtctcctgtttttaacattttatgctgtatgttgatttttatgatggttttattgatattgatgttttactgttggaataattgttttatcgttttattgctgtatgtttcgggctcagtccccatgtaagccactccgagtccccactggggagatgggccggggtataaaaataaagnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnaAAACCAAACTCAACTAATGCTTTAGTCCATTTTCATTCTTTTATGGATTTGCCTGAGGGATACCTATTTCCTTATGATTTGGCTTTCAAAACCAAAAGTCAATACATAAGTCGTGATGGCCTGGGTTCAGCCTTAACTTTAACTTGAAACGAGCCCTCCATGGAGTTGAGCAGGAAAGGGGATATATAACAttcagagagagggtcaatgtcttggatagttcctttaaagttcaggctaaaacacagagtgtattcattgatccactgacatggaagacattggatgtatcAGAGTGGCTTTGCTAGGcacacgcagcagaagtcgctgactaggaagctctggataaataacaacgctgcagtcttcagttatctatcaaaagtttacttacgaacggaattctacaagacgatacacagctctcacgcaggcacacatgggaaggggacagagataggaAGTagaggctatttatctcatcctctgctgtctgatgcaatccaagtttaaccctttacacactggcatagtaagcagcacacagtgtatgatgcaatctccagcacacattgcacaaccatgactcaattacatttaaacaactctatatacaatcattctcacttc
s	Anc0.Anc0refChr2104	0	50	+	8475	AGAAGTAAGAAAAGACT----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TTTCAGAGCCAAAAGCAAATACATAAGTCGTGA----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
s	cobra.AZIM01003421.1	80817	50	-	85889	AGAAGTAAGAACAGACT----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TTTCAGAGCCAAGAGCAAATACCAAAGTCATGA----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

I presume that this problem is occurring because the maf file does not contain any scores (one of my coworkers is having the same problem and her file is also missing scores). So, does anyone know how to get scores from cactus, or is there any other tool that I might be able to use?

 

Thank you for any help.

Jennifer Hillman Jackson (United States) wrote, 2.8 years ago:

Hello,

This is basically the same reply as for your co-worker, who posted here: Fetch alignments in galaxy (using my own MAF file)

The MAF format appears to be within specification. (See examples here: http://genome.ucsc.edu/FAQ/FAQformat.html#format5)

Note that the score is an optional field in the MAF format. However, testing with the tool and the actual inputs would be needed to confirm that the tool also works without scores.

Overall, this also sounds like a data mismatch problem. If you want to share a history from http://usegalaxy.org that replicates the issue, we can take a look, since there could be more going on and reviewing a snippet of the input BED/interval file will probably not help enough. Using different reference genomes and/or making adjustments to get them to work together is tricky. The identifiers could be a problem, as could the version of the reference genome (if the BED/interval dataset and the MAF are not from the same genome version, empty data could result).
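
As a rough way to check for the identifier mismatch described above, the sequence names in the MAF can be compared with the chromosome column of the BED file. The sketch below is only an illustration and is not part of any Galaxy tool: the function names and the sample data are hypothetical, and it assumes the MAF source names follow the `species.chromosome` pattern seen in the snippet in the question (for example `anole.1`).

```python
def maf_chromosomes(maf_lines, species):
    """Collect chromosome names used by `species` in MAF 's' lines."""
    chroms = set()
    for line in maf_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0] == "s":
            # src looks like "anole.1" or "cobra.AZIM01003421.1":
            # split on the first dot only, so scaffold names keep their dots.
            name, _, chrom = fields[1].partition(".")
            if name == species and chrom:
                chroms.add(chrom)
    return chroms


def bed_chromosomes(bed_lines):
    """Collect the chromosome column (first field) of BED data lines."""
    chroms = set()
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if fields and not line.startswith(("#", "track", "browser")):
            chroms.add(fields[0])
    return chroms


def missing_from_maf(maf_lines, bed_lines, species):
    """Return BED chromosome names that never appear in the MAF for `species`."""
    return sorted(bed_chromosomes(bed_lines) - maf_chromosomes(maf_lines, species))


# Hypothetical sample data modelled on the snippet in the question
# (sequence text shortened for brevity).
maf = [
    "##maf version=1 scoring=N/A",
    "a",
    "s\tanole.1\t0\t10\t+\t263920458\tGTGTATTCGA",
    "s\tcobra.AZIM01003421.1\t80817\t10\t-\t85889\tAGAAGTAAGA",
]
bed = ["1\t0\t500\tgeneA\t0\t+", "chr2\t100\t900\tgeneB\t0\t+"]
print(missing_from_maf(maf, bed, "anole"))
```

Any name this reports is a BED chromosome that the MAF never uses for the chosen species, which would be one possible reason for all-gap output.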

To share it, generate the history share link and make sure all datasets involved (input and output) are undeleted. Then email the share link to galaxy-bugs@lists.galaxyproject.org, together with a link to the public source of the base reference genome used to create the MAF, a link to what you believe is the UCSC version of that base genome (the same "dbkey" database name used to extract from the Table Browser; in human, for example, this could be hg19 or hg38), and a link to this post.

https://wiki.galaxyproject.org/Learn/Share

Thanks, Jen, Galaxy team

 


Here's a link to my history. It's number 98.

 

https://usegalaxy.org/u/shawnrupp/h/anolesubstitutions

written 2.8 years ago by smrupp16

I still need to know the data sources. One of you sent in an email to the galaxy-bugs address. I'll be looking at that, but this is much more difficult without knowing the data sources and could be inconclusive. 

Jen

written 2.8 years ago by Jennifer Hillman Jackson

The alignment is from progressiveCactus run with default settings. The bed file is from UCSC, but I removed the "chr" and "chrUN_" prefixes since my reference genome is from Ensembl.

written 2.8 years ago by smrupp16

Please see this reply: C: Fetch alignments in galaxy (using my own MAF file)

Check to see if your regions exist in the MAF. I also strongly suggest using the original reference genome as a custom genome/build (not a built-in index on the server). The chromosome identifiers must be the same between all three inputs: the reference genome, the MAF, and the BED/Interval.

The built-in reference genome used (anoCar2) will have chromosome identifiers in UCSC format. That will be a mismatch for the other data. Use a Custom Build. (See the other post for many more details like this.)
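
One way to check whether the regions exist in the MAF is to test each BED interval for overlap with the reference-species rows of the alignment blocks. The sketch below is a minimal illustration under stated assumptions, not something taken from Galaxy: the function names and sample data are hypothetical, MAF source names are assumed to be `species.chromosome`, and coordinates are treated as zero-based, half-open in both files.

```python
def maf_blocks(maf_lines, species):
    """Yield (chrom, start, end) on the forward strand for `species` rows."""
    for line in maf_lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != "s":
            continue
        name, _, chrom = fields[1].partition(".")
        if name != species:
            continue
        start, size, src_size = int(fields[2]), int(fields[3]), int(fields[5])
        if fields[4] == "-":
            # MAF minus-strand starts are relative to the reverse complement.
            start = src_size - start - size
        yield chrom, start, start + size


def uncovered_regions(maf_lines, bed_lines, species):
    """Return BED intervals that overlap no MAF block of `species`."""
    blocks = {}
    for chrom, start, end in maf_blocks(maf_lines, species):
        blocks.setdefault(chrom, []).append((start, end))
    missing = []
    for line in bed_lines:
        fields = line.split()
        # Skip short lines, comments, and track/browser header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        hits = any(s < end and start < e for s, e in blocks.get(chrom, []))
        if not hits:
            missing.append((chrom, start, end))
    return missing


# Hypothetical sample data; coordinates follow the snippet in the question,
# sequence text is shortened to a placeholder.
maf_sample = [
    "a",
    "s\tanole.1\t0\t1000\t+\t263920458\tGTGT",
    "a",
    "s\tanole.1\t1000\t154\t+\t263920458\tttga",
    "s\tcobra.AZIM01003421.1\t80817\t50\t-\t85889\tAGAA",
]
bed_sample = ["1\t500\t1100", "1\t5000\t6000", "chr1\t0\t100"]
print(uncovered_regions(maf_sample, bed_sample, "anole"))
```

With this sample data, the `chr1` interval is reported because the MAF names that chromosome `1`, and the interval at 5000-6000 is reported because no alignment block reaches it. This is a linear scan, so it is only meant for spot checks on modest inputs.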

modified 2.8 years ago • written 2.8 years ago by Jennifer Hillman Jackson