Question: Fetch alignments in galaxy (using my own MAF file)
pooja.narang (United States) wrote, 2.8 years ago:
I have a MAF file which I am using to fetch alignments in Galaxy (please see a few lines from MAF file below).

I have tried both the Extract MAF blocks tool (which generates an empty file) and the Stitch MAF blocks tool (which generates a file with null bases).

I am aware that the file is missing the score field (as pointed out in another post: https://biostar.usegalaxy.org/p/15929/).

I want to find out whether the absence of the score alone is the problem in my case, or whether there is anything else wrong with the MAF file. It would be great to get some feedback on this.

Thank you!

 

A few lines from the MAF file:

##maf version=1 scoring=N/A
# hal ((Chilo_suppressalis_CsuOGS1:1,Plodia_interpunctella_v1:1)Anc1:1,(Bombyx_mori_v1:1,Manduca_sexta_Msex_1:1)Anc2:1)Anc0;

a
s	Bombyx_mori_v1.BABH01080424	0	565	+	565	ACATATGTGGGTATAATAGGTATTCCTGGAAGCAATATATTTTCACCTCGAAACTTGCCATTTAAACTGGTGCCTTCGATAACTTTTTTCATTATTTTTTtaatgactaatcgcgtaccgttgcatagccgtggcgggttcaaattacgaagcaaaataattggagatccaaccttcaattgtaagttatgtggtggcatgcctggcaaatccagtgagttcaaaaactctgttggaaaatttacagcttcaatgccatcgcaaactgtatcaatagatctatatgaTACCAAGTtccCTGGCAATAACTGTTGTATCTTCAGATTTAAAATGTCAACGTCCACATTTTTTGCAGCTAACATTGctctttctgcAAGCCTCTCGAGATTTATGTAATTCGTGTGTACATCGGGAAATATTTGTTCAATGAGAGCATCTTGCGAATCAATGATAGTGCAGAAATCATTCGCTAATTTTATGTATCCAGTTTCAGCTATAGTGACTTTTCCATCACCGATATCTAAGAGTTGTTTAGAGAATGTTTCAGCGGATGGATCTTGAAGCA

a
s	Bombyx_mori_v1.BABH01063440	0	35	+	678	ACCCATTACGGGGCTACGTCCATCGCTCCCTCGTC

a
s	Bombyx_mori_v1.BABH01063440	35	643	+	678	TGTCCCGCCCTGCGGCTGCTCTGGTGCCGCGGCGGGAGCTGactcaccggctgccctttttgcagccggcttctttttctttccgcccctcctcgtttttgtagaggagggggcggatttacacgccgggcccccggcccggtggtcggccttccttttggccgcgtcgcacaagacgcagtgcggcgcggctgtggtacaggaggctgccttgtgccccggttggccgcagcggaaacacaagccgctgcggtccacggtcgaggggcacttggcgaggccgtggccggtgccgaagcaccgaagacaacgccacggcctggcatcctgcagctgcacgtgggccactacccagcccacgcgcagccttcctgggctgtcggaaggccgcccctgtggaggggtggccaggagggtcgccgttgcaatcgggcaacgcgcccacgccgtccgggttccagaatatgtcacccggagctctccgactttgacgtcggcgagggcgcagttgccctgcgacgcaatggctgcggcgacctcctcctttgtggcgcactcgtcgaggcccgtgattttaacttcacccatcttcacgggccgcgctatgcgcaccatctcggggtccggcagaatttctcgaagc
s	Bombyx_mori_v1.BABH01060737	0	643	+	698	TGTCCCGCCTTGCGGCTGCTCTGGTGCCGCGGCGGGAGCTGACTCACcggctgccctttttgcagccggcttctttttctttccgcccctcctcgtttttgtagaggagggggcggatttacacgccgggcccccggcccggtggtcggccttccttttggccgcgtcgcacaagacgcagtgcggcgcggctgtggtgcaggaggctgccttgtgccccggttggccgcagcggaaacacaagccgctgcggtccacggtcgaggggcacttggcgagaccgtggccggtgccgaagcaccgaagacagcgccacggcctggcatcctgcagctgcacgtgggccactacccagcccacgcgcagccttcctgggctgtcggaaggccgcccctgtggaggggtggccaggagggtcgccgttgcaatcgggcaacgcgcccacgccgtccgggttccggaatatgtcacccggagctctccgactttgacgtcggcgagggcgcagttgccctgcgacgcgatggccgcggcgacctcctccttcgtggcgcactcgtcgaggcccgtgattttgacttctcccatcttcacgggccgcgcgatgcgcaccatctcggggtccggcagaatctcccgaagc
s	Anc2.Anc2refChr551	0	643	+	643	TGTCCCGCCTTGCGGCTGCTCTGGTGCCGCGGCGGGAGCTGactcaccggctgccctttttgcagccggcttctttttctttccgcccctcctcgtttttgtagaggagggggcggatttacacgccgggcccccggcccggtggtcggccttccttttggccgcgtcgcacaagacgcagtgcggcgcggctgtggtacaggaggctgccttgtgccccggttggccgcagcggaaacacaagccgctgcggtccacggtcgaggggcacttggcgaggccgtggccggtgccgaagcaccgaagacaacgccacggcctggcatcctgcagctgcacgtgggccactacccagcccacgcgcagccttcctgggctgtcggaaggccgcccctgtggaggggtggccaggagggtcgccgttgcaatcgggcaacgcgcccacgccgtccgggttccggaatatgtcacccggagctctccgactttgacgtcggcgagggcgcagttgccctgcgacgcaatggccgcggcgacctcctcctttgtggcgcactcgtcgaggcccgtgattttgacttctcccatcttcacgggccgcgctatgcgcaccatctcggggtccggcagaatttctcgaagc

a
s	Bombyx_mori_v1.BABH01057177	0	733	+	733	GCCCCCGAGCCGGTccctagggcctcgaaggccaagctccagacccggccgtgaacgccgtcagggccgggggccgcgtccttcgctctcattctggacacggccgcatggatctccgcccccgtgatagggggagggggctcctcggcagcgggaacgctggggcgacgcggaggcgtgccccacgtggcgtccatctggggaggctcaaagtccccccccgctgccggcgggaagagtgccgcaacaatttcccgcagctgccgaggctggagccgctcggtcaccgggagcgcccacggttgcagtttcctgcgaaccatcttgtatgggcgcccccagggatcttcgtcgagcgactccaggagagtcttcatgctctgcttcttggcctcgccgatggccagctgcagcgccgtctgcttttgacgacagtcagcgtgcagctgggccgccgtctccgcgaacgcagcgtcgcgacgacggcggcggcggtggcgtgcgctccggcggcgcgccctcacgcactcctcgcggagtcttgcgatctcgggcgaccaccagaacgcacccccacgtggtgctcgggggccgacccggggcatggcggcatcgcaaatgtttgccatggtgccccggaaccagtcgacctccgcatccacgtccgcgagccgcgcgggttttagtgcccacgcagcaaCAGCGGGCGCCTCCATCAGCAACTCCTTATTCA

a
s	Bombyx_mori_v1.BABH01060338	0	701	+	701	GCTGAAATAGCCTCTCAAGGCTAGgatctcacggatcgtttggaacctctgggtctgcggagggacttcggttccctctgtattttgtaccgtatgttccatggggagtgctctgaggaattgttcgagatgataccggcatctcgtttttaccatcgcaccgcccgccaccggagtagagttcatccatactacctggagccactgcggtcatccacagtgcgtttccagaggtcttttttgccacgtaccatccggctatggaatgagctcccctccacggtgtttcccgagcgctatgacatgtccttcttcaaacgaggcttgtggagagtattaagcggtaggcagcggcttggctctgcccctggcattgctgaagtccatgggcgacggtaaccactcaccatcaggtgggccgtatgctcgtctgtctacaagggcaataaaaaaaaaaaaaaaaaaaaaaaaaaaggtagcataggtagggaaaAAAACGTGAACGGTGTATGGTGTTTGAGGACTGAGCGAGTTTATATATGTGTTTTAGGCATTTATGACGTAAAAAAATCCCGGATTTTGGACGTCGAATATGGAAAGTTAATTTGCTCTAGGTTTTTTATATTCTAAATTAAAAAAAAACTCCATTTTACTTTAGCACTTTGTTTTGCACTTTACGTATCAATTTTTAAATAGGTAAA
s	Bombyx_mori_v1.scaf18	5199276	250	-	5904300	gctgaaatagcctctcaaggcta-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ggtagcataggtagggaaaaaaacgTGAACGGTGTATGGTGTTTGAGGACTGAGCGAGTTTATATATGTGTTTTAGGCATTTATGACGTAAAAAAATCCCGGATTTTGGACGTCGAATATGGAAAGTTAATTTGCTCTAGGTTTTTTATATTCTAAATTAAAAAAAAACTCCATTTTACTTTAGCACTTTGTTTTGCACTTTACGTATCAATTTTTAAATAGGTAAA
s	Anc2.Anc2refChr4888	1634	250	-	4361	gctgaaatagcctctcaaggcta-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ggtagcataggtagggaaaaaaacgTGAACGGTGTATGGTGTTTGAGGACTGAGCGAGTTTATATATGTGTTTTAGGCATTTATGACGTAAAAAAATCCCGGATTTTGGACGTCGAATATGGAAAGTTAATTTGCTCTAGGTTTTTTATATTCTAAATTAAAAAAAAACTCCATTTTACTTTAGCACTTTGTTTTGCACTTTACGTATCAATTTTTAAATAGGTAAA

 

 
Jennifer Hillman Jackson (United States) wrote, 2.8 years ago:

Hello,

The MAF format appears to be within specification. (See examples here: http://genome.ucsc.edu/FAQ/FAQformat.html#format5)
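For a quick local check of the `s` lines before uploading, something like the following Python sketch could help. It is not how the Galaxy tools parse MAF; it only tests a few rules from the UCSC format description (seven fields per `s` line, `size` equal to the number of non-gap characters, `start + size` within `srcSize`, equal text width inside a block). The function name `check_maf_lines` is made up for this example.

```python
def check_maf_lines(lines):
    """Return (line_number, message) tuples for problems found in MAF 's' lines."""
    problems = []
    block_width = None  # alignment text width of the current 'a' block
    for n, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # blank line
        if fields[0] == "a":
            block_width = None  # a new alignment block starts here
            continue
        if fields[0] != "s":
            continue  # header, comment, or another line type
        if len(fields) != 7:
            problems.append((n, f"expected 7 fields, found {len(fields)}"))
            continue
        _, src, start, size, strand, src_size, text = fields
        try:
            start, size, src_size = int(start), int(size), int(src_size)
        except ValueError:
            problems.append((n, "start, size and srcSize must be integers"))
            continue
        if strand not in ("+", "-"):
            problems.append((n, f"unexpected strand {strand!r}"))
        if len(text.replace("-", "")) != size:
            problems.append((n, "size does not match the number of non-gap bases"))
        if start + size > src_size:
            problems.append((n, "start + size exceeds srcSize"))
        if block_width is None:
            block_width = len(text)
        elif len(text) != block_width:
            problems.append((n, "text width differs from earlier rows in this block"))
    return problems
```

Calling it as `check_maf_lines(open("my.maf"))` (the file name is a placeholder) returns an empty list when none of these checks fail. An empty list does not prove that the file will work with a given tool.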

This sounds like a data mismatch problem. If you want to share a history from http://usegalaxy.org that replicates the issue, we can take a look, since there could be more going on and reviewing a snippet of the input bed/interval file will probably not help enough. 

Generate the history share link, make sure all datasets involved (input and output) are undeleted, and email the history link along with a link to this post to: galaxy-bugs@lists.galaxyproject.org.

https://wiki.galaxyproject.org/Learn/Share

Thanks, Jen, Galaxy team

Follow-up comment from Jennifer Hillman Jackson, 2.8 years ago:

The MAF format is correct (it is parsed by the tools correctly).

What I found examining the shared history sent into galaxy-bugs:

1. The chromosome identifiers in the BED are a match for those in the MAF. This is good.

2. The chromosome identifiers in the BED/MAF differ from those in the built-in reference genome "Bombyx_mori_p50T_2.0", even though your data and the built-in index are putatively from the same genome/build (Bombyx_mori_v1 is noted as an alternate genome alias/build name for Bombyx_mori_p50T_2.0). This is what I suspected the problem might be and why I asked for the data sources in the other post.

3. In this particular analysis, I am not convinced from some quick checks that the single BED region in dataset 2 is included in the MAF in dataset 3. 
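Points 1 and 2 can also be checked outside Galaxy by comparing the sets of sequence names. The Python sketch below is only an illustration: the function names are invented for this example, and it assumes that MAF `src` values have the form `genome.sequence` (as in the snippet posted above), so that the part after the first dot is the sequence name.

```python
def maf_sequence_names(maf_lines, genome=None):
    """Collect sequence names from MAF 's' lines, optionally for one genome only."""
    names = set()
    for line in maf_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "s":
            continue
        src = fields[1]
        # Assumption: src looks like "genome.sequence"; split at the first dot.
        prefix, _, seq = src.partition(".")
        if not seq:
            prefix, seq = "", src  # no dot: treat the whole src as the name
        if genome is None or prefix == genome:
            names.add(seq)
    return names


def bed_sequence_names(bed_lines):
    """Collect the first column of a BED/interval file, skipping header lines."""
    names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def fasta_sequence_names(fasta_lines):
    """Collect the first word of every FASTA header line."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def compare_names(bed, maf, reference):
    """Report BED sequence names that are missing from the MAF or the reference."""
    return {
        "bed_not_in_maf": sorted(bed - maf),
        "bed_not_in_reference": sorted(bed - reference),
    }
```

Any name listed under `bed_not_in_reference` points to the kind of identifier mismatch described in point 2.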

My guess is that the original source of the genome was not the same, or that the chromosomes were renamed in the fasta file used on your side to build the MAF data. The data could also have come from a different source to start with, one where the identifiers were changed. But I do think these are likely the same build.

To solve #2, load up the reference genome fasta file Bombyx_mori_v1, used to create the MAF, into your history. Use it as a custom genome build. More specifically, promote it to a custom build so that it can be assigned to the BED input datasets. Make sure the MAFs have the same custom build assigned.

https://wiki.galaxyproject.org/Support#Custom_reference_genome
https://wiki.galaxyproject.org/Learn/CustomGenomes#Custom_Builds

If the region exists in the MAF, and the base reference genomes are in fact the same, a result should be produced. Small item to note: the BED dataset does not have any strand assignment, which means that the tool will automatically assume (+). You could miss results this way. Consider adding strand as a 4th column and changing the datatype to interval, unless you want to pad out the BED format (in BED, strand is the 6th column, so name and score columns would be needed first). Review the metadata (under the pencil icon) to make certain that the column assignments are correct.
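If the BED file is edited locally before upload, the strand column could be added with a few lines of Python. This is a minimal sketch with an invented function name; it assumes a tab-separated 3-column file and assigns the same strand to every row, so the correct strand per region still has to come from you.

```python
def add_strand_column(bed_lines, strand="+"):
    """Append a strand as the 4th column of 3-column, tab-separated BED lines."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    out = []
    for line in bed_lines:
        stripped = line.rstrip("\n")
        if not stripped or stripped.startswith(("#", "track", "browser")):
            out.append(stripped)  # keep blank and header lines unchanged
            continue
        fields = stripped.split("\t")
        if len(fields) == 3:
            fields.append(strand)  # only touch rows that lack extra columns
        out.append("\t".join(fields))
    return out
```

After uploading the result, the datatype would be set to interval and the strand column assigned to column 4 in the dataset metadata, as described above.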

If there is still no output, a tool like "Extract genomic regions" can be used to fetch the sequence from the genome itself for comparison. Use the custom build genome for this, not the built-in index.

For reference, these are the chromosomes included in the built-in index for Bombyx_mori_p50T_2.0. These are clearly a mismatch for the chromosomes in the MAF dataset. The genome name is also a mismatch (in the data). I am not sure whether that is a factor, but it could be. Having all data use the exact same reference genome from the start, and using it for all steps, is critical. More about detecting reference genome problems and some remedies:

https://wiki.galaxyproject.org/Support#Reference_genomes

This solution also applies to the other shared history from your lab. Using a Custom Build is the most likely solution, along with including strand.

Test for #3: There is always the possibility that the target regions do not exist in the MAFs. You can both check that by converting MAF to Interval and then running the tool "Join the intervals of two datasets side-by-side" with the original BED regions in dataset 2.
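The same test can be approximated offline. The Python sketch below is not the Galaxy tool, and its function names are invented; it turns each MAF `s` line into a forward-strand interval and reports whether each BED region overlaps any of them. Per the MAF format description, `start` on the `-` strand is counted from the reverse-complemented source, so it is converted with `srcSize - start - size`. Matching a BED name against either the full `src` or the part after its first dot is an assumption about how your files are named.

```python
def maf_intervals(maf_lines):
    """Return (src, start, end) forward-strand, half-open intervals from 's' lines."""
    intervals = []
    for line in maf_lines:
        f = line.split()
        if len(f) != 7 or f[0] != "s":
            continue
        src, start, size, strand, src_size = f[1], int(f[2]), int(f[3]), f[4], int(f[5])
        if strand == "-":
            # Minus-strand starts are relative to the reverse complement.
            start = src_size - start - size
        intervals.append((src, start, start + size))
    return intervals


def regions_covered(bed_lines, intervals):
    """Return (chrom, start, end, overlaps_any_maf_interval) for each BED region."""
    results = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.split()
        if len(f) < 3:
            continue
        chrom, b_start, b_end = f[0], int(f[1]), int(f[2])
        hit = any(
            # Assumption: the BED name equals src or the part after its first dot.
            (chrom == src or chrom == src.partition(".")[2])
            and b_start < end
            and start < b_end
            for src, start, end in intervals
        )
        results.append((chrom, b_start, b_end, hit))
    return results
```

A region reported with `False` has no overlapping block for that sequence name in the MAF, which would explain an empty Extract MAF blocks result for it.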

Thanks, Jen
