Question: 100-way multiZ alignments (with hg38 as reference)
pkalungi wrote (2.5 years ago):

Hi, I am currently using GRCh38/hg38 as my reference. I would like to ask whether a 100-way multiZ alignment (with hg38 as reference) is available in Galaxy. If not, what other options are available?

ana16 wrote (2.5 years ago):

Hi pkalungi, generally the names associated with the chromosomes are haplotypes, partial chromosomes or alternate reference loci. Hope that helps!

Jennifer Hillman Jackson (United States) wrote (2.5 years ago):

Hello,

The current MAF/multiz alignment to use is based on hg19. Alignments for older builds exist, but this is the latest for human.

Updating to hg38 is not an immediate goal, but I will add it to the request list (scroll down to see the post: https://github.com/galaxyproject/galaxy/issues/1470).

Thanks, Jen, Galaxy team

pkalungi wrote (2.5 years ago):

Hello, since I am using hg38 as the reference, what advice do you have for me in order to see progress? Thanks.


The link I shared has a list of to-do items for data additions, and the MAF for hg38 has now been added to it. You can follow progress there.

I should let you know that there is one other option: using MAF data from the history with the tool. This involves obtaining the hg38 MAF data from the UCSC Downloads area (found under Human, hg38, "Conservation") and loading the files into Galaxy. I do not know how large the data is uncompressed, so it may or may not fit into the 250 GB account quota at http://usegalaxy.org unless you clear out (permanently delete) other work. But it is a choice you could explore. I tested this functionality yesterday (for a different test) using MAF data from a single chromosome, and the MAF tools functioned without issue.
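
One way to answer the size question before uploading is to stream each downloaded file and count its uncompressed bytes. The following is a minimal Python sketch, not a Galaxy feature. The function names are hypothetical, and treating the 250 GB quota as decimal gigabytes is an assumption.

```python
import gzip

# Assumption: the 250 GB account quota is counted in decimal gigabytes.
QUOTA_BYTES = 250 * 10**9


def uncompressed_size(path, chunk_size=1 << 20):
    """Return the uncompressed size in bytes of a gzip file by streaming through it."""
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def fits_quota(paths, quota_bytes=QUOTA_BYTES):
    """Return (total_bytes, fits) for a list of gzip files."""
    total = sum(uncompressed_size(p) for p in paths)
    return total, total <= quota_bytes
```

Streaming avoids writing the uncompressed data to disk. The result only covers the files themselves, so any other datasets already in the account still count against the quota.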

Do not attempt to extract this data from the UCSC Table Browser as the data is too large and will be truncated for most chromosomes. Locate the data in the UCSC Downloads area and load by URL or download locally then load using FTP.

This is exactly where to get it. The files you want are those named like chrNNN.maf.gz. Once they are in your history, use the Concatenate tool to create a single reference MAF dataset. After a successful merge is confirmed, the per-chromosome datasets can be permanently deleted to recover space.
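
For anyone who prefers to merge the files locally before uploading, the same idea can be sketched in Python. This is not the Galaxy tool, only an illustration. The function name is hypothetical, and it assumes each file starts with a `##maf` header line and may end with an `##eof maf` trailer.

```python
import gzip


def concatenate_mafs(input_paths, output_path):
    """Merge gzipped MAF files into one gzipped MAF; return the number of alignment blocks."""
    blocks = 0
    header_written = False
    with gzip.open(output_path, "wt") as out:
        for path in input_paths:
            last = "\n"
            with gzip.open(path, "rt") as handle:
                for line in handle:
                    if line.startswith("##maf"):
                        # Keep only the first file's header line.
                        if not header_written:
                            out.write(line)
                            header_written = True
                        continue
                    if line.startswith("##eof"):
                        # Drop end-of-file trailers so none ends up mid-file.
                        continue
                    if line.startswith("a"):
                        # Each alignment block begins with an "a" line.
                        blocks += 1
                    out.write(line)
                    last = line
            if last.strip():
                # Make sure the final block of this file ends with a blank line.
                out.write("\n" if last.endswith("\n") else "\n\n")
    return blocks
```

Comparing the returned block count with the sum of the per-file counts is one simple way to confirm the merge before deleting the per-chromosome files.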

This could all be done on a local/cloud Galaxy as well, given sufficient resources.

Best, Jen, Galaxy team

(Reply written 2.5 years ago by Jennifer Hillman Jackson)
pkalungi wrote (2.5 years ago):

Thank you for your help and explaining in detail.

From the UCSC download link, I am trying to download the multiz100way alignments. In the "maf" folder, I can see several maf.gz files for all the chromosomes. Some of them are named like chr22.maf.gz, and others are named like chr22_GL383583v2_alt.maf.gz and chr22_KI270731v1_random.maf.gz. Please see the screenshot of the files listed in the folder (attached below). There is nothing in the README to explain what is in these files. Do I need just the chromosome files (chr22.maf.gz) or all the files (every file related to the chromosome)?

Your help is much appreciated.

Screenshot of the files in maf folder


Just use the primary MAFs unless your query BED dataset contains these additional chromosome variants. The data will only link if the chromosome name in your query is an exact match for the chromosome name in the MAF. In other words, you could load all of the files, but only those with a matching name will be part of the analysis.

Example: for data coordinates based on chr22, use the chr22.maf.gz file.
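
The exact-match rule can be checked before downloading anything. The sketch below is an illustration in Python with hypothetical function names. It assumes the chromosome name is the file name with the `.maf.gz` suffix removed, as in the file names quoted in this thread.

```python
SUFFIX = ".maf.gz"


def chrom_of(filename):
    """Return the chromosome name encoded in a file name such as 'chr22.maf.gz'."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a MAF archive: {filename}")
    return filename[: -len(SUFFIX)]


def bed_chroms(bed_lines):
    """Collect chromosome names from the first column of BED lines."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def mafs_needed(bed_lines, available_files):
    """Return the MAF files whose chromosome name exactly matches one in the BED data."""
    wanted = bed_chroms(bed_lines)
    return sorted(f for f in available_files if chrom_of(f) in wanted)
```

With BED lines that only use `chr22`, this selects `chr22.maf.gz` and skips the `_alt` and `_random` files. A BED file that names the chromosome `22` without the `chr` prefix would select nothing, because the comparison is exact.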

(Reply written 9 months ago by Jennifer Hillman Jackson)