Question: Error while generating index files from MAFs for local galaxy
Asked by pooja.narang (United States), 2.0 years ago:

I am trying to create index files for the MAFs (for the GRCh38 human reference) in a local Galaxy instance, as described here: https://wiki.galaxyproject.org/Admin/ReferenceMAFs.

I used the script maf_build_index.py (https://bitbucket.org/james_taylor/bx-python/src/tip/scripts/maf_build_index.py?fileviewer=file-view-default) from the bx-python distribution (https://bitbucket.org/james_taylor/bx-python/wiki/Home). I can create indexes for all chromosomes except chr1, chr2 and chr3 (their sizes are approximately 70 GB, 70 GB and 57 GB respectively). I am running the script on a cluster, on high-memory nodes. I get the following error for these three chromosomes:


Traceback (most recent call last):
  File "/home/pnarang2/anaconda2/pkgs/bx-python-0.7.3-np110py27_1/bin/maf_build_index.py", line 4, in <module>
    __import__('pkg_resources').run_script('bx-python==0.7.3', 'maf_build_index.py')
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/setuptools-21.2.1-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/setuptools-21.2.1-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx_python-0.7.3-py2.7-linux-x86_64.egg-info/scripts/maf_build_index.py", line 83, in <module>
    if __name__ == "__main__": main()
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx_python-0.7.3-py2.7-linux-x86_64.egg-info/scripts/maf_build_index.py", line 80, in main
    indexes.write( out )
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx/interval_index_file.py", line 332, in write
    write_packed( f, ">I", base )
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx/interval_index_file.py", line 463, in write_packed
    f.write( pack( pattern, *vals ) )
struct.error: 'I' format requires 0 <= number <= 4294967295


Can someone please suggest how I can make the script work for these three chromosomes?

modified 23 months ago • written 2.0 years ago by pooja.narang

Hello, our team will get back to you shortly (I have asked the authors to help investigate the problem and provide troubleshooting help). Your patience is appreciated: our entire team has been involved in intensive prep this last week and is now traveling to the yearly Galaxy Community conference, which starts today and runs through next Friday (June 30th). This is a busy time for all of us and for many in the core community of developers.

That said, I have this bookmarked and will track so that you get assistance (possibly during the Hackathons at the start of the conference). Please know we want to help you and will as soon as possible.

Jen, Galaxy team

written 24 months ago by Jennifer Hillman Jackson

Hi Jen,

I wanted to follow up with you regarding the issue generating MAF index files (https://biostar.usegalaxy.org/p/18196/#18701). My update is that I checked the md5sums of the MAF files and they match. I also tried downloading the files again and rerunning maf_build_index.py, but I get the same error each time.
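For anyone repeating this check on files of this size, here is a minimal sketch of a checksum comparison in Python. The function name `md5_of_file` and the 1 MiB chunk size are illustrative choices, not part of Galaxy or bx-python; reading in chunks simply avoids loading a 70 GB file into memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks.

    `chunk_size` (1 MiB here) is an arbitrary illustrative value.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared with the checksum published alongside the download.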

This part is very important to the work we are trying to accomplish, and I was wondering whether anyone has reproduced the same problem.

Any help related to this is appreciated.

Best, Pooja

written 23 months ago by pooja.narang

In the meantime, we had our systems administrator look into the error.

They suggested that there seems to be a bug in the Python 2.7.x zipfile code. Below is what they did:

"Zipfiles in excess of 4gb seem to be the culprit so I tracked down this thread http://bugs.python.org/issue9720 and applied the patch to your local installation."

Even after applying the patch and rerunning the jobs, I still get the same error for these files. So I am not sure how to fix this and generate the MAF index files for these three chromosomes.

Any help is appreciated.

written 23 months ago by pooja.narang

If this is a problem locally (it seems to be, perhaps memory related) - reporting this back to your admin would be a good idea.

written 23 months ago by Jennifer Hillman Jackson

These files do need additional memory. However, when a job fails because of insufficient memory, the error message says so. Running the job with additional memory still produces the error above for these three chromosomes.

written 23 months ago by pooja.narang

Answer by Daniel Blankenberg (United States), 23 months ago:

This appears to be a limitation of the bx-python library.

I've submitted an issue here: https://github.com/bxlab/bx-python/issues/8
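The bound quoted in the error message can be reproduced outside bx-python. The sketch below only illustrates the limit of the 4-byte `">I"` format from Python's standard `struct` module; the helper `fits_uint32` and the example offset are hypothetical, and the 8-byte `">Q"` format is shown purely for comparison, not as the fix adopted by bx-python (switching formats would change the on-disk index layout).

```python
import struct

UINT32_MAX = 2**32 - 1  # 4294967295, the upper bound quoted in the error message


def fits_uint32(value):
    """Return True if `value` can be packed with the 4-byte '>I' format."""
    try:
        struct.pack(">I", value)
        return True
    except struct.error:
        return False


# Hypothetical byte offset near the end of a 70 GB file (decimal gigabytes).
offset_in_70gb_file = 70 * 10**9

print(fits_uint32(UINT32_MAX))            # largest value '>I' accepts
print(fits_uint32(offset_in_70gb_file))   # far beyond the 4-byte range

# For comparison only: an 8-byte unsigned format holds the same value.
packed = struct.pack(">Q", offset_in_70gb_file)
print(len(packed))
```

Any value above roughly 4.29 billion fails in the same way, which is consistent with the script succeeding on the smaller chromosomes and failing only on the three largest files.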
