I am trying to create index files for the MAFs (for the GRCh38 human reference) in a local Galaxy instance, as described here: https://wiki.galaxyproject.org/Admin/ReferenceMAFs.
I have used the script maf_build_index.py (https://bitbucket.org/james_taylor/bx-python/src/tip/scripts/maf_build_index.py?fileviewer=file-view-default) from the bx-python distribution (https://bitbucket.org/james_taylor/bx-python/wiki/Home). I am able to create an index for every chromosome except chr1, chr2 and chr3 (their MAF files are approximately 70 GB, 70 GB and 57 GB respectively). I am running it on a cluster, on high-memory nodes. I get the following error for these three chromosomes:
Traceback (most recent call last):
  File "/home/pnarang2/anaconda2/pkgs/bx-python-0.7.3-np110py27_1/bin/maf_build_index.py", line 4, in <module>
    __import__('pkg_resources').run_script('bx-python==0.7.3', 'maf_build_index.py')
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/setuptools-21.2.1-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/setuptools-21.2.1-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx_python-0.7.3-py2.7-linux-x86_64.egg-info/scripts/maf_build_index.py", line 83, in <module>
    if __name__ == "__main__": main()
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx_python-0.7.3-py2.7-linux-x86_64.egg-info/scripts/maf_build_index.py", line 80, in main
    indexes.write( out )
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx/interval_index_file.py", line 332, in write
    write_packed( f, ">I", base )
  File "/home/pnarang2/anaconda2/lib/python2.7/site-packages/bx/interval_index_file.py", line 463, in write_packed
    f.write( pack( pattern, *vals ) )
struct.error: 'I' format requires 0 <= number <= 4294967295
Can someone please suggest how I can make the script work for these three chromosomes?
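For readers following the thread: the last line of the traceback says that a value passed to struct.pack with the ">I" format (a big-endian unsigned 32-bit integer) was larger than 4294967295. The sketch below reproduces only that failure mode in isolation. It is not bx-python code, and the traceback alone does not show why the value grew past the limit for these three files. The helper fits_uint32 and the value hypothetical_offset are made up for illustration.

```python
import struct

# Largest value the '>I' (unsigned 32-bit) format accepts; this is the
# limit quoted in the error message.
UINT32_MAX = 2**32 - 1  # 4294967295


def fits_uint32(value):
    """Return True if value can be packed with the '>I' format."""
    try:
        struct.pack(">I", value)
        return True
    except struct.error:
        return False


# Hypothetical value, chosen only because it exceeds UINT32_MAX.
# It is NOT taken from the failing run.
hypothetical_offset = 5 * 1024**3  # 5 GiB expressed in bytes

print(fits_uint32(UINT32_MAX))           # True
print(fits_uint32(hypothetical_offset))  # False

# A 64-bit format ('>Q') can hold the same value, but a file written that
# way would only be readable by code that expects 64-bit fields.
print(len(struct.pack(">Q", hypothetical_offset)))  # 8
```

Whether a wider field is compatible with the index format that bx-python and Galaxy read is an open question here, so this only explains the message rather than proposing a fix.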
Hello, Our team will be getting back to you shortly (I asked the authors to help investigate the problem and provide troubleshooting help). Your patience is appreciated - our entire team has been involved in intensive prep this last week and is now in travel for the yearly Galaxy Community conference starting today and through next Friday (June 30th). This is a busy time for all of us and many in the core community of developers.
That said, I have this bookmarked and will track so that you get assistance (possibly during the Hackathons at the start of the conference). Please know we want to help you and will as soon as possible.
Jen, Galaxy team
Hi Jen,
I wanted to connect with you again regarding the issue generating MAF index files (https://biostar.usegalaxy.org/p/18196/#18701). My update is that I checked the md5sums of the MAF files and they match. I also tried downloading the files again and rerunning maf_build_index.py, but I get the same error each time.
This part is very important to the work we are trying to accomplish, and I was wondering whether anyone has reproduced the same problem.
Any help related to this is appreciated.
Best, Pooja
In the meantime, we had our systems administrator look into the error.
They suggested that there seems to be a bug in the Python 2.7.x zipfile code. Below is what they did:
"Zipfiles in excess of 4gb seem to be the culprit so I tracked down this thread http://bugs.python.org/issue9720 and applied the patch to your local installation."
Even after the patch was applied and the jobs were rerun, I still get the same error for these files. So I am not sure how to fix this and generate the MAF index files for these three chromosomes.
Any help is appreciated.
If this is a problem locally (it seems to be, perhaps memory related) - reporting this back to your admin would be a good idea.
These files do need additional memory. However, when a job fails for lack of memory, the error message says so, and that is not the case here: running the job with additional memory still produces the struct.error above for these three chromosomes.
I checked the md5sums of the MAF files and they match. I also tried downloading the files again and rerunning maf_build_index.py, but I get the same error each time.
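For anyone repeating the checksum comparison mentioned above on files of this size, a minimal sketch follows. It assumes the reference checksums are published as hexadecimal MD5 digests. The function name md5_of_file and the path in the usage comment are hypothetical. The file is read in fixed-size chunks so that a 70 GB MAF never has to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (hypothetical path; compare against the published digest):
# print(md5_of_file("chr1.maf"))
```

A matching digest only confirms that the download is intact. It says nothing about whether the indexer can handle the file.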