Question: Problem merging five 13.9 GB .BAM files
rickwhite23 wrote (6 months ago):

I keep getting the error below when I try to merge five 13.9 GB .BAM files. Any ideas on how I can get it to work?

Fatal error: Matched on error: Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/016/042/16042355/_galaxy_tmp -Xmx7680m -Xms256m [Wed Jun 07 11:31:14 CDT 2017] net.sf.picard.sam.MergeSamFiles INPUT=[/galaxy-repl/main/files/020/211/datase

Tags: samtools, bam
Jennifer Hillman Jackson (Galaxy team) wrote (6 months ago):

Hello,

The job might be exceeding the memory allocated to the tool, or there could be a problem with the inputs. Specifically, the BAMs may need to be re-sorted, or there may be a reference genome mismatch that should be corrected.
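For the re-sorting point, one quick check outside Galaxy is the `SO:` field of each BAM header, which is normally extracted with `samtools view -H input.bam`. A minimal sketch (the header text below is invented so the example is self-contained):

```shell
#!/bin/sh
# Sketch: inspect the sort order recorded in a BAM header.
# With a real BAM, the header would come from `samtools view -H input.bam`;
# a text copy is written here so the example runs anywhere.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > header.txt

# The SO: field of the @HD line records the sort order; Picard's
# MergeSamFiles expects compatible sort orders across all inputs
# ("coordinate" in most workflows).
sed -n 's/.*SO:\([a-z]*\).*/\1/p' header.txt
```

If this prints anything other than `coordinate` for one of the five inputs, sorting that BAM first is a likely fix.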

Please see: https://galaxyproject.org/support/#troubleshooting

Thanks! Jen, Galaxy team

rickwhite23 wrote (6 months ago):

Thanks for your quick response. I have a couple of additional questions:

On memory - the files I have uploaded total only 69.5 GB; I deleted all other files. I assume the merged file should also be 69.5 GB, so combined the data would take up only 139 GB of the 250 GB allocated. Is this right?

As far as the mismatched reference genome - all files have the same reference.

With respect to the "re-sort" - I listed the files from 1-5; are you suggesting that I try 5-1?

Thanks again for your help

Rick


Jennifer Hillman Jackson replied (6 months ago):

Hi Rick,

Thanks for sending more feedback. The memory used to process jobs is distinct from the available disk space in your account. "Re-sorting" means sorting the BAM datasets themselves, not reordering the inputs (help in the link below). Also, just to double-check: did you examine the BAM headers to make certain that they are each identical? If so, based on this extra info, the solution might be one of these:

  • Try sorting the BAM datasets, then filtering out unmapped reads (to reduce size), then execute a merge.
  • Try a few reruns to see if a different cluster node is able to process all of the data in batch as one job.
  • Merge fewer BAMs in any one job, then try to merge those results to produce the final result.
  • Convert BAM-to-SAM format (without headers for all inputs, then with just the header for one of them, as all should be identical), use the Concatenate tool to merge the header with the SAM lines, convert SAM-to-BAM, then sort the final BAM (coordinate sort is best for most use cases).
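The last option above is essentially text manipulation once the data is in SAM format. A minimal command-line sketch of the same idea (the reads and reference names are invented; in Galaxy the BAM-to-SAM, Concatenate, and SAM-to-BAM tools perform the equivalent steps):

```shell
#!/bin/sh
# Sketch of the SAM-level merge: one header block plus the alignment
# lines of every input. All data here is made up for illustration.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\nr1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > a.sam
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\nr2\t0\tchr1\t20\t60\t4M\t*\t0\t0\tTTTT\tIIII\n' > b.sam

# Header lines start with '@': keep the header from one input only...
grep '^@' a.sam > merged.sam
# ...then append the alignment (non-header) lines from every input.
grep -hv '^@' a.sam b.sam >> merged.sam

# merged.sam now holds 4 lines: 2 header lines + 2 alignment lines.
wc -l < merged.sam
```

This only produces a valid result when the inputs really do share identical headers, which is why the header check matters.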

See the help sections for sorting inputs and checking chromosome identifiers for details: https://galaxyproject.org/support/#getting-inputs-right-
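Confirming that the inputs are identical at the header level can also be done with a plain diff of the extracted headers. With real BAMs, `samtools view -H file.bam > headerN.txt` would produce these files; the header text below is invented to keep the sketch self-contained:

```shell
#!/bin/sh
# Sketch: confirm two inputs share byte-identical headers before merging.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > header1.txt
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > header2.txt

# diff exits 0 only when the files match exactly.
if diff -q header1.txt header2.txt >/dev/null; then
  echo "headers match"
else
  echo "headers differ - resolve before merging"
fi
```

Any difference in the `@SQ` lines (sequence names or lengths) points to a reference genome mismatch between the inputs.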

If the job still fails after this, and you would like a second opinion on the content/format before moving to a local/cloud Galaxy to process the data (the final option), a bug report from one of the failed jobs can be sent in. Please leave all of the original, intermediate, and error datasets undeleted so we can review them. Including a link to this Biostars post in the comments will help us associate the report with this question. You can also add details in the history comments section.

Thanks! Jen
