Question: Downsampling BAM file
4.3 years ago, jshearstone wrote:

Hello,

I am trying to downsample a BAM file to a defined number of reads. For example, the .bam file currently has 36 million aligned reads and I'd like to create a new .bam file that contains 28 million reads. I have not been able to find a tool on the usegalaxy.org Tools section that allows me to do this easily, but I am probably just missing something. Could someone point me in the right direction?

Thank you,

Jeff

chip-seq • 3.0k views
4.3 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

An alternative approach (all steps can be performed on the public Main Galaxy instance, or on a local/cloud instance with minimal tool installation):

  1. Convert the BAM file with the tool "NGS: SAM Tools -> BAM-to-SAM", preserving headers.
  2. Use the tool "Filter and Sort -> Select" with the pattern "^@" (without quotes) to separate the header lines, which start with the character "@", from the mapping lines. Run it twice, once selecting lines that match and once selecting lines that do not match. This creates two new datasets.
  3. On the dataset containing the mapping lines, use the tool "Text Manipulation -> Select random lines from a file". Change the dataset to the "tabular" datatype first if needed (pencil icon -> Datatype tab -> modify and save).
  4. Combine the header dataset with the result using the tool "Text Manipulation -> Concatenate datasets tail-to-head", with the headers first. Change the output to the "sam" datatype if the tool produces tabular.
  5. Convert back with the tool "NGS: SAM Tools -> SAM-to-BAM".
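
For anyone who wants to see the logic of steps 2 to 4 outside Galaxy, here is a minimal Python sketch that works on the text of a SAM file. It is only an illustration under stated assumptions, not how the Galaxy tools are implemented: the function name `downsample_sam` and its parameters are made up for this example, it holds all lines in memory (which may be impractical for tens of millions of reads), and it samples lines independently, so mates of paired-end reads are not kept together.

```python
import random


def downsample_sam(sam_lines, n_reads, seed=None):
    """Return the header lines followed by n_reads randomly chosen mapping lines.

    sam_lines: iterable of SAM text lines (header lines start with "@").
    n_reads:   number of mapping lines to keep.
    seed:      optional seed so the same subset can be drawn again.
    """
    headers = []
    alignments = []
    # Step 2: split header lines ("^@") from mapping lines.
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)
        else:
            alignments.append(line)

    if n_reads > len(alignments):
        raise ValueError("requested more reads than the file contains")

    # Step 3: pick n_reads lines at random, without replacement.
    # Sorting the chosen indices keeps the lines in their original order.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(alignments)), n_reads))

    # Step 4: headers first, then the sampled mapping lines.
    return headers + [alignments[i] for i in keep]
```

With a hypothetical file name, something like `downsample_sam(open("input.sam"), 28_000_000)` would return the lines to write to a new SAM file, which can then be converted back to BAM as in step 5.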

If wanted or needed, remove unmapped lines at the start with one of the "NGS: SAM Tools -> Filter SAM (BAM)" tools. There are variations on the above, but all have about the same number of steps. Once done, permanently delete the intermediate files to regain disk space. If you think you might do this again, create a workflow from your history first.
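
The optional removal of unmapped lines can be sketched the same way. In the SAM format the second column is the FLAG field, and bit 0x4 marks a read as unmapped. The function name `drop_unmapped` is hypothetical, and this is only a rough illustration of the idea, not the implementation of the Galaxy "Filter SAM (BAM)" tools.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def drop_unmapped(sam_lines):
    """Return header lines and mapped alignment lines, dropping unmapped ones."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are always kept.
            kept.append(line)
            continue
        # The FLAG is the second tab-separated column of an alignment line.
        flag = int(line.split("\t")[1])
        if not flag & UNMAPPED_FLAG:
            kept.append(line)
    return kept
```

Running this before the random selection means the requested number of lines is drawn from mapped reads only.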

Best, Jen, Galaxy team

4.3 years ago, Bjoern Gruening (Germany) wrote:

Hi Jeff!

This one is not for BAM files, but for FASTQ files: http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs

Cheers,

Bjoern
