Question: Multiple input for samtools mpileup (-b option) for GBS pipeline
2.7 years ago, fiebig wrote:

Hello, I'm working on a GBS analysis workflow that was originally designed for command-line use. It should analyse more than 50 samples. Steps:

  1. Trimming of the FASTQ input
  2. Mapping, which yields multiple sorted and indexed BAM files, one per sample

My problem starts here:

I want to use the BAM files as input for variant detection with samtools mpileup. On the command line, I simply write the paths of all BAM files created in the previous steps to a list file and hand that file to mpileup using the -b bamlist.txt option. This gives me one VCF file containing information for multiple samples.

I completely failed to reproduce this result in Galaxy. The only way I have found so far is to specify every BAM file by hand in the mpileup interface. This produces the desired file. With two files I got the following log output:

[mpileup] 2 samples in 2 input files
<mpileup> Set max per-file depth to 4000

But that is not practical for more than 10 samples...
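For readers who want to reproduce the command-line step described above, here is a minimal Python sketch. It assumes all BAM files sit in one directory; the function names `write_bam_list` and `mpileup_command` and the reference path are made up for illustration. Only the `-b` (BAM list) and `-f` (reference FASTA) options are used, and the command is built but not executed, since output options differ between samtools versions.

```python
from pathlib import Path


def write_bam_list(bam_dir, list_path="bamlist.txt"):
    """Write the paths of all *.bam files in bam_dir to list_path, one per line."""
    # Hypothetical helper: sorted so the sample order is reproducible.
    bam_paths = sorted(str(p) for p in Path(bam_dir).glob("*.bam"))
    Path(list_path).write_text("".join(f"{p}\n" for p in bam_paths))
    return bam_paths


def mpileup_command(list_path, reference_fasta):
    """Build (but do not run) a samtools mpileup call that reads the BAM list."""
    # -b: file with one BAM path per line; -f: reference FASTA.
    # Output options are omitted on purpose; they depend on the samtools version.
    return ["samtools", "mpileup", "-b", str(list_path), "-f", str(reference_fasta)]
```

The returned list could be passed to `subprocess.run` once the output options for the installed samtools version have been added.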

So far, I have tried Galaxy's "multiple input" option as well as a "data list collection". In both cases every BAM input is still treated as a single input file, resulting in one VCF per BAM instead of one VCF covering multiple samples.

Has anybody here run into the same problem and knows the trick? Is there a way to hand BAM input over to mpileup dynamically?

Any help would be appreciated. Maybe I have not understood the problem. I have the same trouble when trying to merge BAM files...

Best regards, Anne

2.7 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

I suggest running each BAM dataset individually through this tool (as multiple datasets or as a collection). Next, merge the resulting VCF files using "VCFsort" followed by "VCFcombine".

Once you have a working analysis path, consider placing the tools into a workflow for re-use.

Best, Jen, Galaxy team


OK, I gave it a try and it basically does the job. There's only one small difference between the results of mpileup -b and the merged VCF: if one sample has no variants at a given position, I previously got "0,0,0", whereas the merged file has "." instead. It's a minor issue that can be fixed easily.

Thanks a lot for your helpful suggestion!

(reply written 2.7 years ago by fiebig)

Update: The tool has been modified. If read groups (@RG) are included in the input BAM datasets, multiple inputs will result in a combined output.

From the tool form:

What it does

Report variants for one or multiple BAM files. Alignment records are grouped by sample identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.
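Whether a BAM file carries sample identifiers can be checked from its header. The sketch below is only an illustration: the function name `sample_ids_from_header` is hypothetical, and it assumes the header text was obtained beforehand, for example with `samtools view -H`. It collects the SM values from the @RG header lines.

```python
def sample_ids_from_header(header_text):
    """Return the set of SM (sample) values found in @RG header lines."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SM" and value:
                samples.add(value)
    return samples
```

An empty result means the file has no sample identifiers, in which case, per the tool form quoted above, each input file is regarded as one sample.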

(reply written 9 months ago by Jennifer Hillman Jackson)