Hello,
I am using "samtools mpileup" to generate a pileup file from my .bam dataset, which I then pass to VarScan2 for variant calling.
Everything works fine until I try to generate a file with multiple samples. The pileup step works as expected and produces a file with additional columns for the second sample. But when I pass this pileup file to VarScan2, it always throws an error.
My pileup file looks like this (excerpt):
1     2       3  4  5    6   7   8  9       10   11
chr1  183619  G  1  ^?.  ?   ?   0  *       *    *
chr1  774701  G  0  *    *   *   2  ^3.^2.  EE   32
chr1  890319  c  2  .,   nG  EI  3  .,,     nGG  E71
chr1  890320  c  2  .,   mG  EI  3  .,,     mGG  E71
chr1  890321  g  2  .,   jH  EI  3  .,,     iHG  E71
chr1  890322  g  2  .,   jH  EI  3  .,,     gHE  E71
I get the following error when running VarScan2 (I supplied two sample names separated by a comma):
Fatal error: Tool exception
Got the following sample list:
C5 F1
Only SNPs will be reported
Min coverage: 8
Min reads2: 10
Min var freq: 0.25
Min avg qual: 15
P-value thresh: 0.99
Reading input from /home/galaxy/galaxy/database/files/000/dataset_782.dat
Parsing Exception on line:
chr1 183619 G 1 ^?. ? ? 0 * * *
For input string: "?"
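One way to inspect the line named in the exception is to count how many columns each sample occupies. The sketch below is only illustrative: it assumes tab-separated pileup input with three leading columns (chromosome, position, reference base), and the function name `columns_per_sample` is hypothetical, not part of samtools or VarScan2.

```python
# Assumption: a pileup line has 3 leading columns (chromosome, position,
# reference base) followed by one block of columns per sample.
FIXED_COLUMNS = 3


def columns_per_sample(line, n_samples, sep="\t"):
    """Return the number of columns each sample occupies, or None if the
    sample columns cannot be divided evenly among n_samples."""
    fields = line.rstrip("\n").split(sep)
    sample_columns = len(fields) - FIXED_COLUMNS
    if n_samples <= 0 or sample_columns <= 0 or sample_columns % n_samples:
        return None
    return sample_columns // n_samples


# The line reported in the parsing exception, written with tabs as in a
# real pileup file.
failing_line = "chr1\t183619\tG\t1\t^?.\t?\t?\t0\t*\t*\t*"

print(columns_per_sample(failing_line, n_samples=2))  # 4
```

For the failing line this gives four columns per sample. If VarScan2 expects three per sample (depth, bases, base qualities), which is an assumption worth checking against its documentation, the seventh column would be read as the second sample's depth, and that would be consistent with the `For input string: "?"` message.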
Galaxy reports the following command line:
echo C5,F1 | awk -F ',' '{ for (i = 1; i <= NF; i++) { print $i; } }' > samples_list.txt && varscan mpileup2snp /home/galaxy/galaxy/database/files/000/dataset_782.dat --min-coverage 8 --min-reads2 10 --min-avg-qual 15 --min-var-freq 0.25 --min-freq-for-hom 0.75 --p-value 0.99 --output-vcf 1 > /home/galaxy/galaxy/database/files/000/dataset_788.dat --vcf-sample-list samples_list.txt
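For reference, the first part of that command only turns the comma-separated sample names into a file with one name per line. A minimal Python sketch of the same step follows; the helper name `split_sample_names` is hypothetical, and unlike the awk loop it also strips whitespace and drops empty names.

```python
def split_sample_names(names, sep=","):
    """Split a separator-delimited string of sample names into a list,
    dropping surrounding whitespace and empty entries."""
    return [name.strip() for name in names.split(sep) if name.strip()]


samples = split_sample_names("C5,F1")

# One name per line, the layout the awk step writes to samples_list.txt.
sample_list_text = "\n".join(samples) + "\n"
print(sample_list_text, end="")
```

The number of names in this list can then be compared with the number of per-sample column blocks in the pileup file.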
I need both samples in one pileup/VarScan2 file because I want to compare them directly and see whether a mutation is unique to one sample or present in both.
How can I get rid of this error? What am I doing wrong?
Thank you for your support
Jan