Hi,
I need help with a tool I'm working on. It takes a list-type data collection of BAM files as input and is part of a workflow that starts with a list of FASTQ files. The .sh script contains a pipeline that enables multi-threading for samtools mpileup and bcftools. It takes the number of cores, the reference FASTA for samtools, and all BAMs from the collection as input. The initial loop is required for linking each .bam to its .bai index. Thanks to Björn Grüning for that snippet.
#for $bam in $input_bam:
ln -s $bam '${bam}.bam' && ln -s $bam.metadata.bam_index '${bam}.bai' &&
#end for
gbs_pileup_bcf_parallel.sh $threads $ref "
#for $bam in $input_bam:
${bam}
#end for
" >${vcf_out} 2>$log
It works quite well, except for one problem: by handing the input ($bam) to bash, I lose track of the sample names. Instead, the path in the working directory is printed in the sample columns (see the entries in {}).
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | {/path/to/dataset_2416.dat} | {/path/to/dataset_2418.dat}
chr1H_part2 | 1470552 | . | G | A | 75.975 | . | DP=3;[...] | GT:PL:DP:DV:GQ | 1/1:108,9,0:3:3:11 | 1/1:0,0,0:0:0:4
Running the normal toolshed samtools mpileup in single-threaded mode, I get the following, more convenient output (containing the sample names from the collection):
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | {ETC1_R2.mini.trim.fq} | {ETC1_R1.mini.trim.fq}
chr1H_part2 | 1470552 | . | G | A | 75.975 | . | DP=3;[...] | GT:PL:DP:DV:GQ | 1/1:108,9,0:3:3:11 | 1/1:0,0,0:0:0:4
I need to find a way to replace "/path/to/dataset_2416.dat" with the original sample name "ETC1_R2.mini.trim.fq" from the collection. Since I already have to softlink .bam and .bai, I could easily do this at that stage, but I don't know how to access the metadata of a data collection, more specifically the input name of each sample in the collection. After linking, I would expect a name like "/path/to/ETC1_R2.mini.trim.fq". That would be OK for me.
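One workaround that would not depend on the collection metadata is to rename the sample columns after the pipeline has run. Below is a minimal sketch of that idea, not a tested part of my wrapper. The function name rename_vcf_samples and the two-column mapping file (dataset path, tab, sample name) are hypothetical, and it assumes a tab-separated VCF whose column header line starts with "#CHROM", with sample columns from field 10 onwards.

```shell
# rename_vcf_samples MAP_FILE < in.vcf > out.vcf
# MAP_FILE is a hypothetical helper file with two tab-separated columns:
#   <dataset path>  <sample name>
rename_vcf_samples() {
    awk -F'\t' -v OFS='\t' -v map="$1" '
        BEGIN {
            # Load the path -> sample name mapping before reading the VCF.
            while ((getline line < map) > 0) {
                split(line, f, "\t")
                name[f[1]] = f[2]
            }
            close(map)
        }
        /^#CHROM/ {
            # Sample columns start at field 10 of the VCF column header.
            for (i = 10; i <= NF; i++)
                if ($i in name) $i = name[$i]
        }
        { print }   # all other lines pass through unchanged
    '
}
```

It would be called as `rename_vcf_samples mapping.tsv < raw.vcf > renamed.vcf`. As far as I know, bcftools reheader with its --samples option does the same job, so that may be the better choice if it is available in the tool environment. Either way, the mapping still has to come from somewhere, which brings me back to the question of how to read the element names.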
Any hints? Or better ideas? ;)
Anne
OK, problem fixed by using the read group information. Nevertheless, I would like to know how to access the metadata of single elements of a collection. I just realised that I can't even change the names of single elements via the interface; only the name of the collection itself can be changed. (I just want to read the names of the collection items, not change them, of course.)
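For anyone hitting the same problem: samtools mpileup takes the sample names from the SM tag of the @RG header lines when they are present, so it helps to check what the BAM headers actually contain. Here is a small sketch for that check. The function name list_rg_samples is made up for illustration, and input.bam in the usage comment is a placeholder.

```shell
# list_rg_samples: read a SAM header on stdin and print the SM (sample)
# value of every @RG line, one per line.
list_rg_samples() {
    awk -F'\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)   # strip the "SM:" prefix
        }
    '
}

# Usage (assumes samtools is on PATH; input.bam is a placeholder):
#   samtools view -H input.bam | list_rg_samples
```

If this prints the expected sample name for each BAM, the VCF sample columns should carry those names rather than the dataset paths.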
Can be closed.