Hi,
I have a command-line script that runs a GATK analysis, and I am now trying to set up a similar workflow on Galaxy. This is my first time using Galaxy, and loading and organizing the data and the analysis seems a bit daunting, so any help would be highly appreciated!
I have my Illumina FASTQ WGS data organized in sample folders, e.g.:
Sample_1/C45GOXX_s1_1_xx.fastq
Sample_1/C45GOXX_s1_2_xx.fastq
Sample_1/C45GOXX_s2_1_xx.fastq
Sample_1/C45GOXX_s2_2_xx.fastq
Sample_1/C45GOXX_s3_1_xx.fastq
Sample_1/C45GOXX_s3_2_xx.fastq
.
.
In my script, I loop over all the files in the sample so that I can do the following for each pair (steps derived from the Broad GATK 'Best Practices'):
i) Align each sequence (command-line equivalent: bwa aln)
ii) Combine paired-end reads (bwa sampe)
iii) Convert SAM to BAM (samtools view)
iv) Fix mate pair information (samtools fixmate)
v) Set MAPQ to 0 for unmapped reads (samtools view)
vi) Sort BAM file (samtools sort)
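To make "each pair" concrete, here is a minimal sketch of the pairing logic that the loop relies on. It is written in Python purely for illustration and is not my actual script. It assumes file names follow the <flowcell>_s<lane>_<read>_xx.fastq pattern shown above; the function name pair_fastqs and the regular expression are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <flowcell>_s<lane>_<read>_<suffix>.fastq
FASTQ_RE = re.compile(
    r"^(?P<flowcell>[^_]+)_s(?P<lane>\d+)_(?P<read>[12])_.*\.fastq$"
)


def pair_fastqs(filenames):
    """Group FASTQ paths into (read1, read2) tuples, one per flowcell/lane."""
    lanes = defaultdict(dict)
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop the sample folder
        match = FASTQ_RE.match(base)
        if match is None:
            continue  # ignore files that do not fit the assumed pattern
        key = (match.group("flowcell"), int(match.group("lane")))
        lanes[key][match.group("read")] = name

    pairs = []
    for key in sorted(lanes):
        reads = lanes[key]
        # Keep only lanes where both mates are present.
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
    return pairs


if __name__ == "__main__":
    files = [
        "Sample_1/C45GOXX_s1_1_xx.fastq",
        "Sample_1/C45GOXX_s1_2_xx.fastq",
        "Sample_1/C45GOXX_s2_1_xx.fastq",
        "Sample_1/C45GOXX_s2_2_xx.fastq",
    ]
    for read1, read2 in pair_fastqs(files):
        print(read1, read2)
```

Each (read1, read2) tuple is then the unit that goes through steps i to vi, which is the behaviour I would like to reproduce in Galaxy.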
My questions:
How can I set up a loop (or some other workflow/structure) so that, for each sample, I get a set of bam files (output of step vi)?
Here are the Galaxy tools that I think I can use for each of the steps above:
i) NGS: Mapping > Map with BWA for Illumina
ii) ??
iii) NGS: SAMtools > SAM-to-BAM
iv) ??
v) ??
vi) NGS: SAMtools > Sort
Which tools/commands should I be using for ii, iv and v?
Many thanks!