I would like some advice on the workflow I'm using.
1- I have WGS data from bacteria. I've uploaded my reference genome and FASTQ files through FTP.
2- FastQC analysis.
3- FastQ Groomer to convert my FASTQ files to fastqillumina format prior to mapping.
4- Mapping to the reference genome with BWA for Illumina.
5- Picard alignment summary metrics (not sure how to interpret all of the output yet, but I'll get there).
Now I think I should get rid of unpaired reads and remove duplicates. Or should I have done that before mapping?
Can I use the "create matched paired end dataset workflow" for that? (I can only select FastQ Groomer output files as RAW files. Shouldn't I use the original FASTQ files?)
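To make clear what I mean by "getting rid of unpaired reads", here is a minimal Python sketch of the idea. It assumes the two mates are in separate FASTQ files with four lines per record, and that mates share the same read ID apart from a trailing /1 or /2. The function names (read_fastq, read_id, matched_pairs) are made up for illustration and are not part of Galaxy or any of the tools above.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            # End of input (or a truncated final record): stop.
            return
        yield tuple(record)


def read_id(header):
    """Drop the leading '@', any description after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def matched_pairs(forward_lines, reverse_lines):
    """Yield (forward, reverse) records only for reads present in both inputs."""
    # Assumption: the reverse file is small enough to index in memory.
    reverse = {read_id(rec[0]): rec for rec in read_fastq(reverse_lines)}
    for rec in read_fastq(forward_lines):
        mate = reverse.get(read_id(rec[0]))
        if mate is not None:
            yield rec, mate
```

Passing two open file handles, e.g. matched_pairs(open("reads_1.fastq"), open("reads_2.fastq")) with hypothetical file names, would give only the reads that still have a mate; anything else is what I am calling "unpaired".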
Any feedback would be much appreciated.