Dear Biostars,
To start, I love your FastQC wrapper. In fact, I found a difference in results between your version (v0.5.2) and the current release (v0.11.2), and Galaxy's appears to perform better. Specifically, I wonder why the Galaxy version picks up overrepresented sequences that the current version does not. Below are the "Overrepresented sequences" sections of two FastQC reports, run on the same dataset (one locally with v0.11.2 and one online with Galaxy). For space, I removed the contents of the Possible Source column (all are Illumina sequencing primers).
v0.11.2 (modern): Overrepresented sequences
No overrepresented sequences
v0.52 (Galaxy): Overrepresented sequences
Sequence | Count | Percentage | Possible Source |
---|---|---|---|
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGGATCAGATCTCGTA | 6632022 | 10.933354026186844 | |
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGGATCGGAAGAGCGGTTCA | 1131736 | 1.8657462765021882 | |
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | 196415 | 0.32380392149686615 | |
TATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGGATCAGATCTCGTA | 165001 | 0.27201573632820514 | |
AATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGGATCAGATCTCGTA | 150939 | 0.2488335417703102 | |
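
In case it helps with diagnosis, here is a minimal Python sketch (my own sanity check, not part of FastQC or the Galaxy wrapper) that back-computes the total read count implied by each row of the Galaxy table. It assumes the percentage is simply the count divided by the total number of reads, and it uses the 0.1% reporting threshold as I understand it from the FastQC documentation. The names `rows`, `THRESHOLD_PCT`, and `implied_total` are made up for this example.

```python
# Rows copied from the Galaxy report above: (count, percentage).
rows = [
    (6632022, 10.933354026186844),
    (1131736, 1.8657462765021882),
    (196415, 0.32380392149686615),
    (165001, 0.27201573632820514),
    (150939, 0.2488335417703102),
]

# Assumption: FastQC lists sequences that make up more than 0.1% of all reads.
THRESHOLD_PCT = 0.1


def implied_total(count, pct):
    """Total reads implied by one row, assuming pct = 100 * count / total."""
    return round(count * 100 / pct)


# If every row implies the same total, the table is internally consistent.
totals = [implied_total(count, pct) for count, pct in rows]
above_threshold = [pct > THRESHOLD_PCT for _, pct in rows]

print("implied totals:", totals)
print("all rows above threshold:", all(above_threshold))
```

Every row implies the same total of roughly 60.7 million reads, and every sequence is above the 0.1% threshold, so I would have expected v0.11.2 to report them as well.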
To clarify, I am not asking why sequence contamination occurs or how to deal with it. I simply want my local installation (v0.11.2) to detect these sequences as well as Galaxy's does. Thank you for your time!