Hi, I am a beginner at ChIP-seq analysis using Galaxy. I have received my data and run a FastQC quality check. I would like some help understanding what needs to be done to correctly prepare the raw data for mapping and peak calling. Also, is it necessary to trim adapters before mapping? Some of my datasets have one or two over-represented sequences that match the TruSeq adapter sequences. I tried Trim Galore! and Trimmomatic, but when I run FastQC on the trimmed dataset, the sequence length distribution changes from 101 for all sequences to a range of 0-101.


The sequence length distribution is expected to change once the adaptors are removed: each read is shortened by however much adaptor it contained, so reads that were all 101 bases now span a range of lengths. If the adaptor represents most of a sequence, that read wouldn't map anyway (or wouldn't map well). To better understand the impact, you could map both the raw and the trimmed data and compare the results - or go further and review the differences in the peaks called from each mapping result. If adaptor content is the only QA problem addressed, I wouldn't expect much of a difference, but compare to find out whether this is true for your data.
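To see why trimming turns a single read length into a range, here is a minimal Python sketch. It is not how Trim Galore! or Trimmomatic work internally (they also handle mismatches and quality scores); it only does exact matching. The adapter string is an assumption (the commonly used start of the Illumina TruSeq adapter), and the toy reads and the function names `trim_adapter` and `length_distribution` are made up for illustration. Substitute the over-represented sequence FastQC reported for your data.

```python
from collections import Counter

# Assumption: the first 13 bases of the Illumina TruSeq adapter.
# Replace with the over-represented sequence reported for your own data.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove the adapter, and everything after it, from the 3' end of a read.

    Exact matching only: a simplified illustration, not a real trimmer.
    """
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with the first k bases of the adapter.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # Case 3: no adapter found, read is left unchanged.
    return read


def length_distribution(reads):
    """Count how many reads there are of each length."""
    return Counter(len(r) for r in reads)


# Hypothetical toy reads, all 20 bases long before trimming.
reads = [
    "ACGTACGTACGTACGTACGT",   # no adapter
    "ACGTACG" + ADAPTER,      # 7 bases of insert, then the adapter
    ADAPTER + "ACGTACG",      # adapter at the start (e.g. adapter dimer)
    "ACGTACGTACGTACGTAAGA",   # ends with the first 3 adapter bases
]

trimmed = [trim_adapter(r) for r in reads]

print("before:", dict(length_distribution(reads)))
print("after: ", dict(length_distribution(trimmed)))
```

Before trimming every toy read has the same length; afterwards the lengths are spread out, including one read of length 0 where the adapter started at the first base. That is the same effect you see in FastQC going from 101 to 0-101. Trimming tools usually offer a minimum-length filter (for example the MINLEN step in Trimmomatic) so that very short or empty reads are discarded before mapping.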

The Galaxy tutorials have QA/QC help and an example for ChIP-seq analysis:

Review these first:

Thanks! Jen, Galaxy team

