I have acquired ChIP-seq data from an Illumina HiSeq 3000. I have followed analysis workflows from tutorials and from other Galaxy users, but I am still having trouble with the output data. Peak calling is poor, and the peaks cannot be run successfully through annotation programs. I am unsure exactly where the issue is, but I think it is somewhere before the mapping step. Does anyone know whether HiSeq 3000 read data has a different format that is not correctly recognized by the standard mapping and/or peak-calling programs? Or does anyone have advice on how to manipulate the FASTQ files (in Galaxy) to get effective alignment and peak calling? I am working with the Drosophila reference genome, so programs that already include dm6 would be preferable, but I can always upload the reference genome from elsewhere.
FASTQ files produced by a HiSeq 3000 are no different from those produced by any other HiSeq in the last ~5 years. Perhaps you need to trim your data a bit, or you have significant contamination (e.g., from Wolbachia). What sort of alignment metrics are you getting?
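One concrete way to test the "different format" worry is to check which quality-score offset the file uses. This is what FASTQ Groomer converts between. The following is a minimal Python sketch, and all function names in it are hypothetical. It guesses Phred+33 (Sanger / Illumina 1.8+) versus the older Phred+64 from the range of quality characters. The cut-offs are heuristics rather than a specification: ';' (59) is taken as the lowest Solexa/Phred+64 character, and 'K' (75) as a practical ceiling for Illumina 1.8+ Phred+33.

```python
import gzip
import itertools
from typing import Iterable, Iterator, Optional


def quality_strings(lines: Iterable[str], max_reads: int = 10000) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    it = iter(lines)
    for _ in range(max_reads):
        record = list(itertools.islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            break
        yield record[3].rstrip("\n")


def guess_phred_offset(quals: Iterable[str]) -> Optional[int]:
    """Heuristic guess: return 33, 64, or None if the encoding is ambiguous."""
    codes = [ord(c) for q in quals for c in q]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:   # below ';' (59), which Solexa/Phred+64 never uses
        return 33
    if highest > 75:  # above 'K' (75), unusual for Illumina 1.8+ Phred+33
        return 64
    return None       # every character falls in 59..75: cannot decide


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For a hypothetical file `sample.fastq.gz`, call `guess_phred_offset(quality_strings(open_fastq("sample.fastq.gz")))`. A result of 33 means the file already uses the offset that Bowtie2 assumes by default (`--phred33`), so re-grooming should not be necessary.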
The summary statistics for most alignments were as follows:
38963284 reads; of these:
  38963284 (100.00%) were unpaired; of these:
    1554561 (3.99%) aligned 0 times
    30307818 (77.79%) aligned exactly 1 time
    7100905 (18.22%) aligned >1 times
96.01% overall alignment rate
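The figures above follow Bowtie2's summary layout for unpaired reads. As a sanity check, the counts can be pulled out and the rates recomputed: overall rate = (exactly 1 + >1) / total. This also shows what share of the aligned reads are multi-mappers. The following is a minimal Python sketch that assumes the unpaired layout shown here, since paired-end summaries use different "concordantly" lines. The function names are hypothetical.

```python
import re
from typing import Dict

# Bowtie2 unpaired-read summary, copied from the post.
SUMMARY = """\
38963284 reads; of these:
  38963284 (100.00%) were unpaired; of these:
    1554561 (3.99%) aligned 0 times
    30307818 (77.79%) aligned exactly 1 time
    7100905 (18.22%) aligned >1 times
96.01% overall alignment rate
"""

# One regex per line of interest; group 1 captures the read count.
PATTERNS = {
    "total": r"^(\d+) reads; of these:",
    "unaligned": r"^\s*(\d+) \([\d.]+%\) aligned 0 times",
    "unique": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
    "multi": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_bowtie2_unpaired(text: str) -> Dict[str, int]:
    """Extract read counts from an unpaired Bowtie2 alignment summary."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"could not find the '{key}' line")
        counts[key] = int(match.group(1))
    return counts


def alignment_rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Recompute percentages from the raw counts."""
    total = counts["total"]
    aligned = counts["unique"] + counts["multi"]
    return {
        "overall": 100 * aligned / total,
        "unique": 100 * counts["unique"] / total,
        "multi": 100 * counts["multi"] / total,
        "multi_of_aligned": 100 * counts["multi"] / aligned,
    }


if __name__ == "__main__":
    for name, value in alignment_rates(parse_bowtie2_unpaired(SUMMARY)).items():
        print(f"{name}: {value:.2f}%")
```

The recomputed overall, unique, and multi-mapped rates should match the percentages printed by Bowtie2. The three per-category counts should also sum to the total read count.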
The FastQC results also looked decent. I have run some of the samples through Trimmomatic and FASTQ Groomer, but the Bowtie statistics did not improve much, only by 0.5-2%.