I have paired-end data (in two separate files). I have groomed my files, and would like to filter the reads for quality. However, the read lengths for some of the paired reads are not equal. If I filter the files independently, the mapping fails as the two files contain different numbers of reads. If I use FASTQ Joiner to merge the data, then filter, I cannot use FASTQ Splitter as some reads (~44%) can't be split due to unequal read lengths. Any help gratefully received. For example, is there a way to trim reads so that the paired reads are the same length? I should say that I am a Galaxy newcomer, so go easy...!
What mapper are you using? Is the data Illumina, and has it previously been manipulated in a way that produces variable-length reads?
Fastq files can be filtered so that both the forward and reverse inputs contain the same exact reads, but this is usually not necessary at the mapping step.
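For illustration only, here is a minimal Python sketch of what "filtering so both files contain exactly the same reads" means. It is not the Galaxy tool itself. It assumes both files list the reads in the same order with one mate per record, that qualities are Sanger/Phred+33 (as FASTQ Groomer produces), and the function names and thresholds (`filter_pairs`, `min_mean_q`, `min_len`) are hypothetical.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string; offset 33 is an assumption (Sanger encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def passes(record, min_mean_q, min_len):
    """A read passes if it is long enough and its mean quality is high enough."""
    _header, seq, qual = record
    return len(seq) >= min_len and mean_quality(qual) >= min_mean_q

def filter_pairs(fwd_lines, rev_lines, min_mean_q=20, min_len=20):
    """Walk both files in lockstep and keep a pair only if BOTH mates pass.

    Mates may have different lengths; what matters is that a read is never
    kept in one file and dropped from the other.
    """
    for fwd, rev in zip(read_fastq(fwd_lines), read_fastq(rev_lines)):
        if passes(fwd, min_mean_q, min_len) and passes(rev, min_mean_q, min_len):
            yield fwd, rev
```

Because each pair is kept or dropped as a unit, the two outputs always contain the same number of reads in the same order, even when the mates differ in length.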
Thank you for the reply, Jennifer. I'm using Bowtie2 to map. The data is Illumina. I think I have the raw reads (sent by a collaborator in Japan). I have used FASTQ Groomer only; I'm not aware of any other manipulation. Thank you for your help.
Using Bowtie2, the content of the two fastq input files for paired-end mapping does not need to be identical.
Perform QA steps before the mapping run on the individual datasets.
Then filter the resulting BAM dataset after the run for properly paired mapped reads, etc.
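As a rough sketch of what "properly paired" means at the file level: in SAM/BAM each alignment carries a FLAG field, where bit 0x1 marks a paired read, 0x2 marks a proper pair, and 0x4 marks an unmapped read. The Python below works on SAM text lines rather than binary BAM, and the function names are hypothetical. In practice a dedicated tool does this step (for example, `samtools view -f 2` keeps only alignments with the proper-pair bit set).

```python
PAIRED = 0x1       # read is part of a pair
PROPER_PAIR = 0x2  # aligner considers both mates properly aligned
UNMAPPED = 0x4     # read itself is unmapped

def is_properly_paired(flag):
    """True if the SAM FLAG marks a mapped read that is part of a proper pair."""
    return bool(flag & PAIRED) and bool(flag & PROPER_PAIR) and not (flag & UNMAPPED)

def filter_sam(lines):
    """Yield header lines unchanged and only properly paired alignment lines."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are always kept
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if is_properly_paired(flag):
            yield line
```

Filtering after mapping lets the aligner decide which mates belong together, so the reads do not have to be trimmed to equal length beforehand.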
If Bowtie2 is giving you an error, it is likely a format problem with the fastq data itself. If you want help with that, let us know.
Thanks, Jen