Question: Combining The Paired Reads From Illumina Run
7.7 years ago by
Surya Saha80
Surya Saha80 wrote:
Hi, I have two FASTQ files with the forward (/1) and reverse (/2) paired reads. The reads are not in the same order in the two files, some pairs are incomplete (one mate is missing), and the files are 8 GB each with about 30 million reads each. I am trying to pull out all the paired reads for which both the forward and reverse read exist. Can I use a combination of FASTQ tools in Galaxy to do this? Thanks! -Surya
7.7 years ago by
Penn State
Anton Nekrutenko1.7k wrote:
Are these Illumina or SOLiD reads? Thanks, Anton Nekrutenko
These are Illumina reads -S.
You can try converting FASTQ to tabular (NGS: QC and Manipulation), joining (Join, Subtract and Group) the two files on IDs (provided they do not have /1 and /2), splitting the result into two files with Cut (Text Manipulation), and going back into FASTQ with Tabular to FASTQ (NGS: QC and Manipulation). With 30 million reads this will likely take some time, though. Thanks, Anton Nekrutenko
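For readers who want to see what the convert, join, cut, convert-back chain amounts to outside Galaxy, here is a minimal Python sketch. It is not the code behind the Galaxy tools; the function names are made up for illustration, and it assumes plain 4-line FASTQ records whose IDs already match between the two files (no /1 or /2 suffix).

```python
import io

def fastq_to_tabular(handle):
    """Yield (read_id, sequence, quality) rows from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def join_on_id(fwd_rows, rev_rows):
    """Inner join on read ID: keep only reads present in both tables."""
    # Assumption: the reverse table fits in memory. For 30 million reads
    # this is heavy; an on-disk index or a sort-based join scales better.
    rev_by_id = {rid: (seq, qual) for rid, seq, qual in rev_rows}
    for rid, seq, qual in fwd_rows:
        if rid in rev_by_id:
            rev_seq, rev_qual = rev_by_id[rid]
            yield rid, seq, qual, rev_seq, rev_qual

def tabular_to_fastq(rid, seq, qual):
    """Turn one tabular row back into a FASTQ record."""
    return f"@{rid}\n{seq}\n+\n{qual}\n"

# Tiny demonstration: r2 and r3 have no mate, only r1 is paired.
fwd = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
rev = io.StringIO("@r3\nTTTT\n+\nIIII\n@r1\nTGCA\n+\nIIII\n")
joined = list(join_on_id(fastq_to_tabular(fwd), fastq_to_tabular(rev)))

# "Cut" step: columns 1-3 go to the forward file, 1 and 4-5 to the reverse.
fwd_out = "".join(tabular_to_fastq(r[0], r[1], r[2]) for r in joined)
rev_out = "".join(tabular_to_fastq(r[0], r[3], r[4]) for r in joined)
```

The output keeps the order of the forward file, so the two resulting files are in matching order.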
Hi Anton, Thank you for the tip. The sequence names do end in /1 and /2, but that can be fixed using the Manipulate FASTQ tool, right? -Surya
Yes, in a hacky way: translate "/1" (and likewise "/2") into something else, such as two spaces "  " or your favorite chemical element such as "He" ;) Anton Nekrutenko
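Outside Galaxy, the same ID clean-up can be done by stripping the mate suffix rather than translating it. The sketch below is one way to do that, not what the Manipulate FASTQ tool does internally; the helper names are hypothetical, and it assumes the suffix is exactly "/1" or "/2" at the very end of the ID.

```python
import re

# Matches a trailing "/1" or "/2" only at the end of the read ID.
_MATE_SUFFIX = re.compile(r"/[12]$")

def strip_mate_suffix(read_id):
    """Return the read ID without its /1 or /2 suffix, so mates compare equal."""
    return _MATE_SUFFIX.sub("", read_id)

def mate_number(read_id):
    """Return 1 or 2 for a suffixed ID, or None if there is no suffix."""
    match = _MATE_SUFFIX.search(read_id)
    return int(match.group()[1]) if match else None
```

After stripping, the forward and reverse mates share the same ID and can be joined on it; the suffix can be appended again when writing the paired files.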
7.7 years ago by
Surya Saha80
Surya Saha80 wrote:
Hi Tony, Yes, that should work too. I have written up a BioPerl hack that indexes the reads and pulls out the pairs; it is chugging away right now. If that does not work out somehow, I will give your idea a shot. Thanks! Best, Surya
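The BioPerl script itself was not posted, so the following is only a guess at the general idea of indexing the reads and pulling out the pairs, written in Python. It indexes the reverse file by byte offset, then streams the forward file and fetches each mate by seeking. All function names are illustrative, and it assumes well-formed 4-line records with IDs ending in /1 and /2.

```python
def base_id(header):
    """Read ID from a FASTQ header line (bytes), without '@' or /1, /2."""
    read_id = header[1:].split()[0]
    if read_id.endswith((b"/1", b"/2")):
        read_id = read_id[:-2]
    return read_id

def index_fastq(path):
    """Map each base read ID to the byte offset where its record starts."""
    offsets = {}
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            header = fh.readline()
            if not header.strip():
                break
            for _ in range(3):  # skip sequence, '+', and quality lines
                fh.readline()
            offsets[base_id(header)] = pos
    return offsets

def extract_pairs(fwd_path, rev_path, out_fwd_path, out_rev_path):
    """Write only complete pairs, in forward-file order; return the pair count."""
    rev_index = index_fastq(rev_path)
    pairs = 0
    with open(fwd_path, "rb") as fwd, open(rev_path, "rb") as rev, \
            open(out_fwd_path, "wb") as out_fwd, \
            open(out_rev_path, "wb") as out_rev:
        while True:
            header = fwd.readline()
            if not header.strip():
                break
            rest = [fwd.readline() for _ in range(3)]
            offset = rev_index.get(base_id(header))
            if offset is None:
                continue  # forward read has no mate, drop it
            out_fwd.write(header + b"".join(rest))
            rev.seek(offset)
            out_rev.write(b"".join(rev.readline() for _ in range(4)))
            pairs += 1
    return pairs
```

Only the IDs and offsets of one file are held in memory, not the sequences, which matters for files of this size. A call would look like `extract_pairs("fwd.fastq", "rev.fastq", "fwd.paired.fastq", "rev.paired.fastq")`, with the file names here being placeholders.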
7.7 years ago by
Florent Angly370
Florent Angly370 wrote:
Hi Surya, I made Galaxy scripts, FASTQ interlacer and de-interlacer, to do exactly what you are describing: The tools extend the Galaxy Python API and therefore need Galaxy to work. Unfortunately, FASTQ interlacer and de-interlacer are still waiting to be committed to the Galaxy development repository by a Galaxy maintainer. Florent
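For anyone unfamiliar with the terms, interlacing means merging the two mate files into one file where each forward read is followed by its reverse mate, and de-interlacing is the reverse operation. The sketch below only illustrates that idea. It is not Florent's code, it does not use the Galaxy API, and unlike the task in the question it assumes the two inputs already contain only complete pairs in the same order.

```python
from itertools import islice

def records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if len(rec) < 4:
            break  # end of input or a truncated final record
        yield rec

def interlace(fwd_lines, rev_lines):
    """Yield lines alternating forward record, reverse record."""
    for fwd_rec, rev_rec in zip(records(fwd_lines), records(rev_lines)):
        yield from fwd_rec
        yield from rev_rec

def deinterlace(lines):
    """Split interlaced lines back into (forward_lines, reverse_lines)."""
    fwd, rev = [], []
    for i, rec in enumerate(records(lines)):
        (fwd if i % 2 == 0 else rev).extend(rec)
    return fwd, rev

fwd_lines = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n", "@r2/1\n", "GGCC\n", "+\n", "IIII\n"]
rev_lines = ["@r1/2\n", "TGCA\n", "+\n", "IIII\n", "@r2/2\n", "CCGG\n", "+\n", "IIII\n"]
merged = list(interlace(fwd_lines, rev_lines))
```

Because both functions work on any iterable of lines, open file handles can be passed in directly instead of lists.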
Hi Florent, This looks great. Hope it gets committed into the repository soon. Best, Surya
