Combining The Paired Reads From Illumina Run

7.7 years ago by

Surya Saha • 80 wrote:

Hi, I have two FASTQ files with the forward (/1) and reverse (/2) paired reads. The reads are not in the same order in the two files, some mates are absent, and the files are 8 GB each with about 30 million reads each. I am trying to pull out all the paired reads for which both the forward and reverse reads exist. Can I use a combination of FASTQ tools in Galaxy to do this? Thanks! -Surya

galaxy • 2.6k views

modified 7.7 years ago by Florent Angly • 370 • written 7.7 years ago by Surya Saha • 80

7.7 years ago by

Anton Nekrutenko ♦ 1.7k (Penn State) wrote:

Are these Illumina or SOLiD reads? Tx, anton Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org

written 7.7 years ago by Anton Nekrutenko ♦ 1.7k

These are Illumina reads -S.

written 7.7 years ago by Surya Saha • 80

You can try converting FASTQ to tabular (NGS: QC and Manipulation), joining (Join, Subtract and Group) the two files on IDs (provided they do not have /1 and /2), splitting the result into two files with Cut (Text Manipulation), and going back to FASTQ with Tabular to FASTQ (NGS: QC and Manipulation). With 30 million reads this will likely take some time, though. Thanks, anton

written 7.7 years ago by Anton Nekrutenko ♦ 1.7k
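For readers who want to see the logic of the tabular-join workflow outside Galaxy, here is a minimal Python sketch. It is not the Galaxy tools themselves: the function names (`fastq_to_tabular`, `join_on_id`, `tabular_to_fastq`, `pair_reads`) are made up for illustration, it assumes plain four-line FASTQ records, and, as in the reply above, it assumes the read IDs carry no /1 or /2 suffix. It also holds the whole reverse file in memory, which would likely be a problem at 30 million reads.

```python
def fastq_to_tabular(handle):
    """Yield (read_id, sequence, quality) rows from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed in the table
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual  # drop the leading '@'


def join_on_id(fwd_rows, rev_rows):
    """Inner join on read ID: keep only IDs present in both tables (forward-file order)."""
    rev_by_id = {rid: (seq, qual) for rid, seq, qual in rev_rows}
    for rid, seq, qual in fwd_rows:
        if rid in rev_by_id:
            rev_seq, rev_qual = rev_by_id[rid]
            yield rid, seq, qual, rev_seq, rev_qual


def tabular_to_fastq(rid, seq, qual):
    """Turn one table row back into a FASTQ record."""
    return f"@{rid}\n{seq}\n+\n{qual}\n"


def pair_reads(fwd_handle, rev_handle, fwd_out, rev_out):
    """Write matched pairs to two outputs (the 'cut' step) and return the pair count."""
    joined = join_on_id(fastq_to_tabular(fwd_handle), fastq_to_tabular(rev_handle))
    pairs = 0
    for rid, fseq, fqual, rseq, rqual in joined:
        fwd_out.write(tabular_to_fastq(rid, fseq, fqual))
        rev_out.write(tabular_to_fastq(rid, rseq, rqual))
        pairs += 1
    return pairs
```

Called with four open text-mode file handles, `pair_reads` writes the two output files in the same order, so record N in one file is the mate of record N in the other.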

Hi Anton, Thank you for the tip. The sequence names do end in /1 and /2, but that can be fixed using the Manipulate FASTQ tool, right? -Surya

written 7.7 years ago by Surya Saha • 80

Yes, in a hacky way: you translate "/1" into something else, such as two spaces "  " or your favorite chemical element, such as "He" ;) a.

written 7.7 years ago by Anton Nekrutenko ♦ 1.7k
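The goal of the renaming trick discussed above is to make both mates share the same ID so that a join can match them. A small Python sketch of that idea follows; it removes a trailing /1 or /2 rather than translating it, and the helper name `strip_mate_suffix` is hypothetical, not part of Galaxy or of the Manipulate FASTQ tool.

```python
import re

# Matches a mate suffix only at the very end of the ID.
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(read_id):
    """Return the read ID without a trailing /1 or /2; other IDs pass through unchanged."""
    return _MATE_SUFFIX.sub("", read_id)
```

After this step, the forward and reverse copies of a read have identical IDs and can be joined directly.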

7.7 years ago by

Surya Saha • 80 wrote:

Hi Tony, Yes, that should work too. I have written up a BioPerl hack that indexes the reads and pulls out the pairs; it is chugging away right now. If that does not work out somehow, I will give your idea a shot. Thanks! Best, Surya

written 7.7 years ago by Surya Saha • 80
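The BioPerl script mentioned above was not posted, so the following is only a guess at the general "index, then pull out pairs" approach, written in Python rather than BioPerl. It assumes four-line FASTQ records and IDs ending in /1 and /2; the function names are hypothetical. Instead of loading sequences into memory, it stores one byte offset per reverse read and seeks back into the reverse file when a forward read has a mate.

```python
import re

MATE_SUFFIX = re.compile(rb"/[12]$")


def index_fastq(path):
    """Map each read ID (mate suffix removed) to the byte offset of its record."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            header = handle.readline()
            if not header:
                break
            for _ in range(3):  # skip the sequence, '+', and quality lines
                handle.readline()
            read_id = MATE_SUFFIX.sub(b"", header.rstrip()[1:])
            index[read_id] = offset
    return index


def read_record(handle, offset):
    """Return the 4-line record that starts at the given byte offset."""
    handle.seek(offset)
    return b"".join(handle.readline() for _ in range(4))


def extract_pairs(fwd_path, rev_path, fwd_out_path, rev_out_path):
    """Write reads that have both mates, in forward-file order; return the pair count."""
    rev_index = index_fastq(rev_path)
    pairs = 0
    with open(fwd_path, "rb") as fwd, open(rev_path, "rb") as rev, \
         open(fwd_out_path, "wb") as fwd_out, open(rev_out_path, "wb") as rev_out:
        while True:
            header = fwd.readline()
            if not header:
                break
            rest = [fwd.readline() for _ in range(3)]
            read_id = MATE_SUFFIX.sub(b"", header.rstrip()[1:])
            offset = rev_index.get(read_id)
            if offset is not None:
                fwd_out.write(header + b"".join(rest))
                rev_out.write(read_record(rev, offset))
                pairs += 1
    return pairs
```

The index still needs one dictionary entry per reverse read, so memory use grows with the number of reads, though far less than holding the sequences and qualities themselves.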

7.7 years ago by

Florent Angly • 370 wrote:

Hi Surya, I wrote two Galaxy tools, FASTQ interlacer and de-interlacer, that do exactly what you are describing: https://bitbucket.org/fangly/galaxy-central/changeset/3fa11cf2730d The tools extend the Galaxy Python API and therefore need Galaxy to work. Unfortunately, they are still waiting to be committed to the Galaxy development repository by a Galaxy maintainer. Florent

written 7.7 years ago by Florent Angly • 370
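To illustrate what interlacing and de-interlacing mean in general, here is a short standalone Python sketch. It does not reproduce the code in the linked changeset and does not use the Galaxy API; the function names are hypothetical, and it assumes the two input files already contain matched pairs in the same order, with four lines per record.

```python
from itertools import islice


def records(handle):
    """Yield FASTQ records as lists of 4 lines."""
    while True:
        rec = list(islice(handle, 4))
        if not rec:
            break
        yield rec


def interlace(fwd_handle, rev_handle, out_handle):
    """Write forward and reverse records alternately: fwd1, rev1, fwd2, rev2, ..."""
    for fwd_rec, rev_rec in zip(records(fwd_handle), records(rev_handle)):
        out_handle.writelines(fwd_rec)
        out_handle.writelines(rev_rec)


def deinterlace(in_handle, fwd_out, rev_out):
    """Split an interlaced file back into forward (even) and reverse (odd) records."""
    for i, rec in enumerate(records(in_handle)):
        (fwd_out if i % 2 == 0 else rev_out).writelines(rec)
```

Because this sketch pairs records purely by position, unordered files with missing mates, as in the original question, would first need one of the ID-based matching steps discussed earlier in the thread.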

Hi Florent, This looks great. Hope it gets committed into the repository soon. Best, Surya

written 7.7 years ago by Surya Saha • 80
