Question: Combining The Paired Reads From Illumina Run
Surya Saha wrote 7.7 years ago:
Hi, I have two FASTQ files with the forward (/1) and reverse (/2) paired reads. The reads are not in the same order in the two files, some reads are missing their mate, and each file is about 8 GB with about 30 million reads. I am trying to pull out all the paired reads for which both the forward and the reverse read exist. Can I use a combination of FASTQ tools in Galaxy to do this? Thanks! -Surya
galaxy • 2.6k views
modified 7.7 years ago by Florent Angly • written 7.7 years ago by Surya Saha
Anton Nekrutenko (Penn State) wrote 7.7 years ago:
Are these Illumina or SOLiD reads? Tx, anton Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
These are Illumina reads -S.
written 7.7 years ago by Surya Saha
You can try the following: convert FASTQ to tabular (NGS: QC and Manipulation); join the two files on IDs (Join, Subtract and Group), provided the IDs do not end in /1 and /2; split the result into two files with Cut (Text Manipulation); and go back to FASTQ with Tabular to FASTQ (NGS: QC and Manipulation). With 30 million reads this will likely take some time, though. Thanks, anton Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
written 7.7 years ago by Anton Nekrutenko
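For anyone who wants to see what the tabular route amounts to, here is a minimal Python sketch of the same four steps outside Galaxy. It is not the Galaxy tools' code: the function names are made up for illustration, it assumes plain four-line FASTQ records, it assumes the IDs already match between files (no /1 and /2), and it holds everything in memory, so it only suits small test data.

```python
def fastq_to_rows(text):
    """Turn FASTQ text into (read_id, sequence, quality) rows, one per record."""
    lines = text.strip().splitlines()
    rows = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        rows.append((header[1:], seq, qual))  # drop the leading "@"
    return rows


def join_on_id(fwd_rows, rev_rows):
    """Inner join on read ID: keep only reads present in both inputs."""
    rev_by_id = {rid: (seq, qual) for rid, seq, qual in rev_rows}
    return [
        (rid, seq, qual) + rev_by_id[rid]
        for rid, seq, qual in fwd_rows
        if rid in rev_by_id
    ]


def rows_to_fastq(rows):
    """Write (read_id, sequence, quality) rows back out as FASTQ text."""
    return "".join(f"@{rid}\n{seq}\n+\n{qual}\n" for rid, seq, qual in rows)


# Toy data: r1 has no reverse mate, r3 has no forward mate.
fwd = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nHHHH\n"
rev = "@r2\nTTAA\n+\nFFFF\n@r3\nCCCC\n+\nEEEE\n"

joined = join_on_id(fastq_to_rows(fwd), fastq_to_rows(rev))

# The "cut" step: columns 1-3 are the forward mate, columns 1, 4, 5 the reverse.
fwd_paired = rows_to_fastq([(r[0], r[1], r[2]) for r in joined])
rev_paired = rows_to_fastq([(r[0], r[3], r[4]) for r in joined])
```

Only r2 survives the join, and both output files list their reads in the forward file's order.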
Hi Anton, Thank you for the tip. The sequence names do end in /1 and /2, but that can be fixed using the Manipulate FASTQ tool, right? -Surya
written 7.7 years ago by Surya Saha
In a hacky way: you translate "/1" into something else, such as two spaces " ", or your favorite chemical element, such as "He" ;) a. Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
written 7.7 years ago by Anton Nekrutenko
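Whichever tool does the rewriting, the goal is that both mates end up with the same ID so the join can match them. A small Python sketch of that normalization, assuming the mate marker is always a trailing "/1" or "/2" (the helper names are hypothetical, not part of any Galaxy tool):

```python
import re

# Matches a trailing "/1" or "/2" on a read ID (assumed naming scheme).
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(read_id):
    """Return the ID shared by both mates, e.g. "read7/1" -> "read7"."""
    return _MATE_SUFFIX.sub("", read_id)


def mate_number(read_id):
    """Return 1 or 2 for a suffixed ID, or None if there is no suffix."""
    match = _MATE_SUFFIX.search(read_id)
    return int(match.group()[1]) if match else None
```

Keeping the mate number around separately makes it possible to put the suffix back after the join.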
Surya Saha wrote 7.7 years ago:
Hi Tony, Yes, that should work too. I have written up a BioPerl hack that indexes the reads and pulls out the pairs, and it is chugging away right now. If that does not work out somehow, I will give your idea a shot. Thanks! Best, Surya
Florent Angly wrote 7.7 years ago:
Hi Surya, I made Galaxy scripts, FASTQ interlacer and de-interlacer, to do exactly what you are describing: https://bitbucket.org/fangly/galaxy-central/changeset/3fa11cf2730d . The tools extend the Galaxy Python API and therefore need Galaxy to work. Unfortunately, FASTQ interlacer and de-interlacer are still waiting to be committed to the Galaxy development repository by a Galaxy maintainer. Florent
Hi Florent, This looks great. Hope it gets committed into the repository soon. Best, Surya
written 7.7 years ago by Surya Saha
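For readers unfamiliar with the term, an interlaced file holds each forward read immediately followed by its reverse mate, so pairing on IDs and dropping unpaired reads falls out of building it. The sketch below only illustrates that layout on in-memory tuples; it is not the code in the linked changeset, and the function names and the /1, /2 suffix handling are assumptions.

```python
def mate_key(read_id):
    """ID shared by both mates, assuming a trailing "/1" or "/2"."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def interlace(fwd, rev):
    """fwd, rev: lists of (read_id, seq, qual).

    Returns forward and reverse mates alternating; unpaired reads are dropped.
    """
    rev_by_key = {mate_key(rec[0]): rec for rec in rev}
    out = []
    for rec in fwd:
        mate = rev_by_key.get(mate_key(rec[0]))
        if mate is not None:
            out.extend((rec, mate))
    return out


def deinterlace(records):
    """Split an interlaced list back into (forward, reverse) lists."""
    return records[0::2], records[1::2]


fwd = [("r1/1", "ACGT", "IIII"), ("r2/1", "GGCC", "HHHH")]
rev = [("r2/2", "TTAA", "FFFF"), ("r3/2", "CCCC", "EEEE")]
interlaced = interlace(fwd, rev)
fwd_paired, rev_paired = deinterlace(interlaced)
```

Interlacing and then de-interlacing therefore leaves two files that contain only complete pairs, in matching order.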