Question: Pre-Processing of Illumina RNA-Seq Paired-End Data
Ravi Karra wrote (6.8 years ago):
Hello, I have Illumina 76 bp paired-end data from a zebrafish RNA-seq experiment, and I am stuck trying to pre-process it before running TopHat/Cuffdiff. For each sample, I have a read 1 FASTQ file and a paired read 2 FASTQ file. After running FASTQ Groomer, I trimmed the read ends with FASTQ Quality Trimmer using a quality threshold of 20 and a window size of 1 (I think this essentially removes bases from the end of the read until it reaches a base with a quality score >= 20). Next, I trimmed the adapters using Clip. What I am left with is a modified read 1 FASTQ file and a modified read 2 file in which the pairs are no longer in the same order and some reads have lost their mates. From what I have read, I don't think TopHat can handle paired-end data that is out of order. I tried to fix the ordering with FASTQ joiner, but it returned 0 joined reads. I am not sure why FASTQ joiner didn't work for me and am looking for suggestions on what to try next. Thanks! ravi
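The trimming rule described above can be sketched in Python. This is a minimal illustration, not the Galaxy FASTQ Quality Trimmer's code: it assumes Sanger/Phred+33 quality strings (the encoding FASTQ Groomer normally outputs) and treats a window size of 1 as judging each base on its own score. The function names `phred_scores` and `trim_3prime` are hypothetical.

```python
def phred_scores(qual, offset=33):
    """Decode an ASCII quality string; offset 33 is the Sanger (Phred+33) encoding."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq, qual, threshold=20, offset=33):
    """Drop bases from the 3' end until a base with quality >= threshold is reached.

    With a window of one base, each base is tested on its own score, and
    low-quality bases in the middle of the read are left alone.
    """
    scores = phred_scores(qual, offset)
    end = len(scores)
    # Walk backwards from the 3' end while the last remaining base is below threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]
```

For example, `trim_3prime("ACGTACGT", "IIIII###")` keeps only the first five bases, because `#` decodes to quality 2. If a read shrinks this way (or is clipped) until a later filter drops it, its mate in the other file is left unpaired, which matches the symptom in the question.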
Sameet Mehta wrote (6.8 years ago):
Hi, I think you need to remove the adaptors first and then trim the reads; that is probably the correct order. As for the second part of the question, you could try a rudimentary approach: search for each sequence header in the other file. I have seen this, with the r1 and r2 read files ending up different sizes, but when matched up, almost 90% of the reads turned out to be true pairs. Hope this helps, Sameet
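The header-matching idea can be sketched as follows, assuming four-line FASTQ records and that mates share the same read name apart from an optional /1 or /2 suffix. It is a hypothetical illustration, not a Galaxy tool: `read_fastq`, `base_id`, and `pair_reads` are made-up names, and read 2 is held in a dictionary, so memory use grows with the size of that file.

```python
def read_fastq(lines):
    """Yield (header, seq, qual) tuples from an iterable of FASTQ lines.

    Assumes four-line records; blank lines between records are skipped.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line is not needed
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def base_id(header):
    """Read name without '@', anything after the first whitespace, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_reads(r1_records, r2_records):
    """Match mates by read name.

    Returns (pairs, orphans1, orphans2); pairs follow read 1 order, and the
    orphan lists hold reads whose mate is missing from the other file.
    """
    r2_by_id = {base_id(rec[0]): rec for rec in r2_records}
    pairs, orphans1 = [], []
    for rec in r1_records:
        mate = r2_by_id.pop(base_id(rec[0]), None)
        if mate is None:
            orphans1.append(rec)
        else:
            pairs.append((rec, mate))
    # Whatever is left in the dictionary never found a read 1 partner.
    orphans2 = list(r2_by_id.values())
    return pairs, orphans1, orphans2
```

Writing the `pairs` list back out as two files yields mates in matching order, while the orphan lists collect reads whose partner was removed during trimming or clipping.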
SHAUN WEBB wrote (6.8 years ago):
Hi Ravi, I got around this problem by using the FASTQ interlacer to join the reads into a single file, then using the deinterlacer to output only reads that have a mate, in the correct order. You may need to alter the read IDs first by adding /1 and /2 to the end (see the interlacer help text). I used sed on the Unix command line, but I'm sure you can use Galaxy tools to do this. Shaun (Replying to Ravi Karra's message of Wed, 22 Feb 2012 12:29:18 -0500.)
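The ID-rewriting step could look like the sketch below; it is a hypothetical Python stand-in for the sed command, whose exact form was not posted. One caveat motivates working by line position: in Phred+33 data '@' encodes quality 31, so a quality line can itself start with '@', and a pattern that edits every line beginning with '@' could corrupt qualities. The sketch assumes strict four-line records with no blank lines and that the read name ends at the first space; `add_mate_suffix` and `suffix_fastq_lines` are made-up names.

```python
def add_mate_suffix(header, mate):
    """Append '/1' or '/2' to the read name (the text before the first space).

    A name that already ends with the requested suffix is left unchanged;
    anything after the first space (e.g. a CASAVA comment) is kept as-is.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    name, sep, rest = header.partition(" ")
    suffix = "/" + str(mate)
    if not name.endswith(suffix):
        name += suffix
    return name + sep + rest


def suffix_fastq_lines(lines, mate):
    """Yield FASTQ lines with mate suffixes added to header lines only.

    Headers are found by position (every 4th line, starting with the first),
    not by a leading '@', because quality lines may also start with '@'.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield add_mate_suffix(line, mate) if i % 4 == 0 else line
```

Running the read 1 file through `suffix_fastq_lines(..., 1)` and the read 2 file through `suffix_fastq_lines(..., 2)` would give both files matching names that differ only in the suffix.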
Victor Ruotti wrote (6.8 years ago):
Hi, I hope someone can help me with how to implement this in a wrapper. We would like to add an option so that the user can set a sample name, which would then be used as the prefix of the output file names. For example, could I specify a sample name "S" and have the output files be "S.gene_abundances", "S.isoform_abundances", "S.rsem_log", and "S.bam"? I know we can name the files in the XML, but is there a way to let the user pass this prefix without having to write a recursive conditional in the XML file to set it? Or is there another way people are doing this? Thanks in advance. Victor
Peter Cock replied (6.8 years ago): This kind of question is normally redirected to the galaxy-dev list. You have no control over the file names at all; Galaxy automatically assigns something like database/files/000/dataset_547.dat, and the user never sees the file names anyway. Are you asking how to control the description/caption shown to the user in Galaxy? Peter
Powered by Biostar version 16.09