Question: Combining Reads from 2 Lanes
3.5 years ago, nashedm (United States) wrote:


I'm running an RNA-seq analysis to look for differentially expressed genes. I'm using single-end 50 bp reads from an Illumina HiSeq Rapid V2 machine.

I have 3 conditions with 6 biological replicates in each. Also, for each biological replicate, I have 2 FASTQ files, one from each lane. I guess these are akin to technical replicates, except that each lane has unique reads that are cumulative. This is from the technician that ran the sequencing, for clarification:

"We loaded a single tube of pooled libraries on the HiSeq, which then deposits equal volumes on each lane, and the library fragments hybridize randomly across the surface. The two lanes are more like subsets of the whole dataset than replicates of each other."

So if the FASTQ file from lane 1 contains 5 million reads and the file from lane 2 contains another 5 million reads, the combined file should contain 10 million reads in total.
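One way to check that arithmetic on real files is to count the reads in each lane file and add them up. The following is only a minimal sketch, assuming standard 4-line FASTQ records (no line-wrapped sequences); the helper names `open_fastq`, `count_reads` and `total_reads` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz files transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count reads in one file, assuming exactly four lines per record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def total_reads(lane_files):
    """Sum the read counts over all per-lane files of one sample."""
    return sum(count_reads(path) for path in lane_files)
```

For a sample with two lane files, `total_reads([lane1_path, lane2_path])` should equal the read count of the combined file, whichever way the files are combined.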

I'm wondering what the appropriate way to combine these data is, and at what step I should do it. From what I understand, I should do quality control on the separate lane files first. But after that, I'm not sure whether I should combine the files (somehow) and then align with TopHat, etc., or whether I should align and create my BAM files on Cufflinks first, then combine the data from each lane for each sample (using Sam Tools > Merge BAM files?).

Any help would be appreciated.



rna-seq tophat galaxy samtools bam
3.5 years ago, Philipe Moncuquet wrote:



If you have no interest in the technical replication itself, I would suggest merging the files at the very beginning of your analysis. I guess you could do so with something as simple as 'Concatenate datasets' in the 'Text manipulation' section. Happy to be corrected.
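Outside Galaxy, the same merge can be sketched in a few lines of Python. This is an illustration of plain file concatenation, not what the 'Concatenate datasets' tool does internally; the function name `concatenate_files` is hypothetical. It assumes every input file ends with a newline and that the inputs are either all plain text or all gzip-compressed (concatenated gzip files still form a valid gzip stream).

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append each input file to output_path, in order, byte for byte.

    output_path must not be one of the inputs, because it is
    truncated before any data is copied.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

With made-up file names, a call might look like `concatenate_files(["sampleA_L001.fastq.gz", "sampleA_L002.fastq.gz"], "sampleA.fastq.gz")`, run once per biological replicate before alignment.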




Thanks for the reply, Philipe.

I'm not sure that's right, though. The 'Concatenate datasets' tool merges "tail to head". I actually used this tool already, because for each lane I had multiple FASTQ files that were basically continuations of each other: for some reason the machine started a new file after every 800 MB of data, so samples had 2-5 files for each lane. So I think that tool is meant for combining sequences that follow each other. In my case, with the 2 lanes, I have 2 full sets of reads that I basically want to sum. So I imagine this needs a different kind of merging?


Reply written 3.5 years ago by nashedm
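On the "tail to head" concern: as far as I know, each FASTQ record is a self-contained block of four lines, so appending one lane file to another should simply give a file holding the reads of both lanes, with no record depending on its neighbours. The sketch below checks that on in-memory lists of lines. It assumes 4-line records, and the names `read_fastq_records` and `merged_equals_sum` are invented for this example.

```python
from collections import Counter
from itertools import chain


def read_fastq_records(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    block = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield tuple(block)
            block = []
    if block:
        raise ValueError("truncated FASTQ record at end of input")


def merged_equals_sum(lane1_lines, lane2_lines):
    """Check that tail-to-head merging keeps exactly the reads of both lanes.

    Both arguments must be re-iterable (for example lists of lines),
    because each is traversed twice.
    """
    lane1 = Counter(read_fastq_records(lane1_lines))
    lane2 = Counter(read_fastq_records(lane2_lines))
    merged = Counter(read_fastq_records(chain(lane1_lines, lane2_lines)))
    return merged == lane1 + lane2
```

If this holds for the real lane files, the concatenated file is the "sum" of the two lanes in the sense asked about here.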