Combining Reads from 2 Lanes

3.5 years ago by

United States

nashedm • 10 wrote:

Hi,

I'm running an RNA-seq analysis to look for differentially expressed genes. I'm using single-end 50 bp reads from an Illumina HiSeq Rapid V2 machine.

I have 3 conditions with 6 biological replicates in each. Also, for each biological replicate, I have 2 FASTQ files, one from each lane. I guess these are akin to technical replicates, except that each lane has unique reads that are cumulative. This is from the technician that ran the sequencing, for clarification:

"We loaded a single tube of pooled libraries on the HiSeq, which then deposits equal volumes on each lane, and the library fragments hybridize randomly across the surface. The two lanes are more like subsets of the whole dataset than replicates of each other."

Therefore, if the FASTQ file from lane 1 contains 5 million reads and the file from lane 2 contains another 5 million reads, the idea is that the combined file should contain 10 million reads in total.
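
As a rough sanity check of that expectation, something like the sketch below could count the reads in each lane file and in a combined file. It assumes standard four-line FASTQ records (no wrapped sequence lines); the function name `count_fastq_reads` is made up for illustration.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, all others are plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

If the count for the combined file equals the sum of the counts for the two lane files, no reads were lost or duplicated when combining.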

I'm wondering what the appropriate way to combine these data is, and at what step I should do it. From what I understand, I should do quality control on the separate lane files first. But after that, I'm not sure whether I should combine them (somehow) and then align with TopHat, etc., or whether I should align and create my BAM files first, then combine the data from each lane for each sample (using SAMtools > Merge BAM files?) before running Cufflinks.

Any help would be appreciated.

Thanks

rna-seq tophat galaxy samtools bam • 2.8k views

3.5 years ago by

Philipe Moncuquet • 40

Australia

Philipe Moncuquet • 40 wrote:

Hi,

If you have no interest in the technical replication itself, I would suggest merging the files at the very beginning of your analysis. You could do so with something as simple as 'Concatenate datasets' in the 'Text manipulation' section, I guess. Happy to be corrected.
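
Outside Galaxy, the same head-to-tail merge could be sketched in Python roughly as follows. The helper name `concatenate_fastq` and the file names in the usage note are hypothetical, and the sketch assumes all inputs for a sample are in the same format (all plain text or all gzip) and that each file ends with a newline.

```python
import shutil


def concatenate_fastq(lane_files, merged_path):
    """Append each lane's FASTQ file to merged_path, in the order given."""
    # Binary mode copies bytes unchanged, so this works for plain-text FASTQ
    # and for gzip files (concatenated gzip members still form a valid gzip file).
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `concatenate_fastq(["sample1_L001.fastq", "sample1_L002.fastq"], "sample1_merged.fastq")` would write one file per sample containing the reads from both lanes. Each FASTQ record is self-contained, so as far as I know the order of the lane files should not matter for single-end alignment.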

Philip

Thanks for the reply, Philip.

I'm not sure that's right, though. The Concatenate datasets tool merges "tail to head". I actually used this tool as well, because for each lane I had multiple FASTQ files that were basically continuations of each other: as the machine sequenced, it started a new file after every 800 MB of data for some reason, so samples had 2-5 files for each lane. So I think that tool is meant more for combining sequences that follow each other. In my case, with the 2 lanes, I have 2 full sets of reads that I basically want to sum. So I imagine this needs a different kind of merging?

Mina
