Question: Multiple FASTQ files per replicate
3.5 years ago by
nashedm10
United States
nashedm10 wrote:

Hi There,

I'm running an RNA-seq analysis using FASTQ data from an Illumina HiSeq Rapid V2 machine (single-end reads, 50 bp). I don't have experience with UNIX coding, so I am using Galaxy, specifically the Tuxedo applications, to align/map my reads and then perform differential expression analysis (probably with Cuffdiff).

For my data, I have 3 conditions with 6 biological replicates in each. Each biological replicate was run on 2 lanes, so I have 2 technical replicates per sample. On top of that, the data for each technical replicate arrived as 2 separate FASTQ files. The technician mentioned that the machine automatically starts a new file when the current one reaches about 200 MB.

So I'm wondering about the proper method of combining these files. First, how do I combine the 2 FASTQ files for each technical replicate? I imagine this has to be done early, prior to mapping with TopHat? Second, at which step should I combine the technical replicates? At the differential expression analysis step I should only be comparing biological replicates; including separate technical replicates would be pseudoreplication. So I imagine combining my technical replicates happens prior to the Cuffdiff step, but I'm not sure whether it happens before or after mapping.

Any help would be greatly appreciated!

rna-seq tophat alignment galaxy • 2.8k views
modified 3.5 years ago by Jennifer Hillman Jackson25k • written 3.5 years ago by nashedm10
3.5 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

Merge multiple FASTQ files representing a single sample by using the tool Concatenate datasets tail-to-head. It is okay to do QC first (in order to narrow down where lab issues may have occurred), but then merge before doing anything else. Are these paired-end? If so, merge the forward datasets together, then merge the reverse datasets together, and run each sample's pair through a mapping tool like TopHat.
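
For anyone working outside Galaxy, the same tail-to-head merge can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool's implementation: the function name `concatenate_fastq` and any file names are hypothetical, and it assumes every input file ends with a newline so that records do not run together.

```python
import shutil


def concatenate_fastq(parts, merged):
    """Append each input file to `merged`, in the order given.

    Files are copied byte for byte, so the inputs may be plain text
    or gzip-compressed (as long as they are all the same kind).
    Assumption: every input file ends with a newline.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged
```

A call such as `concatenate_fastq(["sample1_part1.fastq", "sample1_part2.fastq"], "sample1.fastq")` (hypothetical names) would produce one file per sample; for paired-end data the forward and reverse files would each be merged separately, with the parts listed in the same order.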

More about RNA-seq is in the Galaxy wiki and many other places, including the home page for the Tuxedo pipeline.

Best, Jen, Galaxy team

written 3.5 years ago by Jennifer Hillman Jackson25k

Thanks Jen,

These are single-end sequences.

I clarified with the technician, and she would not treat sequences from 2 lanes for one sample as technical replicates (because data was not collected twice from the same prep). Rather, she said to treat the data from the 2 lanes as subsets of the same sequence. So they should be combined cumulatively, such that if each lane produced 5 million reads, the combined sequence would be 10 million reads. What tool would be appropriate to use to perform this type of merge? Would it be the "concatenate two datasets into one dataset" option under "operate on genomic interval"?
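
One way to check that a merge really was cumulative is to count reads before and after. The sketch below is an assumption-laden illustration: `count_reads` is a hypothetical helper, and it assumes uncompressed FASTQ with exactly four lines per read (the usual Illumina layout, with no line-wrapped sequences).

```python
def count_reads(fastq_path):
    """Count reads in an uncompressed FASTQ file.

    Assumption: each read occupies exactly four lines
    (header, sequence, '+' separator, qualities).
    """
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{fastq_path}: {n_lines} lines is not a multiple of 4"
        )
    return n_lines // 4
```

If the two lane files hold 5 million reads each, `count_reads` on the merged file should return 10 million, the sum of the two inputs.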

Thanks again.

written 3.5 years ago by nashedm10