Hi there,
I'm running an RNA-seq analysis using FASTQ data from an Illumina HiSeq Rapid V2 machine (single-end reads, 50 bp). I don't have experience with UNIX coding, so I am using Galaxy, specifically the Tuxedo applications, to align/map my reads and then perform differential expression analysis (probably with Cuffdiff).
I have 3 conditions with 6 biological replicates in each. Each biological replicate was run on 2 lanes, so I have 2 technical replicates per sample. On top of that, the data for each technical replicate arrived as 2 separate FASTQ files. The technician mentioned that the machine automatically starts a new file once the current one reaches roughly 200 MB.
So I'm wondering about the proper way to combine these files. First, how do I combine the 2 FASTQ files for each technical replicate? I imagine this has to be done early, before mapping with TopHat. Second, at which step should I combine the technical replicates? At the differential expression step I should only be comparing biological replicates; treating technical replicates as separate samples would be pseudoreplication. So I imagine the technical replicates have to be combined before the Cuffdiff step, but I'm not sure whether that should happen before or after mapping.
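To make the first question concrete, the kind of merge I have in mind for the split files is plain end-to-end concatenation. As far as I understand, that should be safe for FASTQ because each read is a self-contained four-line record. Below is a minimal Python sketch of that idea, not a tested pipeline step. The function names and the file names in the usage comment are hypothetical, and it assumes uncompressed files that each end with a newline.

```python
import shutil


def concatenate_fastq(parts, merged):
    """Append each FASTQ part, in the order given, to one merged file."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy bytes unchanged; assumes each part ends with a newline.
                shutil.copyfileobj(src, out)


def count_records(path):
    """Count reads, assuming the standard 4 lines per FASTQ record."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle) // 4


# Hypothetical usage for one technical replicate split into two files:
# concatenate_fastq(["sampleA_L001_001.fastq", "sampleA_L001_002.fastq"],
#                   "sampleA_L001.fastq")
# The merged read count should equal the sum of the counts of the parts.
```

Comparing `count_records` on the merged file against the sum over the parts would be a quick sanity check that nothing was lost or truncated.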
Any help would be greatly appreciated!