Question: RNA-Seq Galaxy Workflow for PE Barcoded Samples?
Whyte, Jeffrey20 wrote:
Hello,

I posted to the seqanswers forum, but have not received any feedback. I am working with RNA-seq Illumina data files in Galaxy. The two files are 100 bp paired-end reads, multiplexed with barcoding to distinguish samples. The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.

Would the following Galaxy workflow be correct?

1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ Joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome

The problem I am having is that if I select paired-end for the library in Tophat, it requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files? If there is a more standard way to handle these types of barcoded files, I would appreciate hearing about it.

Thanks very much in advance,
jjw

P.S. Galaxy is an incredibly useful resource. Thanks!
Jennifer Hillman Jackson24k wrote:
Hello Jeffrey,

Yes, you have this correct. Please use the Barcode Splitter and then the FASTQ Splitter tool as you describe. If you haven't already, creating a workflow from your history after running the analysis on one dataset would simplify running the same analysis on future datasets.

Apologies for the delay in reply.

Best,
Jen
Galaxy team

--
Jennifer Jackson
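
For readers who want to see the underlying idea outside Galaxy, here is a minimal Python sketch of splitting paired-end reads by a barcode at the start of read 1 while keeping each pair together. It is not how the Galaxy or FASTX-Toolkit tools are implemented. The function names (`read_fastq`, `demultiplex_pairs`), the sample names, and the barcode sequences are all hypothetical, and it assumes both files list the pairs in the same order.

```python
import io
from collections import defaultdict
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        yield tuple(record)


def demultiplex_pairs(read1_handle, read2_handle, barcodes, barcode_len=4):
    """Group read pairs by the barcode found at the start of read 1.

    `barcodes` maps a (hypothetical) sample name to its barcode sequence.
    Returns {sample_name: [(read1_record, read2_record), ...]}; pairs whose
    barcode is not listed are collected under "unmatched".
    Assumes both inputs hold the same pairs in the same order; zip() stops
    silently at the shorter input.
    """
    sample_for = {bc: name for name, bc in barcodes.items()}
    groups = defaultdict(list)
    pairs = zip(read_fastq(read1_handle), read_fastq(read2_handle))
    for read1, read2 in pairs:
        prefix = read1[1][:barcode_len]  # barcode = first bases of read 1
        sample = sample_for.get(prefix, "unmatched")
        groups[sample].append((read1, read2))
    return dict(groups)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for s_7_1_sequence.txt and s_7_2_sequence.txt.
    r1 = io.StringIO(
        "@pair1/1\nACGTTTTT\n+\nIIIIIIII\n"
        "@pair2/1\nTGCAGGGG\n+\nIIIIIIII\n"
    )
    r2 = io.StringIO(
        "@pair1/2\nCCCCAAAA\n+\nIIIIIIII\n"
        "@pair2/2\nGGGGTTTT\n+\nIIIIIIII\n"
    )
    groups = demultiplex_pairs(r1, r2, {"sampleA": "ACGT", "sampleB": "TGCA"})
    for sample, sample_pairs in sorted(groups.items()):
        print(sample, [rec1[0] for rec1, _ in sample_pairs])
```

Because the barcode is read only from read 1 and both mates are stored together, each group can be written out as two synchronized FASTQ files, which is the input shape Tophat asks for when a paired-end library is selected.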
