Question: RNA-Seq Galaxy Workflow For PE Barcoded Samples?
Whyte, Jeffrey • 20 wrote:
Hello,
I posted this question to the SEQanswers forum but have not received any feedback.
I am working with RNA-seq Illumina data files in Galaxy
(http://main.g2.bx.psu.edu/). The two files are 100bp paired-end
reads, multiplexed with barcoding to distinguish samples. The barcodes
are the first four bases of the sequences in the s_7_1_sequence.txt
file.
Would the following Galaxy workflow be correct?
1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy
with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to
convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ Joiner to combine the data
from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate
separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group
to the reference genome
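As an illustration of the barcode-splitting step, here is a minimal Python sketch (not the Galaxy or FASTX tool itself) that bins mate pairs by the first four bases of read 1 while keeping the two files in sync. It assumes both files list the reads in the same order and that only read 1 carries the barcode. The names read_fastq and demultiplex_pairs, and the "unmatched" bin, are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield tuple(record)


def demultiplex_pairs(reads1, reads2, barcodes, barcode_len=4):
    """Group mate pairs by the barcode at the start of read 1.

    Assumption: reads1 and reads2 are in the same order, so zip() pairs mates.
    The barcode is trimmed from read 1 only; read 2 is left untouched.
    Pairs whose barcode is not in `barcodes` go, untrimmed, to "unmatched".
    """
    bins = defaultdict(list)
    for (h1, s1, p1, q1), mate2 in zip(reads1, reads2):
        barcode = s1[:barcode_len]
        if barcode in barcodes:
            key = barcode
            s1, q1 = s1[barcode_len:], q1[barcode_len:]
        else:
            key = "unmatched"
        bins[key].append(((h1, s1, p1, q1), mate2))
    return bins
```

Because each bin holds both mates of every pair, the two ends can be written to two separate files per barcode, which is the input a paired-end mapper expects.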
The problem I am having is that when I select paired-end as the library
type in TopHat, it requests two FASTQ files, but after joining I have
only one FASTQ file per barcode group. Would I have to use FASTQ
Splitter to separate the joined FASTQ files again? If there is a more
standard way to handle these types of barcoded files, I would
appreciate hearing about it.
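To make the splitting question concrete, here is a minimal Python sketch of separating one joined record back into its two mates. It assumes the joined read is simply the sequence and quality of mate 1 followed by those of mate 2; that may not match how the Galaxy tools handle it. The function name split_joined_record, the mate1_len parameter, and the /1 and /2 header suffixes are hypothetical.

```python
def split_joined_record(header, sequence, quality, mate1_len=None):
    """Split a joined paired-end record into (mate1, mate2).

    Assumption: the joined read is mate 1 followed directly by mate 2.
    By default the split point is the midpoint, which is only right when
    both mates have the same length. Pass mate1_len if mate 1 is shorter,
    for example after a barcode has been trimmed from it.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    if mate1_len is None:
        if len(sequence) % 2 != 0:
            raise ValueError("odd length: mates cannot be equal, pass mate1_len")
        mate1_len = len(sequence) // 2
    if not 0 < mate1_len < len(sequence):
        raise ValueError("mate1_len must fall inside the joined read")
    mate1 = (header + "/1", sequence[:mate1_len], quality[:mate1_len])
    mate2 = (header + "/2", sequence[mate1_len:], quality[mate1_len:])
    return mate1, mate2
```

Note that a midpoint split stops being correct once the four barcode bases are removed from the first mate only, since the two mates then differ in length.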
Thanks very much in advance,
jjw
P.S. Galaxy is an incredibly useful resource. Thanks!
modified 7.4 years ago by Jennifer Hillman Jackson ♦ 25k • written 7.6 years ago by Whyte, Jeffrey • 20