Can someone explain the best way to prepare NCBI SRA data for subsequent mapping to a custom genome? I uploaded WGS paired-end sequence reads (8 files) from NCBI SRA directly into Galaxy Main, choosing the FASTQ format option.
Each file contains both forward (F) and reverse (R) reads, interleaved as shown below. 1) Is it necessary to split the reads into separate F and R files in order to map them to my custom genome with BWA-MEM in Galaxy? 2) If so, how do I do this? After reading this 4.5-year-old thread on a similar topic, https://biostar.usegalaxy.org/p/4988/, I tried to split the reads within Galaxy using that information, but I have just cancelled the job because it had been running for >16 hours on just one file.
@1/1
GATTCCAGCAAAGCACTCCCAAGGGGGCCTGACAGTGGTCAAGAGAA
+SRR5110008.1 1 length=151
AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@1/2
AATCAGTCCTGGCTGGTGTTAAGCCCTCAGGGGCAGGAGGGTGAAGT
+SRR5110008.1 1 length=151
AAFFFJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@2/1
AATAAAATTTTTAAAAAGTTATAAAGGAATACCTTTTCCAAAAGACC
+SRR5110008.2 2 length=151
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@2/2
TGTACGGAAAAGGGTCAGGACCTTCTCTAGACTGGGAGTTGCAAGCT
+SRR5110008.2 2 length=150
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@3/1
TGAAGTTGAGAGGGATCCATGGAAAGAGCTGGCATTCTCACTGTGAA
+SRR5110008.3 3 length=151
AAAFAFAFJF<FJA7-FJ7FAA-F-F-FFFAFF-FJJJJJJFJJJJJ
@3/2
AAAGAAGGAAACACATATACCTGGCTTCTGTCAACTTAGCTAAGCTG
+SRR5110008.3 3 length=151
(The reads are ~150 bp, but I truncated them from the right.) Thanks in advance.
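For anyone wanting to do the split outside Galaxy, here is a minimal Python sketch of the idea. It is only an illustration, not a Galaxy tool. It assumes the file is strictly interleaved (each `/1` record immediately followed by its `/2` mate, as in the excerpt above) and that every record is exactly four lines. The function names `read_records` and `split_interleaved` and the file names in the usage comment are made up for this example.

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (header, sequence, plus line, quality)."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of file
        if not lines[3]:
            raise ValueError("truncated FASTQ record at end of file")
        yield (lines[0], lines[1], lines[2], lines[3])


def split_interleaved(src: TextIO, fwd: TextIO, rev: TextIO) -> int:
    """Write /1 records to fwd and /2 records to rev; return the pair count.

    Assumption: mates are adjacent and named like '@1/1' and '@1/2'.
    """
    pairs = 0
    records = read_records(src)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; file is not interleaved")
        name1 = first[0].split()[0]
        name2 = second[0].split()[0]
        mates_match = (
            name1.endswith("/1")
            and name2.endswith("/2")
            and name1[:-2] == name2[:-2]
        )
        if not mates_match:
            raise ValueError(f"unexpected mate names: {name1} {name2}")
        fwd.writelines(first)
        rev.writelines(second)
        pairs += 1
    return pairs


# Usage (hypothetical file names):
# with open("SRR5110008.fastq") as src, \
#         open("SRR5110008_1.fastq", "w") as fwd, \
#         open("SRR5110008_2.fastq", "w") as rev:
#     print(split_interleaved(src, fwd, rev), "pairs written")
```

The sketch streams one pair at a time, so memory use stays flat even for large WGS files. Splitting may not be needed at all: on the command line, `bwa mem -p` accepts an interleaved FASTQ file directly, and it is worth checking whether the Galaxy BWA-MEM tool offers an equivalent interleaved input option.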