Question: Separating (F and R) reads uploaded from NCBI SRA & mapping them using BWA-MEM
tendai (United States), 4 months ago, wrote:

Can someone explain the best way to prepare NCBI SRA data for subsequent mapping to a custom genome? I uploaded WGS paired-end sequence reads (8 files) from NCBI SRA directly into Galaxy Main, choosing the FASTQ format option.

Each file contains both forward (F) and reverse (R) reads, interleaved as shown below. 1) Is it necessary to split the reads into separate F and R files in order to map them to my custom genome using BWA-MEM within Galaxy? 2) If so, how do I do this? After reading this 4.5-year-old thread on a similar topic, https://biostar.usegalaxy.org/p/4988/, I tried to split the reads within Galaxy using that information, but I have just cancelled the job because it had been running for >16 hours on a single file.

@1/1
GATTCCAGCAAAGCACTCCCAAGGGGGCCTGACAGTGGTCAAGAGAA
+SRR5110008.1 1 length=151
AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@1/2
AATCAGTCCTGGCTGGTGTTAAGCCCTCAGGGGCAGGAGGGTGAAGT
+SRR5110008.1 1 length=151
AAFFFJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@2/1
AATAAAATTTTTAAAAAGTTATAAAGGAATACCTTTTCCAAAAGACC
+SRR5110008.2 2 length=151
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@2/2
TGTACGGAAAAGGGTCAGGACCTTCTCTAGACTGGGAGTTGCAAGCT
+SRR5110008.2 2 length=150
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@3/1
TGAAGTTGAGAGGGATCCATGGAAAGAGCTGGCATTCTCACTGTGAA
+SRR5110008.3 3 length=151
AAAFAFAFJF<FJA7-FJ7FAA-F-F-FFFAFF-FJJJJJJFJJJJJ
@3/2
AAAGAAGGAAACACATATACCTGGCTTCTGTCAACTTAGCTAAGCTG
+SRR5110008.3 3 length=151
(The reads are ~150 bp, but I truncated them from the right.) Thanks in advance.

scholtz (Hungary), 4 months ago, wrote:

This is what I've been using, following the suggestion of the Galaxy team. You will end up with separate forward/reverse data files:

  1. Upload the SRR IDs as a "list" (a tabular file, one accession per line), then run the tool NCBI SRA Tools - Extract reads in FASTQ/A format with Input type: List of SRA accession, one per line

    1. Once uploaded, click on the "hide hidden" in the History panel

    2. Unhide the forward/reverse data files

Proceed with the analysis as usual.

Hope this helps,

Beata
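
For anyone who already has an interleaved FASTQ like the one in the question and wants to split it outside Galaxy, here is a minimal Python sketch. It is not the method suggested above, just an illustration under some assumptions: every record is exactly four lines, read names end in /1 (forward) or /2 (reverse) as in the posted sample, and the function names read_fastq and split_interleaved are made up for this example.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples: (header, sequence, separator, quality)."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not any(lines):  # end of input (or only trailing blank lines)
            return
        if len(lines) != 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {lines[0]!r}")
        yield tuple(lines)


def split_interleaved(src, fwd, rev):
    """Write /1 reads to fwd and /2 reads to rev, preserving input order.

    Returns the number of read pairs written. Assumes read names end in
    /1 or /2, as in the sample from the question.
    """
    n_fwd = n_rev = 0
    for record in read_fastq(src):
        name = record[0].split()[0]  # ignore anything after the first space
        if name.endswith("/1"):
            fwd.write("\n".join(record) + "\n")
            n_fwd += 1
        elif name.endswith("/2"):
            rev.write("\n".join(record) + "\n")
            n_rev += 1
        else:
            raise ValueError(f"read name without /1 or /2 suffix: {name!r}")
    if n_fwd != n_rev:
        raise ValueError(f"unpaired reads: {n_fwd} forward vs {n_rev} reverse")
    return n_fwd


if __name__ == "__main__":
    # Tiny in-memory demo using shortened reads; real use would pass open files.
    demo = io.StringIO(
        "@1/1\nGATTCC\n+SRR5110008.1 1 length=151\nAAAFFJ\n"
        "@1/2\nAATCAG\n+SRR5110008.1 1 length=151\nAAFFFJ\n"
    )
    forward, reverse = io.StringIO(), io.StringIO()
    pairs = split_interleaved(demo, forward, reverse)
    print(pairs, "pair(s) written")
```

With real data you would pass file handles opened with open() instead of the StringIO objects. Because both outputs keep the input order, the mates stay in matching positions in the two files, which is what paired-end mappers expect.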


Thanks! This solved my problem.

Reply written 4 months ago by tendai