Question: Pileup file using NCBI-SRA toolkit
7 months ago, ppurkayastha2010 wrote:

I have a pileup format file for a particular accession SRR1011475.

This pileup format file was obtained using NCBI sratoolkit.

MT_H37RV_BRD_V5 750 C   1   ^~, #
MT_H37RV_BRD_V5 751 C   1   ,   #
MT_H37RV_BRD_V5 752 G   1   ,   #
MT_H37RV_BRD_V5 753 C   1   ,   #
MT_H37RV_BRD_V5 754 G   1   ,   #

But I am unable to figure out which reference sequence was used to build this pileup file.

When I download the BAM file for this accession (SRR1011475) and generate a pileup with my own reference, using the Galaxy tool Generate pileup from BAM dataset (Galaxy Version 1.1.2, which uses samtools), I get N in the reference base column:

MT_H37RV_BRD_V5 750 N   1   ^~c #   ~
MT_H37RV_BRD_V5 751 N   1   c   #   ~
MT_H37RV_BRD_V5 752 N   1   g   #   ~
MT_H37RV_BRD_V5 753 N   1   c   #   ~
MT_H37RV_BRD_V5 754 N   1   g   #   ~

Why do the two files differ? Please suggest a way to resolve this.

Tags: ngs, sratoolkit, pileup, galaxy
7 months ago, Jennifer Hillman Jackson (United States) wrote:


Your workflow is missing a step: genome mapping. BAM datasets from this source contain reads only, with no mapping results. Some Galaxy-wrapped tools accept BAM sequence inputs, but most do not. Instead, download the FASTQ data directly with the tool Download and Extract Reads in FASTA/Q format from NCBI SRA, map the reads yourself with BWA-MEM (using a custom genome as needed), then run MPileup.
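If you prefer to try the same steps outside Galaxy, the sequence can be sketched on the command line. This is only a rough outline, assuming `bwa` and `samtools` are installed, that the reads are single-end, and that the FASTQ has already been downloaded; the file names and the helper functions `build_commands` and `run_all` are made up for illustration.

```python
import shlex
import subprocess


def build_commands(reference_fasta, fastq, prefix):
    """Return the argument lists for index -> map -> sort -> index -> pileup.

    All file names are hypothetical; single-end reads are assumed.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    pileup = f"{prefix}.pileup"
    return [
        # Build the BWA index for the reference you chose.
        ["bwa", "index", reference_fasta],
        # Map the reads; -o needs a reasonably recent bwa release.
        ["bwa", "mem", "-o", sam, reference_fasta, fastq],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # -f supplies the reference, so the third pileup column is a real base.
        ["samtools", "mpileup", "-f", reference_fasta, "-o", pileup, bam],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands("my_reference.fa", "SRR1011475.fastq", "SRR1011475"))
```

By default the script only prints the commands, so you can check them before running anything.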


For the data from NCBI: You could ask NCBI to be certain, but the genome the data was mapped to is probably the one mentioned on the Sample description form.
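Another place to look, if the BAM file you downloaded has a header, is its `@SQ` lines: these name each reference sequence and sometimes carry optional tags such as `AS` (assembly), `M5` (checksum) or `UR` (location). A minimal sketch, assuming you have saved the output of `samtools view -H` as text; the function name `reference_entries` is hypothetical and the header in the example is invented, not taken from SRR1011475.

```python
def reference_entries(header_text):
    """Collect the tags of every @SQ line in a SAM/BAM text header."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only, so values such as URLs stay intact.
            key, _, value = field.partition(":")
            tags[key] = value
        entries.append(tags)
    return entries


if __name__ == "__main__":
    # Invented header, for illustration only.
    example = "@HD\tVN:1.4\n@SQ\tSN:some_reference\tLN:1000\n"
    for entry in reference_entries(example):
        print(entry)
```

Whether any of the optional tags are present depends on how the file was produced, so this may only confirm the sequence name.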

Be aware that unless you use exactly the same genome for mapping, and exactly the same tool versions and parameters (whatever tools they used to map and to generate the pileup data), the results will differ. I didn't find the processing description at NCBI for this one with a quick search, but you could review the study and any related publications in more detail, or ask the authors about the workflow and tools used. That is the only way to truly replicate any experiment. This is one of many good reasons to do the data processing yourself in Galaxy: you'll have a complete history of the tools and parameters used, be able to tune parameters to get the best result, and can extract what you did into a workflow for later reuse or sharing.
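When comparing two pileup files, it can help to compare the read bases rather than the raw text. In samtools pileup output, `.` and `,` mean "same as the reference base" (forward and reverse strand), while a letter gives the read base itself; if the tool has no reference base to compare against, the third column shows N and the read bases are written out as letters. A minimal sketch, assuming the usual whitespace-separated columns (sequence name, position, reference base, depth, read bases, qualities); the function names are hypothetical.

```python
import re


def read_bases(ref_base, bases):
    """Decode the read-bases column into one upper-case base per read."""
    called = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Start of a read: the next character is its mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length in digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m:
                i += 1 + len(m.group()) + int(m.group())
                continue
        if ch in ".,":
            called.append(ref_base.upper())
        elif ch.isalpha():
            called.append(ch.upper())
        # Anything else ('$', '*', '<', '>') is skipped in this sketch.
        i += 1
    return called


def parse_pileup_line(line):
    """Return (sequence name, position, decoded read bases) for one line."""
    chrom, pos, ref, _depth, bases = line.split()[:5]
    return chrom, int(pos), read_bases(ref, bases)


if __name__ == "__main__":
    print(parse_pileup_line("MT_H37RV_BRD_V5 750 C   1   ^~, #"))
    print(parse_pileup_line("MT_H37RV_BRD_V5 750 N   1   ^~c #   ~"))
```

Both example lines decode to the same read base, C, at position 750, which suggests the two files in the question describe the same read and differ only in whether a reference base was available.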

Thanks, Jen, Galaxy team
