Question: Depth of Coverage
mariano.avino0 wrote, 4.2 years ago:

I am noticing that the DP of sites that eventually become variant calls is greatly reduced after certain steps of my workflow (between the SAM-to-BAM file and the MarkDups file, for example), and this is also evident from the file shrinking from 1.5 GB to 195 MB. I was wondering whether Galaxy performs an automatic reduction of reads, and whether there is a way to recover the real DP after the variant calling process. Thanks a lot

Jennifer Hillman Jackson wrote, 4.2 years ago:

Hi,

SAM->BAM conversion reduces file size because of compression, so you can't compare datasets directly based on file size. Skipping the mark-duplicates step would only skew the analysis results. I'm not sure which exact tools/workflow you are using, but you can always run a more direct variant caller and compare against your current workflow (such as 'Naive Variant Caller' followed by 'Variant Annotator' to filter).
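If you want to confirm this outside of the Galaxy interface, here is a minimal pysam sketch (assuming pysam is installed; the BAM file names are hypothetical placeholders for your SAM-to-BAM and MarkDups datasets) that compares actual read counts rather than file sizes. Note that Picard MarkDuplicates only flags duplicates by default rather than removing them, so the total read count should normally be unchanged between those two steps.

```python
# Minimal sketch, assuming pysam is installed; file names are placeholders.
import pysam

def read_counts(bam_path):
    """Return (total reads, reads flagged as duplicates) for a BAM file."""
    total = dups = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):  # sequential scan, no index needed
            total += 1
            if read.is_duplicate:
                dups += 1
    return total, dups

for path in ("mapped.bam", "markdups.bam"):  # hypothetical dataset names
    total, dups = read_counts(path)
    print(f"{path}: {total} reads, {dups} flagged as duplicates")
```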

An example that compares variant tools is in this tutorial:
http://usegalaxy.org/u/galaxyproject/p/galaxy-101-ngs-variant

To gather stats (actual counts of the reads contained), see the SAMtools and Picard tool groups. You can also try a tool like 'Mpileup' to do a direct count-based variant call, though this is for DNA samples. Or run tools like 'Depth of Coverage' or 'Create a BedGraph of genome coverage' to find coverage across a particular region (all positions, not just variant locations). Any of these is a better measure than file size.
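As a rough equivalent outside Galaxy, here is a minimal pysam sketch for per-base depth over a region (assuming an indexed BAM; the file name and coordinates are placeholders). One detail that may explain your observation: pysam's default read filter excludes duplicate-flagged, unmapped, secondary, and QC-fail reads from the counts, and most variant callers apply similar filters when reporting DP, so depth can legitimately drop after MarkDuplicates even though no reads were removed.

```python
# Minimal sketch, assuming pysam and an indexed BAM; the region is a placeholder.
import pysam

with pysam.AlignmentFile("markdups.bam", "rb") as bam:
    # count_coverage returns four arrays (A, C, G, T counts per position);
    # its default filter skips duplicate, unmapped, secondary, and QC-fail
    # reads, and bases below quality 15.
    a, c, g, t = bam.count_coverage("chr1", 10000, 10010)
    for i, depth in enumerate(map(sum, zip(a, c, g, t))):
        print(f"chr1:{10000 + i + 1}\tdepth={depth}")
```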

Also compare the original data against the mapped data. If concordant alignment pairs are low after mapping, then that is the root of the issue. It could be scientifically valid, or it could indicate a problem in processing (meaning tools or parameters should be tuned). Start by checking back through the earlier steps to determine where the data loss was introduced. Then try to confirm whether it is an actual property of the sample or an issue with how the data was prepared prior to or during mapping (was QC too aggressive or too lenient; were the quality scores scaled correctly to .fastqsanger; was the best mapping tool used, with parameters that fit the data). Maximizing concordant alignments is key in most NGS analyses (for paired-end input).
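To put a number on concordance, 'samtools flagstat' reports it directly; the pysam sketch below (the BAM name is a placeholder) counts properly paired primary alignments, which is roughly the same figure.

```python
# Minimal sketch, assuming pysam; the BAM name is a placeholder.
import pysam

total = proper = 0
with pysam.AlignmentFile("mapped.bam", "rb") as bam:
    for read in bam.fetch(until_eof=True):
        if read.is_secondary or read.is_supplementary:
            continue  # count each read once (primary alignments only)
        total += 1
        if read.is_proper_pair:  # aligner marked the pair as concordant
            proper += 1

print(f"{proper}/{total} primary reads in concordant (properly paired) alignments")
```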

Hopefully this helps, Jen, Galaxy team
