| | DP>10 | DP<10 |
| local sensitive variants | 68398 | 922254 |
| very local sensitive variants | 71153 | 904490 |
The sequence quality is good, so why am I getting fewer variants with DP>10?
Hello,
The header of your VCF file defines what the DP attribute means. It will look similar to this:
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
In short, it is the read depth at the variant position. If shallower depth is permitted, the number of calls will increase. This holds for both of the original variant-calling parameter settings.
If you are wondering why there are fewer "very local sensitive variants" than "local sensitive variants" with "DP<10", it is probably because the "very local" option is more selective to start with (resulting in fewer calls with depth < 10).
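To check how DP is defined in your own file, a small script can pull the definition out of the header. This is only a sketch: the function name `dp_definitions` is made up for illustration, and the example header simply reuses the line shown above. Note that a VCF can define DP in the INFO section, the FORMAT section, or both, so the sketch reports whichever it finds.

```python
import re

# Matches header lines such as ##FORMAT=<ID=DP,...,Description="...">
# (or the ##INFO equivalent) and captures the section and the description.
DP_HEADER = re.compile(r'^##(INFO|FORMAT)=<ID=DP,.*Description="([^"]*)"')


def dp_definitions(vcf_header_lines):
    """Return a list of (section, description) pairs for every DP definition."""
    found = []
    for line in vcf_header_lines:
        match = DP_HEADER.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Example header, reusing the FORMAT line quoted in this answer.
header = [
    '##fileformat=VCFv4.1',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,'
    'Description="Read Depth (only filtered reads used for calling)">',
]

for section, description in dp_definitions(header):
    print(section, "DP:", description)
```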
Hopefully this helps, Jen, Galaxy team
Thank you, Jen. But my question is: the quality of the sequence is good, so why am I getting more variants with DP<10 than with DP>10?
I'll try to explain a bit more.
When coverage is used as a filter, regions that do not meet that coverage threshold will be excluded. Especially for short regions corresponding to variant positions, regions with deeper coverage would be expected to be less numerous than regions with shallow coverage. High quality sequence (reads that map better) should increase the number of regions with both shallow and deep coverage. Whether or not those shallow and deep regions contain a variant is an independent factor. True variants exist at fixed positions regardless of the depth at which the reads attempting to detect them happen to map (this is probably true to some extent even with highly targeted sequencing). That said, coverage can help reduce the noise from poor base calls or be used to filter out variants that are only supported by a few reads (and are therefore assumed to be less robust calls, but that is relative to the experiment as a whole). It is entirely possible for quality calls to be derived from shallow regions.
How to filter is up to you to decide. It seems that you believe that coverage is not a big factor due to the quality of the sequence. You might be looking for calls that are challenging to detect (perhaps in a difficult-to-sequence region) and/or your experiment can tolerate some amount of incorrect calls based on sparse evidence as a way to enhance discovery. If this is true, then deciding whether a call should be retained based on coverage might not be the best filter (or not the best stand-alone filter). Coverage is just one piece of evidence.
PS: I understand that with higher quality sequence there is an expectation that only valid variants supported by many reads would pass the other initial criteria used to make the calls. But shallow regions could generate calls with some statistical significance (meaning, enough significance to be reported) with certain tools/settings. In your case, the initial call criteria seem to have been permissive enough (sensitive enough) to include shallow coverage regions as significant.
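If it helps to see how such a split is tallied, here is a rough sketch that counts calls on either side of a depth threshold. It assumes a single-sample VCF where DP is stored in the FORMAT column; the function names and the toy records are invented for illustration and are not from your data. It also treats a depth of exactly 10 as "deep", since a strict DP>10 / DP<10 split leaves those calls in neither group.

```python
def sample_dp(vcf_line):
    """Return the first sample's DP as an int, or None if it is absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:  # no FORMAT/sample columns
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "DP" not in keys:
        return None
    idx = keys.index("DP")
    if idx >= len(values) or not values[idx].isdigit():
        return None
    return int(values[idx])


def count_by_depth(vcf_lines, threshold=10):
    """Tally calls with DP >= threshold, DP < threshold, or no usable DP."""
    counts = {"deep": 0, "shallow": 0, "no_dp": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        dp = sample_dp(line)
        if dp is None:
            counts["no_dp"] += 1
        elif dp >= threshold:
            counts["deep"] += 1
        else:
            counts["shallow"] += 1
    return counts


# Toy records, invented purely to exercise the function.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:25:99",
    "chr1\t200\t.\tC\tT\t20\tPASS\t.\tGT:DP:GQ\t0/1:4:30",
    "chr1\t300\t.\tG\tA\t35\tPASS\t.\tGT:DP:GQ\t1/1:10:60",
    "chr1\t400\t.\tT\tC\t15\tPASS\t.\tGT\t0/1",
]

print(count_by_depth(records))
```

Running the same tally at a few different thresholds is a quick way to see how sensitive the call count is to the cutoff you pick.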
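As one way to treat coverage as a single piece of evidence rather than a hard gate, the sketch below labels each call with the checks it fails instead of discarding it. Everything here is an assumption for illustration: the helper name `failed_checks`, the choice of genotype quality (GQ, a standard VCF FORMAT field) as the second piece of evidence, and the cutoffs of 10 and 20, which are placeholders and not recommendations.

```python
def failed_checks(vcf_line, min_dp=10, min_gq=20):
    """Return labels for the evidence checks this call fails (empty = passes all).

    Assumes a single-sample VCF line with DP and GQ in the FORMAT column.
    The default cutoffs are illustrative placeholders only.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    values = {}
    if len(fields) >= 10:
        values = dict(zip(fields[8].split(":"), fields[9].split(":")))

    failed = []
    for key, cutoff in (("DP", min_dp), ("GQ", min_gq)):
        try:
            observed = float(values[key])
        except (KeyError, ValueError):
            failed.append("missing_" + key)  # field absent or set to "."
            continue
        if observed < cutoff:
            failed.append("low_" + key)
    return failed


# Toy records, invented for illustration.
shallow_confident = "chr1\t200\t.\tC\tT\t20\tPASS\t.\tGT:DP:GQ\t0/1:4:99"
deep_uncertain = "chr1\t300\t.\tG\tA\t35\tPASS\t.\tGT:DP:GQ\t0/1:25:5"

print(failed_checks(shallow_confident))
print(failed_checks(deep_uncertain))
```

Keeping the labels lets you review shallow calls that are otherwise well supported separately from deep calls with weak genotype support, rather than losing both to one cutoff.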
Thank you, Jen, for your patience. I would like some clarification, if possible.
We are calling variants for a clinical sample, and we are trying to apply filters such as DP and GQ. We got 118488 variants with DP<10 and 48470 variants with DP>10.
1. If we filter on DP>10, almost 60% of the variants will be lost. Is that OK for clinical samples? Or
2. Is DP<10 a good filter for our samples?