Question: Megablast
Scott Tighe
Hi Galaxy users,

When running Megablast:

1) What does the "identity value -p" mean? Is it percent identity? I want my megablast results to report only 100% matches, and I do not see a place to set percent alignment concordance.

2) For my Illumina HiSeq reads, are the adaptor sequences filtered out during the filter step?
Hello Scott,

For #1, option "-p": Here is a link to some megablast parameter documentation online: (the primary paper for the Galaxy tool is noted at the bottom of the tool form, but this is convenient)

Quote:

Table 3.30
Parameter: -p
Function: Specifies the percentage identity cut-off
Default: 0
Input format: [Real]
Example: To set percent id cutoff to 75%, use: -p 75
Note: The input value range is between 0 and 100, with 0 meaning no cutoff. It only works on the aligned region or individual HSPs.

So yes, the value is percent identity, and setting it to 100 keeps only alignments that match at 100% within the aligned region.

For #2, there are a few ways to interpret "filter". If you mean whether megablast will consider the adapter part of the sequence in its calculations, the answer is that it matters for some statistics and not for others. The part of the sequence that is adapter wouldn't align to the genome, and percent identity is based only on HSPs (high-scoring segment pairs: one part of the pair is the DNA query and the other is the genome target, for that alignment region only). So, adapter sequence wouldn't be involved in percent identity calculations (or be expected to be!). But these unaligned regions could become a problem if coverage or certain other statistics are part of your analysis. Checking whether query length is part of the calculation for the statistics you choose to use will tell you if clipping is necessary. If it is important, adapters can be removed with tools in "NGS: QC and manipulation" (perform a tool search on the keywords "trim" or "clip").

Best,
Jen
Galaxy team

-- Jennifer Jackson
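
To illustrate the point above about percent identity versus coverage, here is a minimal Python sketch. It is not megablast's own code or output format: the `Hsp` class, its field names, and the example numbers are all hypothetical, chosen only to show why a read that still carries adapter can score 100% identity while having lower query coverage.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """Hypothetical summary of one aligned region (not a real megablast record)."""
    query_length: int      # full read length, including any adapter
    query_start: int       # first aligned query base, 1-based inclusive
    query_end: int         # last aligned query base, 1-based inclusive
    alignment_length: int  # number of alignment columns, gaps included
    identical: int         # columns where query and target bases match


def percent_identity(hsp: Hsp) -> float:
    # Computed over the aligned region only, so unaligned adapter is ignored.
    return 100.0 * hsp.identical / hsp.alignment_length


def query_coverage(hsp: Hsp) -> float:
    # Uses the full query length, so unaligned adapter lowers this value.
    aligned_bases = hsp.query_end - hsp.query_start + 1
    return 100.0 * aligned_bases / hsp.query_length


def keep_hits(hsps, min_identity=100.0, min_coverage=0.0):
    # Keep only HSPs that meet both thresholds.
    return [
        h for h in hsps
        if percent_identity(h) >= min_identity and query_coverage(h) >= min_coverage
    ]


# Assumed example: a 100 bp read whose first 80 bp align perfectly,
# with the last 20 bp being adapter that does not align.
read = Hsp(query_length=100, query_start=1, query_end=80,
           alignment_length=80, identical=80)

print(percent_identity(read))  # 100.0
print(query_coverage(read))    # 80.0
```

With an identity-only filter this read is kept; adding a coverage threshold such as `min_coverage=90.0` would drop it, which is the situation where clipping adapters first becomes worthwhile.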
