Question: Suppress Reporting Hit Number
Hsin-l (Sam) Chiang wrote:
Hi, I used the Megablast function (under NGS: Mapping\ROCHE-454\) to analyze my FASTA sequences against the nt database, and it worked fine. However, it generated 56,804 hits although my query has only 1000 sequences. Is there any way to limit the reported alignments to just the one best hit per sequence? (In local BLAST there are parameters such as -K1 -v 1 -b 1 to do so, but I can't find similar options in Galaxy.) Many thanks! Sam
Jennifer Hillman Jackson wrote:
Hello Sam,

When running Megablast, filtering by identity or e-value can help reduce the number of hits. The default values are all fairly permissive if you are querying against a target genome of the same species and the query has been filtered for base-calling quality. Judging by the number of hits generated from your initial data, my guess is that filtering out low-complexity sequence would also be a big help.

There is also the "Parse blast XML output" tool. Converting the data into interval format would allow you to use "Operate on Genomic Intervals -> Cluster the intervals of a dataset". That tool is based on coverage, if coverage is one of your criteria (it could be, if your identity threshold is a range that you consider to hold candidates for "best").

Identity and coverage are commonly combined to identify the "best" hit, but this is just a suggestion. The same logic could be applied to top-scoring e-value matches combined with coverage (the result would likely be similar to using e-value alone if the identity threshold is set high).

The idea of adding a filter for "single best" is a good one, but it has some complexity associated with it. I will pass it along to the team as an enhancement request to consider.

Hopefully this helps!

Jen
Galaxy team
-- Jennifer Jackson
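As a rough illustration of the "single best hit per query" idea discussed above, the filtering could also be done outside Galaxy on a downloaded tabular result. The sketch below is not a Galaxy tool. It assumes the standard 12-column BLAST tabular layout (query ID in the first column, bit score in the last); the Megablast output in Galaxy may use different columns, so `query_col` and `score_col` would need adjusting. The function name and the sample rows are made up for the example, and "best" is taken here to mean the highest bit score only.

```python
import csv
import io


def best_hit_per_query(handle, query_col=0, score_col=11):
    """Return {query_id: row}, keeping the highest-scoring row per query.

    Column positions are assumptions (standard 12-column BLAST tabular
    layout); change them to match the actual file.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment/header lines.
        if not row or row[0].startswith("#"):
            continue
        query = row[query_col]
        score = float(row[score_col])
        # Keep the first row seen for a query, or replace it if this one scores higher.
        if query not in best or score > float(best[query][score_col]):
            best[query] = row
    return best


# Made-up sample data: two hits for read1, one hit for read2.
sample = (
    "read1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n"
    "read1\tsubjB\t92.0\t180\t14\t0\t1\t180\t50\t229\t1e-70\t250\n"
    "read2\tsubjC\t99.0\t150\t1\t0\t1\t150\t5\t154\t1e-75\t270\n"
)

hits = best_hit_per_query(io.StringIO(sample))
for query, row in hits.items():
    print(query, row[1], row[11])
```

For a real file, the `io.StringIO(sample)` argument would be replaced by an open file handle. Ties are resolved in favor of the row that appears first, and combining identity with coverage, as suggested in the reply, would need a different comparison than the single score used here.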
