Filtering BAM files from HISAT2

HISAT2 summary stats:
    Total pairs: 32249562
        Aligned concordantly or discordantly 0 time: 3651029 (11.32%)
        Aligned concordantly 1 time: 26084216 (80.88%)
        Aligned concordantly >1 times: 1290968 (4.00%)
        Aligned discordantly 1 time: 1223349 (3.79%)
    Total unpaired reads: 7302058
        Aligned 0 time: 3915231 (53.62%)
        Aligned 1 time: 2996094 (41.03%)
        Aligned >1 times: 390733 (5.35%)
    Overall alignment rate: 93.93%

Jennifer Hillman Jackson wrote:

Hello,

The alignment rates are very good, including the low number of unpaired reads. The most important metrics are usually these:

Aligned concordantly 1 time: 26084216 (80.88%)

Overall alignment rate: 93.93%
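To see how these two metrics relate to the rest of the report, here is a minimal Python sketch (standard library only) that parses the summary from the question and recomputes the overall rate. It relies on an observation from these numbers rather than on HISAT2 documentation: "Total unpaired reads" (7,302,058) is exactly twice the pairs that aligned 0 times (3,651,029), i.e. those mates are retried individually, so the overall rate works out to 1 - 3,915,231 / (2 x 32,249,562). The function names are hypothetical.

```python
import re

# The summary from the question, indented as HISAT2 prints it.
SUMMARY = """\
HISAT2 summary stats:
    Total pairs: 32249562
        Aligned concordantly or discordantly 0 time: 3651029 (11.32%)
        Aligned concordantly 1 time: 26084216 (80.88%)
        Aligned concordantly >1 times: 1290968 (4.00%)
        Aligned discordantly 1 time: 1223349 (3.79%)
    Total unpaired reads: 7302058
        Aligned 0 time: 3915231 (53.62%)
        Aligned 1 time: 2996094 (41.03%)
        Aligned >1 times: 390733 (5.35%)
    Overall alignment rate: 93.93%
"""


def parse_hisat2_summary(text):
    """Map each 'label: count' line to its integer count.

    The header line (no value) and the overall rate (a percentage,
    not a count) do not match the pattern and are skipped.
    """
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)(?:\s|$)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def overall_alignment_rate(counts):
    """Recompute the overall alignment rate (percent) from the counts.

    Assumption (consistent with the numbers above): the unpaired reads are
    the mates of pairs that did not align as pairs, so the only reads left
    unaligned are the unpaired reads reported as 'Aligned 0 time'.
    """
    total_reads = 2 * counts["Total pairs"]
    unaligned_reads = counts["Aligned 0 time"]
    return 100.0 * (total_reads - unaligned_reads) / total_reads


counts = parse_hisat2_summary(SUMMARY)
# 7302058 unpaired reads == 2 x 3651029 pairs that aligned 0 times
assert counts["Total unpaired reads"] == 2 * counts["Aligned concordantly or discordantly 0 time"]
unique_pct = 100.0 * counts["Aligned concordantly 1 time"] / counts["Total pairs"]
print(f"concordant, unique: {unique_pct:.2f}%")                          # 80.88%
print(f"overall alignment rate: {overall_alignment_rate(counts):.2f}%")  # 93.93%
```

The recomputed values match the percentages HISAT2 printed, which is a quick way to confirm you are reading the report's denominators correctly.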

Details about evaluating NGS reads in general, and the types of filtering/QA to perform for specific analysis workflows, are covered in the Galaxy tutorials here, along with links to external resources (publications, discussions):

https://galaxyproject.org/learn/
Start here >> NGS logistics - this is an introduction to Galaxy's functionality for the analysis of Next Generation Sequencing data. https://galaxyproject.org/tutorials/ngs/

Hope that helps! Jen, Galaxy team

written 4 months ago by Jennifer Hillman Jackson

Thanks for the help! That answers the first part of my question. However, I have looked at those tutorials and nothing there addresses the second part. Is it appropriate to skip trimming and instead filter BAM files on MAPQ scores after alignment? What are the pros and cons of filtering out unpaired reads and multi-mapped reads? How do I filter out multi-mapped reads with SAMtools?

written 4 months ago by dexter.myrick

Yes, those parts call for judgment, and no single answer holds across all analysis workflows. Still, here is a bit more info:

Trimming versus post-alignment filtering: Trimming can help reads align but is not always necessary (reads that are entirely or mostly artifact would fail to align anyway). One option: run FastQC on the original data, run Trimmomatic followed by FastQC on the same data, compare the two FastQC reports, then map both versions and compare the alignment rates and quality.
Filter by MAPQ: Yes, do this, especially if calling variants. How to do it in the context of an example analysis is covered in the variant analysis tutorials.
Unpaired reads: Some tools use these during execution and some do not; others require strictly paired input to start with. For tools that do use unpaired reads, these orphan reads can produce spurious results. Sometimes that is OK, for example when data mining a specific region where all available evidence is wanted so that a person can review it and make decisions about the result. Again, you could try both ways with whichever tools you are using and compare the differences.
Multi-mapped reads: The aligner retained multiple hits because each is considered just as "good" as the others (assuming only primary alignments are reported; more below). Try filtering for properly paired, mapped reads with Filter BAM (and on other features, if desired). Tool: NGS: SAMtools >> Filter SAM or BAM, output SAM or BAM files on FLAG MAPQ RG LN or by region.
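The MAPQ, unpaired-read, and proper-pair filters above all come down to two SAM columns: FLAG (column 2, a bitfield) and MAPQ (column 5). The sketch below shows that logic in Python on plain SAM text lines; it is an illustration, not what the Galaxy Filter SAM or BAM tool runs internally. The helper `keep_record` and the MAPQ cutoff of 10 are assumptions for the example. With SAMtools itself, `samtools view -b -f 2 -F 12 -q 10 in.bam > out.bam` (file names are placeholders) should apply a comparable filter: require the proper-pair bit, exclude unmapped reads and reads whose mate is unmapped, and require MAPQ of at least 10.

```python
# SAM FLAG bits (see the SAM format specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8

MIN_MAPQ = 10  # assumption: choose a cutoff suited to your aligner and analysis


def keep_record(sam_line, min_mapq=MIN_MAPQ, require_proper_pair=True):
    """Return True if a SAM line passes the FLAG/MAPQ filters (hypothetical helper)."""
    if sam_line.startswith("@"):
        return True  # keep header lines unchanged
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & UNMAPPED:
        return False  # the read itself did not align
    if (flag & PAIRED) and (flag & MATE_UNMAPPED):
        return False  # orphan: this mate aligned, the other did not
    if require_proper_pair and not (flag & PROPER_PAIR):
        return False  # aligned, but not as a proper pair
    return mapq >= min_mapq


records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",   # proper pair, MAPQ 60 -> kept
    "r2\t99\tchr1\t500\t1\t50M\t=\t700\t250\t*\t*",    # proper pair, MAPQ 1 -> dropped
    "r3\t73\tchr1\t900\t60\t50M\t=\t900\t0\t*\t*",     # mate unmapped (orphan) -> dropped
    "r4\t97\tchr2\t100\t60\t50M\tchr3\t400\t0\t*\t*",  # not a proper pair -> dropped
]
kept = [r.split("\t")[0] for r in records if keep_record(r)]
print(kept)  # ['@HD', 'r1']
```

Turning `require_proper_pair` off or lowering `min_mapq` is the "try both and compare" experiment described above, in miniature.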

A multi-mapping unpaired read is a likely example of a spurious result: a non-specific hit supported by only one evidence point (paired data starts with two, and a proper pair adds a third). Properly paired reads with more than one hit are mapping to a duplicated (or nearly duplicated) genome region. The Filter SAM or BAM tool can filter on the bitwise flags, for example to keep only the primary alignment(s). You can mark duplicates with the NGS: Picard >> Mark Duplicates tool (covered in the variant analysis tutorials).
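Keeping only primary alignments means excluding FLAG bits 0x100 (secondary) and 0x800 (supplementary); with SAMtools that is `samtools view -F 0x900`. A primary-only file still contains one record for every multi-mapped read, though, so dropping multi-mappers altogether is often done with the NH tag (number of reported alignments), which HISAT2 includes in its SAM output. A minimal Python sketch of both checks on SAM text lines; the helper names are hypothetical, and treating a record with no NH tag as "not unique" is an assumption.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def optional_tags(fields):
    """Parse SAM optional fields (column 12 onward) into {tag: value}."""
    tags = {}
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def is_primary(fields):
    """True unless the secondary or supplementary FLAG bit is set."""
    return not (int(fields[1]) & (SECONDARY | SUPPLEMENTARY))


def is_unique(fields):
    """True if exactly one alignment was reported (NH:i:1); missing NH -> False."""
    return optional_tags(fields).get("NH") == 1


lines = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:1",   # unique hit
    "r2\t99\tchr1\t500\t1\t50M\t=\t700\t250\t*\t*\tNH:i:2",    # primary of 2 hits
    "r2\t355\tchr5\t800\t1\t50M\t=\t1000\t250\t*\t*\tNH:i:2",  # secondary (99 + 0x100)
]
records = [line.split("\t") for line in lines]
primary = [f[0] for f in records if is_primary(f)]
unique = [f[0] for f in records if is_primary(f) and is_unique(f)]
print(primary)  # ['r1', 'r2']
print(unique)   # ['r1']
```

The two lists show the trade-off discussed above: primary-only filtering keeps one placement per multi-mapped read, while the NH-based filter removes such reads entirely.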

HISAT2 is a good mapper for most use cases. See the advanced options, especially Reporting options >> Primary alignment. Secondary alignments are already filtered out by default (but that can be adjusted).

written 4 months ago by Jennifer Hillman Jackson
