I am using HISAT2 for alignment and then passing the aligned file to htseq-count for counting. In this case, is it preferable to set the --dta parameter in HISAT2 (all other parameters are default)? I know this parameter is designed for transcriptome assembly and requires longer anchors for novel splice junctions, but I am not sure how it affects the results of RNA-seq counting tools such as htseq-count.
My guess is that, since htseq-count uses the annotation for counting, extensions beyond feature coordinates should not affect the result.
The HISAT2 manual says that setting --dta leads to "fewer alignments with short-anchors". I performed two runs, with and without --dta. The htseq-count results are highly correlated, but not exactly the same. With --dta, the alignment rate drops slightly, from 89.43% to 88.58%.
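One way to put a number on "highly correlated, but not exactly the same" is to compare the two htseq-count tables directly. The snippet below is only a minimal sketch: it assumes the standard htseq-count output (tab-separated feature ID and count, with summary rows such as `__no_feature` prefixed by `__`), the function names `read_counts` and `compare` are hypothetical, and the counts are made-up toy values, not my real data.

```python
import csv
import io
import math


def read_counts(handle):
    """Parse htseq-count output: one 'feature_id<TAB>count' row per line."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and htseq-count summary rows (e.g. __no_feature).
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def compare(a, b):
    """Return (Pearson r on log2(count + 1), features whose counts differ)."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two count tables share no feature IDs")
    x = [math.log2(a[g] + 1) for g in shared]
    y = [math.log2(b[g] + 1) for g in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((i - mean_x) * (j - mean_y) for i, j in zip(x, y))
    sxx = sum((i - mean_x) ** 2 for i in x)
    syy = sum((j - mean_y) ** 2 for j in y)
    # Correlation is undefined if either table has zero variance.
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    changed = [g for g in shared if a[g] != b[g]]
    return r, changed


# Toy, made-up counts standing in for the two runs (NOT real results).
default_txt = "geneA\t100\ngeneB\t0\ngeneC\t57\n__no_feature\t12\n"
dta_txt = "geneA\t100\ngeneB\t0\ngeneC\t55\n__no_feature\t14\n"

r, changed = compare(read_counts(io.StringIO(default_txt)),
                     read_counts(io.StringIO(dta_txt)))
print(f"Pearson r (log2 scale): {r:.4f}")
print("features with different counts:", changed)
```

For real runs, the `io.StringIO` objects would be replaced by opened count files (for example `open("default.counts")` and `open("dta.counts")`, both hypothetical file names). Listing the features that change, rather than only the correlation, shows whether the differences are concentrated in a few genes.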
Since the HISAT2 authors recommend using --dta for downstream assembly (though mainly for computational and memory reasons, as they mention in the manual), would it be "better" to stay consistent and keep --dta when running htseq-count as well (even if we are counting against known GTF files and not doing transcriptome assembly)?
I am building a pipeline for transcriptome analysis and I have two sub-workflows for gene/isoform counting:
1. hisat2->htseq-count->DESeq2 (mainly for gene-level analysis)
2. hisat2->stringtie->ballgown (isoform analysis)
I just want to know whether the behavior of hisat2 when optimized for stringtie (--dta set) is also suitable for htseq-count. If it is, I can share the alignment files between the two pipelines. From my own observation, the difference seems to be very small (much smaller than the difference between two different alignment algorithms). I would like to know if anyone has more theoretical advice.
Thank you