What is the best mapping algorithm to use? My reference genome is about 70,000 sequences, each 20 bp long. My reads are mostly longer, > 100 bp. Thank you.
Hello,
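Mapping reads against short genome/transcriptome/exon fragments, especially when the reads are longer than the targets, is not going to work with NGS mapping tools such as HISAT2 or BWA. Those tools expect each read to align within a longer reference sequence, not the other way around.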
You could try BLAST+ blastn instead. I am not sure whether makeblastdb will work with this many target sequences as implemented at Galaxy Main (https://usegalaxy.org), and using a FASTA dataset from the history as the target database will almost certainly be problematic.
You'll need to test it out, tune parameters, and retest. Tools can run into memory problems with this type of target database given the resources available at Galaxy Main. If you do run into memory errors, those do not need to be reported as a bug; they would be an expected failure.
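If you end up running BLAST+ on your own machine rather than at Galaxy Main, the blastn approach could look roughly like the sketch below. This is only a starting point under several assumptions: the file and database names (targets.fasta, reads.fasta, targets_db, hits.tsv) are hypothetical, the reads have already been converted from FASTQ to FASTA, and the blastn-short task with the e-value and word size shown is a first guess to tune, not a recommendation. The reason for choosing a short-word task at all is that the default blastn task (megablast) uses a word size of 28, which is longer than a 20 bp target.

```python
import subprocess

# NOTE: every file/database name used with these helpers is a hypothetical
# placeholder; substitute your own paths.


def makeblastdb_cmd(targets_fasta, db_name):
    """Build the command that indexes the short targets as a nucleotide database."""
    return ["makeblastdb", "-in", targets_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_short_cmd(reads_fasta, db_name, out_tsv, evalue=1.0, word_size=7):
    """Build a blastn command using the short-sequence task.

    evalue and word_size are assumed starting values to tune and retest.
    Output format 6 is BLAST's tabular format.
    """
    return [
        "blastn",
        "-task", "blastn-short",
        "-query", reads_fasta,   # blastn expects FASTA queries, not FASTQ
        "-db", db_name,
        "-evalue", str(evalue),
        "-word_size", str(word_size),
        "-outfmt", "6",
        "-out", out_tsv,
    ]


def run_search(targets_fasta, reads_fasta, db_name, out_tsv):
    """Run both steps; raises CalledProcessError if either tool fails."""
    subprocess.run(makeblastdb_cmd(targets_fasta, db_name), check=True)
    subprocess.run(blastn_short_cmd(reads_fasta, db_name, out_tsv), check=True)
```

With BLAST+ installed and on your PATH, calling run_search("targets.fasta", "reads.fasta", "targets_db", "hits.tsv") would write one tab-separated line per hit. Expect to adjust the e-value and word size after looking at the first results, and test on a small subset of reads first, since memory use can be an issue here too.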
Galaxy tutorials: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team