2.5 years ago by
United States
Hello,
The numbers look a bit low. It could be a known data quality issue. Run FastQC to check first. If the quality is very poor, you could check with the data authors to see if it is a known issue.
Overzealous trimming can also lead to poor mapping by eroding the sequence content. Perhaps try mapping a sample without any QA. Then add QA steps back in one at a time and test to see which ones increase or decrease alignment rates.
But this could also be because of a datatype assignment issue (I see this quite often). Fastq data must have quality scores scaled appropriately and be assigned the datatype "fastqsanger". Many times when this comes up, the input type given to the Fastq Groomer tool was not a match for the data, or fastqsanger was directly assigned to data that is really another fastq type (such as fastqillumina). Here is how to check your sequences and assign the correct type/scaling as needed: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
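If you want a quick check outside of Galaxy, here is a rough Python sketch of the usual heuristic for guessing the quality score scaling from the range of quality characters. It is not what the Fastq Groomer does internally. The function names are my own, it assumes a plain 4-lines-per-record fastq file, and it only samples the first reads, so treat the answer as a hint and confirm it against the wiki steps above.

```python
from itertools import islice

def quality_range(fastq_lines, max_reads=10000):
    """Return (lowest, highest) ASCII code found in the quality lines.

    Assumes a plain 4-lines-per-record fastq (no wrapped sequence lines).
    Returns None if no quality characters were found.
    """
    lo, hi = None, None
    # Every 4th line, starting at the 4th, is a quality string.
    quality_lines = islice(fastq_lines, 3, None, 4)
    for qual in islice(quality_lines, max_reads):
        codes = [ord(c) for c in qual.rstrip("\r\n")]
        if not codes:
            continue
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    return None if lo is None else (lo, hi)

def guess_scaling(lo, hi):
    """Guess the scaling from the observed ASCII range (heuristic only)."""
    if lo < 59:
        # Characters this low only occur with a +33 offset.
        return "phred33"   # corresponds to fastqsanger
    if hi > 74:
        # Characters this high are not expected with a +33 offset.
        # Solexa scores can dip below the +64 zero point ('@' = 64).
        return "solexa64" if lo < 64 else "phred64"  # fastqsolexa / fastqillumina
    return "ambiguous"     # range fits more than one scaling

# Example with an in-memory record; for a file, pass the open file handle.
record = ["@read1\n", "ACGT\n", "+\n", "!5?I\n"]
lo, hi = quality_range(record)
print(guess_scaling(lo, hi))
```

If the guess is anything other than "phred33", the data most likely needs to go through the Fastq Groomer with the matching input type before mapping, rather than having fastqsanger assigned directly.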
And finally, if the sequences are less than 50 bases long, there is a specific parameter that can be modified in TopHat to remove bias and potentially help with alignment rates. The value should be at least one half of the length of the shortest sequence. (But beware of dropping this too low. Instead, accept that very short sequences, those less than twice this value in length, may not map, and factor that in when reviewing mapping rates.)
TopHat settings to use > Full parameter list > Minimum length of read segments
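To make the arithmetic concrete, here is a small Python sketch. The function names and the example read lengths are made up for illustration. It computes one half of the shortest read length, and for whatever segment length you settle on, the fraction of reads shorter than twice that value (the ones that may not map).

```python
import math

def half_of_shortest(read_lengths):
    """One half of the shortest read length, rounded up."""
    return math.ceil(min(read_lengths) / 2)

def fraction_too_short(read_lengths, segment_length):
    """Fraction of reads shorter than twice the segment length."""
    short = sum(1 for n in read_lengths if n < 2 * segment_length)
    return short / len(read_lengths)

# Hypothetical read lengths after trimming.
lengths = [36, 36, 40, 20]

print(half_of_shortest(lengths))        # half of the single shortest read
print(fraction_too_short(lengths, 18))  # share of reads under 2 x 18 bases
```

If a handful of very short reads would drag the value down, it is usually better to pick the segment length from the bulk of the reads and accept the small fraction reported by the second function as unmapped.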
Best, Jen, Galaxy team