Hello,
I am assuming that you are working on the public Main Galaxy server at http://usegalaxy.org, although most of this advice applies when running Galaxy anywhere (or even on the command line).
The order of the sequence data is not a factor with this tool. Also, if you use Tophat2, the first output dataset provides additional statistics directly from the tool. Perhaps try it if you are currently using "Tophat for Illumina"?
Performing too much QA/QC can lower paired mapping rates when not enough of the sequence data remains that meets the mapping criteria set on the Tophat/Tophat2 tool form. Double-check the parameters against the content of the sequences (length, quality). In particular, I am wondering if the option "Minimum length of read segments:" is set too high. It is found under "full parameters". The value needs to be no more than one half of the total length of the reads in order to map without bias. This is true for both Tophat and Tophat2 (it is not clear which you are using). The other parameters can be reviewed in the tool's 3rd-party documentation; the link is on the tool form.
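To see how aggressive trimming interacts with the segment-length setting, here is a minimal sketch that applies the "no more than half the read length" rule to a set of read lengths. It assumes you already have the read lengths (for example, from a length-distribution summary of the trimmed FASTQ); the function name `fraction_splittable` is hypothetical and not part of Galaxy or Tophat.

```python
from typing import Iterable


def fraction_splittable(read_lengths: Iterable[int], segment_length: int) -> float:
    """Fraction of reads whose length is at least twice segment_length.

    Hypothetical helper: applies the rule of thumb that the
    "Minimum length of read segments" value should be no more than
    one half of the read length.
    """
    lengths = list(read_lengths)
    if not lengths:
        return 0.0
    ok = sum(1 for n in lengths if segment_length <= n / 2)
    return ok / len(lengths)


if __name__ == "__main__":
    # Illustrative lengths only: 50 bp reads after varying amounts of trimming.
    trimmed = [50, 48, 40, 36, 30]
    for seg in (25, 18):
        print(seg, fraction_splittable(trimmed, seg))
```

With these made-up lengths, a segment length of 25 satisfies the rule for only one read in five, while 18 satisfies it for four in five. This is the pattern to watch for when heavy trimming shortens many reads below twice the configured segment length.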
Other items to consider:
1. Are you certain that the sequences have correctly scaled quality scores and the appropriate "datatype" assigned? Problems here generally show up as lower mapping rates and fewer pairs. You want the data in "fastqsanger" format. This wiki page explains how to check, rescale if necessary, and set the metadata:
http://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA
2. Perhaps try a run with data that has had less aggressive trimming (and other manipulations) done prior to mapping. Simply trimming at a sequence quality of 20 is a good place to start for a test mapping job. The goal with this tool set is to get as much of the data mapped as possible, so less manipulation is generally better.
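For item 1, a quick way to sanity-check the quality encoding outside Galaxy is to look at the lowest ASCII code in the quality lines. The sketch below uses a common heuristic and is not the method described on the wiki page. Codes below 59 (`;`) only occur in Phred+33 ("fastqsanger") data. A minimum of 64 (`@`) or higher suggests Phred+64 (Galaxy's "fastqillumina"). Anything in between is ambiguous, possibly Solexa. It assumes unwrapped four-line FASTQ records, and both function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def quality_strings(lines: Iterable[str], max_records: int = 10000) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    stripped = (line.rstrip("\n") for line in lines)
    for i, line in enumerate(islice(stripped, max_records * 4)):
        if i % 4 == 3:
            yield line


def guess_quality_encoding(quals: Iterable[str]) -> str:
    """Heuristic guess of the quality offset from the lowest ASCII code seen."""
    codes = [ord(c) for q in quals for c in q]
    if not codes:
        return "unknown"
    lowest = min(codes)
    if lowest < 59:
        return "phred33"    # consistent only with Sanger / fastqsanger
    if lowest >= 64:
        return "phred64"    # likely Illumina 1.3+ style (fastqillumina)
    return "ambiguous"      # 59-63: possibly Solexa, or too few reads sampled


if __name__ == "__main__":
    # Usage: pass an open FASTQ file handle, e.g. open("reads.fastq")
    records = ["@r1\n", "ACGT\n", "+\n", "II#5\n"]
    print(guess_quality_encoding(quality_strings(records)))
```

Because it is only a heuristic, a small sample of uniformly high-quality Phred+33 reads can look like Phred+64. Sample enough records before trusting the answer.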
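For item 2, "trimming at a sequence quality of 20" can be read several ways. The sketch below assumes the simplest interpretation: remove bases from the 3' end while their Phred score is below 20. It also assumes Phred+33 encoding, which is what "fastqsanger" means. In practice you would use one of Galaxy's trimming tools. This only illustrates the logic, and `trim_trailing` is a hypothetical name.

```python
from typing import Tuple


def trim_trailing(seq: str, qual: str, threshold: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop 3'-end bases whose Phred quality is below threshold.

    Assumes Phred+33 quality characters (offset=33) and that seq and
    qual have the same length. Trimming stops at the first base from
    the 3' end that meets the threshold.
    """
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # '+' is Q10 and '#' is Q2 (trimmed); '5' is Q20 (kept).
    print(trim_trailing("ACGTACGT", "IIIII5#+"))
```

This trims only the low-quality tail and leaves the rest of the read alone. That fits the advice to keep as much of each read as possible for mapping.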
Best, Jen, Galaxy team