We have developed a workflow on a local Galaxy server instance that includes the following steps:
- Input dataset collection
- Trim Galore (trimming)
- Tophat (alignment) ...
The workflow starts running fine. When step 2 finishes, the output files generated by Trim Galore are stored as .dat files on the Galaxy instance, because Galaxy stores every dataset on disk with a .dat extension regardless of its actual format. In this case the ".dat" file is gzip-compressed, and when Tophat starts running it fails because it cannot recognize that the file is really a ".gz" file, which it could otherwise handle.
Tophat tool execution generates the following messages:
[2018-07-31 15:14:47] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2018-07-31 15:14:47] Checking for Bowtie
Bowtie version: 18.104.22.168
[2018-07-31 15:14:47] Checking for Bowtie index files (genome)..
[2018-07-31 15:14:47] Checking for reference FASTA file
[2018-07-31 15:14:47] Generating SAM header for /home/bioinfotools/galaxy/tool-data/cJaponicaGenome/bowtie2_index/cJaponicaGenome/cJaponicaGenome
**Error: cannot determine record type in input file /home/bioinfotools/galaxy/database/files/001/dataset_1100.dat**
If the .dat files are decompressed manually on the server, or their extension is simply changed to .gz, Tophat works fine. However, what is the best practice to fix this issue?
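As a rough illustration of the manual check and decompression described above, here is a minimal Python sketch. It runs outside Galaxy and does not use any Galaxy API; the function names `is_gzipped` and `decompress_if_gzipped` are made up for this example. It assumes the compressed datasets are ordinary gzip files, which can be recognized by their first two bytes (0x1f 0x8b) no matter what the file extension is.

```python
import gzip
import shutil

# Every gzip stream begins with these two bytes, whatever the file is named.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_if_gzipped(src, dest):
    """Write an uncompressed copy of `src` to `dest`.

    Hypothetical helper: returns True if `src` was gzip-compressed and has
    been decompressed, False if it was already plain and was copied as-is.
    """
    if is_gzipped(src):
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        return True
    shutil.copyfile(src, dest)
    return False
```

Writing to a separate destination file rather than overwriting the .dat file in place keeps the original dataset intact. Note that this only reproduces the manual workaround; it does not change the datatype Galaxy has recorded for the dataset.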
I came across this post, which requires certain changes to the source code. While this is a possible workaround, it may interfere with future updates of the local instance.
Please let me know if there is any other way to overcome this issue.