Hi,
I regularly upload huge gzipped files (WGS fastq data) to our local instance of Galaxy. The software that we have wrapped in Galaxy can deal with gzipped fastq data, so there is no need to decompress them.
By default, gzipped files will get extracted during the import, but since these are large files I'm importing them as linked datasets anyway, which also prevents them from being decompressed.
However, Galaxy still tries to auto-detect the format of the imported datasets: even though I am only creating links, it inspects the contents of each file, which makes the import very slow.
In the end, it decides that it can't do anything with the dataset and sets its format to "data", but it also removes the .gz extension from the dataset name.
The naming issue can be fixed afterwards by editing the dataset information, and "data" is fine for me as a format, but would it be possible to:
1) declare the file format as "data" right away instead of going through auto-detection? ("data" is not offered as a format in the import dialogue, and declaring some arbitrary type just to avoid auto-detection seems odd)
2) prevent Galaxy from stripping the '.gz' from the file name?
or alternatively:
Could you define a new format that, when selected, prevents Galaxy from doing anything with the dataset and just makes it import the file as is?
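
To illustrate the kind of behaviour I have in mind, here is a minimal sketch. It is not Galaxy's actual code: the function names are hypothetical and only show the idea. Reading the first two bytes of a file is enough to recognise gzip, so the name could be kept unchanged and the format set to "data" without scanning or decompressing the whole file.

```python
import os

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number.

    Only two bytes are read, so the cost does not depend on file size.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def import_name_and_format(path):
    """Hypothetical helper, not part of Galaxy.

    Keeps the file name untouched (including '.gz') and labels gzipped
    files as "data"; everything else is left to auto-detection.
    """
    name = os.path.basename(path)
    if looks_gzipped(path):
        return name, "data"
    return name, "auto"
```

With something like this, a linked file called sample_R1.fastq.gz would keep its name and be labelled "data" straight away.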
Thanks for your help,
Wolfgang