We are experimenting with setting up our own cloud-based Galaxy server using the Cloudman tools and a 1-year grant of AWS credits we received recently. We've managed to stand up the server and are doing some testing. In the past, when we uploaded fastq data derived from Illumina sequencing runs to the PUBLIC (usegalaxy.org) suite of tools, the files seemed to be immediately available to all the applications we wished to use, including Trinity. No conversion required. In contrast, in the private Galaxy server we established, when we upload the SAME FILES, they are not recognized by any of the tools. All the tools seem to insist on "fastqsanger" files. We've tried changing the file type, renaming the files with a ".fastqsanger" extension, etc., but to no avail. The only thing that works is a multi-step process of using Fastq.info to convert first to a fasta file and a qual file, and then using "make fastq" to convert those to an "official" fastq file. Is there some customization of the PUBLIC site that does this to files upon upload? Why is this necessary on the private server? Is there a tool we can install so this happens automatically? Apologies! If it isn't obvious already, we're not experts.
I think I understand it now. TL;DR: your inconsistencies should go away if you update your Galaxy to 17.09 or 18.01.
Since 17.09 the default fastq format has been fastqsanger, which is what all new data should have. This is why you see different behavior on Main (which is on 18.01). The easiest way to handle this on your version of Galaxy is to set the datatype explicitly on upload (to fastqsanger or fastqsanger.gz). You can also change the datatype of already uploaded datasets using the 'pencil' icon menu.
You do not need to use the fastqGroomer tool (that is mostly just useful if you have non-fastqsanger data, which at this point would be ~8 years old).
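In case it helps to see what "fastqsanger" means in practice: it refers to quality scores stored with the Phred+33 offset, which is what current Illumina output uses. Below is a minimal Python sketch for checking a file locally before upload. It is not how Galaxy itself detects datatypes; the function names (open_fastq, quality_range, guess_encoding) are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines). Any character code below 59 can only occur with the +33 offset, since 59 is the lowest code used by the older +64 style encodings.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling gzip (hypothetical helper)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip files start with the two bytes 0x1f 0x8b
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def quality_range(lines, max_records=10000):
    """Return (lowest, highest) character code seen on quality lines.

    Assumes four lines per record, so the quality string is every 4th line.
    """
    lo = hi = None
    for i, line in enumerate(lines):
        if i >= 4 * max_records:
            break
        if i % 4 != 3:
            continue
        qual = line.rstrip("\r\n")
        if not qual:
            continue
        codes = [ord(ch) for ch in qual]
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    return lo, hi


def guess_encoding(lo, hi):
    """Rough guess at the quality offset from the observed code range."""
    if lo is None:
        return "unknown"      # no quality lines were seen
    if lo < 33:
        return "invalid"      # below the printable range used by FASTQ
    if lo < 59:
        return "phred33"      # compatible with fastqsanger
    return "ambiguous"        # could be high-quality +33 or an old +64 file


# Small in-memory demo record; replace with open_fastq("your_file.fastq.gz")
demo = ["@read1\n", "ACGT\n", "+\n", "II5#\n"]
lo, hi = quality_range(demo)
print(lo, hi, guess_encoding(lo, hi))
```

If this reports "phred33" for your files, setting the datatype to fastqsanger on upload (as described above) should be safe and no grooming step is needed.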
That worked! I have a smallish fastq.gz file that I've successfully uploaded and analyzed in the past via the Main Galaxy instance at usegalaxy.org. Using this file as a test subject, I uploaded it to our private Cloudman Galaxy instance, setting the file type to "fastqsanger" in the upload tool interface (rather than "Autodetect"), and that seems to work. Interestingly, even though it's a .gz compressed file, if I set it to "fastqsanger.gz" rather than simply "fastqsanger", it doesn't work. It only works if I call the compressed file "fastqsanger". It unpacks almost immediately after appearing in the history.
Here is my next question (keeping in mind we're all a bunch of molecular biologists struggling to get this done while simultaneously looking to hire a bioinformatician): How does one update the version of Galaxy we're running? We've been following the instructions on the Galaxy Cloudman pages for initial setup and use, but I haven't found anything about updating that doesn't require some command-line work. I'm capable of some of that, but don't want to get too far ahead of myself.
REALLY appreciate your help, by the way!
I gave this a try and was initially impressed by the interface and excited to use the latest release of Galaxy; however, after getting it set up, I immediately encountered an error "right out of the box". The Cluster info log is reporting over and over the following error:
R_COMM channel basic get exception: [Errno 32] Broken pipe
This seems to be preventing me from spooling up and using a spot instance as a worker node. The main Galaxy node is running, but that's it.