We are experimenting with setting up our own cloud-based Galaxy server using the Cloudman tools and a 1-year grant of AWS credits we received recently. We've managed to stand up the server and are doing some testing. In the past, when we uploaded fastq data derived from Illumina sequencing runs to the PUBLIC (usegalaxy.org) suite of tools, they seemed to be immediately available to all the applications we wish to use, including Trinity. No conversion required. In contrast, in the private Galaxy server we established, when we upload the SAME FILES, they are not recognized by any of the tools. All the tools seem to be insisting on "fastqsanger" files. We've tried changing the file type, renaming the files with a ".fastqsanger" extension, etc., but to no avail. The only thing that works is a multi-step process of using Fastq.info to convert first to a fasta file and a qual file and then using "make fastq" to convert it to an "official" fastq file. Is there some customization of the PUBLIC site that does this to files upon upload? Why is this necessary in the private server? Is there a tool we can install so this happens automatically? Apologies! If it isn't obvious already, we're not experts.
I think I understand it now. tldr: your inconsistencies should go away if you update your Galaxy to 17.09 or 18.01.
Since 17.09 the default fastq format has been changed to fastqsanger - which is what all new data should have and this is why you see different behavior on Main (which is on 18.01). The easiest way to handle this on your version of Galaxy is to properly set the datatype on upload (to fastqsanger or fastqsanger.gz). You can also change the datatype of already uploaded datasets using the 'pencil' icon menu.
You do not need to use the fastqGroomer tool (that is mostly just useful if you have non-fastqsanger data, which at this point would be ~8 years old).
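If you want to convince yourself that a file really is Sanger-encoded (Phred+33) before setting the datatype to fastqsanger by hand, a rough check is to look at the range of characters on the quality lines. The sketch below is only a heuristic and is not how Galaxy itself decides anything; the function names are made up for illustration, and it assumes plain four-line FASTQ records.

```python
from typing import Iterable, Iterator


def iter_quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield every fourth line (the quality string) of a four-line-per-record FASTQ."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line


def guess_quality_offset(quality_lines: Iterable[str]) -> str:
    """Guess the quality encoding from the range of quality characters seen.

    Heuristic only: returns "phred33", "phred64", "ambiguous" or "unknown".
    """
    lowest, highest = 255, 0
    for line in quality_lines:
        for ch in line.strip():
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if highest == 0:
        return "unknown"  # no quality characters were seen at all
    if lowest < 59:
        # Characters below ';' (ASCII 59) cannot occur in the older +64 encodings.
        return "phred33"
    if highest > 74:
        # Characters above 'J' (ASCII 74) are unusual for Phred+33 Illumina data.
        return "phred64"
    return "ambiguous"


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "!!#5"]
    print(guess_quality_offset(iter_quality_lines(example)))
```

A result of "phred33" is consistent with fastqsanger; "ambiguous" just means the reads sampled did not contain a character that settles it, so check more reads.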
That worked! I have a smallish fastq.gz file that I've successfully uploaded and analyzed in the past via the Main Galaxy instance at usegalaxy.org. Using this file as a test subject, I uploaded it to our private Cloudman Galaxy instance, setting the file type to "fastqsanger" in the upload tool interface (rather than "Autodetect"), and that seems to work. Interestingly, even though it's a .gz compressed file, if I set it to "fastqsanger.gz" rather than simply "fastqsanger", it doesn't work. It only works if I call the compressed file "fastqsanger". It unpacks almost immediately after appearing in the history.
Here is my next question (keeping in mind we're all a bunch of molecular biologists struggling to get this done while simultaneously looking to hire a bioinformatician): How does one update the version of Galaxy we're running? We've been following instructions on the Galaxy Cloudman pages for initial setup and use, but I haven't found anything about updating that doesn't require some command-line coding. I'm capable of some of that, but don't want to get too far out ahead of myself.
REALLY appreciate your help, by the way!
Sorry, I forgot you are on cloudman. I think cloudman instances are not easily updateable until the new cloudman release is published, so properly setting the datatype is your best option for now. fastqsanger.gz is also available only since 17.09, so you have to make do with just fastqsanger. Galaxy will automatically unpack the archive for you though.
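If you want to sanity-check a file locally before uploading it (compressed or not), something like the following works. This is just a sketch of the general idea of sniffing for gzip and reading through it, not Galaxy's own unpacking code; `open_fastq` and `count_records` are names I made up, and it assumes ordinary four-line FASTQ records.

```python
import gzip
from typing import TextIO

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip file


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing on the fly if it is gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(path: str) -> int:
    """Count four-line FASTQ records, ignoring blank lines."""
    with open_fastq(path) as handle:
        lines = sum(1 for line in handle if line.strip())
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} non-blank lines is not a multiple of 4")
    return lines // 4
```

Calling `count_records` on the .gz file and on an unpacked copy should give the same number; an error here usually points to a truncated upload or download.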
I just learned that you can use the latest GVL image instead of the CloudMan and it has Galaxy 17.09 included - https://launch.usegalaxy.org/catalog/appliance/genomics-virtual-lab
I gave this a try and was initially impressed by the interface and excited to use the latest release of Galaxy; however, after getting it set up, I immediately encountered an error "right out of the box". The Cluster info log is reporting over and over the following error:
R_COMM channel basic get exception: [Errno 32] Broken pipe
This seems to be preventing me from spooling up and using a spot instance as a worker node. The main Galaxy node is running, but that's it.
I should have been more specific... We gave "GVL" a try and it seems to have some hangups right out of the box...
Thanks for reporting this, I made an issue for it at https://github.com/galaxyproject/cloudlaunch/issues/140 so the responsible people can analyse it. Feel free to add any details you know there.
Have replied at the github link but just posting an update here. Initially, we thought that this was a firewall issue, and indeed, there was a slight misconfiguration of the firewall, but that hasn't fixed the problem. It's a rabbitmq connectivity issue, but why it's being triggered is not clear. Still looking into this and will report back.
This should be fixed now. There's a bit more info in the github issue (https://github.com/galaxyproject/cloudlaunch/issues/140 ).
This is interesting, I do not know what is happening yet though. Let me ask you a few questions first:
What format is shown on the dataset in the history?
Note that there is a fastqGroomer tool that can convert between various fastq formats.
Thanks for the response!
I just discovered the fastqGroomer tool, and that works like a charm. Still, when using the public Galaxy tools @ usegalaxy.org, I don't normally need to do this.
So, to answer your questions:
Basically, it seems to be a problem with files we upload not being designated as fastqsanger files and instead being assigned a generic "fastq" data type. It seems necessary to use fastqGroomer to convert the datatype to fastqsanger, but when I view a few pages of the generic fastq file and the fastqGroomer-converted fastqsanger file side-by-side, they look identical. I've also tried changing the file type in the metadata from generic "fastq" to "fastqsanger" manually, but that doesn't seem to fix the problem.
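Rather than eyeballing a few pages, the two files can be compared record by record after downloading both from the history. The sketch below is one way to do that; the helper names are hypothetical and it assumes four-line records. If it reports no difference, only the datatype label differs, not the data.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, ...]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into four-line records, dropping line endings."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def first_difference(a: Iterable[str], b: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first differing record, or None if identical."""
    pairs = zip_longest(read_records(a), read_records(b))
    for index, (rec_a, rec_b) in enumerate(pairs):
        if rec_a != rec_b:
            return index
    return None
```

Open files can be passed directly, e.g. `first_difference(open("original.fastq"), open("groomed.fastq"))` (both file names are placeholders).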
Appreciate you giving this some thought!
Since 17.01 Galaxy understands fastq.gz files and won't try to decompress them on upload (unless you turn it off), since some tools can operate on them directly. However, any tool that accepts fastq files should be able to accept fastq.gz inputs, since Galaxy will automatically decompress them on execution.
Since fastqGroomer works for you, it seems that Galaxy is not setting your fastq format correctly (but that cannot be automated). I am puzzled how this would work differently on usegalaxy.org. Did you try this with the very same file?
For anyone else having trouble with fastq data and the different formats/datatypes, the first few FAQs here cover current help/usage (for Galaxy 17.09 forward): https://galaxyproject.org/support/#getting-inputs-right