Private vs Public Galaxy: different outcomes following Fastq upload


8 months ago by

lhp_sf4dem • 30 wrote:

We are experimenting with setting up our own cloud-based Galaxy server using the CloudMan tools and a 1-year grant of AWS credits we received recently. We've managed to stand up the server and are doing some testing. In the past, when we uploaded fastq data derived from Illumina sequencing runs to the PUBLIC (usegalaxy.org) suite of tools, the files seemed to be immediately available to all the applications we wished to use, including Trinity. No conversion required. In contrast, in the private Galaxy server we established, when we upload the SAME FILES, they are not recognized by any of the tools. All the tools seem to insist on "fastqsanger" files. We've tried changing the file type, renaming the files with a ".fastqsanger" extension, etc., but to no avail. The only thing that works is a multi-step process: using Fastq.info to convert first to a fasta file and a qual file, and then using "make fastq" to convert those to an "official" fastq file. Is there some customization of the PUBLIC site that does this to files upon upload? Why is this necessary on the private server? Is there a tool we can install so this happens automatically? Apologies! If it isn't obvious already, we're not experts.

fastq upload galaxy • 738 views


modified 8 months ago • written 8 months ago by lhp_sf4dem • 30

This is interesting, I do not know what is happening yet though. Let me ask you a few questions first:

Are the files you upload compressed (.gz, .bz, .zip etc.)?
What format is shown on the dataset in the history?
What version of Trinity do you have installed?
What version of Galaxy are you using?

Note that there is a fastqGroomer tool that can convert between various fastq formats.

ADD REPLY • link modified 8 months ago • written 8 months ago by Martin Čech ♦♦ 4.9k

Thanks for the response!

I just discovered the fastqGroomer tool, and that works like a charm. Still, when using the public Galaxy tools @ usegalaxy.org, I don't normally need to do this.

So, to answer your questions:

We download files from our Illumina BaseSpace account. Those files are in the .gz format. We then upload the compressed files via FTP still in the .gz format. Once uploaded with the "upload" tool and visible in the history, the files seem to be automatically decompressed to .fastq files almost immediately thereafter. Usually takes just a few minutes.
Once the files are decompressed, if one checks the file type in the history, it's simply listed as "fastq", not "fastqsanger".
I've installed Trinity revision "b3fe7c4ca5aa" from "artbio"; however, this problem of the uploaded fastq files not being recognized extends to all other tools we've tested, including "concatenate", "Trim Galore!", etc. The only exception we've found thus far is the FastQC tool.
Galaxy is at revision: f7ba729510 (release_17.05 branch) from 19 Jun 2017

Basically, the problem seems to be that the files we upload are not designated as fastqsanger files and are instead assigned a generic "fastq" datatype. It seems necessary to use fastqGroomer to convert the datatype to fastqsanger, but when I view a few pages of the generic fastq file and the fastqGroomer-converted fastqsanger file side-by-side, they look identical. I've also tried manually changing the file type in the metadata from generic "fastq" to "fastqsanger", but that doesn't seem to fix the problem.

Appreciate you giving this some thought!

ADD REPLY • link written 8 months ago by lhp_sf4dem • 30

Since 17.01, Galaxy understands fastq.gz files and won't try to decompress them on upload (unless you turn that behavior off), because some tools can operate on them directly. However, any tool that accepts fastq files should also be able to accept fastq.gz inputs, since Galaxy will automatically decompress them on execution.

Since fastqGroomer works for you, it seems that Galaxy is not setting your fastq format correctly (but that cannot be automated). I am puzzled how this would work differently on usegalaxy.org. Did you try this with the very same file?

ADD REPLY • link written 8 months ago by Martin Čech ♦♦ 4.9k

For anyone else having trouble with fastq data and the different formats/datatypes, the first few FAQs here cover current help/usage (for Galaxy 17.09 forward): https://galaxyproject.org/support/#getting-inputs-right

ADD REPLY • link written 8 months ago by Jennifer Hillman Jackson ♦ 25k

8 months ago by

Martin Čech ♦♦ 4.9k

United States

Martin Čech ♦♦ 4.9k wrote:

I think I understand it now. tldr: your inconsistencies should go away if you update your Galaxy to 17.09 or 18.01.

Since 17.09 the default fastq format has been fastqsanger, which is what all new data should have, and this is why you see different behavior on Main (which is on 18.01). The easiest way to handle this on your version of Galaxy is to properly set the datatype on upload (to fastqsanger or fastqsanger.gz). You can also change the datatype of already uploaded datasets using the 'pencil' icon menu.

You do not need to use the fastqGroomer tool (that is mostly just useful if you have non-fastqsanger data, which at this point would be ~8 years old).
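If you want to confirm that a file is already Sanger-encoded before setting its datatype by hand, a rough check of the quality characters can help. The following is only a minimal sketch, not part of Galaxy: the function names are made up for illustration, it assumes plain four-line fastq records (no wrapped sequence lines), and the thresholds are the usual ASCII ranges for Phred+33 versus the older +64 encodings. A file whose qualities are all very high cannot be told apart with certainty, hence the "probably" in the output.

```python
import gzip


def open_fastq(path):
    """Open a fastq file as text, handling gzip compression transparently."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def guess_quality_encoding(path, max_records=10000):
    """Guess the quality offset from the range of quality characters seen.

    Hypothetical helper: assumes 4-line records and reads at most max_records.
    """
    lo, hi = 255, 0
    with open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_records:
                break
            if i % 4 == 3:  # every fourth line holds the quality string
                codes = [ord(c) for c in line.rstrip("\r\n")]
                if codes:
                    lo = min(lo, min(codes))
                    hi = max(hi, max(codes))
    if hi == 0:
        return "unknown (no quality lines read)"
    if lo < 59:
        # Characters below ASCII 59 cannot occur in the +64 encodings.
        return "Phred+33 (fastqsanger)"
    if lo >= 64:
        return "probably Phred+64 (fastqillumina)"
    return "ambiguous; possibly Solexa+64 (fastqsolexa)"


if __name__ == "__main__":
    import sys

    for fastq_path in sys.argv[1:]:
        print(fastq_path, "->", guess_quality_encoding(fastq_path))
```

Run it as a script with one or more fastq or fastq.gz paths as arguments. If it reports Phred+33, setting the datatype to fastqsanger should be enough and no grooming is needed.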

ADD COMMENT • link modified 8 months ago • written 8 months ago by Martin Čech ♦♦ 4.9k

8 months ago by

lhp_sf4dem • 30

lhp_sf4dem • 30 wrote:

That worked! I have a smallish fastq.gz file that I've successfully uploaded and analyzed in the past via the Main Galaxy instance at usegalaxy.org. Using this file as a test subject, I uploaded it to our private CloudMan Galaxy instance, setting the file type to "fastqsanger" in the upload tool interface (rather than "Autodetect"), and that seems to work. Interestingly, even though it's a .gz compressed file, if I set it to "fastqsanger.gz" rather than simply "fastqsanger", it doesn't work. It only works if I call the compressed file "fastqsanger". It unpacks almost immediately after appearing in the history.

Here is my next question (keeping in mind we're all a bunch of molecular biologists struggling to get this done while simultaneously looking to hire a bioinformatician): How does one update the version of Galaxy we're running? We've been following the instructions on the Galaxy CloudMan pages for initial setup and use, but I haven't found anything about updating that doesn't require some command-line coding. I'm capable of some of that, but don't want to get too far out ahead of myself.

REALLY appreciate your help, by the way!

ADD COMMENT • link written 8 months ago by lhp_sf4dem • 30

Sorry, I forgot you are on CloudMan. I think CloudMan instances are not easily updateable until the new CloudMan release is published, so setting the datatype properly is your best option for now. fastqsanger.gz is also only available since 17.09, so you have to make do with just fastqsanger. Galaxy will automatically unpack the archive for you though.
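To spell out the rule of thumb from this thread, here is a small sketch of which datatype to pick in the upload dialog for Sanger-encoded reads. It is only an illustration of the advice above, not a Galaxy API: both function names are hypothetical, and the 17.09 cutoff is taken from the reply above.

```python
def parse_release(release):
    """Turn a release string such as '17.05' or 'release_17.05' into (17, 5)."""
    if release.startswith("release_"):
        release = release[len("release_"):]
    year, month = release.split(".")[:2]
    return int(year), int(month)


def upload_datatype(filename, galaxy_release):
    """Suggest the datatype to select on upload (hypothetical helper).

    Assumes the reads are Sanger-encoded (Phred+33), as current Illumina
    output is, and that fastqsanger.gz only exists from release 17.09 on.
    """
    compressed = filename.endswith(".gz")
    if compressed and parse_release(galaxy_release) >= (17, 9):
        return "fastqsanger.gz"
    # On older releases a .gz upload is unpacked, so plain fastqsanger applies.
    return "fastqsanger"


if __name__ == "__main__":
    print(upload_datatype("reads.fastq.gz", "release_17.05"))
    print(upload_datatype("reads.fastq.gz", "17.09"))
```

For the 17.05 instance described in this thread, the suggestion for a .gz upload is plain fastqsanger, which matches what was found to work.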

ADD REPLY • link written 8 months ago by Martin Čech ♦♦ 4.9k

I just learned that you can use the latest GVL image instead of the CloudMan and it has Galaxy 17.09 included - https://launch.usegalaxy.org/catalog/appliance/genomics-virtual-lab

ADD REPLY • link written 8 months ago by Martin Čech ♦♦ 4.9k

8 months ago by

lhp_sf4dem • 30

lhp_sf4dem • 30 wrote:

I gave this a try and was initially impressed by the interface and excited to use the latest release of Galaxy; however, after getting it set up, I immediately encountered an error "right out of the box". The Cluster info log is reporting over and over the following error:

R_COMM channel basic get exception: [Errno 32] Broken pipe

This seems to be preventing me from spooling up and using a spot instance as a worker node. The main Galaxy node is running, but that's it.

ADD COMMENT • link written 8 months ago by lhp_sf4dem • 30

I should have been more specific... We gave "GVL" a try and it seems to have some hangups right out of the box...

ADD REPLY • link written 8 months ago by lhp_sf4dem • 30

Thanks for reporting this, I made an issue for it at https://github.com/galaxyproject/cloudlaunch/issues/140 so the responsible people can analyse it. Feel free to add any details you know there.

ADD REPLY • link written 8 months ago by Martin Čech ♦♦ 4.9k

Have replied at the github link but just posting an update here. Initially, we thought that this was a firewall issue, and indeed, there was a slight misconfiguration of the firewall, but that hasn't fixed the problem. It's a rabbitmq connectivity issue, but why it's being triggered is not clear. Still looking into this and will report back.

ADD REPLY • link modified 8 months ago • written 8 months ago by Nuwan Goonasekera • 10

This should be fixed now. There's a bit more info in the github issue (https://github.com/galaxyproject/cloudlaunch/issues/140 ).

ADD REPLY • link written 8 months ago by Enis Afgan • 690
