Question: Fastq.gz files in shared Data Libraries
rudigarude (United Kingdom) wrote, 3.4 years ago:

Hi,

I run a local production instance of Galaxy at our institution. After our Illumina sequencer produces the BCL files, we run bcl2fastq and put the output into our storage archive. I want to add these raw fastq.gz files to Galaxy as data libraries without copying them into Galaxy. So far this works fine: I mount the storage archive on the Galaxy server and add the directories as datasets from filesystem paths.

The issue is that bcl2fastq writes fastq.gz files, and when they are added to a data library this way, file type auto-detection assigns them the format 'data'. I don't want Galaxy to uncompress anything, as it does with a standard upload, and a lot of tools can work with fastq.gz files directly (e.g. FastQC and Trinity). However, since the files are assigned the format 'data', no tool accepts them, and there is no fastq.gz file type in datatypes_conf.xml.

Has anyone dealt with a similar issue with fastq.gz files in data libraries, or does anyone have advice on how to integrate them more easily?

Many thanks,

Martin

Hotz, Hans-Rudolf (Switzerland) wrote, 3.4 years ago:

Hi Martin

Have you tried setting "File Format:" to 'fastq' or 'fastqsanger' instead of 'Auto-detect'?

We upload gzipped fastq files like this (with 'Upload files from filesystem paths' and 'Link to files without copying into Galaxy'). The files then appear with "Data type: fastq" and "Peek: gzipped file", and we can use them in tools that accept gzipped fastq files.
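
Before linking files this way, it can be worth checking that a path really is a gzipped FASTQ file, since setting the format by hand bypasses auto-detection. The following is a minimal Python sketch using only the standard library, not anything Galaxy provides; the function names `is_gzipped` and `looks_like_fastq_gz` are hypothetical, and the check only inspects the first record.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def looks_like_fastq_gz(path):
    """Return True if the file is gzipped and its first record looks like FASTQ.

    Only the first four lines are read, so the file is never fully decompressed.
    """
    if not is_gzipped(path):
        return False
    with gzip.open(path, "rt", errors="replace") as handle:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
    header, sequence, separator, quality = record
    return (
        header.startswith("@")          # FASTQ header line
        and separator.startswith("+")   # separator line
        and len(sequence) > 0
        and len(sequence) == len(quality)  # one quality character per base
    )
```

Running `looks_like_fastq_gz` over the archive directory before adding it to a data library would flag files that are not what the chosen "File Format:" claims.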

 

Hope this helps, Hans-Rudolf

 


Hi Hans-Rudolf,

I did try that, but it didn't quite have the desired effect. It worked for tools that accept fastq.gz, but it misleads users into thinking the files will also work with tools that only accept uncompressed fastq files.

So, after playing with it a bit, I found 'archive_datatypes' in the Tool Shed, which adds the fastq.gz datatype. That allows auto-detect to work fine. I've now written a wrapper for gunzip so that the files can be uncompressed for tools that only work with uncompressed fastq files, and I've added fastq.gz as an accepted file type for the few tools that work with the compressed version.
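
For readers wondering what the decompression step of such a wrapper might look like, here is a minimal Python sketch. It is not the wrapper described above (that one calls gunzip, and its code is not shown in this thread); the function name `gunzip_fastq` and its parameters are hypothetical, and the Galaxy tool XML that would call it is left out.

```python
import gzip
import shutil


def gunzip_fastq(src_path, dest_path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file to dest_path.

    The data is copied in chunks, so large fastq.gz files are never
    held in memory in full. The source file is left untouched, which
    matters when it is only linked into Galaxy from a storage archive.
    """
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)
```

A tool wrapper would call something like `gunzip_fastq(input_path, output_path)` and declare the output as uncompressed fastq, so downstream tools that cannot read gzip see a plain file.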

I think that gives a half-decent solution to my problem at this point.

Cheers,

Martin

 

Written 3.4 years ago by rudigarude

We are basically doing the same thing on our instance. The issue, of course, is that with each tool update you have to add the fastq.gz support again.

I'm not sure if it helps, but upvoting this ticket might highlight the need for this: https://trello.com/c/3RkTDnIn/345-666-support-gzipped-gz-compressed-versions-of-standard-datatypes

 

Written 3.4 years ago by Jelle Scholtalbers

Great idea, I've upvoted.
 

Written 3.4 years ago by rudigarude