Question: Concatenate datasets command selects the wrong files
thomas.sorgeloos wrote, 6 months ago:

Dear Galaxy Team,

I am trying to concatenate four FASTQ files from different technical replicates using the Concatenate tool. The command was invoked successfully, but Galaxy reported that the concatenation would use the first file plus three copies of the fourth file. For example, if I try to concatenate files 166, 167, 168, and 169, Galaxy reports:

206: Concatenate datasets on data 169 and data 166.

The information icon confirms this behaviour:

Concatenate Dataset 166: RF-1A_S13_L001_R1_001.fastq
Select 169: RF-1A_S13_L004_R1_001.fastq
Select 169: RF-1A_S13_L004_R1_001.fastq
Select 169: RF-1A_S13_L004_R1_001.fastq

Does Galaxy actually concatenate the right files while reporting the wrong ones?

Many thanks for your help in this matter.

Thomas

Jennifer Hillman Jackson wrote, 6 months ago:

Hello,

Is there a reason why you are concatenating technical replicate fastq data? Replication is an advantage for many use cases. Examples are in the Galaxy tutorials: https://galaxyproject.org/learn/

If the goal is to simplify the data in order to run the analysis in batch, using Dataset Collections and/or Workflows is a better choice. How to do this is also covered in the tutorials linked above.

That said, I can reproduce the problem with the Concatenate tool. It doesn't always happen, but it happens often.

Selected inputs are not being consumed correctly by the tool. The choices flip back to the default selection when the form is submitted. In your case, four different inputs were selected, but only two of them ended up as actual inputs. The job name matches the inputs listed on the job details page, but not what was selected on the tool form before job submission (at least in my tests). The tool would normally fail if given duplicated inputs, so I am not quite sure what is going on yet.

In any case, the listed inputs on the "i" job details page should be the datasets input to the tool. This is a bug and the results may or may not contain all four inputs (I am still testing that part). I would consider this tool buggy and avoid it.
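Until that is sorted out, one way to check a given result yourself is to download the output and the four intended inputs and compare them locally. The sketch below is only an illustration under stated assumptions: it assumes the files are uncompressed and that a correct result is a plain byte-for-byte append of the inputs in the order listed. The function names and the file names in the usage comment are placeholders, not anything Galaxy provides.

```python
import hashlib


def sha256_of_files(paths, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the given files read back to back, in order."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as handle:
            # Read in chunks so large FASTQ files are never fully loaded into memory.
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
    return digest.hexdigest()


def output_matches_inputs(input_paths, output_path):
    """True if output_path is byte-for-byte the inputs appended in the listed order.

    Assumption: the expected result is a simple append in this exact order.
    """
    return sha256_of_files(input_paths) == sha256_of_files([output_path])


# Hypothetical usage with locally downloaded copies (placeholder file names):
# output_matches_inputs(
#     ["rep1.fastq", "rep2.fastq", "rep3.fastq", "rep4.fastq"],
#     "concatenated.fastq",
# )
```

A False result only says the output is not that exact append. It does not tell you which inputs were actually used, and it would also be False if the inputs were appended in a different order.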

This is a new problem. I'll test further, report it, and link an issue ticket back here once done. So far it impacts both versions of the Concatenate tool at https://usegalaxy.org, and possibly other tools.

As a workaround (if you really need or want to concatenate technical replicates or any other data), consider doing this work on a Galaxy mirror server running the earlier release: https://usegalaxy.eu or https://usegalaxy.org.au.

Thanks for reporting the problem! Jen, Galaxy team
