Question: Loss of Columns in Comparison of Tabular Datasets
19 months ago, joanna (United Kingdom) wrote:

We have observed that when there is whitespace at the end of a row in a tabular dataset, and that dataset is used as input to the "Compare two Datasets" tool, any trailing columns that contained whitespace or an empty string in the first row disappear from every row in the output.

For example, if we compare the following dataset against itself, keeping all the rows in the first dataset...

line 1  A   B   C                   
line 2  A   B   C   D               
line 3  A   B   C   D   E   F   G   H
line 4  A   B   C   D   E   F   G   H

The result we see is:

line 1  A   B   C
line 2  A   B   C
line 3  A   B   C
line 4  A   B   C
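
To make the symptom concrete, here is a small Python model of the behaviour described above. It is only a sketch that reproduces what we observe, not Galaxy's actual code, and the function name `model_observed_output` is made up for illustration. It assumes the columns are tab-separated and that the padding in lines 1 and 2 consists of empty fields.

```python
def model_observed_output(rows):
    """Truncate every row to the first row's width, where that width
    ignores the first row's trailing empty or whitespace-only fields.
    This mirrors the observed symptom only (an assumption, not Galaxy code)."""
    first = list(rows[0])
    while first and first[-1].strip() == "":
        first.pop()
    width = len(first)
    return [row[:width] for row in rows]


# The example dataset, with trailing empty fields assumed on lines 1 and 2.
dataset = [
    ["line 1", "A", "B", "C", "", "", "", "", ""],
    ["line 2", "A", "B", "C", "D", "", "", "", ""],
    ["line 3", "A", "B", "C", "D", "E", "F", "G", "H"],
    ["line 4", "A", "B", "C", "D", "E", "F", "G", "H"],
]

for row in model_observed_output(dataset):
    print("\t".join(row))
```

Every row comes out with only the first four fields, matching the result shown above.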

We would like to be able to join and compare datasets that contain empty strings, as in this example. Please let me know if you need any more information about the bug we are seeing.

Many thanks, Jo

19 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

The problem might be with the tabular format of the input dataset. I suspect that the skipped lines do not have the same column assignments as expected (some lines have data that were grouped during upload).

Click on the "eye" icon to visualize how the columns are interpreted. Lines of unequal column type/counts can be problematic. Trailing spaces that are "grouped" during upload with other/all data values can be problematic.
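
Outside of Galaxy, the same check can be done on a downloaded copy of the file. The following is a minimal Python sketch, assuming the file is tab-delimited; the function name `column_report` and the sample lines are made up for illustration. For each line it reports how many fields there are and how many of the trailing fields are empty or whitespace-only.

```python
def column_report(lines):
    """Return one (field_count, trailing_empty_count) tuple per line.
    Lines are split on tabs only, so spaces inside a field are preserved."""
    report = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        trailing = 0
        # Walk backwards until the first field with visible content.
        for field in reversed(fields):
            if field.strip():
                break
            trailing += 1
        report.append((len(fields), trailing))
    return report


# Hypothetical sample lines modelled on the dataset in the question.
sample = [
    "line 1\tA\tB\tC\t\t\t\t\t\n",
    "line 3\tA\tB\tC\tD\tE\tF\tG\tH\n",
]

for number, (count, trailing) in enumerate(column_report(sample), start=1):
    print(f"row {number}: {count} fields, {trailing} trailing empty")
```

To check a real file, pass an open file handle instead of `sample`. Rows whose field counts differ, or whose trailing fields are empty, are the ones worth comparing against the "eye" view.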

There is some unexpected behavior with the upload tool and data of variable column count. I opened a ticket for that issue, which can be followed to resolution: https://github.com/galaxyproject/galaxy/issues/2602

Thanks for reporting the problem. Jen, Galaxy team


Hello and thank you.

I'll have a look at this ticket.

To add more information: when viewing the first dataset above with the eye icon, the display is correct (it doesn't show the disruption that unequal column counts cause in tabular format).

We have also seen this with datasets produced by tools that we use internally that produce tabular files, so it doesn't seem isolated to datasets that have been uploaded.

Kind regards

Reply written 19 months ago by joanna

Thanks for the feedback. I'll add the link to the post to the ticket so whoever works on it has all the extra info. Jen

Reply written 19 months ago by Jennifer Hillman Jackson