Question: Loss of Columns in Comparison of Tabular Datasets
joanna70 (United Kingdom) wrote, 2.4 years ago:

We have observed that when a row in a tabular dataset ends with whitespace, and that dataset is used in the "Compare two datasets" tool, any trailing columns that contain whitespace or an empty string in the first row disappear from every row in the output.

For example, if we compare the following dataset against itself, keeping all the rows in the first dataset...

line 1  A   B   C                   
line 2  A   B   C   D               
line 3  A   B   C   D   E   F   G   H
line 4  A   B   C   D   E   F   G   H

The result we see is:

line 1  A   B   C
line 2  A   B   C
line 3  A   B   C
line 4  A   B   C
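One mechanism that would produce exactly this output is sketched below. It is purely an assumption for illustration and is not Galaxy's actual code: suppose the tool infers the column count from the first row after stripping its trailing whitespace, then truncates every row to that width. The sketch also assumes the file is tab-delimited and that the trailing "whitespace" is really a run of empty tab-separated fields. The function names are hypothetical.

```python
# Hypothetical sketch: NOT Galaxy's code. Assumes a tab-delimited file in which
# the trailing "whitespace" is really a run of empty tab-separated fields.


def width_from_first_row(lines):
    """Infer the column count from the first row (hypothetical behaviour)."""
    # str.rstrip() with no argument removes trailing tabs as well as spaces,
    # so the empty trailing fields of the first row are lost here.
    return len(lines[0].rstrip().split("\t"))


def truncate_to_first_row_width(lines):
    """Cut every row to the width inferred from the first row."""
    width = width_from_first_row(lines)
    return [line.rstrip("\r\n").split("\t")[:width] for line in lines]


def split_preserving_empty(lines):
    """Strip only the line terminator so trailing empty fields survive."""
    return [line.rstrip("\r\n").split("\t") for line in lines]


rows = [
    "line 1\tA\tB\tC\t\t\t\t\t\n",
    "line 2\tA\tB\tC\tD\t\t\t\t\n",
    "line 3\tA\tB\tC\tD\tE\tF\tG\tH\n",
    "line 4\tA\tB\tC\tD\tE\tF\tG\tH\n",
]

for fields in truncate_to_first_row_width(rows):
    print("\t".join(fields))  # every row is cut to 4 fields: line N, A, B, C

for fields in split_preserving_empty(rows):
    print(len(fields))  # 9 fields on every row
```

Under these assumptions, the first loop reproduces the reported result, including the loss of column D from line 2, while splitting on tabs after removing only the line terminator keeps all nine fields on every row.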

We would like to be able to join and compare datasets that contain empty strings, as in this example. Please let me know if you need any more information about the bug we are seeing.

Many thanks,
Jo

Jennifer Hillman Jackson (United States) wrote, 2.4 years ago:

Hello,

The problem might be with the tabular format of the input dataset. I suspect that the affected lines do not have the expected column assignments (some lines may have had data values grouped together during upload).

Click the "eye" icon to view how the columns are interpreted. Lines with unequal column types or counts can be problematic, as can trailing spaces that were "grouped" with other data values during upload.
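For files too large to inspect by eye, a similar check can be scripted. The sketch below is an illustration under stated assumptions, not a Galaxy feature: it assumes the dataset is tab-delimited, counts the fields on each line, and flags lines that end in whitespace. The helper `audit_columns` and the sample rows are hypothetical.

```python
from collections import Counter


def audit_columns(lines):
    """Return per-line field counts and the line numbers that end in whitespace.

    Hypothetical helper; assumes the dataset is tab-delimited.
    """
    counts = []
    trailing = []
    for lineno, raw in enumerate(lines, start=1):
        row = raw.rstrip("\r\n")  # remove only the line terminator
        counts.append(len(row.split("\t")))
        if row != row.rstrip():  # ends in spaces or tabs (empty fields)
            trailing.append(lineno)
    return counts, trailing


sample = [
    "line 1\tA\tB\tC\t\t\t\t\t\n",  # 5 empty trailing fields
    "line 2\tA\tB\tC\tD \n",  # trailing space, only 5 fields
    "line 3\tA\tB\tC\tD\tE\tF\tG\tH\n",
]

counts, trailing = audit_columns(sample)
print("field counts per line:", counts)
print("distinct widths:", dict(Counter(counts)))
print("lines ending in whitespace:", trailing)
if len(set(counts)) > 1:
    print("warning: lines have unequal column counts")
```

To check a real dataset, the list of strings can be replaced by an open file handle, since iterating over a text file yields its lines with their terminators.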

There is some unexpected behavior in the upload tool with data that has a variable column count. I opened a ticket for review, which can be followed to resolution: https://github.com/galaxyproject/galaxy/issues/2602

Thanks for reporting the problem.
Jen, Galaxy team


Hello and thank you.

I'll have a look at this ticket.

To add more information: when we view the first dataset above with the eye icon, the display is correct (it shows none of the disruption caused by unequal column counts in tabular format).

We have also seen this with tabular datasets produced by our internal tools, so the problem does not seem to be limited to uploaded datasets.

Kind regards

(Reply by joanna70, 2.4 years ago)

Thanks for the feedback. I'll add a link to this post to the ticket so whoever works on it has all the extra information.
Jen

(Reply by Jennifer Hillman Jackson, 2.4 years ago)
Powered by Biostar version 16.09