Question: Bug in Galaxy Dataset preview for large data
18 months ago by
joanna70
United Kingdom
joanna70 wrote:

Hello Galaxy Developers.

We have come across an issue that is quite significant to our users.

Preview mode duplicates rows when displaying large datasets.

We see duplicate lines in the preview that are not in the dataset when it is downloaded or used as input to other tools. They seem to arise because the chunks used to load large data into the dataset preview overlap by one line (the last line of one chunk is the first line of the next).
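To make the suspected overlap concrete, here is a minimal sketch of how an off-by-one in chunk boundaries would produce exactly this symptom. This is not Galaxy's code: the function names `read_chunks_overlapping` and `read_chunks_disjoint` and the in-memory list of rows are hypothetical, chosen only to illustrate the behaviour.

```python
def read_chunks_overlapping(lines, chunk_size):
    """Hypothetical pager with an off-by-one: each chunk also includes
    the line that the next chunk starts on."""
    chunks = []
    start = 0
    while start < len(lines):
        end = start + chunk_size
        # Off-by-one: the slice runs one line past the intended boundary,
        # but the next chunk still starts at `end`.
        chunks.append(lines[start:end + 1])
        start = end
    return chunks


def read_chunks_disjoint(lines, chunk_size):
    """Same pager with half-open boundaries, so no line is repeated."""
    return [lines[start:start + chunk_size]
            for start in range(0, len(lines), chunk_size)]


def flatten(chunks):
    """Concatenate chunks in order, as a preview would display them."""
    return [line for chunk in chunks for line in chunk]


if __name__ == "__main__":
    rows = [f"row{i}" for i in range(10)]
    shown = flatten(read_chunks_overlapping(rows, 4))
    duplicated = sorted({r for r in shown if shown.count(r) > 1})
    print(len(rows), "rows in the dataset,", len(shown), "rows displayed")
    print("duplicated at chunk boundaries:", duplicated)
```

In this sketch the underlying data is never modified, which matches the observation that the downloaded file is correct and only the displayed rows are affected.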

I have an example file that reliably reproduces this issue if you require it. We have tried this on our own instances and on usegalaxy and found the same issue.

I'd appreciate any information you may have on this issue. Kind regards Jo

software error galaxy • 467 views
modified 17 months ago by manelybelton64950 • written 18 months ago by joanna70

It appears that the size of large files increases when they are uploaded to Galaxy. Do you also see a difference in file size compared with the original upload?

written 18 months ago by rob.costa123410

Hello,

Would you please share a link to the history that contains the example where display is overlapping/problematic? Send to galaxy-bugs@list.galaxyproject.org, from the same email used for your account, include the dataset number and a link to this post, and please be sure that the data is in an active state (not deleted) or it cannot be fully viewed/tested.

Thanks and we will investigate. Jen, Galaxy team

written 18 months ago by Jennifer Hillman Jackson23k

Thank you. I have sent this information to you.

Many thanks. Jo

written 18 months ago by joanna70

Hello Jennifer,

I was wondering if you could confirm that you received the mail I sent containing the details you requested?

Many thanks Jo

written 18 months ago by joanna70

I'm having trouble finding the email in the galaxy-bugs internal list. Could you resend it, or send it to me directly? I'd like to take a look at this one.

written 18 months ago by Dannon Baker3.7k

I didn't look specifically at the size of the file, but rather at the number of lines. My file in Galaxy was correct (the downloaded copy was identical to the source); however, the display showed more lines than the metadata stated were present.

written 18 months ago by joanna70

Joanna, the number of lines in the preview is an estimate. Counting lines in very large files is expensive, so Galaxy guesses the number of lines. For small files it should be accurate.

written 18 months ago by Bjoern Gruening4.8k
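As an illustration of why an estimated line count can be cheap yet still exact for small files, here is a minimal sketch that extrapolates from a sample at the start of the file. This is an assumption about how such an estimate could work, not Galaxy's actual implementation; `estimate_line_count` and its `sample_bytes` parameter are hypothetical names.

```python
import os


def estimate_line_count(path, sample_bytes=1024 * 1024):
    """Estimate the number of lines in a file by counting newlines in a
    leading sample and scaling up by the file size (hypothetical helper)."""
    total_size = os.path.getsize(path)
    if total_size == 0:
        return 0
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    newlines = sample.count(b"\n")
    if len(sample) == total_size:
        # The whole file fit in the sample, so the count is exact;
        # add one for a final line that lacks a trailing newline.
        return newlines + (0 if sample.endswith(b"\n") else 1)
    # Otherwise assume the rest of the file has the same newline density.
    return round(newlines * total_size / len(sample))
```

An estimate like this only affects the reported line count. It would not by itself explain rows being repeated in the displayed text.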

Hi Bjoern, we found that the line count shown was consistent with our input files, so this estimate was fine. The problem is that some lines are duplicated in the display, so copying and pasting the file from the preview does not produce a reliable copy of the data. It displays more lines than are truly there.

written 18 months ago by joanna70
18 months ago by
Dannon Baker3.7k
United States
Dannon Baker3.7k wrote:

This was definitely a Galaxy bug that should be resolved with https://github.com/galaxyproject/galaxy/pull/2527/.

Thanks for reporting it, and for your patience.

modified 18 months ago • written 18 months ago by Dannon Baker3.7k