Bug in Galaxy Dataset preview for large data

2.5 years ago, joanna • 70 (United Kingdom) wrote:

Hello Galaxy Developers.

We have come across an issue that is quite significant to our users.

The Preview mode chunks are duplicating rows when viewing large datasets.

We see duplicate lines in the preview that are not in the dataset when it is downloaded or used as input to other tools. They seem to arise because the chunks used to load large data into the dataset preview overlap by one line (the last line of one chunk is the first line of the next).
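To illustrate the suspected mechanism, here is a minimal Python sketch of a chunked reader. It is not Galaxy's code: `read_chunks`, `render_preview`, and the `overlap` parameter are hypothetical names, used only to show how a one-line overlap between consecutive chunks would duplicate rows once the chunks are concatenated for display.

```python
def read_chunks(lines, chunk_size, overlap=0):
    """Yield consecutive chunks of `lines`.

    `overlap` is a hypothetical knob: each new chunk starts `overlap`
    lines before the previous chunk ended.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    start = 0
    while start < len(lines):
        end = min(start + chunk_size, len(lines))
        yield lines[start:end]
        if end == len(lines):
            break
        start = end - overlap


def render_preview(lines, chunk_size, overlap=0):
    """Concatenate chunks the way a naive preview pane might."""
    shown = []
    for chunk in read_chunks(lines, chunk_size, overlap):
        shown.extend(chunk)
    return shown


data = [f"row{i}" for i in range(10)]
correct = render_preview(data, chunk_size=4)            # no overlap
buggy = render_preview(data, chunk_size=4, overlap=1)   # off-by-one overlap

print(len(correct), len(buggy))
```

With no overlap the preview matches the data exactly. With an overlap of one, the last line of every chunk except the final one appears twice, which matches the symptom described above.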

I have an example file that reliably reproduces this issue if you require it. We have tried this on our own instances and on usegalaxy and found the same issue.

I'd appreciate any information you may have on this issue.

Kind regards, Jo

software error galaxy • 812 views

modified 2.4 years ago by manelybelton6495 • 0 • written 2.5 years ago by joanna • 70

It appears that the size of large files increases when they are uploaded to Galaxy. Do you also see a difference in file size compared with the original upload?

written 2.5 years ago by rob.costa1234 • 10

Hello,

Would you please share a link to the history that contains the example where display is overlapping/problematic? Send to galaxy-bugs@list.galaxyproject.org, from the same email used for your account, include the dataset number and a link to this post, and please be sure that the data is in an active state (not deleted) or it cannot be fully viewed/tested.

Thanks and we will investigate. Jen, Galaxy team

written 2.5 years ago by Jennifer Hillman Jackson ♦ 25k

Thank you. I have sent this information to you.

Many thanks. Jo

written 2.5 years ago by joanna • 70

Hello Jennifer,

I was wondering if you could confirm that you received the mail I sent containing the details you requested?

Many thanks Jo

written 2.4 years ago by joanna • 70

I'm having trouble finding the email in the galaxy-bugs internal list, could you resend it or send to me directly? I'd like to take a look at this one.

written 2.4 years ago by Dannon Baker ♦ 3.7k

I didn't look specifically at the size of the file, but rather at the number of lines. My file in Galaxy was correct (the downloaded copy was identical to the source); however, the display showed more lines than the metadata stated were present.

written 2.5 years ago by joanna • 70

Joanna, the number of lines in the preview is an estimate. Counting lines in very large files is expensive, so Galaxy guesses the number of lines. For small files it should be accurate.
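For readers wondering what such a guess can look like, one common approach is to count newlines in a sample from the start of the file and scale by the total file size. The sketch below is only an illustration under that assumption, not Galaxy's actual implementation; `estimate_line_count` and `sample_bytes` are hypothetical names.

```python
import os


def estimate_line_count(path, sample_bytes=1024 * 1024):
    """Estimate the number of lines by sampling the start of the file."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    newlines = sample.count(b"\n")
    if len(sample) == size:
        # The whole file fit in the sample, so the count is exact.
        # Add one if the final line has no trailing newline.
        return newlines + (0 if sample.endswith(b"\n") else 1)
    if newlines == 0:
        # No line break seen in the sample; report at least one line.
        return 1
    # Scale the sampled newline density up to the full file size.
    return round(size * newlines / len(sample))
```

An estimate like this is exact for files smaller than the sample and approximate otherwise, and it drifts when line lengths vary a lot across the file.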

written 2.5 years ago by Bjoern Gruening ♦ 5.1k

Hi Bjoern, we found that the line count shown was consistent with our input files, so the estimate itself was fine. The problem is that some lines are duplicated in the display, so copying and pasting the file from the preview does not produce a reliable copy of the data. It displays more lines than are truly there.
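One way to confirm this kind of discrepancy is to compare the text copied from the preview with the downloaded file and list the lines that occur more often in the preview. This is a rough sketch; `extra_lines` is a hypothetical helper, and the sample rows are made up for illustration.

```python
from collections import Counter


def extra_lines(preview_lines, source_lines):
    """Return lines that appear more often in the preview than in the source,
    mapped to how many surplus copies were found."""
    surplus = Counter(preview_lines) - Counter(source_lines)
    return dict(surplus)


source = [f"row{i}" for i in range(10)]
# Made-up preview in which two rows are shown twice.
preview = source[:4] + source[3:7] + source[6:]

print(extra_lines(preview, source))
```

Note that this only counts surplus copies. If the source legitimately contains repeated lines, they are not flagged unless the preview shows more copies of them than the source does.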

ADD REPLY • link written 2.5 years ago by joanna • 70
