Question: Compare two datasets issue
written 23 months ago by s1579899

Hi,

I'm trying to use the compare datasets tool and can't get it to work.

My first file was uploaded in .fasta format. I converted it to tabular and then split the name column into multiple columns. All my fasta sequences are named ">hsa-circ-GENE_NAME-antisense.1", so I used the "convert" tool under text manipulation to convert "-" (dashes) to tabs, which resulted in a 6-column tabular file with column 3 being the gene name.

The second file is a list of genes whose presence I want to check for in my first file. This is a .txt file, which I uploaded and then converted to tabular form with the same convert tool (changing whitespace to tabs).

Then I went to the compare datasets tool and tried to compare column 3 of file 1 to column 1 of file 2, but it doesn't return genes from file 1 that I know are present in both datasets. For some reason it only returns genes with very short names (e.g. F2), and even then the list is very short.
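For reference, the comparison described above can be reproduced outside Galaxy with a short script. This is only a minimal sketch, not how the compare datasets tool is implemented: the function names are hypothetical, the gene names other than F2 are made-up examples, and it assumes headers follow the ">hsa-circ-GENE_NAME-antisense.1" pattern. It also shows one thing that may be worth checking: a gene name that itself contains a dash is cut apart by the dash-to-tab conversion, so column 3 no longer holds the full name.

```python
def gene_from_header(header):
    """Return the third dash-separated field of a FASTA header (hypothetical helper).

    Mirrors converting "-" to tabs and then reading column 3.
    """
    fields = header.lstrip(">").strip().split("-")
    return fields[2] if len(fields) > 2 else None


def matching_headers(headers, gene_list):
    """Keep headers whose column-3 value appears in the gene list."""
    # Strip whitespace (including stray carriage returns) from the gene list.
    wanted = {gene.strip() for gene in gene_list if gene.strip()}
    return [h for h in headers if gene_from_header(h) in wanted]


# Illustrative data only; these are assumptions, not the poster's files.
headers = [
    ">hsa-circ-F2-antisense.1",
    ">hsa-circ-BRCA1-antisense.1",
    ">hsa-circ-HLA-DRB1-antisense.1",  # gene name contains a dash
]
genes = ["F2", "BRCA1\r", "HLA-DRB1"]

print(matching_headers(headers, genes))
print(gene_from_header(">hsa-circ-HLA-DRB1-antisense.1"))  # only "HLA" survives
```

If the locally computed matches differ from what Galaxy returns for the same inputs, that difference would be useful to include in a report.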

I would very much appreciate some help with this!


Are the values in the fields being compared identical? If so, I am wondering if the longer name is problematic.
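One way to check whether two values are truly identical is to look at them with invisible characters made visible. The snippet below is a rough sketch under assumptions: `describe_mismatch` is a hypothetical helper, and the example values are invented. It uses Python's `repr` to expose trailing spaces or carriage returns, which can be left behind when a .txt file is uploaded and converted.

```python
def describe_mismatch(a, b):
    """Classify how two field values differ (hypothetical helper)."""
    if a == b:
        return "identical"
    if a.strip() == b.strip():
        return "differ only in surrounding whitespace or line endings"
    if a.strip().lower() == b.strip().lower():
        return "differ in letter case"
    return "different"


# Invented example values; the second one carries a Windows-style carriage return.
value_from_file1 = "BRCA1"
value_from_file2 = "BRCA1\r"

# repr() shows characters that are invisible when the value is printed normally.
print(repr(value_from_file1), repr(value_from_file2))
print(describe_mismatch(value_from_file1, value_from_file2))
```

Values that look the same on screen but are not reported as "identical" here would not be treated as a match by an exact comparison.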

Where are you using Galaxy? http://usegalaxy.org? If not, can you reproduce the odd result there? This would allow a bug report to be sent in so we can check all the inputs and the tool itself to eliminate a bug.

How to report an issue: https://wiki.galaxyproject.org/Support#Reporting_tool_errors

Be sure to leave all datasets undeleted, including the intermediate datasets used for the file conversions. Also please include a link to this post to make it easier to link the two.

Sorry you are having problems, but we can review the bug report and provide more feedback. Thanks, Jen, Galaxy

Thanks for the extra info! Jen, Galaxy team

written 23 months ago by Jennifer Hillman Jackson

Hi Jen,

Thank you for getting back to me.

There are many (probably thousands) of gene names in the two columns that do not match, but the ones that should match are identical.

I'm using usegalaxy.org, but I can't find the report bug icon anywhere. Could you point me in the right direction? (The tool ran without any errors popping up; as outlined above, the issue is with the output.)

Kind Regards, James

written 23 months ago by s1579899

Hi James,

Thank you for clarifying. No bug icon will be present for a successful job.

Would you be able to share a few lines of each dataset that you believe should be a match here? That way both our team and other community members can contribute to helping over the holiday.

Please preserve formatting by either quoting the text or using a share link (a Gist is one example). Also please add this information as a comment to your post (instead of a reply) to preserve open status until resolved.

Thanks, Jen, Galaxy team

written 23 months ago by Jennifer Hillman Jackson