Question: Beginner question: Finding the right tool
0
3.4 years ago by
jrs202010
United States
jrs202010 wrote:

I'm a complete beginner at using Galaxy, and was hoping that someone could help me find the right tool(s) to find out how many gene locations are contained in a dataset I have.

(Some truth in advertising up front: I'm asking because I'm taking an online course on Galaxy, and this question is from a quiz I recently completed. I got the question wrong, and although I know the right answer now, I don't know how to get the right answer. I'd ask my fellow course participants/instructors, but there is no way to do so without the answer being visible to all students, many of whom have not yet taken the quiz.)

The dataset I have is a list of ~1700 transcripts from a human X chromosome. I'm trying to find out how many gene locations are contained within those transcripts. I downloaded the UCSC data for the hg19 X chromosome, and I've tried various combinations of the Group/Join/Intersect tools, but I don't come anywhere near the right number.

Can anyone help me identify the right set of tools to determine, given a set of transcripts from a known organism/location, how many known genes are contained within that dataset?

Thanks in advance for any assistance.

 

question tool beginner • 1.2k views
modified 3.4 years ago • written 3.4 years ago by jrs2020

Intersect is probably the correct tool, but to give more details we need to know more about your file formats and what information you have for your datasets.
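
To make the idea behind an interval intersect concrete, here is a minimal Python sketch. It is not the Galaxy Intersect tool's implementation; it assumes BED-style half-open intervals given as hypothetical `(chrom, start, end, name)` tuples, and the function name `count_overlapped_genes` is made up for illustration.

```python
from typing import Iterable, Tuple

# Assumed record layout (hypothetical): (chrom, start, end, name),
# with BED-style half-open coordinates [start, end).
Interval = Tuple[str, int, int, str]


def count_overlapped_genes(transcripts: Iterable[Interval],
                           genes: Iterable[Interval]) -> int:
    """Count distinct gene names whose interval overlaps at least one transcript."""
    transcripts = list(transcripts)
    hit = set()
    for g_chrom, g_start, g_end, g_name in genes:
        for t_chrom, t_start, t_end, _ in transcripts:
            # Two half-open intervals overlap when each starts before the other ends.
            if g_chrom == t_chrom and g_start < t_end and t_start < g_end:
                hit.add(g_name)
                break
    return len(hit)
```

This brute-force comparison is quadratic, which is fine for a few thousand records on one chromosome; a sorted sweep or an interval tree would be the usual choice for larger inputs.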

written 3.4 years ago by Bjoern Gruening
1
3.4 years ago by
jrs202010
United States
jrs202010 wrote:

It's a course through Coursera, titled "Genomic Data Science with Galaxy"

written 3.4 years ago by jrs2020
0
3.4 years ago by
jrs202010
United States
jrs202010 wrote:

Thanks for the offer to help.

The original dataset is located here

I retried Intersect (both with and without the 'pieces' option, and with the test dataset as either the first or the second data entry; the hg19 dataset is the second). Here are my attempts: https://usegalaxy.org/u/jrs2020/h/request-for-assistance

I've been told that the 'right answer' is that the test dataset should have ~1500 human genes. But none of my intersect options are close to that number.

Thanks again for any assistance.

written 3.4 years ago by jrs2020
0
3.4 years ago by
jrs202010
United States
jrs202010 wrote:

Never mind, I found the answer. I was way overthinking it; all I needed to do was group and count by the name of the transcript in the original data file.

Thanks for the assistance though.
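
For anyone who wants to check a group-and-count result outside Galaxy, here is a rough Python sketch of the same idea. It assumes a tab-separated, BED-like file with the name in the fourth column; the thread does not say which format or column the dataset actually uses, so `name_col` and the function name `count_by_name` are placeholders.

```python
from collections import Counter
from typing import Iterable


def count_by_name(lines: Iterable[str], name_col: int = 3) -> Counter:
    """Group tab-separated rows by the value in name_col and count rows per group.

    name_col is zero-based; 3 is the BED 'name' column (an assumption here).
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) > name_col:
            counts[fields[name_col]] += 1
    return counts
```

With a real file this could be called as `count_by_name(open("transcripts.bed"))`, where the filename is hypothetical; `len(...)` of the result is then the number of distinct names, which is the figure a group-and-count step reports as its number of output rows.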

written 3.4 years ago by jrs2020

Cool, great that you figured it out! Just out of interest, which course are you attending?

written 3.4 years ago by Bjoern Gruening