Beginner question: Finding the right tool

3.4 years ago by

jrs2020 • 10

United States

jrs2020 • 10 wrote:

I'm a complete beginner at using Galaxy, and was hoping that someone could help me find the right tool(s) to find out how many gene locations are contained in a dataset I have.

(Some truth in advertising up front: I'm asking because I'm taking an online course on Galaxy, and this question is from a quiz I recently completed. I got the question wrong, and although I know the right answer now, I don't know how to get the right answer. I'd ask my fellow course participants/instructors, but there is no way to do so without the answer being visible to all students, many of whom have not yet taken the quiz.)

The dataset I have is a list of ~1700 transcripts from a human X chromosome. I'm trying to find out how many gene locations are contained within those transcripts. I downloaded the UCSC data for the hg19 X chromosome and have tried various combinations of the Group/Join/Intersect tools, but I don't come anywhere near the right number.

Can anyone help me identify the right set of tools to determine, when given a set of transcripts from a known organism/location, how many known genes are contained within that dataset?

Thanks in advance for any assistance.

question tool beginner • 1.2k views

modified 3.4 years ago • written 3.4 years ago by jrs2020 • 10

Intersect is probably the correct tool, but to give more detail we need to know more about your file formats and what information your datasets contain.

written 3.4 years ago by Bjoern Gruening ♦ 5.1k

jrs2020 • 10 wrote:

It's a course through Coursera, titled "Genomic Data Science with Galaxy"

written 3.4 years ago by jrs2020 • 10

jrs2020 • 10 wrote:

Thanks for the offer to help.

The original dataset is located here.

I retried Intersect, both with and without the 'pieces' option, and with the test dataset as either the first or the second input (the hg19 dataset being the other). Here are my attempts: https://usegalaxy.org/u/jrs2020/h/request-for-assistance

I've been told that the 'right answer' is that the test dataset should have ~1500 human genes. But none of my intersect options are close to that number.

Thanks again for any assistance.
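
For readers unfamiliar with interval intersection, the two modes mentioned above can be sketched roughly as follows. This is a simplified illustration, not Galaxy's implementation: the function name `intersect`, the `pieces` flag, and the example coordinates are all made up here, and intervals are assumed to be half-open `(chrom, start, end)` tuples.

```python
def intersect(first, second, pieces=False):
    """Return intervals of `first` that overlap any interval in `second`.

    Intervals are (chrom, start, end) tuples, assumed half-open.
    With pieces=False, each overlapping interval of `first` is returned whole.
    With pieces=True, only the overlapping portions are returned.
    This is a naive O(n*m) sketch for illustration only.
    """
    result = []
    for chrom, start, end in first:
        # Collect the overlapping portion with every matching interval.
        overlaps = [
            (max(start, s), min(end, e))
            for c, s, e in second
            if c == chrom and s < end and start < e
        ]
        if not overlaps:
            continue
        if pieces:
            result.extend((chrom, s, e) for s, e in overlaps)
        else:
            result.append((chrom, start, end))
    return result


# Hypothetical coordinates, not taken from the course dataset.
transcripts = [("chrX", 100, 200), ("chrX", 400, 500)]
genes = [("chrX", 150, 250)]

whole = intersect(transcripts, genes)
parts = intersect(transcripts, genes, pieces=True)
print(whole)
print(parts)
```

Note that the number of rows returned depends on which dataset is given first, which is one reason the counts from different Intersect runs can disagree.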

written 3.4 years ago by jrs2020 • 10

jrs2020 • 10 wrote:

Never mind, I found the answer. I was way overthinking it; the solution was simply to group and count by the name of the transcript in the original data file.

Thanks for the assistance though.
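
Outside Galaxy, the same group-and-count idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the thread does not say which column holds the name or what the file format is, so the helper `count_by_column`, the column index, and the example rows below are all hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_column(tsv_text, column=0):
    """Count rows per distinct value in one column of tab-separated text."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and comment/header lines starting with '#'.
        if not row or row[0].startswith("#"):
            continue
        counts[row[column]] += 1
    return counts


# Made-up rows (chrom, start, end, name); not the course dataset.
example = (
    "chrX\t100\t200\tGENE_A\n"
    "chrX\t150\t300\tGENE_A\n"
    "chrX\t500\t900\tGENE_B\n"
)

counts = count_by_column(example, column=3)
print(len(counts))  # number of distinct names in the chosen column
```

The number of distinct keys, `len(counts)`, corresponds to the number of groups that a group-by on that column would produce.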

written 3.4 years ago by jrs2020 • 10

Cool, great that you figured it out! Just out of interest, which course are you attending?

written 3.4 years ago by Bjoern Gruening ♦ 5.1k
