Question: How to run a multiple hypothesis test on genomic data in Galaxy
2.1 years ago, kmcbri12 wrote:

Hello!

I am trying to answer some very broad questions with some chromosomal genomic data. So far, I have created a table based on chromosome 12 of hg38 which includes percent coverage of exons by common SNPs, percent coverage of exons by flagged SNPs, and percent coverage of exons with OMIM allele variants. What I need to know is: Is there a tool I can use that will tell me if there is a common biological function of genes that have a high percentage of SNPs? And how do I get this information based on Gene Ontology terms?

The second part of my question: is there a tool I can use to run multiple hypothesis testing? If so, where is it and how does it work?

Thank you for any help!

Tags: assembly, snp, galaxy, statistics
2.1 years ago, Devon Ryan (Germany) wrote:

Have a look at the DAVID tool under "phenotype association". You'll need to give it the IDs of the genes enriched in SNPs. DAVID runs the enrichment tests and applies the multiple-testing correction for you (I should note that I've never used this in Galaxy, so I can't say what its output looks like).
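To give a rough idea of what a multiple-testing correction does, here is a minimal Python sketch of the Benjamini-Hochberg procedure, one commonly used correction. Whether this is exactly the adjustment DAVID applies to your results is an assumption, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each
    # by m / rank and keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per GO term tested.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(example_p)
for raw, adj in zip(example_p, adjusted_p):
    print(f"raw={raw:.3f}  adjusted={adj:.5f}")
```

Terms whose adjusted p-value stays below your chosen threshold (0.05 is a common choice) are the ones you would report as enriched.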

2.1 years ago, kmcbri12 wrote:

Thank you for your reply! This sounds exactly like what I need. One follow-up question, though: the DAVID tool asks for an identifier type. Do you know which option would be best for comparing gene function?

Thanks again!


The "identifier type" option tells the tool which source or type of gene names/symbols your data uses. If you are not sure, try googling a few of your identifiers to determine which entry in the pull-down menu fits the data. Best, Jen, Galaxy team

Reply written 2.1 years ago by Jennifer Hillman Jackson

Jen,

Once you have analyzed the data using DAVID, how do you see the results? The HTML file provided by Galaxy leads to a mostly blank webpage, and I'm not sure how to view the result.

When using a table, do I need to use the "name" column (e.g. uc058nfe.1_cds_4_0_chr12_48132868_f) as the identifier column, or is there a better one to use?

Thanks!

Reply written 2.1 years ago by kmcbri12

Did you predict ORFs or something? Your ID is a UCSC ID followed by something that looks like an ORF prediction. You can get rid of everything after the UCSC ID with cut, but I'd be concerned that you have duplicate IDs.
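For anyone who would rather do this outside Galaxy, here is a rough Python sketch of the same idea: keep only the part before the first underscore and count how often each ID appears. It assumes the UCSC ID never contains an underscore, and every name except the first one is made up for illustration.

```python
from collections import Counter


def ucsc_id(name):
    """Return the text before the first underscore (assumed to be the UCSC ID)."""
    return name.split("_", 1)[0]


names = [
    "uc058nfe.1_cds_4_0_chr12_48132868_f",  # example from this thread
    "uc058nfe.1_cds_5_0_chr12_48133000_f",  # hypothetical second exon
    "uc001qop.3_cds_0_0_chr12_100000_r",    # hypothetical
]

ids = [ucsc_id(n) for n in names]

# IDs that occur more than once after trimming.
counts = Counter(ids)
duplicates = {i: c for i, c in counts.items() if c > 1}

# De-duplicated list that keeps the first-seen order.
unique_ids = list(dict.fromkeys(ids))

print("duplicates:", duplicates)
print("unique IDs:", unique_ids)
```

If each row of the table is one exon, duplicates like this are expected, so you would want to collapse them to one ID per transcript before handing the list to DAVID.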

Reply written 2.1 years ago by Devon Ryan

This value does not appear to be a standard gene name or symbol, which is probably why the link goes nowhere specific. As Devon wrote, the input dataset should contain standard gene identifiers for your target genome. The column holding them is the one to use as the identifier, and the source of those identifiers is what to select as the identifier type. In short, all of these must match for the tool to produce correct results.

Perhaps review the documentation for the underlying tool (DAVID) to understand how it works? The Galaxy tool is just a wrapper around that same tool.

Reply written 2.1 years ago by Jennifer Hillman Jackson

Did you submit a computeMatrix bug report on our public deepTools Galaxy server a couple of hours ago (it came from an anonymous user)? The IDs in one of the files look very similar to your example.

Reply written 2.1 years ago by Devon Ryan (modified 2.1 years ago)