Question: Compare SNP density of genes and intergenic regions
29 days ago by
ahockstedler10 wrote:

Hello, I am trying to calculate the SNP density of my Chr genes as well as the SNP density for intergenic regions of my Chr. Can you help with these steps?

modified 29 days ago by Jennifer Hillman Jackson24k • written 29 days ago by ahockstedler10

Hello Jennifer,

Thanks for your response. Can you elaborate a little more on how to extract these from the UCSC Table Browser? I have gone through every drop-down box in this feature and don't see how I would get either one.

Any help would be greatly appreciated.

written 29 days ago by ahockstedler10
29 days ago by
United States
Jennifer Hillman Jackson24k wrote:


The Galaxy 101 tutorials demonstrate how to calculate SNP density for coding exons. This analysis could be modified to calculate density for full genes and intergenic regions by swapping in BED datasets representing those regions. Both can be extracted from the UCSC Table Browser, or you can upload or create your own BED file for your regions of interest.
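
If you want to sanity-check the numbers outside Galaxy, SNP density for a set of regions is simply the number of SNPs that fall inside them divided by their total length. Below is a minimal Python sketch of that arithmetic. The function name `snp_density` and the toy coordinates are made up for illustration and are not taken from the tutorial. It assumes BED-style 0-based, half-open intervals that do not overlap, all on a single chromosome.

```python
from bisect import bisect_left


def snp_density(regions, snp_positions):
    """Count SNPs inside non-overlapping (start, end) regions on one chromosome.

    Coordinates are assumed to be BED-style: 0-based, half-open [start, end).
    Returns (snp_count, total_bp, snps_per_bp).
    """
    snps = sorted(snp_positions)
    count = 0
    total_bp = 0
    for start, end in regions:
        # Number of positions p with start <= p < end.
        count += bisect_left(snps, end) - bisect_left(snps, start)
        total_bp += end - start
    density = count / total_bp if total_bp else 0.0
    return count, total_bp, density


# Toy data (hypothetical): two gene regions and the rest of a 1000 bp chromosome.
genic = [(100, 300), (500, 600)]
intergenic = [(0, 100), (300, 500), (600, 1000)]
snps = [50, 120, 250, 299, 300, 550, 900]

genic_result = snp_density(genic, snps)
intergenic_result = snp_density(intergenic, snps)
print("genic:", genic_result)
print("intergenic:", intergenic_result)
```

Comparing the third value of each result gives the genic versus intergenic density in SNPs per base.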

Galaxy tutorials:

Thanks! Jen, Galaxy team

written 29 days ago by Jennifer Hillman Jackson24k

For a very basic UCSC query for "genes" (many genes have more than one transcript, and it is transcripts that are actually mapped to the genome), query the Table Browser as described in the tutorial, except choose Exons instead of Coding Exons. You might also want to include Introns, 5'/3' UTRs, and the promoter (Upstream) - it depends on how you want to define a "gene". To annotate exons by transcript or gene, please see this prior Q&A:

For something more advanced like "gene bounds", cluster/merge the target transcript features yourself. This avoids duplicated SNP counts for overlapping exons associated with transcripts belonging to the same gene. It also creates a paired SNP count result: your gene query counts SNPs in the gene regions, and the complementary query counts SNPs in the intergenic regions. The how-to is the first step in the query below.

Intergenic regions are the coordinates not covered by Exons, Introns, the 5'/3' UTRs, and optionally an estimated promoter (Upstream - 5').

1) Get the gene bound coordinates. This is similar to the query for Exons - you will just be picking more transcript features to include. Once that data is in your Galaxy history, tools in the group Operate on Genomic Intervals can be used to collapse the regions into clusters (aka estimated gene bounds based on transcript features). Merge or Cluster can be used -- what each does exactly is explained on the tool forms.

2) Convert to intergenic coordinates: Use Complement to get the coordinates for the intergenic regions to compare with your SNPs.
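
To make the two steps concrete, here is a rough Python sketch of what a merge followed by a complement does to a set of intervals. It is not the Galaxy tools' actual implementation. The function names, the toy features, and the chromosome size are assumptions for illustration. This sketch also merges intervals that touch end-to-start, which may differ from how Merge or Cluster are configured on the tool forms.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Collapse overlapping or touching (chrom, start, end) intervals.

    Coordinates are assumed to be BED-style: 0-based, half-open.
    Returns {chrom: [(start, end), ...]} sorted by start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous cluster: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged


def complement(merged, chrom_sizes):
    """Return the parts of each chromosome not covered by the merged intervals."""
    result = {}
    for chrom, size in chrom_sizes.items():
        gaps = []
        pos = 0
        for start, end in merged.get(chrom, []):
            if start > pos:
                gaps.append((pos, start))
            pos = max(pos, end)
        if pos < size:
            gaps.append((pos, size))
        result[chrom] = gaps
    return result


# Toy transcript features (hypothetical): two overlap, one stands alone.
features = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
gene_bounds = merge_intervals(features)
intergenic = complement(gene_bounds, {"chr1": 1000})
print(gene_bounds)
print(intergenic)
```

Because the two overlapping features collapse into one region before counting, a SNP at a shared position is only counted once, and the complement gives the matching intergenic regions for the same chromosome.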

UCSC Table browser help: Review the advanced filters to further customize your query: region, identifiers, filters, intersect, linking in related tables, etc.

There are a few different ways to do this, so feel free to experiment. The primary transcript from the UCSC Genes track could be used to represent a "gene/gene bound" (if that track is available for your genome) -- and the complement of those regions used for intergenic. Or you can pick the longest transcript per gene to represent the entire gene bound yourself from any of the Gene tracks. The Table Browser help and prior Q&A at the UCSC Google group can be searched if you want to read how others are doing this type of query - or you can ask a new question there and they can send/point you to query examples.
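
As an illustration of the "longest transcript per gene" option, here is a small Python sketch. The record layout (gene, transcript ID, chrom, start, end), the function name, and the toy rows are all assumptions. A real Table Browser export would need to be parsed into this shape first. "Longest" here means the largest genomic span, which is only one possible definition.

```python
def longest_transcript_per_gene(transcripts):
    """Keep the transcript with the largest genomic span for each gene.

    transcripts: iterable of (gene, transcript_id, chrom, tx_start, tx_end).
    Returns {gene: (transcript_id, chrom, tx_start, tx_end)}.
    Ties keep the transcript seen first.
    """
    best = {}
    for gene, tx_id, chrom, start, end in transcripts:
        current = best.get(gene)
        if current is None or (end - start) > (current[3] - current[2]):
            best[gene] = (tx_id, chrom, start, end)
    return best


# Toy rows (hypothetical gene and transcript names).
rows = [
    ("geneA", "tx1", "chr1", 100, 400),
    ("geneA", "tx2", "chr1", 150, 900),
    ("geneB", "tx3", "chr1", 2000, 2500),
]
gene_models = longest_transcript_per_gene(rows)
print(gene_models)
```

Note that the longest transcript does not necessarily cover every base of the shorter ones (tx1 starts earlier than tx2 in the toy data), so this gives an approximation of the gene bound rather than the full merged extent.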

Hope that helps!

modified 29 days ago • written 29 days ago by Jennifer Hillman Jackson24k