Question: How to find genetic distance for a set of genomic intervals
Asked 3.6 years ago by pooja.narang (United States):

I have a set of genomic regions (in bed format) and I want to find their cM distance from the nearest gene. 

I know this requires a human genetic map with recombination rates. I am using the hg18 reference genome, so I got the recombination map from here:

The format of the recombination map is something like this:

Physical_Position_Build36(hg18) Genetic_Map(cM) 
742584 0
744045 8.96305756252859e-09
750775 4.12689390157335e-08
758311 8.14337595996149e-08
766409 1.32596464295120e-07
769185 1.46222766745286e-07

But my genomic intervals are like these:

chr1    751448    752765    NR_024321   
chr1    752833    784689    NR_047519    
chr1    752833    768847    NR_047526    
chr1    752833    784689    NR_047524    
chr1    752833    784689    NR_047523   

Should I just find the position of the reference map for each genomic interval and assign the cM value for that interval? This will probably just give the distance of each region in cM, but I want their distance from the nearest gene in cM.

Or is there some other way? Is there any tool to do that?

Please help!


Answered 3.6 years ago by Jennifer Hillman Jackson (United States):


Using the genomic intervals to find the closest gene is fairly straightforward (see the tool "Operate on Genomic Intervals -> Fetch closest non-overlapping feature"). All you need to do is choose the reference annotation to compare against. UCSC is a good source: use the Table Browser and send the result to Galaxy in BED format. These would be transcripts, but there are ancillary files for each primary track that can consolidate/map transcripts to genes (various "gene" types: Gene Symbols, HUGO, Ensembl, and more). Extract them with the tool "Get Data -> UCSC Main".
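For anyone who prefers to script this step outside Galaxy, here is a minimal Python sketch of the "closest non-overlapping feature" idea. It is not the Galaxy tool's implementation. It assumes BED-style half-open coordinates, and the function name and the feature names `geneA`/`geneB` are made up for illustration.

```python
def closest_nonoverlapping(interval, features):
    """Return (feature, gap_start, gap_end) for the nearest feature on the
    same chromosome that does not overlap `interval`, or None if none exists.

    `interval` and each feature are tuples starting with (chrom, start, end),
    using BED-style half-open coordinates (an assumption of this sketch).
    """
    chrom, start, end = interval[:3]
    best = None
    for feat in features:
        f_chrom, f_start, f_end = feat[:3]
        if f_chrom != chrom:
            continue
        if f_end <= start:        # feature lies entirely to the left
            gap = (f_end, start)
        elif f_start >= end:      # feature lies entirely to the right
            gap = (end, f_start)
        else:                     # overlapping features are skipped
            continue
        if best is None or gap[1] - gap[0] < best[1][1] - best[1][0]:
            best = (feat, gap)
    if best is None:
        return None
    return best[0], best[1][0], best[1][1]


# First interval from the question, compared against hypothetical genes.
query = ("chr1", 751448, 752765, "NR_024321")
genes = [
    ("chr1", 700000, 745000, "geneA"),   # hypothetical
    ("chr1", 752833, 784689, "geneB"),   # hypothetical
    ("chr2", 752000, 752500, "geneC"),   # hypothetical, other chromosome
]
nearest = closest_nonoverlapping(query, genes)
```

The returned gap (here the region between the interval and `geneB`) is the physical region whose genetic length you would then look up in the recombination map.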

For the next step, a protocol along these lines might work:

  1. identify the closest transcript/gene for each interval
  2. create an interval file that represents the region between the interval and nearest transcript/gene
  3. convert the genetic map to interval/BED format by adding position columns (chrom/start/end)
  4. use #2 to define the regions to extract from the data in #3

#4 would be best done using the same protocol as described for the sample "random distance" calculations at the same web site where you obtained the reference file above (example samples here). The calculations themselves can almost certainly be done within Galaxy using "Text manipulation" and similar tools. When you come up with a successful protocol, please consider sharing/publishing it on Galaxy Main for others to use and posting back the share link here.
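As a rough illustration of the calculation outside Galaxy, the sketch below estimates the genetic distance between two physical positions by linearly interpolating the cumulative cM values of the map. Linear interpolation between map markers and clamping beyond the map ends are assumptions of this sketch, not necessarily what the map's source site does. The function names are made up, and the map rows are the six shown in the question.

```python
from bisect import bisect_right


def cm_at(pos, phys, cm):
    """Interpolate the cumulative genetic position (cM) at physical `pos`.

    `phys` is a sorted list of physical positions and `cm` the matching
    cumulative cM values. Positions outside the map are clamped to the
    first/last marker (an assumption of this sketch).
    """
    if pos <= phys[0]:
        return cm[0]
    if pos >= phys[-1]:
        return cm[-1]
    i = bisect_right(phys, pos)           # phys[i-1] <= pos < phys[i]
    x0, x1 = phys[i - 1], phys[i]
    y0, y1 = cm[i - 1], cm[i]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def genetic_distance(a, b, phys, cm):
    """Genetic distance between physical positions a and b."""
    return abs(cm_at(b, phys, cm) - cm_at(a, phys, cm))


# Map rows copied from the question (chr1, hg18).
phys = [742584, 744045, 750775, 758311, 766409, 769185]
cm = [0.0, 8.96305756252859e-09, 4.12689390157335e-08,
      8.14337595996149e-08, 1.32596464295120e-07, 1.46222766745286e-07]

# Gap between the end of an interval (752765) and the start of the
# nearest feature (752833); both coordinates are illustrative.
d = genetic_distance(752765, 752833, phys, cm)
```

For a full data set you would load one map per chromosome and apply `genetic_distance` to each gap produced in step 2.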

I am not aware of any specialized tools for this type of manipulation in the Tool Shed, but you could review what is available there. Tool Shed tools are for use in a local/cloud Galaxy.

There could also very well be specialized tools for this type of manipulation at a public Galaxy instance. Each has its own focus, and tools change over time. Reviewing is the best way to see if any are a fit (but no guarantees!): Galaxy Public Servers

Best, Jen, Galaxy team

