How to use data from Roadmap epigenomics project to run ChromHMM


22 months ago by Tonyf • 10:

I am new to this area. I am trying to use data from the Roadmap Epigenomics project (link here: http://egg2.wustl.edu/roadmap/web_portal/imputed.html#imp_sig) with ChromHMM, to produce binarized files for different histone marks and identify chromatin states. I have already done this successfully with data from the UCSC Genome Browser and from the ENCODE project, because those files are easy to find in .bed or .bam format (and .bam is easy to convert to .bed using code from GitHub), so I can use them as input for the BinarizeBed command in ChromHMM. For the Roadmap Epigenomics project, however, I cannot find any .bed files. All I can find for the different marks are .bigWig files (link here: http://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidatedImputed/). This may not be the data I really need, but it is the only place I have found that looks right.

I searched online and found two possible approaches:

1. Convert .bigWig to .wig and then to .bed. I did this using bigWigToWig for the first conversion and the BEDOPS package for the second. I did get a file, but its format is very different from the .bed files I got from UCSC or ENCODE, and it is much bigger (on the order of GB). I think I don't really understand what the contents of these file types (.bed, .bam, .bigWig, .wig) mean, for example which column holds the signal and whether the values are normalized. I know it would be better to dig into the code, but I am not very familiar with Java (in which ChromHMM is written) and am still learning it, so any explanation of these formats would be very helpful.
2. Use another ChromHMM mode, such as BinarizeSignal? But I probably need to understand what the input means first.
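To make the format question concrete, here is a minimal Python sketch of how text dumped from a bigWig can be read as (chrom, start, end, value) records. It assumes the dump contains either wiggle sections (`fixedStep` / `variableStep` declaration lines, 1-based positions) or bedGraph-style lines (four columns, 0-based half-open, signal in the fourth column), as described in the UCSC format documentation. The function name `wig_to_intervals` is hypothetical and not part of ChromHMM or BEDOPS. It also shows why the converted file is so large: every position or interval carries its own value, whereas a .bed file of reads has one line per read.

```python
from typing import Iterable, Iterator, Tuple

# (chrom, start, end, value) with 0-based, half-open coordinates as in BED/bedGraph
Interval = Tuple[str, int, int, float]


def wig_to_intervals(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield signal intervals from wiggle or bedGraph text (hypothetical helper)."""
    chrom, mode, span, step, pos = None, None, 1, 1, 0
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("fixedStep", "variableStep"):
            # Declaration line: key=value options apply to the data lines that follow.
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])        # 1-based start position
                step = int(opts.get("step", 1))
            continue
        if len(fields) == 4:
            # bedGraph line: chrom, start, end, value (already 0-based half-open)
            yield fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        elif mode == "fixedStep":
            # One value per line; the position advances by `step` each time.
            yield chrom, pos - 1, pos - 1 + span, float(fields[0])
            pos += step
        elif mode == "variableStep":
            # Two columns: 1-based position, then value.
            start = int(fields[0]) - 1
            yield chrom, start, start + span, float(fields[1])
```

The values themselves are whatever the producer of the bigWig stored (for example a coverage or an imputed signal track), so whether they are normalized has to be checked in the data release notes rather than inferred from the file format.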

Could someone with experience in this help? The central question is how to start from the .bigWig data of the Roadmap Epigenomics project and run ChromHMM. An explanation of the data formats would also be great.
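As a rough illustration of what going from a signal track to ChromHMM-style bins involves, the sketch below averages (chrom, start, end, value) signal intervals, such as bedGraph lines, into fixed-width bins. The 200 bp width matches ChromHMM's default bin size; the function name `bin_signal` and the choice of a per-base mean are assumptions made for illustration, not ChromHMM's own procedure. The exact input layout that BinarizeSignal expects should be taken from the ChromHMM manual.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, start, end, value), 0-based half-open


def bin_signal(intervals: Iterable[Interval], bin_size: int = 200) -> Dict[str, List[float]]:
    """Return the mean per-base signal in consecutive bins for each chromosome."""
    totals: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for chrom, start, end, value in intervals:
        pos = start
        while pos < end:
            b = pos // bin_size                       # index of the bin containing pos
            overlap = min(end, (b + 1) * bin_size) - pos
            totals[chrom][b] += value * overlap       # signal summed over covered bases
            pos += overlap
    binned: Dict[str, List[float]] = {}
    for chrom, bins in totals.items():
        n_bins = max(bins) + 1
        # Bases without any interval count as zero signal; the last bin is
        # divided by the full bin size even if the data stop partway through it.
        binned[chrom] = [bins.get(b, 0.0) / bin_size for b in range(n_bins)]
    return binned
```

For example, `bin_signal([("chr1", 0, 100, 2.0), ("chr1", 150, 250, 4.0)])` spreads the second interval across two bins. How such continuous values are then turned into present/absent calls is a separate decision that this sketch does not make.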

Thanks in advance!

chromhmm • 978 views


22 months ago by Devon Ryan • 1.9k (Germany):

You can usually get the BAM files here


Thank you very much, Devon. But I am wondering why the Roadmap Epigenomics project site (http://egg2.wustl.edu/roadmap/web_portal/imputed.html#imp_sig) offers only .bigWig files for the different histone marks and cell types, and whether there is any way to use the data from that site directly. (Or do the two databases actually share the same data source?)

written 22 months ago by Tonyf • 10

Your link goes to a web portal that allows convenient access to, and visualization of, processed data. My link goes to the officially released data at NCBI. Use my link.

written 22 months ago by Devon Ryan • 1.9k

I got it. May I ask one further question: where can I download the files for the different histone marks of a specific cell type, e.g. E116 (GM12878, a lymphoblastoid cell line), in this database? I don't think I can find it there. Thanks a lot.

written 22 months ago by Tonyf • 10

I think that's originally from ENCODE.

written 22 months ago by Devon Ryan • 1.9k
