Question: How to use data from Roadmap epigenomics project to run ChromHMM
Tonyf wrote, 22 months ago:

I am new to this area. I am trying to use data from the Roadmap Epigenomics project (link here: http://egg2.wustl.edu/roadmap/web_portal/imputed.html#imp_sig) to run ChromHMM, first producing binarized files for different histone marks and then identifying chromatin states. I have already done this successfully with data from the UCSC Genome Browser and the ENCODE project, where I could easily find files in .bed or .bam format (I can convert .bam to .bed easily using code from GitHub) and use them as input for the BinarizeBed mode in ChromHMM. For the Roadmap Epigenomics project, however, I cannot find any .bed files. All I can find for the different marks are .bigWig files (link here: http://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidatedImputed/). This may not be the data I actually need, but it is the only place I have found that looks close.

I searched online and found two possible ways:

  1. Convert .bigWig to .wig and then to .bed. I used bigWigToWig for the first step and then the BEDOPS package to convert .wig to .bed. I did get output, but the resulting .bed looks very different from the ones I got from UCSC or ENCODE, and the file is much bigger, reaching roughly gigabyte size. I think I don't really understand what the data in these file types (.bed, .bam, .bigWig, .wig) mean, for example which column holds the signal and whether the values are normalized. I know it would be better to dig into the code, but I am not very familiar with Java (in which ChromHMM is written) and am still learning it, so it would be great to get some idea of all these things.

  2. Maybe use another mode in ChromHMM, such as BinarizeSignal? But for that I probably also need to understand the meaning of the input first.
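
On the question of which column holds the signal: a bigWig file can be dumped to bedGraph text with the UCSC utility bigWigToBedGraph, and each bedGraph line has four columns: chromosome, start, end (0-based, half-open) and the signal value. These are signal values, not aligned reads, which is likely why a .bed converted from them looks so different from a reads-based .bed. The following is a minimal sketch, not ChromHMM's own code, of how such a track could be averaged over fixed 200 bp bins (ChromHMM's default bin size). The function name `bin_bedgraph` is hypothetical, and treating uncovered bases as zero is an assumption.

```python
from collections import defaultdict

BIN_SIZE = 200  # ChromHMM's default bin size in base pairs


def bin_bedgraph(lines, bin_size=BIN_SIZE):
    """Average a bedGraph signal over fixed-width bins.

    Each bedGraph line holds: chrom, start, end, value.
    Returns {(chrom, bin_index): mean signal per base in that bin}.
    Bases not covered by any interval count as zero (an assumption).
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        pos = start
        # Split the interval at bin boundaries and accumulate value * width.
        while pos < end:
            bin_index = pos // bin_size
            bin_end = min(end, (bin_index + 1) * bin_size)
            totals[(chrom, bin_index)] += value * (bin_end - pos)
            pos = bin_end
    return {key: total / bin_size for key, total in totals.items()}


if __name__ == "__main__":
    example = ["chr1\t0\t100\t2.0", "chr1\t100\t300\t4.0"]
    for (chrom, idx), mean in sorted(bin_bedgraph(example).items()):
        print(chrom, idx * BIN_SIZE, (idx + 1) * BIN_SIZE, mean)
```

The output of this sketch is not the input format that BinarizeSignal expects; the ChromHMM manual describes that layout and should be checked before writing any files for it.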

Could someone with experience in this help? The central question is how to start from the .bigWig data of the Roadmap Epigenomics project and run ChromHMM; an explanation of the data formats would also be great.

Thanks in advance!

Devon Ryan wrote, 22 months ago:

You can usually get the BAM files here


Thank you very much, Devon. But I am wondering why there are only .bigWig files for the different histone marks and cell types on the Roadmap Epigenomics project site (http://egg2.wustl.edu/roadmap/web_portal/imputed.html#imp_sig), and whether there is any way to use the data from this database directly. (Or do these databases actually share the same data source?)

written 22 months ago by Tonyf

Your link goes to a web portal that allows convenient access to and visualization of processed data. My link goes to the officially released data at NCBI. Use my link.

written 22 months ago by Devon Ryan

I got it. Can I ask one further question: where can I download the files for the different histone marks of a specific cell type, e.g. E116 (GM12878, a lymphoblastoid cell line), in this database? I don't think I can find it... Thanks a lot...

written 22 months ago by Tonyf

I think that's originally from ENCODE.

written 22 months ago by Devon Ryan