I am new to this area. I am trying to use the data from the Roadmap Epigenomics project (link here: http://egg2.wustl.edu/roadmap/web_portal/imputed.html#imp_sig) to run ChromHMM, i.e. to produce binarized files for different histone marks and identify chromatin states. I have already done this successfully with data from the UCSC Genome Browser and from the ENCODE project. For those two sources, though, I could easily find the data in .bed or .bam format (.bam converts to .bed easily with code from GitHub) and use it as input to ChromHMM's BinarizeBed mode. For the Roadmap Epigenomics project I cannot find any .bed files: for every mark, all I can find is .bigWig (link here: http://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidatedImputed/). This may not be the data I actually need, but it is the only place I have found that looks like it.
I searched online and found two possible approaches:
Option 1: convert .bigWig to .wig, then .wig to .bed. I did step 1 with bigWigToWig and step 2 with the BEDOPS package. I did get output, but the resulting .bed file is formatted very differently from the ones I got from UCSC or ENCODE, and it is much, much bigger (on the order of GB). Honestly, I don't think I understand what the data in these file types (.bed, .bam, .bigWig, .wig) actually represent: for example, which column holds the signal, and whether the values are normalized or not. I know it would be better to dig into the code, but I am not very familiar with Java (which ChromHMM is written in) and am still learning it, so any pointers on all of this would be great.
Option 2: maybe use a different ChromHMM mode, such as BinarizeSignal? But I suspect I would also need to understand what that mode expects as input first.
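To make the question more concrete, here is a rough sketch (Python, standard library only) of what I imagine combining the two options might look like. It averages a bedGraph-style signal file (chrom, start, end, value, e.g. what UCSC's bigWigToBedGraph writes) into ChromHMM's default 200 bp bins, then lays the bins out as a table for BinarizeSignal. The table layout is my assumption (the same as ChromHMM's binarized files, but with signal values instead of 0/1), and the function names, the epigenome ID E003 and all values are only illustrative, so please correct me if this is wrong.

```python
BIN_SIZE = 200  # ChromHMM's default bin size in bp


def bin_signal(lines, chrom, chrom_length, bin_size=BIN_SIZE):
    """Average bedGraph-style lines (chrom, start, end, ..., value) into fixed bins.

    Coordinates are 0-based, half-open; bases not covered by any interval
    count as 0. The signal value is taken from the last column.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    sums = [0.0] * n_bins
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        fields = line.split()
        if fields[0] != chrom:
            continue
        start, end = int(fields[1]), min(int(fields[2]), chrom_length)
        value = float(fields[-1])
        pos = start
        while pos < end:
            # split the interval at bin boundaries, weighting by overlap length
            b = pos // bin_size
            stop = min((b + 1) * bin_size, end)
            sums[b] += value * (stop - pos)
            pos = stop
    # divide by bin width (the last bin may be shorter than bin_size)
    return [s / min(bin_size, chrom_length - b * bin_size) for b, s in enumerate(sums)]


def format_signal_table(cell, chrom, marks, signals):
    """Lay out per-bin signal for several marks as a ChromHMM-style table.

    ASSUMED layout (mirrors ChromHMM's binarized files, with signal values
    in place of 0/1): line 1 = cell type and chromosome, line 2 = mark
    names, then one tab-separated row per bin.
    """
    n_bins = len(signals[marks[0]])
    if any(len(signals[m]) != n_bins for m in marks):
        raise ValueError("all marks must have the same number of bins")
    rows = [f"{cell}\t{chrom}", "\t".join(marks)]
    for i in range(n_bins):
        rows.append("\t".join(f"{signals[m][i]:g}" for m in marks))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # toy input: two intervals on a made-up 500 bp chr1
    toy = ["track type=bedGraph", "chr1\t0\t150\t2.0", "chr1\t150\t400\t4.0"]
    k4me3 = bin_signal(toy, "chr1", 500)
    print(format_signal_table("E003", "chr1", ["H3K4me3"], {"H3K4me3": k4me3}))
```

In the toy run, the first bin mixes 150 bp at 2.0 and 50 bp at 4.0, so its mean is 2.5; the last bin is only 100 bp long and has no coverage, so it is 0.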
Could someone with experience in this help? The central question is how to go from the Roadmap Epigenomics .bigWig data to running ChromHMM; explanations of the data formats would also be very helpful.
Thanks in advance!