Hello all. I am looking for some help with a computational genetics task in Galaxy, but I am new to bioinformatics and practical genetics, so please forgive me if my questions seem elementary or in need of clarification.
I am trying to figure out a way to look at quantitative differences in the presence of two DNA-binding proteins. In the lab I am interning for, we are looking at the difference in trimethylation of histone H3 under two different conditions, in order to understand changes in transcriptional regulation.
For simplicity, I have four ChIP-seq files: one for H3 and one for K4 (i.e. trimethylated H3) for each condition. I have mapped those (along with an input ChIP-seq sample, using BWA) to my reference genome, converted them to BED files, and then run those BED files against the input using SICER to find peaks showing where these histones are in the genome.
After this point, I am lost. In general, H3 and K4 are going to show up in the same spots in the genome, so the resulting BED files telling me where they are without the magnitudes aren't very helpful. What I want is a measure of the K4/H3 ratios for each location, so that I can compare the same loci from both conditions and see if there is a change in how much a certain H3 is methylated.
I think the wig files produced by SICER have some magnitude data (number of counts), but I cannot figure out if there is a way to compare two wig files and perform operations on them (such as subtracting the values in one wig file from those in another). Interval operations require that they be converted to BED, which causes them to lose the count data anyway.
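To make the wig question concrete, here is a rough Python sketch of the kind of operation I have in mind. It assumes the files follow the standard variableStep/fixedStep wig layout and that both files use the same window positions; `parse_wig` and `combine` are names I made up for illustration, not Galaxy or SICER tools.

```python
def parse_wig(lines):
    """Return {(chrom, start): value} from wig text lines.

    Handles variableStep and fixedStep declarations; track, browser
    and comment lines are skipped. (Hypothetical helper.)
    """
    values = {}
    mode = chrom = None
    pos = step = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "variableStep":
            start, value = line.split()
            values[(chrom, int(start))] = float(value)
        elif mode == "fixedStep":
            values[(chrom, pos)] = float(line)
            pos += step
    return values


def combine(a, b, op, missing=0.0):
    """Apply op(a_value, b_value) at every position found in either file.

    Positions absent from one file are treated as `missing` (an assumption).
    """
    keys = sorted(set(a) | set(b))
    return {k: op(a.get(k, missing), b.get(k, missing)) for k in keys}
```

With the two files read into lists of lines, something like `combine(parse_wig(k4_lines), parse_wig(h3_lines), lambda x, y: x - y)` would give the per-window difference, but I do not know whether Galaxy has an equivalent built in.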
So in summary, I need to:
A. Use SICER to find peaks for H3 and K4 sequences
B. Generate a file that gives the K4/H3 ratio at each peak for both conditions
C. Find the difference in those ratios between both conditions at each peak, ideally so that I can generate another BED file only at peaks where the difference meets a certain threshold.
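As a rough illustration of steps B and C, this is the calculation I am picturing, written in Python. The input tuples, the function names `ratio_changes` and `to_bed`, and the pseudocount (added to avoid dividing by zero) are all my own assumptions, and it ignores any scaling for sequencing depth.

```python
def ratio_changes(peaks, pseudocount=1.0):
    """Compute the K4/H3 ratio per peak in each condition and their difference.

    peaks: iterable of (chrom, start, end, k4_a, h3_a, k4_b, h3_b), where
    a and b are the two conditions (hypothetical input layout).
    Yields (chrom, start, end, ratio_a, ratio_b, ratio_b - ratio_a).
    """
    for chrom, start, end, k4_a, h3_a, k4_b, h3_b in peaks:
        ratio_a = (k4_a + pseudocount) / (h3_a + pseudocount)
        ratio_b = (k4_b + pseudocount) / (h3_b + pseudocount)
        yield chrom, start, end, ratio_a, ratio_b, ratio_b - ratio_a


def to_bed(changes, threshold):
    """Keep peaks whose absolute ratio difference is at least `threshold`.

    Returns tab-separated lines: chrom, start, end, difference.
    """
    lines = []
    for chrom, start, end, _ratio_a, _ratio_b, diff in changes:
        if abs(diff) >= threshold:
            lines.append(f"{chrom}\t{start}\t{end}\t{diff:.3f}")
    return lines


# Made-up counts, purely to show the shape of the input.
example_peaks = [
    ("chr1", 0, 200, 9, 4, 19, 4),
    ("chr1", 400, 600, 4, 4, 4, 4),
]
changed = to_bed(ratio_changes(example_peaks), threshold=1.0)
```

What I am missing is how to get the per-peak counts for each file in the first place, and whether this can be done inside Galaxy rather than in a separate script.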
Does anyone know how I might go about this?