I would like to know how to find CpG sites on chromosome 17 in human (Homo sapiens) using Galaxy. I tried the UCSC database but did not get anything out of it. These are the CpG sites I need to find (cg02228185 in ASPA, cg25809905 in ITGA2B, and cg17861230 in PDE4C), and I know their source sequences. If anyone could help me find these CpG sites, it would literally help me survive college.
This is the target publication (age related CpG sites?): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4082572/
How was the data in the publication not helpful?
If you have the sequences, BLAT at UCSC can map them to their genomic locations for you. The output coordinates can be loaded into Galaxy along with the CpG track, and the two datasets can then be intersected to find where they overlap. You could also extract a gene track and compare those coordinates with the BLAT-mapped sites, if that is what you are looking for. Even cross-genome mapping with BLAT should be productive in many cases, if the goal is to discover these regions in other species.
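To make the "intersect" step concrete, here is a minimal Python sketch of what an interval intersection does conceptually. It is not the Galaxy tool itself, and the coordinates and names below are made up for illustration (they are not real BLAT hits or real CpG track entries). Intervals are assumed to be BED-style: 0-based start, exclusive end.

```python
from typing import List, Tuple

# (chromosome, start, end, name); BED-style, 0-based start, end exclusive
Interval = Tuple[str, int, int, str]


def intersect(a: List[Interval], b: List[Interval]) -> List[Tuple[str, str, int, int]]:
    """Return (name_a, name_b, overlap_start, overlap_end) for every overlapping pair."""
    hits = []
    for chrom_a, start_a, end_a, name_a in a:
        for chrom_b, start_b, end_b, name_b in b:
            # Two intervals overlap only on the same chromosome and when
            # each one starts before the other one ends.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                hits.append((name_a, name_b, max(start_a, start_b), min(end_a, end_b)))
    return hits


# Hypothetical example data, NOT real coordinates
blat_hits = [("chr17", 1000, 1050, "probe_A"), ("chr17", 5000, 5050, "probe_B")]
cpg_track = [("chr17", 900, 1020, "island_1"), ("chr19", 5000, 5100, "island_2")]

overlaps = intersect(blat_hits, cpg_track)
print(overlaps)
```

In this toy data only probe_A overlaps a track entry; probe_B shares coordinates with island_2 but sits on a different chromosome, so it is not reported.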
Various online data providers also likely have these sites exactly mapped (beyond what sequence analysis can do as far as prediction).
Please explain your issue in more detail if this does not produce what you need. We really shouldn't answer exact classroom assignment questions, but can offer analysis guidance.
Thanks, Jen, Galaxy team
I really appreciate your answer. My question may seem irresponsible, but I am a third-year student in the Netherlands, and I only found this Galaxy website through Coursera. I still do not know much about how to work with these tools, as I have not had a proper bioinformatics course. This is for a project worth 18 credits. I have tried hard with many different methods and asked many teachers about it, but they did not know what to do either. It may sound like a silly excuse, but I am trying hard not only to work on my project but also to keep up with other things. There are still so many things I do not know about this software. I am getting desperate, as I could fail my project. Could you tell me how to find the overlaps? I do not even know how to load the output coordinates. I know I am asking a lot.
We have a tool that does exactly that for you, at least if I understood the question correctly: https://github.com/bgruening/galaxytools/commits/master/tools/find_subsequences
Here is the TS entry: https://toolshed.g2.bx.psu.edu/view/bgruening/find_subsequences/d882a0a75759
Jen, would this be something for Galaxy main? Cheers, Bjoern
I have published my work here: https://usegalaxy.org/u/chawnerd/h/unnamed-history.
Thank you for the help. These are the source sequences for each of the CpG islands.
cg25809905 36 17 39823254 NCBI:RefSeq 36.1 CCAAGAGTAAACAGTGTGCTCAATGCTGTGCCTACGTGTGTTAGCCCACG 39822399 - GeneID:3674 ITGA2B
cg02228185 36 17 3326317 NCBI:RefSeq 36.1 GGTTAGTAATAAATGGTTTTACCTCCAGCCCTGTTCTCTGAATCTCAGCG 3326046 + GeneID:443 ASPA
cg17861230 36 19 18204901 NCBI:RefSeq 36.1 GGATCCGAATAGAAGCGCTGTTGGATGCGGATGGGGCGCCGGGGTTGCCG 18205016 - GeneID:5143 PDE4C
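Since BLAT takes sequences as input, one way to prepare the three rows above is to convert them to FASTA first. The following is a minimal sketch; the field positions (ID first, chromosome third, sequence seventh, gene last) are an assumption read off the rows as posted, and the header layout is just an illustrative choice. Note that the rows are annotated against build 36, so their numeric coordinates need not match hg19 or hg38; mapping the sequences themselves sidesteps that.

```python
# The three rows exactly as posted in this thread
ROWS = """\
cg25809905 36 17 39823254 NCBI:RefSeq 36.1 CCAAGAGTAAACAGTGTGCTCAATGCTGTGCCTACGTGTGTTAGCCCACG 39822399 - GeneID:3674 ITGA2B
cg02228185 36 17 3326317 NCBI:RefSeq 36.1 GGTTAGTAATAAATGGTTTTACCTCCAGCCCTGTTCTCTGAATCTCAGCG 3326046 + GeneID:443 ASPA
cg17861230 36 19 18204901 NCBI:RefSeq 36.1 GGATCCGAATAGAAGCGCTGTTGGATGCGGATGGGGCGCCGGGGTTGCCG 18205016 - GeneID:5143 PDE4C
"""


def rows_to_fasta(rows: str) -> str:
    """Convert whitespace-separated probe rows into FASTA text.

    Assumed layout (from the rows as posted): field 0 = probe ID,
    field 2 = chromosome, field 6 = source sequence, last field = gene.
    """
    records = []
    for line in rows.strip().splitlines():
        fields = line.split()
        probe_id, chrom, sequence, gene = fields[0], fields[2], fields[6], fields[-1]
        records.append(f">{probe_id} gene={gene} listed_chr={chrom}\n{sequence}")
    return "\n".join(records) + "\n"


fasta_text = rows_to_fasta(ROWS)
print(fasta_text, end="")
```

The resulting text can be pasted into the BLAT input box. The header also carries the chromosome listed in each row, which shows that the cg17861230 (PDE4C) row lists chromosome 19 rather than 17.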
Hi, this is the email I got from the author 3 days ago:
Dear chanwoo,
Thank you for your email. We have provided a online calculator to predictor biological age using three CpG sites. Please go to http://www.molcell.rwth-aachen.de/epigenetic-aging-signature/ for details.
Best wishes, Qiong Lin
In that link, the author gave me three sequences, which I believe are the source sequences for the CpG sites used. I also have the three sequences I mentioned above. None of them (3 from the author and 3 from the Illumina database) match the CpG islands database for hg19 or hg38. The ones the author used are the ones from the Illumina database. In this case, what is the solution? Indeed, I am still learning how to use it. :)