How do I select the appropriate reference genome? What does C.elegans Feb 2009(WS200/ce7)(ce7) mean? Looking forward to your reply. Thanks.
C.elegans Feb 2009(WS200/ce7)(ce7) means the version of the C. elegans reference genome published in WormBase release WS200, which is identical to UCSC release ce7. Does that answer your question?
Thank you very much for your answer. It helped me a lot. Could you also tell me how I can get the N2 genome to use as the reference genome? Thanks again.
I'm not sure I understand your follow-up question. N2 is the C. elegans reference strain so, of course, what's released by WormBase is the N2 sequence. I think you have to understand that published reference genomes are not static, but undergo ongoing curation and correction, meaning their exact nucleotide sequence can change. When that happens, UCSC assigns a new ce version number. WormBase, on the other hand, is a database that includes way more than just the reference genome sequence (protein sequences, genome annotations, etc.), and these other things typically change much faster than the reference sequence. That's why WormBase releases (with new WS version numbers) are quite frequent, but not every one of them brings an actual change to the reference genome sequence (e.g., WS190 was ce6, but between WS190 and WS199 the reference sequence didn't change). Anyway, all these versions are N2 sequences (just at different levels of accuracy or correctness, if you will).
Thanks for your sincere help. It means a lot to me. I still have some questions, though. Firstly, I thought WS220/ce10 refers to the genome of CB4856, but I am not sure. And I need the genome of N2 to be the reference genome in my analysis. So, secondly, I wonder whether the latest version of the N2 genome is the best choice for me. Thirdly, what is the function of the HA_SNPS_Unfiltered_112061Variants_WS220.64_chr.vcf file in the analysis of WGS data? I would appreciate any suggestions you can give me.
Firstly, I thought WS220/ce10 refers to the genome of CB4856, but I am not sure. And I need the genome of N2 to be the reference genome in my analysis.
As I said before, WSxyz/cexy are always N2 references. The HA_SNPS_....vcf file you mention is a file from the CloudMap workflow and lists 112061 SNPs present in CB4856 with their coordinates in WormBase release WS220 (a.k.a. ce10) of the reference N2 genome. If you want to use this file, then you should, of course, use the WS220 version of the reference genome, too.
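If you want to double-check that a VCF file and a reference FASTA really belong to the same genome release, one quick sanity test is to compare each record's REF allele with the reference base(s) at that position. Below is a minimal Python sketch of that idea, assuming an uncompressed VCF and a FASTA small enough to hold in memory. The function names and the toy data are made up for illustration and are not part of CloudMap or Galaxy.

```python
import io


def read_fasta(handle):
    """Return {sequence_name: sequence} from an open FASTA file handle."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def check_vcf_against_reference(vcf_handle, reference):
    """Count VCF records whose REF allele agrees or disagrees with the reference."""
    counts = {"match": 0, "mismatch": 0, "unknown_contig": 0}
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        seq = reference.get(chrom)
        if seq is None:
            counts["unknown_contig"] += 1
            continue
        start = int(pos) - 1  # VCF positions are 1-based
        if seq[start:start + len(ref)].upper() == ref.upper():
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Toy data standing in for a real reference and VCF (hypothetical content).
toy_fasta = ">chrI\nACGT\nACGT\n>chrII\nTTTT\n"
toy_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chrI\t2\t.\tC\tT\n"   # reference has C at chrI:2
    "chrI\t5\t.\tG\tA\n"   # reference has A at chrI:5
    "chrX\t1\t.\tA\tC\n"   # chrX is not in the toy reference
)

reference = read_fasta(io.StringIO(toy_fasta))
counts = check_vcf_against_reference(io.StringIO(toy_vcf), reference)
print(counts)
```

With real files you would pass open file handles instead of the StringIO objects. A large number of mismatches suggests the VCF coordinates come from a different release than your FASTA, while many unknown contigs usually point to a naming difference between the two files (for example "chrI" versus "I").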
Custom genome help, if needed: https://galaxyproject.org/support/ >> https://galaxyproject.org/learn/custom-genomes/