Question: DanRer10 Effective Genome Size?
2.4 years ago, amaheras7091 wrote:

Hi,

I'm trying to use bamCoverage to convert my TopHat BAM files into bigWig files for visualization on the UCSC Genome Browser. bamCoverage asks for the effective genome size. I'm using the zebrafish danRer10 genome; does anyone know what the effective genome size would be for that assembly?

Also, this is the information it gives about what the effective genome size is: "The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. See Table 2 of http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030377 or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html for several effective genome sizes."

I came across the number 1371719383, but I think that is the total length rather than just the mappable length?

Thanks for any help!

Tags: rna-seq, tophat, galaxy
2.4 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

See the Assembly Statistics tab here: http://www.ncbi.nlm.nih.gov/assembly/210611

Repetitive regions are not reported there, because the number varies depending on how repeats are categorized and masked. However, you could review the danRer10 repeat tracks at UCSC (http://genome.ucsc.edu), choose the appropriate one, and subtract its coverage from the total length.
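
As a rough illustration of that subtraction, the sketch below merges overlapping repeat intervals before summing them, so that bases covered by more than one annotation are counted only once. The function names and the toy intervals are hypothetical; real input would come from a BED export of whichever UCSC repeat track is chosen, and whether the result matches a given tool's notion of "effective" size is an assumption.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Return the number of distinct bases covered by (chrom, start, end) intervals.

    Coordinates are assumed to be BED-style: 0-based start, exclusive end.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def size_after_masking(genome_size, intervals):
    """Subtract merged repeat coverage from a genome size."""
    return genome_size - covered_bases(intervals)


# Hypothetical toy data, not real danRer10 annotations.
repeats = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 20)]
print(covered_bases(repeats))             # 160
print(size_after_masking(1000, repeats))  # 840
```

Merging first matters when more than one repeat track is combined, since the same bases are often annotated by several tracks.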

If anyone (amaheras7091 or other readers) comes up with a number, please share it in a follow-up post along with the methodology and assumptions. Other pre-calculated sources for this data are also welcome.

Best, Jen, Galaxy team


Hi,

In the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), there are several tracks under "Variation and Repeats" (Interrupted Rpts, Microsatellite, RepeatMasker, Simple Repeats, WM + SDust), and I looked at the summary statistics for each. To choose which one(s) to subtract from the total nucleotide length, I went through the same process for the human genome, which has a known effective genome size of 2,451,960,000. However, subtracting the "item bases" of any one repeat track (or any combination of them) from the total number of nucleotides, 3,209,286,105, did not yield the known effective genome size. Any further assistance on how to calculate the effective genome size would be greatly appreciated.

Comment written 2.4 years ago by amaheras7091

Update: The deepTools documentation now states that counting only the non-N bases (that is, removing the N bases from the total count) is one way to do this calculation: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html

UCSC has a tool for this (faCount, a command-line utility), or you can find the information on their wiki. The statistics page for their hg38 100-way conservation track covers most of their genomes and includes each genome's size minus the Ns: http://genomewiki.ucsc.edu/index.php/Hg38_100-way_Genome_size_statistics
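
For readers without the UCSC utilities installed, the tally can be approximated in a few lines of Python. This is only a minimal sketch of the "count everything that is not N" idea, not a reimplementation of faCount; the function name and the file name in the comment are assumptions.

```python
import io


def count_non_n(fasta_handle):
    """Count total and non-N bases in a FASTA stream.

    Header lines (starting with '>') are skipped; both 'N' and 'n' count as gaps.
    """
    total = 0
    n_count = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        n_count += seq.count("N") + seq.count("n")
    return total, total - n_count


# Toy in-memory FASTA; for a real assembly, pass open("danRer10.fa") instead
# (that file name is an assumption).
toy = io.StringIO(">chr1\nACGTNNNN\nacgtnn\n>chr2\nGGCC\n")
total, non_n = count_non_n(toy)
print(total, non_n)  # 18 12
```

Soft-masked (lowercase) bases are still counted here, so the result only excludes N runs, not repeats.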

For the danRer10 assembly, the count is 1,369,631,918 once the N bases are removed.
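
To show where that number would go, here is a hedged sketch that builds a bamCoverage command line. The flag names assume deepTools 3.x (`--effectiveGenomeSize` together with `--normalizeUsing RPGC`); older 2.x releases used a different option for this, and in Galaxy the value is simply typed into the tool form. The BAM and bigWig file names are hypothetical.

```python
import shlex

DANRER10_NON_N = 1_369_631_918  # non-N base count quoted above
DANRER10_TOTAL = 1_371_719_383  # total length quoted in the question


def bamcoverage_command(bam_path, bigwig_path, effective_genome_size):
    """Build a bamCoverage argument list (flag names assume deepTools 3.x)."""
    return [
        "bamCoverage",
        "--bam", bam_path,
        "--outFileName", bigwig_path,
        "--outFileFormat", "bigwig",
        "--normalizeUsing", "RPGC",
        "--effectiveGenomeSize", str(effective_genome_size),
    ]


cmd = bamcoverage_command("accepted_hits.bam", "coverage.bw", DANRER10_NON_N)
print(shlex.join(cmd))

# Difference between the two quoted figures, assuming both describe
# the same set of sequences.
print(DANRER10_TOTAL - DANRER10_NON_N)  # 2087465
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where deepTools is installed.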

Reply written 11 weeks ago by Jennifer Hillman Jackson (modified 11 weeks ago)