Hello,
The genomes may contain runs of N (representing gaps in the assembly, such as unsequenced heterochromatin, AFAIK; at UCSC the start of each mouse chromosome is all N). However, the rest of the sequence should be fine; it is just soft-masked, if you picked that version. You can compare your copy to the exact version used on Galaxy Main (http://usegalaxy.org) by accessing our rsync server:
http://wiki.galaxyproject.org/Admin/UseGalaxyRsync
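Once you have a copy from the rsync server, a comparison that ignores soft-masking could look roughly like the sketch below. This is only an illustration, not a Galaxy tool: the function names `read_fasta` and `differing_records` are made up here, and it holds whole sequences in memory, so it is best suited to per-chromosome files rather than a full genome.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from an open handle into {name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def differing_records(local, reference):
    """Names that are missing from one side or differ once case is ignored."""
    def norm(seq):
        # Upper-casing removes soft-masking, so only real base changes count.
        return None if seq is None else seq.upper()

    names = set(local) | set(reference)
    return sorted(
        n for n in names if norm(local.get(n)) != norm(reference.get(n))
    )


if __name__ == "__main__":
    # Tiny in-memory stand-ins for two FASTA files (hypothetical data).
    mine = read_fasta(io.StringIO(">chr1\nNNacgt\nACGT\n>chr2\nAAAA\n"))
    main = read_fasta(io.StringIO(">chr1\nNNACGTACGT\n>chr2\nAAAT\n"))
    print(differing_records(mine, main))
```

For real files, replace the `io.StringIO(...)` objects with `open("your_file.fa")`.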
As another example, to show which version of the genome is used on Main, this was the source for mm10 (we use similar versions for all UCSC-sourced genomes):
http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/
chromFa.tar.gz - The assembly sequence in one file per chromosome.
Repeats from RepeatMasker and Tandem Repeats Finder (with period
of 12 or less) are shown in lower case; non-repeating sequence is
shown in upper case.
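To see how much of a sequence is N versus soft-masked (lower case), something like this small sketch can help. The names `mask_summary` and `leading_n` are hypothetical helpers for illustration only, and the sketch assumes the sequence has already been read into a single string.

```python
def mask_summary(seq):
    """Count N bases, soft-masked (lower-case) bases, and unmasked bases."""
    n_count = sum(1 for c in seq if c in "Nn")
    soft_masked = sum(1 for c in seq if c.islower() and c != "n")
    unmasked = len(seq) - n_count - soft_masked
    return {"N": n_count, "soft_masked": soft_masked, "unmasked": unmasked}


def leading_n(seq):
    """Length of the run of N at the very start of the sequence."""
    return len(seq) - len(seq.lstrip("Nn"))


if __name__ == "__main__":
    example = "NNNNacgtACGTAC"  # made-up sequence, not real mm10 data
    print(mask_summary(example))
    print(leading_n(example))
```

A large leading N count with normal upper/lower-case sequence after it is what you would expect from the UCSC files.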
The "mm10.2bit" file has the same content, with all chromosomes in one file. Sometimes this is easier to download and work with. Use the UCSC utility "twoBitToFa" to convert it to FASTA (available from UCSC's downloads on the same web site, located here: http://hgdownload.soe.ucsc.edu/admin/exe/).
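If you want to script the conversion, a minimal wrapper might look like the sketch below. It assumes "twoBitToFa" is already downloaded, executable, and on your PATH. The function names and the file names "mm10.2bit" / "mm10.fa" are just placeholders. The basic call is `twoBitToFa input.2bit output.fa`, and the `-seq=name` option restricts the output to one sequence.

```python
import shutil
import subprocess


def two_bit_to_fa_command(two_bit_path, fasta_path, seq=None):
    """Build the twoBitToFa argument list; seq optionally picks one sequence."""
    cmd = ["twoBitToFa"]
    if seq is not None:
        cmd.append(f"-seq={seq}")
    cmd += [two_bit_path, fasta_path]
    return cmd


def convert(two_bit_path, fasta_path, seq=None):
    """Run twoBitToFa, failing early if the binary is not on PATH."""
    if shutil.which("twoBitToFa") is None:
        raise FileNotFoundError("twoBitToFa not found on PATH")
    subprocess.run(
        two_bit_to_fa_command(two_bit_path, fasta_path, seq), check=True
    )


if __name__ == "__main__":
    # Show the command that would be run for the whole genome.
    print(" ".join(two_bit_to_fa_command("mm10.2bit", "mm10.fa")))
```

Calling `convert("mm10.2bit", "chr1.fa", seq="chr1")` would extract only chr1, which is handy for a quick comparison.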
Thanks, Jen, Galaxy team