How do I run TopHat and RNA-seq analysis using the GRCh37 (Ensembl
release 66) genome? I noticed there is no input for this genome version.
Can I construct a reference genome from the following EMBL-format files
ftp://ftp.ensembl.org/pub/release-66/embl/homo_sapiens/, and map my
RNA-seq data against it?
Karthik Srinivasan | Senior Application Engineer
P: +912242554282
Oracle Health Sciences Global Business Unit
6th Floor, Silver Metropolis, W.E.Highway, Goregaon(E) | 400063
Because it is sourced from UCSC, the GRCh37 genome is available in
Galaxy as "hg19". The full name is:
Human Feb. 2009 (GRCh37/hg19) (hg19)
This aligns with how Ensembl also understands the content:
For RNA-seq analysis (and sometimes other types of analysis) you may
need to adjust other input data's chromosome naming to match the UCSC
naming (for example "chr1" rather than Ensembl's "1").
This is explained in the RNA-seq FAQ:
The data in the FTP link you provide is annotation data. To use
annotation data with the RNA-seq pipeline, a GTF file would be a good
format. The RNA-seq tool pages have a link out to Ensembl, but any
source from the same genome is OK (UCSC, etc.).
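
If you do use an Ensembl GTF with the hg19 genome, the chromosome renaming mentioned above could be sketched roughly as follows. This is only an illustration, not a Galaxy tool: the function names ensembl_to_ucsc and convert_gtf_lines are made up for this example, and it assumes that only the primary chromosomes (1-22, X, Y, MT) should be kept and that unplaced contigs can simply be dropped. It changes names only and does not check that the coordinates are otherwise compatible.

```python
# Hypothetical helper names; a sketch, not part of Galaxy or TopHat.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name):
    """Return the UCSC-style name for an Ensembl chromosome name.

    Returns None for names this sketch does not know how to map
    (for example unplaced contigs), so the caller can skip them.
    """
    if name == "MT":
        return "chrM"  # naming convention only; sequence is not checked
    if name in PRIMARY:
        return "chr" + name
    return None


def convert_gtf_lines(lines):
    """Yield GTF lines with column 1 renamed to UCSC style.

    Comment lines (starting with '#') pass through unchanged; lines on
    chromosomes that cannot be mapped are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        new_name = ensembl_to_ucsc(fields[0])
        if new_name is None:
            continue
        fields[0] = new_name
        yield "\t".join(fields) + "\n"
```

To apply it to a file, you could open the Ensembl GTF for reading and a new file for writing, then pass the input handle to convert_gtf_lines and write each yielded line to the output. The result can be uploaded to Galaxy with the database set to hg19.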