Question: One Question About The Genome Coordinates When Using Fetch Sequences
Sean wrote:
Hi, I have one stupid question. The region chr1 2351533 - 2351843 from UCSC (hg18) retrieves 311 bases. However, when I use Fetch Sequences in Galaxy, it only retrieves 310 bases. Apparently the first of the 311 bases is missing from the Fetch Sequences result, because the ending bases are the same. Does this mean that I need to modify the coordinates first and then use Fetch Sequences to get the correct sequence? I thought UCSC and Galaxy were both 0-based? Thanks. Sean
Jennifer Hillman Jackson wrote:
Hello,

The coordinates are interpreted in Galaxy as having a 0-based start. This means that to determine the actual genome position of the first base, you add 1 to the start. Not a stupid question - everyone has to learn this as they begin to work with data types sourced originally from UCSC and associated projects.

Depending on which tool you are using in the UCSC database, the coordinates will be interpreted as 0-based or 1-based. What tools outside of UCSC or Galaxy do with the coordinates can vary. In general:

  positional coordinates of format "chrA:NNN-NNNN" will be 1-based
  BED/Interval format will be 0-based

More help is in the "Convert formats" tool descriptions (included in the BED format description). And, this link at UCSC has all of the details:

Hopefully this helps!

Best,
Jen
Galaxy team

-- Jennifer Jackson
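
For anyone who wants to check the arithmetic from this thread, here is a minimal sketch in Python. It assumes the usual convention that a BED/interval range has a 0-based start and an end that is not included, while a "chrA:NNN-NNNN" position is 1-based with both ends included. The function names are made up for illustration and are not part of Galaxy or UCSC.

```python
def position_length(start, end):
    """Length of a 1-based range where both start and end are included."""
    return end - start + 1


def bed_length(start, end):
    """Length of a BED-style range: 0-based start, end not included."""
    return end - start


def position_to_bed(start, end):
    """Convert a 1-based inclusive range to a 0-based BED-style range."""
    return start - 1, end


def bed_to_position(start, end):
    """Convert a 0-based BED-style range to a 1-based inclusive range."""
    return start + 1, end


# Coordinates from the question: chr1 2351533 - 2351843 (hg18)
start, end = 2351533, 2351843

# Read as a 1-based position, the region spans 311 bases.
print(position_length(start, end))

# The same two numbers read as a BED-style interval span only 310 bases,
# because the start is treated as 0-based and the first base is skipped.
print(bed_length(start, end))

# Subtracting 1 from the start gives an interval covering all 311 bases.
bed_start, bed_end = position_to_bed(start, end)
print(bed_start, bed_end, bed_length(bed_start, bed_end))
```

Under these assumptions, passing 2351532 as the start and 2351843 as the end to a tool that expects BED-style intervals should return the same 311 bases that the 1-based position describes.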
