Galaxy interval format - what should be provided as CHROM#?


netnauke wrote (2.8 years ago):

Dear colleagues,

I have a .txt file with more than 100 lines in the following format (column 1: sequence ID; column 2: start coordinate; column 3: end coordinate; column 4: strand; all columns separated by tabs):

PA14sr_076 2867353 2867490 +

I am trying to fetch the sequences corresponding to these coordinates from the full genome sequence. I figured out that I could convert my file into Interval format for this. However, I do not understand what I should use as CHROM# in this case. This is a bacterial species, so it has only one chromosome anyway, and the full genome sequence in GenBank has no identifier that looks like a chromosome name.

When I run "Extract genomic DNA" without providing CHROM#, I get an empty output. Could someone help me, please? Thanks in advance.


Jennifer Hillman Jackson (Galaxy team) wrote (2.8 years ago):

Hello,

The chrom attribute is the identifier of one of the sequences in the reference genome (one or more chromosomes, multiple scaffolds/contigs, or some combination). The start and end are positions on that specific sequence within the reference genome, with the strand used as a modifier.
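To illustrate how these fields work together, here is a minimal Python sketch (not Galaxy's actual implementation) of extracting one region from a reference held in memory. It assumes 0-based, half-open coordinates as in BED, and the sequence ID "ref_seq_1" is hypothetical. Note that a chrom value that matches no reference sequence yields nothing, which is consistent with the empty output described in the question.

```python
from typing import Dict, Optional

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(genome: Dict[str, str], chrom: str, start: int,
                   end: int, strand: str) -> Optional[str]:
    """Return the subsequence for one region, or None if chrom is unknown.

    Assumes 0-based, half-open coordinates (BED convention).
    """
    seq = genome.get(chrom)
    if seq is None:
        # chrom must match a sequence identifier in the reference;
        # a region name such as "PA14sr_076" matches nothing.
        return None
    sub = seq[start:end]
    if strand == "-":
        # Minus strand: reverse complement of the same coordinates.
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


# Toy reference with one hypothetical sequence ID.
genome = {"ref_seq_1": "AACCGGTTAC"}
print(extract_region(genome, "ref_seq_1", 0, 4, "+"))   # AACC
print(extract_region(genome, "ref_seq_1", 0, 4, "-"))   # GGTT
print(extract_region(genome, "PA14sr_076", 0, 4, "+"))  # None
```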

Examine the reference genome to see which identifiers it uses, then double-check that your coordinates are based on those sequences. My guess is that the example region you shared uses a genetic region name as the chrom identifier (not the reference genome's actual chromosome name), while the start/end appear to be genomic coordinates.
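One way to see the identifiers in a reference genome is to read the FASTA header lines. The Python sketch below assumes the identifier is the first whitespace-delimited token after ">", which is a common convention but worth confirming for your file; the record "ref_seq_1" is hypothetical.

```python
import io
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Collect sequence identifiers from FASTA header lines.

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


# Hypothetical single-record FASTA; for a real file use open("genome.fa").
example = io.StringIO(">ref_seq_1 hypothetical description\nACGT\nTTGA\n")
print(fasta_ids(example))  # ['ref_seq_1']
```

Whatever this prints is what belongs in the chrom column; for a bacterial genome with a single chromosome it is typically just one value.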

More about reference genomes: https://wiki.galaxyproject.org/Support#Reference_genomes

More about interval and bed format: https://wiki.galaxyproject.org/Learn/Datatypes

BED format is a better choice for this particular tool. Pad the columns that have no known content (name and score) with default values, so that strand ends up in the sixth column. Tools in the Text Manipulation group can do this, or you can format the data before upload.
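As a rough alternative to the Text Manipulation tools, the Python sketch below converts the 4-column file (ID, start, end, strand) into 6-column BED (chrom, start, end, name, score, strand). REF_ID is a hypothetical placeholder to replace with the identifier from your reference FASTA; the original region ID is kept in the name column and score is padded with 0. The one_based option is an assumption you need to check against your data: BED starts are 0-based, so 1-based (GenBank-style) starts need 1 subtracted.

```python
REF_ID = "ref_seq_1"  # hypothetical; must match the reference genome's ID


def to_bed6(line: str, ref_id: str = REF_ID, one_based: bool = False) -> str:
    """Turn 'ID start end strand' into a tab-separated BED6 line."""
    region_id, start, end, strand = line.split()
    # BED starts are 0-based; shift only if the input is 1-based.
    start_i = int(start) - 1 if one_based else int(start)
    # Original ID goes to the name column; score is padded with 0.
    return "\t".join([ref_id, str(start_i), end, region_id, "0", strand])


def convert_file(src: str, dst: str, one_based: bool = False) -> None:
    """Convert every non-empty line of src and write BED6 lines to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(to_bed6(line, one_based=one_based) + "\n")


print(to_bed6("PA14sr_076\t2867353\t2867490\t+"))
```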

Thanks, Jen, Galaxy team


netnauke wrote (2.8 years ago):

Thanks a lot for the help, everything works now.
