Question: Galaxy interval format - what should be provided as CHROM#?
netnauke wrote (2.3 years ago):

Dear colleagues,

I have a .txt file with >100 lines of the following format (1st column - sequence ID, 2nd - start coordinate, 3rd - end coordinate, 4th - strand; everything separated by TABs):

PA14sr_076    2867353    2867490    +

I am trying to fetch the sequences corresponding to these coordinates from the full genome sequence. I figured out that I could do this by converting my file to Interval format. However, I do not understand what I should use as CHROM# in this case. As this is a bacterial species, it has only one chromosome anyway. And when I check the full genome sequence in GenBank, it has no identifier that resembles a chromosome name.

When I run "Extract genomic DNA" without providing CHROM#, I get an empty output. Could someone help me, please? Thanks in advance.

Jennifer Hillman Jackson (United States) wrote (2.3 years ago):

Hello,

The chrom attribute is the identifier of one of the sequences in the reference genome (which may contain one or more chromosomes, multiple scaffolds/contigs, or some combination). The start and end are positions on that specific sequence, with strand used as a modifier.

Examine the reference genome to understand the identifiers used. Then double-check that the coordinates are based on those sequences. My guess is that the example region you shared has a genetic region name as the chrom identifier (not the reference genome's actual chromosome name). However, the start/end appear to be genomic coordinates.
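As a rough illustration of "examine the reference genome", the sketch below lists the sequence identifiers found in a FASTA file. It assumes the common convention that the identifier is the first whitespace-delimited token after ">" on each header line; the header text and the name `example_chromosome` are made up for the example and are not the real identifier of any genome.

```python
def fasta_ids(lines):
    """Yield the identifier of each FASTA record.

    Assumption: the identifier is the first whitespace-delimited
    token after '>' on the header line.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


# Hypothetical FASTA content; replace with lines read from your own file,
# e.g. `with open("genome.fasta") as handle: ids = list(fasta_ids(handle))`.
example = [
    ">example_chromosome some free-text description\n",
    "ACGTACGTACGT\n",
    "ACGTACGT\n",
]

for seq_id in fasta_ids(example):
    print(seq_id)
```

Whatever identifier this reports for your genome is the value to put in the chrom column of every row.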

More about reference genomes: https://wiki.galaxyproject.org/Support#Reference_genomes

More about interval and bed format: https://wiki.galaxyproject.org/Learn/Datatypes

BED format is a better choice with this particular tool. Pad columns with default values when there is no known content (name and score). Tools in the Text Manipulation group can be used, or the data can be formatted prior to upload.
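For formatting the data prior to upload, a minimal sketch of the conversion to 6-column BED (chrom, start, end, name, score, strand) might look like the following. The function name `to_bed6`, the chrom value `example_chromosome`, and the placeholder score of 0 are assumptions for illustration. BED starts are 0-based while ends are exclusive, so the sketch subtracts 1 from the start on the assumption that the input coordinates are 1-based and inclusive; set `one_based=False` if they are not.

```python
def to_bed6(lines, chrom, one_based=True):
    """Convert 'id  start  end  strand' rows into BED6 rows.

    chrom     -- identifier of the reference sequence (must match the genome)
    one_based -- assumption: input starts are 1-based, so shift them by -1
    """
    bed_rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, start, end, strand = fields[:4]
        start = int(start) - 1 if one_based else int(start)
        # The original sequence ID moves to the name column; score is padded with 0.
        bed_rows.append("\t".join([chrom, str(start), end, name, "0", strand]))
    return bed_rows


# The row from the question, with a hypothetical chrom identifier.
rows = to_bed6(["PA14sr_076\t2867353\t2867490\t+"], chrom="example_chromosome")
print("\n".join(rows))
```

The original sequence ID is kept in the name column, so the extracted sequences can still be matched back to the input rows.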

Thanks, Jen, Galaxy team

netnauke wrote (2.3 years ago):

Thanks a lot for the help, everything works now.
