Question: Galaxy Fetch Sequences - How to extract genomic DNA from a FASTA file?
2.8 years ago by
United States
mohamed_ismail10 wrote:

I am trying to extract a viral genomic DNA sequence using the Fetch Sequences tool. The source of the genomic data is a FASTA file in my history (with the sequence name >DQ900900.1).

Unlike human genomic DNA, a virus genome cannot be labelled with a chromosome number. Therefore, I labelled the first column in the interval file as >DQ900900.1. When I run the analysis, I get the warning message shown below:

  Unable to fetch the sequence from '35123' to '100' for chrom '>DQ900900.1'. 

I assume something is wrong with my labels in the first column of the interval file. Please advise.



2.8 years ago by
United States
Jennifer Hillman Jackson24k wrote:


Remove the ">" from the identifiers in the interval dataset; this will likely solve part of the issue. Beyond that, just make certain that the identifiers in the reference FASTA dataset and in the interval dataset are identical.
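
As a rough illustration of the identifier check, here is a minimal Python sketch that lists interval identifiers with no match in the FASTA file. The function names are hypothetical, and it assumes that the identifier is the first whitespace-delimited word after ">" on each FASTA header line and that the interval file is tab-delimited with the identifier in column 1. It is not how Galaxy itself performs the lookup.

```python
def fasta_ids(fasta_text):
    """Collect sequence identifiers from FASTA text.

    Assumption: the identifier is the first word of the header line,
    with the leading '>' removed.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unmatched_interval_ids(interval_text, known_ids):
    """Return column-1 identifiers from the interval text that are not in known_ids."""
    missing = []
    for line in interval_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t")[0].strip()
        if chrom not in known_ids and chrom not in missing:
            missing.append(chrom)
    return missing


if __name__ == "__main__":
    fasta = ">DQ900900.1 example header\nACGTACGT\n"
    intervals = ">DQ900900.1\t100\t200\n"
    # The interval identifier still carries '>', so it is reported as unmatched.
    print(unmatched_interval_ids(intervals, fasta_ids(fasta)))
```

An empty result means every identifier in the interval file was found among the FASTA headers.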

The other item to check is that the start coordinate is smaller than the end coordinate, and that the start is "0-based", the same as in BED format. In the warning you posted, the start (35123) is larger than the end (100). If the sequence to be extracted is on the complementary strand, designate that by including a strand field.
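
To make the coordinate convention concrete, here is a small Python sketch of BED-style extraction: the start is 0-based, the end is exclusive, and a "-" strand returns the reverse complement. The `extract` function is a hypothetical helper written for illustration, not the tool's own code, and it only handles the plain bases A, C, G and T.

```python
# Translation table for complementing plain DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(sequence, start, end, strand="+"):
    """Return sequence[start:end] using 0-based, end-exclusive coordinates.

    Raises ValueError when start is not smaller than end or when the
    interval falls outside the sequence.
    """
    if not 0 <= start < end <= len(sequence):
        raise ValueError(
            f"invalid interval {start}-{end} for sequence of length {len(sequence)}"
        )
    piece = sequence[start:end]
    if strand == "-":
        # Complement each base, then reverse the result.
        piece = piece.translate(COMPLEMENT)[::-1]
    return piece


if __name__ == "__main__":
    genome = "AACCGGTT"
    print(extract(genome, 2, 5))       # bases at 0-based positions 2, 3, 4
    print(extract(genome, 2, 5, "-"))  # reverse complement of the same slice
```

With these rules, an interval whose start is larger than its end, such as 35123 to 100, is rejected rather than fetched.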

More about common bioinformatics file formats is in the Galaxy wiki (and in many other places across the internet).

Best, Jen, Galaxy team

