Question: Extract Genomic DNA using coordinates from a gff file
15 months ago
bnavarro


I have been extracting genomic sequences using "Extract Genomic DNA" from "Fetch Alignments/Sequences". For the coordinates I used GFF files that I downloaded from NCBI-clone, and for the sequence I used a FASTA file of a genome assembly. I also downloaded the assembly from NCBI and uploaded it to Galaxy, because the latest version of that genome assembly is not currently available in Galaxy.

According to the example, the sequence names in the output FASTA file should contain the chromosome/scaffold name from the GFF file. In my output, the chromosome/scaffold names in the FASTA file are NOT present in the GFF file I used for the extraction. How is that possible? Am I missing something? If not, then I am not sure how the extraction works.

15 months ago
United States
Jennifer Hillman Jackson (Galaxy team)


I double-checked your data and everything is correct.

Example: The "NC_" identifier in the first line of the output fasta file is in both the reference GFF3 and the custom genome used (near the top of both datasets for your case).

If there isn't an identifier match between the two inputs, there are no results to process. If you ever want to pull all lines out of a dataset that match a particular pattern, the Select tool can be used. To confirm this for yourself, you could Select the lines that match one of the output NC identifiers from all three datasets (2 inputs, 1 output).
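
The same identifier check can also be done outside Galaxy. The following is only a minimal Python sketch, not the tool's own code: it assumes the identifier is column 1 of each GFF line and the first whitespace-delimited word of each FASTA header, which may differ from how the tool matches names. The function names and the sample identifiers (NC_999999.1, scaffold_7, NW_888888.1) are made up for illustration.

```python
def gff_seqids(gff_text):
    """Collect the sequence identifiers found in column 1 of GFF text."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # anything after this directive is embedded sequence
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def fasta_ids(fasta_text):
    """Collect the first word of every FASTA header line (assumed ID)."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def compare_ids(gff_text, fasta_text):
    """Report which identifiers are shared and which appear in one input only."""
    gff = gff_seqids(gff_text)
    fasta = fasta_ids(fasta_text)
    return {
        "shared": gff & fasta,
        "gff_only": gff - fasta,
        "fasta_only": fasta - gff,
    }


# Made-up sample data, for illustration only.
example_gff = (
    "##gff-version 3\n"
    "NC_999999.1\tRefSeq\tgene\t100\t200\t.\t+\t.\tID=gene0\n"
    "scaffold_7\tRefSeq\tgene\t5\t50\t.\t-\t.\tID=gene1\n"
)
example_fasta = (
    ">NC_999999.1 hypothetical chromosome 1\n"
    "ACGTACGTACGT\n"
    ">NW_888888.1 hypothetical unplaced scaffold\n"
    "TTGACA\n"
)

report = compare_ids(example_gff, example_fasta)
for label in ("shared", "gff_only", "fasta_only"):
    print(label, sorted(report[label]))
```

Under these assumptions, only the identifiers listed under "shared" could produce output sequences. Identifiers listed under "gff_only" have no sequence to extract from.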

Thanks, Jen, Galaxy team

15 months ago
bnavarro


Sure, it all seems to be correct now. I may have mixed up the data before. Sorry.



