3.8 years ago by
It is a bit difficult to understand what you are trying to do, so if I get this wrong, just clarify and we can offer more help.
If you have genome regions defined by coordinates, based on a specific fasta dataset, then you can use the tool "Extract Genomic DNA" to pull out the sequence for just those coordinate regions. The key here is to create a custom reference genome from the fasta dataset. The tool states that it is for "Genomic DNA", but it works with nearly any fasta dataset. Very large NGS datasets are probably the exception, since the job may exceed compute resources.
How to turn a fasta dataset into a custom reference genome is defined here:
Since you also want flanking regions for your coordinates to be included, first use the tool "Get flanks" to extend the coordinates, then use the Extract tool. The Extract tool matches the chromosome (sequence) identifiers in the genome against those in the coordinate file. The identifiers must match exactly. Use BED format for the coordinate input for best results.
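If it helps to see the idea outside of Galaxy, here is a rough Python sketch of the same two steps: widen each BED interval by a flank size, then pull the sequence from the fasta by exact identifier match. This is only an illustration and not how the Galaxy tools are implemented. The function names (`parse_fasta`, `extend_and_extract`), the `flank` parameter, and the example data are all made up for the sketch, and it assumes BED coordinates are 0-based with an exclusive end.

```python
def parse_fasta(text):
    """Return {identifier: sequence}. The identifier is the first word after '>'."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def extend_and_extract(seqs, bed_lines, flank):
    """Widen each BED interval by `flank` bases per side and return the sequences.

    Hypothetical helper: intervals are clamped to the sequence bounds, and an
    identifier that is not an exact match raises KeyError.
    """
    results = []
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        if chrom not in seqs:
            raise KeyError(f"identifier {chrom!r} not found in the fasta dataset")
        seq = seqs[chrom]
        new_start = max(0, int(start) - flank)
        new_end = min(len(seq), int(end) + flank)
        results.append((chrom, new_start, new_end, seq[new_start:new_end]))
    return results


# Made-up example data: one 20-base sequence and one interval.
fasta_text = ">contig1 example\nACGTACGTAC\nGTACGTACGT\n"
bed = ["contig1\t4\t8"]
regions = extend_and_extract(parse_fasta(fasta_text), bed, flank=2)
print(regions)
```

Note that an interval naming "Contig1" instead of "contig1" would fail in this sketch, which mirrors the exact-match requirement described above.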
All of these are tools found on the public Main Galaxy instance at http://usegalaxy.org. Use the "search" at the top of the tool form to locate them. At the bottom of each tool form is a link to the Tool Shed repository the wrappers are based on.
The tool 'filter_by_fasta_ids' is most likely not what you need, but again, I may have misunderstood your question. Please send more details if that is the case.
Best, Jen, Galaxy team