In general, you can use *'NGS: Picard (beta) -> SAM to FASTQ'* to
extract sequences (convert BAM -> SAM first), but this tool does not fill
in extra sequence based on the reference genome (or pad the
quality scores, etc.). I don't know of a Galaxy-wrapped tool that does
this, but you might check the Tool Shed or other public Galaxy servers.
Others reading this post may also have advice.
Now, going from *BAM* -> coordinates (bed/interval) -> *FASTA*
sequence is possible a few ways. The general idea is that the
coordinates are manipulated to extend the mapped footprint and then
sequence is extracted from the reference genome. Any novel content in
the original sequence is lost, but maybe this still has some utility
for you. The two methods below show how to do this, with the 2nd being
simpler if the genome is at UCSC. There are other ways to get
sequence, merge/cluster, etc. (see tools in group 'Operate on Genomic
Intervals'), but below are the most direct per-sequence methods.
And if you need to filter down multi-mapped data, use the tool 'NGS:
SAM Tools -> Filter SAM' (converting between SAM and BAM as needed).
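To make the general idea above concrete, here is a rough Python sketch (not a Galaxy tool) of pulling sequence for an interval straight from the reference. The names `fetch_sequence` and `reverse_complement` and the tiny in-memory `reference` dict are made up for illustration, and I am assuming BED-style 0-based, half-open coordinates. Because the bases come from the reference, anything novel in the original read (mismatches, insertions) will not appear in the result.

```python
# Hypothetical helpers for illustration; the "reference" is a toy dict,
# not a real genome file.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_sequence(reference, chrom, start, end, strand="+"):
    """Extract reference sequence for a 0-based, half-open interval.

    Coordinates are always given relative to the (+) strand; for a (-)
    strand feature the slice is reverse complemented afterwards.
    """
    seq = reference[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


reference = {"chr1": "ACGTTGCAAGGCTTAACCGG"}
plus_seq = fetch_sequence(reference, "chr1", 2, 8, "+")
minus_seq = fetch_sequence(reference, "chr1", 2, 8, "-")
print(plus_seq, minus_seq)
```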
*1st method, works for any genome, including a custom reference genome:*
1 - convert BAM to SAM with 'NGS: SAM Tools -> BAM-to-SAM'
2 - convert SAM to interval with 'NGS: SAM Tools -> Convert SAM' or
convert to bed with 'BEDTools -> Convert from BAM to BED'
3 - split the file into two: one representing the (+) strand and
one the (-) strand, using the tool 'Filter and Sort -> Filter'
4 - adjust the start or end coordinate to extend the alignment
as wanted using the tool 'Text Manipulation -> Compute'. Remember that
for negative-stranded coordinates, the "start" is really where the end
of the sequence aligned and the "end" is where the start of the sequence
aligned: interval files report coordinates with respect to the (+)
strand, smallest -> largest.
5 - cut out the columns to create a standard interval file again,
swapping in the new coordinates. Click on the pencil icon to assign
the column attributes and a reference genome as
needed - this information is required by the next tool.
6 - get the fasta sequence by using the tool 'Fetch Sequences ->
Extract Genomic DNA'
7 - merge all fasta results together with the tool 'Text Manipulation
-> Concatenate datasets'
8 - if you need fastq format, you can pad out quality scores
with the tool 'NGS: QC and manipulation -> Combine FASTA and QUAL'
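The strand logic in steps 3-5 is the part that tends to trip people up, so here is a minimal Python sketch of the same coordinate arithmetic. It is only an illustration, not what the Galaxy tools run internally. The function name `extend_bed_line` and its `five_prime`/`three_prime` parameters are hypothetical, and it assumes a tab-separated 6-column BED line with strand in the 6th column. It clamps the start at 0 but does not know chromosome lengths, so the end is not clamped.

```python
def extend_bed_line(line, five_prime=0, three_prime=0):
    """Extend a 6-column BED line in a strand-aware way.

    For (+) features the 5' end is the start coordinate; for (-) features
    the 5' end is the end coordinate, because BED coordinates are always
    reported relative to the (+) strand, smallest -> largest.
    """
    fields = line.rstrip("\n").split("\t")
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    if strand == "-":
        start -= three_prime  # 3' end of a (-) feature is the smaller coordinate
        end += five_prime     # 5' end of a (-) feature is the larger coordinate
    else:
        start -= five_prime
        end += three_prime
    fields[1] = str(max(start, 0))  # never run off the start of the chromosome
    fields[2] = str(end)            # not clamped: chromosome length is unknown here
    return "\t".join(fields)


plus = extend_bed_line("chr1\t100\t150\tread1\t0\t+", five_prime=10, three_prime=20)
minus = extend_bed_line("chr1\t100\t150\tread2\t0\t-", five_prime=10, three_prime=20)
print(plus)
print(minus)
```

With the same 10 bases added at 5' and 20 at 3', the (+) read ends up at 90-170 and the (-) read at 80-160, which is the same result you would get by splitting by strand and running Compute on each file separately.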
*2nd method, if the reference genome is at UCSC:*
1 - convert BAM to BED with 'BEDTools -> Convert from BAM to BED'
2 - click on the "view at UCSC main" link for the dataset
3 - once at UCSC Browser, the data will show up as a custom track, by
default named "User Track" in the top track group. Click on the track
name - it will take you to the track controls and focus the browser on
4 - in the top blue menu bar, click on "Tools -> Table Browser". This
track will now be pre-loaded in the form with all options probably set
as you want them (this user track is selected and "region" is
- except for one - change "output format" from "BED" to "sequence"
5 - confirm that the "Galaxy" box is checked, and click on "get
output"
6 - the next form has options for extending the sequence at the 5' and/or
3' ends, all in one go; adjust as you want
7 - click on "Send query to Galaxy" and the dataset will load back
into the working history
8 - the fasta can be converted to fastq as in the 1st method, step #8
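If it helps to see what "padding out quality scores" means for that last fasta -> fastq step, here is a small Python sketch. It is an assumption-laden illustration and not the Galaxy tool's actual code: the function name `fasta_to_fastq` and the `quality_char` parameter are made up, and it simply gives every base the same placeholder quality ("I" is Phred 40 in the Sanger/Phred+33 encoding).

```python
def fasta_to_fastq(fasta_text, quality_char="I"):
    """Convert FASTA text to FASTQ text with a constant placeholder quality.

    Multi-line FASTA records are joined into one sequence line. The quality
    string is just quality_char repeated once per base, so it carries no
    real information about base-call confidence.
    """
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    out = []
    for name, seq in records:
        out.append("@%s\n%s\n+\n%s\n" % (name, seq, quality_char * len(seq)))
    return "".join(out)


fastq = fasta_to_fastq(">read1\nACGT\nAC\n>read2\nGG\n")
print(fastq, end="")
```

Keep in mind that downstream tools will treat these fake scores as real, so anything that filters or trims on quality will not do anything meaningful with this data.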
Hopefully some of this is helpful!