The BED extraction data can be resolved in Galaxy. Pull out the whole
gene and then modify the coordinates in Galaxy to be 10k upstream.
To be clear - this coordinate data is going to be used to transform
coordinates in your current fuzznuc output that is transcript-based to
be genome-based. The coordinates are not input for fuzznuc - they are
used after fuzznuc is run on the FASTA file, in order to convert the
result coordinates only.
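
As a rough sketch of the arithmetic behind that conversion (not the Galaxy
steps themselves), something like the Python below may help. It assumes the
fuzznuc hit positions are 1-based and inclusive relative to the extracted
sequence, that the BED interval is 0-based and half-open, and that
minus-strand sequences were reverse-complemented when extracted. The
function name hit_to_genome is made up for illustration.

```python
def hit_to_genome(chrom_start, chrom_end, strand, hit_start, hit_end):
    """Map a hit on an extracted sequence back to genome coordinates.

    chrom_start, chrom_end: BED interval of the extracted region
        (0-based start, exclusive end).
    strand: "+" or "-" for the extracted region.
    hit_start, hit_end: hit position within the extracted sequence,
        ASSUMED to be 1-based and inclusive.
    Returns a (start, end) pair in BED convention.
    """
    if hit_start < 1 or hit_end < hit_start:
        raise ValueError("hit coordinates must satisfy 1 <= start <= end")
    if hit_end > chrom_end - chrom_start:
        raise ValueError("hit extends past the end of the extracted region")

    if strand == "+":
        # Sequence position 1 is the genome base at chrom_start.
        return chrom_start + hit_start - 1, chrom_start + hit_end
    if strand == "-":
        # ASSUMPTION: the sequence was reverse-complemented, so sequence
        # position 1 is the genome base at chrom_end - 1.
        return chrom_end - hit_end, chrom_end - hit_start + 1
    raise ValueError("strand must be '+' or '-'")
```

For a region chr1:1000-2000 on the plus strand, a hit at positions 1 to 10
maps to 1000-1010 in BED terms; on the minus strand the same hit maps to
1990-2000.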
This page in the UCSC wiki has a good description of how the UCSC
coordinates are organized.
The output format for fuzznuc is documented in the tool's help - the
last line on the tool form has a link.
Hopefully this helps to clear up the suggested processing,
Thanks for your help so far.
I've been trying to implement the approach you outlined. It seems to
be taking a lot of steps. I think I'm now at the last step, where I
convert my TAB format file into a BED and push it to UCSC for viewing.
But I don't see anything that will allow me to do that last conversion.
Any advice would be appreciated.
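
For reference, the conversion in question amounts to writing the columns in
BED order. A minimal Python sketch of one 6-column BED line is below; the
helper name to_bed_line and the default values are hypothetical, and the
column layout of the tabular file itself is not assumed here.

```python
def to_bed_line(chrom, start, end, name, score=0, strand="+"):
    """Format one 6-column BED record as a tab-separated string.

    start is 0-based and end is exclusive, following BED convention.
    score must be an integer from 0 to 1000.
    """
    start, end, score = int(start), int(end), int(score)
    if start < 0 or end <= start:
        raise ValueError("BED requires 0 <= start < end")
    if not 0 <= score <= 1000:
        raise ValueError("BED score must be between 0 and 1000")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [chrom, start, end, name, score, strand]
    return "\t".join(str(f) for f in fields)
```

Calling to_bed_line("chr1", 1000, 1010, "hit1") gives a line with the
fields chr1, 1000, 1010, hit1, 0 and + separated by tabs.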
Thanks for all your help.
Here's the final Galaxy workflow for doing FUZZNUC on a BED file from
UCSC Table Browser, then producing a BED file that you can view in UCSC.
I do not include the "Get Flank" operation in this base workflow, but
include a note in the description.
I have not (yet) had time to make the score in the final BED dependent
on the quality of the match, when mis-matches are allowed, but I hope
to come back and add that later.
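
One possible scoring scheme, sketched in Python purely as an illustration
and not what the workflow currently does: scale the fraction of matching
positions into the 0-1000 range that BED uses for the score column. The
function name match_score and the linear scaling are assumptions.

```python
def match_score(pattern_length, mismatches):
    """Scale match quality into the BED score range (0-1000).

    A perfect match scores 1000; each mismatch lowers the score in
    proportion to the pattern length. This linear scheme is only one
    hypothetical choice.
    """
    if pattern_length <= 0:
        raise ValueError("pattern_length must be positive")
    if not 0 <= mismatches <= pattern_length:
        raise ValueError("mismatches must be between 0 and pattern_length")
    # Integer arithmetic keeps the result an int, as BED expects.
    return 1000 * (pattern_length - mismatches) // pattern_length
```

With a 10-base pattern, no mismatches gives 1000 and two mismatches give
800.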
How does one handle versioning of published workflows? Do you update the
existing one, or create another with a .v2 name?
Also, I used several "Text Manipulation> Compute" steps - is there any
way to compute more than 1 new column at a time?