If I've understood correctly, when you use Galaxy's Stitch MAF blocks tool, the resulting FASTA file uses gap characters to represent genomic regions that are not covered by any alignment block. These gap characters are therefore present both in the original target (reference) sequence and in the query sequences used to build the multiple alignment.
For PhyloCSF to work properly, the reference sequence must be ungapped (see https://github.com/mlin/PhyloCSF/wiki). PhyloCSF has an option to remove gaps common to all sequences. However, because those gaps stand in for real reference nucleotides, removing them can introduce frame-shifts in the reference sequence, which would invalidate the analysis.
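To make the frame-shift concern concrete, here is a toy Python sketch. The sequences are made up, and `strip_common_gaps` is a hypothetical, simplified stand-in for "remove columns that are gaps in every sequence", not PhyloCSF's own implementation. If the region missing from the stitched reference is 2 nt long, dropping those columns shifts every downstream codon.

```python
def codons(seq):
    """Split a sequence into consecutive triplets (the last one may be partial)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def strip_common_gaps(rows, gap="-"):
    """Hypothetical helper: drop alignment columns in which every row is a gap."""
    length = len(rows[0])
    keep = [i for i in range(length) if any(r[i] != gap for r in rows)]
    return ["".join(r[i] for i in keep) for r in rows]


true_ref = "ATGAAACCCTAA"                  # made-up genomic reference
stitched = ["ATG--ACCCTAA",                # reference row: 2 nt lost to a missing block
            "ATG--ACCCTGA"]                # query row, also gapped there
stripped = strip_common_gaps(stitched)

print(codons(true_ref))     # ['ATG', 'AAA', 'CCC', 'TAA']
print(codons(stripped[0]))  # ['ATG', 'ACC', 'CTA', 'A']  (frame shifted)
```

Whenever the length of a missing region is not a multiple of 3, every codon after it is read in the wrong frame.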
Given that people frequently use Stitch MAF blocks to generate multi-FASTA files for PhyloCSF analysis, is there a convenient way to produce FASTA files in which the reference sequence contains no gaps?
I understand that it may be possible to run the underlying Galaxy code (e.g. get_spliced_region_alignment()) and customize the resulting FASTA. But since Galaxy is aimed at users without programming experience, I wonder: is there an easier way to overcome this problem?
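For anyone who does go the scripting route, here is a minimal Python sketch of one possible post-processing step. It is not Galaxy code and does not use get_spliced_region_alignment(); `read_fasta`, `write_fasta` and `ungap_reference` are hypothetical helpers. It rests on several assumptions: (1) you can obtain the true, ungapped reference sequence for the stitched interval separately (e.g. extracted from the genome); (2) a region with no alignment block shows up as columns that are gaps in every row, one column per missing reference base; (3) columns where the reference is gapped but some other species has a base are insertions relative to the reference and can be dropped without shifting the reference frame.

```python
# Hypothetical post-processing sketch; not part of Galaxy or PhyloCSF.

def read_fasta(text):
    """Parse multi-FASTA text into an insertion-ordered dict: header -> sequence."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def write_fasta(records, width=60):
    """Format a dict of header -> sequence as FASTA text wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def ungap_reference(records, ref_name, ref_seq, gap="-"):
    """Project the alignment onto reference coordinates (assumptions 1-3 above).

    Keeps columns where the reference has a base or every row is a gap
    (assumed missing block), drops reference-gap insertion columns, then
    replaces the reference row with the true ungapped `ref_seq`.
    """
    ref_row = records[ref_name]
    length = len(ref_row)
    if any(len(s) != length for s in records.values()):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i in range(length):
        all_gap = all(s[i] == gap for s in records.values())
        if ref_row[i] != gap or all_gap:
            keep.append(i)  # maps to exactly one reference base
        # otherwise: insertion relative to the reference, column dropped

    if len(keep) != len(ref_seq):
        raise ValueError("kept columns do not match the reference length")
    for j, i in enumerate(keep):
        if ref_row[i] != gap and ref_row[i].upper() != ref_seq[j].upper():
            raise ValueError(f"reference mismatch at alignment column {i}")

    return {name: ref_seq if name == ref_name else "".join(seq[i] for i in keep)
            for name, seq in records.items()}


if __name__ == "__main__":
    # Toy example: columns 3-5 are a missing block, column 8 is an mm9 insertion.
    stitched = read_fasta(">hg19\nATG---CC-C\n>mm9\nATG---CCGC\n")
    fixed = ungap_reference(stitched, "hg19", "ATGAAACCC")
    print(write_fasta(fixed), end="")
    # >hg19
    # ATGAAACCC
    # >mm9
    # ATG---CCC
```

The consistency checks are there so that, if assumption 2 does not hold for your output (for example, if missing regions are not one gap column per reference base), the script fails loudly instead of silently producing a misaligned file.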
Thanks in advance.