This is how to combine two or more fasta datasets. In your case, there are two datasets where each represents a genome.
- Upload the datasets to Galaxy in fasta format.
- Use FTP upload if the data is over 2 GB (it probably will be).
- Run the tool Normalize Fasta on each. This standardizes the format.
- Use the options to wrap at 80 bases and to trim the title line at the first whitespace.
- It is important to make sure that each identifier (the first "word" of the ">" line) is unique across all of the datasets that you wish to combine.
- Combine the two (or more) normalized fasta datasets into one with the tool Concatenate.
- The datasets are "stacked" into a single dataset.
- Any number of plain text datasets of the same datatype can be combined with this same tool, provided they have no header or comment lines (or these are removed first). It is not just for fasta format.
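If it helps to see the logic of the steps above outside of Galaxy, here is a minimal Python sketch. It is not how the Galaxy tools are implemented; the function names `parse_fasta` and `combine_fasta` are made up for illustration. It trims each title line at the first whitespace, rewraps the sequence at a fixed width, refuses to continue if any identifier is repeated, and then stacks the records into one fasta text.

```python
from collections import Counter


def parse_fasta(text):
    """Return (identifier, sequence) pairs from fasta text.

    Hypothetical helper: the identifier is the title line trimmed at the
    first whitespace, and the sequence lines are joined into one string.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def combine_fasta(texts, width=80):
    """Normalize several fasta texts and stack them into one.

    Raises ValueError if an identifier appears more than once
    across all of the inputs.
    """
    records = [rec for text in texts for rec in parse_fasta(text)]
    counts = Counter(name for name, _ in records)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    if duplicates:
        raise ValueError("duplicate identifiers: " + ", ".join(duplicates))
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # rewrap the sequence at `width` bases per line
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Calling `combine_fasta([genome_a_text, genome_b_text])` returns the combined fasta as a single string. The sketch holds everything in memory, so it is only meant for small test files; for genome-sized data the Galaxy tools are the practical route.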
This will effectively create a fasta dataset that can be used as a Custom Reference Genome and optionally a Custom Build. If you have trouble with this or want more details, please start by reviewing the guide here, then let us know if anything is unclear: https://galaxyproject.org/learn/custom-genomes/
Hope this helps! Jen, Galaxy team