I am beginning to analyze some RNA-Seq data and am having some difficulty with the custom reference genome. My reference genome is the goat (Capra hircus). On NCBI, I can download a fasta file for each chromosome but do not see an option to download a single fasta file for the whole genome, which is how I understood it should be done from the wiki custom reference genome page. Do I have to run each chromosome individually?
Hello,
Most data providers include a bundled file that contains all chromosomes. It is usually in the same list as the individual chromosome files, but named slightly differently and by version. First, double-check whether such a file is available.
If you do by chance need to load individual chromosomes, do that first. Then use the tool "Text manipulation -> Concatenate datasets" to merge all into one file.
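If you would rather merge the files on your own computer before uploading, the same concatenation can be sketched in a few lines of Python. This is only an illustration, not part of Galaxy: the function name `concatenate_fasta` and the file names in the usage comment are hypothetical, and the script assumes the chromosome files are plain (uncompressed) text.

```python
def concatenate_fasta(chromosome_files, merged_file):
    """Write every per-chromosome fasta file into one merged file, in the order given."""
    with open(merged_file, "w") as out:
        for path in chromosome_files:
            last_line = "\n"
            with open(path) as handle:
                # Stream line by line so a large chromosome is never held in memory.
                for line in handle:
                    out.write(line)
                    last_line = line
            # If a file lacks a final newline, add one so the next ">" title
            # line does not get glued onto the end of the previous sequence.
            if not last_line.endswith("\n"):
                out.write("\n")


# Example call (hypothetical file names):
# concatenate_fasta(["chr1.fa", "chr2.fa", "chrX.fa"], "goat_genome.fa")
```

The order of the input list becomes the chromosome order in the merged file, so list the files in the order you want them to appear.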
Once merged, perform some QA. Specifically, double-check that the new fasta dataset (assign the datatype using the pencil icon if needed) does not contain extra spaces and is wrapped. You want the data in strict fasta format. The Troubleshooting portion of the second wiki link below contains instructions for manipulating fasta data to achieve that format.
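As a rough way to run the checks described above outside of Galaxy, here is a small Python sketch that scans a fasta file and reports suspicious lines. The function name `check_fasta`, the 80-character limit, and the exact set of rules are assumptions made for illustration; the wiki Troubleshooting section remains the authority on what the tools expect.

```python
def check_fasta(path, max_width=80):
    """Return a list of (line_number, problem) tuples for a fasta file.

    max_width is an assumed upper bound for a wrapped sequence line.
    """
    problems = []
    seen_title = False
    with open(path) as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip("\n")
            if line == "":
                problems.append((number, "blank line"))
            elif line.startswith(">"):
                seen_title = True
                # Spaces or tabs after the identifier are a common source of trouble.
                if any(ch.isspace() for ch in line):
                    problems.append((number, "whitespace in title line"))
            else:
                if not seen_title:
                    problems.append((number, "sequence before first title line"))
                if any(ch.isspace() for ch in line):
                    problems.append((number, "whitespace in sequence line"))
                # A very long sequence line suggests the data is not wrapped.
                if len(line) > max_width:
                    problems.append((number, "sequence line not wrapped"))
    return problems


# Example call (hypothetical file name):
# for number, problem in check_fasta("goat_genome.fa"):
#     print(number, problem)
```

An empty result only means none of these particular checks fired; it does not guarantee that every tool will accept the file.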
General help:
http://wiki.galaxyproject.org/Support#Custom_reference_genome
Specific, including Troubleshooting (section 8):
http://wiki.galaxyproject.org/Learn/CustomGenomes
Take care, Jen, Galaxy team