I checked your history with the analysis. There is a reference genome mismatch problem between the natively indexed TAIR10 genome at Galaxy Main (http://usegalaxy.org) and the GTF reference annotation from iGenomes.
Good news!! You are very close to obtaining a successful run and avoiding issues with other tools. The genome.fa from the same annotation bundle from iGenomes is already in your history! To solve the current problem and later potential issues, do the following:
- Re-map the reads against the TAIR10 *genome.fa genome from the bundle*. This ensures that all data going forward is based on the exact same reference genome, with identical chromosome identifiers. This is very important for obtaining valid analysis results, whether tools fail or not.
- Use the Custom Reference genome option with the mapping tool. General help with video guides and other quick tips: https://galaxyproject.org/learn/custom-genomes/
- DO NOT assign the natively indexed TAIR10 genome as the "Database" metadata attribute.
- Instead, promote the Custom Genome to a Custom Build (detailed help in the same link above).
- Assign that Custom Genome Build as the metadata "Database" attribute for the BAM and all other datasets associated with this genome: datasets generated by tools (if the assignment is not made by default) plus any uploaded datasets used. Again, this avoids issues and ensures tools use the correct genome build. It is worth the extra steps. No one likes to start over from mapping.
- Mapping tools will not use this database assignment, but many other common tools do, and this proper database assignment will avoid further confusing issues/poor results. The goal is to fully annotate datasets with the actual genome used.
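If you want to confirm a mismatch like this outside of Galaxy, here is a minimal Python sketch that compares the sequence identifiers in a fasta file with the first column of a GTF file. The function names and the tiny example data are made up for illustration (the identifiers shown are hypothetical, not taken from your history), and this is not how any Galaxy tool checks it.

```python
def fasta_ids(lines):
    """Collect the identifier (first word after '>') from each fasta title line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_ids(lines):
    """Collect the values in the first (sequence name) column of a GTF file."""
    ids = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Return (ids only in the annotation, ids only in the genome), both sorted."""
    genome = fasta_ids(fasta_lines)
    annotation = gtf_ids(gtf_lines)
    return sorted(annotation - genome), sorted(genome - annotation)


# Hypothetical data: the genome uses "Chr1"/"Chr2", the annotation uses "1"/"2".
fasta_example = [">Chr1 example description", "ACGT", ">Chr2", "ACGT"]
gtf_example = [
    '1\tsource\texon\t1\t4\t.\t+\t.\tgene_id "g1";',
    '2\tsource\texon\t1\t4\t.\t+\t.\tgene_id "g2";',
]

only_in_annotation, only_in_genome = compare_ids(fasta_example, gtf_example)
print("In GTF but not in genome:", only_in_annotation)
print("In genome but not in GTF:", only_in_genome)
```

With real files you would pass open file handles instead of the example lists. If both result lists are empty, the identifiers agree; anything reported points to a genome/annotation mismatch.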
Note that Custom Reference genomes from some sources (in particular those from NCBI, or those you assembled yourself) have title lines with complicated/extended annotation, not just the simple chromosome identifiers. Before starting an analysis, clean up the title lines (the ">" lines in a fasta dataset) so that only the chromosome identifiers remain, and re-wrap the fasta sequence lines at 80 bases before creating a Custom genome/build. Use the tool NormalizeFasta (also explained in the link above).
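To show what that cleanup amounts to, here is a rough Python sketch, assuming the identifier you want is the first whitespace-delimited word on each title line. It is an illustration only, not the NormalizeFasta implementation, and the function name and example record are made up. Inside Galaxy, the NormalizeFasta tool is the simpler route.

```python
def normalize_fasta(lines, width=80):
    """Trim title lines to their first word and re-wrap sequence at `width` bases."""
    out = []
    seq_parts = []

    def flush():
        # Write out the sequence collected so far, `width` bases per line.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_parts.clear()

    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            flush()
            fields = line[1:].split()
            if not fields:
                raise ValueError("fasta title line has no identifier")
            out.append(">" + fields[0])  # keep only the identifier
        else:
            seq_parts.append(line)
    flush()
    return out


# Hypothetical record: a long title line and unevenly wrapped sequence (124 bases).
example = [">chrA some long description text", "ACGT" * 30, "ACGT"]
cleaned = normalize_fasta(example)
print(cleaned[0])
print([len(row) for row in cleaned[1:]])
```

Keep in mind that for some sources the first word of the title line is an accession rather than a chromosome name, so check that the trimmed identifiers actually match the ones used in your annotation.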
Please try the above and let us know if you need more help. Cheers! Jen, Galaxy team