13 months ago by
United States
Hi,
Yes, a transcriptome is needed. It might be from a public source or an assembly you performed or otherwise obtained. Using unassembled NGS reads as a transcriptome is not going to work: these are too fragmented and too large to work with this tool (and with most tools that use a custom genome/transcriptome).
Upload the transcriptome as a fasta file. It should not contain any IUPAC characters and must be in "strict fasta format".
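If you want to screen the file before uploading, something like the following could flag problem characters. This is only a rough sketch in Python, not a Galaxy tool: the function name `find_unexpected_characters` is made up for illustration, and treating `A`, `C`, `G`, `T` and `N` as the acceptable alphabet is an assumption, so adjust `allowed` to whatever your target tool actually accepts.

```python
def find_unexpected_characters(fasta_text, allowed="ACGTN"):
    """Return {sequence_id: [unexpected characters]} for a fasta string.

    `allowed` is an assumed alphabet, not a rule taken from the Galaxy FAQ.
    """
    allowed_set = set(allowed.upper())
    problems = {}
    current_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited word as the identifier.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            continue
        # Anything outside the allowed alphabet is reported (case-insensitive).
        unexpected = set(line.upper()) - allowed_set
        if unexpected:
            problems.setdefault(current_id, set()).update(unexpected)
    return {seq_id: sorted(chars) for seq_id, chars in problems.items()}
```

An empty result means no characters outside the assumed alphabet were found.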
Run the tool NormalizeFasta on the data, wrapping the lines to 80 bases and trimming the title line (">" line) at the first whitespace. This standardizes the format, removes description content from the title line, and hopefully is enough to isolate the transcript names in a way that will match other data (if used).
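To make it concrete what that normalization does to the file, here is a small Python sketch of the same two operations (wrap at 80 bases, cut the title line at the first whitespace). It is an illustration only and not the NormalizeFasta code itself; the function name `normalize_fasta` and the `width` parameter are my own, and in Galaxy you would simply run the tool.

```python
def normalize_fasta(fasta_text, width=80):
    """Wrap sequence lines to `width` and trim titles at the first whitespace."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the title line.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    out_lines = []
    for name, seq in records:
        out_lines.append(">" + name)
        # Re-wrap the joined sequence at a fixed width.
        for start in range(0, len(seq), width):
            out_lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in out_lines)
```

For example, a title line like ">tx1 some description" becomes ">tx1", and every sequence line except the last one in each record ends up exactly `width` bases long.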
Once that is done, be sure to double check the transcriptome against any other data you plan to use from other tools or from external sources. You need to ensure that the identifiers left on the title line are an exact match for the transcript identifiers found in those other data (for example, additional bed reference data if available/used). It is definitely best to modify the transcriptome fasta title line identifiers first (if needed) before using it with any tools. Modifying the results afterwards is a much more complicated task.
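One way to do that double check outside of Galaxy is to compare the two sets of identifiers directly. The Python sketch below is just one possible approach: the helper names are hypothetical, and which column of the tabular/bed file holds the transcript identifier depends on your data, so `column` (0-based) is something you have to supply yourself.

```python
def fasta_ids(fasta_text):
    """Collect the first word of every '>' title line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def tabular_ids(tab_text, column):
    """Collect values from one 0-based column of tab-delimited text."""
    ids = set()
    for line in tab_text.splitlines():
        # Skip blank lines and '#' comment lines (an assumption about the file).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if column < len(fields):
            ids.add(fields[column])
    return ids


def compare_ids(fasta_text, tab_text, column):
    """Report identifiers that appear in only one of the two inputs."""
    in_fasta = fasta_ids(fasta_text)
    in_other = tabular_ids(tab_text, column)
    return {
        "only_in_fasta": sorted(in_fasta - in_other),
        "only_in_other": sorted(in_other - in_fasta),
    }
```

If both lists come back empty, the identifiers match exactly. A typical mismatch this would catch is a version suffix, such as "tx2" in the fasta versus "tx2.1" in the other file.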
Wherever you see "custom reference genome" in the FAQs below, know that the formatting rules also apply to "transcriptome fasta" data. https://galaxyproject.org/support/#troubleshooting
Hope that helps! Jen, Galaxy team