Sort Your Inputs
Many tools require inputs to be sorted in a specific way before use. The tool form help will often state whether sorting is required and which ordering is expected.
Good news! Galaxy includes tools to do this sorting.
- SortSam (Sort BAM dataset): the best choice for SAM/BAM.
- Sort data in ascending or descending order: an alternate choice for SAM/BAM and the best choice for Tabular/BED/Interval/GTF.
Note: For the second tool, BAM data must first be converted to SAM format (tool: BAM-to-SAM). Depending on the data format, you may need to split off the header (tool: Select), sort the data lines with this tool, and then put the header back (tool: Concatenate). A workflow for BAM data that works well for RNA-seq analysis is available as an example (it may require tuning for your particular data): https://usegalaxy.org/u/jen/w/sort-bam-inc-headers
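The Select, sort, Concatenate pattern described in the note can be sketched in Python for a plain-text tabular file. This is a minimal illustration, not how the Galaxy tools are implemented: it assumes header lines start with a fixed prefix (`#` here; SAM headers use `@`), and `split_header`, `bed_key`, and `sort_with_header` are hypothetical helper names. The key orders BED-style rows by chromosome name, then by numeric start.

```python
from typing import Callable, Iterable, List, Tuple


def split_header(lines: Iterable[str], prefix: str = "#") -> Tuple[List[str], List[str]]:
    """Separate header lines (starting with `prefix`) from data lines, like the Select step."""
    header: List[str] = []
    data: List[str] = []
    for line in lines:
        (header if line.startswith(prefix) else data).append(line)
    return header, data


def bed_key(line: str) -> Tuple[str, int]:
    """Hypothetical sort key: chromosome name (column 1), then numeric start (column 2)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1])


def sort_with_header(lines: Iterable[str], key: Callable[[str], object],
                     prefix: str = "#") -> List[str]:
    """Split off the header, sort only the data lines, then put the header back on top."""
    header, data = split_header(lines, prefix)
    return header + sorted(data, key=key)


rows = [
    "#chrom\tstart\tend\n",
    "chr2\t500\t600\n",
    "chr1\t1500\t1600\n",
    "chr1\t200\t300\n",
]
print("".join(sort_with_header(rows, bed_key)), end="")
```

Chromosome names are compared as plain strings here, so `chr10` sorts before `chr2`; a different key would be needed if a downstream tool expects another reference order.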
Tools and tool groups that require sorted inputs
NGS: SAMTools (most)
Example error from a bug report (yours may differ). If a job fails, try sorting the inputs before reporting a bug.
job stdout:
[samopen] SAM header is present: N sequences.
[bam_index_core] the alignment is not sorted (display_dataset_name): A-th chr > B-th chr
[bam_index_build2] fail to index the BAM file.
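The `[bam_index_core]` message appears to indicate that a record was found out of coordinate order. As a rough illustration (not the samtools code), the hypothetical Python helper `first_unsorted_record` scans SAM text and reports the first record that breaks coordinate order: a reference that comes earlier in the `@SQ` header order, or a smaller position on the same reference. It assumes every mapped record's reference appears in an `@SQ` line and, as a simplification, skips unmapped records (RNAME `*`).

```python
from typing import Dict, Iterable, Optional, Tuple


def first_unsorted_record(sam_lines: Iterable[str]) -> Optional[str]:
    """Return a message for the first record out of coordinate order, or None if sorted.

    Reference order is taken from the @SQ header lines (0-based rank).
    """
    ref_rank: Dict[str, int] = {}
    prev: Optional[Tuple[int, int]] = None
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            # Record each @SQ reference name (SN:) in the order it appears.
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_rank[field[3:]] = len(ref_rank)
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            continue  # simplification: unmapped records are not checked
        current = (ref_rank[rname], pos)
        # Tuples compare by reference rank first, then by position.
        if prev is not None and current < prev:
            return (f"line {lineno}: reference #{current[0]} position {pos} "
                    f"follows reference #{prev[0]} position {prev[1]}")
        prev = current
    return None


sam = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:1000\n",
    "r1\t0\tchr2\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
print(first_unsorted_record(sam))
# prints: line 5: reference #0 position 50 follows reference #1 position 10
```

The example header declares `SO:coordinate` even though the records are not in that order, which is one reason a dataset's label alone is not a reliable guarantee of sorting.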
How to sort?
Run SortSam with the coordinate sort order on the inputs before using these tools. This is often required as a distinct step, even if the dataset name states that it is already sorted.
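Coordinate order for SAM/BAM places records by the order of the reference sequences in the `@SQ` header lines and then by leftmost position. The Python sketch below illustrates that ordering on SAM text; it is not SortSam itself. `coordinate_sort_sam` is a hypothetical name, records whose RNAME is `*` or missing from the header are simply placed last, and the `@HD` `SO:` tag is not rewritten, which a real sorter would normally update.

```python
from typing import Dict, List, Tuple


def coordinate_sort_sam(sam_lines: List[str]) -> List[str]:
    """Sketch of coordinate ordering: header lines first, then records sorted by
    (reference rank from @SQ lines, 1-based POS)."""
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if not line.startswith("@")]

    # Rank each reference by the order of its @SQ line.
    ref_rank: Dict[str, int] = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_rank[field[3:]] = len(ref_rank)

    def key(line: str) -> Tuple[int, int]:
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        # RNAME '*' (or a name not listed in @SQ) sorts after all known references.
        return (ref_rank.get(rname, len(ref_rank)), pos)

    # sorted() is stable, so records with equal keys keep their input order.
    return header + sorted(records, key=key)


sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:1000\n",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr2\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t0\tchr1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for line in coordinate_sort_sam(sam):
    print(line.split("\t")[0])
# prints @SQ, @SQ, r3, r2, r1 (one per line)
```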
NGS: Picard (most)
These tools can fail for a variety of reasons that seem unrelated to sort order, including the one below, seen in the bug report (click the green bug icon to view it; there is no need to submit the report):
job info: This job was terminated because it used more memory than it was allocated. Please click the bug icon to report this problem if you need help.
NGS: RNA-seq: Tophat, Cufflinks, Cuffmerge, Cuffdiff
These tools can report a variety of errors, and some may seem unrelated to sort order. Try sorting as a first troubleshooting step.
If sorting does not resolve the error
Your FASTQ data may not actually be in .fastqsanger format; this is a frequent cause of reported issues. For the quickest resolution, rather than reporting a bug and being sent back to this page, first double-check your data format using the guidelines in Section 2.11 of the Galaxy support wiki.
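In Galaxy, fastqsanger refers to FASTQ with Sanger-style Phred+33 quality scores. One rough way to spot data that is not Phred+33 is to look at the lowest quality character: characters below `;` (ASCII 59) only occur in Phred+33 data, while older Illumina Phred+64 data starts at `@` (ASCII 64). The hypothetical helper `guess_quality_encoding` below is a heuristic sketch, assuming uncompressed 4-line FASTQ records; it cannot prove an encoding and does not replace the Section 2.11 guidelines.

```python
from typing import Iterable, Optional


def guess_quality_encoding(fastq_lines: Iterable[str]) -> str:
    """Heuristic guess at the FASTQ quality encoding from the lowest quality character.

    Assumes plain 4-line records: header, sequence, '+', quality.
    """
    lowest: Optional[str] = None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only every fourth line holds quality scores
        quals = line.rstrip("\n")
        if quals:
            candidate = min(quals)
            if lowest is None or candidate < lowest:
                lowest = candidate
    if lowest is None:
        return "unknown"
    if ord(lowest) < 59:
        return "phred33"  # characters below ';' only occur in Phred+33 data
    if ord(lowest) >= 64:
        return "possibly phred64"  # consistent with Phred+64, but not proof
    return "ambiguous"  # ';' to '?' fits both Solexa-style and Phred+33 data


print(guess_quality_encoding(["@r1\n", "ACGT\n", "+\n", "II#I\n"]))  # phred33
```

A result of "possibly phred64" only means no low-quality characters were seen; a high-quality Phred+33 file can give the same answer.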
If the job fails for memory after sorting
Section 2.8 of the Galaxy support wiki explains alternatives for working with data or jobs that exceed the compute resources available at http://usegalaxy.org (Galaxy Main).
Thanks for using Galaxy!