Tutorial: Sorted inputs and unexpected results or errors
Posted 2.4 years ago by Jennifer Hillman Jackson (United States)

Sort Your Inputs

Many tools require inputs to be sorted in a specific way prior to use. The help text on the tool form will often state whether sorting is required and which ordering is expected.

Good news! Galaxy includes tools to do this sorting.

Sorting tools

  • SortSam (or Sort BAM dataset): the best choice for SAM/BAM.
  • Sort (data in ascending or descending order): an alternate choice for SAM/BAM and the best choice for Tabular/BED/Interval/GTF.

Note: For the second tool, BAM data will need to be converted to SAM format first (tool: BAM-to-SAM). For any data format that includes a header, you may need to split off the header (tool: Select), sort the data lines with this tool, then put the header back (tool: Concatenate). A workflow for BAM data that works well for RNA-seq analysis is here as an example (it may require tuning for your particular data): https://usegalaxy.org/u/jen/w/sort-bam-inc-headers
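
To make the split/sort/concatenate idea concrete, here is a minimal Python sketch of the same three steps applied to SAM text outside of Galaxy. It is only an illustration, not what the linked workflow runs: the function name sort_with_header is made up, and the sort keys (column 3 as text, column 4 as a number) are an assumption that may not match the ordering your downstream tool expects.

```python
def sort_with_header(lines, chrom_col=2, pos_col=3):
    """Split off '@' header lines, sort the data lines, then put the header back.

    Column indexes are 0-based; the defaults point at the SAM reference-name
    and position columns (an assumption for this sketch).
    """
    # Step 1: split the header from the data (what the Select tool is used for).
    header = [line for line in lines if line.startswith("@")]
    data = [line for line in lines if line.strip() and not line.startswith("@")]

    # Step 2: sort the data lines by reference name (text), then position (number).
    def sort_key(line):
        fields = line.rstrip("\n").split("\t")
        return (fields[chrom_col], int(fields[pos_col]))

    # Step 3: concatenate header and sorted data again.
    return header + sorted(data, key=sort_key)


example = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:1000",
    "read3\t0\tchr2\t15",
    "read1\t0\tchr1\t200",
    "read2\t0\tchr1\t30",
]
for line in sort_with_header(example):
    print(line)
```

Note that a plain text sort on the reference name places chr10 before chr2, so tune the keys for your data, just as the workflow itself may need tuning.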

Tools and Tool groups that require input sorting

NGS: SAMTools (most)

Example error shown on the bug report (yours may differ). If there is a problem, try sorting first before reporting a bug.

job stdout:
[samopen] SAM header is present: N sequences.
[bam_index_core] the alignment is not sorted (display_dataset_name): A-th chr > B-th chr
[bam_index_build2] fail to index the BAM file.

How to sort?

Try using Coordinate sort on the inputs with SortSam before using these tools. This is often required as a distinct step even if the input dataset states in the name that it is already sorted.
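
If you have the data as SAM text on your own machine and want to see whether it really is coordinate sorted, regardless of what the dataset name says, a check along these lines can help. This is a rough sketch, not part of Galaxy or SAMTools: the function name is_coordinate_sorted is made up, and it assumes that reference order follows the @SQ lines in the header, with unmapped reads ("*") placed last.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if SAM records are ordered by (@SQ header order, position)."""
    ref_order = {}   # reference name -> rank taken from @SQ line order
    previous = None  # sort key of the previous record

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_order[field[3:]] = len(ref_order)
            continue

        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Names missing from the header, including "*", rank after all others.
        key = (ref_order.get(rname, len(ref_order)), pos)
        if previous is not None and key < previous:
            return False
        previous = key

    return True
```

A False result suggests running SortSam with Coordinate sort before trying the tool again.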

NGS: Picard (most)

Tools can error for a variety of reasons that seem to be unrelated to sort order, including this one seen on the bug report (click on the green bug icon, but there is no need to submit the bug/error):

job info: This job was terminated because it used more memory than it was allocated. Please click the bug icon to report this problem if you need help.

NGS: RNA-seq: Tophat, Cufflinks, Cuffmerge, Cuffdiff

Different errors can be reported, and some may seem unrelated to sort order. Try sorting as a first-pass troubleshooting step.

If sorting does not resolve the error

It could be that your FASTQ data is not actually in .fastqsanger format. This occurs quite often in reported issues. For the quickest resolution, instead of reporting the bug and being sent back this link, first double check your data format directly using the guidelines in Section 2.11 of the Galaxy support wiki:
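
As a quick local sanity check before working through the wiki guidelines, the quality lines of a FASTQ file can hint at the encoding. The .fastqsanger format uses Sanger (Phred+33) quality scores, so any quality character with an ASCII code below 59 can only be Phred+33. The sketch below is a heuristic, not the wiki's procedure: the function name guess_quality_encoding and the returned labels are made up, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
def guess_quality_encoding(fastq_lines, max_records=1000):
    """Guess the quality offset from the lowest quality character seen.

    Returns "phred33", "likely-phred64", or "ambiguous" (labels are
    illustrative only).
    """
    lowest = None
    records_seen = 0

    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:  # the quality string is the 4th line of each record
            continue
        quality = line.strip()
        if quality:
            line_min = min(ord(ch) for ch in quality)
            lowest = line_min if lowest is None else min(lowest, line_min)
        records_seen += 1
        if records_seen >= max_records:
            break

    if lowest is None:
        return "ambiguous"
    if lowest < 59:
        # Characters this low do not occur in the +64 encodings.
        return "phred33"
    if lowest >= 64:
        # Could still be Phred+33 if every base has a very high score.
        return "likely-phred64"
    return "ambiguous"
```

Only a "phred33" result is conclusive; treat the other two as a prompt to check the data more carefully.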

https://wiki.galaxyproject.org/Support

If the job fails for memory after sorting

Section 2.8 of the Galaxy support wiki explains alternatives for working with data/jobs that exceed the compute resources at http://usegalaxy.org (Galaxy Main):

https://wiki.galaxyproject.org/Support

Thanks for using Galaxy!
