Hello,
Double check the formatting. This error usually reports a sequence identifier and an example of the first line encountered that had a format problem. The problem is quite often in the last few lines of a dataset, but not always. Click on the green "bug" icon within the error dataset to see the full error message.
The tool Select last lines from a dataset (tail) can be used within Galaxy to check the end of uploaded datasets to see if they are complete. If the data is simply truncated, first confirm that it is not truncated at the source (your computer or wherever you loaded it from), then upload it again using FTP.
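If you still have the file locally, a quick check before re-uploading can be sketched in Python. This is only a rough illustration, not part of Galaxy: the function names `tail_report` and `looks_truncated` are made up for this example, and it assumes the common layout of exactly four lines per fastq record (no wrapped sequence lines).

```python
from collections import deque


def tail_report(lines, n=8):
    """Return (total_line_count, last_n_lines) for any iterable of text lines."""
    total = 0
    last = deque(maxlen=n)  # keeps only the most recent n lines in memory
    for line in lines:
        total += 1
        last.append(line.rstrip("\r\n"))
    return total, list(last)


def looks_truncated(total_lines):
    """Assumption: four lines per record, so the line count should divide by 4."""
    return total_lines % 4 != 0
```

To use it on a file, pass an open handle, for example `with open("reads.fastq") as handle: total, last = tail_report(handle)` (the filename is a placeholder), then look at `last` and at `looks_truncated(total)`. A line count that divides by 4 does not prove the file is intact, it only rules out the most obvious kind of truncation.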
FTP help: https://galaxyproject.org/support/loading-data/
The tool Select can pull out individual lines from a dataset using a search string (a sequence identifier or any text), but each fastq record spans four distinct lines, so Select alone will not provide enough information to troubleshoot the exact issue. Instead, first use the tool Add column to an existing dataset to add line numbers. Then use Select, searching by the sequence identifier, to find out what line the problem starts on. Finally, split up the fastq data by the line numbers around that point to locate the problems (several tools for selecting specific lines from a dataset, such as first and last, are in the tool group Text Manipulation). There can be more than one line or fastq record with problems. Running the Groomer tool again is one way to check if the format is intact after making the first correction.
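For anyone who prefers to track down the problem outside of Galaxy, the same idea (number the lines, then find where the format first breaks) can be sketched in Python. This is a minimal sketch under stated assumptions, not how the Groomer itself validates data: `first_bad_record` is a hypothetical helper, it assumes four lines per record, and it only checks the "@" title line, the "+" line, and that the sequence and quality lines have equal length.

```python
from itertools import islice


def first_bad_record(lines):
    """Return (line_number, reason, record_lines) for the first malformed
    four-line fastq record, or None if no problem is found.

    Line numbers are 1-based. A trailing blank line is reported as an
    incomplete record.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    line_no = 1  # line number of the current record's title line
    while True:
        record = list(islice(stripped, 4))
        if not record:
            return None  # reached the end without finding a problem
        if len(record) < 4:
            return line_no, "incomplete record (file may be truncated)", record
        title, seq, plus, qual = record
        if not title.startswith("@"):
            return line_no, "title line does not start with '@'", record
        if not plus.startswith("+"):
            return line_no + 2, "third line does not start with '+'", record
        if len(seq) != len(qual):
            return line_no + 3, "sequence and quality lengths differ", record
        line_no += 4
```

Since there can be more than one bad record, it only reports the first one; after fixing or removing that record, run it again on the corrected file.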
Note that some of these Text Manipulation tools require the input to be in tabular format (not fastq). Reassign the datatype as needed, before and after making corrections, if you choose to (and are able to) make the correction within Galaxy. Depending on the specific formatting problem or content, fixing a dataset in place is not always possible, but sometimes it is.
Once the format is fixed, you might not need to run the data through the Groomer to set the datatype to fastqsanger. It can be directly assigned in many cases.
How to check the existing fastq format, along with the preparation options, is explained here: https://galaxyproject.org/support/fastqsanger/
More usage and troubleshooting help is here: https://galaxyproject.org/support/
Thanks, Jen, Galaxy team