Error due to format of uploaded fastq data


18 months ago by

o.haidar • 10 wrote:

Hello Everyone,

I am new to RNA-Seq and Galaxy. I tried to analyse my RNA-Seq data from rat (rn6) cells using Galaxy. I uploaded my files as .fastq.bz2; once uploaded, they show up in fastq format on their own. I then tried to convert the fastq files one by one to fastqsanger format using FASTQ Groomer. However, I get the error message shown here: https://ibb.co/mquwJF

If you are unable to follow the link, here are the settings I used:

INPUT FASTQ quality scores type ==> Sanger & Illumina 1.8+
ADVANCED OPTIONS ==> Show Advanced Options
OUTPUT FASTQ quality scores type ==> Sanger (recommended)
Force Quality Score encoding ==> ASCII
Summarize input data ==> Summarize Input

And the error message: There was an error reading your input file. Your input file is likely malformed. It is suggested that you double-check your original input file for errors -- helpful information for this purpose has been provided below. However, if you think that you hav

Can anyone help?

Regards, Omar


modified 18 months ago by Jennifer Hillman Jackson ♦ 25k • written 18 months ago by o.haidar • 10

18 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Double check the formatting. This error usually provides a sequence identifier and an example of the first line encountered that had a format problem. This is quite often the last few lines in a dataset, but not always. Click on the green "bug" icon within the error dataset to see the full error message.

The tool Select last lines from a dataset (tail) can be used within Galaxy to check the end of uploaded datasets to see if they are complete. If the data is simply truncated, first confirm that it is not truncated at the source (your computer or wherever you loaded it from), then upload it again using FTP.
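To confirm that the data is not truncated at the source before uploading again, the same "tail" check can be done on your own computer. The following is only a minimal sketch, assuming Python 3 is available locally and that the file uses the common four-line-per-record fastq layout; the function names `open_text`, `tail_report`, and `looks_complete` are made up for this example and are not part of Galaxy.

```python
import bz2
import gzip
from collections import deque


def open_text(path):
    """Open a plain, .gz, or .bz2 file in text mode."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def tail_report(path, n=8):
    """Return (total_line_count, last_n_lines) for the file at path."""
    last = deque(maxlen=n)  # keeps only the most recent n lines
    total = 0
    with open_text(path) as handle:
        for line in handle:
            total += 1
            last.append(line.rstrip("\n"))
    return total, list(last)


def looks_complete(total_lines):
    """Assumption: four lines per record, so the count must divide by 4."""
    return total_lines > 0 and total_lines % 4 == 0
```

For example, `total, last = tail_report("reads.fastq.bz2")` (a hypothetical file name) followed by `looks_complete(total)` gives a quick first answer. A line count that is not a multiple of four, or a final record that stops mid-line, suggests the file was cut off. If the compressed file itself is incomplete, Python's `bz2` and `gzip` modules normally raise an `EOFError` while reading, which is also a sign of truncation.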

FTP help: https://galaxyproject.org/support/loading-data/

The tool Select can pull individual lines out of a dataset using a search string (a sequence identifier or any text). However, each fastq record spans four distinct lines, so Select alone will not provide enough information to troubleshoot the exact issue. Instead, first use the tool Add column to an existing dataset to add line numbers. Then use Select, searching by the sequence identifier, to find the line where the problem starts. Finally, split up the fastq data by the line numbers around that point to locate problems (several tools that select specific lines from a dataset, such as first and last, are in the tool group Text Manipulation). More than one line or fastq record can have problems. Running the Groomer tool again is one way to check whether the format is intact after making the first correction.
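If you have a copy of the data outside Galaxy, the same idea (number the lines, then find where the first problem starts) can be sketched in a few lines of Python. This is only an illustration under the assumption of the common four-line record layout (identifier, sequence, "+", quality); `first_bad_record` is a hypothetical helper, not a Galaxy tool, and it checks just three basic rules, not everything the Groomer validates.

```python
def first_bad_record(lines):
    """Return (line_number, reason) for the first problem found, or None.

    Assumes four lines per record: @identifier, sequence, +, quality.
    Line numbers are 1-based, like the numbers a line-number column gives.
    """
    record = []
    start = 0
    for number, raw in enumerate(lines, start=1):
        if not record:
            start = number  # remember where this record begins
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@"):
            return start, "header does not start with '@'"
        if not plus.startswith("+"):
            return start + 2, "third line does not start with '+'"
        if len(seq) != len(qual):
            return start + 3, "sequence and quality lengths differ"
    if record:
        return start, "incomplete record at end of file"
    return None
```

Called as `first_bad_record(open("reads.fastq"))` (file name hypothetical), it stops at the first problem only. Since more than one record can be affected, it would need to be run again after each correction, much like rerunning the Groomer.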

Note that some of these Text Manipulation tools require the input to be in tabular format (not fastq). If you choose, or are able, to make the correction within Galaxy, reassign the datatype as needed before and after making corrections. Whether a dataset can be fixed internally depends on the specific formatting problem and content, so it is sometimes possible but not always.

Once the format is fixed, you might not need to run the data through the groomer to set the datatype to fastqsanger. It can be directly assigned in many cases.

How to check the existing fastq format and preparation options are explained here: https://galaxyproject.org/support/fastqsanger/
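As a rough local check of the quality score encoding, the lowest ASCII character on the quality lines can be inspected. The sketch below rests on a general fact about fastq encodings rather than anything specific to Galaxy: Sanger / Illumina 1.8+ data uses an ASCII offset of 33, while older Illumina data uses 64. The helper names `guess_phred_offset` and `quality_lines` are invented for this example, and the result is a heuristic only.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from quality strings, or None."""
    lowest = min(
        (ord(ch) for line in quality_strings for ch in line.strip()),
        default=None,
    )
    if lowest is None:
        return None  # no quality characters were supplied
    if lowest < 59:
        return 33    # characters this low only occur with offset 33
    if lowest >= 64:
        return 64    # likely offset 64, but see the caveat below
    return None      # 59-63 is ambiguous without more context


def quality_lines(fastq_lines):
    """Yield every fourth line, assuming four-line fastq records."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line
```

A result of 33 is consistent with data that is already in fastqsanger-style encoding. A result of 64 is less certain, because a small sample of very high quality offset-33 reads can contain only high characters too, so it is worth checking many records before drawing a conclusion.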

More usage and troubleshooting help is here: https://galaxyproject.org/support/

Thanks, Jen, Galaxy team
