Error with "Trim leading or trailing characters" tool

I have been trying to trim my fastq files, and I am running into problems. I used the "Trim leading or trailing characters" tool to trim the first 10 bp of each read (51 bp -> 41 bp) with the following settings:

this dataset: LV_12_S24_L005_R1_001.fastq
Trim this column only: 0
Trim from the beginning up to this position: 11
Remove everything from this position to the end: 50
Is input dataset in fastq format?: Yes
Ignore lines beginning with these characters:

After I run this, I get an error message saying:

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/filters/trimmer.py", line 111, in <module>
    main()
  File "/cvmfs/main.galaxyproject.org/galaxy/tools/filters/trimmer.py", line 75, in main
    invalid_starts[i] = chr( int( item ) )
ValueError: invalid literal for int() with base 10: '-q'
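For anyone reading the traceback: the failing line converts each "ignore" entry to an integer and then to a character, so the error means the string '-q' ended up where a numeric character code was expected. A minimal sketch of that failure follows. Only the chr(int(item)) conversion comes from the traceback; the function name and the comma-separated input format are assumptions for illustration, not the tool's actual code.

```python
# Hypothetical reconstruction: only the chr(int(item)) step is taken from the
# traceback. The function name and the comma-separated input are assumptions.
def parse_ignore_chars(arg):
    """Turn a comma-separated list of ASCII codes (e.g. "64,43") into characters."""
    return [chr(int(item)) for item in arg.split(",")]


# Numeric codes convert cleanly: 64 is '@' and 43 is '+'.
print(parse_ignore_chars("64,43"))

# A command-line flag in the same slot cannot be parsed as an integer.
try:
    parse_ignore_chars("-q")
except ValueError as err:
    print("ValueError:", err)
```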

I have used the same tool to trim sequences in earlier experiments without any problem. Strangely, when I now repeat exactly the same trimming process with the same parameters and the same fastq file, the trim no longer works.

Any help would be appreciated,

Yena

Jennifer Hillman Jackson ♦ 25k (United States) wrote, 2.6 years ago:

Hello,

There is a new problem with the option to specify Fastq input. This ticket captures the details and a work-around for current use: https://github.com/galaxyproject/galaxy/issues/2245

Track fix promotion to Galaxy Main here: https://github.com/jennaj/support-known-issues/wiki

Thanks again for reporting the problem! Jen, Galaxy team

ADD COMMENT • link written 2.6 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jen,

Thanks for the feedback. I tried the directions under the "WORK-around" section, which suggested the following parameters:

Is input dataset in fastq format?   **No**  (instead of yes)
Ignore lines beginning with these characters    **@ +**

The trim ran without any errors. The problem now lies downstream, in FASTQ Groomer. Because all lines beginning with @ or + were set to be "ignored," quality score lines that happen to begin with "@" or "+" are not trimmed, so their length no longer matches the trimmed sequence. This leads to the following error (example):

The reported error is: 'Invalid FASTQ file: quality score length (51) does not match sequence length (41)'.
The last valid FASTQ read had an identifier of '@D00124:312:C877CANXX:4:2309:4905:2066 1:N:0:CGATGT'.
The error in your file occurs between lines '321' and '324', which corresponds to byte-offsets '11040' and '11188', and contains the text (148 of 148 bytes shown):

@D00124:312:C877CANXX:4:2309:4795:2099 1:N:0:CGATGT
GCACTTCCTGCTCTGCGATGAGCGGAGAAGCAGCAGCGTCC
+
@BB00EFGGGGGCGGGGGGGGG@D@GGGGGGGGGGFGFGGGGEGGGFGGGG
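The mismatch can be reproduced with a small sketch. It assumes a simplified trimmer that drops the first 10 characters of every line except those starting with an ignored character; this is a stand-in, not the Galaxy tool's code. The first 10 bases of the read are unknown here, so they are filled with N placeholders.

```python
def naive_trim(lines, keep_from, ignore_prefixes=("@", "+")):
    """Keep each line from column keep_from (1-based) onward, skipping ignored lines."""
    out = []
    for line in lines:
        if line.startswith(ignore_prefixes):
            out.append(line)  # left untouched, even when it is a quality line
        else:
            out.append(line[keep_from - 1:])
    return out


record = [
    "@D00124:312:C877CANXX:4:2309:4795:2099 1:N:0:CGATGT",
    # Placeholder Ns stand in for the 10 trimmed bases, which are not shown in the thread.
    "N" * 10 + "GCACTTCCTGCTCTGCGATGAGCGGAGAAGCAGCAGCGTCC",
    "+",
    "@BB00EFGGGGGCGGGGGGGGG@D@GGGGGGGGGGFGFGGGGEGGGFGGGG",
]

trimmed = naive_trim(record, keep_from=11)
# The sequence is shortened, but the quality line starts with '@' and is skipped.
print(len(trimmed[1]), len(trimmed[3]))
```

Any read whose quality string starts with "@" or "+" is affected in the same way, which is why only some records fail.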

Any suggestions?

ADD REPLY • link modified 2.6 years ago • written 2.6 years ago by yena.oh • 70

Oh rats, that is one of the gotchas. Maybe not such a great work-around. I'll remove it from Github.

The true alternative, for now, is to use another trimming tool. There are many in the tool group "NGS: QC and manipulation" designed specifically for Fastq data. The Trim tool used was not created for that exact purpose originally.
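To illustrate what "designed specifically for Fastq data" means in practice, here is a rough sketch of record-aware trimming. It is not the code of any Galaxy tool. It assumes plain 4-line FASTQ records (no wrapped sequences), and the function name and parameters are made up for the example. Lines are chosen by their position in the record rather than by their first character, so a quality line starting with "@" is trimmed like any other.

```python
def trim_fastq_records(lines, keep_from, keep_to=None):
    """Trim sequence and quality lines identically, by position within each record.

    keep_from and keep_to are 1-based and inclusive; keep_to=None keeps the rest.
    Assumes exactly 4 lines per record (header, sequence, '+', quality).
    """
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i, line in enumerate(lines):
        if i % 4 in (1, 3):  # sequence line and quality line
            out.append(line[keep_from - 1:keep_to])
        else:  # header and '+' separator stay as they are
            out.append(line)
    return out


record = [
    "@D00124:312:C877CANXX:4:2309:4795:2099 1:N:0:CGATGT",
    # Placeholder Ns stand in for the 10 leading bases, which are not shown in the thread.
    "N" * 10 + "GCACTTCCTGCTCTGCGATGAGCGGAGAAGCAGCAGCGTCC",
    "+",
    "@BB00EFGGGGGCGGGGGGGGG@D@GGGGGGGGGGFGFGGGGEGGGFGGGG",
]

trimmed = trim_fastq_records(record, keep_from=11)
# Sequence and quality now have the same length.
print(len(trimmed[1]) == len(trimmed[3]))
```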

ADD REPLY • link written 2.6 years ago by Jennifer Hillman Jackson ♦ 25k

Thanks for your help Jen,

Just as a follow-up: the other trimming tools (e.g. FASTQ Trimmer or Trim Sequences) do not accept raw .fastq files as input. The files must first be converted to Fastqsanger through Groomer.

With respect to this, is there any difference between:

Trim -> Groomer -> Tophat
Groomer -> Trim -> Tophat

Thanks, Yena

ADD REPLY • link modified 2.6 years ago • written 2.6 years ago by yena.oh • 70

If the trimming tool used trims based on quality score values, then grooming first is needed to ensure that these are interpreted correctly. The way you were using this tool did not do that - but some of the others do.

Most tools expect .fastqsanger format as input, so converting the quality scores over is a good idea anyway. That said, your data appears to be already in .fastqsanger format. Run FastQC on the dataset first if you want to double check. If it is, there is no need to groom, just assign the more specific fastq type. If there are many files of the same type, the datatype can be assigned upon upload in batch.
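As a quick sanity check alongside FastQC, the quality encoding can often be inferred from the lowest character used in the quality lines. The sketch below is only a heuristic and the function name is made up for the example. It relies on the fact that characters with ASCII codes below 59 (';') cannot occur in the older +64-offset encodings, so their presence points to the Sanger (+33) offset. When no such character is found, the sketch does not decide.

```python
def guess_phred_offset(quality_lines):
    """Return 33 if the data can only be Phred+33, else None (undetermined)."""
    lowest = min((ord(ch) for line in quality_lines for ch in line), default=None)
    if lowest is None:
        return None  # no quality characters to inspect
    if lowest < 59:
        return 33  # characters this low do not exist in +64-based encodings
    return None  # could be either offset; check with FastQC


# Quality line from the error message earlier in this thread; it contains '0' (ASCII 48).
quality = ["@BB00EFGGGGGCGGGGGGGGG@D@GGGGGGGGGGFGFGGGGEGGGFGGGG"]
print(guess_phred_offset(quality))
```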

https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA

ADD REPLY • link written 2.6 years ago by Jennifer Hillman Jackson ♦ 25k

Hi Jen,

I have 12 samples that were run across 2 lanes. To process the resulting 24 fastq files, I ran FASTQ Trimmer -> Concatenate -> Groomer on them, followed by Tophat. With Tophat, I am getting an error on some of the output files (mostly the align summary and accepted hits files) with the following message:

An error occurred setting the metadata for this dataset
Set it manually or retry auto-detection

The error does not seem specific to any particular Tophat output, as some accepted_hits files do not show it. It prevents me from visualizing the read alignments in IGV or IGB; instead I get a webpage with the following error description:

Conflict

There was a conflict when trying to complete your request. 
Error generating display_link: type object 'Bam' has no attribute 'name'

I experienced the same error when I tried to trim the sequences using another trim tool, "Trim sequences," and from other posts, cufflinks won't be able to successfully run either. Do you have any ideas how to fix this? I have also tried manually downloading the tophat outputs, and opening them, but this did not work.

Thanks,

Yena

ADD REPLY • link modified 2.6 years ago • written 2.6 years ago by yena.oh • 70

Problem fixed:

Click on "Set it manually or retry auto-detection" or "edit attributes," and "auto-detect."

If auto-detect does not work and instead shows the error "This dataset is currently being used as input or output. You cannot change metadata until the jobs have completed or you have canceled them.", permanently delete any uncompleted analyses (including those among hidden or deleted datasets) that use this specific dataset. If the dataset is not being used as an input or output, try re-running the analysis.

ADD REPLY • link written 2.6 years ago by yena.oh • 70
