Question: Failed to run trinity on a fastq file
3 months ago, danieltsangmanhon wrote:

I uploaded a ~3 GB fastq file that I downloaded from the SRA database and filtered with the FASTX-Toolkit before upload, but Trinity fails to run and I get the following message. I have no idea what the problem is. Can anyone help? Thank you very much.


-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------

Wednesday, April 11, 2018: 04:38:39 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//19061292/inputs/dataset_24604640.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta >> single.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//19061292/inputs/dataset_24604640.dat.readcount
Trinity run failed. Must investigate error above.

assembly trinity galaxy rna-seq • 128 views
modified 3 months ago by Jennifer Hillman Jackson • written 3 months ago by danieltsangmanhon

I also tried converting the file to fasta format and running Trinity, but it still fails with the same error.

I followed the instructions below to evaluate my fastq file:

  • Run FastQC first to assess the type.
  • Run FASTQ Groomer if the data needs to have the quality scores rescaled.
  • If you are certain that the quality scores are already scaled to Sanger Phred+33 (the result of an Illumina 1.8+ pipeline), the datatype ".fastqsanger" can be directly assigned. Click the pencil icon to reach the Edit Attributes form. In the center panel, click on the "Datatype" tab (3rd), enter the datatype ".fastqsanger", and save. Metadata will assign, then the dataset can be used.

The FastQC result was:

Filename: filter2SRR299028_fastq
File type: Conventional base calls
Encoding: Sanger / Illumina 1.9
Total Sequences: 9399306
Sequences flagged as poor quality: 0
Sequence length: 100
%GC: 41
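For readers who want to cross-check the encoding that FastQC reports outside Galaxy, here is a rough sketch in Python. It is only a heuristic, not how FastQC or FASTQ Groomer decide: it assumes a plain 4-line-per-record fastq file and looks at the lowest quality character in the first records. The function name `guess_phred_offset` and the file name in the usage comment are made up for illustration.

```python
from itertools import islice


def guess_phred_offset(fastq_lines, max_records=10000):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Heuristic only: assumes 4 lines per record, with the quality string
    on every 4th line.
    """
    sample = islice(fastq_lines, 4 * max_records)
    quals = [line.rstrip("\r\n") for i, line in enumerate(sample) if i % 4 == 3]
    if not any(quals):
        raise ValueError("no quality lines found")
    lowest = min(ord(ch) for qual in quals for ch in qual)
    # Characters below ';' (ASCII 59) cannot occur in the +64 encodings,
    # so seeing one means the file is Phred+33 (Sanger / Illumina 1.8+).
    # Otherwise +64 is likely, although a very high-quality +33 file
    # could also land here.
    return 33 if lowest < 59 else 64


# Usage (hypothetical file name):
# with open("reads.fastq") as handle:
#     print(guess_phred_offset(handle))
```

A result of 33 agrees with the "Sanger / Illumina 1.9" encoding and the fastqsanger datatype. A result of 64 would suggest the scores need rescaling first.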

I changed the datatype to fastqsanger and reran the job, but it still fails. I don't know what is going on.

written 3 months ago by danieltsangmanhon
3 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

The format of the original uploaded fastq data (datasets 3 and 20) appears to be OK. However, I can see that the Trinity assembly is failing at an early step.

I would suggest loading the data directly from the source into Galaxy and manipulating it within Galaxy (trimming, other QA). Use the tool Download and Extract Reads in FASTA/Q format from NCBI SRA, then try a rerun. Data from this source sometimes needs standardized reformatting; see the help here for details: https://galaxyproject.org/support/ncbi-sra-fastq/
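If you also want to sanity-check a fastq file outside Galaxy before a rerun, the sketch below shows one way to do it in Python. It is only a minimal structural check under the assumption of plain 4-line records (header starting with "@", sequence, "+" line, quality string of the same length as the sequence). It does not replace FastQC or the reformatting described in the help page above, and the name `check_fastq` is made up for illustration.

```python
def check_fastq(lines):
    """Return (record_count, problems) for an iterable of FASTQ lines.

    Assumes plain 4-line records; blank lines between records are skipped.
    """
    problems = []
    count = 0
    record = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not record and not text:
            continue  # ignore blank lines between records
        record.append(text)
        if len(record) < 4:
            continue
        count += 1
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {count}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {count}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {count}: sequence and quality lengths differ")
        record = []
    if record:
        problems.append(f"incomplete final record ({len(record)} of 4 lines)")
    return count, problems


# Usage (hypothetical file name):
# with open("reads.fastq") as handle:
#     count, problems = check_fastq(handle)
#     print(count, problems[:10])
```

An empty problem list only means the basic layout is consistent. It says nothing about whether the job will fit in memory.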

The datatype attribute will be automatically assigned. Avoid assigning a database attribute. If you want to annotate datasets by the source species (or any other info), consider using Tags: https://galaxyproject.org/tutorials/histories/#tagging-datasets

If the job fails again with the cleaned-up inputs, the job is probably too large to run at Galaxy Main https://usegalaxy.org. The tool itself has no known issues, and I see that you have had other successful runs using different inputs. Choices:

  • You could try running a sample/subset of the data through as a test (or for the final result, as there are many duplicated reads, see the FastQC report for details). To sub-sample randomly, convert with Fastq-to-Tabular, run the tool Select random lines from a file, then convert back with Tabular-to-Fastq.
  • Consider setting up your own Galaxy server and allocating sufficient memory. Cloudman is a good choice for many. https://galaxyproject.github.io/ and https://galaxyproject.org/choices
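For anyone who prefers to sub-sample before uploading, the same idea as the first option (treat each read as one unit, pick reads at random, write them back out as fastq) can be sketched in Python. This is an illustration, not what the Galaxy tools do internally: it uses reservoir sampling so only the chosen reads are held in memory, assumes plain 4-line records, and the function name, seed, and file names are made up.

```python
import random


def subsample_fastq(lines, n_reads, seed=0):
    """Return n_reads randomly chosen 4-line records from FASTQ lines.

    Reservoir sampling: each complete record has the same chance of
    being kept, and only n_reads records are stored at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    record = []
    seen = 0
    for line in lines:
        record.append(line)
        if len(record) < 4:
            continue
        seen += 1
        if len(reservoir) < n_reads:
            reservoir.append(record)
        else:
            # Replace a stored record with probability n_reads / seen.
            j = rng.randrange(seen)
            if j < n_reads:
                reservoir[j] = record
        record = []
    return reservoir


# Usage (hypothetical file names and sample size):
# with open("reads.fastq") as src, open("subset.fastq", "w") as dst:
#     for rec in subsample_fastq(src, n_reads=1000000):
#         dst.writelines(rec)
```

The sampled reads do not come out in their original order, which should not matter for a single-end test run. If the file has fewer reads than requested, all of them are returned.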

Note: The database you have been assigning is not hg38, but another human database. Human hg38 is the genome you had successful mapping against. Because the fastq database assignment is different, the BAM and other results from tools like Tophat and Cufflinks end up with the wrong database assignment (inherited from the fastq input). This is a known bug we are working to resolve. For now, do not assign a database for fastq inputs. This will not be a factor for Trinity assembly, but I would still avoid the database assignment for fastq/fasta inputs when using most tools. If you must assign it, make sure it is correct (the same database used in the rest of the analysis). https://github.com/galaxyproject/usegalaxy-playbook/issues/104. How to remove/adjust metadata assignments: https://galaxyproject.org/support/metadata/

Thanks! Jen, Galaxy team
