Question: Trinity analysis fails before it starts (normalization? fastq-dump?)
javiersg wrote (10 weeks ago):

Good morning everyone

My Trinity analysis fails before it starts, and I wonder what the problem might be. The input is a fastq file that was quality-filtered beforehand. The error message assumes my files come from SRA and need to be dumped again, but I think they are already in fastq format. Can someone give me a clue?

The file: Encoding Sanger / Illumina 1.9

$ head good_quality.fastq
@SRR2147321.2.1 HW-ST994:223:C0MBWACXX:3:1101:1858:1989 length=101
CAGCTGTTGAGGGTCTTTGCCACAGCTTCCAGTCCTTCAACACTTGCTACAAGGACACCGGTCTGTGGGGCATCTACTTCGTCTCAGAGCCTTTACAGATT
+
@@@DFFFFHGHHHCGIIJIJJJIIJJJJEHIJJJJIJIGIJJIJGCIJIJJJIIJJJJJIJGGJHEHFFFDCEDC@CDDDDDDDDA>CDCDDDDDDDDDCC
@SRR2147321.4.2 HW-ST994:223:C0MBWACXX:3:1101:2284:1944 length=101
AAAAAGATGCCTCAAACCAAACATCATATGTACCAATGAGCTCATTGTTCTTGTCAGTTGAGTAACCGCTAGTTGTTTTCATCTTCCCCCTACTCACTCTG
+
@@CFFFFFHHHHHJJJJJJJJJ>HIEHGIJGGIEIICHIIJCHEHIIGHGFDGCHIJIIJGHIJJIGGGIEFCHHFFFFFDEEECEDDDDDBDDDDCDDC@
@SRR2147321.8.1 HW-ST994:223:C0MBWACXX:3:1101:2858:1986 length=101
CGCAGTTGTACTTCATGGCCAGGATACGCAGAGAGGGCTCAATGGTACCTCCACGCAGCCTGAGCACCAAATGAAGGGTGGACTCCTTCTGAATGTTGTAG

The error log:

Left read files: $VAR1 = [
          '/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
        ];
Right read files: $VAR1 = [
          '/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
        ];
Trinity version: Trinity-v2.4.0
-currently using the latest production release of Trinity.

Monday, March 13, 2017: 14:07:45    CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/support_scripts/ExitTester.jar 0
Monday, March 13, 2017: 14:07:45    CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/support_scripts/ExitTester.jar 1
Monday, March 13, 2017: 14:07:45    CMD: mkdir -p /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir
Monday, March 13, 2017: 14:07:45    CMD: mkdir -p /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/chrysalis
STARTING COLLECTL
I'M THE PARENT, COLLECTL_PID=29788
I'M THE CHILD RUNNING TRINITY
Running CMD: bash -c "set -eou pipefail && cd /N/dc2/scratch/trinity/job_working_directory/014/14609/collectl && exec nohup /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/trinity-plugins/COLLECTL/collectl/collectl --procfilt u1392440 --interval :60 -sZ -oD | egrep 'Trinity|trinityrnaseq|samtools|sort' | egrep -v ParaFly > /N/dc2/scratch/trinity/job_working_directory/014/14609/collectl/collectl.dat " 


----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 50 Coverage --
---------------------------------------------------------------

# running normalization on reads: $VAR1 = [
          [
            '/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
          ],
          [
            '/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
          ]
        ];


Monday, March 13, 2017: 14:07:45    CMD: /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl --seqType fq --JM 200G  --max_cov 50 --CPU 8 --output /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/insilico_read_normalization   --max_pct_stdev 10000  --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --pairs_together --PARALLEL_STATS  
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> left.fa
CMD: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> right.fa
Error, not recognizing read name formatting: [SRR2147321.2.1]

If your data come from SRA, be sure to dump the fastq file like so:

    SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra 

Error, not recognizing read name formatting: [SRR2147321.2.1]

If your data come from SRA, be sure to dump the fastq file like so:

    SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra 

Thread 1 terminated abnormally: Error, cmd: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> left.fa died with ret 512 at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 758.
Thread 2 terminated abnormally: Error, cmd: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> right.fa died with ret 512 at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 758.
Error, conversion thread failed at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 329.
Error, cmd: /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl --seqType fq --JM 200G  --max_cov 50 --CPU 8 --output /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/insilico_read_normalization   --max_pct_stdev 10000  --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --pairs_together --PARALLEL_STATS   died with ret 6400 at /N/soft/rhel6/trinityrnaseq/2.4.0/Trinity line 2462.
TERMINATING COLLECTL, PID = 29788
Statistics:
===========
Trinity Version:      Trinity-v2.4.0
Compiler:             Intel
Trinity Parameters:   --max_memory 200G --CPU 8 --normalize_reads --monitoring --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --seqType fq
modified 9 weeks ago • written 10 weeks ago by javiersg

The data appear to be in fastq format. Is the datatype "fastqsanger" assigned to the dataset? (The data have this quality score scaling, and the datatype assignment is required.)

There could also be internal errors. Run the tool Fastq Groomer on the dataset: it produces informative error messages if there is a format problem. It doesn't matter which settings are used, since the output is disposable (permanently delete it once you are done with it).
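As a rough local stand-in for the range check that Fastq Groomer reports, the sketch below scans the quality lines of a fastq file and returns the lowest and highest quality characters. It is not Groomer's implementation. It assumes strict four-line records (no wrapped sequence lines), and the function names `quality_ascii_range` and `phred33` are made up for this illustration. The offset of 33 is the Sanger / Illumina 1.8+ convention.

```python
def quality_ascii_range(fastq_lines):
    """Return (lowest_char, highest_char) over all quality lines.

    Assumes four lines per record: header, sequence, '+', quality.
    Returns (None, None) if no quality line is found.
    """
    lo, hi = None, None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:          # only the 4th line of each record holds qualities
            continue
        qual = line.rstrip("\n")
        if not qual:
            continue
        cur_lo, cur_hi = min(qual), max(qual)
        if lo is None or cur_lo < lo:
            lo = cur_lo
        if hi is None or cur_hi > hi:
            hi = cur_hi
    return lo, hi


def phred33(ch):
    """Decode one quality character using the Sanger offset of 33."""
    return ord(ch) - 33


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python check_quality.py good_quality.fastq
    with open(sys.argv[1]) as handle:
        low, high = quality_ascii_range(handle)
    if low is None:
        print("no quality lines found")
    else:
        print("ASCII range: %r(%d) - %r(%d)" % (low, ord(low), high, ord(high)))
        print("Phred+33 range: %d - %d" % (phred33(low), phred33(high)))
```

A lowest character below `'@'` (ASCII 64) cannot occur under the older offset-64 Illumina encodings, which is consistent with a Sanger-scaled file.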

Let us know how that goes.

Jen, Galaxy team

written 10 weeks ago by Jennifer Hillman Jackson
lecorguille wrote (9 weeks ago):

A summary of your issue: https://github.com/trinityrnaseq/trinityrnaseq/issues/66 and https://groups.google.com/forum/#!topic/trinityrnaseq-users/L_5huffafC0

But I don't know whether there is a tool to help you with this conversion :/
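For what it is worth, the renaming itself is simple enough to sketch. The error message in the log asks for read names of the form `@$sn[_$rn]/$ri`, that is, names ending in `/1` or `/2`. The Python below rewrites a header such as `@SRR2147321.2.1 HW-ST994:...` to `@SRR2147321.2/1`. It rests on an assumption that should be checked against the data: that the final `.1` or `.2` of the read ID is the mate index. The function names are hypothetical, and this is not an existing Galaxy or Trinity tool.

```python
import re

# First token of the header, ending in ".1" or ".2"; anything after the
# first whitespace (instrument coordinates, length=...) is dropped.
_MATE_SUFFIX = re.compile(r"^@(\S+)\.([12])(?:\s.*)?$")


def to_trinity_header(header):
    """Turn '@NAME.1 extra' into '@NAME/1' (ASSUMES the last field is the mate)."""
    match = _MATE_SUFFIX.match(header.rstrip("\n"))
    if match is None:
        raise ValueError("unexpected read name: %r" % header)
    return "@%s/%s" % (match.group(1), match.group(2))


def rewrite_headers(lines):
    """Yield fastq lines with every header (1st line of each 4) rewritten.

    Assumes strict four-line records; sequence, '+' and quality lines
    pass through unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield to_trinity_header(line) + "\n"
        else:
            yield line


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python rename.py < good_quality.fastq > renamed.fastq
    sys.stdout.writelines(rewrite_headers(sys.stdin))
```

Renaming alone does not separate the mates: Trinity still expects forward reads in `--left` and reverse reads in `--right`, so the reads would also have to be split into two files with matching order.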

javiersg wrote (9 weeks ago):

Good morning Jennifer.

This is the output printed by Fastq Groomer:

Groomed 18487694 sanger reads into sanger reads.
Based upon quality and sequence, the input data is valid for: sanger
Input ASCII range: '5'(53) - 'J'(74)
Input decimal range: 20 - 41

I guess the sequences aren't paired-end.


The data are paired-end. Consider obtaining the fastq files from the SRA at ENA. The two "Fastq files (galaxy)" links there will place the forward and reverse reads into your history as separate datasets, which can then be selected as distinct inputs on the Trinity tool form.
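One way to look at this in the existing file is to tally which mates are present for each spot. The sketch below again assumes that the last field of the read ID (`.1` or `.2`) is the mate index and that everything before it identifies the spot. `tally_mates` is a hypothetical helper, not part of Galaxy or Trinity.

```python
from collections import defaultdict


def tally_mates(headers):
    """Return (complete_pairs, orphans) for an iterable of fastq header lines.

    A spot counts as a complete pair when both mate 1 and mate 2 were seen.
    Headers whose ID does not end in '.1' or '.2' are ignored.
    """
    mates = defaultdict(set)
    for header in headers:
        fields = header.split()
        if not fields:
            continue                      # skip blank lines
        read_id = fields[0].lstrip("@")
        spot, _, mate = read_id.rpartition(".")
        if spot and mate in ("1", "2"):
            mates[spot].add(mate)
    pairs = sum(1 for seen in mates.values() if seen == {"1", "2"})
    orphans = len(mates) - pairs
    return pairs, orphans


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python tally.py good_quality.fastq
    with open(sys.argv[1]) as handle:
        # header lines are every 4th line, starting with the first
        headers = (line for i, line in enumerate(handle) if i % 4 == 0)
        pairs, orphans = tally_mates(headers)
    print("complete pairs: %d, orphaned reads: %d" % (pairs, orphans))
```

The three headers visible in the `head` output of the question belong to spots 2, 4 and 8, with only one mate of each shown in that excerpt. That is what one would expect if filtering dropped some mates, although ten lines are too few to judge the whole file. Starting again from separate forward and reverse files avoids the problem.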

modified 9 weeks ago • written 9 weeks ago by Jennifer Hillman Jackson