Good morning, everyone.
My Trinity analysis gets stuck before it even starts, and I wonder what the problem could be. The input files are FASTQ files that were filtered beforehand. The error message keeps suggesting that my files come from SRA, but as far as I can tell they are in FASTQ format. Can someone give me a clue?
The file (FastQC reports the encoding as Sanger / Illumina 1.9):
$ head good_quality.fastq
@SRR2147321.2.1 HW-ST994:223:C0MBWACXX:3:1101:1858:1989 length=101
CAGCTGTTGAGGGTCTTTGCCACAGCTTCCAGTCCTTCAACACTTGCTACAAGGACACCGGTCTGTGGGGCATCTACTTCGTCTCAGAGCCTTTACAGATT
+
@@@DFFFFHGHHHCGIIJIJJJIIJJJJEHIJJJJIJIGIJJIJGCIJIJJJIIJJJJJIJGGJHEHFFFDCEDC@CDDDDDDDDA>CDCDDDDDDDDDCC
@SRR2147321.4.2 HW-ST994:223:C0MBWACXX:3:1101:2284:1944 length=101
AAAAAGATGCCTCAAACCAAACATCATATGTACCAATGAGCTCATTGTTCTTGTCAGTTGAGTAACCGCTAGTTGTTTTCATCTTCCCCCTACTCACTCTG
+
@@CFFFFFHHHHHJJJJJJJJJ>HIEHGIJGGIEIICHIIJCHEHIIGHGFDGCHIJIIJGHIJJIGGGIEFCHHFFFFFDEEECEDDDDDBDDDDCDDC@
@SRR2147321.8.1 HW-ST994:223:C0MBWACXX:3:1101:2858:1986 length=101
CGCAGTTGTACTTCATGGCCAGGATACGCAGAGAGGGCTCAATGGTACCTCCACGCAGCCTGAGCACCAAATGAAGGGTGGACTCCTTCTGAATGTTGTAG
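The error further down says Trinity does not recognize read names such as SRR2147321.2.1. A quick way to see how many reads are affected is a minimal Python sketch like the one below. It assumes plain four-line FASTQ records and treats only the first whitespace-separated token of each header as the read name. It flags names that do not end in /1 or /2, which is the suffix produced by the fastq-dump command quoted in the log. The function names are hypothetical helpers, not part of Trinity.

```python
import sys


def iter_headers(lines):
    """Yield the header (first line) of every four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\r\n")


def headers_missing_mate_suffix(lines):
    """Return read names whose first token does not end in /1 or /2.

    Assumption: the text after the first space (e.g. 'HW-ST994:... length=101')
    is a comment and not part of the read name.
    """
    missing = []
    for header in iter_headers(lines):
        tokens = header[1:].split()
        name = tokens[0] if tokens else ""
        if not (name.endswith("/1") or name.endswith("/2")):
            missing.append(name)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_headers.py good_quality.fastq
    with open(sys.argv[1]) as handle:
        names = headers_missing_mate_suffix(handle)
    print(f"{len(names)} read names lack a /1 or /2 suffix")
    for name in names[:5]:
        print("  e.g.", name)
```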
The error log:
Left read files: $VAR1 = [
'/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
];
Right read files: $VAR1 = [
'/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
];
Trinity version: Trinity-v2.4.0
-currently using the latest production release of Trinity.
Monday, March 13, 2017: 14:07:45 CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/support_scripts/ExitTester.jar 0
Monday, March 13, 2017: 14:07:45 CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/support_scripts/ExitTester.jar 1
Monday, March 13, 2017: 14:07:45 CMD: mkdir -p /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir
Monday, March 13, 2017: 14:07:45 CMD: mkdir -p /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/chrysalis
STARTING COLLECTL
I'M THE PARENT, COLLECTL_PID=29788
I'M THE CHILD RUNNING TRINITY
Running CMD: bash -c "set -eou pipefail && cd /N/dc2/scratch/trinity/job_working_directory/014/14609/collectl && exec nohup /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/trinity-plugins/COLLECTL/collectl/collectl --procfilt u1392440 --interval :60 -sZ -oD | egrep 'Trinity|trinityrnaseq|samtools|sort' | egrep -v ParaFly > /N/dc2/scratch/trinity/job_working_directory/014/14609/collectl/collectl.dat "
----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------
---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 50 Coverage --
---------------------------------------------------------------
# running normalization on reads: $VAR1 = [
[
'/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
],
[
'/N/dc2/scratch/trinity/database/files/022/dataset_22447.dat'
]
];
Monday, March 13, 2017: 14:07:45 CMD: /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl --seqType fq --JM 200G --max_cov 50 --CPU 8 --output /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/insilico_read_normalization --max_pct_stdev 10000 --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --pairs_together --PARALLEL_STATS
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> left.fa
CMD: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> right.fa
Error, not recognizing read name formatting: [SRR2147321.2.1]
If your data come from SRA, be sure to dump the fastq file like so:
SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra
Error, not recognizing read name formatting: [SRR2147321.2.1]
If your data come from SRA, be sure to dump the fastq file like so:
SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra
Thread 1 terminated abnormally: Error, cmd: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> left.fa died with ret 512 at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 758.
Thread 2 terminated abnormally: Error, cmd: seqtk-trinity seq -A /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat >> right.fa died with ret 512 at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 758.
Error, conversion thread failed at /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl line 329.
Error, cmd: /gpfs/hps/soft/rhel6/trinityrnaseq/2.4.0/util/insilico_read_normalization.pl --seqType fq --JM 200G --max_cov 50 --CPU 8 --output /N/dc2/scratch/trinity/job_working_directory/014/14609/trinity_out_dir/insilico_read_normalization --max_pct_stdev 10000 --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --pairs_together --PARALLEL_STATS died with ret 6400 at /N/soft/rhel6/trinityrnaseq/2.4.0/Trinity line 2462.
TERMINATING COLLECTL, PID = 29788
Statistics:
===========
Trinity Version: Trinity-v2.4.0
Compiler: Intel
Trinity Parameters: --max_memory 200G --CPU 8 --normalize_reads --monitoring --left /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --right /N/dc2/scratch/trinity/database/files/022/dataset_22447.dat --seqType fq
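If re-dumping the reads from SRA, as the log recommends, is not an option, another route is to rewrite the existing headers. The sketch below is only a guess at a workaround. It assumes (without having checked this dataset) that the last dot-separated field of each read name, the final .1 in SRR2147321.2.1, is the mate number, and it turns "@SRR2147321.2.1 ..." into "@SRR2147321.2/1". The helpers rename_header and rewrite_fastq are hypothetical and are not part of Trinity or the SRA Toolkit.

```python
import sys


def rename_header(header):
    """Turn '@SRR2147321.2.1 comment' into '@SRR2147321.2/1'.

    Assumption: the final '.'-separated field of the read name is the mate
    number (1 or 2). When a header matches, its trailing comment is dropped;
    headers that do not match are returned unchanged (minus the newline).
    """
    stripped = header.rstrip("\r\n")
    tokens = stripped[1:].split()
    if not tokens:
        return stripped
    spot, sep, mate = tokens[0].rpartition(".")
    if sep and mate in ("1", "2"):
        return f"@{spot}/{mate}"
    return stripped


def rewrite_fastq(lines):
    """Yield the lines of four-line FASTQ records with renamed headers."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield rename_header(line) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rename_reads.py good_quality.fastq renamed.fastq
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(rewrite_fastq(src))
```

The script name in the usage comment is illustrative. The sketch only renames headers; it does not check that both mates of each pair are present.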
The data appear to be in FASTQ format. Is the datatype "fastqsanger" assigned to the dataset? (The data use this quality-score scaling, and the datatype assignment is required.)
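The fastqsanger datatype corresponds to Phred+33 quality scores, the Sanger / Illumina 1.9 encoding reported at the top of the question. As a rough illustration of how the encoding can be inferred, the sketch below applies a common heuristic to the quality lines: characters below '@' (ASCII 64) point to Phred+33, characters above 'K' (ASCII 75) point to Phred+64, and anything in between is ambiguous. The function is hypothetical, ignores the older Solexa+64 scheme, and is not Galaxy's or FastQC's own detection code.

```python
def guess_quality_offset(quality_lines):
    """Guess the FASTQ quality offset (33 or 64) from quality strings.

    Heuristic only: returns 33, 64, or None when the answer is unclear.
    Solexa+64 data (which can reach ';', ASCII 59) is not considered.
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None  # no quality data to inspect
    lowest, highest = min(codes), max(codes)
    if lowest < 33:
        return None  # below '!': invalid in both schemes
    if lowest < 64:
        return 33  # something below '@' is read as Phred+33 here
    if highest > 75:
        return 64  # something above 'K' is read as Phred+64 here
    return None  # all characters fall in the overlap '@'..'K'


# Example: the first quality line from "head good_quality.fastq" above.
first_quality = (
    "@@@DFFFFHGHHHCGIIJIJJJIIJJJJEHIJJJJIJIGIJJIJGCIJIJJJIIJJJJJIJGGJHEHFFF"
    "DCEDC@CDDDDDDDDA>CDCDDDDDDDDDCC"
)
print(guess_quality_offset([first_quality]))
```

Applied to that quality line it returns 33, because the line contains '>' (ASCII 62), which is consistent with fastqsanger.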
There could also be errors within the file itself. Run the Fastq Groomer tool on the dataset: it produces informative error messages if there is a format problem. The settings used don't matter, because the output is disposable (permanently delete it once you are done with it).
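Fastq Groomer is the Galaxy tool recommended here. For a quick local look at the same kind of problem, the sketch below runs a few basic structural checks on four-line FASTQ records: the header starts with '@', the separator starts with '+', and the sequence and quality strings have equal length. find_fastq_problems is a hypothetical helper, not a reimplementation of Fastq Groomer, and it does not handle wrapped (multi-line) FASTQ.

```python
import sys
from itertools import islice


def find_fastq_problems(lines, max_problems=10):
    """Return (record_number, message) tuples for malformed FASTQ records.

    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    problems = []
    it = iter(lines)
    record = 0
    while len(problems) < max_problems:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            break  # end of input
        record += 1
        if len(chunk) < 4:
            problems.append((record, f"truncated record ({len(chunk)} lines)"))
            break
        header, seq, sep, qual = chunk
        if not header.startswith("@"):
            problems.append((record, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((record, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (record, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
    return problems[:max_problems]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fastq.py good_quality.fastq
    with open(sys.argv[1]) as handle:
        for number, message in find_fastq_problems(handle):
            print(f"record {number}: {message}")
```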
Let us know how that goes.
Jen, Galaxy team