Good afternoon,
Right now I am trying to use the workflow by devikasub (https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1) for the differential expression analysis of my bacterial RNA. I followed the procedure and everything works fine until Cuffmerge, which gives me this error:
**Fatal error: Matched on Error
Error running cuffmerge.
[Tue Jun 19 08:16:40 2018] Beginning transcriptome assembly merge
-------------------------------------------
[Tue Jun 19 08:16:40 2018] Preparing output location cm_output/
[Tue Jun 19 08:16:40 2018] Converting GTF files to SAM
[08:16:40] Loading reference annotation.
[08:16:40] Loading reference annotation.
[Tue Jun 19 08:16:41 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy-repl/main/files/025/780/dataset_25780598.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 cm_output/tmp/mergeSam_filelAw1I7
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_filelAw1I7 doesn't appear to be a valid BAM file, trying SAM...
[08:16:41] Loading reference annotation.
[08:16:41] Inspecting reads and determining fragment length distribution.
Processed 1339 loci.
Map Properties:
Normalized Map Mass: 4252.00
Raw Map Mass: 4252.00
Fragment Length Distribution: Truncated Gaussian (default)
Default Mean: 200
Default Std Dev: 80
[08:16:41] Assembling transcripts and estimating abundances.
Processed 1339 loci.
[Tue Jun 19 08:16:42 2018] Comparing against reference file /galaxy-repl/main/files/025/780/dataset_25780598.dat
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for ref.fa. Rebuilding, please wait..
Fasta index rebuilt.
GFF Error: duplicate/invalid 'transcript' feature ID=MSM (multiple sugar metabolism) operon regulatory protein CDS
[FAILED]
Error: could not execute cuffcompare**
Do you know what the problem is and how to solve it? Can I avoid Cuffmerge? For example, StringTie can produce output for DESeq; could I do that and use DESeq instead of Cuffmerge/Cuffdiff?
Thanks for your help
Lorenzo
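The last lines of the log complain about a duplicate/invalid 'transcript' feature ID, so one thing that can be checked is whether the annotation contains ID values that repeat or that look like free-text names. Below is a minimal Python sketch of such a check. It assumes a standard 9-column, tab-separated GFF3 with `key=value` attributes in the last column; the function name `find_suspect_ids` and the example records are made up for illustration, and I do not know exactly which rule cuffcompare applies, so this only flags candidates.

```python
import io
from collections import Counter


def find_suspect_ids(lines):
    """Return (repeated, with_spaces) for the ID attributes found in GFF3 lines.

    Hypothetical helper: it only flags candidates and does not reproduce
    cuffcompare's own validation.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a standard 9-column feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID":
                counts[value] += 1
    repeated = sorted(i for i, n in counts.items() if n > 1)
    with_spaces = sorted(i for i in counts if " " in i)
    return repeated, with_spaces


# Made-up example records, NOT taken from the real annotation.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tID=regulatory protein CDS;Name=a\n"
    "chr1\tsrc\tCDS\t200\t290\t.\t+\t0\tID=regulatory protein CDS;Name=b\n"
    "chr1\tsrc\tgene\t400\t500\t.\t-\t.\tID=gene3\n"
)
repeated, with_spaces = find_suspect_ids(example)
print("repeated IDs:", repeated)
print("IDs with spaces:", with_spaces)
```

To run it on a real annotation, pass an open file instead of the example, e.g. `find_suspect_ids(open("annotation.gff3"))`, where the file name is a placeholder. Note that GFF3 allows one ID on several lines for a feature split into parts, so a repeated ID is a hint rather than proof of an error.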
Actually, I think the problem could be in my GFF3 file. If I open it with Excel, it looks like this: