Question: Differential Expression in bacteria RNA-seq
10 days ago by
Lorenzo.Pavarini wrote:

Hello, I have a problem with Cufflinks. I need to analyze FASTQ files from an RNA-seq experiment (bacterial RNA), and I would like to use Bowtie2, Cufflinks, Cuffmerge and Cuffdiff to do differential expression between 2 sets of samples. For now I am just trying with 1 sample. The FastQC report is OK and Bowtie2 works, but when I run Cufflinks with the BAM files from Bowtie2 and a GFF3 from the same genome that I used as FASTA in Bowtie2, I get an error. Do you know where I am going wrong?

Thanks Lorenzo

bacteria cufflinks rna-seq • 55 views

Actually, I think the problem could be in my GFF3 file. If I open it with Excel it looks like this:

##gff-version 3                             
##source-version geneious 9.1.8                             

    MRx0004_ER_19051    Geneious    CDS 1   1467    .   +   .   ID=Chromosomal replication initiator protein DnaA CDS
    MRx0004_ER_19051    Geneious    CDS 2044    3168    .   +   .   ID=DNA polymerase III beta subunit (EC CDS
    MRx0004_ER_19051    Geneious    CDS 3244    4350    .   +   .   ID=DNA recombination and repair protein RecF CDS
    MRx0004_ER_19051    Geneious    CDS 4347    4817    .   +   .   ID=Zn-ribbon-containing%2C possibly RNA-binding protein and truncated derivatives CDS
written 2 days ago by Lorenzo.Pavarini
10 days ago by
United States
Jennifer Hillman Jackson wrote:


Try sorting the results from Bowtie2 by coordinate, then using that with Cufflinks (and later, Cuffdiff).
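Before re-running Cufflinks, it can help to confirm whether the alignments really are in coordinate order. The following is only a minimal sketch that works on SAM text (not binary BAM) and assumes standard tab-separated SAM records; the function name and the example reads are made up for illustration. It checks that each reference appears as one contiguous block and that positions never decrease within it. It does not check that references follow the header order, which a real sorter such as samtools sort also enforces.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if SAM records are grouped by reference and ordered by position."""
    seen_refs = []   # reference names in order of first appearance
    last_pos = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, pos = fields[2], int(fields[3])  # RNAME and POS columns
        if not seen_refs or ref != seen_refs[-1]:
            if ref in seen_refs:
                return False  # this reference already ended earlier in the file
            seen_refs.append(ref)
            last_pos = 0
        if pos < last_pos:
            return False      # position went backwards within one reference
        last_pos = pos
    return True


def make_record(name, ref, pos):
    """Build a minimal, hypothetical 11-column SAM record for the example."""
    return "\t".join([name, "0", ref, str(pos), "42", "10M", "*", "0", "0",
                      "ACGTACGTAC", "IIIIIIIIII"])


sorted_records = [
    "@HD\tVN:1.0\tSO:coordinate",
    make_record("read1", "MRx0004_ER_19051", 100),
    make_record("read2", "MRx0004_ER_19051", 2050),
]
unsorted_records = [
    make_record("read2", "MRx0004_ER_19051", 2050),
    make_record("read1", "MRx0004_ER_19051", 100),
]

print(is_coordinate_sorted(sorted_records))
print(is_coordinate_sorted(unsorted_records))
```

If the check fails, sort the BAM by coordinate with a sorting tool and use the sorted file as input.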


  • My job ended with an error. What can I do?
  • Tool error? Try Sorting Your Inputs
  • Mismatched Chromosome identifiers (and how to avoid them)

There are updated methods that you should consider. The Cufflinks suite is considered deprecated. Galaxy RNA-seq tutorials:

Thanks! Jen, Galaxy team

2 days ago by
Lorenzo.Pavarini wrote:

Good Afternoon,

Right now I am trying to use the workflow by devikasub ( for the differential expression of my bacterial RNA. I followed the procedure and everything works well until Cuffmerge, which gives me this error:

**Fatal error: Matched on Error
Error running cuffmerge. 
[Tue Jun 19 08:16:40 2018] Beginning transcriptome assembly merge
[Tue Jun 19 08:16:40 2018] Preparing output location cm_output/
[Tue Jun 19 08:16:40 2018] Converting GTF files to SAM
[08:16:40] Loading reference annotation.
[08:16:40] Loading reference annotation.
[Tue Jun 19 08:16:41 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy-repl/main/files/025/780/dataset_25780598.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 cm_output/tmp/mergeSam_filelAw1I7 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_filelAw1I7 doesn't appear to be a valid BAM file, trying SAM...
[08:16:41] Loading reference annotation.
[08:16:41] Inspecting reads and determining fragment length distribution.
Processed 1339 loci.    

Map Properties:
    Normalized Map Mass: 4252.00
    Raw Map Mass: 4252.00
    Fragment Length Distribution: Truncated Gaussian (default)
               Default Mean: 200
               Default Std Dev: 80
[08:16:41] Assembling transcripts and estimating abundances.
Processed 1339 loci.                        
[Tue Jun 19 08:16:42 2018] Comparing against reference file /galaxy-repl/main/files/025/780/dataset_25780598.dat
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (
No fasta index found for ref.fa. Rebuilding, please wait..
Fasta index rebuilt.
GFF Error: duplicate/invalid 'transcript' feature ID=MSM (multiple sugar metabolism) operon regulatory protein CDS
Error: could not execute cuffcompare**

Do you know what the problem is and how to solve it? Can I avoid Cuffmerge? For example, StringTie can produce output for DESeq. Could I do that and use DESeq instead of Cuffmerge/Cuffdiff?

Thanks for your help Lorenzo


The problem is with the content of the GFF3 (duplicate "ID" values). These duplicates are not accepted by this tool suite. Some data providers release the data that way. The reference annotation can be edited to remove duplicates (manually, will result in data loss, and is not recommended). Or, you can find or create a GTF version of the annotation and use that (tool: gffread can convert GFF3 to GTF).
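To see which "ID" values are repeated before deciding how to handle the annotation, a short script can list them. This is a rough sketch, assuming a tab-separated, nine-column GFF3 with attributes written as key=value pairs separated by semicolons. The function names are hypothetical, and the coordinates on the two "MSM" lines are invented for the example; only the ID text comes from the error message above.

```python
from collections import Counter


def gff3_ids(lines):
    """Yield the ID attribute from column 9 of each GFF3 feature line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete nine-column feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID":
                yield value.strip()


def duplicate_ids(lines):
    """Return {ID: count} for every ID that occurs more than once."""
    counts = Counter(gff3_ids(lines))
    return {feature_id: n for feature_id, n in counts.items() if n > 1}


# Example input: the first feature line is from the post; the two MSM lines
# use invented coordinates purely to demonstrate a repeated ID.
example = [
    "##gff-version 3",
    "MRx0004_ER_19051\tGeneious\tCDS\t1\t1467\t.\t+\t.\t"
    "ID=Chromosomal replication initiator protein DnaA CDS",
    "MRx0004_ER_19051\tGeneious\tCDS\t5000\t5900\t.\t+\t.\t"
    "ID=MSM (multiple sugar metabolism) operon regulatory protein CDS",
    "MRx0004_ER_19051\tGeneious\tCDS\t9000\t9900\t.\t-\t.\t"
    "ID=MSM (multiple sugar metabolism) operon regulatory protein CDS",
]

for feature_id, n in duplicate_ids(example).items():
    print(f"{n}x {feature_id}")
```

The script only reports duplicates; it does not modify the file.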

That said, updating your workflow to the current best practices (HISAT2, StringTie, DESeq2 and others) is a better plan. You'll still need an annotation file in GTF format (or GFF, not GFF3, that shares key features of a GTF file).
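Outside Galaxy, the GFF3 to GTF conversion with gffread could be scripted roughly as follows. This is a sketch, not a tested recipe for this particular annotation: the file names are placeholders, the helper functions are hypothetical, and it assumes gffread is installed and on PATH. The -T option requests GTF output and -o names the output file.

```python
import shutil
import subprocess


def build_gffread_command(gff3_path, gtf_path):
    """Return the gffread argument list for a GFF3 -> GTF conversion."""
    # -T: write GTF instead of GFF3; -o: output file name
    return ["gffread", gff3_path, "-T", "-o", gtf_path]


def convert_gff3_to_gtf(gff3_path, gtf_path):
    """Run gffread; raises if the tool is missing or exits with an error."""
    if shutil.which("gffread") is None:
        raise RuntimeError("gffread was not found on PATH")
    subprocess.run(build_gffread_command(gff3_path, gtf_path), check=True)


# Placeholder file names; this only prints the command, it does not run it.
print(" ".join(build_gffread_command("annotation.gff3", "annotation.gtf")))
```

Check the resulting GTF by eye before using it, since an annotation with only CDS features may not convert the way the downstream tools expect.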

Thanks, Jen, Galaxy team

written 2 days ago by Jennifer Hillman Jackson