Differential Expression in bacteria RNA-seq

Hello, I have a problem with Cufflinks. I need to analyze FASTQ files from an RNA-seq experiment (bacterial RNA), and I would like to use Bowtie2, Cufflinks, Cuffmerge and Cuffdiff to test for differential expression between two sets of samples. For now I am trying with just one sample. The FastQC report is fine and Bowtie2 runs, but when I run Cufflinks on the BAM files from Bowtie2 together with a GFF3 from the same genome that I used as the FASTA reference in Bowtie2, I get an error. Do you know what I am doing wrong?

Thanks, Lorenzo

bacteria cufflinks rna-seq • 456 views

modified 5 months ago • written 5 months ago by Lorenzo.Pavarini • 20

Actually, I think the problem could be in my GFF3 file. If I open it in Excel, it looks like this:

##gff-version 3                             
##source-version geneious 9.1.8                             

    MRx0004_ER_19051    Geneious    CDS 1   1467    .   +   .   ID=Chromosomal replication initiator protein DnaA CDS
    MRx0004_ER_19051    Geneious    CDS 2044    3168    .   +   .   ID=DNA polymerase III beta subunit (EC 2.7.7.7) CDS
    MRx0004_ER_19051    Geneious    CDS 3244    4350    .   +   .   ID=DNA recombination and repair protein RecF CDS
    MRx0004_ER_19051    Geneious    CDS 4347    4817    .   +   .   ID=Zn-ribbon-containing%2C possibly RNA-binding protein and truncated derivatives CDS

ADD REPLY • link written 5 months ago by Lorenzo.Pavarini • 20

5 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Try sorting the results from Bowtie2 by coordinate, then using that with Cufflinks (and later, Cuffdiff).
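For anyone working outside Galaxy: "sorted by coordinate" means the alignments are ordered first by reference sequence (in the order of the @SQ header lines) and then by leftmost mapping position. In practice a dedicated tool such as samtools sort does this on BAM files. The Python sketch below only illustrates the ordering on plain SAM text lines; the function name sort_sam_by_coordinate is hypothetical and not part of any tool mentioned here.

```python
def sort_sam_by_coordinate(sam_lines):
    """Return SAM lines with alignments ordered by (reference, position).

    Illustrative only: the @HD sort-order tag is not updated, and real
    BAM files should be sorted with a dedicated tool.
    """
    header = [ln for ln in sam_lines if ln.startswith("@")]
    records = [ln for ln in sam_lines if ln.strip() and not ln.startswith("@")]

    # Reference order is defined by the order of the @SQ header lines.
    ref_order = {}
    for ln in header:
        if ln.startswith("@SQ"):
            for field in ln.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def sort_key(record):
        cols = record.split("\t")
        rname, pos = cols[2], int(cols[3])
        # Reads on unknown references or unmapped reads ("*") go last.
        return (ref_order.get(rname, len(ref_order)), pos)

    return header + sorted(records, key=sort_key)
```

The header lines are kept at the top unchanged, and only the alignment records are reordered.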

FAQs: https://galaxyproject.org/support/#troubleshooting

My job ended with an error. What can I do?
Tool error? Try Sorting Your Inputs
Mismatched Chromosome identifiers (and how to avoid them)
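A quick way to check for mismatched identifiers before running a tool is to compare the sequence names in the FASTA with the first column of the annotation. The sketch below is a minimal Python version of that check; the function names are hypothetical, and it assumes the FASTA identifier is the text after ">" up to the first whitespace and that the annotation is tab-separated.

```python
def fasta_ids(fasta_lines):
    """Sequence identifiers: text after '>' up to the first whitespace."""
    ids = set()
    for ln in fasta_lines:
        if ln.startswith(">"):
            parts = ln[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def annotation_seqids(gff_lines):
    """Values of column 1 in a tab-separated GFF/GTF file."""
    ids = set()
    for ln in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not ln.strip() or ln.startswith("#"):
            continue
        ids.add(ln.split("\t")[0].strip())
    return ids


def missing_from_fasta(fasta_lines, gff_lines):
    """Annotation sequence names that do not occur in the FASTA."""
    return sorted(annotation_seqids(gff_lines) - fasta_ids(fasta_lines))
```

Both arguments can be open file handles or lists of lines. An empty result means every identifier used in the annotation is also present in the FASTA.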

There are updated methods that you should consider. The Cufflinks suite is considered deprecated. Galaxy RNA-seq tutorials:

https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

ADD COMMENT • link written 5 months ago by Jennifer Hillman Jackson ♦ 25k

5 months ago by

Lorenzo.Pavarini • 20

Lorenzo.Pavarini • 20 wrote:

Good Afternoon,

Right now I am trying to use the workflow by devikasub (https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1) for the differential expression analysis of my bacterial RNA. I followed the procedure and everything works great until Cuffmerge, which gives me this error:

**Fatal error: Matched on Error
Error running cuffmerge. 
[Tue Jun 19 08:16:40 2018] Beginning transcriptome assembly merge
-------------------------------------------
[Tue Jun 19 08:16:40 2018] Preparing output location cm_output/
[Tue Jun 19 08:16:40 2018] Converting GTF files to SAM
[08:16:40] Loading reference annotation.
[08:16:40] Loading reference annotation.
[Tue Jun 19 08:16:41 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy-repl/main/files/025/780/dataset_25780598.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 cm_output/tmp/mergeSam_filelAw1I7 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_filelAw1I7 doesn't appear to be a valid BAM file, trying SAM...
[08:16:41] Loading reference annotation.
[08:16:41] Inspecting reads and determining fragment length distribution.
Processed 1339 loci.    

Map Properties:
    Normalized Map Mass: 4252.00
    Raw Map Mass: 4252.00
    Fragment Length Distribution: Truncated Gaussian (default)
               Default Mean: 200
               Default Std Dev: 80
[08:16:41] Assembling transcripts and estimating abundances.
Processed 1339 loci.                        
[Tue Jun 19 08:16:42 2018] Comparing against reference file /galaxy-repl/main/files/025/780/dataset_25780598.dat
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for ref.fa. Rebuilding, please wait..
Fasta index rebuilt.
GFF Error: duplicate/invalid 'transcript' feature ID=MSM (multiple sugar metabolism) operon regulatory protein CDS
    [FAILED]
Error: could not execute cuffcompare**

Do you know what the problem is and how to solve it? Can I avoid Cuffmerge? For example, StringTie can produce output for DESeq. Could I do that and use DESeq instead of Cuffmerge/Cuffdiff?

Thanks for your help, Lorenzo

ADD COMMENT • link written 5 months ago by Lorenzo.Pavarini • 20

The problem is with the content of the GFF3: it contains duplicate "ID" values, which this tool suite does not accept. Some data providers release their annotation that way. You could edit the reference annotation to remove the duplicates, but that has to be done manually, results in data loss, and is not recommended. Alternatively, you can find or create a GTF version of the annotation and use that (the tool gffread can convert GFF3 to GTF).
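To see which "ID" values are repeated before deciding what to do, something like the following Python sketch can help. It is only a minimal illustration: the function names are hypothetical, and it assumes a tab-separated GFF3 whose ninth column holds semicolon-separated key=value attributes.

```python
from collections import Counter


def gff3_ids(gff_lines):
    """Yield the ID attribute of every feature line that has one."""
    for ln in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not ln.strip() or ln.startswith("#"):
            continue
        cols = ln.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature line
        for attr in cols[8].split(";"):
            key, sep, value = attr.strip().partition("=")
            if key == "ID" and sep:
                yield value


def duplicate_ids(gff_lines):
    """Map each repeated ID to the number of times it occurs."""
    counts = Counter(gff3_ids(gff_lines))
    return {fid: n for fid, n in counts.items() if n > 1}
```

Passing an open file handle for the GFF3 returns a dictionary of the repeated IDs and their counts; an empty dictionary means no ID occurs more than once.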

That said, updating your workflow to the current best practices (HISAT2, StringTie, DESeq2 and others) is a better plan. You'll still need an annotation file in GTF format (or GFF -- not GFF3 -- that shares key features of a GTF file).
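The difference that matters here is the ninth column: GFF3 uses key=value pairs such as ID=..., while GTF uses gene_id "..."; transcript_id "...";. For a real conversion, use gffread. The Python sketch below only illustrates the column difference on CDS lines like the ones in this thread. Its naming scheme (sequence name, coordinates and strand joined into one identifier) is an assumption made for illustration and is not what gffread does.

```python
def cds_gff3_to_gtf(gff_lines):
    """Rewrite GFF3 CDS lines with a GTF-style attribute column."""
    out = []
    for ln in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not ln.strip() or ln.startswith("#"):
            continue
        cols = ln.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], cols[3], cols[4], cols[6]
        # Hypothetical naming scheme: identifiers built from coordinates
        # are unlikely to collide the way repeated product names do.
        uid = f"{seqid}_{start}_{end}_{strand}"
        attrs = f'gene_id "{uid}"; transcript_id "{uid}";'
        out.append("\t".join(cols[:8] + [attrs]))
    return out
```

The first eight columns are copied through unchanged, so the coordinates and strand stay exactly as they were in the GFF3.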

Galaxy RNA-seq tutorials: https://galaxyproject.org/learn/
FAQs (datatypes): https://galaxyproject.org/learn/ >> Common datatypes explained

Thanks, Jen, Galaxy team

ADD REPLY • link written 5 months ago by Jennifer Hillman Jackson ♦ 25k
