Question: Differential Expression in bacteria RNA-seq
5 months ago by
Lorenzo.Pavarini20 wrote:

Hello, I have a problem with Cufflinks. I need to analyze FASTQ files from an RNA-seq experiment (bacterial RNA), and I would like to use Bowtie2, Cufflinks, Cuffmerge and Cuffdiff to test for differential expression between two sets of samples. For now I am trying with just one sample. The FastQC report is OK and Bowtie2 works, but when I run Cufflinks with the BAM files from Bowtie2 and a GFF3 from the same genome that I used as the FASTA reference in Bowtie2, I get an error. Do you know what I am doing wrong?

Thanks, Lorenzo

bacteria cufflinks rna-seq • 456 views

Actually, I think that the problem could be in my GFF3 file. If I open it in Excel, it looks like this:

##gff-version 3                             
##source-version geneious 9.1.8                             

    MRx0004_ER_19051    Geneious    CDS 1   1467    .   +   .   ID=Chromosomal replication initiator protein DnaA CDS
    MRx0004_ER_19051    Geneious    CDS 2044    3168    .   +   .   ID=DNA polymerase III beta subunit (EC 2.7.7.7) CDS
    MRx0004_ER_19051    Geneious    CDS 3244    4350    .   +   .   ID=DNA recombination and repair protein RecF CDS
    MRx0004_ER_19051    Geneious    CDS 4347    4817    .   +   .   ID=Zn-ribbon-containing%2C possibly RNA-binding protein and truncated derivatives CDS
written 5 months ago by Lorenzo.Pavarini
5 months ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

Try sorting the results from Bowtie2 by coordinate, then using that with Cufflinks (and later, Cuffdiff).
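Before re-running Cufflinks, it can help to confirm whether the alignments are already in coordinate order. The following is a minimal sketch, assuming the alignments are available as SAM text (for example, after a BAM-to-SAM conversion). The function name `is_coordinate_sorted` and the example reads are made up for illustration, and the check is simplified: it does not compare against the reference order declared in the `@SQ` header lines.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if records are grouped by reference sequence and
    start positions never decrease within a reference.

    Simplified check: the reference order in the @SQ header is ignored.
    """
    seen_refs = []   # reference names in the order they first appear
    last_pos = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, pos = fields[2], int(fields[3])  # RNAME and POS columns
        if not seen_refs or ref != seen_refs[-1]:
            if ref in seen_refs:
                return False  # a reference reappears after another one
            seen_refs.append(ref)
            last_pos = 0
        if pos < last_pos:
            return False      # position went backwards
        last_pos = pos
    return True


# Hypothetical reads: the second one starts before the first.
unsorted_example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchr\t50\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(is_coordinate_sorted(unsorted_example))
```

If the check fails, sort the dataset by coordinate before using it with Cufflinks.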

FAQs: https://galaxyproject.org/support/#troubleshooting

  • My job ended with an error. What can I do?
  • Tool error? Try Sorting Your Inputs
  • Mismatched Chromosome identifiers (and how to avoid them)
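For the last FAQ item, a quick way to spot mismatched identifiers is to compare the sequence names in the FASTA reference with the first column of the annotation. This is only a sketch under stated assumptions: the function names are hypothetical, the annotation is assumed to be tab-separated, and the example lines are invented apart from the sequence name `MRx0004_ER_19051`, which comes from the GFF3 excerpt earlier in this thread.

```python
def fasta_ids(fasta_lines):
    """Collect sequence identifiers: the text after '>' up to the first whitespace."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def annotation_seqids(gff_lines):
    """Collect the values of column 1 (seqid) from a tab-separated GFF/GTF."""
    ids = set()
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        ids.add(line.split("\t")[0])
    return ids


def unmatched_seqids(fasta_lines, gff_lines):
    """Annotation sequence names that do not occur in the FASTA reference."""
    return annotation_seqids(gff_lines) - fasta_ids(fasta_lines)


# Invented example data for illustration.
fasta = [">MRx0004_ER_19051 example genome", "ACGTACGT"]
gff = [
    "##gff-version 3",
    "MRx0004_ER_19051\tGeneious\tCDS\t1\t1467\t.\t+\t.\tID=geneA",
    "chr1\tGeneious\tCDS\t10\t90\t.\t+\t.\tID=geneB",
]
print(unmatched_seqids(fasta, gff))
```

Any name reported here is present in the annotation but absent from the reference, which is the kind of mismatch the FAQ describes.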

There are updated methods that you should consider. The Cufflinks suite is considered deprecated. Galaxy RNA-seq tutorials:

Thanks! Jen, Galaxy team

5 months ago by
Lorenzo.Pavarini20 wrote:

Good Afternoon,

Right now I am trying to use the workflow by devikasub (https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1) for the differential expression analysis of my bacterial RNA. I followed the procedure and everything works great until Cuffmerge, which gives me this error:

**Fatal error: Matched on Error
Error running cuffmerge. 
[Tue Jun 19 08:16:40 2018] Beginning transcriptome assembly merge
-------------------------------------------
[Tue Jun 19 08:16:40 2018] Preparing output location cm_output/
[Tue Jun 19 08:16:40 2018] Converting GTF files to SAM
[08:16:40] Loading reference annotation.
[08:16:40] Loading reference annotation.
[Tue Jun 19 08:16:41 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy-repl/main/files/025/780/dataset_25780598.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 cm_output/tmp/mergeSam_filelAw1I7 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_filelAw1I7 doesn't appear to be a valid BAM file, trying SAM...
[08:16:41] Loading reference annotation.
[08:16:41] Inspecting reads and determining fragment length distribution.
Processed 1339 loci.    

Map Properties:
    Normalized Map Mass: 4252.00
    Raw Map Mass: 4252.00
    Fragment Length Distribution: Truncated Gaussian (default)
               Default Mean: 200
               Default Std Dev: 80
[08:16:41] Assembling transcripts and estimating abundances.
Processed 1339 loci.                        
[Tue Jun 19 08:16:42 2018] Comparing against reference file /galaxy-repl/main/files/025/780/dataset_25780598.dat
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for ref.fa. Rebuilding, please wait..
Fasta index rebuilt.
GFF Error: duplicate/invalid 'transcript' feature ID=MSM (multiple sugar metabolism) operon regulatory protein CDS
    [FAILED]
Error: could not execute cuffcompare**

Do you know what the problem is and how to solve it? Can I avoid Cuffmerge? For example, StringTie can produce output for DESeq. Could I do that, and use DESeq instead of Cuffmerge/Cuffdiff?

Thanks for your help, Lorenzo


The problem is the content of the GFF3: it contains duplicate "ID" values, which this tool suite does not accept. Some data providers release annotation that way. You could edit the reference annotation by hand to remove the duplicates, but that loses data and is not recommended. Alternatively, find or create a GTF version of the annotation and use that (the tool gffread can convert GFF3 to GTF).
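To see which "ID" values are repeated before deciding how to proceed, something like the following could be run on the annotation. It is a minimal sketch, assuming a tab-separated GFF3 with attributes in column 9; the function names and the example records are hypothetical.

```python
from collections import Counter


def gff3_ids(gff_lines):
    """Yield the ID attribute of every feature line that has one."""
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete 9-column feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID":
                yield value.strip()


def duplicate_ids(gff_lines):
    """Map each ID that occurs more than once to its number of occurrences."""
    counts = Counter(gff3_ids(gff_lines))
    return {value: n for value, n in counts.items() if n > 1}


# Invented records: two features share the same ID.
example = [
    "##gff-version 3",
    "seq1\tGeneious\tCDS\t1\t90\t.\t+\t.\tID=hypothetical protein CDS",
    "seq1\tGeneious\tCDS\t200\t290\t.\t+\t.\tID=hypothetical protein CDS",
    "seq1\tGeneious\tCDS\t400\t490\t.\t-\t.\tID=unique protein CDS",
]
print(duplicate_ids(example))
```

IDs built from product names, as in this annotation, repeat whenever two features share the same product description.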

That said, updating your workflow to current best practices (HISAT2, StringTie, DESeq2 and others) is a better plan. You'll still need a file in GTF format (or GFF -- not GFF3 -- that shares key features of a GTF file).
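Outside of Galaxy, the GFF3-to-GTF conversion with gffread might look roughly like this. It is a sketch that assumes gffread is installed and on the PATH; the wrapper function names and any file names passed to them are hypothetical. The `-T` option requests GTF output and `-o` names the output file.

```python
import shutil
import subprocess


def gffread_command(gff3_path, gtf_path):
    """Build the gffread command line for a GFF3 -> GTF conversion."""
    # -T: write GTF instead of GFF3; -o: output file name
    return ["gffread", gff3_path, "-T", "-o", gtf_path]


def convert_gff3_to_gtf(gff3_path, gtf_path):
    """Run gffread, failing early if the executable is not available."""
    if shutil.which("gffread") is None:
        raise RuntimeError("gffread was not found on PATH")
    # check=True raises CalledProcessError if gffread exits with an error
    subprocess.run(gffread_command(gff3_path, gtf_path), check=True)
```

It is worth inspecting the resulting GTF before using it downstream, since the conversion does not by itself guarantee that every annotation problem is resolved.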

Thanks, Jen, Galaxy team

written 5 months ago by Jennifer Hillman Jackson