Question: Galaxy Tuxedo Protocol Questions
howard_saw (United Kingdom) wrote, 3.9 years ago:

Hi everybody,

I'm totally new to RNA-seq analysis, and this is my first time using Galaxy's Tuxedo Protocol. Can somebody please help me with some questions?

My experiment includes the following strains (3 biological replicates of each):

i. Strain A wildtype

ii. Strain A with gene deleted

iii. Strain A with genes introduced.

I would like to see how genome-wide gene expression changes in (ii) and (iii) compared to (i).

1) Cuffcompare

Am I supposed to run Cuffcompare on all 9 samples (3 biological replicates of each of i, ii and iii)? Or just on the wildtype strain (i), since I am comparing (ii) and (iii) to the wildtype?

2) Cuffcompare vs Cuffmerge

Should I use Cuffmerge or Cuffcompare? I've read the description "Cuffcompare Or Cuffmerge" but still have no idea which to use, as I have no background in bioinformatics. I've tried to use Cuffmerge in Galaxy, but it doesn't recognize the Cufflinks files, and I'm not sure why.

3) Cuffdiff

For the "Transcript" input at the top, should I use the gff3 file I've downloaded from Ensembl Bacteria or the Cuffcompare file?

The output file from Cuffdiff seems to give me XLOC numbers. How can I get the corresponding gene names?

I've tried to look up the genes using the 'locus' column from the Cuffdiff Gene Differential Expression Testing file, but the locations don't tally with the gff3 annotation file that I supplied to Cuffcompare.

By the way, is it possible to detect genomic DNA contamination from the RNA-seq data?


I apologize for the number of questions, but can anyone please advise?

Many thanks.



Jennifer Hillman Jackson (United States) wrote, 3.9 years ago:


Complete protocol help is here:

Please note that the URL for the tool package was updated and I have yet to update the wiki, but there are redirects in place. You can also just go here directly for the external help:

1. & 2. Cuffmerge should be enough, as it includes Cuffcompare.

3. Use the GTF dataset produced by Cuffmerge as the reference annotation dataset input to Cuffdiff. Be sure to include the Ensembl GTF with this run to have that content included. It is possible to just use the external GTF file, but this will restrict the results to just known genes, which is probably not your goal.
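
For readers who can run a small script outside Galaxy, the link between XLOC identifiers and gene names can be sketched as follows. This is only an illustration, assuming the merged GTF carries `gene_id` and `gene_name` attributes in its ninth column when a reference annotation was supplied; the two example lines and the name `geneA` are hypothetical, not taken from this thread.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def xloc_to_names(gtf_lines):
    """Map each gene_id (e.g. XLOC_000001) to the set of gene_name values seen for it."""
    names = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        names[gene_id]  # create the entry even when no gene_name is present
        if "gene_name" in attrs:
            names[gene_id].add(attrs["gene_name"])
    return dict(names)

# Hypothetical input; real input would be the lines of the merged GTF dataset.
example = [
    'Chromosome\tCufflinks\texon\t1\t9403\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "geneA";',
    'Chromosome\tCufflinks\texon\t17788\t20352\t.\t+\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]
print(xloc_to_names(example))
```

An XLOC entry that maps to an empty set has no gene name recorded in the GTF, which matches the "-" shown in the gene column of Cuffdiff output.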

For DNA contamination in RNA data, you don't need to worry about DNA-derived reads passing through this particular pipeline, as only spliced alignments are considered in the analysis. If you are curious about the potential rate, Tophat2 provides mapping statistics. Reads with unspliced alignments can be analysed by genome position to determine whether they are intergenic, overlap introns, and so on, using Bowtie2 and then interval comparison tools along with a reference annotation dataset for specific genomic features (UCSC has many options in BED format, which is easily converted to Interval). Try the tools in Operate on Genomic Intervals and Bed Tools.
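
The interval comparison described above can be illustrated with a short sketch. It assumes read positions and gene intervals have already been extracted for a single chromosome, uses closed intervals, and the function names are hypothetical. It only reports the fraction of reads falling outside annotated genes; what fraction would indicate contamination is not something this sketch can decide.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping closed intervals into a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def intergenic_fraction(read_positions, gene_intervals):
    """Return the fraction of read positions that fall outside every gene interval."""
    if not read_positions:
        return 0.0
    merged = merge_intervals(gene_intervals)
    starts = [start for start, _ in merged]
    outside = 0
    for pos in read_positions:
        # Find the last interval starting at or before pos, if any.
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and pos <= merged[i][1]
        if not inside:
            outside += 1
    return outside / len(read_positions)

# Hypothetical coordinates for illustration only.
genes = [(100, 200), (150, 300), (500, 600)]
reads = [50, 120, 250, 400, 550]
print(intergenic_fraction(reads, genes))
```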

Thanks, Jen, Galaxy team



Hi Jen,

Thank you for the reply, but I still get an error message; please see below.

Best wishes,


(reply written 3.9 years ago by howard_saw)

howard_saw (United Kingdom) wrote, 3.9 years ago:

Hi Jen,

Thank you for the info.

I've tried using Cuffmerge, but it keeps giving this error message:

"Required metadata values are missing. Some of these values may not be editable by the user. Selecting "Auto-detect" will attempt to fix these values."

Is it a problem with my gff3 annotation file? I downloaded the file from EnsemblBacteria. By the way, there is no GTF file on that website. Am I using the wrong website?

If I instead run Cuffcompare and then run Cuffdiff with the Cuffcompare-generated GTF and the individual Tophat files for all three conditions (strains), with the gff3 annotation and sequence file from EnsemblBacteria included as reference, I still cannot get the gene names in the output file:

test_id      gene_id      gene  locus                   sample_1  sample_2       status  value_1  value_2  log2(fold_change)  test_stat  p_value  q_value   significant
XLOC_000001  XLOC_000001  -     Chromosome:0-9403       Strain A  Strain A+gene  OK      6.36664  7.44377  0.2255             0.423301   0.4574   0.999783  no
XLOC_000002  XLOC_000002  -     Chromosome:17787-20352  Strain A  Strain A+gene  OK      49.8228  47.5494  -0.0673778         -0.217237  0.69665  0.999783  no
XLOC_000003  XLOC_000003  -     Chromosome:23487-27502  Strain A  Strain A+gene  NOTEST  4.04985  3.85093  -0.0726607         0          1        1         no

I understand from other discussion threads that this may be due to problems with the gff3 file that I'm using, and the suggestion was to get the annotation from iGenomes instead. However, only E. coli is available in iGenomes. Are there other ways to solve this problem without using the command line, as I have no experience with command-line programs?

Many of the loci in the table don't correspond exactly to a single gene; some span several genes. Is this normal?
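
One way to check which annotated genes a given locus covers is sketched below. It is an illustration under several assumptions: the annotation is a tab-separated GFF3 file with `gene` features and `Name` or `ID` attributes, the sequence name matches the one in the locus column, and a simple overlap test is precise enough (coordinate conventions may differ by one base between the two files). The gene names and coordinates in the example are hypothetical.

```python
def parse_locus(locus):
    """Split a locus string such as 'Chromosome:17787-20352' into (seqid, start, end)."""
    seqid, span = locus.rsplit(":", 1)
    start, end = span.split("-")
    return seqid, int(start), int(end)

def genes_overlapping(locus, gff3_lines, name_key="Name"):
    """List the names of GFF3 'gene' features that overlap the given locus."""
    seqid, start, end = parse_locus(locus)
    hits = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] != seqid or fields[2] != "gene":
            continue
        g_start, g_end = int(fields[3]), int(fields[4])
        if g_start <= end and g_end >= start:  # closed-interval overlap test
            attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
            hits.append(attrs.get(name_key, attrs.get("ID", "?")))
    return hits

# Hypothetical annotation lines; real input would be the downloaded gff3 file.
gff3 = [
    "##gff-version 3",
    "Chromosome\texample\tgene\t17800\t18900\t.\t+\t.\tID=gene0001;Name=geneB",
    "Chromosome\texample\tgene\t19000\t20300\t.\t+\t.\tID=gene0002;Name=geneC",
    "Chromosome\texample\tgene\t23500\t27500\t.\t-\t.\tID=gene0003;Name=geneD",
]
print(genes_overlapping("Chromosome:17787-20352", gff3))
```

With this hypothetical annotation, the locus covers two adjacent genes, which is the kind of pattern described in the question above.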

Many thanks.



