Galaxy Tuxedo Protocol Questions

Question: Galaxy Tuxedo Protocol Questions

3.9 years ago by

United Kingdom

howard_saw • 10 wrote:

Hi everybody,

I'm totally new to RNA-seq analyses and it's my first time doing this and using Galaxy's Tuxedo Protocol. Can somebody please help me with some questions?

My experiment includes the following (3 biological replicates of each strain):

i. Strain A wildtype

ii. Strain A with gene deleted

iii. Strain A with genes introduced.

I would like to see how genome-wide gene expression changes in (ii) and (iii) compared with (i).

1) Cuffcompare

Am I supposed to run Cuffcompare on all 12 samples (3 biological replicates of each in i, ii and iii)? Or just on the wildtype strain (i), as I am comparing (ii) and (iii) to the wildtype?

2) Cuffcompare vs Cuffmerge

Should I use Cuffmerge or Cuffcompare? I've read the description "Cuffcompare Or Cuffmerge" but still have no idea which to use, as I have no knowledge of bioinformatics. I've tried to use Cuffmerge in Galaxy, but it doesn't recognize the Cufflinks files, and I'm unsure why.

3) Cuffdiff

For the "Transcript" input at the top, should I use the gff3 file I've downloaded from Ensembl Bacteria or the Cuffcompare file?

The output file from Cuffdiff seems to give me XLOC numbers. How can I get the corresponding gene names?

I've tried to search for the genes using the 'location' from the Cuffdiff Gene Differential Expression Testing file, but the location doesn't tally with the gff3 annotation file that I used as input to Cuffcompare.

By the way, is it possible to detect genomic DNA contamination from the RNA-seq data?

I apologize for the number of questions, but can anyone please advise?

Many thanks.

Howard

rna-seq galaxy • 2.3k views

3.9 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Complete protocol help is here:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq

Please note that the URL for the tool package has changed and I have yet to update the wiki, but there are redirects in place. You can also go here directly for the external help:
http://cole-trapnell-lab.github.io/cufflinks/

1. & 2. Cuffmerge should be enough, as it includes Cuffcompare.

3. Use the GTF dataset produced by Cuffmerge as the reference annotation dataset input to Cuffdiff. Be sure to include the Ensembl GTF in the Cuffmerge run so that its content is included. It is possible to use just the external GTF file, but this will restrict the results to known genes, which is probably not your goal.
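For the XLOC identifiers asked about above, here is a minimal sketch of how a gene-name lookup could be built from a merged GTF. It assumes the GTF records carry a `gene_id` attribute holding the XLOC identifier and, where the reference annotation supplied one, a `gene_name` attribute. The function name `xloc_to_gene_names` and the example records (including the gene names) are made up for illustration and are not from this thread.

```python
import re

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)";')


def xloc_to_gene_names(gtf_lines):
    """Map each gene_id (e.g. XLOC_000001) to the sorted gene_name values seen for it."""
    names = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GTF record
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        gene_id = attributes.get("gene_id")
        if gene_id is None:
            continue
        found = names.setdefault(gene_id, set())
        if "gene_name" in attributes:
            found.add(attributes["gene_name"])
    return {gene_id: sorted(found) for gene_id, found in names.items()}


def make_record(start, end, attributes):
    """Build one hypothetical tab-separated GTF exon record."""
    return "\t".join(
        ["Chromosome", "Cufflinks", "exon", str(start), str(end), ".", "+", ".", attributes]
    )


# Hypothetical records: one locus covering two named genes, one with no name.
example_gtf = [
    make_record(1, 1400, 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "geneA";'),
    make_record(1500, 9403, 'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; gene_name "geneB";'),
    make_record(17788, 20352, 'gene_id "XLOC_000002"; transcript_id "TCONS_00000003";'),
]

gene_names = xloc_to_gene_names(example_gtf)
for xloc, found in gene_names.items():
    print(xloc, ", ".join(found) if found else "(no gene_name in GTF)")
```

A locus with no `gene_name` in the GTF comes back with an empty list, which is what you would expect if the reference annotation was not included or does not carry names in a form the tools pick up.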

For DNA contamination issues in RNA data, you don't need to worry about the DNA data passing through this particular pipeline, as only spliced alignments are considered in the analysis. If you are curious about the potential rate, Tophat2 provides mapping statistics. Reads with unspliced alignments can be analysed by genome position to determine whether they are intergenic, overlap introns, and so on. To do this, map with Bowtie2 and then use interval comparison tools along with reference annotation datasets for specific genomic features (UCSC has many options in BED format, which is easily converted to Interval). Try the tools in Operate on Genomic Intervals and Bed Tools.
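The interval comparison described above can be sketched roughly as follows. This is only an illustration of the idea, not what the Galaxy tools do internally: it assumes read and gene coordinates are already available as 0-based, half-open (start, end) pairs on a single chromosome, and the function name `fraction_intergenic` and all coordinates are hypothetical.

```python
def fraction_intergenic(read_intervals, gene_intervals):
    """Return the fraction of reads that overlap no annotated gene.

    Intervals are (start, end) tuples, 0-based and half-open, all on one chromosome.
    """
    if not read_intervals:
        return 0.0
    intergenic = 0
    for read_start, read_end in read_intervals:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps_gene = any(
            read_start < gene_end and gene_start < read_end
            for gene_start, gene_end in gene_intervals
        )
        if not overlaps_gene:
            intergenic += 1
    return intergenic / len(read_intervals)


# Hypothetical coordinates for illustration only.
genes = [(100, 500), (800, 1200)]
reads = [(120, 170), (480, 530), (600, 650), (1300, 1350)]

print(f"Intergenic fraction: {fraction_intergenic(reads, genes):.2f}")
```

A high intergenic fraction is only indicative, since reads outside annotated genes can also come from unannotated transcripts.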

Thanks, Jen, Galaxy team

Hi Jen,

Thank you for the reply, but I still get an error message. Please see below.

Best wishes,

Howard

3.9 years ago by

howard_saw • 10

United Kingdom

howard_saw • 10 wrote:

Hi Jen,

Thank you for the info.

I've tried using Cuffmerge, but it keeps giving this error message:

"Required metadata values are missing. Some of these values may not be editable by the user. Selecting "Auto-detect" will attempt to fix these values."

Is it a problem with my gff3 annotation file? I downloaded the file from EnsemblBacteria. By the way, there is no gtf file on that website. Am I using the wrong website?

If I instead run Cuffcompare and then Cuffdiff, using the Cuffcompare-generated gtf with the individual Tophat files for all three conditions (strains), and with the gff3 annotation and sequence file (from EnsemblBacteria) included as reference, I still cannot get the gene names in the output file.

| test_id | gene_id | gene | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XLOC_000001 | | | Chromosome:0-9403 | Strain A | Strain A+gene | | 6.36664 | 7.44377 | 0.2255 | 0.423301 | 0.4574 | 0.999783 | |
| XLOC_000002 | | | Chromosome:17787-20352 | Strain A | Strain A+gene | | 49.8228 | 47.5494 | -0.0673778 | -0.217237 | 0.69665 | 0.999783 | |
| XLOC_000003 | | | Chromosome:23487-27502 | Strain A | Strain A+gene | NOTEST | 4.04985 | 3.85093 | -0.0726607 | | | | |
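As a quick sanity check on the Cuffdiff values above, the log2(fold_change) column is the base-2 logarithm of value_2 divided by value_1. The short sketch below recomputes it from the three rows shown; the function name `log2_fold_change` is just a label chosen for this example.

```python
import math


def log2_fold_change(value_1, value_2):
    """Base-2 log of the ratio value_2 / value_1."""
    return math.log2(value_2 / value_1)


# (value_1, value_2, reported log2(fold_change)) copied from the rows above.
rows = {
    "XLOC_000001": (6.36664, 7.44377, 0.2255),
    "XLOC_000002": (49.8228, 47.5494, -0.0673778),
    "XLOC_000003": (4.04985, 3.85093, -0.0726607),
}

for test_id, (value_1, value_2, reported) in rows.items():
    computed = log2_fold_change(value_1, value_2)
    print(f"{test_id}: computed {computed:.4f}, reported {reported}")
```

A positive value means expression is higher in sample_2 (here "Strain A+gene") than in sample_1.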

I understand from other discussion threads that this may be due to problems with the gff3 files that I'm using, and the suggestion was to get annotation files from iGenomes instead. However, there is only E. coli in iGenomes. Are there other ways to solve this problem without using the command line? I have no knowledge of command-line programs.

Many of the loci in the table don't correspond exactly to a single gene, and some of them span several genes. Is this normal?

Many thanks.

Howard
