Question: Reference genome/annotation
I am analyzing apple RNA-seq data. When I run Cufflinks, Cuffcompare, and Cuffdiff, what reference genome or annotation should I use for each step?

The reference data I have are:





They are probably not the correct ones. What types of genome data should I use for each purpose?

Also, Galaxy only takes GFF3/GTF files. What should I do with GFF files if I need to use them?


Thank you,


We can help. Here are the guidelines for common usage of the Tuxedo RNA-seq pipeline with a reference genome that is not native to the Galaxy instance you happen to be working on:

1. The Tuxedo pipeline will accept "reference annotation" in GTF or GFF3 format. Plain GFF is not supported, because it will not contain the transcript/gene identifiers the tools need to be useful. So you will be using this file: Malus_x_domestica.v1.0-primary.transcripts.gff3

2. The "reference genome" you need is the same consensus genomic backbone that the GFF3 is based on (the chromosome identifiers and coordinates that the transcript/gene bounds are mapped to). This would be "Malus_x_domestica.v1.0"? You want the FASTA version of the genome loaded into Galaxy as a "Custom Reference Genome".

3. Your data in "Malus_x_domestica.v1.0.consensus2contigs.gff" and "apple_genome_contigs.nuc" map back the details of how the "reference annotation" was created from the source genomic contigs. They are useful if there is a transcript assembly, gene assembly, or splice variant question or discrepancy in a region and you wish to investigate whether it is real or an artifact/sequencing issue.
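
Since point 2 depends on the genome FASTA and the GFF3 using the same sequence identifiers, a quick check before uploading can save a failed run. The following is only a minimal sketch in Python, not part of Galaxy or the Tuxedo tools: the function names are made up for illustration, and the tiny in-memory records use hypothetical identifiers ("chr1", "chr2", "chr3") rather than the real apple genome names.

```python
import io


def fasta_ids(handle):
    """Collect sequence identifiers (first word after '>') from FASTA text."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_seqids(handle):
    """Collect column-1 sequence identifiers from GFF3/GTF text."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_genome(fasta_handle, annotation_handle):
    """Return annotation sequence IDs that have no matching FASTA record."""
    return sorted(annotation_seqids(annotation_handle) - fasta_ids(fasta_handle))


if __name__ == "__main__":
    # Hypothetical toy data; replace with open("genome.fasta") etc. for real files.
    genome = io.StringIO(">chr1 example\nACGT\n>chr2\nGGCC\n")
    annotation = io.StringIO(
        "##gff-version 3\n"
        "chr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n"
        "chr3\t.\tgene\t1\t4\t.\t+\t.\tID=g2\n"
    )
    print(missing_from_genome(genome, annotation))
```

An empty list means every identifier in the annotation has a matching sequence in the genome. Anything listed points to a naming mismatch to resolve before running Cufflinks, Cuffcompare, or Cuffdiff.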

Key links to help and many more details (including tutorials, etc):

I have made some assumptions from the given information, so please add clarification where I have misunderstood the context of the available inputs and we can work from there to further customize a solution. 

Take care! Jen, Galaxy team

written 3.9 years ago by Jennifer Hillman Jackson
