TopHat and Microbial RNASeq

3.8 years ago, gkuffel22 (United States) wrote:

Hi everyone,

I am trying to use TopHat for RNA-Seq analysis of the bacterium Vibrio fischeri, and I've run into a few problems. First, this genome is obviously not built into Galaxy. I finally figured out how to build a custom genome, so I'm good there. But now I believe I need a GTF file for the gene annotation, and I have no idea where to find one or how to build one. Does anyone have any expertise in this?

3.8 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

A reference annotation dataset is probably best obtained from a data provider, if one can be identified. Alternatively, you can run the pipeline without annotation: the results would then reflect only the content of your NGS sequence inputs, and certain features of tools such as Cuffdiff would not be used. More about the annotation features used by these tools can be found on the Cufflinks website: http://cole-trapnell-lab.github.io/cufflinks/

This genome is hosted in the UCSC Archaeal Genome Browser (http://archaea.ucsc.edu). The availability and type of annotation vary by strain. Also review the "Resources" tab: one of the research groups listed there may have the annotation data you want, in GTF or GFF3 format. There are almost certainly other options. Reviewing publications is probably a good place to start, to see what others performing similar analyses are using.

If you do decide to use a reference annotation dataset, be sure to run your analysis against the exact same reference genome that the annotation is based on. This may mean creating a new Custom Genome. The sequence identifiers, content, and lengths must match exactly across all inputs, meaning they must come from the same build and use the same nomenclature.
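To illustrate the matching requirement, here is a minimal Python sketch (not a Galaxy feature) that compares the sequence IDs and end coordinates in a GTF or GFF file against the headers and lengths in a FASTA file. It assumes uncompressed files and treats the first word of each FASTA header as the sequence ID, which is the convention most aligners follow; the toy data in the demo is made up.

```python
from typing import Dict, Iterable, List


def fasta_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequence may be wrapped over many lines
    return lengths


def check_annotation(gtf_lines: Iterable[str], lengths: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means no mismatch was found."""
    problems: List[str] = []
    for n, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            problems.append(f"line {n}: expected 9 tab-separated columns")
            continue
        seqid, end = fields[0], int(fields[4])
        if seqid not in lengths:
            problems.append(f"line {n}: sequence '{seqid}' not in FASTA")
        elif end > lengths[seqid]:
            problems.append(
                f"line {n}: end {end} exceeds length of '{seqid}' ({lengths[seqid]})"
            )
    return problems


if __name__ == "__main__":
    # Toy inputs: the FASTA uses an NCBI-style ID, the annotation a different name.
    fasta = [">NC_006840.2 example", "ACGTACGTAC", "ACGT"]
    gtf = ['chr1\tRefSeq\texon\t1\t10\t.\t+\t.\tgene_id "g1"; transcript_id "t1";']
    for problem in check_annotation(gtf, fasta_lengths(fasta)):
        print(problem)
```

With real data you would pass open file handles instead of lists, for example `check_annotation(open("annotation.gtf"), fasta_lengths(open("genome.fa")))` (placeholder file names). Any reported ID mismatch means the annotation and the genome do not come from the same build or do not share the same naming.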

Best, Jen, Galaxy team

First off, thank you so much for your help; you have been incredibly helpful. I did find the genome in the UCSC Archaeal Genome Browser and was able to send the GTF output to Galaxy, which was great. But, as you already mentioned, this caused issues: I created my reference genome from NCBI using accession NC_006840.2, so this GTF file doesn't seem to match it.

I don't see a way to download a single FASTA file from UCSC; I only see a CDS FASTA from a multiple alignment. NCBI has a GFF file that matches the original FASTA file, but I don't think that will work either. Also, are you saying that the analysis can be done without the GTF file in the first place? I thought the TopHat algorithm needed a GTF file.
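If a GTF is preferred over the NCBI GFF3 file mentioned above, one option is converting it. Dedicated tools exist for this (for example, gffread, distributed with Cufflinks), but the idea can be sketched in a few lines of Python. This sketch assumes, as in many NCBI prokaryotic GFF3 files, that each CDS feature carries an `ID` and a `Parent` attribute; it uses the Parent as `gene_id` and the CDS ID as `transcript_id`, and it ignores every other feature type. It keeps the sequence IDs unchanged, so the output should line up with the FASTA only if the GFF3 uses the same accession.version (here NC_006840.2) as the FASTA headers.

```python
from typing import Dict, Iterable, Iterator


def parse_gff3_attributes(text: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' GFF3 attributes into a dict (no URL-decoding)."""
    attrs: Dict[str, str] = {}
    for item in text.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_cds_to_gtf(gff_lines: Iterable[str]) -> Iterator[str]:
    """Yield minimal GTF lines (one exon and one CDS record) per GFF3 CDS feature."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop before any embedded sequence section
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only CDS features are converted in this sketch
        seqid, source, _, start, end, score, strand, phase, attr_text = fields
        attrs = parse_gff3_attributes(attr_text)
        transcript_id = attrs.get("ID")
        if transcript_id is None:
            continue  # no identifier to build GTF attributes from
        gene_id = attrs.get("Parent", transcript_id)
        tags = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        # GTF exon records carry no phase; the CDS record keeps the GFF3 phase.
        yield "\t".join([seqid, source, "exon", start, end, score, strand, ".", tags])
        yield "\t".join([seqid, source, "CDS", start, end, score, strand, phase, tags])
```

To write a file, something like `out.writelines(rec + "\n" for rec in gff3_cds_to_gtf(open("annotation.gff3")))` would do, with placeholder file names. Treating each CDS as its own single-exon transcript is a simplification that suits most bacterial genes, but it is an assumption, not what any particular tool does.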

written 3.8 years ago by gkuffel22

I'm very glad you were able to locate an annotation dataset; that is sometimes the difficult part. From here, obtaining the matching genome should be straightforward.

The genome build and its source should be noted in the UCSC browser (often on the first page, near the bottom). If they are not, you could write to their support team to ask which build was used and where it came from, then obtain that exact version. As far as I know, this version of the browser does not host the reference genomes on an FTP site, but double-check with them, since it was recently updated.

As for the annotation file, it is not required by any of the tools in this suite. Using it with TopHat is just one option, as it is with the downstream tools. The annotation would only provide additional splice junctions, which can be used either as a guide (supplementing the splices found in your sequence input) or as truth (only splices in the annotation will be considered). The TopHat manual explains the difference in more detail, so you can understand how, when, and whether to use it for different analysis goals. The same is true for Cufflinks and the other tools.
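Outside Galaxy, the three choices described above (no annotation, annotation as a guide, annotation as truth) correspond roughly to the TopHat 2 command-line patterns below. This sketch only builds the argument list and does not run anything. The index name and read file names are hypothetical, single-end reads are assumed, and mapping "truth" to `-G` plus `--no-novel-juncs` is based on the TopHat manual's description of those options, so check it against the version you use.

```python
from typing import List, Optional


def tophat_command(index_base: str, reads: List[str], gtf: Optional[str] = None,
                   annotation_mode: str = "none",
                   out_dir: str = "tophat_out") -> List[str]:
    """Build (but do not run) a TopHat 2 argument list for single-end reads.

    annotation_mode:
      "none"  - no annotation; junctions are discovered from the reads only
      "guide" - -G supplies annotated junctions in addition to novel ones
      "truth" - -G plus --no-novel-juncs, so only annotated junctions are searched
    """
    if annotation_mode not in ("none", "guide", "truth"):
        raise ValueError(f"unknown annotation_mode: {annotation_mode}")
    if annotation_mode != "none" and gtf is None:
        raise ValueError("an annotation file is required for 'guide' or 'truth'")
    cmd = ["tophat", "-o", out_dir]
    if annotation_mode in ("guide", "truth"):
        cmd += ["-G", gtf]
    if annotation_mode == "truth":
        cmd.append("--no-novel-juncs")
    # Positional arguments: Bowtie index basename, then comma-separated read files.
    cmd += [index_base, ",".join(reads)]
    return cmd
```

For example, `tophat_command("vfischeri_index", ["reads.fq"], "annotation.gtf", "guide")` gives the guided variant. To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with TopHat and its Bowtie dependency installed.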

Take care, Jen, Galaxy team

written 3.8 years ago by Jennifer Hillman Jackson
