Question: How to map RNA-seq reads to an annotated reference genome in GFF format
htreves (European Union) wrote, 3.5 years ago:

Hi,

I am trying to use TopHat2 in Galaxy to map RNA-seq reads to a GFF annotation file I created. When I try to select a reference genome from my history, no file is offered as an option. I have uploaded a GFF3 file. The same thing happened when I uploaded a GTF file from the "RNA-seq Analysis Exercise" page (named Galaxy Dataset | iGenomes UCSC hg19, chr19 gene annotation).

 

What am I doing wrong and how can I get tophat to work with my data?

 

Thanks,

-- 

Haim Treves

 

 

Dept. Plant and Environmental Sciences
The Alexander Silberman Institute of Life Sciences
The Hebrew University of Jerusalem
91904 Jerusalem, Israel
Phone(Lab): 972 2 6585204/31
Fax (Lab): 972 2 6584463

 

gff tophat galaxy rna-seq • 8.9k views
Jennifer Hillman Jackson (United States) wrote, 3.5 years ago:

Hello,

For use with TopHat, the RNA-seq reads must be aligned against a reference genome or transcriptome. A GTF file, or the top portion (or all) of a GFF3 file, is a reference annotation dataset: it describes features on a reference genome/transcriptome. Protocol help is here:
https://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq

The iGenomes GTF collection has an option for UCSC-published versions of genomes. These genomes are in Galaxy as built-in indexes; search by the short genome name (or "dbkey") to locate them, for example "hg19", "mm10", or "dm3".

If supplying a custom reference genome, load and use a FASTA dataset. Instructions are in the link below. Please be aware that when using GFF3 datasets, the tool expects only the top annotation portion of the file as the "reference annotation". If there is FASTA sequence at the end of the file, put it in a separate dataset and use that as the "reference genome".
https://wiki.galaxyproject.org/Learn/Datatypes#GFF3
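
The split described above can be sketched in Python. This is only an illustration, assuming the embedded sequence is introduced by the standard GFF3 `##FASTA` directive; the function name `split_gff3` is hypothetical and is not part of Galaxy or TopHat.

```python
from typing import List, Tuple


def split_gff3(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split GFF3 lines into (annotation_lines, fasta_lines).

    Lines before a '##FASTA' directive are treated as annotation and
    lines after it as FASTA sequence. If the directive is absent, all
    lines are annotation and the FASTA list is empty.
    """
    annotation: List[str] = []
    fasta: List[str] = []
    in_fasta = False
    for line in lines:
        if not in_fasta and line.strip() == "##FASTA":
            in_fasta = True  # the directive itself goes to neither output
            continue
        if in_fasta:
            fasta.append(line)
        else:
            annotation.append(line)
    return annotation, fasta
```

The two resulting lists could then be written to separate files and uploaded as two datasets: the annotation part with datatype gff3, and the sequence part with datatype fasta.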

Assigning the correct datatype to each dataset is important; use the pencil icon to do this.
https://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset
More about Custom genomes:
https://wiki.galaxyproject.org/Support#Custom_reference_genome

Take care, Jen, Galaxy team

htreves (European Union) wrote, 3.5 years ago:

Thanks,

That was very helpful, and I now have all the output files from TopHat. However, even after creating a new track browser in the visualization menu with my genome as the reference genome, the output files are not recognized when I try to add them as datasets: the browser reports that there are no items in my unnamed history, although I can see all of these files in my history pane.

How can I get the visualization tool to recognize these files?

 

Thanks again,

 

Haim

 

 

Jennifer Hillman Jackson (United States) wrote, 3.5 years ago:

Hello,

Did you add your Custom Reference Genome as a Custom Build? This is necessary for Trackster. Custom Builds can be found or added under "User -> Custom Builds" by anyone who wishes to include one.

If yes, then go through your datasets and assign the "database" to be your Custom Build. This tells Galaxy that they are mapped to the same genomic backbone that the Trackster visualization is based on. All datasets in accepted formats (most standard formats are accepted) should then show up in the add datasets window to choose from.

Best,

Jen

Galaxy team

htreves (European Union) wrote, 3.5 years ago:

Hey again,

 

Now it seems to work! Thank you! :)

Going over the data, I can see the reads mapped to locations in the reference sequence (FASTA format), but I cannot tell how they correspond to the annotated genes, since I could only use a FASTA file as the reference (and not an annotation file, like the GFF3 that I have).

We tried to add the gff dataset to the visualization for that purpose, but we get the following error:

Input error: Chromosome 140113503864811 found in your input file but not in your genome file.
needLargeMem: trying to allocate 0 bytes (limit: 100000000000)

 

How can we add the annotation correctly? Is gff an accepted format for that?

 

Thanks,

 

Haim
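
The error quoted in this post says that a chromosome name in the input file is missing from the genome file. One way to investigate is to compare the sequence names in the first column of the GFF with the names in the FASTA headers of the custom genome. The following is a minimal diagnostic sketch under that assumption; the function names are hypothetical, and it does not explain where the unusual name in the error message comes from.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect column-1 seqids from GFF feature lines, stopping at '##FASTA'."""
    seqids: Set[str] = set()
    for line in lines:
        stripped = line.strip()
        if stripped == "##FASTA":
            break  # anything after this directive is sequence, not features
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and directives
        seqids.add(line.rstrip("\n").split("\t", 1)[0])
    return seqids


def missing_seqids(gff_lines: Iterable[str], fasta_lines: Iterable[str]) -> Set[str]:
    """Return GFF seqids that have no matching FASTA sequence name."""
    return gff_seqids(gff_lines) - fasta_names(fasta_lines)
```

An empty result would mean every name in the GFF has a matching sequence in the FASTA; any names returned would need to be renamed or removed so that both files use identical identifiers.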
