Question: How To Analyze The ENCODE RNA-Seq Data From The UCSC Genome Browser With Galaxy
5.6 years ago by
Santagostino Marco50 wrote:
Dear Sir/Madam,

I am new to Galaxy. I need to determine whether a set of loci (about 700) is transcribed, i.e. whether these loci overlap with regions reported in the ENCODE RNA-seq data. The track contains several tables. Could you please suggest how to proceed? Do I need to download all the tables from the UCSC Table Browser and then upload or send them to Galaxy? Is there a way to refer only to the ENCODE RNA-seq track without downloading the whole table set? I have the coordinates of each of my loci, and from those I can obtain the sequences. I intend to use the public Galaxy Main instance.

Thank you,
Marco Santagostino

--
Marco Santagostino, PhD
Laboratorio di Biologia Molecolare e Cellulare
Dipartimento di Biologia e Biotecnologie, University of Pavia
Ferrata street, 9 - 27100 Pavia, Italy
Tel.: +39 0382 985540
Fax: +39 0382 528496
e-mail: marco.santagostino@unipv.it
galaxy • 2.0k views
5.6 years ago by
United States
Jennifer Hillman Jackson25k wrote:
Hi Marco,

Each RNA-seq study in the ENCODE project may have variable coverage, but if the goal is to identify regions that overlap the gene annotations targeted by the ENCODE project, the "GENCODE Genes" track is most likely the one you are looking for.

To review the contents of the track at the ENCODE hub at UCSC, go to their web site http://genome.ucsc.edu, click into Genomes, select the target genome (hg19?), then scroll down to the track group "Gene and Gene Predictions". Click on the track "GENCODE Genes" to read about how it is constructed, what the content options are, and how these relate to ENCODE builds. You can follow more links in the description to the subtracks (for example, in hg19, Version 14 is the most current), and "describe schema" will take you into the Table Browser, where the actual format of the data table can be reviewed.

"Tools -> Table Browser" will bring you to a form where the table can be extracted and sent to Galaxy. It is the same form found in Galaxy under "Get data -> UCSC Main". If you start this browsing process while still logged into Galaxy, from the history you want to import the data into, you can extract directly from there. Just make sure that "Galaxy" is checked (it will be by default) next to the "output format: BED" section of the form. Or you can simply explore and, once you know which tracks/tables you are interested in, go through the Galaxy tool "Get data -> UCSC Main".
You may know this already, but the core hub for ENCODE is at:
http://genome.ucsc.edu/ENCODE/index.html

Basic examples that show how to extract data from UCSC and use coordinate overlap comparison tools can be found at:
https://main.g2.bx.psu.edu/u/aun1/p/galaxy101
https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012 (protocol 1)

More screencasts/tutorials are at:
http://wiki.galaxyproject.org/Learn/Screencasts
https://main.g2.bx.psu.edu/page/list_published

Hopefully this helps,
Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
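For anyone who wants to see what a coordinate overlap comparison amounts to outside of Galaxy, here is a minimal Python sketch. It is not the code behind any Galaxy tool. The function names `parse_bed` and `overlapping_loci` are made up for illustration, and the coordinates are invented toy values, not ENCODE or GENCODE data. It assumes both inputs are in BED format, where starts are 0-based and ends are exclusive.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end, name) tuples.

    BED coordinates are 0-based with an exclusive end.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional header lines some exports include.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()  # BED is tab-delimited; any whitespace works here
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        records.append((chrom, start, end, name))
    return records


def overlapping_loci(loci, features):
    """Map each locus name to the names of the features it overlaps.

    Loci without any overlap are left out of the result.
    """
    by_chrom = {}
    for chrom, start, end, name in features:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = {}
    for chrom, start, end, name in loci:
        # Two half-open intervals overlap when each starts before the other ends.
        matched = [
            fname
            for fstart, fend, fname in by_chrom.get(chrom, [])
            if start < fend and fstart < end
        ]
        if matched:
            hits[name] = matched
    return hits


# Toy data (hypothetical coordinates, for illustration only).
loci_text = """chr1\t100\t200\tlocus_1
chr1\t500\t600\tlocus_2
chr2\t50\t80\tlocus_3
"""
features_text = """chr1\t150\t300\tfeature_A
chr1\t190\t195\tfeature_B
chr2\t80\t120\tfeature_C
"""

hits = overlapping_loci(parse_bed(loci_text), parse_bed(features_text))
print(hits)  # {'locus_1': ['feature_A', 'feature_B']}
```

Note that `locus_3` and `feature_C` only touch (one ends at 80, the other starts at 80), so under the half-open BED convention they do not overlap. The scan is a simple loop over all features on the same chromosome, which is fine for a few hundred loci but would need an index or sorting for very large tables.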
Dear Jennifer,

Thank you. I have already checked coverage against GENCODE and identified the loci that overlap the annotations. I also tried overlapping a few of the loci not covered by the GENCODE annotation with the tracks "Small RNA-seq from ENCODE/Cold Spring Harbor Lab" and "ENCODE RNA-seq Tracks". Since some of my loci do overlap with regions that are transcribed according to these two tracks, I would like to proceed with this analysis. However, the number of tables per track to be searched is large, so I was wondering whether Galaxy could make the work easier.

Here is an example of the loci I am analysing:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1:41965319-41965336&hub_4607_uniformRNA=full

It does not overlap with an annotated GENCODE transcript, but it overlaps with "Long RNA-seq from ENCODE/Cold Spring Harbor Lab" ("GM78 cel pa-") and "ENCODE Long RNA-seq and Short RNA-seq Contigs and Signal" (GM12878 Nucleus PolyA Long CSHL Contigs (contig_17188)). The two tracks probably refer to the same contigs, but at different levels of detail.

Thank you,
Marco

2013/5/9 Jennifer Jackson <jen@bx.psu.edu>

--
Marco Santagostino, PhD
Laboratorio di Biologia Molecolare e Cellulare
Dipartimento di Biologia e Biotecnologie, University of Pavia
Ferrata street, 9 - 27100 Pavia, Italy
Tel.: +39 0382 985540
Fax: +39 0382 528496
e-mail: marco.santagostino@unipv.it
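To illustrate the "many tables per track" problem, here is a minimal Python sketch of screening one locus against several tables at once and reporting which of them contain an overlapping interval. This is only an assumption about how one might organize the work, not a Galaxy feature. The names `position_to_bed` and `tables_supporting` and the table names and interval coordinates are hypothetical. Only the browser position string is taken from the example above. It relies on the convention that UCSC browser positions are 1-based and inclusive, while BED tables are 0-based with an exclusive end.

```python
def position_to_bed(position):
    """Convert a browser position such as 'chr1:41965319-41965336'
    (1-based, inclusive) into a BED-style (chrom, start, end) tuple
    (0-based start, exclusive end)."""
    chrom, span = position.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start) - 1, int(end)


def tables_supporting(locus, tables):
    """Return the names of the tables holding at least one interval
    that overlaps the locus.

    `tables` maps a table name to a list of (chrom, start, end)
    intervals in BED coordinates.
    """
    chrom, start, end = locus
    supporting = []
    for table_name, intervals in tables.items():
        if any(c == chrom and start < e and s < end for c, s, e in intervals):
            supporting.append(table_name)
    return supporting


locus = position_to_bed("chr1:41965319-41965336")

# Hypothetical tables with invented intervals, for illustration only.
tables = {
    "table_a": [("chr1", 41965000, 41966000)],  # spans the locus
    "table_b": [("chr2", 100, 200)],            # different chromosome
    "table_c": [("chr1", 41965336, 41965400)],  # starts where the locus ends
}

print(locus)                             # ('chr1', 41965318, 41965336)
print(tables_supporting(locus, tables))  # ['table_a']
```

Repeating this for each of the roughly 700 loci gives a per-locus list of supporting tables, which is the kind of summary that is tedious to build by browsing each table manually.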