Question: How Can I Extract Sequence Information From Cuffdiff Files?
6.2 years ago, Humberto Boncristiani wrote:
Hi. I have Cuffdiff files with gene differential expression results. I don't have an annotation, so I need to extract the sequences from the genome coordinates and then BLAST them to identify the genes. What is the easiest way to do this? Thanks.

Humberto

Dr. Humberto Boncristiani
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Tel.: (1) 336-256-2591
Fax: (1) 336-334-5839
email: humbfb@gmail.com
Tags: rna-seq, cuffdiff
6.2 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello,

By "no annotation", do you mean that no species-specific annotation (GTF) was used? And do you want to compare the sequences against a protein database such as GenBank NR or RefSeq? If so, these are the instructions. Please let us know if you had something else in mind.

The sequence extraction can be done on Galaxy Main (if that is where you are working), but BLAST will need to be run on a local or cloud install. To get set up (instance and data), start here:
http://getgalaxy.org
http://usegalaxy.org/cloud

The BLAST+ wrapper recently moved from the distribution to the Tool Shed, but integrated installation tools can help you add it to your instance. See the latest News Brief (Sept 7, 2012) for details; the News Briefs are also worth following as you maintain your instance:
http://wiki.g2.bx.psu.edu/News
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_07

Questions about local/cloud installs are best directed to the galaxy-dev@bx.psu.edu mailing list:
http://wiki.g2.bx.psu.edu/Mailing%20Lists

To extract the transcript sequences, use the tool 'Fetch Sequences -> Extract Genomic DNA'. It accepts a custom reference genome from your history, if you have been using one: set the option "Source for Genomic Data:" to "History".

Hopefully this helps,
Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
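If you are working outside Galaxy, the coordinate-based extraction can be approximated with a short script. This is a minimal sketch, not Galaxy's implementation. It assumes the Cuffdiff table is a tab-separated *.diff file (such as gene_exp.diff) with a header containing `test_id`, `locus` (formatted `chrom:start-end`) and `significant` columns, that the genome FASTA fits in memory, and, as an unverified assumption, that locus coordinates are 1-based and inclusive; check this against your Cufflinks version. Keep in mind that the locus field spans a whole gene bound, not a single transcript.

```python
import csv
import io
import sys


def read_fasta(handle):
    """Parse FASTA text into {sequence_id: sequence}; the ID is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def parse_locus(locus):
    """Split a Cuffdiff locus string 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = locus.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def extract_loci(diff_handle, genome, only_significant=True):
    """Yield (header, sequence) for each row of a Cuffdiff *.diff table.

    ASSUMPTION: locus coordinates are treated as 1-based and inclusive.
    """
    for row in csv.DictReader(diff_handle, delimiter="\t"):
        if only_significant and row["significant"] != "yes":
            continue
        chrom, start, end = parse_locus(row["locus"])
        yield f"{row['test_id']} {row['locus']}", genome[chrom][start - 1:end]


def write_fasta(records, out, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping lines at `width`."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Toy inputs standing in for a real genome FASTA and a (trimmed) gene_exp.diff.
    genome = read_fasta(io.StringIO(">chr1\nACGTACGTAC\nGGGTTTAAAC\n"))
    diff = io.StringIO(
        "test_id\tgene_id\tlocus\tsignificant\n"
        "XLOC_000001\tXLOC_000001\tchr1:3-8\tyes\n"
        "XLOC_000002\tXLOC_000002\tchr1:11-15\tno\n"
    )
    write_fasta(extract_loci(diff, genome), sys.stdout)
```

The resulting FASTA could then be searched against a protein database with a local BLAST+ install, for example with `blastx -query loci.fa -db nr` (the file and database names here are placeholders).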
6.2 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Humberto,

Yes, my apologies, this should have been included in the original reply. The 'locus' field in the Cuffdiff files refers to a gene bound, not to individual transcripts. To get to the transcripts, you need to go back to the inputs to Cuffdiff. If you used Cuffmerge, the "merged transcripts" GTF file is the correct input to "Extract". If you used only Cuffcompare, use the "combined transcripts" GTF.

To know which transcript was associated with which gene bound, compare the attributes in the Cuffmerge merged transcripts GTF (9th column: gene_id, tss_id, etc.) with the Cuffdiff "gene_id" and "tss_id" values; depending on the file, these values also appear in the test_id column. The Cuffcompare GTF comparisons will be similar.

You can filter on the GTF attribute values with the tool "Filter and Sort -> Filter GTF data by attribute values_list". Cut out the column of interest from the Cuffdiff file ("Text Manipulation -> Cut"), edit it as desired, and use it as the list filter. Or explore the other GFF filter options in the same tool group.

Take care,
Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
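The list-filter workflow described above (cut the gene_id column from Cuffdiff, filter the merged transcripts GTF by those values, then extract the transcripts) can be sketched in Python for use outside Galaxy. This is an illustrative sketch, not the Galaxy tools themselves. It assumes a Cuffdiff table with `gene_id` and `significant` columns, a GTF whose exon lines carry `gene_id` and `transcript_id` attributes, and a genome already loaded as a dict of chromosome name to sequence. It joins each transcript's exons in genomic order (GTF coordinates are 1-based, inclusive) and reverse-complements minus-strand transcripts.

```python
import csv
import re
from collections import defaultdict

# Matches key "value" pairs in a GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_attributes(field):
    """Turn 'gene_id "XLOC_1"; transcript_id "TCONS_1";' into a dict."""
    return dict(ATTR_RE.findall(field))


def significant_gene_ids(diff_lines):
    """Like cutting the gene_id column from Cuffdiff, keeping rows marked significant."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    return {row["gene_id"] for row in reader if row["significant"] == "yes"}


def transcripts_for_genes(gtf_lines, wanted_gene_ids):
    """Group exon records by transcript_id for transcripts whose gene_id is wanted."""
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand), ...]
    gene_of = {}               # transcript_id -> gene_id
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_id") not in wanted_gene_ids:
            continue
        tid = attrs["transcript_id"]
        exons[tid].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
        gene_of[tid] = attrs["gene_id"]
    return exons, gene_of


def splice(exon_list, genome):
    """Concatenate exons in genomic order (GTF is 1-based, inclusive) and
    reverse-complement transcripts on the minus strand."""
    ordered = sorted(exon_list, key=lambda e: e[1])
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in ordered)
    if ordered and ordered[0][3] == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

In use, the gene IDs from `significant_gene_ids` select transcripts from the merged GTF lines via `transcripts_for_genes`, and `splice(exons[tid], genome)` returns one sequence per transcript, labeled with its gene through `gene_of[tid]`. Matching on `tss_id` instead would only mean swapping the attribute key.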