I have Cuffdiff output files with gene differential expression results. I don't
have the annotation, so I need to extract the sequence
information from the genome coordinates and then BLAST them.
What is the easiest way to do it?
Dr. Humberto Boncristiani
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Fax: (1) 336-334-5839
By no annotation, do you mean that no species-specific annotation (GTF) was
used? And do you want to compare against a protein database such as GenBank NR or
RefSeq? If so, these are the instructions. Please let us know if you had
something else in mind.
The sequence extraction can be done on Galaxy Main (if that is where
you are working), but BLAST will need to be run on a local or cloud
install. To get set up (instance and data), start here:
The BLAST+ wrapper recently moved from the distribution to the Tool
Shed, but there are integrated installation tools to help get it into
your instance. See the latest News Brief (Sept 7, 2012) for details.
The News Briefs are also good to follow as you maintain your instance:
Questions about local/cloud installs are best directed to the
email@example.com mailing list:
To extract the transcript sequences, use the tool 'Fetch Sequences ->
Extract Genomic DNA'. This will accept a custom reference genome from
the history, if you have been using one, by changing the option
"Source for Genomic Data:" to "History".
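For anyone who wants to see what a coordinate-based extraction amounts to, here is a minimal Python sketch. It is not the Galaxy tool's code. The function names and the toy genome are made up for illustration, and it assumes BED-style coordinates (0-based start, end not included) with minus-strand features returned as the reverse complement.

```python
# Illustrative sketch only: helper names and the toy genome are hypothetical.
# Assumes BED-style intervals: 0-based start, end-exclusive.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_interval(genome, chrom, start, end, strand="+"):
    """Slice one interval out of a dict of chromosome name -> sequence."""
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


genome = {"chr1": "AACCGGTTAACC"}
print(extract_interval(genome, "chr1", 2, 6))       # CCGG
print(extract_interval(genome, "chr1", 0, 4, "-"))  # GGTT
```

GTF coordinates are 1-based and inclusive, so a GTF start would need 1 subtracted before being used this way.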
Hopefully this helps,
Yes, my apologies, this should have been included in the original reply.
The 'locus' field in the Cuffdiff files refers to the bounds of a gene, not
to individual transcripts. To get to the transcripts, the inputs to
Cuffdiff need to be accessed. If you used Cuffmerge, the "merged
transcripts" GTF file would be the correct file to use as input to
"Extract". If you used just Cuffcompare, use the "combined
transcripts" GTF file instead.
To find out which transcript is associated with which gene, compare
the Cuffmerge merged transcripts GTF attributes (9th column: gene_id,
tss_id, etc.) with Cuffdiff's "gene_id" and "tss_id" values. Depending on
the file, these values also appear in the test_id column. The Cuffcompare
GTF comparisons will be similar.
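If you would rather do this comparison outside Galaxy, the matching could look roughly like the Python sketch below. The function names and the example IDs (XLOC_..., TCONS_...) are made up, and it assumes the 9th column uses the usual `key "value";` attribute layout.

```python
import re

# Illustrative sketch only: function names and example IDs are hypothetical.
# Assumes GTF attributes are written as: key "value"; key "value"; ...
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def transcripts_for_genes(gtf_lines, gene_ids):
    """Map each requested gene_id to the sorted transcript_ids found for it."""
    found = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue  # skip comments and malformed lines
        attrs = parse_gtf_attributes(cols[8])
        gene = attrs.get("gene_id")
        if gene in gene_ids and "transcript_id" in attrs:
            found.setdefault(gene, set()).add(attrs["transcript_id"])
    return {gene: sorted(ids) for gene, ids in found.items()}


gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; tss_id "TSS1";',
    'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002"; tss_id "TSS1";',
    'chr1\tCufflinks\texon\t900\t990\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000003"; tss_id "TSS2";',
]
print(transcripts_for_genes(gtf, {"XLOC_000001"}))
```

The gene_ids passed in would be the ones taken from the Cuffdiff file.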
You can gain access to the GTF attributes with the tool "Filter and
Sort -> Filter GTF data by attribute values_list". Cut out the column of
interest in the Cuffdiff file ("Text Manipulation -> Cut"), edit as
desired, and use it as a list filter. Or explore the other GFF filter
options in the same tool group.
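The "Cut" step can also be done in a few lines of Python if the Cuffdiff file is on your own machine. This is only a sketch. The function name is made up, the example header is shortened, and it assumes a tab-separated file with a header row containing "gene_id" and a "significant" column holding "yes" or "no". Keeping only the significant rows is my assumption about what you want in the list.

```python
import csv
import io

# Illustrative sketch only: the function name is hypothetical and the example
# header is shortened. Assumes a tab-separated file with a header row that
# contains the chosen column and a "significant" column with yes/no values.


def significant_gene_ids(handle, column="gene_id"):
    """Return the sorted, unique values of `column` for rows marked significant."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sorted({row[column] for row in reader if row.get("significant") == "yes"})


example = io.StringIO(
    "test_id\tgene_id\tgene\tlocus\tsignificant\n"
    "XLOC_000001\tXLOC_000001\t-\tchr1:100-900\tyes\n"
    "XLOC_000002\tXLOC_000002\t-\tchr1:1500-2400\tno\n"
)
print(significant_gene_ids(example))  # ['XLOC_000001']
```

Written out one ID per line, the result can serve as the values list for the GTF filter.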