The reference annotation will need to include the attribute gene_name in
order for this to be in the final output. If it is not provided, the gene_id
from the GTF/GFF3 file will be in the output instead. In addition, if the
reference annotation does not contain the attributes p_id and tss_id,
then some functions of Cuffdiff will be skipped. If no reference
annotation is used, then Cufflinks will assign gene_id/transcript_id,
but it does not generate the other attributes, and again the full
functionality of Cuffdiff will not be invoked.
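Before running Cufflinks/Cuffdiff, it can help to check whether a reference GTF actually carries these attributes. The following is a minimal sketch, not part of Cufflinks or Galaxy: it assumes a standard tab-separated, 9-column GTF whose last column uses `key "value";` pairs, and the function names are hypothetical. It counts how many feature lines lack gene_name, p_id, or tss_id.

```python
import re
from typing import Dict, Iterable, Tuple

# Attributes discussed above: gene_name (labels in the output),
# p_id and tss_id (needed for some Cuffdiff analyses).
WANTED: Tuple[str, ...] = ("gene_name", "p_id", "tss_id")

# Matches GTF-style attribute pairs such as:  gene_id "XYZ";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a {key: value} dict."""
    return {key: value for key, value in ATTR_RE.findall(field)}


def missing_attribute_counts(lines: Iterable[str],
                             wanted: Tuple[str, ...] = WANTED) -> Dict[str, int]:
    """Count feature lines that lack each wanted attribute."""
    missing = {name: 0 for name in wanted}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF feature line
        attrs = parse_gtf_attributes(cols[8])
        for name in wanted:
            if name not in attrs:
                missing[name] += 1
    return missing
```

Usage: open the annotation with `open("genes.gtf")` (a hypothetical filename) and pass the file object to `missing_attribute_counts`. Non-zero counts suggest the annotation will not provide full Cuffdiff functionality or gene_name labels.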
When you mention "duplicate error", I am guessing that the reference
annotation file you are using has formatting problems. This is a known
issue for some sources, in particular with GFF3 files and duplicated "ID"
attributes. Sometimes these can be corrected, but it is not always
simple, and the best advice is to contact the source for a correction
or to select a different source for annotation, ideally one that also
contains the p_id and tss_id attributes, plus, in your case, the
gene_name attribute since that is important to you.
Or you can try to correct it yourself. This isn't recommended as a
first-choice solution, as each of these duplicate ID problems tends to be
a bit different and there isn't a single, simple solution. Also, changing
the file at all could compromise its overall integrity, and this could be
a very difficult problem to detect. But if this is something you decide
to do, test carefully: the RNA-seq analysis tools link below has links to
formats, and there are some command-line and online tools that will help
to verify format.
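As one example of the kind of verification mentioned above, a short script can list GFF3 "ID" values that appear on more than one feature line. This is a minimal sketch with a hypothetical function name, assuming a standard 9-column, tab-separated GFF3 with `key=value;` attributes. Note that the GFF3 specification does allow a single discontinuous feature (for example, a CDS split across several lines) to share one ID, so each hit should be reviewed rather than treated as an automatic error; the feature type is reported to help with that.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_repeated_ids(lines: Iterable[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Map each GFF3 ID seen on more than one feature line to a list of
    (line_number, feature_type) tuples."""
    seen: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        for pair in cols[8].split(";"):
            key, sep, value = pair.strip().partition("=")
            if sep and key == "ID":
                seen[value].append((lineno, cols[2]))
    # Keep only IDs that occur more than once.
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

An ID repeated across different feature types (for example, a gene and an mRNA sharing one ID) is a typical sign of the formatting problem described above.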
Our wiki has some guidelines for where to find help and how to verify
format. Once at the Cufflinks tool site, the Manual explains more about
how attributes are used. Getting Started and Protocol can help with
developing a pipeline that yields the desired results.
See Tools on the Main server: RNA-seq analysis
I can also point you directly to the iGenomes reference annotation at
the Cufflinks web site.
Perhaps your genome is here:
On the public Main Galaxy server (usegalaxy.org), the UCSC hg19 and
gtf files are available under "Shared Data -> Data Libraries ->
iGenomes". These can be directly imported into a history as a dataset
and used with the RNA Analysis tool set.
But the iGenomes datasets do not cover all genomes, and in fact not all
genomes will even have an appropriate reference dataset available.
If you find yourself in this situation, then you might want to
send an email to the tool support group *firstname.lastname@example.org*
and see if someone has advice. Be
sure to mention the genome/build you are working with. There could be a
resource that someone else on the list may be able to suggest.
Hopefully some part of this leads you to a successful analysis run!
Galaxy Support and Training