Problem

Question: Problem


Hi, can you help me? I'm using the main Galaxy server for RNA-Seq analysis and I'm having trouble. I'm running TopHat, followed by Cufflinks, Cuffcompare and finally Cuffdiff, but I'm unable to retrieve gene names from the final Cuffdiff output. I have tried using a genome annotation with Cufflinks and Cuffcompare, but I either run into a duplicate error or the result does not include gene names. I am new to this and have only been using Galaxy for a week, so could any suggestions be kept simple, please? Thanks! Christopher O'Toole.


modified 5.8 years ago by Jennifer Hillman Jackson ♦ 25k • written 5.8 years ago by Christopher O'Toole • 10


Jennifer Hillman Jackson ♦ 25k wrote:

Hello Christopher,

The reference annotation needs to include the attribute gene_name for gene names to appear in the final output. If it is not provided, gene_id from the GTF/GFF3 file will be in the output instead. In addition, if the reference annotation does not contain the attributes p_id and tss_id, then some functions of Cuffdiff will be skipped. If no reference annotation is used, Cufflinks will assign gene_id/transcript_id, but it does not generate the other attributes, and again the full functionality of Cuffdiff will not be invoked.

When you mention "duplicate error", I am guessing that the reference annotation file you are using has formatting problems? This is true for some sources, in particular with GFF3 files and duplicated "ID" attributes. Sometimes these can be corrected, but it is not always simple, and the best advice is to contact the source for a correction or to select a different source of annotation, ideally one that also contains the p_id and tss_id attributes plus, in your case, the gene_name attribute, since that is important to you.

Or you can try to correct the file yourself. This isn't recommended as a first-choice solution, as each of these duplicate ID problems tends to be a bit different and there isn't a single, simple solution. Also, changing the file at all could cause problems with its overall integrity, and this could be a very difficult problem to detect. But if this is something you decide to do, test carefully - the RNA-seq analysis tools link below has links to formats, and there are some command-line and online tools that will help to verify format. Our wiki has some guidelines for where to find help and how to verify format. Once at the Cufflinks tool site, the Manual explains more about how attributes are used. Getting Started and Protocol can help with developing a pipeline that yields the desired results.

See Tools on the Main server: Example -> RNA-seq analysis tools.
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results

I can also point you directly to the iGenomes reference annotation at the Cufflinks web site. Perhaps your genome is here:
http://cufflinks.cbcb.umd.edu/igenomes.html

On the public Main Galaxy server (usegalaxy.org), the UCSC hg19 and mm9 GTF files are available under "Shared Data -> Data Libraries -> iGenomes". These can be imported directly into a history as a dataset and used with the RNA Analysis tool set.

However, the iGenomes datasets do not cover all genomes, and in fact not every genome will have an appropriate reference dataset available. If you find yourself in this situation, you might want to send an email to the tool support group at tophat.cufflinks@gmail.com and see if someone has advice. Be sure to mention the genome/build you are working with. Someone else on the list may be able to suggest a resource.

Hopefully some part of this leads you to a successful analysis run!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
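For anyone who wants to check an annotation file before running the pipeline, the two points in the answer above (which attributes are present, and whether "ID" values repeat in a GFF3 file) can be sketched in Python. This is only a rough illustration, not a Galaxy or Cufflinks tool: the function names are made up for this example, and it assumes a tab-separated file with nine columns, GTF attributes written as key "value"; and GFF3 attributes written as key=value. Note that a repeated ID is sometimes legitimate in GFF3 (for example a feature split over several lines), so the second function only lists candidates to look at.

```python
import re

# GTF attributes look like: gene_id "g1"; transcript_id "t1"; gene_name "ABC";
GTF_ATTR = re.compile(r'(\w+) "([^"]*)"')

# Attributes mentioned in the answer above.
WANTED = ("gene_id", "transcript_id", "gene_name", "p_id", "tss_id")


def _attribute_columns(lines):
    """Yield the 9th column of each well-formed, non-comment record."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            yield fields[8]


def gtf_attribute_counts(lines, wanted=WANTED):
    """Return (number of records, {attribute: records carrying it})."""
    counts = {name: 0 for name in wanted}
    total = 0
    for attrs in _attribute_columns(lines):
        total += 1
        present = {key for key, _ in GTF_ATTR.findall(attrs)}
        for name in wanted:
            if name in present:
                counts[name] += 1
    return total, counts


def duplicate_gff3_ids(lines):
    """Return {ID value: occurrences} for IDs seen on more than one line."""
    seen = {}
    for attrs in _attribute_columns(lines):
        for pair in attrs.split(";"):
            key, _, value = pair.strip().partition("=")
            if key == "ID" and value:
                seen[value] = seen.get(value, 0) + 1
    return {value: n for value, n in seen.items() if n > 1}
```

Both functions take any iterable of lines, so an open file handle works, e.g. `with open("annotation.gtf") as fh: print(gtf_attribute_counts(fh))` (the file name is a placeholder). If gene_name, p_id or tss_id come back as 0, that matches the situation described above where gene names are missing or parts of Cuffdiff are skipped.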

