Question: Reference Annotation error using HISAT2
ewilli28 wrote (3 months ago):

I keep getting a reference annotation error when I run Cuffdiff on HISAT2 alignment files. The Cuffdiff output is still produced and looks normal. When I use TopHat2 alignment files (from the same dataset), I don't get the Cuffdiff reference annotation error. I've tried multiple datasets and always get the same result: a reference annotation error when aligning with HISAT2 and no error when aligning with TopHat2. Has anyone encountered this before? This is my protocol: download FASTQ files from SRA, align with HISAT2 or TopHat2, then run Cufflinks, Cuffmerge, and Cuffdiff. I am new to NGS mapping and analysis.

Tags: alignment tophat hisat rna-seq
Jennifer Hillman Jackson (United States) wrote (3 months ago):

Hi,

This does seem odd. I would suggest reviewing the Cuffdiff output to see whether the reference annotation dataset (GTF) was actually used. Specifically, check for gene_id and transcript_id values (from the GTF) instead of "XLOC" identifiers (the default). At least some output lines should have gene_id/transcript_id values incorporated. I suspect that the Cuff* output doesn't include them if HISAT2 is rejecting the dataset. Usage checks and error trapping were improved in the newer tool wrappers. The Cuff* tools are somewhat dated and are considered deprecated, both for scientific reasons and because their usage is complicated (which leads to errors).
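If it helps, here is a minimal sketch of that check, assuming a Cuffdiff gene-level table (for example gene_exp.diff) has been downloaded locally as tab-separated text with a header row. The column names used here ("gene_id", "gene") and the use of "-" for an empty value are assumptions about the file layout, and `count_id_types` is a hypothetical helper name. It only tallies how many values in one column look like default Cufflinks identifiers (XLOC_/TCONS_) versus anything else.

```python
import csv
from collections import Counter


def count_id_types(lines, column="gene"):
    """Tally values in one column of a tab-separated table with a header row.

    'default'   -> value starts with XLOC_ or TCONS_ (Cufflinks-assigned IDs)
    'missing'   -> empty, "-" (assumed placeholder), or column absent
    'reference' -> anything else, presumably taken from the GTF
    """
    counts = Counter()
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        value = row.get(column) or ""
        if value in ("", "-"):
            counts["missing"] += 1
        elif value.startswith(("XLOC_", "TCONS_")):
            counts["default"] += 1
        else:
            counts["reference"] += 1
    return counts
```

Usage, assuming the file sits in the working directory: `with open("gene_exp.diff", newline="") as handle: print(count_id_types(handle, column="gene"))`. A result with zero "reference" values would suggest the annotation names never made it into the output.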

Whenever reference annotation is used, it is very important that it exactly matches the target genome/build being mapped against. The chromosome identifiers must match. For a custom genome, the sequence identifier lines (">" lines) should contain just the sequence names, with no description content. Finally, the database metadata attribute needs to be assigned to the GTF for many tools to accept it as proper input, because those tools check that value against the target genome's database name.
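The first two of those checks can be sketched outside Galaxy, assuming local copies of the genome FASTA and the GTF. This is a minimal illustration with hypothetical helper names; it compares the GTF seqnames (column 1) with the FASTA sequence names and lists any ">" lines that carry extra description text. It does not check the Galaxy database metadata attribute, which has to be set on the dataset itself.

```python
def fasta_names(lines):
    """Return (sequence names, headers that include description text)."""
    names, with_description = set(), []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            parts = header.split()
            if parts:
                names.add(parts[0])
            if len(parts) > 1:
                # Anything after the first whitespace is description content.
                with_description.append(header)
    return names, with_description


def gtf_seqnames(lines):
    """Return the set of seqname values (column 1) in a GTF, skipping comments."""
    seqnames = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seqnames.add(line.split("\t", 1)[0])
    return seqnames


def check_annotation(fasta_lines, gtf_lines):
    """Report GTF chromosomes absent from the genome, plus described headers."""
    names, described = fasta_names(fasta_lines)
    missing = sorted(gtf_seqnames(gtf_lines) - names)
    return {"missing_in_genome": missing, "headers_with_description": described}
```

A mismatch such as "1" in the GTF versus "chr1" in the genome shows up in "missing_in_genome"; the files can be passed as open file handles.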

The troubleshooting and input Support FAQs here can help resolve the majority of problems with reference annotation across tools: https://galaxyproject.org/support/#troubleshooting. If you cannot identify the problem after reviewing the help and can reproduce it at Galaxy Main https://usegalaxy.org, a bug report can be sent in for feedback (how to do this is also covered in the FAQs). Please do not delete the datasets (inputs/outputs) associated with the reported error, or our ability to help will be limited. I already reviewed your active histories and some of your deleted histories at Galaxy Main but couldn't locate the history that contains this problem (maybe it occurred at a different Galaxy server?).

The Galaxy Tutorials cover RNA-seq analysis. I advise switching away from the Cuff* tools and using the newer tools/methods instead, with inputs that have proper content, format, and labels (datatype, database): https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

Powered by Biostar version 16.09