Question: HISAT2 alignment to GRCm38 in Galaxy -- where to find mm10 reference annotation
4 months ago, dexter.myrick wrote:

I am doing an RNA-seq experiment and ran HISAT2 with the mm10 reference genome. Then, in order to run htseq-count, I downloaded the GRCm38 GTF file from Ensembl. The only GTF file in the Galaxy database is for mm9. Also, the htseq-count documentation states that GTF files from UCSC do not work with htseq-count because "the gene_id attribute incorrectly contains the same value as the transcript_id attribute". htseq-count puts all read counts in the "no features" file. Presumably this is because I ran HISAT2 with mm10 and htseq-count with GRCm38?

So should I go back and run HISAT2 with the GRCm38 reference genome? If so, how do I get the GRCm38 HISAT2 reference index into Galaxy? I tried to download the GRCm38 index from the HISAT2 webpage and got a folder with about 10 files ("genome.2.ht2" and "genome.3.ht2", for example). The folder also contains a script called "make_grcm38.sh". When I run this script in that directory, it returns the error "Could not find hisat2-build in current directory or in PATH".

Is there a way to get an mm10 GTF file from UCSC that is compatible with htseq-count into Galaxy? Thanks!!

4 months ago, Jennifer Hillman Jackson wrote:

Hi,

Please try the Gencode GTF version of the annotation. It contains chromosome identifiers that are a match for UCSC's mm10. Avoid the GFF3 version - it will have less utility and some RNA-seq tools will not accept GFF3 annotation as input, or they might error due to the content not meeting a strict GFF3 specification.
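
To illustrate why the chromosome identifiers matter, here is a minimal Python sketch. It is not part of Galaxy or htseq-count, and the function names and toy records are made up for this example. It collects the sequence names from column 1 of a GTF and checks how many of them also occur in the reference used for alignment. Ensembl annotation names chromosomes "1", "2", and so on, while UCSC's mm10 uses "chr1", "chr2", and so on, so an Ensembl GTF would be expected to share no names with an mm10 BAM.

```python
import io


def gtf_seqnames(handle):
    """Collect the sequence names (column 1) used in a GTF stream."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def shared_seqnames(gtf_names, reference_names):
    """Return the sorted sequence names present in both collections."""
    return sorted(set(gtf_names) & set(reference_names))


# Toy records (hypothetical, for illustration only).
ensembl_style = io.StringIO(
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
)
gencode_style = io.StringIO(
    "##description: toy header line\n"
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
)
# Names as they would appear in an mm10 BAM header (shortened toy list).
mm10_names = ["chr1", "chr2", "chrM"]

ensembl_overlap = shared_seqnames(gtf_seqnames(ensembl_style), mm10_names)
gencode_overlap = shared_seqnames(gtf_seqnames(gencode_style), mm10_names)
print("Ensembl-style overlap:", ensembl_overlap)
print("Gencode-style overlap:", gencode_overlap)
```

If the overlap is empty for real data, no read can fall on an annotated feature, which would be consistent with every read being counted as having no feature.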

You can load this into Galaxy by copying and pasting the URL into the Upload tool. Set the metadata datatype to gtf and database to mm10 so that tools recognize it as an appropriate/matched input for other mm10 input dataset(s).

Note: This data provider includes extra header lines in the file. These prevent the gtf datatype from being assigned by Upload's autodetect function (gff is autodetected by default instead), and some tools require gtf. Also, please don't try to use this annotation with older Tophat/Cuff* tools -- they will fail because of the header (or, you can remove the header with Text Manipulation tools). It is better to use the updated RNA-seq tools anyway.
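
Inside Galaxy the header is removed with the Text Manipulation tools. For a file handled outside Galaxy, the same step could be sketched in Python as below. This is only an illustration: it assumes the header lines are the ones starting with "#", and the function name and sample lines are hypothetical.

```python
def strip_gtf_header(lines):
    """Yield GTF lines, dropping header/comment lines that start with '#'."""
    for line in lines:
        if not line.startswith("#"):
            yield line


# Toy input (hypothetical): two header lines followed by one feature line.
sample = [
    "##description: toy header line\n",
    "##provider: EXAMPLE\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n',
]

cleaned = list(strip_gtf_header(sample))
print(len(cleaned), "line(s) kept")
```

For a real file, the same generator can be applied to an open file handle and the result written to a new file, leaving the original download untouched.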

You can assign both of these metadata values (datatype and database) during Upload, or afterwards using the Edit Attributes functions.

The BAM should already have mm10 assigned if it was created with HISAT2 in Galaxy. If it does not (for example, an uploaded BAM), assign it yourself. Galaxy cannot autodetect the database during Upload: it must be set by the user or by the external data source (not all do this).

Thanks! Jen, Galaxy team


Thanks again!! Trying it now

(reply written 4 months ago by dexter.myrick)

Quick question. Are there any reasons to still be using mm9 instead of mm10 that I'm unaware of? mm10 has been out since 2012, right?

(reply written 4 months ago by dexter.myrick)

None that I am aware of, either, big picture. I would stick with mm10.

(reply written 4 months ago by Jennifer Hillman Jackson)