How to upload Mouse reference genome mm10, in Fasta format to My Galaxy History


12 months ago by

bioadept • 0 wrote:

I tried to use a "tuxedo protocol" RNA-seq pipeline imported from the public workflows. I found a mouse .gtf annotation file in both the Galaxy Data Library and the UCSC Main Table Browser, but I could not find the mouse reference genome (FASTA) in the Galaxy Data Library.

Could you tell me how to find and upload the mouse mm10 and human hg38 reference genomes in FASTA format into a Galaxy history?

I have attached a snapshot of assigning the RNA-seq datasets to the workflow: https://ibb.co/cYrgk6

mm10 galaxy reference-genome hg38 • 1.4k views


modified 12 months ago by Jennifer Hillman Jackson ♦ 25k • written 12 months ago by bioadept • 0

Are any of these the correct FASTA files to use as the reference genome for RNA-seq analysis with the tuxedo pipeline mentioned above (see the image at https://ibb.co/cYrgk6)?

The GENCODE genome FASTA file?
ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M15/GRCm38.p5.genome.fa.gz

Or these transcript (RNA) FASTA files?
GENCODE: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M15/gencode.vM15.transcripts.fa.gz
UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/mrna.fa.gz
NCBI: ftp://ftp.ncbi.nih.gov/genomes/Mus_musculus/RNA/rna.fa.gz

Or is there a built-in FASTA for human/mouse on https://usegalaxy.org?

modified 12 months ago • written 12 months ago by bioadept • 0

In this regard, can you give a gzipped FASTA (.fa.gz) file directly to Galaxy and then feed it to the data manager, or does the genome need to be unzipped first (on a local Galaxy)?

written 12 months ago by vebaev • 130

12 months ago by

Jennifer Hillman Jackson ♦ 25k

United States


Hello,

The workflow you are using takes the reference genome as a custom reference genome from the history at execution time. That is one way to do the analysis. Another is to install reference genome indexes on the server you are working on (if it is your own, or if you can make requests to its administrators). The third is to use the built-in native indexes on the server you are working on.

It looks as if you are working on Galaxy Main (https://usegalaxy.org). If so, both mm10 and hg38 are natively indexed for most tools on the server, so you do not need to upload the reference genome to your history. Using the built-in indexes also increases the chance of a successful job, because building a new index for a genome this large on every run can quickly use up resources. You will need to modify the workflow so that its tools use the built-in indexes instead of a custom reference genome.
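If you want to check which genome builds a Galaxy server lists before editing the workflow, a small script along these lines might help. This is only a sketch: it assumes the server exposes its genome list at /api/genomes and that the response is a list of [display name, dbkey] pairs. The helper names fetch_genomes and has_dbkey are made up for illustration and are not part of Galaxy.

```python
import json
from urllib.request import urlopen

# Assumption: the public server mentioned above; replace with your own server.
GALAXY_URL = "https://usegalaxy.org"


def fetch_genomes(base_url=GALAXY_URL):
    """Download the genome build list (assumed endpoint: /api/genomes)."""
    with urlopen(f"{base_url}/api/genomes", timeout=30) as response:
        return json.load(response)


def has_dbkey(genomes, dbkey):
    """Return True if dbkey is the second item of any [display_name, dbkey] pair."""
    return any(
        isinstance(entry, (list, tuple)) and len(entry) == 2 and entry[1] == dbkey
        for entry in genomes
    )
```

Calling has_dbkey(fetch_genomes(), "mm10") would then report whether mm10 is in the list. Note that a listed build does not by itself mean every tool has an index for it; the reference genome menu on each tool form remains the definitive check.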

Genomes are rarely kept in data libraries at Galaxy Main - instead, they are accessed directly by the tools that they are indexed for.

RNA-seq tutorials are here: https://galaxyproject.org/learn/

How to use a custom reference genome (and where to potentially source one, for example UCSC) is explained in the last link on the troubleshooting page (https://galaxyproject.org/support/#troubleshooting). Also review the Chromosome mismatch FAQ at the same location: all inputs must be based on exactly the same reference genome, or problems will come up with tools and results.
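A quick way to spot a chromosome mismatch before running anything is to compare the sequence names in the FASTA with the names in the first column of the GTF. The sketch below does that locally for plain or gzipped files. The function names are hypothetical, and it only compares names (for example "chr1" in UCSC-style files versus "1" in Ensembl-style files); it cannot tell whether two files with matching names come from the same assembly version.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def fasta_names(path):
    """Collect sequence identifiers (first word after '>') from a FASTA file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect the values of column 1 (sequence name) from a GTF file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names


def compare(fasta_path, gtf_path):
    """Report names shared by both files and names found in only one of them."""
    fasta, gtf = fasta_names(fasta_path), gtf_names(gtf_path)
    return {"shared": fasta & gtf, "gtf_only": gtf - fasta, "fasta_only": fasta - gtf}
```

If "gtf_only" is not empty, the annotation refers to sequences that the genome file does not contain under that name, which is the situation the Chromosome mismatch FAQ describes.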

Hope that helps! Jen, Galaxy team

modified 12 months ago • written 12 months ago by Jennifer Hillman Jackson ♦ 25k
