How can I use Galaxy to run RNA-seq analysis without a reference sequence?

4.1 years ago by

Singapore

hudiejie • 20 wrote:

I recently obtained RNA-seq data for a fungal species under several treatments. RefSeq files (fasta, gbff and bbs) are available from NCBI, but the alignment/mapping rate is very low (3-4%). I have several questions:

1. NCBI RefSeq provides no annotation file for this species. For the tools in Galaxy I used the RefSeq fasta file, but what are the gbff and bbs files used for? Where can I download an annotation GTF file for this species?

2. I cannot understand the low mapping rate. According to my boss, mapping will be difficult because little is known about the species under investigation and my samples are not the same isolate as the reference. Is this a plausible explanation for the low mapping rate?

3. The most important question: how can I use Galaxy to run an RNA-seq analysis without a reference sequence?

referencedata alignment refseq rna-seq analysis • 1.6k views

modified 4.1 years ago • written 4.1 years ago by hudiejie • 20

4.1 years ago by

hudiejie • 20

Singapore

hudiejie • 20 wrote:

Hi Fubar, thank you for your reply.

Just to add something:

The raw data passed the FastQC checks and has been trimmed to remove adapters. The low mapping rate was also independently confirmed by the sequencing center's BI team.

I am still open to any other explanations for the low mapping rate.

I have a local Galaxy installation, so I would be very grateful if you could share your experience of setting up a reference transcriptome sequence, or point me to any other resources I can refer to.

written 4.1 years ago by hudiejie • 20

Not sure what "pass" means, but good-quality data will generally map well if you have something reliable to map it to. As I suggested, try BLAST on a few of the most common sequences.

We assembled a methylome and a transcriptome for differential analysis in a mammal without a reference genome: https://www.landesbioscience.com/journals/epigenetics/article/34391/ The results were useful for our biologist colleagues despite being far from perfect compared with having a reference genome.

We mostly used Galaxy tools, but not for the methylome and transcriptome assemblies. ABySS and Velvet are both in the Tool Shed, I think.

modified 4.1 years ago • written 4.1 years ago by fubar ♦ 1.1k

4.1 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

1. No reference genome probably means no ready-made gene model annotation.

2. Check the read data quality. Low-quality data or sequences with untrimmed adapters won't map well even with the right reference. The FastQC tool can be very helpful. BLAST some of the most common sequences to check their likely origin.

3. Given enough long reads with deep, high-quality coverage of the unknown transcriptome, assembling all the reads might produce a usable reference transcriptome. You could then try a differential expression pipeline for your different treatments using that reference transcriptome. This requires a Galaxy instance (e.g. in the cloud) where you can install Velvet or ABySS from the Tool Shed, plus some experience. Unfortunately, assembly software usually requires "informed tweaking" and, like most analyses, is unlikely to work well with poor-quality or low-coverage data.
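For point 2, one rough way to pick "the most common sequences" to BLAST is to count identical reads in a sample of the FASTQ file. The sketch below is only an illustration, not part of any Galaxy tool: it assumes an uncompressed FASTQ with exactly four lines per record, and the function names `most_common_reads` and `to_fasta` are made up for this example.

```python
from collections import Counter
from itertools import islice


def most_common_reads(fastq_lines, top_n=5, max_reads=100_000):
    """Return the top_n most frequent read sequences as (sequence, count) pairs.

    Assumes four-line FASTQ records and looks only at the first
    max_reads records to keep memory use modest.
    """
    counts = Counter()
    # In four-line FASTQ, the sequence is the 2nd line of every record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    for seq in islice(seq_lines, max_reads):
        counts[seq.strip().upper()] += 1
    return counts.most_common(top_n)


def to_fasta(ranked):
    """Format (sequence, count) pairs as FASTA text that can be pasted into BLAST."""
    return "".join(
        f">seq{i}_count{n}\n{seq}\n" for i, (seq, n) in enumerate(ranked, 1)
    )
```

To use it, open the FASTQ file, pass the file handle to `most_common_reads`, and print the result of `to_fasta`. If the top hits turn out to be rRNA, adapters, or a different organism, that would go some way toward explaining a low mapping rate.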

modified 4.1 years ago • written 4.1 years ago by fubar ♦ 1.1k
