Question: How can I use Galaxy to run RNA-seq analysis without a reference sequence?
4.1 years ago by
hudiejie20
Singapore
hudiejie20 wrote:

I recently obtained RNA-seq data for a fungal species under several treatments. RefSeq files (fasta, gbff and bbs) are available from NCBI, but the alignment/mapping rate is very low (3-4%). So I have several questions:

1. NCBI RefSeq provides no annotation file for this species. In the Galaxy tools I used the RefSeq fasta file, but what are the gbff and bbs files for? Where can I download an annotation GTF file for this species?

2. I cannot explain the low mapping rate. According to my boss, RNA-seq mapping will be difficult because little is known about the species under investigation and our samples are not the same isolate as the reference. Is that a plausible explanation for the low mapping rate?

3. The most important question: how can I use Galaxy to run RNA-seq analysis without a RefSeq reference?

4.1 years ago by
hudiejie20
Singapore
hudiejie20 wrote:

Hi Fubar, thank you for your reply.

Just to add something:

The raw data passed the FastQC checks and the adapters have been trimmed. The low mapping rate was also confirmed independently by the sequencing center's BI team.

I am still open to any other explanations for the low mapping rate.

I have a local Galaxy installed, so I would be very grateful if you could share your experience of setting up a reference transcriptome sequence, or point me to any other resources I could refer to.

 


Not sure what "pass" means, but good-quality data will generally map well if you have something reliable to map it to. As I suggested, try BLASTing a few of the most common sequences.

We assembled a methylome and a transcriptome for differential analysis in a mammal without a reference genome: https://www.landesbioscience.com/journals/epigenetics/article/34391/ The result was useful for our biologist colleagues, despite being far from perfect compared to having a reference genome.

We mostly used Galaxy tools, but not for the methylome and transcriptome assemblies. ABySS and Velvet are both in the Tool Shed, I think.

Reply written 4.1 years ago by fubar1.1k
4.1 years ago by
fubar1.1k
Australia
fubar1.1k wrote:

1. No reference genome probably means no ready-made gene model annotation.

2. Check the read data quality. Low-quality data or sequences with untrimmed adapters won't map well even with the right reference. The FastQC tool can be very helpful. BLAST some of the most common sequences to check their likely origin.

3. Given sufficient long reads with deep, high-quality coverage of the unknown transcriptome, assembling all the reads might produce a usable reference transcriptome. You could then try a differential expression pipeline for your different treatments against that reference transcriptome. This requires a Galaxy instance (e.g. in the cloud) where you can install Velvet or ABySS from the Tool Shed, plus some experience. Unfortunately, assembly software usually requires "informed tweaking" and, like most analyses, is unlikely to work well with poor-quality or low-coverage data.
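
For the BLAST check in point 2, here is a rough Python sketch (not a Galaxy tool) that pulls the most frequent read sequences out of a FASTQ file and prints them as FASTA, ready to paste into a BLAST search. It assumes plain four-line FASTQ records (no wrapped sequence lines) and only looks at the first 200,000 reads by default. The function names, the cutoffs and the "seqN_count=" header format are all made up for illustration.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def open_maybe_gzip(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def top_sequences(fastq_path, n_top=10, max_reads=200_000):
    """Return the n_top most frequent read sequences among the first max_reads records.

    Assumes 4 lines per FASTQ record, so the sequence is the 2nd line of each record.
    """
    counts = Counter()
    with open_maybe_gzip(fastq_path) as handle:
        # Take line indices 1, 5, 9, ... (the sequence lines), up to max_reads records.
        for line in islice(handle, 1, max_reads * 4, 4):
            seq = line.strip().upper()
            if seq:
                counts[seq] += 1
    return counts.most_common(n_top)


def to_fasta(top):
    """Format (sequence, count) pairs as FASTA text with the count in each header."""
    records = []
    for i, (seq, count) in enumerate(top, start=1):
        records.append(f">seq{i}_count={count}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_seqs.py reads.fastq.gz > top_seqs.fasta
    print(to_fasta(top_sequences(sys.argv[1])), end="")
```

If the top hits turn out to be rRNA, adapter remnants or another organism entirely, that would go a long way towards explaining a 3-4% mapping rate. Exact-match counting is crude, so treat the output as a quick sanity check alongside FastQC, not a replacement for it.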

 
