Question: Metatranscriptomic reads alignment and assembly
18 months ago, Aragion wrote:


I have a set of metatranscriptomic Illumina paired-end reads. I need to align them against a database of plant virus genomes (about 55,000 sequences) and calculate the coverage for each reference. How can I do this? I also need to assemble these reads de novo. I've tried Trinity on Galaxy, but the results seem questionable: a BLAST search of the contigs returns predominantly bacterial and other non-plant-viral sequences. Are there other online services for metagenomic analysis?


18 months ago, Jennifer Hillman Jackson (United States) wrote:


A version of Galaxy pre-configured with tools for metagenomic/metatranscriptomic analysis is available here: Some of these tools can be found on publicly hosted Galaxy websites, and all are in the Galaxy Tool Shed for installation into any Galaxy (local, cloud).

For your Trinity results, the final assembly can only represent what was given as input, so the contigs reflect what is actually in your reads. Perhaps the target BLAST database should be modified? Or spurious hits filtered out? (Viral fragments are expected to be present in other genomes.) Or perhaps there is contamination in your sample(s).
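As one way to filter out spurious hits, here is a minimal Python sketch that assumes the BLAST results were saved in the default 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The thresholds and the contig and subject names are illustrative assumptions, not recommended values, and should be tuned to the data.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_blast_tab(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse default 12-column BLAST tabular lines into Hit records."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def filter_hits(hits: Iterable[Hit], min_pident: float = 80.0,
                min_length: int = 100, max_evalue: float = 1e-10) -> List[Hit]:
    """Keep hits that pass all three (assumed, adjustable) thresholds."""
    return [h for h in hits
            if h.pident >= min_pident
            and h.length >= min_length
            and h.evalue <= max_evalue]


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-bitscore hit for each query contig."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


# Hypothetical example rows; all identifiers are made up for illustration.
example = [
    "contig1\tvirusA\t98.5\t850\t10\t2\t1\t850\t100\t949\t0.0\t1500",
    "contig1\tbactB\t72.0\t60\t15\t1\t10\t69\t5\t64\t0.5\t40.1",
    "contig2\tvirusC\t91.0\t300\t25\t2\t1\t300\t1\t300\t1e-100\t420",
    "contig3\tbactD\t85.0\t45\t6\t0\t1\t45\t1\t45\t1e-5\t60.2",
]
kept = filter_hits(parse_blast_tab(example))
best = best_hit_per_query(kept)
for query, hit in best.items():
    print(query, hit.subject, hit.pident, hit.length)
```

Short, low-identity matches are the ones most likely to be spurious, so filtering on alignment length and identity before keeping the best hit per contig is a common first pass.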

Recommending other services is beyond the scope of this forum. Other places to ask a question or to review prior Q&A include

Thanks! Jen, Galaxy team


Thank you for the answer! The problem is that the Trinity contigs BLAST against various non-viral sequences even when I choose a virus database. Perhaps this is because the tool assembles reads as eukaryotic transcripts with exons rather than as RNA genomes. Maybe I should try increasing the minimum contig length in the Trinity settings. Or would it be better to use other metagenomic assemblers?

I also tried to map the reads against the big database using Galaxy. The resulting BAM file is almost 5 GB in size, and I can't open it with any available program.

Written 18 months ago by Aragion

BAM is a compressed format that can be visualized in many viewers: Trackster (within Galaxy) and external viewers such as UCSC, IGV, IGB, and others. When using the display applications linked into the Galaxy server, the viewer must host the same genome version as the one assigned to your dataset in its genome database metadata attribute.

5 GB is not too large for most viewers. Many BAM files reach 50 GB or more, because the data is usually served in batches rather than loaded all at once.

BAM-to-SAM will convert the compressed format to a human-readable (albeit large) plain text format.
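Once the data is in SAM text form, per-reference coverage can be tallied by streaming the file line by line, so its size matters less. The following Python function is a minimal sketch of that idea, not a replacement for a dedicated tool. It assumes the SAM header contains `@SQ` lines with reference lengths, counts only the aligned CIGAR operations (`M`, `=`, `X`) as covered bases, and skips unmapped, secondary, and supplementary alignments; those choices, and the example records, are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_OPS = {"M", "=", "X"}  # operations counted as covering the reference


def mean_depth_per_reference(sam_lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {reference: (read_count, mean_depth)} from SAM text lines."""
    ref_len: Dict[str, int] = {}
    aligned_bases: Dict[str, int] = defaultdict(int)
    read_count: Dict[str, int] = defaultdict(int)

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header: collect reference names and lengths from @SQ lines.
            if line.startswith("@SQ"):
                tags = dict(t.split(":", 1) for t in line.split("\t")[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        f = line.split("\t")
        flag, rname, cigar = int(f[1]), f[2], f[5]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x900 or rname == "*" or cigar == "*":
            continue
        read_count[rname] += 1
        for n, op in CIGAR_RE.findall(cigar):
            if op in ALIGNED_OPS:
                aligned_bases[rname] += int(n)

    # Mean depth = aligned bases / reference length; unhit references get 0.0.
    return {ref: (read_count[ref], aligned_bases[ref] / length)
            for ref, length in ref_len.items() if length > 0}


# Hypothetical example records; names and values are made up for illustration.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:virusA\tLN:100",
    "@SQ\tSN:virusB\tLN:200",
    "r1\t0\tvirusA\t1\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tvirusA\t20\t60\t10S40M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t256\tvirusB\t5\t0\t50M\t*\t0\t0\t*\t*",
    "r5\t0\tvirusB\t5\t60\t20M5D30M\t*\t0\t0\t*\t*",
]
coverage = mean_depth_per_reference(example_sam)
for ref, (reads, depth) in sorted(coverage.items()):
    print(ref, reads, depth)
```

For a real file, the same function could be fed an open file handle, for example `mean_depth_per_reference(open("mapped.sam"))`, where `mapped.sam` is a placeholder name. With roughly 55,000 references, sorting the result by mean depth is a quick way to see which genomes attract reads at all.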

Trinity assembles RNA reads into transcripts. These do not contain gaps for splice sites/introns.

For the BLAST results, perhaps try again with a better target database. I am not sure exactly why non-viral sequences would be included in a viral reference database, but that is certainly possible in a public database. Not all are curated.

Written 18 months ago by Jennifer Hillman Jackson