Metatranscriptomic reads alignment and assembly


18 months ago by

Aragion • 0 wrote:

Hello.

I have a set of metatranscriptomic Illumina paired-end reads. I need to align them against a database of plant virus genomes (about 55,000 sequences) and calculate coverage for each reference. How can I do this? I also need to assemble these reads de novo. I tried Trinity on Galaxy, but the results seem questionable because a BLAST search returns predominantly bacterial and other non-plant viral sequences. Are there other online services for metagenomic analysis?

Thanks.

rna-seq assembly metatranscriptome • 621 views


modified 18 months ago by Jennifer Hillman Jackson ♦ 25k • written 18 months ago by Aragion • 0

18 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

A version of Galaxy pre-configured with tools for metagenomic/metatranscriptomic analysis is available here: https://hub.docker.com/r/bgruening/galaxy-metagenomics/. Some of these tools can be found on publicly hosted Galaxy servers, and all are in the Galaxy Tool Shed for installation into any Galaxy (local, cloud): https://galaxyproject.org/choices/

For your Trinity results, the final assembly only represents what was present in the input reads. Perhaps the target BLAST database should be modified, or spurious hits filtered out (viral fragments are expected to be present in other genomes). Or perhaps there is contamination in your sample(s).
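As an illustration of filtering spurious hits, here is a minimal sketch in Python. It assumes the BLAST results were exported in the default 12-column tabular format (`-outfmt 6`). The function name `filter_blast_hits` and the threshold values are hypothetical and would need tuning for real data.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6) with default fields.
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_blast_hits(handle, min_pident=90.0, min_length=100, max_evalue=1e-10):
    """Keep only hits that pass all three thresholds.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present in -outfmt 7 output).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_FIELDS, row))
        if (float(hit["pident"]) >= min_pident
                and int(hit["length"]) >= min_length
                and float(hit["evalue"]) <= max_evalue):
            kept.append(hit)
    return kept


# Two made-up hits: a long, near-identical one and a short, weak one.
example = (
    "contig1\tvirusA\t98.5\t850\t12\t0\t1\t850\t101\t950\t0.0\t1500\n"
    "contig2\tvirusB\t76.0\t45\t10\t1\t5\t49\t300\t344\t0.002\t40.1\n"
)
hits = filter_blast_hits(io.StringIO(example))
print([h["qseqid"] for h in hits])  # ['contig1']
```

Short, low-identity matches like the second row are the kind of hit that can make a contig appear to match an unrelated sequence, so filtering on alignment length as well as e-value may help.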

Recommending other services is beyond the scope of this forum. Other places to ask a question or to review prior Q&A include https://www.biostars.org/.

https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

written 18 months ago by Jennifer Hillman Jackson ♦ 25k

Thank you for the answer! The problem is that Trinity contigs BLAST against various non-viral sequences even when I choose a virus database. Perhaps this is because the tool assembles reads as eukaryotic transcripts with exons rather than as RNA genomes. Maybe I should try increasing the contig length in the Trinity settings. Or is it better to use other metagenomic assemblers?

I also tried to map the reads against the large database using Galaxy. The resulting BAM file is almost 5 GB in size and I can't open it with any program available to me.

modified 18 months ago • written 18 months ago by Aragion • 0

BAM is a compressed format that can be visualized in many viewers: Trackster (within Galaxy) and external viewers such as UCSC, IGV, IGB, and others. When using the display applications linked into the Galaxy server, the viewer must host the same genome version as the one assigned to your dataset in its genome database metadata attribute.

5 GB is not too large for most viewers. Many handle 50 GB and even larger files, because the data is often served in batches.

BAM-to-SAM will convert the compressed format to a human-readable (albeit large) plain text format.
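Once the alignments are in SAM text form, a per-reference coverage figure can be estimated without opening the file in a viewer. The sketch below is one possible approach, not a Galaxy tool. The function name `mean_depth_per_reference` is hypothetical. It assumes the SAM header contains `@SQ` lines with reference lengths, and it counts every mapped record, including secondary alignments.

```python
import io
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mean_depth_per_reference(handle):
    """Return {reference name: aligned bases / reference length}."""
    ref_len = {}
    aligned = defaultdict(int)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line; @SQ records carry the name (SN) and length (LN).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        # Skip unmapped reads (flag bit 0x4) and records without a CIGAR.
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue
        for n, op in CIGAR_OP.findall(cigar):
            # Only M, = and X place read bases on the reference.
            if op in "M=X":
                aligned[rname] += int(n)
    return {name: aligned[name] / length for name, length in ref_len.items()}


# Tiny made-up SAM: two references, two mapped reads, one unmapped read.
example = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:virusA\tLN:100\n"
    "@SQ\tSN:virusB\tLN:200\n"
    "r1\t0\tvirusA\t1\t60\t50M\t*\t0\t0\t*\t*\n"
    "r2\t16\tvirusA\t26\t60\t10S40M\t*\t0\t0\t*\t*\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n"
)
depth = mean_depth_per_reference(io.StringIO(example))
print(depth)  # {'virusA': 0.9, 'virusB': 0.0}
```

Because the file is read one line at a time, memory use stays small even for a multi-gigabyte SAM file. References with a mean depth of zero received no aligned reads.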

Trinity assembles RNA reads into transcripts. These do not contain gaps for splice sites/introns.

For the BLAST results, perhaps try again with a better target. I am not exactly sure why non-viral sequences would be included in a virus-containing reference database, but that is certainly possible in a public database. Not all are curated.

written 18 months ago by Jennifer Hillman Jackson ♦ 25k
