Question: Remove E. coli and vector contaminants from Illumina reads
21 months ago by
RACHAMPION10
RACHAMPION10 wrote:

I ran Trinity on Illumina reads (wheat, 300 bp paired-end). When I ran BLAST on the contigs, some looked like wheat, some looked like E. coli, and some looked like vectors (viruses). Is there a way to clean the reads before running Trinity to remove the contaminants? I was not able to find anything in Galaxy.

Thanks, Rick


If the possible contaminants don't map to your reference and they make up only a small percentage of the reads, consider whether you really need to do anything about them. Personally, I think you can ignore them unless the proportion of such reads is substantial or varies widely between samples. If the percentage of mapped reads is similar across all samples, I think you can just ignore it. But in case you still want to clean things up, see below.
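
As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes you already have, for each sample, a count of reads that hit the contaminant and the total read count. The function names, the example numbers, and both thresholds are hypothetical placeholders, not recommended values.

```python
def contaminant_percentages(counts):
    """Map each sample to the percentage of its reads that hit the contaminant.

    counts: dict of sample name -> (contaminant_reads, total_reads)
    """
    return {
        sample: 100.0 * contaminant / total
        for sample, (contaminant, total) in counts.items()
    }


def needs_cleaning(percentages, max_percent=1.0, max_spread=1.0):
    """Flag the data set if contamination is substantial or varies a lot.

    max_percent and max_spread are arbitrary example thresholds (assumptions).
    """
    values = list(percentages.values())
    substantial = max(values) > max_percent
    variable = (max(values) - min(values)) > max_spread
    return substantial or variable


# Made-up numbers, for illustration only.
example_counts = {
    "sample_A": (1_000, 1_000_000),
    "sample_B": (1_500, 1_000_000),
}
example_percentages = contaminant_percentages(example_counts)
example_flag = needs_cleaning(example_percentages)
```

Where to set the thresholds is a judgment call that depends on the experiment, so treat the defaults above as something to replace.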

Comment written 21 months ago by Guy Reeves
21 months ago by
Guy Reeves1.0k
Germany
Guy Reeves1.0k wrote:

Hi, as a possible outline plan: download an E. coli genome from UCSC as a FASTA file and use it as a reference genome from the history.
Map all your reads to it, e.g. using bowtie2, but activate the option 'Write unaligned reads (in fastq format) to separate file(s)'. Then use the resulting FASTQ files, which contain the reads that did not map to the E. coli genome, as your contamination-cleaned input for your workflow of choice. If you also activate 'Write aligned reads (in fastq format) to separate file(s)', you will be able to see whether there is really any contamination. I guess if you are just checking whether there is any contamination you only need part of the E. coli genome, but if you want to clean up the contamination you will need the whole genome. I have never implemented this workflow, but I do something similar to sex individuals using a fragment of the Y chromosome. Cheers, Guy
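
Outside Galaxy, the same idea could be sketched on the command line. The Python snippet below only builds the `bowtie2-build` and `bowtie2` commands and runs them if you call `run_pipeline`. It assumes bowtie2 is installed and on your PATH and that the reads are paired-end FASTQ; all file names and prefixes are hypothetical. The `--un-conc` and `--al-conc` options are, as far as I understand, the command-line counterparts of the two Galaxy options for paired reads, and bowtie2 replaces the `%` in those paths with 1 or 2 for the two mates.

```python
import subprocess


def build_index_cmd(contaminant_fasta, index_prefix):
    """Command that indexes the contaminant genome (e.g. an E. coli FASTA)."""
    return ["bowtie2-build", contaminant_fasta, index_prefix]


def build_filter_cmd(index_prefix, reads_1, reads_2,
                     clean_prefix, hits_prefix, threads=4):
    """Command that maps paired reads against the contaminant index.

    Pairs that do NOT align concordantly go to <clean_prefix>_R1/_R2.fastq,
    pairs that do align concordantly go to <hits_prefix>_R1/_R2.fastq.
    """
    return [
        "bowtie2",
        "-p", str(threads),
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "--un-conc", clean_prefix + "_R%.fastq",  # candidate clean reads
        "--al-conc", hits_prefix + "_R%.fastq",   # candidate contaminant reads
        "-S", "/dev/null",                        # discard the SAM alignments
    ]


def run_pipeline(contaminant_fasta, index_prefix, reads_1, reads_2,
                 clean_prefix, hits_prefix):
    """Run both steps; raises CalledProcessError if either command fails."""
    commands = [
        build_index_cmd(contaminant_fasta, index_prefix),
        build_filter_cmd(index_prefix, reads_1, reads_2,
                         clean_prefix, hits_prefix),
    ]
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("ecoli.fa", "ecoli_idx", "wheat_R1.fastq", "wheat_R2.fastq", "clean", "ecoli_hits")` would leave the filtered pairs in `clean_R1.fastq` and `clean_R2.fastq`, which could then go into Trinity. Note that a pair in which only one mate hits E. coli does not align concordantly and so ends up in the "clean" files; whether that is strict enough is a choice you would have to make. Vector sequences could be added to the same FASTA before indexing.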

21 months ago by
RACHAMPION10
RACHAMPION10 wrote:

I'll go with Plan 2 because I'm using de novo assembly. The contigs that I get are thus a mixture (by BLAST) of T. aestivum, various strains of E. coli, and a variety of vectors. Leaving the contaminants in would confuse downstream analysis. A friend uses Plan 2, and yours is additionally helpful in that it contains more detail. Also, the wheat genome has not been completely sequenced, so relying entirely on guided assembly presents its own problems.
