Aaron,
Please do check for contaminants... with our experience of service
providers and QC, I could probably write a book :(. The FastQC suite
is a good place to start (a Galaxy wrapper is also available for
it). Even for 454 data, which has neither fixed base positions nor
fixed read lengths, it's quite informative (k-mer overrepresentation
and such).
In addition, check for contaminating sequences (e.g. E. coli or
Mycoplasma sequences, which you wouldn't expect when sequencing human
cells... but you'd better check, speaking from experience).
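One common way to screen for such contaminants is to look for k-mers (short substrings of length k) that a read shares with a suspected contaminant sequence. The sketch below only illustrates that principle: the k-mer size, the `min_shared` cutoff and the toy sequences are assumptions chosen for the example, and this is not how FastQC, MIRA or Galaxy actually do it.

```python
"""Flag reads that share k-mers with a suspected contaminant sequence.

A minimal sketch of k-mer containment screening. The k-mer size (11),
the min_shared cutoff (3) and the toy sequences are illustrative
assumptions, not values used by FastQC, MIRA or Galaxy.
"""

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse-complement a DNA string; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def build_index(contaminants, k):
    """Collect the k-mers of both strands of every contaminant sequence."""
    index = set()
    for seq in contaminants:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def screen(reads, contaminants, k=11, min_shared=3):
    """Return the IDs of reads sharing >= min_shared k-mers with any contaminant."""
    index = build_index(contaminants, k)
    return [read_id for read_id, seq in reads.items()
            if len(kmers(seq, k) & index) >= min_shared]


if __name__ == "__main__":
    # Toy data: "read1" is a substring of the made-up contaminant, "read2" is not.
    contaminant = "ATGACCATGATTACGGATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGG"
    reads = {"read1": "GGCCGTCGTTTTACAACGTCGTGAC",
             "read2": "TTTTTTTTTTGGGGGGGGGGCCCCC"}
    print(screen(reads, [contaminant]))  # prints ['read1']
```

Both strands of the contaminant are indexed because a read can come from either strand. A real screen would index full reference genomes (e.g. E. coli, Mycoplasma) and tune k and the cutoff to the data.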
If I remember correctly, the MIRA documentation also has some info on
this kind of filtering prior to assembly.
Please keep us posted on your progress.
@Peter: I hope you manage to catch a flight and join the conference. A
pity I won't be there, but it looks very promising...
Alex
From: Aaron Jex [mailto:ajex@unimelb.edu.au]
Sent: Tuesday, 24 May 2011 8:52
To: Bossers, Alex
Subject: RE: [galaxy-user] (no subject)
Hi Alex,
Thanks for the email. I think I will have to take a closer look at the
MIRA documentation. I know that it definitely makes use of the
quality data to some extent, but I hadn't considered whether or not it
ignores low-quality data (perhaps there's a threshold setting I
could use; I'll check that). I'm not too worried about adaptor
sequences at the moment, as these "should" have been trimmed by our
sequencing service, and I clip the ends of the reads anyway when I
extract the QUAL and FASTA files from the original SFF files.
Best regards,
Aaron
Aaron Jex, BSc, PhD
Senior Research Officer,
Department of Veterinary Science,
The University of Melbourne,
250 Princes Highway,
Werribee, Victoria,
3030
tel: +61 3 9731 2294
To: Aaron Jex; galaxy-user@bx.psu.edu
Subject: RE: [galaxy-user] (no subject)
Aaron,
As far as I remember, doesn't MIRA take the low/high-quality bases
into account anyway? So there's no need to filter there, right?
The only filtering needed is for contaminating sequences (including
adapters and such). You should check the MIRA website to be sure,
though.
I have used the high-quality segments as in the metagenomics example,
and indeed you lose the exact quality info... but every base in those
segments is already above the provided threshold (by default above 20
on the Sanger quality score scale).
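On the Sanger (Phred) scale a quality score Q corresponds to an error probability p via Q = -10 log10(p), so the default cutoff of 20 means roughly a 1 in 100 chance that a base call is wrong. The sketch below shows that conversion and one simple way to pull out maximal runs of bases at or above the threshold. It is an illustration, not the Galaxy tool's own code, and the `min_length` parameter is a hypothetical addition.

```python
"""Find high-quality segments in a read's Phred (Sanger-scale) quality scores.

A minimal sketch: the threshold of 20 mirrors the default mentioned in the
thread, while min_length is a hypothetical parameter added for illustration.
"""


def error_probability(q):
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def high_quality_segments(quals, threshold=20, min_length=1):
    """Return half-open (start, end) ranges where every score is >= threshold."""
    segments = []
    start = None
    for i, q in enumerate(quals):
        if q >= threshold and start is None:
            start = i  # a new high-quality run begins here
        elif q < threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(quals) - start >= min_length:
        segments.append((start, len(quals)))  # run extends to the end of the read
    return segments
```

For example, `high_quality_segments([30, 30, 12, 25, 25, 25, 5])` gives `[(0, 2), (3, 6)]`. Once a read has been cut into such segments, the per-base scores of each segment are only known to be at or above the cutoff unless the original QUAL values are carried along with the fragment.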
Alex
From: galaxy-user-bounces@lists.bx.psu.edu [mailto:galaxy-user-bounces@lists.bx.psu.edu] On Behalf Of Aaron Jex
Sent: Tuesday, 24 May 2011 1:40
To: galaxy-user@bx.psu.edu
Subject: [galaxy-user] (no subject)
Hi,
I can't seem to find an answer to this on your wiki site, and it's not
in the tutorial. I would like to filter my 454 reads for high-quality
regions, rename the resulting sequence fragments AND relink the new
reads (fragments) to the original quality data, so that I can take
these filtered reads and assemble them using MIRA. Is there a way to
do this with Galaxy? So basically, all I want to do is take the new
read fragments I get from converting the tabular file to a FASTA
file, as shown in your metagenomics tutorial, and generate a
corresponding QUAL file for these 'new' reads.
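Outside Galaxy, one way to do this is to keep the original qualities keyed by read ID and, for every fragment, slice the sequence and the quality list with the same coordinates and write both under the same new name, so the FASTA and QUAL files stay in sync. This is a sketch under stated assumptions: fragments are given as (read_id, start, end) with 0-based, end-exclusive coordinates (the columns of the tutorial's tabular output may differ), and the `<read_id>_<start+1>-<end>` naming scheme is hypothetical. Biopython's `Bio.SeqIO`, which can read and write both the "fasta" and "qual" formats, would be an alternative to the hand-written parser.

```python
"""Write matching FASTA and QUAL files for renamed read fragments.

A minimal sketch, assuming each fragment is given as (read_id, start, end)
with 0-based, end-exclusive coordinates. The "<read_id>_<start+1>-<end>"
naming scheme is a hypothetical choice for illustration.
"""
import io


def read_records(handle):
    """Yield (read_id, body_lines) for each '>' record of a FASTA or QUAL file."""
    read_id, lines = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, lines
            read_id, lines = line[1:].split()[0], []  # first word is the ID
        elif line:
            lines.append(line)
    if read_id is not None:
        yield read_id, lines


def load_fasta(handle):
    """Map read ID -> sequence string."""
    return {rid: "".join(lines) for rid, lines in read_records(handle)}


def load_qual(handle):
    """Map read ID -> list of integer quality scores."""
    return {rid: [int(q) for q in " ".join(lines).split()]
            for rid, lines in read_records(handle)}


def write_fragments(fragments, seqs, quals, fasta_out, qual_out):
    """Slice sequence and qualities identically and write both under one new name."""
    for read_id, start, end in fragments:
        seq, qual = seqs[read_id], quals[read_id]
        if len(seq) != len(qual):
            raise ValueError(f"{read_id}: {len(seq)} bases but {len(qual)} scores")
        new_id = f"{read_id}_{start + 1}-{end}"
        fasta_out.write(f">{new_id}\n{seq[start:end]}\n")
        qual_out.write(f">{new_id}\n{' '.join(str(q) for q in qual[start:end])}\n")


if __name__ == "__main__":
    # Toy input: one read whose sequence and scores are wrapped over two lines.
    fasta_in = io.StringIO(">r1 some description\nACGTAC\nGT\n")
    qual_in = io.StringIO(">r1 some description\n30 30 12\n25 25 25 25 5\n")
    fasta_out, qual_out = io.StringIO(), io.StringIO()
    write_fragments([("r1", 3, 7)], load_fasta(fasta_in), load_qual(qual_in),
                    fasta_out, qual_out)
    print(fasta_out.getvalue() + qual_out.getvalue(), end="")
```

With real data, the `io.StringIO` objects would be replaced by files opened with `open(...)`. The length check catches FASTA/QUAL pairs that were clipped differently, which would otherwise silently misalign bases and scores.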
Best regards,
Aaron