Question: Re: Assemble a Consensus Genome from NGS Data
Benjamin Dickins (United Kingdom) wrote, 7.6 years ago:
Hi David, I'm sorry for the slow response. I recently solved a problem a bit like this and would be happy to share more information with you. If your genome is small, I think it makes sense to map the reads to a reference and identify variant sites; in my opinion de novo assembly isn't needed (see below). A basic approach is:

groom FASTQ file -> map with BWA -> filter SAM (uniquely mapped reads only) -> SAM-to-BAM -> generate pileup -> filter pileup

This gives you a position-by-position summary relative to the reference. The last step is the important one and needs the most care: you can have it print out the total number of non-reference bases at each position. I can share some information about thresholding, i.e. how many non-reference bases constitute significant evidence that a non-reference base is actually present at that position (basically, I use a binomial distribution and ask whether the observed ref/non-ref split would occur by chance).

Given that coverage of small genomes tends to be high, your first question about determining the actual genome sequence (or the quasispecies consensus, if you prefer!) can be answered by majority rule: a small script (or the tools under the "Text Manipulation" heading) can read off the base with the most support at each position and then test whether that base matches the base in the reference nucleotide column.

It's probably also worth thinking about PCR duplicates (from library prep), as these could be a significant source of error, but they are also tricky to deal with when many reads will be identical anyway in the input DNA. Feel free to get in touch with me if you need a bit more clarity and/or some more specifics...
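The "filter pileup" and majority-rule steps might look something like the Python sketch below. It is a minimal illustration under stated assumptions, not the script described above: it assumes the read-base column format written by samtools mpileup, and the function names (`count_pileup_bases`, `binom_upper_tail`, `call_position`), the per-base error rate of 0.01 and the cutoff alpha = 1e-3 are all hypothetical placeholders. The null model is that every non-reference call is a sequencing error, so the non-reference count at a position with depth n follows Binomial(n, error_rate); a small upper-tail probability suggests the non-reference bases are not explained by errors alone.

```python
import math
import re
from collections import Counter


def count_pileup_bases(ref_base: str, read_bases: str) -> Counter:
    """Tally base calls in one pileup read-base column (assumed samtools mpileup format).

    '.' and ',' are reference matches on the forward/reverse strand, '^' is
    followed by one mapping-quality character, '$' marks a read end, and
    '+N'/'-N' introduce an N-base indel that is skipped here. Other symbols
    (e.g. '*' for a deleted base) are ignored.
    """
    counts = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":  # start of a read: skip '^' and its mapping-quality char
            i += 2
        elif c in "+-":  # indel: parse the length, then skip the indel bases
            length = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in ".,":
                counts[ref] += 1
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1
            i += 1  # '$', '*' and any other symbol are simply skipped
    return counts


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that deep
    coverage (n in the thousands) does not overflow. Requires 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def call_position(ref_base: str, read_bases: str,
                  error_rate: float = 0.01, alpha: float = 1e-3) -> dict:
    """Majority-rule call at one position plus a binomial test of the
    non-reference count. error_rate and alpha are hypothetical defaults."""
    counts = count_pileup_bases(ref_base, read_bases)
    depth = sum(counts.values())
    ref = ref_base.upper()
    if depth == 0:
        return {"depth": 0, "majority": None, "differs_from_ref": False,
                "non_ref": 0, "p_value": 1.0, "non_ref_significant": False}
    majority = counts.most_common(1)[0][0]  # ties resolve to the first base seen
    non_ref = depth - counts[ref]
    p_value = binom_upper_tail(non_ref, depth, error_rate)
    return {"depth": depth, "majority": majority,
            "differs_from_ref": majority != ref, "non_ref": non_ref,
            "p_value": p_value, "non_ref_significant": p_value < alpha}


if __name__ == "__main__":
    # Nine reads at a reference 'A': five match, four carry G/g.
    print(call_position("A", "..,,GGgg^I.$"))
```

In this example the call reports depth 9, a majority base of 'A' (so no change from the reference) and 4 non-reference calls, whose binomial p-value falls well below the cutoff. A real run would loop over every line of the pileup, and the error rate would ideally come from base qualities rather than a single constant. Indels are only skipped here, not counted.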
cheers,
Ben

Benjamin Dickins
Postdoctoral Researcher
Center for Comparative Genomics and Bioinformatics
The Pennsylvania State University
302 Wartik Laboratory
University Park, PA 16802, USA
Cell/mobile: +1 814 777 1852
Office tel: +1 814 863 2185
Office fax: +1 814 865 9131
Website: http://www.bendickins.net/
Weblog: http://www.open.ac.uk/blogs/ideasblog/
David Matthews (United Kingdom) wrote, 7.6 years ago:
Hi Ben, don't apologise, this is excellent guidance! I have been bumbling about with pileup and your explanation makes it much clearer. I didn't use BWA but TopHat instead, so I'll give it a go with BWA and see if it makes a difference. I'm off to a virology conference next week, so I'm not sure how much chance I'll get to work on it. Many thanks again; once I do get my teeth into it, I'm sure I'll have some more questions, especially on the stats front. On a related subject, I am also looking at indels to see whether the virus has hotspots for transcription errors. These might reflect a deliberate attempt by the virus to modulate RNA Pol II function, with secondary RNA structure interfering with Pol II fidelity (I have no other evidence for this, it's just a mad shot in the dark!). Have you ever looked at this? Best wishes, David
Hi David, thanks! Your indel/Pol II point sounds very interesting; I hadn't thought of that before. I'd be happy to talk about indels a bit, as I ought to be thinking about them more (and because they can be mistaken for substitutions if one isn't careful)... thanks, Ben