I have just started out in the field of genome assembly, so please bear with my lack of knowledge about the subject. I have been tasked with assembling the genome of a diatom de novo from both mate-pair and paired-end reads. I think I have a workflow figured out for the paired-end reads using Velvet, as outlined below:
- Read QC
- Trimmer
- Read QC
- Cutadapt
- Read QC
- Resync
- velveth (hash = 35)
- velvetg
- assembly stats
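For the final "assembly stats" step, a minimal sketch of how contig count, total length and N50 could be computed is shown here. It assumes the contigs are in a plain FASTA file (velvetg writes `contigs.fa` into its output directory); the function names and the path in the usage note are only illustrative, not part of any tool.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def summarize(lengths):
    """Collect the basic statistics in one dictionary."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```

Usage would be along the lines of `summarize(read_contig_lengths("velvet_out/contigs.fa"))`, where `velvet_out` is a placeholder for whatever output directory was passed to velveth.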
However, I now want to improve the quality of the assembly using sequence data from the mate-pair reads, and I don't know how I should prepare the data for assembly or what steps I should take after that. Some basic stats on the reads are:
- Two mate-pair libraries were prepared: one size-selected for 3-5 kb inserts and the other for 5-10 kb
- The project was sequenced as 100 bp paired-end (PE)
- The lane generated >160M reads
- Average quality scores are 37
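As a rough sketch of what these numbers imply for sequencing depth, the arithmetic is just total bases divided by genome size. The genome size below is a placeholder I made up for illustration, since I have not stated the real one, and I am assuming the 160M figure counts individual reads rather than pairs.

```python
def estimate_coverage(n_reads, read_length_bp, genome_size_bp):
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


# From the stats above: at least 160M reads of 100 bp each.
n_reads = 160_000_000
read_length_bp = 100

# ASSUMPTION: placeholder genome size, not a measured value for this diatom.
hypothetical_genome_size_bp = 32_000_000

depth = estimate_coverage(n_reads, read_length_bp, hypothetical_genome_size_bp)
print(f"Total bases: {n_reads * read_length_bp:,}")
print(f"Estimated depth for a {hypothetical_genome_size_bp:,} bp genome: {depth:.0f}x")
```

The result scales inversely with genome size, so the depth would need recalculating once a real estimate is available.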
I am not sure what other data about the reads I can include, but any help would be greatly appreciated! I have found information on this topic that deals with RNA-seq data, but nothing so far on genomic data.
Thanks in advance,
Marnie