Thank you for that help!
Sorry, I realize that what I wrote was not very clear.
Here is the context: I want to map viral insertion sites in the genome. To that end, I amplify vector-genome junctions by PCR. I therefore end up with reads containing a 39 nt sequence corresponding to the 3' end of the viral genome, followed by the human gDNA flanking the viral genome.
I need to select valid reads (those containing the 39 nt sequence with ~90% identity and at most 1 insertion/deletion, Figure (1)) and trim that sequence so that only the human DNA sequence is left to map (Figure (2)). To determine the viral insertion site precisely (Figure (3)), I need to remove exactly the complete viral sequence: if there is a deletion or an insertion, the trimming must cover the first 38 nt or 40 nt respectively, instead of 39 nt, so that position +1 of the trimmed sequence still corresponds to the insertion site.
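To make the requirement concrete, here is a minimal sketch of indel-aware trimming in plain Python. It is not what any particular tool does. It assumes that each read starts with the viral sequence and that "~90%" can be approximated as at most 4 edits over 39 nt; the function names and the `max_edits` value are hypothetical.

```python
def find_viral_end(read, viral, max_edits=4):
    """Return (end, edits) such that read[:end] is the read prefix that best
    matches `viral`, or None if no prefix is within `max_edits` edits.
    Assumption: the viral sequence sits at the very start of the read."""
    m = len(viral)
    # prev[j] = edit distance between read[:i-1] and viral[:j]
    prev = list(range(m + 1))
    best_end, best_edits = None, max_edits + 1
    # A prefix longer than m + max_edits cannot be within max_edits edits.
    for i in range(1, min(len(read), m + max_edits) + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if read[i - 1] == viral[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # extra base in the read
                         cur[j - 1] + 1,      # base missing from the read
                         prev[j - 1] + cost)  # match or mismatch
        # cur[m] is the distance between read[:i] and the full viral sequence
        if cur[m] < best_edits:
            best_end, best_edits = i, cur[m]
        prev = cur
    return None if best_end is None else (best_end, best_edits)


def trim_viral(read, viral, max_edits=4):
    """Return the flanking sequence after the viral part, or None if invalid."""
    hit = find_viral_end(read, viral, max_edits)
    return None if hit is None else read[hit[0]:]
```

With this approach a read carrying a 1 nt deletion in the viral part is cut after 38 nt and a read carrying a 1 nt insertion after 40 nt, so the first base kept is still the first genomic base. When the indel falls at the very end of the viral sequence, the cut position can be ambiguous; the sketch then keeps the shortest best-scoring prefix.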
I have also created test sequences to see which ones would be kept.
The reference sequence is the following:
GATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTA. I added this sequence at its 3' end: GCAGGTGGTGGTGGTGGTGGTGGTGGTGGT
The reads that were lost are:
Do you know if there is a way to get these sequences back?
By the way, in the results file I get the end position of the mapped sequence on each read.
Do you know of any tool that could use this value to trim each read at its own end position?
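For illustration, per-read trimming from such a table could look like the sketch below. It assumes the end positions have already been parsed into a dictionary keyed by read name, and that each position is 1-based and inclusive (the last base of the viral match); both points, like the function name `trim_reads`, are assumptions, since the format of the results file is not described here.

```python
def trim_reads(reads, end_positions):
    """Yield (name, sequence, quality) with the viral part removed.

    reads         -- iterable of (name, sequence, quality) tuples
    end_positions -- dict: read name -> end of the viral match
                     (assumed 1-based and inclusive)
    Reads without an entry in end_positions are skipped.
    """
    for name, seq, qual in reads:
        end = end_positions.get(name)
        if end is None:
            continue  # no valid viral match reported for this read
        # A 1-based inclusive end equals the 0-based index of the first base kept.
        yield name, seq[end:], qual[end:]
```

If the reported positions turn out to be 0-based, the slices would need to start at `end + 1` instead.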
Thanks a lot for your help!