Hello,
N indicates an ambiguous base call, and it may interfere with optimal alignments. I would try running the mapping job first and examining the results to see whether more manipulation is needed, since this is the easier path. Then you can decide whether to remove the base. Changing the global match parameters will affect the entire sequence, not just this base, and will probably not produce the desired results. Still, this could be tested and the results compared (overall mapping statistics and a visualization of a few regions in Trackster, at UCSC, or in the browser of your choice).
To remove the base, the workflow would go something like this: convert the fastq file(s) to tabular format, split at the N base, remove that base, merge the sequence ends back together, and convert the format back to fastq (fastqsanger). I do not believe that the FASTQ splitter/joiner tools will work in this case, since they were not designed for this type of manipulation (the ends are not the same length). Creating a custom workflow is likely the only option, unless you want to correct the file locally using Unix command-line tools ('sed' is one option).
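If you go the local route, the same split/remove/merge idea can be sketched in a few lines of Python instead of sed. This is only a rough sketch, not a Galaxy tool: the function names (strip_n_bases, clean_fastq) are made up for illustration, it assumes plain four-line fastq records (no wrapped sequence lines), and it removes every N it finds rather than one at a known position. It also drops the quality character at the same position, since the sequence and quality lines have to stay the same length.

```python
def strip_n_bases(seq, qual):
    """Drop every N/n from seq together with the quality character at the same index."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    kept = [(base, q) for base, q in zip(seq, qual) if base not in ("N", "n")]
    new_seq = "".join(base for base, _ in kept)
    new_qual = "".join(q for _, q in kept)
    return new_seq, new_qual


def clean_fastq(lines):
    """Yield cleaned fastq lines (without newlines) from an iterable of lines.

    Assumption: each record is exactly four lines (header, sequence, '+', quality).
    A truncated final record (fewer than four lines) is silently skipped.
    """
    it = iter(lines)
    # zip over the same iterator four times groups the input into 4-line records
    for header, seq, plus, qual in zip(it, it, it, it):
        new_seq, new_qual = strip_n_bases(seq.rstrip("\n"), qual.rstrip("\n"))
        yield header.rstrip("\n")
        yield new_seq
        yield plus.rstrip("\n")
        yield new_qual
```

To use it, you could open the input file, pass the file object to clean_fastq, and write each yielded line plus a newline to a new file. Reads that become shorter this way will have mixed lengths, so check that your mapper accepts that before running the full dataset.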
Tools can be found in the groups Text Manipulation, Convert Formats, and NGS: QC and manipulation. Most of the ones you will use are analogous to common command-line operations.
Thanks, Jen, Galaxy team