I have a set of data that has a linker (not barcode) that needs to be masked/removed before aligning to the genome. My aligner of choice is BWA.
If I mask these linkers (using some dynamic programming...), is it possible to use BWA to align these sequences? That is, can I replace the linker with 'N's or similar characters and then align? I've tried replacing with 'N's, but the sequences that contain linkers then fail to align because they have too many mismatches.
After no joy with the above, I've gone for the option of removing the linker sequences. This gives me a mixed set of read lengths, which I have to align separately before recombining the SAM files to create a final BAM file.
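To make the two options concrete, here is a minimal sketch of masking versus removing a linker. It assumes an exact-match search rather than the dynamic-programming match mentioned above, and it assumes the linker is cut out with the flanking pieces joined, which may not suit every library design. The function names and the example linker `GATC` are hypothetical, for illustration only.

```python
def mask_linker(seq, linker):
    """Replace the first exact occurrence of `linker` with N's (read length unchanged)."""
    start = seq.find(linker)
    if start == -1:
        return seq  # no linker found, leave the read as it is
    end = start + len(linker)
    return seq[:start] + "N" * len(linker) + seq[end:]


def remove_linker(seq, qual, linker):
    """Cut the first exact occurrence of `linker` out of a read.

    The matching quality values are removed as well, so sequence and
    quality strings stay the same length. Reads without the linker are
    returned unchanged.
    """
    start = seq.find(linker)
    if start == -1:
        return seq, qual
    end = start + len(linker)
    return seq[:start] + seq[end:], qual[:start] + qual[end:]


# Hypothetical read: 4 bases, the linker GATC, then 4 more bases.
read = "AAAAGATCCCCC"
quals = "ABCDEFGHIJKL"
masked = mask_linker(read, "GATC")
trimmed_seq, trimmed_qual = remove_linker(read, quals, "GATC")
```

Masking keeps every read at its original length but puts a run of N's into it, whereas removal shortens only the reads that contained the linker, which is where the mixed read lengths come from.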
Is this the best way of tackling this problem or am I missing something obvious?
Thanks