Thanks for your reply.
I followed your advice and simulated paired-end reads for different repeat lengths. Indeed, as more repeat units are added, there is a point at which the pair is no longer joined but classified as ambiguous. I managed to merge some longer amplicons by setting loose=t.
However, it is not clear to me why a pair like the following can't be merged, although the reads have an overlap of ~190nt and the repeat region has a complex structure, i.e. repeat stretches are interspersed with single bases and non-repetitive elements. Merging by hand (aligning R1 and rev-comp R2) revealed that there should be only one solution without any mismatches.
Maybe it just depends on the parameters in bbmerge?
Please have a look at the following read pair (2x250nt), given as read 1, read 2 and the resulting amplicon (311nt):
Read 1
Code:
>31.2_14/17 AATCTGGGCGACAAGAGTGAAACTCCGTCAAAAGAAAGAAAGAAAGAGACAAAGAGAGTTAGAAAGAAAGAAAGAGAGAGAGAGAGAAAGGAAGAAAGGAAGAAAAAGAAAGAAAAAGAAAGAAAGAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAA
Read 2
Code:
>31.2_14/17 ACATCTCCCCTACCGCTATAGTAACTTGCTCTTTCTTTCCTTCCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCTTTCTTTCTTTCTCTTTCTTTCTTTTTCTTTCTTTTTCTTCCTTTCTTCCTTTCTCTCTCTCTCTCTTTCTTTCTTTC
Amplicon (311nt)
Code:
AATCTGGGCGACAAGAGTGAAACTCCGTCAAAAGAAAGAAAGAAAGAGACAAAGAGAGTTAGAAAGAAAGAAAGAGAGAGAGAGAGAAAGGAAGAAAGGAAGAAAAAGAAAGAAAAAGAAAGAAAGAGAAAGAAAGAAAGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGGAAGGAAAGAAAGAGCAAGTTACTATAGCGGTAGGGGAGATGT
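For reference, the manual check I describe (aligning R1 against the reverse complement of R2 and looking for overlaps without mismatches) can be sketched roughly as below. This is only a minimal illustration in Python, not how bbmerge works internally; the function names (`revcomp`, `perfect_overlaps`, `implied_overlap`, `make_pair`) and the two short toy amplicons are made up for the example, and `min_overlap=4` is an arbitrary choice.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def perfect_overlaps(r1, r2, min_overlap=4):
    """Overlap lengths at which the 3' end of r1 matches the 5' end of
    revcomp(r2) with zero mismatches. More than one hit means the pair
    is ambiguous under an exact-match criterion."""
    r2rc = revcomp(r2)
    hits = []
    for olen in range(min_overlap, min(len(r1), len(r2rc)) + 1):
        if r1[-olen:] == r2rc[:olen]:
            hits.append(olen)
    return hits


def implied_overlap(len_r1, len_r2, amplicon_len):
    """Overlap length implied by the read lengths and the amplicon length."""
    return len_r1 + len_r2 - amplicon_len


def make_pair(amplicon, read_len):
    """Simulate an error-free read pair from a (hypothetical) amplicon."""
    return amplicon[:read_len], revcomp(amplicon[-read_len:])


# Toy amplicon with an uninterrupted AAAG repeat: several exact overlaps.
pure = "TTGACC" + "AAAG" * 4 + "CGTTAC"
r1, r2 = make_pair(pure, 20)
print(perfect_overlaps(r1, r2))        # [4, 8, 12]

# Toy amplicon whose repeat is interrupted by AGAC: a single exact overlap.
interrupted = "TTGACC" + "AAAG" + "AGAC" + "AAAG" * 2 + "CGTTAC"
r1, r2 = make_pair(interrupted, 20)
print(perfect_overlaps(r1, r2))        # [12]

# Read and amplicon lengths from this post: 2x250nt reads, 311nt amplicon.
print(implied_overlap(250, 250, 311))  # 189
```

In the toy case, the uninterrupted repeat matches at every shift of one repeat unit, whereas the interrupted one matches at only one position, which is what I would also expect for the pair above.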
Thanks in advance for your help.
Sebastian