Hey,
We are rather new to the sequencing scene and have a couple of datasets for our bacterial genomes, generated on different platforms. The platforms include "older" FLX, Titanium paired (different sizes) and Illumina mate pairs.
454 has a bias in homopolymer stretches, especially runs of A's. Roche's software seems to correct for this bias, but it is still a point of attention. My question relates to another bias, only recently made public, about clonal reads (EDIT: a better term might be DUPLICATE READS; in theory this never happens, but in practice it does). These reads end up with exactly the same start and end in their sequences, which is almost impossible by chance. I have heard a few people talk about filtering these out, since they might falsely raise coverage above the cutoff in assembly.
Is there a good tool to do this, or is someone willing to share their scripts? It seems like reinventing the wheel if I have to write these myself while many people already seem to filter their data for it.
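To make concrete what I mean by filtering, here is a rough sketch in Python of the kind of script I have in mind. It is only an illustration under my own assumptions: reads are in plain FASTA, and a "duplicate" is a read whose sequence (or, optionally, whose first `prefix_len` bases) has already been seen. The function names `parse_fasta` and `drop_duplicate_reads` and the `prefix_len` parameter are made up for this example and do not come from any existing tool.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicate_reads(
    records: Iterable[Tuple[str, str]],
    prefix_len: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Keep the first read for each distinct sequence (or sequence prefix).

    Assumption: identical sequence is used as a proxy for "same start and
    end"; no reference genome or mapping is involved.
    """
    seen = set()
    kept = []
    for header, seq in records:
        # Compare case-insensitively; optionally only on the leading bases.
        key = seq.upper() if prefix_len is None else seq[:prefix_len].upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append((header, seq))
    return kept


if __name__ == "__main__":
    demo = [">r1", "ACGTACGT", ">r2", "ACGTACGT", ">r3", "ACGTTTTT"]
    for name, seq in drop_duplicate_reads(parse_fasta(demo)):
        print(f">{name}\n{seq}")
```

Exact matching like this would miss duplicates that differ by a sequencing error or by read length, which is part of why I would prefer an established tool over my own script.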
Thanks
Alex