I'm looking over a newbler assembly of older 454 reads (~200 bp), and it seems that base disagreements are padded and offset instead of being aligned on top of each other (see below). I think this will prevent automated SNP finding, e.g. with the Marth Lab's polyBayesShort or its updated version, GigaBayes (see the Marth Lab page at Boston College).
Here's an example of what I mean, taken from the consed view of the newbler-produced ACE file:
consensus:
...TTAT*cAGTGT...
reads:
...TTATg*AGTGT...
...TTATg*AGTGT...
...TTAT*cAGTGT...
...TTAT*cAGTGT...
...TTAT*cAGTGT...
... and here's what I expected:
consensus:
...TTATcAGTGT...
reads:
...TTATGAGTGT...
...TTATGAGTGT...
...TTATCAGTGT...
...TTATCAGTGT...
...TTATCAGTGT...
This is obviously a SNP candidate, but the first representation (padded and offset) will be harder to detect with someone else's tool or with my own scripts. I'm not seeing *any* instances of the second representation in this assembly, but I've definitely seen it with 454 reads assembled with PCAP. Does anyone recognize this? Am I missing some newbler default behavior?
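In the meantime, one possible workaround for my own scripts is a post-processing pass that collapses these offset pad columns before calling SNPs. Below is a minimal sketch, assuming the reads have already been pulled out of the ACE file as equal-length padded strings covering the same window (ACE parsing isn't shown). Both function names are hypothetical, and the merge rule is my own heuristic, not anything newbler or consed does. The rule merges two adjacent columns only when every read has exactly one base and one pad across them, and at least one read has the pad on each side.

```python
from collections import Counter

PAD = "*"  # pad character used in ACE/consed alignments


def merge_complementary_pad_columns(reads):
    """Hypothetical helper: collapse adjacent column pairs where every read
    has exactly one base and one pad, e.g. 'g*' vs '*c' -> 'g' vs 'c'.

    reads: equal-length padded strings covering the same window.
    """
    if not reads:
        return []
    width = len(reads[0])
    if any(len(r) != width for r in reads):
        raise ValueError("all reads must cover the same padded window")
    out = [[] for _ in reads]
    i = 0
    while i < width:
        if i + 1 < width:
            pairs = [(r[i], r[i + 1]) for r in reads]
            # Heuristic: each read has exactly one pad in the pair...
            complementary = all((a == PAD) != (b == PAD) for a, b in pairs)
            # ...and the pad appears on both sides across the reads.
            both_sides = (any(a == PAD for a, _ in pairs)
                          and any(b == PAD for _, b in pairs))
            if complementary and both_sides:
                for row, (a, b) in zip(out, pairs):
                    row.append(b if a == PAD else a)  # keep the real base
                i += 2
                continue
        for row, r in zip(out, reads):
            row.append(r[i])
        i += 1
    return ["".join(row) for row in out]


def snp_columns(reads, min_minor=1):
    """Hypothetical helper: report columns whose non-pad bases disagree
    (case-insensitive), as (column_index, base_counts) tuples."""
    if not reads:
        return []
    calls = []
    for col in range(len(reads[0])):
        counts = Counter(r[col].upper() for r in reads if r[col] != PAD)
        if len(counts) > 1 and min(counts.values()) >= min_minor:
            calls.append((col, dict(counts)))
    return calls


if __name__ == "__main__":
    reads = ["TTATg*AGTGT"] * 2 + ["TTAT*cAGTGT"] * 3
    print(snp_columns(reads))  # offset pads hide the disagreement: []
    merged = merge_complementary_pad_columns(reads)
    print(merged)
    print(snp_columns(merged))  # column 4: G x2 vs C x3
```

On the example above, scanning the raw padded reads finds no disagreeing column, because the `g` and `c` sit in different columns. After merging, column 4 shows G in two reads and C in three. The caveat is that a real 1-bp indel where some reads carry both bases would not be merged, which is intended, but I haven't checked this against a whole assembly.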