I'm analysing genotyping-by-sequencing data for 2,100 dual-indexed samples and am looking for a fast, mismatch-tolerant demultiplexer.
Libraries for each sample were prepared using a unique combination of Illumina I5 and I7 index sequences. Libraries were pooled and sequenced on two HiSeq lanes.
I’ve used demuxbyname, the Java demultiplexer from the BBMap package, which processed the data in around 2 hours on a 32-core machine. However, 30% of reads were rejected because their index reads did not match any of the index sequences used.
Pairwise Hamming distances for the I5 sequences (96 in all, one for each well position) vary between 3 and 6, so I'm hoping one mismatch can be tolerated there without a problem.
The I7 index sequences have pairwise Hamming distances between 2 and 6, so I would need to correct mismatching index reads only where they were one mismatch away from a single genuine index sequence and to reject them otherwise.
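To make the rule concrete, here is a minimal Python sketch of the correction logic described above. It is not taken from any existing tool: the function names `build_lookup` and `correct` are hypothetical, and the index sequences in the usage lines are made-up placeholders rather than real Illumina indexes. The idea is to precompute every one-mismatch variant of each genuine index once, mark variants reachable from more than one index as ambiguous, and then resolve each index read with a single dictionary lookup.

```python
from typing import Dict, Iterable, Optional

ALPHABET = "ACGTN"  # N included because index reads can contain no-calls


def build_lookup(indexes: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each sequence within one mismatch of a genuine index to that index.

    Sequences that are one mismatch away from two or more genuine indexes
    map to None, so they are rejected rather than guessed.
    """
    genuine = set(indexes)
    # Exact matches always resolve to themselves.
    lookup: Dict[str, Optional[str]] = {idx: idx for idx in genuine}
    for idx in genuine:
        for pos, original in enumerate(idx):
            for base in ALPHABET:
                if base == original:
                    continue
                variant = idx[:pos] + base + idx[pos + 1:]
                if variant in genuine:
                    continue  # never override an exact match
                if variant in lookup and lookup[variant] != idx:
                    lookup[variant] = None  # ambiguous: close to 2+ indexes
                else:
                    lookup[variant] = idx
    return lookup


def correct(index_read: str, lookup: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the genuine index for a read, or None if unknown or ambiguous."""
    return lookup.get(index_read)


# Placeholder 4-base indexes; the first two are only 2 mismatches apart.
i7_lookup = build_lookup(["AAAA", "AATT", "CCCC"])
print(correct("AAAC", i7_lookup))  # one mismatch from AAAA only
print(correct("AAAT", i7_lookup))  # one mismatch from both AAAA and AATT
```

When all pairwise distances are at least 3, as for the I5 set, no variant can be one mismatch away from two indexes, so the ambiguous case never arises. For the I7 set, only reads lying between two indexes at distance 2 are rejected.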
Would anyone know of a fast demultiplexer that could offer some or all of this functionality?
Thanks,
Stephen