Comparing large sets of sequences
Suppose you have two large sequences, or sets of sequences, that you want to compare
for matching entries.
For example, you sequenced some ancient bone and want to check it for bacterial contamination.
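For simplicity, assume you have two sets of nucleotide sequences, each sequence 1000 bases long
and each set about 1GB (at roughly one byte per base, that is on the order of a million sequences
per set). You want to compare the two sets against each other and find the best-matching pairs
of sequences or subsequences.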
Sounds like a standard problem, doesn't it?
How is it done? What is the best, fastest method?
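Comparing every pair directly is impractical at this scale: a million sequences per set gives 10^12 pairs, and a full Smith-Waterman alignment of two 1000-base sequences takes about 10^6 cell updates per pair. One common family of approaches, used by tools such as BLAST, is seed-and-extend: index short exact substrings (k-mers) of one set, look up the k-mers of the other set, and run the expensive alignment only on pairs that share enough seeds. The following is a minimal sketch of the seeding step only, under assumed parameters. The k-mer length `k`, the threshold `min_shared`, and all function and sequence names are hypothetical. It does not reproduce any particular tool's implementation, and a pure-Python dictionary like this would not scale to 1GB inputs.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(seqs, k):
    """Map each k-mer to the list of (sequence id, offset) where it occurs.

    seqs is a dict {sequence id: sequence string} (assumed input format).
    """
    index = defaultdict(list)
    for sid, seq in seqs.items():
        for pos, kmer in kmers(seq, k):
            index[kmer].append((sid, pos))
    return index


def candidate_pairs(queries, index, k, min_shared=2):
    """Count k-mer hits between each query and each indexed sequence.

    Returns (query id, indexed id, hit count) for pairs with at least
    min_shared hits, sorted by hit count, highest first. These are only
    candidates; a real pipeline would align them to find the best match.
    """
    results = []
    for qid, qseq in queries.items():
        hits = defaultdict(int)
        for _, kmer in kmers(qseq, k):
            for sid, _ in index.get(kmer, ()):
                hits[sid] += 1
        for sid, n in hits.items():
            if n >= min_shared:
                results.append((qid, sid, n))
    results.sort(key=lambda r: -r[2])
    return results


if __name__ == "__main__":
    # Toy data standing in for the two sets (hypothetical sequences).
    reference = {"bact1": "ACGTACGTTAGCCGATAGGCTA", "bact2": "TTTTGGGGCCCCAAAATTTT"}
    reads = {"read1": "GTTAGCCGATAGG", "read2": "GGGGCCCCAAAA", "read3": "ACACACACACAC"}
    index = build_index(reference, k=5)
    for qid, sid, n in candidate_pairs(reads, index, k=5):
        print(qid, sid, n)
```

With `k=5`, read1 shares 9 k-mers with bact1 and read2 shares 8 with bact2, while read3 produces no candidates. Only the surviving pairs would then go through a full alignment (for example Smith-Waterman) to locate the best-matching subsequences, which is where the speedup over all-vs-all comparison comes from.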