Hello, I have a sequence file that has three columns.

The first column is the chromosome, the second is the position, and the third is the sequence.

An example is shown under "Code:" at the end of this post.

My question: is there existing software that can find the duplicated segments?

Or do I need to develop my own algorithm/code to find them?

The definition of a duplicate here could be either a 100% match or something looser, such as an 80% match.

Thanks for any hints.

Code:

chr10 89646218 TTTTTTGATTGGGGGATAATTGACCAATAAGGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAA
chr10 89646221 TTTGATTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCGTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATA
chr10 89646225 ATTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGG
chr10 89646226 TTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGT
chr10 89646229 GGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAGATAAAGGAATT
chr10 89646232 GATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAG
chr10 89646237 ATGGCCAATAAAGGTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGTATTGTTTTTTT
chr10 89646238 TGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCCTCTTTTTTGTGAGAAAGGATGAACAGTGACCAGAAAAAAAGGGATTGTGTTTTTC
chr10 89646242 CAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTTTGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGGATTGTTTTTTTTTATC
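
To make the 100% vs. 80% question concrete, here is a rough sketch of what a home-grown check might look like. It is only an illustration under several assumptions: the position is taken to be the start coordinate of each sequence, all sequences are on the same strand, and the comparison is ungapped (no insertions or deletions). The names `overlap_identity`, `find_duplicates`, `min_identity` and `min_overlap` are made up for this example and do not come from any existing tool.

```python
from collections import defaultdict
from itertools import combinations


def overlap_identity(pos_a, seq_a, pos_b, seq_b):
    """Return (identity, overlap_length) for two sequences placed by start position.

    Assumption: positions are start coordinates and the comparison is ungapped.
    """
    # Make sure "a" is the sequence that starts first.
    if pos_a > pos_b:
        pos_a, seq_a, pos_b, seq_b = pos_b, seq_b, pos_a, seq_a
    offset = pos_b - pos_a
    tail = seq_a[offset:]  # part of the left sequence that reaches into the right one
    overlap = min(len(tail), len(seq_b))
    if overlap == 0:
        return 0.0, 0
    matches = sum(x == y for x, y in zip(tail, seq_b))
    return matches / overlap, overlap


def find_duplicates(records, min_identity=0.8, min_overlap=20):
    """Report pairs on the same chromosome whose overlap meets the identity cutoff.

    records: iterable of (chromosome, position, sequence) tuples.
    Returns a list of (chromosome, pos_a, pos_b, identity).
    """
    by_chrom = defaultdict(list)
    for chrom, pos, seq in records:
        by_chrom[chrom].append((pos, seq))

    hits = []
    for chrom, items in by_chrom.items():
        items.sort()
        # All-against-all comparison: fine for a sketch, quadratic on large files.
        for (pos_a, seq_a), (pos_b, seq_b) in combinations(items, 2):
            identity, overlap = overlap_identity(pos_a, seq_a, pos_b, seq_b)
            if overlap >= min_overlap and identity >= min_identity:
                hits.append((chrom, pos_a, pos_b, identity))
    return hits
```

Setting `min_identity=1.0` would correspond to the 100% definition and `min_identity=0.8` to the 80% one. Because every pair is compared, this would be slow on a large file; restricting comparisons to sequences whose positions are within one sequence length of each other is one possible refinement.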

## Comment