Hello, I have a sequence file that has three columns.
The first is the chromosome, the second is the position, and the third is the sequence.
Example:
Code:
chr10 89646218 TTTTTTGATTGGGGGATAATTGACCAATAAGGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAA
chr10 89646221 TTTGATTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCGTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATA
chr10 89646225 ATTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGG
chr10 89646226 TTGGGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGT
chr10 89646229 GGGGATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAGATAAAGGAATT
chr10 89646232 GATAATTGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAG
chr10 89646237 ATGGCCAATAAAGGTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTATGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGTATTGTTTTTTT
chr10 89646238 TGGCCAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCCTCTTTTTTGTGAGAAAGGATGAACAGTGACCAGAAAAAAAGGGATTGTGTTTTTC
chr10 89646242 CAATAAAGCTTTGATAGCCTCTATTGCCCAGGCCCCTCCTCTTCTTTTTTGAGAGAAAGGATGAACAGTGACCAGAAATAAAGGGATTGTTTTTTTTTATC
My question: is there software to find the segment duplicates?
Or do I need to develop an algorithm/code to find them myself?
The definition of a duplicate here can be either a 100% match or an 80% match.
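To make the 100% versus 80% criterion concrete, here is a minimal Python sketch of one possible reading of it, not an established tool or method. It assumes the file is whitespace-separated with one record per line, compares sequences base by base from their first position without gaps, and divides by the longer length, so indels or shifted start positions would need a real aligner instead. The function names (`parse_records`, `percent_identity`, `find_duplicates`) and the toy sequences are made up for illustration.

```python
from itertools import combinations


def parse_records(text):
    """Parse whitespace-separated 'chromosome position sequence' lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        chrom, pos, seq = fields
        records.append((chrom, int(pos), seq.upper()))
    return records


def percent_identity(a, b):
    """Ungapped identity: bases equal at the same index, over the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    # Assumption: a length difference counts as mismatches.
    return 100.0 * matches / max(len(a), len(b))


def find_duplicates(records, threshold=100.0):
    """Return (record_a, record_b, identity) for each pair at or above threshold."""
    hits = []
    for rec_a, rec_b in combinations(records, 2):  # all-vs-all comparison
        identity = percent_identity(rec_a[2], rec_b[2])
        if identity >= threshold:
            hits.append((rec_a, rec_b, identity))
    return hits


# Toy sequences, invented for illustration only (not taken from the file).
toy = """chrA 100 ACGTACGTAC
chrA 200 ACGTACGTAC
chrA 300 ACGTACGTTT
chrA 400 TTTTTTTTTT
"""
records = parse_records(toy)
exact = find_duplicates(records, threshold=100.0)  # only the identical pair
near = find_duplicates(records, threshold=80.0)    # also pairs with two mismatches
```

With the threshold at 100 only the identical pair is reported; at 80 the pairs involving the sequence with two mismatches are included as well. Every pair is compared, so the cost grows quadratically with the number of records.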
Thanks for any hint.