Hello, I have two multi-FASTA files with different headers: a reference file and a test file.
Reference file example:
>gi|536779208|gb|GANF01000001.1| TSA: Momordica charantia Locus_17026_Transcript_1/1_Confidence_1.000_Length_828 transcribed RNA sequence
CGGGCGTAGCGACGAACGGCGGCGAAGACGACGCTCCAATCGAGGAGGTACTGGTTTTCAATCGCTTCCG
TGAATTAGTTTCGGTCCCTGCGGAGGAAGAGGAATGTTTGGGAGGCAGAGCCGCAACGCCAGGAATGGCG
CTCAAATCGTACTCAGAATCATCGTTGTAGAAACGGGAAGGGGAAGATTGAATCTGGGAGTGAGAATTGG
...
Test file example:
>gi|537289490|gb|GANG01000001.1| TSA: Momordica charantia Locus_12460_Transcript_2/3_Confidence_0.400_Length_1699 transcribed RNA sequence
TGTCTGTGTTTTAGAGATATGAAAAGTGTTGGCCTAGTGCCTGATAATGTAATTTATACTATACTTATAG
ATGGGTTTTGTCGAAATGGTGCTATTTCAGATGCTCTGAAAATGCGGGACGAGATGCTTGCTCAGGGCTG
TGTTATGGATGTGGTTGCGTACAATACTATTTTGAATGGGTTATGCAAGAAAAAGATGTATGTTGACGCA
..
These files contain ~51000 entries.
I want to separate out the entries that are similar between the reference and test files, preferably with a similarity threshold I can set (for example, 95% similar).
The output would ideally be two files: the similar entries and the excluded ones.
Can sequence alignment programs like MUMmer or BLAST do that? If another suitable program exists, please point me to it.
Thank you.
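To make the desired output concrete, here is a minimal sketch of the splitting step in Python. It assumes the test file has already been aligned against the reference with BLAST+ and the hits saved in tabular format (`-outfmt 6`), where column 1 is the query ID, column 3 the percent identity, and column 4 the alignment length. The file names (`test.fasta`, `hits.tsv`, and so on) and all function names are hypothetical, and the 95% cutoff is just the example value from above. Note that BLAST reports identity over the aligned region only, so a minimum alignment length is included as an optional extra filter.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Assumed upstream step (not run by this script), using BLAST+:
#   makeblastdb -in reference.fasta -dbtype nucl -out refdb
#   blastn -query test.fasta -db refdb -outfmt 6 -perc_identity 95 -out hits.tsv

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def similar_ids(blast_lines: Iterable[str],
                min_identity: float = 95.0,
                min_length: int = 0) -> Set[str]:
    """Collect query IDs with at least one hit passing both thresholds."""
    ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, length = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= min_length:
            ids.add(qseqid)
    return ids


def split_records(records: Iterable[Record],
                  ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Split records into (similar, excluded) by the first word of the header."""
    similar, excluded = [], []
    for header, seq in records:
        seq_id = header.split()[0]  # BLAST uses the first token as the query ID
        (similar if seq_id in ids else excluded).append((header, seq))
    return similar, excluded


def write_fasta(records: Iterable[Record], path: str, width: int = 70) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


def split_by_similarity(test_fasta: str, blast_hits: str,
                        similar_out: str, excluded_out: str,
                        min_identity: float = 95.0,
                        min_length: int = 0) -> None:
    """Read the test FASTA and BLAST hits, then write the two output files."""
    with open(blast_hits) as hits:
        ids = similar_ids(hits, min_identity, min_length)
    with open(test_fasta) as fasta:
        similar, excluded = split_records(read_fasta(fasta), ids)
    write_fasta(similar, similar_out)
    write_fasta(excluded, excluded_out)

# Example call with the hypothetical file names:
# split_by_similarity("test.fasta", "hits.tsv", "similar.fasta", "excluded.fasta")
```

With roughly 51000 entries, holding the records in memory like this should be fine. Whether a 95% local identity counts as "similar" for whole transcripts depends on how much of each sequence the alignment covers, so the `min_length` filter may need tuning.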