Hello!
Can anyone help me write a script?
1. For each file in the set (probably an archive):
Open the input nucleotide sequence (EMBL format is preferred, but FASTA is also acceptable);
Calculate the observed frequency of every possible word for the given word size (e.g. for a word size of 3 (triplets) there are 4^3 = 64 possible words). Observed frequency = observed count / total count. [Compseq works in a similar way, but does not support batch processing.]
Shift the reading frame by 1 nucleotide and repeat the previous step (number of shifts = word size - 1). Save the frequencies for each file in the set to a file named after that file (i.e. the contig ID), with the extension .dic followed by the word size. It looks something like this (word size = 2):
2. Make a summary file, something like this (example for word size = 2):
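A minimal sketch of step 1 in Python, using only the standard library. Assumptions: the input is FASTA (for EMBL, Biopython's `Bio.SeqIO.parse(path, "embl")` could replace the hypothetical `read_fasta` helper); "shifting the reading frame" means counting non-overlapping words starting at offset 0, then 1, ..., up to word size - 1; words containing anything other than A/C/G/T (e.g. N) are skipped; and the tab-separated `.dic` layout (one row per word, one column per frame) is a guess, since the example above is not shown. All function names are hypothetical.

```python
import itertools
from collections import Counter
from pathlib import Path

ALPHABET = "ACGT"


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file (minimal parser)."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks).upper()
                # use the first word of the header as the contig ID
                record_id, chunks = (line[1:].split() or ["unnamed"])[0], []
            else:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks).upper()


def frame_frequencies(seq, word_size):
    """Return one {word: observed frequency} dict per frame 0..word_size-1."""
    all_words = ["".join(p) for p in itertools.product(ALPHABET, repeat=word_size)]
    result = []
    for frame in range(word_size):
        # non-overlapping words starting at this frame offset
        counts = Counter(
            seq[i:i + word_size]
            for i in range(frame, len(seq) - word_size + 1, word_size)
        )
        # total counts only A/C/G/T words, so words with N etc. are ignored
        total = sum(counts[w] for w in all_words)
        result.append({w: (counts[w] / total if total else 0.0) for w in all_words})
    return result


def write_dic(record_id, freqs, word_size, out_dir="."):
    """Write <record_id>.dic<word_size>: one row per word, one column per frame."""
    path = Path(out_dir) / f"{record_id}.dic{word_size}"
    with open(path, "w") as out:
        header = ["word"] + [f"frame{f}" for f in range(word_size)]
        out.write("\t".join(header) + "\n")
        for word in freqs[0]:
            row = [word] + [f"{frame[word]:.6f}" for frame in freqs]
            out.write("\t".join(row) + "\n")
    return path


def process_files(paths, word_size, out_dir="."):
    """Batch mode: one .dic file per contig found in the input files."""
    for path in paths:
        for record_id, seq in read_fasta(path):
            write_dic(record_id, frame_frequencies(seq, word_size), word_size, out_dir)
```

Usage, with a hypothetical directory of extracted contigs: `process_files(sorted(Path("contigs").glob("*.fasta")), word_size=2)`. Summing the per-frame counts (not frequencies) over all frames gives the ordinary overlapping word count, which is what a single compseq-style run reports.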
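For the summary file in step 2, here is a rough sketch, again standard library only. Because the example table is not shown, the layout is an assumption: one row per contig and one column per word and frame (e.g. `AC_f0`, `AC_f1`). The input is a hypothetical in-memory mapping from contig ID to a list of per-frame `{word: frequency}` dicts; the function names are hypothetical too.

```python
import csv


def summarize(per_contig):
    """Build a contigs x (frame, word) table.

    per_contig maps a contig ID to a list with one {word: frequency} dict
    per reading frame (frame 0 first). Returns (header, rows).
    """
    if not per_contig:
        return ["contig"], []
    # take the column order (frames, then words) from the first contig
    first = next(iter(per_contig.values()))
    columns = [(frame, word) for frame, freqs in enumerate(first) for word in freqs]
    header = ["contig"] + [f"{word}_f{frame}" for frame, word in columns]
    rows = []
    for contig_id, frames in sorted(per_contig.items()):
        # missing words default to frequency 0.0
        rows.append([contig_id] + [frames[frame].get(word, 0.0) for frame, word in columns])
    return header, rows


def write_summary(per_contig, path="summary.tsv"):
    """Write the summary table as tab-separated text."""
    header, rows = summarize(per_contig)
    with open(path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Keeping the table wide (contigs as rows) makes it easy to load into R or a spreadsheet for comparing contigs; a long format (contig, frame, word, frequency) would be the alternative if the number of words gets large.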