Hi all,
I am trying to submit a transcriptome assembly to the TSA.
The format is like this:
>seq1234
TTTTTTTNNNTTTTTTTTTTTTGGTTTTCTTGAGTAAAGTAAAAAAACCTGAATGATG
GATGAGGCGAATGATGTGAGGATAAATNNNNAAACGANTNTTATAAGATGTAAAAGTT
GTCATTAACTTAGTAAAGGCCCTAATTATTGAAGTTAATTATTCCAATGGATAAAAAT
>seq1235
AGACACATCGTGTGTTTCTGGATCTTTTTCAGCTTCTTCCTTCAAATCTACTCTGGTT
GGTGCTGCTGTCAACTGCATCATTTTCGTTTGCTNNNNNCTTTTTGGCCGGAGCATCA
and so on...
The TSA asks that sequences meet this criterion:
Ambiguous bases should not make up more than 10% of the total sequence length, and there should be no more than 14 N's in a row.
Does anyone know a quick Linux-based solution for this?
I googled it, but I only found solutions that replace the ambiguous bases, like this:
or this,
but I have Perl issues with that one.
Any Linux-based solution would be appreciated!
Thanks
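One possible approach, as a minimal sketch rather than an official TSA tool: a small Python script that drops every record violating either rule. It assumes the goal is to remove failing sequences (not trim or split them), that only N/n counts as ambiguous (other IUPAC codes are ignored), and that the input is plain multi-line FASTA as shown above. The function names and the script name `filter_n.py` are made up for illustration.

```python
import re
import sys

MAX_N_FRACTION = 0.10   # assumption: "10% of total length" applies per sequence
MAX_N_RUN = 14          # runs of 15 or more N's are rejected

# Matches any run of N longer than MAX_N_RUN (case-insensitive).
LONG_N_RUN = re.compile(r"N{%d,}" % (MAX_N_RUN + 1), re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def passes_n_rules(seq):
    """True if seq has at most 10% N's and no run of more than 14 N's."""
    if not seq:
        return False
    n_count = seq.upper().count("N")
    if n_count / len(seq) > MAX_N_FRACTION:
        return False
    return LONG_N_RUN.search(seq) is None


def filter_records(records):
    """Keep only the (header, sequence) pairs that pass both rules."""
    for header, seq in records:
        if passes_n_rules(seq):
            yield header, seq


# Runs only when called as: python3 filter_n.py in.fasta out.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for header, seq in filter_records(read_fasta(src)):
            dst.write(header + "\n")
            # Re-wrap the sequence at 60 columns.
            for i in range(0, len(seq), 60):
                dst.write(seq[i:i + 60] + "\n")
```

Saved as `filter_n.py`, it would be run as `python3 filter_n.py assembly.fasta filtered.fasta`. If the intent is instead to trim or mask the N stretches and keep the rest of each sequence, the check in `passes_n_rules` would need to be replaced by that logic.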