Hi,
I have a data set that contains millions of sequences ranging from 50 to 600 bp.
Many of the sequences are redundant in the sense that they are fragments of longer sequences in the same set.
For example:
>12124334
ABCDEFGHIJKLMNOPQRSTUVXYZ
>121
ABCD
>2343456
ABCDEFGHIJKLMNOPQRSTUV
>23123443556
CDEFGHIJKLMNOPQRSTUV
I am looking for a way to check (BLAST?) whether each sequence is a fragment of another one (only perfect matches over the fragment's full length count) and to remove those fragments. In the example above, only the first sequence would be kept.
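To make the goal concrete, here is a naive sketch of the filtering I mean. It is only an illustration, not a solution: the function names `parse_fasta` and `remove_contained` are made up for this post, it assumes everything fits in memory, it ignores reverse complements, and its pairwise substring check is quadratic, so it would be far too slow for millions of sequences.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_contained(records):
    """Drop every sequence that is an exact substring of a kept sequence.

    Sequences are visited longest first, so any sequence that could
    contain the current one has already been considered. Exact
    duplicates are reduced to a single copy.
    """
    kept = []
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if not any(seq in other for _, other in kept):
            kept.append((header, seq))
    return kept


# The example records from this post.
example = """>12124334
ABCDEFGHIJKLMNOPQRSTUVXYZ
>121
ABCD
>2343456
ABCDEFGHIJKLMNOPQRSTUV
>23123443556
CDEFGHIJKLMNOPQRSTUV
"""

for header, seq in remove_contained(parse_fasta(example)):
    print(header, seq)
# prints: 12124334 ABCDEFGHIJKLMNOPQRSTUVXYZ
```

What I need is something that gives the same result as this but scales to the full data set.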
Thanks a lot!