dedupe.sh
Dear Brian,
I have been looking for a tool that quickly dereplicates nucleotide sequences (100% containments) and records, for each unique sequence, the identifiers of the removed duplicates.
Something like:
dedupe.sh in=in.fa out=out.fa outd=outd.fa mid=100 mop=100
where:
in.fa:
seq1
seq2 (contained in seq1)
seq3 (contained in seq1)
seq4
out.fa:
seq1
seq4
outd.fa:
seq2
seq3
I am interested in:
seq1<tab>seq2,seq3
seq4
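To make the desired output concrete, here is a rough Python sketch of how such a table could be rebuilt after the fact from out.fa and outd.fa. It is not a dedupe.sh option, and every function name in it (read_fasta, map_duplicates, format_clusters) is made up for illustration. It assumes exact containment on either strand, assigns each duplicate to the first kept sequence that contains it, and compares every duplicate against every kept sequence, so it would only be practical for small files.

```python
# Hypothetical post-processing sketch; not part of dedupe.sh.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(path):
    """Return a list of (identifier, sequence) pairs from a FASTA file."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                # Keep only the first whitespace-delimited token as the identifier.
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def reverse_complement(seq):
    """Reverse-complement a nucleotide string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_duplicates(kept, removed):
    """Assign each removed sequence to the first kept sequence containing it.

    Assumption: containment is an exact substring match on either strand.
    Removed sequences that match no kept sequence are left unassigned.
    """
    kept_upper = [(name, seq.upper()) for name, seq in kept]
    clusters = {name: [] for name, _ in kept_upper}
    for dup_name, dup_seq in removed:
        forward = dup_seq.upper()
        reverse = reverse_complement(forward)
        for rep_name, rep_seq in kept_upper:
            if forward in rep_seq or reverse in rep_seq:
                clusters[rep_name].append(dup_name)
                break
    return clusters


def format_clusters(clusters):
    """Render one line per kept sequence: name, then tab and duplicates if any."""
    lines = []
    for rep, dups in clusters.items():
        lines.append(rep + "\t" + ",".join(dups) if dups else rep)
    return "\n".join(lines)
```

With the files from the example, calling format_clusters(map_duplicates(read_fasta("out.fa"), read_fasta("outd.fa"))) would give the kind of table shown above. Because it has to recompute containments that dedupe already found, a built-in option would be far more efficient.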
dedupe.sh does a fantastic job of returning out and outd, but I cannot find any option that returns the mapping I am interested in. Am I missing something? If not, I believe this would be a great feature, since dedupe is so much faster than the other tools that report this information.
Best,
Shini