**DrYak** · 04-07-2016, 06:58 AM

Hi,

Well, I found (to my chagrin) that cd-hit has an aux tools package containing the cd-hit-dup tool.

I do not, however, get the same results using cd-hit-est and cd-hit-dup.

If I use cd-hit-est with the following parameters:

cd-hit-est -i in.fasta -o out -c 0.95 -n 10 -d 0 -T 20

I get 85497 finished 69413 clusters

i.e. 69413 clusters from 85497 starting sequences.

If I use cd-hit-dup with the following parameters:

cd-hit-dup -i in.fasta -o out-nodupes.fasta -m false -e 0.05 -f true

As far as I know, this should apply the same similarity cut-off (95%) and remove smaller sequences (-m false) and chimeras. With it, I get:

Number of reads: 85497
Number of clusters found: 82927
Number of chimeric clusters found: 6

i.e. 82921 clusters from 85497 starting sequences.
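To put the two runs side by side, here is a minimal Python sketch of the arithmetic, using only the counts reported above. The helper name `summarize` is made up for illustration, and subtracting the 6 chimeric clusters from the cd-hit-dup total follows the reading given in this post rather than anything stated in the cd-hit documentation.

```python
def summarize(total_reads, clusters):
    """Return (sequences removed, percent removed) for one clustering run."""
    removed = total_reads - clusters
    return removed, 100.0 * removed / total_reads


total = 85497  # starting sequences, reported by both tools

# cd-hit-est run: 69413 clusters
est_removed, est_pct = summarize(total, 69413)

# cd-hit-dup run: 82927 clusters, minus 6 chimeric clusters (assumed reading)
dup_clusters = 82927 - 6
dup_removed, dup_pct = summarize(total, dup_clusters)

print(f"cd-hit-est: {est_removed} sequences removed ({est_pct:.1f}%)")
print(f"cd-hit-dup: {dup_removed} sequences removed ({dup_pct:.1f}%)")
```

So cd-hit-est collapses roughly a fifth of the input, while cd-hit-dup removes only a few percent.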

Can someone suggest an explanation for such a huge difference?

Thanks in advance.

**mastal** · 04-07-2016, 07:05 AM

I think what you want is software that calls a consensus sequence from each cluster, rather than dedupe.

Dedupe on assembled RNA-Seq?
