**Melissa** · 12-10-2019, 05:32 AM

What I would do is write my own script to:
1) run blastn on the sequences against themselves (hopefully your sequences are long enough to justify using BLAST)
2) filter the results to remove self-hits (a sequence matching itself) and hits that don't pass an e-value cutoff
3) do single-linkage clustering based on the remaining blastn hits
4) choose the longest sequence from each cluster
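
A rough sketch of steps 2 to 4, in case it helps. It assumes the blastn results were written in tabular form (`-outfmt 6`, where columns 1, 2 and 11 are the query ID, subject ID and e-value) and that the sequences are already loaded into a dict of ID to sequence. The function names and the `1e-10` cutoff are only placeholders for illustration, not recommended values.

```python
from collections import defaultdict


def parse_hits(blast_lines, max_evalue=1e-10):
    """Yield (query, subject) pairs from blastn tabular lines (-outfmt 6),
    skipping self-hits and hits with an e-value above the cutoff."""
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            yield query, subject


def single_linkage_clusters(seq_ids, pairs):
    """Single-linkage clustering: IDs joined by any chain of hits end up
    in the same cluster (connected components via union-find)."""
    parent = {sid: sid for sid in seq_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if a in parent and b in parent:  # ignore IDs we have no sequence for
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for sid in seq_ids:
        clusters[find(sid)].append(sid)
    return list(clusters.values())


def longest_per_cluster(seqs, blast_lines, max_evalue=1e-10):
    """Return the ID of the longest sequence in each cluster, sorted by ID."""
    pairs = parse_hits(blast_lines, max_evalue)
    clusters = single_linkage_clusters(seqs, pairs)
    return sorted(max(c, key=lambda sid: len(seqs[sid])) for c in clusters)


if __name__ == "__main__":
    # Toy data: a, b and c are chained by strong hits; d only has a weak hit.
    seqs = {"a": "ACGTACGTACGT", "b": "ACGTACGT", "c": "ACGTAC", "d": "TTTTGGGGCCCC"}
    hits = [
        "a\ta\t100.0\t12\t0\t0\t1\t12\t1\t12\t1e-50\t100",  # self-hit, dropped
        "a\tb\t100.0\t8\t0\t0\t1\t8\t1\t8\t1e-30\t60",
        "b\tc\t100.0\t6\t0\t0\t1\t6\t1\t6\t1e-20\t40",
        "c\td\t80.0\t5\t1\t0\t1\t5\t1\t5\t5.0\t10",  # fails the e-value cutoff
    ]
    print(longest_per_cluster(seqs, hits))
```

Keep in mind that single linkage chains hits together: if a matches b and b matches c, all three land in one cluster even when a and c never hit each other directly, so the cutoff matters a lot.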

There should be an easier way using k-mers, though?!

**yzzhang** · 12-29-2019, 10:01 PM

Have you tried CD-HIT?

Recover only longest version of sequence from multiple sequence fasta file - help
