**parasitehunter** · 01-29-2013, 02:08 PM

Hi all -
I have basically the same question. I have Illumina amplicon data that has been quality filtered (fastx), had duplicate reads removed (fastx), and been aligned to the reference (~1.2 kb) with bwa. I used ShoRAH to predict haplotypes, and it returned 13 haplotypes with a frequency >1%. Most of these are low-frequency haplotypes that are due to a single SNP in a single read. Is there a way to collapse such sequences into their nearest relatives? I have done some searching, but no luck yet.
Thanks!

**Kennels** · 01-29-2013, 07:32 PM

Hi,
You can probably use cd-hit-est from the cd-hit suite: http://weizhong-lab.ucsd.edu/cd-hit/
You can set an identity threshold (e.g. 90%, 95%), and it clusters each shorter sequence into a longer representative one when their identity is above this threshold, much like generating a unigene set.
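
To make the clustering idea concrete, here is a minimal Python sketch of greedy, longest-first clustering at an identity threshold. It is not cd-hit's actual algorithm or code: the function names `identity` and `greedy_cluster` are made up for illustration, and identity is only approximated with the standard library's `difflib` (matched characters divided by the length of the shorter sequence) rather than a proper alignment.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Approximate identity: matched characters / length of the shorter sequence.

    Assumption: difflib matching blocks stand in for a real alignment.
    autojunk=False stops difflib from treating frequent bases as junk.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.95):
    """Cluster sequences greedily, longest first.

    seqs: dict mapping name -> sequence.
    Returns a dict mapping each representative name -> list of member names.
    """
    # Longest sequences are considered first, so they become representatives.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = {}
    for name in order:
        for rep in clusters:
            # Join the first representative that is similar enough.
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    example = {
        "long": "ACGTACGTACGTACGTACGT",
        "short": "ACGTACGTAC",
        "other": "TTTTTTTTTTTTTTTTTTTT",
    }
    print(greedy_cluster(example, threshold=0.95))
```

With a 95% threshold on ~1.2 kb sequences, anything differing by up to roughly 60 positions would end up in one cluster, so the threshold would need to be set close to 100% to separate haplotypes that differ by only a few SNPs.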

**parasitehunter** · 01-30-2013, 07:59 AM

Kennels -
Thanks for the idea - looks promising. I've been trying to get cd-hit-est to work (both locally and on their servers), but it keeps returning an error. Probably something I'm doing wrong. Or perhaps it's because all my predicted haplotypes are the same length. However, their cd_454 clusterer seems to work with my data. Hope that's legit to use ...
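
Since the predicted haplotypes here are all the same length, one simple alternative is to merge each low-frequency haplotype into its nearest higher-frequency relative by Hamming distance. The sketch below is only an illustration of that idea, not something ShoRAH or cd-hit provides: the function names, the `min_freq` cutoff, and the `max_dist` limit are all hypothetical and would need tuning for real data.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def collapse_haplotypes(haps, min_freq=0.05, max_dist=1):
    """Merge rare haplotypes into their nearest kept relative.

    haps: dict mapping haplotype sequence -> frequency.
    min_freq and max_dist are illustrative defaults, not recommended values.
    Returns a dict mapping kept sequence -> summed frequency.
    """
    # Process haplotypes from most to least frequent.
    ranked = sorted(haps.items(), key=lambda kv: kv[1], reverse=True)
    collapsed = {}
    for seq, freq in ranked:
        if freq >= min_freq or not collapsed:
            # Frequent enough (or the very first one): keep as its own haplotype.
            collapsed[seq] = collapsed.get(seq, 0.0) + freq
            continue
        # Ties go to the earliest kept haplotype, i.e. the most frequent one.
        nearest = min(collapsed, key=lambda kept: hamming(seq, kept))
        if hamming(seq, nearest) <= max_dist:
            collapsed[nearest] += freq
        else:
            # Too different from everything kept so far: leave it separate.
            collapsed[seq] = freq
    return collapsed


if __name__ == "__main__":
    toy = {
        "ACGTACGT": 0.60,
        "ACGTACGA": 0.30,
        "TTTTACGT": 0.06,
        "ACGTACGC": 0.02,
        "TCGTACGA": 0.02,
    }
    for seq, freq in collapse_haplotypes(toy).items():
        print(seq, round(freq, 3))
```

Whether a rare haplotype is a sequencing artifact or a real variant cannot be decided from distance alone, so any cutoff chosen this way should be checked against the read support for each SNP.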

How to collapse sequences excluding sequencing artifacts? (454)
