**SES** · 01-31-2012, 09:01 AM

Originally posted by Retro

We have Ion Torrent reads from retrovirus (transposon) integration sites in an unsequenced genome, and we need to cluster them by sequence identity. The first fifty bases of each read are always the transposon end, and the rest is an essentially random piece of genomic DNA flanking the insertion. We need to collapse or cluster the reads from each unique integration site together. Currently we use de novo assembly algorithms, but those perform poorly. We need to relax the stringency of alignment because of the sequencing errors, but then de novo assembly artificially joins clusters together. Our clusters should have a length of only one read.

Would anybody know of suitable algorithm to create these single read clusters?

As I was preparing a response, it became less clear exactly what you are trying to achieve. When you say that you want to relax the stringency of alignment associated with assembly and use a clustering approach, that makes sense. When you say that clusters should contain one read, that seems completely in conflict with the previous statement. Could you clarify your post?

**Retro** · 01-31-2012, 09:16 AM

Thanks for your response. The clusters should have a length of one read. They can contain, for example, 50 reads, but all reads start at position 1 (the "left side" of the aligned cluster). The reads in a cluster might differ in length depending on the initial fragmentation.

To make it more difficult, our reads come from a pool of animals, so in addition to sequencing errors we also see SNPs. That is why we cannot use assembly based on, let's say, 99% homology. The de novo algorithm then starts adding reads to our clusters that extend the cluster in length, mostly based on random inverted repeats in the genomic tags.
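
To make the requirement concrete, here is a minimal sketch of what "clusters with a length of one read" could look like in code. It is not the method of any particular tool, only an illustration: the function names, the 0.9 identity threshold, and the minimum flank length are assumptions chosen for the example. It also counts only mismatches over the shared left-anchored prefix, which is a simplification, since Ion Torrent errors are often insertions or deletions and a real solution would need a gapped comparison.

```python
def prefix_identity(a, b):
    """Fraction of matching bases over the shared, left-anchored prefix."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so only the shared prefix is compared.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def cluster_left_anchored(reads, ltr_len=50, min_identity=0.9, min_flank=20):
    """Greedy clustering of genomic flanks that all start at position 1.

    ltr_len: length of the transposon end to trim (fifty bases in the post).
    min_identity, min_flank: hypothetical thresholds, tune for real data.
    """
    # Trim the transposon end and use the longest flanks as cluster seeds.
    flanks = sorted((r[ltr_len:] for r in reads), key=len, reverse=True)
    clusters = []  # list of (seed, members) pairs
    for flank in flanks:
        if len(flank) < min_flank:
            continue  # too short to assign reliably
        for seed, members in clusters:
            # Reads are never extended past the seed: comparison is anchored
            # at the left end, so a cluster cannot grow longer than one read.
            if prefix_identity(seed, flank) >= min_identity:
                members.append(flank)
                break
        else:
            clusters.append((flank, [flank]))
    return clusters


if __name__ == "__main__":
    ltr = "A" * 50  # placeholder for the transposon end
    site1 = "ACGTACGTACGTACGTACGTACGTACGTAC"
    site1_variant = site1[:10] + "T" + site1[11:25]  # one mismatch, shorter read
    site2 = "TTGGCCAATTGGCCAATTGGCCAATTGGCC"
    for seed, members in cluster_left_anchored(
        [ltr + site1, ltr + site1_variant, ltr + site2]
    ):
        print(seed, len(members))
```

Because every comparison starts at the first base after the transposon end, a read can only join a cluster it overlaps from the left, which avoids the chaining through inverted repeats that a de novo assembler produces.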

**Retro** · 02-21-2012, 06:58 PM

OK, I finally found a great program, USEARCH (http://www.drive5.com/usearch/usearch_docs.html), that does exactly that.

clustering algorithm for single reads from transposon integrations
