**swbarnes2** · 11-29-2011, 09:52 AM

That sounds horribly memory intensive, which is probably why almost no one does it that way.

**HESmith** · 11-29-2011, 10:59 AM

I agree, but without a reference genome for alignment, it seems like the only option. A simplistic approach would be to build a hash table keyed on the first 10 nucleotides of read 1 and read 2, and keep only one read pair per key. It doesn't account for sequencing errors, but would probably be good enough for my purposes (or at least give me a sense of how much duplication is present). Alternatively, I suppose I could build an assembly from the whole data set, then align to that assembly to identify duplicates.
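A minimal sketch of the hash-table idea, in Python. It assumes the paired reads have already been parsed into `(name, read1, read2)` tuples; the function name `dedup_by_prefix` and the `prefix_len` parameter are made up for illustration and don't come from any existing tool.

```python
def dedup_by_prefix(pairs, prefix_len=10):
    """Keep the first read pair seen for each (read 1 prefix, read 2 prefix) key.

    pairs: iterable of (name, read1, read2) tuples (hypothetical input layout).
    """
    seen = set()   # one entry per distinct key, so memory grows with unique pairs
    kept = []
    for name, read1, read2 in pairs:
        # Key on the first prefix_len nucleotides of both mates.
        key = (read1[:prefix_len], read2[:prefix_len])
        if key in seen:
            continue  # treated as a duplicate of an earlier pair
        seen.add(key)
        kept.append((name, read1, read2))
    return kept


# Toy example data, not real reads.
pairs = [
    ("pair1", "ACGTACGTACGGTT", "TTGCATGCATGCAA"),
    ("pair2", "ACGTACGTACGGAA", "TTGCATGCATGCCC"),  # same first 10 nt as pair1 on both mates
    ("pair3", "GGGTACGTACGGTT", "TTGCATGCATGCAA"),
]
kept = dedup_by_prefix(pairs)
print([name for name, _, _ in kept])
```

Comparing `len(kept)` to the number of input pairs gives a rough duplication estimate. As noted, a sequencing error inside either prefix makes a true duplicate look unique, so this would tend to undercount.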

Any advice/recommendations/alternative approaches would be welcome.

**stuka** · 11-29-2011, 11:03 AM

I've developed a naive tool that does brute-force comparison of reads for some basic duplicate removal, using Hadoop:

GitHub - oklasoft/b-tangs: Binning Trimmer of Artifacts in Next Gen Sequence - Clean out possible PCR artifacts by searching for like sequence reads via some map reduce

https://github.com/oklasoft/b-tangs

**rudi283** · 11-30-2011, 03:13 AM

In CLC Bio's Genomics Workbench, you can remove PCR duplicates before alignment.

how to filter unaligned duplicate reads