
**donquijotes** · 07-28-2015, 08:02 AM

Hi sudders,

From my very limited experience with random barcodes, I've heard that this bias could have two causes.
a) Synthesis bias. Some oligo companies recommend hand-mixing the four bases for random synthesis. One rep told me they have seen 20-30% variation in base incorporation with automatic mixing.
b) Ligation bias. Some bases or sequences ligate less efficiently to your library.
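One way to check for either kind of bias in your own data is to tabulate the base composition at each UMI position and compare it with the 25% expected from a perfectly random synthesis. This is a minimal Python sketch; the function names are hypothetical, and it assumes equal-length UMIs already extracted from the reads (ideally after collapsing PCR duplicates, so that amplification does not skew the counts).

```python
from collections import Counter

def base_composition(umis):
    """For each UMI position, return the fraction of A, C, G and T observed."""
    profile = []
    for pos in range(len(umis[0])):
        counts = Counter(umi[pos] for umi in umis)
        total = sum(counts.values())
        # Counter returns 0 for bases that never occur at this position.
        profile.append({base: counts[base] / total for base in "ACGT"})
    return profile

def max_relative_deviation(profile, expected=0.25):
    """Largest relative deviation from the expected base frequency at any position."""
    return max(abs(freq - expected) / expected
               for position in profile
               for freq in position.values())

umis = ["ACGTA", "CATGC", "GTACG", "TGCAT"]
print(max_relative_deviation(base_composition(umis)))  # 0.0 (perfectly balanced)
```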

BTW, I would like to design a 5-base random barcode and put it at both ends of my DNA library. Do you know of any open-source pipeline that will help me remove PCR duplicates using the barcodes on both paired-end reads? (That gives 4^5 x 4^5, about 1 million combinations in total.)
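The core of paired-end UMI deduplication can be sketched in a few lines of Python: a read pair is treated as a PCR duplicate if it shares both UMIs and both mapping starts with a pair already seen. The record layout (a dict with `umi1`, `umi2`, `chrom`, `start1`, `start2`) is a hypothetical stand-in for whatever your aligner output provides. Matching is exact, so a sequencing error in either UMI makes a duplicate look like a new molecule; keeping the first pair seen is an arbitrary choice, and strand is ignored for brevity.

```python
UMI_LEN = 5  # random bases at each end of the fragment

# 4**5 = 1024 UMIs per end, so 4**10 = 1,048,576 possible UMI pairs.
N_PAIR_COMBINATIONS = (4 ** UMI_LEN) ** 2

def dedup_key(pair):
    """Key identifying one original molecule: both UMIs plus both mapping starts."""
    return (pair["umi1"], pair["umi2"], pair["chrom"], pair["start1"], pair["start2"])

def dedup_pairs(pairs):
    """Keep the first read pair seen for each key; later ones are treated as PCR duplicates."""
    kept = {}
    for pair in pairs:
        kept.setdefault(dedup_key(pair), pair)
    return list(kept.values())

pairs = [
    {"name": "r1", "umi1": "ACGTA", "umi2": "TTGCA", "chrom": "chr1", "start1": 100, "start2": 350},
    {"name": "r2", "umi1": "ACGTA", "umi2": "TTGCA", "chrom": "chr1", "start1": 100, "start2": 350},
    {"name": "r3", "umi1": "ACGTA", "umi2": "GGCAT", "chrom": "chr1", "start1": 100, "start2": 350},
]
print([p["name"] for p in dedup_pairs(pairs)])  # ['r1', 'r3']
```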

**nucacidhunter** · 07-28-2015, 05:25 PM

An easy option is to use 6-base indexes from a well-established kit and add 6 random bases immediately after the index. With a 12-cycle index read, one can then identify PCR duplicates from those 6 random bases. For a more diverse UMI, one can add 8 random bases and extend the index read to 14 cycles. This approach applies to Y adapters, and the library must be amplified with short P5 and P7 primers.
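With this design, each index read carries the sample index followed by the UMI, so demultiplexing and UMI extraction reduce to splitting the read at a fixed offset. A minimal Python sketch, assuming a 6-base index; the index-to-sample table below is a hypothetical example, not a specific kit's list, and an exact index match is required (a real demultiplexer may tolerate a mismatch).

```python
# Example 6-base sample indexes (hypothetical; substitute your kit's index list).
SAMPLE_INDEXES = {"ATCACG": "sample1", "CGATGT": "sample2"}
INDEX_LEN = 6

def split_index_read(index_read, index_len=INDEX_LEN):
    """Split a 12- or 14-cycle index read into (sample index, random UMI)."""
    return index_read[:index_len], index_read[index_len:]

def assign(index_read):
    """Return (sample name, or None if the index is unknown, and the UMI)."""
    index, umi = split_index_read(index_read)
    return SAMPLE_INDEXES.get(index), umi

print(assign("ATCACGTTAGCC"))    # ('sample1', 'TTAGCC')
print(assign("CGATGTACGTACGT"))  # ('sample2', 'ACGTACGT')
```

Six random bases give 4**6 = 4,096 possible UMIs and eight give 4**8 = 65,536, which is why the 14-cycle variant is more diverse.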

**donquijotes** · 07-30-2015, 05:46 AM

Hi nucacidhunter,

Thank you for your input. What I've seen many people do out there is read the UMI as part of their DNA library insert and not as part of the index. I still don't know how Agilent does this with their HaloPlex HS.

I have seen a few tools that can find the UMI and add it to the read header (TagDust2 and MiGEC, for example), but I don't know how to proceed from there to remove PCR duplicates...
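The "move the UMI into the header" step amounts to trimming the first few bases (and their qualities) off each read and appending them to the read name, so that the UMI survives mapping and ends up in the BAM. This is a minimal Python sketch, not TagDust2's or MiGEC's code; the `_<UMI>` suffix format and the function names are assumptions. It assumes the UMI sits at the start of the read and the input is well-formed four-line FASTQ.

```python
def read_fastq(lines):
    """Yield (name, seq, qual) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual  # drop the leading '@'

def extract_umi(record, umi_len):
    """Move the first umi_len bases (and qualities) of a read into its name.

    The UMI is appended as '_<UMI>' to the read ID, i.e. the first
    whitespace-delimited token of the name (an assumed convention).
    """
    name, seq, qual = record
    umi = seq[:umi_len]
    read_id, *rest = name.split(" ", 1)
    new_name = " ".join([f"{read_id}_{umi}"] + rest)
    return new_name, seq[umi_len:], qual[umi_len:]

fastq = ["@read1 1:N:0\n", "ACGTTTGGCA\n", "+\n", "IIIIHHHHGG\n"]
for rec in read_fastq(fastq):
    print(extract_umi(rec, 4))  # ('read1_ACGT 1:N:0', 'TTGGCA', 'HHHHGG')
```

After mapping the trimmed reads, a deduplication step can parse the UMI back out of the read name and group reads by UMI plus mapping position.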

Any ideas? I'll start a thread, since I am rather clueless about the whole procedure.

**mikesh** · 08-11-2015, 11:01 AM

Hi donquijotes,

Sorry for the late reply; I don't check this forum very often.

First, in our practice we introduce UMIs by template switching during RT-PCR. We don't see severe synthesis biases in UMI sequences (see http://www.jimmunol.org/content/194/...ml?with-ds=yes, figure B). Note that there is another good study covering possible biases in UMI-based sequencing (see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562004/).

Indeed, the distribution of UMI usage is log-normal. We observe this in all our datasets and attribute it to PCR amplification. Once you append the read mapping position to your UMI in the header, you can assemble consensus sequences and forget about the raw reads, counting only UMIs.
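The "UMI plus mapping position, then consensus" idea can be sketched in Python as grouping reads by that key and taking a per-base majority vote within each group. Majority voting is only one possible consensus rule (MIGEC's own assembler may work differently), the `(umi, chrom, pos, seq)` tuple layout is an assumption, and reads within a group are assumed to be the same length and aligned at the same start.

```python
from collections import Counter, defaultdict

def group_reads(reads):
    """Group read sequences by (UMI, chromosome, mapping position).

    reads: iterable of (umi, chrom, pos, seq) tuples (an assumed layout).
    """
    groups = defaultdict(list)
    for umi, chrom, pos, seq in reads:
        groups[(umi, chrom, pos)].append(seq)
    return groups

def consensus(seqs):
    """Per-base majority vote over equal-length sequences.

    Ties go to the base seen first in that column (Counter.most_common order).
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def collapse(reads):
    """One consensus sequence per UMI + position, i.e. per original molecule."""
    return {key: consensus(seqs) for key, seqs in group_reads(reads).items()}

reads = [
    ("AAAT", "chr1", 100, "ACGT"),
    ("AAAT", "chr1", 100, "ACGA"),  # an error in the last base
    ("AAAT", "chr1", 100, "ACGT"),
    ("CCGA", "chr1", 100, "TTTT"),
]
molecules = collapse(reads)
print(len(molecules))  # 2
```

Counting `len(molecules)` rather than the number of reads is what "counting only UMIs" means in practice.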

Unfortunately, MIGEC cannot assemble reads by UMI plus mapping position, as it was designed for amplicon libraries.

With 10-12 bp UMIs, the diversity would be 4^10 ≈ 10^6 to 4^12 ≈ 1.7x10^7. If you estimate this to be much greater than the number of starting molecules, you can simply run the "Checkout" and "Assemble" routines of MIGEC (see the MIGEC docs) to get a list of assembled consensus sequences.
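To put "much greater than the number of starting molecules" in numbers, this Python sketch uses the standard occupancy (birthday-problem) formulas: with D possible UMIs and N molecules, the expected number of distinct UMIs is D(1 - (1 - 1/D)^N), and the chance that a given molecule shares its UMI with another is 1 - (1 - 1/D)^(N-1). It assumes every UMI is equally likely; since UMI usage is reported above to be biased, treat the result as an optimistic estimate. If UMIs are combined with mapping position, only molecules at the same position compete, so N can be taken per position. The 10,000 molecules below are an arbitrary example.

```python
def umi_diversity(length):
    """Number of distinct UMIs of the given length over A, C, G, T."""
    return 4 ** length

def expected_distinct_umis(n_molecules, diversity):
    """Expected number of distinct UMIs when n molecules each draw one uniformly."""
    return diversity * (1 - (1 - 1 / diversity) ** n_molecules)

def collision_probability(n_molecules, diversity):
    """Probability that a given molecule shares its UMI with at least one other."""
    return 1 - (1 - 1 / diversity) ** (n_molecules - 1)

for length in (10, 12):
    d = umi_diversity(length)
    print(length, d, round(collision_probability(10_000, d), 4))
```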

Hope this helps,
Mike

**sudders** · 11-23-2015, 02:30 AM

I just thought people might like to know that we never got to the bottom of the biased UMI usage, but we did find an even bigger problem: PCR and sequencing errors in the UMIs themselves.

We've created some tools for dealing with UMIs: one processes FASTQ files to move the UMI sequence from the read into the read name before mapping, and another implements a number of schemes for error-aware deduplication after mapping.
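To illustrate what "error-aware" deduplication means, here is a simplified Python sketch of a directional-network rule as we understand it from the UMI-tools documentation: at one mapping position, UMI a absorbs UMI b if they differ at one base and count(a) >= 2 x count(b) - 1, so rare UMIs one error away from an abundant one are merged into it. This is not UMI-tools' implementation (which offers several methods and handles many more details), and the function names are hypothetical.

```python
from collections import deque

def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))

def directional_groups(counts, threshold=1):
    """Group UMIs observed at one mapping position.

    counts: dict UMI -> read count. An edge a -> b exists when the UMIs differ
    at `threshold` or fewer positions and counts[a] >= 2 * counts[b] - 1.
    Returns a list of groups; each group starts with its representative UMI.
    """
    order = sorted(counts, key=lambda u: (-counts[u], u))  # most abundant first
    assigned = set()
    groups = []
    for root in order:
        if root in assigned:
            continue
        group = [root]
        assigned.add(root)
        queue = deque([root])
        while queue:  # breadth-first walk along directed edges
            node = queue.popleft()
            for other in order:
                if (other not in assigned
                        and hamming(node, other) <= threshold
                        and counts[node] >= 2 * counts[other] - 1):
                    assigned.add(other)
                    group.append(other)
                    queue.append(other)
        groups.append(group)
    return groups

counts = {"AAAA": 100, "AAAT": 3, "CCCC": 50, "CCCG": 40}
print(directional_groups(counts))  # [['AAAA', 'AAAT'], ['CCCC'], ['CCCG']]
```

Exact-match deduplication would report four molecules here; the directional rule reports three, treating AAAT as an error-derived copy of AAAA while keeping CCCG, whose count is too high to be explained as an error of CCCC.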

See https://github.com/CGATOxford/UMI-tools

Ian
Bias in unique molecular identifier usage

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News