
**son_nexg** · 09-13-2011, 07:00 PM

Hi, just to add to the above question: could one estimate the amount of PCR duplication in RNA-seq data?

Thank you!

**robs** · 09-13-2011, 07:15 PM

There are some papers on this topic if you search Google Scholar, and other posts here at SEQanswers that discuss it (use the search function).

The short answer is that you can't tell for sure if the read is artificial or real. It is dependent on a number of factors such as sequencing technology used, expected coverage, read length, etc. There are some approaches that make some assumptions to identify artificial duplicates (e.g. metagenomic reads starting with the same bases are assumed to be duplicates).
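To make the "same starting bases" assumption concrete, here is a minimal sketch of how one might estimate a duplicate fraction from a list of read sequences. The function name `estimate_duplicate_fraction` and the `prefix_len` parameter are made up for illustration, and the result is only an upper-bound style estimate: it cannot tell a PCR duplicate from two genuinely independent fragments that happen to start with the same bases.

```python
from collections import Counter

def estimate_duplicate_fraction(reads, prefix_len=20):
    """Fraction of reads whose first `prefix_len` bases match an earlier read.

    Assumption (not a fact about the data): reads sharing the same prefix
    are treated as duplicates of each other.
    """
    counts = Counter(read[:prefix_len] for read in reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # One read per distinct prefix is kept as "original"; the rest count as duplicates.
    duplicates = total - len(counts)
    return duplicates / total

# Toy input, not real sequencing data.
reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTCC", "TTTTGGGGCC"]
print(estimate_duplicate_fraction(reads, prefix_len=8))
```

For highly expressed transcripts in RNA-seq, many reads will legitimately share a start, so a number like this is best read as "apparent" duplication rather than PCR duplication.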

I see a similar number of duplicates for 454/Roche sequencing independent of the type of sample sequenced (metagenome, metatranscriptome, ...).

Maybe you can give some more details about your data.

**son_nexg** · 09-13-2011, 07:25 PM

Thanks for your reply 'robs'.

I will have a look at the literature on this.
I was just wondering about it ... so far I have been dealing with DNA-seq data, where I would expect roughly 10% duplicates in a typical run. But with RNA-seq the story is a little different. We start with a very, very low amount of poly-A captured RNA and then have to amplify it many fold to get a decent amount for the sequencing run, which makes it more prone to PCR duplicates in the final data.

I can see people are working on protocols for transcriptome data where you can do away with the PCR amplification step (e.g. http://www.nature.com/nmeth/journal/...meth.1417.html), but as of now Illumina's protocols use PCR and we need reasonable filters to get some real information out of the sequence data.

**james hadfield** · 09-15-2011, 11:47 AM

You could add a 4 bp random sequence to your barcode read or at the 5' end of your ligation oligo. This way you can see whether a read is a PCR duplicate: you should not see the same random sequence on otherwise identical reads unless PCR has amplified the fragment.
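A rough sketch of how the random-tag idea could be used downstream, assuming each read has already been reduced to a hypothetical `(chrom, start, strand, tag)` tuple (that record layout and the function name `split_by_random_tag` are illustration only, not part of any protocol or tool). Reads with the same mapping position and the same tag are called PCR duplicates; reads at the same position with different tags are kept as independent molecules.

```python
def split_by_random_tag(reads):
    """Separate reads into first-seen molecules and likely PCR duplicates.

    `reads` is an iterable of (chrom, start, strand, tag) tuples, where `tag`
    is the random sequence read from the barcode or the 5' end of the oligo.
    """
    seen = set()
    unique, pcr_duplicates = [], []
    for read in reads:
        if read in seen:
            # Same position AND same random tag: most likely a PCR copy.
            pcr_duplicates.append(read)
        else:
            seen.add(read)
            unique.append(read)
    return unique, pcr_duplicates

# Toy input, not real data.
reads = [
    ("chr1", 100, "+", "ACGT"),
    ("chr1", 100, "+", "ACGT"),  # same position, same tag
    ("chr1", 100, "+", "TTGA"),  # same position, different tag
    ("chr2", 500, "-", "ACGT"),
]
unique, dups = split_by_random_tag(reads)
print(len(unique), len(dups))
```

Note that a 4 bp tag only has 4^4 = 256 possible values, so at very deeply covered positions two independent molecules can still share a tag by chance.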

**rskr** · 09-15-2011, 12:05 PM

If 90% of the reads in your data are identical to one read, then they are probably duplicates.
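A quick way to check for that extreme case is to look at what share of the data the single most frequent sequence accounts for. This is only a sketch; `top_sequence_fraction` is a made-up helper name, and it compares full read sequences exactly.

```python
from collections import Counter

def top_sequence_fraction(reads):
    """Return (most common sequence, its fraction of all reads)."""
    if not reads:
        return None, 0.0
    counts = Counter(reads)
    sequence, n = counts.most_common(1)[0]
    return sequence, n / len(reads)

# Toy input: 9 of 10 reads are identical.
reads = ["ACGTACGT"] * 9 + ["TTTTCCCC"]
print(top_sequence_fraction(reads))
```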


How to differentiate between PCR duplicates and real data?
