**arvid** · 01-30-2012, 11:57 PM

I usually see 60-80% duplication levels in non-normalized RNA-Seq samples. Did you de-multiplex the sequences to see whether duplication levels differ between the normalized and non-normalized libraries? I'd suspect that the normalization didn't work out very well.

FastQC looks at the initial 50-mers for overrepresentation, but as you pointed out yourself, only some adapters were found on the fw strand. You can remove those with e.g. Trimmomatic.

I'm not sure about Picard, but samtools rmdup only works on mapped reads...

Did you check the rRNA contamination levels already?

**arvid** · 01-31-2012, 12:00 AM

You could run a k-mer counter on the data to check for overrepresentation, e.g. Meryl or Jellyfish...
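To illustrate what such a check looks for, here is a minimal pure-Python sketch of k-mer counting. It is not how Meryl or Jellyfish work internally (those are built for full-size datasets), and the function names, the example reads and the 20% threshold are assumptions made up for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across an iterable of read sequences."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented(counts, min_fraction):
    """Return (k-mer, count) pairs that make up at least min_fraction of all k-mers."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(kmer, n) for kmer, n in counts.most_common()
            if n / total >= min_fraction]


# Toy reads, purely illustrative.
reads = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
counts = count_kmers(reads, 4)
hits = overrepresented(counts, 0.2)
print(hits)
```

On real data you would stream reads from the FASTQ file and use a dedicated k-mer counter; a dictionary like this only scales to small subsamples.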

**simonandrews** · 01-31-2012, 12:34 AM

High duplication levels in RNA-Seq are not necessarily a problem. Duplication simply means that you're getting very high fold coverage. For RNA-Seq it's quite normal to oversequence highly expressed transcripts in order to be able to see lowly expressed transcripts. Duplication warnings are more of a concern when they occur in libraries where you're expecting more equal coverage. 60% also isn't very high - a badly PCR duplicated library might have duplication levels above 90% (our personal record is 98%!). For more details of how to interpret this plot you can look at this blog post.
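As a rough illustration of what a duplication percentage means, the sketch below computes the fraction of reads whose first 50 bases exactly match an earlier read. This is an assumption-laden simplification, not FastQC's actual algorithm, and the function name and toy reads are hypothetical.

```python
from collections import Counter


def duplication_level(reads, prefix_len=50):
    """Fraction of reads that duplicate an earlier read (by exact prefix match)."""
    # Compare only the first prefix_len bases of each read.
    counts = Counter(seq[:prefix_len].upper() for seq in reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct prefix counts once as "unique"; the rest are duplicates.
    return 1.0 - len(counts) / total


# Toy example: one highly "expressed" sequence seen three times, one seen once.
reads = ["AAAA"] * 3 + ["CCCC"]
level = duplication_level(reads)
print(f"{level:.0%} duplicated")
```

Under this measure, a highly expressed transcript sequenced at high coverage raises the number just as PCR duplicates would, which is why the percentage alone does not tell you which of the two you are seeing.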

**harryzs** · 01-31-2012, 03:55 AM

I agree.

See this thread: http://seqanswers.com/forums/showthr...ght=duplicates


Source of duplication in illumina hiseq paired-end reads?