
**simonandrews** · 10-06-2009, 11:26 PM

If your library has been randomly fragmented, then it should be possible to look for fragments which appear way more frequently than would be expected by chance. Even a short mRNA should give you a range of different possible fragments, and if it is an abundant transcript, most or all of these should appear more frequently. PCR artefacts usually affect only a small subset of all possible fragments and would produce a very uneven distribution of fragments over the transcript.
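One way to make "more frequently than would be expected by chance" concrete is a per-position Poisson test on read start positions. The sketch below assumes uniform random fragmentation along a single transcript, so every possible start position has the same expected count. That is a strong simplification, since real coverage has positional and sequence biases. The function names and the Bonferroni-corrected `alpha` are hypothetical choices for illustration, not the poster's method.

```python
import math
from collections import Counter


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i):
        # Poisson probability mass at i, computed in log space to avoid overflow
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # the tail is large here, so 1 - (lower tail) loses no useful precision
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # above the mean the terms shrink as i grows, so sum them directly
    total, i, term = 0.0, k, pmf(k)
    while term > 0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total


def overrepresented_starts(starts, transcript_length, alpha=0.01):
    """Return {start: count} for positions with far more reads than uniform fragmentation predicts."""
    counts = Counter(starts)
    lam = len(starts) / transcript_length  # expected reads per start position
    flagged = {}
    for pos, observed in counts.items():
        # Bonferroni correction across every possible start position
        if poisson_upper_tail(observed, lam) * transcript_length < alpha:
            flagged[pos] = observed
    return flagged
```

For example, a 1,000-position transcript with two reads starting at every position plus 50 extra reads at position 500 flags only `{500: 52}`. The evenly covered positions do not stand out against the background.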

**lihairi** · 10-07-2009, 04:46 PM

Thank you for the explanation. You mean we can ignore PCR artefacts because they affect only a small subset of fragments and will not affect the final analysis results. However, we sometimes find so many tags hitting the same positions that we cannot ignore them.

In that situation, is there any other way to discriminate between a genuine reflection of abundant RNA and PCR artefacts?

Thanks,

Hai-Ri Li

**simonandrews** · 10-07-2009, 11:42 PM

Originally posted by lihairi:

You mean we can ignore PCR artefacts because they affect only a small subset of fragments and will not affect the final analysis results.

No, that's not what I'm saying. What I was trying to say was that you can normally distinguish PCR artefacts from expression changes, because expression changes normally involve the even enrichment of a large number of different fragments over the expressed region, whereas PCR artefacts usually take only a small number of fragments and amplify them to an unnatural degree when viewed in the context of the surrounding fragments.
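The "in the context of the surrounding fragments" idea can be sketched as a local ratio. Compare the read count at each start position with the median count over a flanking window, so that an evenly enriched transcript is not flagged but an isolated tower is. The flank size, ratio threshold and pseudocount below are hypothetical parameters chosen for illustration.

```python
from collections import Counter
from statistics import median


def local_spikes(starts, flank=50, min_ratio=10.0, pseudocount=1.0):
    """Flag start positions whose read count dwarfs that of their neighbours."""
    counts = Counter(starts)
    spikes = {}
    for pos, observed in counts.items():
        # counts at every position in the flanking windows, zeros included
        neighbours = [counts.get(p, 0)
                      for p in range(pos - flank, pos + flank + 1) if p != pos]
        # the pseudocount stops sparse backgrounds from inflating the ratio
        background = median(neighbours) + pseudocount
        if observed / background >= min_ratio:
            spikes[pos] = observed
    return spikes
```

A uniformly abundant transcript (say 40 reads at every start) yields no spikes because each position matches its neighbours. A single position with 63 reads among neighbours carrying 3 each is flagged.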

We usually filter our data by measuring the percentage of reads in a region which come from exact overlaps. If this value is above 5-10%, then we reject it as a likely PCR artefact. I'm intending to move this to an observed/expected calculation, though, as this is less prone to errors in very short regions with high coverage.
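A minimal sketch of this kind of filter, assuming each read is an (start, end) pair and that "reads from exact overlaps" means copies beyond the first of each identical (start, end) pair. That definition is an assumption, since counting every read in a duplicated group would also be reasonable. The function names and the 5% default are illustrative.

```python
def exact_overlap_fraction(reads):
    """Fraction of reads that repeat an (start, end) pair already seen in the region."""
    if not reads:
        return 0.0
    # reads minus distinct (start, end) pairs = extra copies of exact overlaps
    return (len(reads) - len(set(reads))) / len(reads)


def passes_duplicate_filter(reads, max_fraction=0.05):
    """Keep a region only if its exact-overlap fraction is at or below the threshold."""
    return exact_overlap_fraction(reads) <= max_fraction
```

With 100 distinct 36 bp reads plus 20 extra copies of one of them, the fraction is 20/120 (about 17%), so the region would be rejected at a 5% or 10% cutoff.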

**lihairi** · 10-22-2009, 07:50 AM

Last time you mentioned "We usually filter our data by measuring the percentage of reads in a region which come from exact overlaps". Does "region" here mean a window of, say, 100 or 200 bases?

How do you do the observed/expected calculation?

Thanks.

**simonandrews** · 10-22-2009, 11:50 PM

Originally posted by lihairi:

Last time you mentioned "We usually filter our data by measuring the percentage of reads in a region which come from exact overlaps". Does "region" here mean a window of, say, 100 or 200 bases?

In our case region is pretty generic - sometimes we use fixed size windows (with a size which depends normally on our data density), in other cases we construct contigs from sets of overlapping reads, or we might design probes over particular classes of annotation feature (genes, exons, microRNAs, whatever). These things change depending on what kind of experiment you're running.
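Two of the region definitions mentioned here can be sketched in a few lines: fixed-size windows, and contigs built by merging overlapping reads. Both functions are hypothetical helpers, and they assume 0-based coordinates with inclusive ends, so two reads overlap when the second starts at or before the end of the first.

```python
def fixed_windows(chrom_length, size):
    """Tile a sequence with fixed-size windows (inclusive ends); the last may be shorter."""
    return [(s, min(s + size, chrom_length) - 1) for s in range(0, chrom_length, size)]


def build_contigs(reads):
    """Merge (start, end) reads into contigs wherever they overlap."""
    contigs = []
    for start, end in sorted(reads):
        if contigs and start <= contigs[-1][1]:
            # overlaps the current contig: extend it if this read reaches further
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return [tuple(c) for c in contigs]
```

For example, reads at (0, 35), (20, 55), (100, 135), (130, 165) and (300, 335) collapse into three contigs: (0, 55), (100, 165) and (300, 335).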

Originally posted by lihairi:

How do you do the observed/expected calculation?

We're actually not using a proper O/E calculation at the moment (though it would be nice to move to that). Our filter calculates what percentage of reads which overlap a particular region come from exact overlaps, with the same start and end position. For randomly placed reads this value is usually very low (below 5%), but in some cases you will see towers of exactly duplicated reads which usually indicate a mapping or PCR problem, and we filter these out. You can also get high values from low absolute numbers of reads, so you either need to account for this, or ignore it if you're going to filter those regions anyway.
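Since no O/E calculation is described here, the following is only one possible formulation, not the poster's. It assumes fixed-length reads whose start positions fall independently and uniformly over `n_positions` possible starts in the region (a hypothetical parameter, e.g. region length minus read length plus one). Under that model, the expected number of distinct starts for n reads is n_positions × (1 − (1 − 1/n_positions)^n), so the expected number of exact duplicates is n minus that. A pseudocount (also an assumption) keeps the ratio stable when few reads are present.

```python
def duplicate_obs_exp(reads, n_positions, pseudocount=1.0):
    """Observed/expected exact duplicates for fixed-length reads placed uniformly at random."""
    n = len(reads)
    observed = n - len(set(reads))  # extra copies of identical (start, end) pairs
    # expected number of occupied start positions when n reads land uniformly
    expected_distinct = n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n)
    expected = n - expected_distinct
    # the pseudocount damps the ratio when very few duplicates are expected
    return (observed + pseudocount) / (expected + pseudocount)
```

This addresses the short, high-coverage case: 1,000 reads spread evenly over 100 possible starts give 90% exact duplicates by the plain percentage, yet O/E stays close to 1 because that much duplication is unavoidable at that density. A tower of 20 copies on an otherwise sparse region gives an O/E well above 1.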

I hope that makes things a bit clearer.

**lihairi** · 10-23-2009, 12:23 PM

Recently I downloaded a lot of RNA-seq data from NCBI and mapped it to reference RNA using eland_25. I found that around 30-40% of tags mapped to exactly the same positions as other tags (even after removing the effect of RNA isoforms), much higher than 5%. I do not know how to interpret these results.


question about RNA-seq
