Originally posted by asiangg:
Even if the genome is heterogeneous, it's still very unlikely that the same location will be broken twice or more. As library depth goes up, we can compensate for this by allowing 2 or 3 duplicates.
If the antibody pull-down is very small, there is very little material for sequencing as well. The machine then ends up sequencing PCR products all the time, which results in a lot of duplicated reads. I don't think filtering by mapping quality can solve this problem.
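The "allow 2 or 3 duplicates" idea above can be expressed as a per-position cap rather than strict deduplication. A minimal sketch, with a hypothetical helper and a simplified read representation (tuples instead of real alignments):

```python
from collections import defaultdict

def cap_duplicates(reads, max_dup=2):
    """Keep at most max_dup reads per (chrom, start, strand) key.

    reads: iterable of (chrom, start, strand) tuples, one per mapped read.
    Returns the retained reads in input order.
    """
    seen = defaultdict(int)
    kept = []
    for read in reads:
        if seen[read] < max_dup:
            seen[read] += 1
            kept.append(read)
    return kept

reads = [("chr1", 100, "+")] * 5 + [("chr1", 200, "-")]
print(len(cap_duplicates(reads, max_dup=2)))  # 3: two copies at chr1:100 plus chr1:200
```

With max_dup=1 this reduces to ordinary duplicate removal; raising the cap as depth grows is the compensation described above.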
Originally posted by jwfoley:
The telltale sign is "stacks" of reads that all start at the same position. Of course, those could also be mapping artifacts, and when I filter by posterior probability I basically stop seeing those in my data. But if your sample still has an abundance of stacks after filtering, it might be bottlenecked. Ideally you should just redo the experiment with fewer PCR cycles and more input.
If you have more than one sample and a stack is in the same place in all of them, that's probably a mapping artifact. If the stacks seem to occur in random places (and they're in exons where you expect them), it's more suggestive of bottlenecking.
I still disagree. ChIP-seq is as quantitative as RNA-seq, and you lose sensitivity by discarding data. A peak-caller won't work well if you've flattened all the peaks. As the throughput of our machines goes up, you'll be throwing away more and more perfectly good data; I suspect we're already well beyond the point where you toss out more signal than noise by doing this.
Really, the domain of sequences you'll pull out in transcription-factor ChIP-seq may be smaller than in RNA-seq: regardless of what proportion of the genome you think is transcribed, surely even less of it is bound by any individual protein. So the odds of duplicate reads containing signal instead of technical noise would actually be higher for TF ChIP-seq.
No, DNA is not equally strong in all places, and there are biases in where it likes to shear. Of course that's also true for RNA, and ligases have nucleotide preferences too. No amount of technical perfection will get around the heterogeneity of molecular biology.
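For what it's worth, the "stacks" heuristic described above is easy to prototype. A toy sketch, assuming reads are reduced to (chrom, 5' start, strand) tuples; the threshold is arbitrary and `find_stacks` is a hypothetical helper, not any published tool:

```python
from collections import Counter

def find_stacks(read_starts, min_stack=10):
    """Flag positions where many reads share the same 5' start.

    read_starts: iterable of (chrom, start, strand) tuples.
    Returns a dict of positions whose read count >= min_stack.
    """
    counts = Counter(read_starts)
    return {pos: n for pos, n in counts.items() if n >= min_stack}

starts = [("chr2", 500, "+")] * 12 + [("chr2", 800, "+")] * 3
print(find_stacks(starts, min_stack=10))  # {('chr2', 500, '+'): 12}
```

As noted above, a stack appearing at the same position in every sample suggests a mapping artifact, while stacks at random positions suggest bottlenecking; that comparison is just a matter of intersecting the returned dicts across samples.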
Originally posted by asiangg:
Hi "jwfoley":
Can you tell me the definition of "PCR bottlenecked"? How do you judge whether a sample is "PCR bottlenecked"?
If you have more than one sample and a stack is in the same place in all of them, that's probably a mapping artifact. If the stacks seem to occur in random places (and they're in exons where you expect them), it's more suggestive of bottlenecking.
Originally posted by asiangg:
Although I agree with using the posterior probability of mapping for RNA-seq, I still think we should remove those redundant reads for ChIP-seq. For RNA-seq, we cannot distinguish PCR amplification from independent fragments, so let's keep the duplicates when they appear and hope the PCR amplification applies uniformly to all fragments.
Really, the domain of sequences you'll pull out in transcription-factor ChIP-seq may be smaller than in RNA-seq: regardless of what proportion of the genome you think is transcribed, surely even less of it is bound by any individual protein. So the odds of duplicate reads containing signal instead of technical noise would actually be higher for TF ChIP-seq.
Originally posted by asiangg:
But for ChIP-seq, it's extremely unlikely for the sonicator to break the same genomic location an excessive number of times.
Hi "jwfoley":
Thank you for suggesting the use of the MAPQ score. It seems very useful for RNA-seq.
Can you tell me the definition of "PCR bottlenecked"? How do you judge whether a sample is "PCR bottlenecked"?
Although I agree with using the posterior probability of mapping for RNA-seq, I still think we should remove those redundant reads for ChIP-seq. For RNA-seq, we cannot distinguish PCR amplification from independent fragments, so let's keep the duplicates when they appear and hope the PCR amplification applies uniformly to all fragments.
But for ChIP-seq, it's extremely unlikely for the sonicator to break the same genomic location an excessive number of times. So removing redundant reads basically eliminates PCR and mapping artifacts altogether. Only in rare cases will it throw away useful information!
Originally posted by jwfoley:
This came up in another thread and I explained it at the bottom. I'm not aware that anyone has published this, but I've heard a rumor that a similar approach will be used in new versions of popular short-read aligners.
Originally posted by asiangg:
Well, it sounds so easy!!
However, would you let me know how you "spot it by eye"? In terms of "a confidence metric like posterior probability", any details on how it is calculated? Any references? Thanks!
Originally posted by asiangg:
Isn't PCR amplification or sequencing biased towards certain genes that are GC-rich or have certain sequence features? Do we really expect the redundancy to be the same for all genes?
In my case, the replicates contain similar read counts, so sequencing depth should not cause such large differences.
However, all of these things, including GC content, do not change between your replicates. How would the GC content of a particular gene result in bias in one replicate but not the others? Do you have reason to suspect that something went wrong (or at least differently) in the construction or sequencing of one replicate versus the other?
It would be interesting to know more about the layout of your experiment. How many replicates are we talking about here? Is one library an outlier while all other replicates are highly similar by comparison? Is a standard number of PCR cycles used for amplification or is it varied to compensate for varying input amounts?
I agree with 'jwfoley' regarding throwing out data. If you can identify and characterize some bias and it is systematic in a way that can be corrected, then you can deal with it. If you can identify a particular biological replicate as being a serious failure (i.e. something went wrong during sample prep., library construction or sequencing) then you might consider discarding the whole replicate (as long as you can justify this). Failing these options you may have to live with the variability. It could well be biological variability... in which case you are stuck with it.
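A first-pass, genome-wide way to look for such an outlier replicate is to depth-normalize the count vectors and flag genes with extreme between-replicate ratios. A sketch under those assumptions; `log2_ratios` is a hypothetical helper, and a real analysis would use proper size-factor normalization (e.g. DESeq-style) rather than total-count scaling:

```python
import math

def log2_ratios(counts_a, counts_b, pseudo=1.0):
    """Depth-normalize two replicate count vectors and return per-gene log2 ratios.

    counts_a, counts_b: lists of read counts per gene (same gene order).
    A gene with an extreme ratio (e.g. |log2| > 5, i.e. >32-fold) is a
    candidate outlier worth inspecting.
    """
    depth_a, depth_b = sum(counts_a), sum(counts_b)
    ratios = []
    for a, b in zip(counts_a, counts_b):
        frac_a = (a + pseudo) / depth_a  # fraction of library, with pseudocount
        frac_b = (b + pseudo) / depth_b
        ratios.append(math.log2(frac_a / frac_b))
    return ratios

rep1 = [10000, 100, 200, 5000]  # fourth gene looks inflated in replicate 1
rep2 = [10000, 110, 190, 45]
flags = [abs(r) > 5 for r in log2_ratios(rep1, rep2)]
print(flags)  # [False, False, False, True]
```

If only a handful of genes are flagged while the rest of the genome agrees, that points at gene-specific artifacts (amplification, mapping) rather than a globally failed replicate.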
Well, it sounds so easy!!
However, would you let me know how you "spot it by eye"? In terms of "a confidence metric like posterior probability", any details on how it is calculated? Any references? Thanks!
Originally posted by jwfoley:
If you're keeping your PCR cycles reasonable (less than 20 cycles, ideally less than 15) bottlenecking just doesn't tend to be a problem, and if it is, you can just spot it by eye. Mapping artifacts are a problem, but they can be solved with a confidence metric like posterior probability instead of just using all unique best hits. If you think you have bias, check for it. Don't just throw away good data and make your results less quantitative to get rid of artifacts you might not have anyway.
If you're keeping your PCR cycles reasonable (less than 20 cycles, ideally less than 15) bottlenecking just doesn't tend to be a problem, and if it is, you can just spot it by eye. Mapping artifacts are a problem, but they can be solved with a confidence metric like posterior probability instead of just using all unique best hits. If you think you have bias, check for it. Don't just throw away good data and make your results less quantitative to get rid of artifacts you might not have anyway.
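The SAM format already encodes a confidence metric: MAPQ is defined as -10·log10 of the probability that the mapping position is wrong. As a rough illustration of filtering on a posterior-style threshold (a sketch only; not necessarily the posterior probability jwfoley computes, and the helper names are hypothetical):

```python
def mapq_to_posterior(mapq):
    """MAPQ = -10 * log10(P(mapping position is wrong)) per the SAM spec,
    so the posterior probability the mapping is correct is 1 - 10^(-MAPQ/10)."""
    return 1.0 - 10 ** (-mapq / 10.0)

def filter_by_posterior(reads, min_posterior=0.99):
    """Keep reads whose mapping posterior meets the threshold.

    reads: iterable of (read_name, mapq) pairs.
    """
    return [name for name, mapq in reads if mapq_to_posterior(mapq) >= min_posterior]

reads = [("read1", 30), ("read2", 3), ("read3", 40)]
print(filter_by_posterior(reads))  # ['read1', 'read3']
```

A multi-mapping read with MAPQ 3 (posterior ~0.5) is dropped, while confidently placed reads survive regardless of whether they happen to share a start position with another read.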
Both "malachig" and "RockChalkJayhawk" have made a few good points. Yes, we should keep the redundant reads in the library.
However, my response is: isn't PCR amplification or sequencing biased towards certain genes that are GC-rich or have certain sequence features? Do we really expect the redundancy to be the same for all genes?
In my case, the replicates contain similar read counts, so sequencing depth should not cause such large differences.
The dynamic range between the lowest- and highest-expressed mRNAs in a typical cell has been estimated at 10^5 to 10^7. If you remove redundant reads, you can lose the ability to accurately measure this dynamic range. Duplicates might result from PCR amplification, but as library depth increases you expect duplicates to occur even if your library has no PCR-introduced amplification bias. In particular, for short mRNAs that are highly expressed you will see a lot of duplicates, especially if your reads are not paired or you are not evaluating duplicates at the level of read pairs.
As 'RockChalkJayhawk' indicates, if your libraries are of different depths, this can result in a large apparent difference in read counts for a particular gene between your replicates. Read counts and the occurrence of duplicates are pretty much meaningless when not considered in the context of library depth AND quality (high error rates can cause you to underestimate the presence of PCR-introduced amplification bias...).
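The point that duplicates arise by chance at high depth can be quantified with a simple occupancy (birthday-problem) calculation, under the deliberate simplification that reads sample start positions uniformly:

```python
def expected_duplicate_fraction(n_reads, n_positions):
    """Expected fraction of reads that are 'duplicates' purely by chance,
    assuming n_reads sample n_positions start sites uniformly.

    Expected distinct positions hit: P * (1 - (1 - 1/P)^N),
    so expected duplicate reads = N - distinct.
    """
    p, n = n_positions, n_reads
    distinct = p * (1.0 - (1.0 - 1.0 / p) ** n)
    return (n - distinct) / n

# A short, highly expressed transcript: few start sites, many reads.
print(round(expected_duplicate_fraction(10_000, 1_000), 3))  # 0.9
```

With 10,000 reads crammed into 1,000 possible start positions, ~90% of reads are duplicates even with zero PCR bias, which is why blanket duplicate removal flattens exactly the highly expressed genes you most want to quantify.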
When we compare tag redundancy levels across RNA-Seq libraries we examine the mapping position of read pairs (outer genome coordinates of the sequenced cDNA fragments) on a per N reads mapped basis.
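A sketch of such a comparison, counting duplicates keyed on the outer coordinates of each read pair and reporting a rate per N mapped pairs. This is illustrative only, not malachig's actual pipeline, and the fragment representation is simplified to tuples:

```python
from collections import Counter

def redundancy_per_n(read_pairs, n=1_000_000):
    """Duplicate rate per n mapped read pairs, keyed on the outer
    coordinates of each sequenced cDNA fragment.

    read_pairs: iterable of (chrom, outer_start, outer_end) tuples.
    """
    pairs = list(read_pairs)
    counts = Counter(pairs)
    duplicates = sum(c - 1 for c in counts.values())  # extra copies beyond the first
    return duplicates / len(pairs) * n

frags = [("chr1", 100, 350)] * 3 + [("chr1", 400, 650)]
print(redundancy_per_n(frags, n=100))  # 50.0 duplicates per 100 pairs
```

Keying on both fragment ends (rather than a single read start) makes chance collisions far rarer, so pairs flagged this way are much more likely to be genuine PCR duplicates.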
Originally posted by asiangg:
I have had this concern because I have seen certain genes with a much higher read count in one biological replicate than in the other replicates. Probably more than 100-fold! That's very unlikely to happen because of biological variation. It is more likely related to PCR bias.
Any thoughts?
Rather than looking at one gene, why don't you instead look at the entire genome? If this particular gene is an outlier, it may possibly be due to biological variation.
Redundant reads are removed in ChIP-seq; what about RNA-seq?
I have dealt with both ChIP-seq and RNA-seq analysis. In ChIP-seq, it's almost a standard procedure to remove redundant reads that map to the same location with the same orientation. That is reasonable because, by chance, it's very unlikely for sonication to break the genomic sequence at the same location more than once during sample preparation. So if we see redundant reads, they are most likely PCR duplicates.
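The standard procedure described here amounts to keeping one read per mapping position and strand. A minimal sketch of that logic (real pipelines typically use dedicated tools such as Picard MarkDuplicates or samtools markdup rather than hand-rolled code):

```python
def remove_redundant(reads):
    """Keep the first read seen at each (chrom, position, strand) key,
    discarding the rest as presumed PCR duplicates.

    reads: iterable of (chrom, start, strand) tuples, one per mapped read.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read[0], read[1], read[2])  # chrom, 5' start, strand
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

reads = [("chr3", 1000, "+"), ("chr3", 1000, "+"), ("chr3", 1000, "-")]
print(len(remove_redundant(reads)))  # 2: opposite strands are kept separately
```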
However, it does NOT seem to be standard to remove redundant reads for RNA-seq. My understanding is that the total coding sequence length is much shorter than the genomic sequence length, which significantly increases the chance of the same location being selected for sequencing. But how do you distinguish redundancy due to amplification from fragments that genuinely start at the same position by chance?
I have had this concern because I have seen certain genes with a much higher read count in one biological replicate than in the other replicates. Probably more than 100-fold! That's very unlikely to happen because of biological variation. It is more likely related to PCR bias.
Any thoughts?