  • Depth of coverage variation in RNA Seq

    We are doing RNA Seq on Illumina GAII, and are seeing a lot of variation in coverage, which doesn't look as though it's Poisson distributed. Our sample originated from an RNA virus, but also contains some host rRNA, and both are showing this variation in coverage.

    There is some correlation of coverage depth with GC content, but it doesn't explain most of the variation. Do others see this kind of variability of coverage?

    We have hypothesised that secondary structure and amplification artefacts would explain some of the variation, but has anyone seen any research explaining variation in depth of coverage of RNA-Seq?
    I've looked but haven't found anything remotely conclusive.
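One rough way to put a number on "doesn't look Poisson" is the variance-to-mean ratio (index of dispersion), which is about 1 for Poisson counts and much larger for overdispersed ones. The sketch below is only an illustration: the depth values are made up, and per-base depths at neighbouring positions are not independent because each read covers many bases, so counts of read starts per position would be a fairer input.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio; roughly 1 for Poisson-distributed counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero, ratio is undefined")
    return pvariance(counts) / m


# Hypothetical counts for illustration only (not data from this thread).
flat = [9, 11, 10, 8, 12, 10, 9, 11]
spiky = [1, 40, 3, 2, 55, 1, 4, 30]

print("flat :", dispersion_index(flat))
print("spiky:", dispersion_index(spiky))
```

A ratio far above 1 suggests variation beyond sampling noise, although it says nothing about where that variation comes from.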

  • #2
    Originally posted by jay View Post
    We are doing RNA Seq on Illumina GAII, and are seeing a lot of variation in coverage, which doesn't look as though it's Poisson distributed. Our sample originated from an RNA virus, but also contains some host rRNA, and both are showing this variation in coverage.

    There is some correlation of coverage depth with GC content, but it doesn't explain most of the variation. Do others see this kind of variability of coverage?

    We have hypothesised that secondary structure and amplification artefacts would explain some of the variation, but has anyone seen any research explaining variation in depth of coverage of RNA-Seq?
    I've looked but haven't found anything remotely conclusive.
    Variation in depth of coverage inside genes, right? I mean, you expect variation in coverage among genes.

    Variation in coverage depth inside a gene can result from the library construction methodology. What method did you use?

    --
    Phillip



    • #3
      Well, the main target is an RNA virus, so we would expect relatively constant coverage across the genome (even if it is being expressed, it should produce a single full-length message), but there is also some host rRNA in there, where we would expect coverage to be relatively constant across the gene, all other things being equal.

      Fragmentation was with NaOH, maybe some bias there?
      First strand cDNA synthesis with random primers - possible secondary structure effects?
      Second strand synthesis - possible chance to miss 3' ends corresponding to first strands?
      End repair - possible removal of some 3' from first strand?
      Addition of A bases
      Ligation of adaptors
      PCR


      I suppose my question is: do other people really observe 'constant' coverage within genes (or at least Poisson-like noise in coverage)?



      • #4
        Originally posted by jay View Post
        Well, the main target is an RNA virus, so we would expect relatively constant coverage across the genome (even if it is being expressed, it should produce a single full-length message), but there is also some host rRNA in there, where we would expect coverage to be relatively constant across the gene, all other things being equal.

        Fragmentation was with NaOH, maybe some bias there?
        One thinks of chemical fragmentation as being random, but I am unaware of evidence one way or another.

        Originally posted by jay View Post
        First strand cDNA synthesis with random primers - possible secondary structure effects?
        Yes, this seems plausible to me.

        Originally posted by jay View Post
        Second strand synthesis - possible chance to miss 3' ends corresponding to first strands?
        End repair - possible removal of some 3' from first strand?
        Addition of A bases
        Ligation of adaptors
        PCR


        I suppose my question is: do other people really observe 'constant' coverage within genes (or at least Poisson-like noise in coverage)?
        The SOLiD results I have seen are not consistent with 'constant' coverage.

        --
        Phillip



        • #5
          Do you think it would be fair to say then that RNA-Seq depth of coverage data tends to have large amounts of bias, some of which cannot currently be explained, but that can be controlled at the scale of individual genes by averaging - hence the correlations between read counts per kb and qPCR results?



          • #6
            Originally posted by jay View Post
            Do you think it would be fair to say then that RNA-Seq depth of coverage data tends to have large amounts of bias, some of which cannot currently be explained, but that can be controlled at the scale of individual genes by averaging - hence the correlations between read counts per kb and qPCR results?
            That would be my take. For the SOLiD WT method I can suggest causes for the coverage bias. For example, RNase III, used to fragment RNA prior to adaptor ligation, is a dsRNA-specific RNase. Ambion says they have modified the conditions so that it will degrade ssRNA as well, but my guess is that RNase III will still be partial to dsRNA even under those conditions. Hence coverage bias.

            --
            Phillip



            • #7
              I concur. I know of 4-5 labs now that have tested RNase III vs chemical cleavage in SOLiD WT, and the chemical cleavage is far more uniform. Other biases could come from the A-tail step (if this is polyA, not plusA). Which enzyme are you using for this? If it's the E. coli poly(A) polymerase commonly used for polyA, there are published papers on how this won't polyadenylate hairpinned material.

              Even if you nail this, there will be sampling randomness, which makes the coverage maps appear to oscillate. If the coverage deviates from Poisson, you can have error-based variance (regions with higher error rates don't map within the mismatch threshold) and uniqueness issues. Amplification steps can also affect this. We try to keep all amplification steps below 8 cycles of PCR to limit this.
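To see why the number of PCR cycles matters, here is a deliberately simplified sketch. It assumes each fragment is copied with a fixed per-cycle efficiency (0 = no copying, 1 = perfect doubling), so yield after n cycles is (1 + efficiency)^n. The efficiencies 0.95 and 0.85 are hypothetical values picked for illustration; real efficiencies vary with sequence, length and cycle number.

```python
def fold_amplification(efficiency, cycles):
    """Fold increase after `cycles` rounds at a fixed per-cycle efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


def relative_bias(eff_a, eff_b, cycles):
    """Ratio of fragment A to fragment B after PCR, starting from equal amounts."""
    return fold_amplification(eff_a, cycles) / fold_amplification(eff_b, cycles)


# Hypothetical efficiencies: a well-amplifying fragment vs a slightly poorer one.
for cycles in (8, 15, 25):
    print(cycles, "cycles ->", round(relative_bias(0.95, 0.85, cycles), 2), "fold skew")
```

Under this toy model a small per-cycle difference compounds geometrically, which is consistent with keeping the cycle count low.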

              There is also a nice paper from the Rosetta folks showing how hexamer amplification can have artifacts, and that by reducing the hexamer pool so that it excludes rRNA sequences (to about 700 6-mers) they get more uniform coverage.



              • #8
                Hi,

                I am wondering whether there is any related literature available on this issue,
                or any correction methods discussed in papers?

                Thanks.
                Xi Wang



                • #9
                  Hi, new here, so please bear with me and any mistakes. Have you checked the article by Mortazavi et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nature Methods 2008 v.5(7): 621? They discuss this issue and the effects of shearing the RNA versus the cDNA. Also, the review by Wang and others in Nature Reviews Genetics ("RNA-Seq: a revolutionary tool for transcriptomics") discusses this aspect. Hope this helps. I will follow the thread for more info, if anyone has more to add.



                  • #10
                    Originally posted by larissa View Post
                    Hi, new here, so please bear with me and any mistakes. Have you checked the article by Mortazavi et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nature Methods 2008 v.5(7): 621? They discuss this issue and the effects of shearing the RNA versus the cDNA. Also, the review by Wang and others in Nature Reviews Genetics ("RNA-Seq: a revolutionary tool for transcriptomics") discusses this aspect. Hope this helps. I will follow the thread for more info, if anyone has more to add.
                    Thanks Larissa! I guess I should have previously read that one, but I never did. Here is the direct link to that manuscript:

                    http://www.nature.com/nmeth/journal/...meth.1226.html

                    The Ambion (SOLiD WT) method will have avoided the 5' bias issue that might otherwise result from unfragmented template RNA, but introduced other bias by using an enzymatic fragmentation.

                    Seems like all it would take to use a chemical fragmentation would be that it be followed by T4 PNK treatment to remove 3' phosphates and add 5' phosphates. But that would probably require a buffer change, etc...

                    --
                    Phillip



                    • #11
                      Thanks Larissa for the references, Phillip for the link, I will definitely follow up, and all for ideas and discussion. I feel much better about my bias now

                      At the moment I'm planning to test whether coverage correlates with factors other than GC, such as free energy (re secondary structure) and point mutation rate (as it's viral). I'll post here if I get anywhere. Does anyone have a favourite command line tool/website for working out free energy by locus? I've found a couple that will plot it, but not extract the numeric values.
                      Last edited by jay; 03-17-2010, 04:02 PM.



                      • #12
                        Hi Jay, I like this one:

                        It gives a lot of info, more than I can really understand. You can use it for DNA or RNA and play around with conditions such as temperature and ionic conditions. Not sure if this is what you need/want, but I hope it helps! Good luck, I will keep checking the thread for more news from all of you. Have a great day!
                        Larissa



                        • #13
                          Hi Larissa, thanks for the link. I will try that one, although it will only take 6 kb and my genomes are 10 kb, but doing them in two halves will be a good start.
                          Best wishes, Jay
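If the genome has to be folded in pieces because of a length limit, overlapping pieces lose fewer local base pairs at the cut than two clean halves do, though any pairing that spans more than one piece is still missed. A minimal sketch of the splitting step only, using the 6 kb limit and 10 kb genome mentioned above; the 2 kb overlap is an arbitrary assumption, and the folding itself is left to whichever tool is used.

```python
def overlapping_chunks(seq, max_len, overlap):
    """Split seq into pieces of at most max_len that overlap by `overlap` bases.

    Returns a list of (start_offset, subsequence) tuples so that per-piece
    results can be mapped back to genome coordinates.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    if not seq:
        return []
    step = max_len - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + max_len, len(seq))
        chunks.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return chunks


# Placeholder 10 kb sequence standing in for a real genome.
genome = "ACGU" * 2500
for start, piece in overlapping_chunks(genome, max_len=6000, overlap=2000):
    print(start, len(piece))
```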



                          • #14
                            Originally posted by jay View Post
                            Thanks Larissa for the references, Phillip for the link, I will definitely follow up, and all for ideas and discussion. I feel much better about my bias now

                            At the moment I'm planning to test whether coverage correlates with factors other than GC, such as free energy (re secondary structure) and point mutation rate (as it's viral). I'll post here if I get anywhere. Does anyone have a favourite command line tool/website for working out free energy by locus? I've found a couple that will plot it, but not extract the numeric values.
                            Hello Jay, I am noticing very high variation in coverage of genes with extremely high expression between my replicate samples. I was wondering if you found any correlation between coverage variation and other factors as you mention above. Also, in your investigation, did you come across any correlation between coverage variation and mRNA expression? Thank you so much.



                            • #15
                              Hi thinkRNA. I found that 25% of the variation was accounted for by GC content over 100 bp windows, and I failed to account for the rest. I didn't find anything in free energy that I could use. I think PCR bias is a definite source of coverage variation, which is why RNA-Seq protocols generally call for minimising PCR cycles, or not amplifying at all if possible, and the more I look at it the more I am convinced that RNA secondary structure, interfering with reverse transcriptase, is another major source.
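For anyone wanting to repeat the GC comparison, a minimal sketch follows. It assumes a reference sequence and a per-base depth list of the same length, cuts both into non-overlapping windows, and reports the squared Pearson correlation (the fraction of variance in mean window depth explained by a linear fit to GC). The function names and the toy data are illustrative only, and this is not necessarily how the 25% figure above was obtained.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a non-empty sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windows(items, size):
    """Non-overlapping windows of exactly `size` items; a short tail is dropped."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables is constant")
    return (sxy * sxy) / (sxx * syy)


def gc_coverage_r2(seq, depths, size=100):
    """R^2 between windowed GC fraction and windowed mean depth."""
    if len(seq) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    gc = [gc_fraction(w) for w in windows(seq, size)]
    cov = [sum(w) / size for w in windows(depths, size)]
    return r_squared(gc, cov)


# Toy data in which depth is an exact linear function of GC (window size 4).
toy_seq = "GGCC" + "ATAT" + "GCAT"
toy_depths = [20] * 4 + [10] * 4 + [15] * 4
print(gc_coverage_r2(toy_seq, toy_depths, size=4))
```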

                              We were looking at RNA genome sequencing, hence the expectation of constant coverage along the genome, so we couldn't look into correlation of variation with expression level. If you are doing any amplification (PCR) before sequencing, I can imagine this would increase variation in line with expression, as well as introducing PCR bias - do you think?

                              I talked this over with a few people since, and uncontrolled bias and variance in coverage seem pretty ubiquitous in RNA-Seq experiments, and the only reason why people have been able to correlate with array experiments or qPCR is by averaging over 1kb.

                              Hope this helps. If anyone else has any updates I'd be interested, and I'm sure others too.
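The averaging point can be illustrated with simulated numbers. The sketch below draws per-base depths independently from a few arbitrary values and compares the coefficient of variation per base with that of 1 kb bin means. Independence between positions is an assumption made for simplicity; real coverage bias is correlated along the sequence, so averaging will help less than it does here.

```python
import random
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def bin_means(depths, size):
    """Mean depth in consecutive non-overlapping bins; a short tail is dropped."""
    return [mean(depths[i:i + size]) for i in range(0, len(depths) - size + 1, size)]


# Simulated, independent per-base depths (illustrative values, fixed seed).
rng = random.Random(0)
depths = [rng.choice([2, 5, 10, 40]) for _ in range(5000)]

print("per-base CV:", round(coefficient_of_variation(depths), 3))
print("1 kb bin CV:", round(coefficient_of_variation(bin_means(depths, 1000)), 3))
```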
