  • Depth of coverage variation in RNA Seq

    We are doing RNA Seq on Illumina GAII, and are seeing a lot of variation in coverage, which doesn't look as though it's Poisson distributed. Our sample originated from an RNA virus, but also contains some host rRNA, and both are showing this variation in coverage.

    There is some correlation of coverage depth with GC content, but it doesn't explain most of the variation. Do others see this kind of variability of coverage?

    We have hypothesised that secondary structure and amplification artefacts would explain some of the variation, but has anyone seen any research explaining variation in depth of coverage of RNA-Seq?
    I've looked but haven't found anything remotely conclusive.
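One quick way to check the "not Poisson" impression is the index of dispersion. For Poisson-distributed per-base depth the variance roughly equals the mean, so a variance-to-mean ratio well above 1 points to extra (overdispersed) variation. This is a minimal sketch, assuming per-position depths are already available as a list of integers (for example from a per-position pileup); the function name and the toy depth values are illustrative only.

```python
from statistics import mean, pvariance

def dispersion_index(depths):
    """Variance-to-mean ratio of per-position read depth.

    Values near 1 are consistent with Poisson sampling noise;
    values well above 1 indicate overdispersed coverage.
    """
    m = mean(depths)
    if m == 0:
        raise ValueError("mean depth is zero")
    return pvariance(depths, mu=m) / m

# Hypothetical depths along a short stretch of genome.
even = [10, 11, 9, 10, 12, 8, 10, 10]
patchy = [2, 3, 40, 45, 1, 2, 38, 29]
print(dispersion_index(even), dispersion_index(patchy))
```

With real data the ratio is usually computed over a region where uniform coverage is expected, such as the viral genome described here.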

  • #2
    Variation in depth of coverage inside genes, right? I mean, you expect variation in coverage among genes.

    Variation in coverage depth inside a gene can result from the library construction method. What method did you use?

    --
    Phillip



    • #3
      Well, the main target is an RNA virus, so we would expect relatively constant coverage across the genome (even if it is being expressed, it should produce a single full-length message). There is also some host rRNA in there, where we would expect coverage to be relatively constant across the gene, all other things being equal.

      Fragmentation was with NaOH; maybe some bias there?
      First-strand cDNA synthesis with random primers: possible secondary structure effects?
      Second-strand synthesis: possibly missing the 3' ends corresponding to the first strands?
      End repair: possible removal of some 3' sequence from the first strand?
      Addition of A bases
      Ligation of adaptors
      PCR


      I suppose my question is: do other people really observe 'constant' coverage within genes (or at least Poisson-like noise in coverage)?



      • #4
        Originally posted by jay
        Well, the main target is an RNA virus, so we would expect relatively constant coverage across the genome (even if it is being expressed, it should produce a single full-length message). There is also some host rRNA in there, where we would expect coverage to be relatively constant across the gene, all other things being equal.

        Fragmentation was with NaOH; maybe some bias there?
        One thinks of chemical fragmentation as random, but I am not aware of evidence one way or the other.

        Originally posted by jay
        First-strand cDNA synthesis with random primers: possible secondary structure effects?
        Yes, this seems plausible to me.

        Originally posted by jay
        Second-strand synthesis: possibly missing the 3' ends corresponding to the first strands?
        End repair: possible removal of some 3' sequence from the first strand?
        Addition of A bases
        Ligation of adaptors
        PCR

        I suppose my question is: do other people really observe 'constant' coverage within genes (or at least Poisson-like noise in coverage)?
        The SOLiD results I have seen are not consistent with 'constant' coverage.

        --
        Phillip



        • #5
          Do you think it would be fair to say, then, that RNA-Seq depth-of-coverage data tends to have a large amount of bias, some of which cannot currently be explained, but that this can be controlled at the scale of individual genes by averaging? That would explain the correlations between read counts per kb and qPCR results.
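The "read counts per kb" measure is commonly expressed as RPKM, as defined by Mortazavi et al. (2008): RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the gene, N the total number of mapped reads and L the gene length in bases. Because every read in the gene contributes to C, uneven coverage within the gene is summed over rather than compared base by base. A minimal sketch with made-up numbers:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp gene in a 10-million-read library.
print(rpkm(500, 2000, 10_000_000))
```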



          • #6
            That would be my take. For the SOLiD WT method I can suggest causes for the coverage bias. For example, RNase III, used to fragment RNA prior to adaptor ligation, is a dsRNase. Ambion says they have modified the conditions so it will degrade ssRNA as well, but my guess is that RNase III will still favor dsRNA even under those conditions. Hence coverage bias.

            --
            Phillip



            • #7
              I concur. I know of 4-5 labs now that have tested RNase III vs. chemical cleavage in SOLiD WT, and chemical cleavage is far more uniform. Another source of bias could be the A-tailing step (if this is poly(A) tailing rather than plus-A). Which enzyme are you using for this? If it's E. coli poly(A) polymerase (PAP I), commonly used for poly(A) tailing, there are published papers showing that it won't polyadenylate hairpinned material.

              Even if you nail this, there will be sampling randomness, which makes the coverage maps appear to oscillate. If the coverage skews from Poisson, you can have error-based variance (regions with higher error rates don't map within the mismatch threshold) and uniqueness issues. Amplification steps can also affect this; we try to keep all amplification steps below 8 cycles of PCR to limit it.
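The reasoning behind limiting PCR cycles can be made concrete with a simple idealised model (an assumption for illustration, not a measured model): if each fragment starts with one copy and multiplies by (1 + e) per cycle, where e is its own constant efficiency, then after n cycles the ratio between two fragments is ((1 + e_a) / (1 + e_b))^n, so a small efficiency difference grows geometrically with the number of cycles. The efficiencies below are hypothetical.

```python
def relative_bias(eff_a: float, eff_b: float, cycles: int) -> float:
    """Copies of fragment A relative to fragment B after `cycles` PCR cycles.

    Idealised model: both start with one copy and each multiplies by
    (1 + efficiency) every cycle, with efficiency constant over cycles.
    """
    return ((1 + eff_a) / (1 + eff_b)) ** cycles

# Hypothetical per-cycle efficiencies of 95% vs 85%.
for n in (8, 15, 25):
    print(n, round(relative_bias(0.95, 0.85, n), 2))
```

Real amplification plateaus in later cycles, so this model overstates late-cycle divergence; it only illustrates why fewer cycles mean less skew.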

              There is also a nice paper from the Rosetta folks showing how hexamer amplification can introduce artifacts; by reducing the hexamer pool to exclude hexamers matching rRNA sequences (to about 700 6-mers), they get more uniform coverage.
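The hexamer-subsetting idea can be sketched as follows. This is an illustrative reconstruction, not the published procedure: it assumes a primer can prime on rRNA wherever its reverse complement occurs in the rRNA sequence, and simply drops all such hexamers from the full set of 4^6 = 4,096. A real design would likely also consider mismatches; the rRNA string below is a toy placeholder, not a real rRNA sequence.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq: str, k: int = 6) -> set:
    """All distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def non_rrna_hexamers(rrna_seqs) -> list:
    """All DNA hexamers except those that anneal perfectly to the given rRNAs.

    rRNA is converted to the DNA alphabet (U -> T). A primer anneals where its
    reverse complement appears in the template, so the reverse complement of
    every 6-mer found in the rRNA is excluded.
    """
    blocked = set()
    for seq in rrna_seqs:
        seq = seq.upper().replace("U", "T")
        blocked |= {revcomp(k) for k in kmers(seq)}
    all_hexamers = ("".join(p) for p in product("ACGT", repeat=6))
    return [h for h in all_hexamers if h not in blocked]

# Toy placeholder 'rRNA' fragment with 13 distinct 6-mers.
pool = non_rrna_hexamers(["GGUCUGAUCCGCAUUAGC"])
print(len(pool))
```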



              • #8
                Hi,

                I am wondering if there is any related literature available on this issue, or any correction methods discussed in papers?

                Thanks.
                Xi Wang



                • #9
                  Hi, I'm new here, so please bear with me and any mistakes. Have you checked the article by Mortazavi et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nature Methods 2008, 5(7): 621? They discuss this issue and the effects of shearing the RNA versus the cDNA. Also, the review by Wang and others in Nature Reviews Genetics ("RNA-Seq: a revolutionary tool for transcriptomics") discusses this aspect. Hope this helps. I will follow the thread for more info if anyone has more to add.



                  • #10
                    Thanks Larissa! I guess I should have read that one before, but I never did. Here is the direct link to that manuscript:

                    http://www.nature.com/nmeth/journal/...meth.1226.html

                    The Ambion (SOLiD WT) method will have avoided the 5' bias issue that might otherwise result from unfragmented template RNA, but introduced other bias by using an enzymatic fragmentation.

                    It seems all it would take to use chemical fragmentation would be to follow it with T4 PNK treatment to remove 3' phosphates and add 5' phosphates. But that would probably require a buffer change, etc.

                    --
                    Phillip



                    • #11
                      Thanks Larissa for the references, Phillip for the link (I will definitely follow up), and everyone for the ideas and discussion. I feel much better about my bias now.

                      At the moment I'm planning to see whether coverage correlates with factors other than GC, such as free energy (re: secondary structure) and point mutation rate (as it's viral). I'll post here if I get anywhere. Does anyone have a favourite command-line tool or website for working out free energy by locus? I've found a couple that will plot it, but not extract the numeric values.
                      Last edited by jay; 03-17-2010, 04:02 PM.
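One way to get numeric free-energy values along a genome is to slide a fixed-size window along the sequence and fold each window separately. The sketch below handles only the windowing; the scoring function is passed in. As an assumption, a real free-energy scorer could use the ViennaRNA package's Python binding, where `RNA.fold(seq)` returns a `(structure, mfe)` tuple; that dependency is not needed to run the sketch, so a GC-fraction scorer stands in. The function names are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) for each full-length window.

    A trailing partial window shorter than `size` is skipped.
    """
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]

def windowed_scores(seq: str, size: int, step: int,
                    score: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Apply a per-window scoring function (e.g. minimum free energy)."""
    return [(start, score(sub)) for start, sub in windows(seq, size, step)]

def gc_fraction(s: str) -> float:
    """Stand-in scorer: fraction of G or C bases in the window."""
    s = s.upper()
    return (s.count("G") + s.count("C")) / len(s)

# With ViennaRNA installed (assumption), a free-energy scorer could be:
#   import RNA
#   def mfe(s): return RNA.fold(s)[1]
print(windowed_scores("GGGGAAAACCCCUUUU", size=8, step=4, score=gc_fraction))
```

Windowing also sidesteps length limits on folding tools, since each window is folded on its own.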



                      • #12
                        Hi Jay, I like this one:

                        It gives a lot of info, more than I can really understand. You can use it for DNA or RNA and play around with conditions such as temperature and ionic conditions. Not sure if this is what you need/want, but I hope it helps! Good luck; I will keep checking the thread for more news from all of you. Have a great day!
                        Larissa



                        • #13
                          Hi Larissa, thanks for the link. I will try that one; although it will only take 6 kb and my genomes are 10 kb, doing them in two halves will be a good start.
                          Best wishes, Jay



                          • #14
                            Originally posted by jay
                            Thanks Larissa for the references, Phillip for the link (I will definitely follow up), and everyone for the ideas and discussion. I feel much better about my bias now.

                            At the moment I'm planning to see whether coverage correlates with factors other than GC, such as free energy (re: secondary structure) and point mutation rate (as it's viral). I'll post here if I get anywhere. Does anyone have a favourite command-line tool or website for working out free energy by locus? I've found a couple that will plot it, but not extract the numeric values.
                            Hello Jay, I am noticing very high variation in coverage between my replicate samples for genes with extremely high expression. I was wondering whether you found any correlation between coverage variation and the other factors you mention above. Also, in your investigation, did you come across any correlation between coverage variation and mRNA expression level? Thank you so much.



                            • #15
                              Hi thinkRNA. I found that 25% of the variation was accounted for by GC content over 100 bp windows, and I failed to account for the rest. I didn't find anything in free energy that I could use. I think PCR bias is a definite source of coverage variation, which I think is why RNA-Seq protocols generally call for minimising PCR cycles, or not amplifying at all if possible. The more I look at it, the more I am convinced that RNA secondary structure, interfering with reverse transcriptase, is another major source.
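A "fraction of variation accounted for by GC over 100 bp windows" is typically the squared Pearson correlation (R²) between per-window GC content and per-window mean coverage. This is a minimal sketch of that calculation, assuming the genome sequence and a per-base depth list of the same length are already loaded; the function names and the tiny synthetic example are hypothetical, and a real analysis might also consider non-linear GC effects.

```python
from statistics import mean

def window_gc_and_depth(seq: str, depths: list, size: int = 100):
    """For each non-overlapping window: (GC fraction, mean depth)."""
    if len(seq) != len(depths):
        raise ValueError("sequence and depth list must be the same length")
    rows = []
    for start in range(0, len(seq) - size + 1, size):
        window = seq[start:start + size].upper()
        gc = (window.count("G") + window.count("C")) / size
        rows.append((gc, mean(depths[start:start + size])))
    return rows

def r_squared(xs, ys) -> float:
    """Squared Pearson correlation (fails if either variable is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Synthetic example: four 10 bp windows whose depth rises linearly with GC.
seq = "AAAAAAAAAA" + "GCAAAAAAAA" + "GCGCAAAAAA" + "GCGCGCAAAA"
depths = [10] * 10 + [12] * 10 + [14] * 10 + [16] * 10
rows = window_gc_and_depth(seq, depths, size=10)
gcs, covs = zip(*rows)
print(r_squared(gcs, covs))
```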

                              We were looking at RNA genome sequencing, hence the expectation of constant coverage along the genome, so we couldn't look into correlation of variation with expression level. If you are doing any amplification (PCR) before sequencing, I can imagine this would increase variation in line with expression, as well as introducing PCR bias. What do you think?

                              I have talked this over with a few people since, and uncontrolled bias and variance in coverage seem pretty ubiquitous in RNA-Seq experiments; the only reason people have been able to correlate with array experiments or qPCR is by averaging over 1 kb.

                              Hope this helps. If anyone else has any updates I'd be interested, and I'm sure others too.
