  • Agilent SureSelect - coverage of high GC regions

    We have successfully run a targeted enrichment with SureSelect and achieved results similar to those of Tewhey et al. (2009, Genome Biol) for our own targeted subset. As shown in their paper, we also noticed that regions of high GC content were difficult to capture: we see lower read coverage in these areas. Does anyone have experience trying to increase the coverage of these more difficult regions? Say, for example, by increasing the number of baits overlapping a high-GC region?

    We are wondering if this is a worthwhile approach and if by chance anyone has tried it already with useful results. We have some extra design space on our SureSelect and are considering "piling on" the baits in these regions for a few important genes.

  • #2
    I would be very interested in how you fare with this.

    One possible explanation for dropout of extremes of %GC is not so much the SureSelect hybridization but the various PCR steps. Do you think you could significantly shave the total number of PCR cycles the library is exposed to?


    • #3
      Originally posted by NGSfan
      We have successfully run a targeted enrichment with SureSelect and achieved results similar to those of Tewhey et al. (2009, Genome Biol) for our own targeted subset. As shown in their paper, we also noticed that regions of high GC content were difficult to capture: we see lower read coverage in these areas. Does anyone have experience trying to increase the coverage of these more difficult regions? Say, for example, by increasing the number of baits overlapping a high-GC region?

      We are wondering if this is a worthwhile approach and if by chance anyone has tried it already with useful results. We have some extra design space on our SureSelect and are considering "piling on" the baits in these regions for a few important genes.
      How do you define sequence coverage? If you take log values of the coverage, what are the results?
      Xi Wang


      • #4
        The drop in coverage of high-GC regions is largely due to secondary structure formation. Adding formamide and increasing the temperature might tip the balance toward hybridization with the RNA oligo.


        • #5
          Originally posted by krobison
          I would be very interested in how you fare with this.

          One possible explanation for dropout of extremes of %GC is not so much the SureSelect hybridization but the various PCR steps. Do you think you could significantly shave the total number of PCR cycles the library is exposed to?

          For the PCR steps, I mostly worry about "PCR duplicates". And if I remember correctly, wouldn't PCR bias the coverage in favor of high GC?

          Novel sequencing technologies permit the rapid production of large sequence data sets. These technologies are likely to revolutionize genetics and biomedical research, but a thorough characterization of the ultra-short read output is necessary. We generated and analyzed two Illumina 1G ultra-short r …
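
For anyone who wants a rough number for the "PCR duplicates" concern, here is a minimal sketch. It assumes a common heuristic that is not spelled out in this thread: single-end reads sharing the same chromosome, 5' position, and strand are treated as copies of one molecule. The function name and the input tuples are hypothetical, and in deep targeted data independent fragments can share a start position, so this tends to overestimate.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Estimate the duplicate fraction of mapped single-end reads.

    reads: iterable of (chrom, five_prime_pos, strand) tuples.
    Reads with identical tuples are counted as one original molecule plus
    duplicates; the return value is duplicates / total reads.
    """
    groups = Counter(reads)
    total = sum(groups.values())
    if total == 0:
        return 0.0
    # Every read beyond the first in a group is counted as a duplicate.
    return (total - len(groups)) / total


# Made-up reads for illustration only.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),  # same position, other strand: not a duplicate
    ("chr1", 250, "+"),
    ("chr2", 100, "+"),
]
dup_rate = duplicate_fraction(example_reads)  # 2 of 6 reads are duplicates
```

With paired-end data, both ends of the fragment would normally go into the key, which makes the estimate less pessimistic.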


          • #6
            Originally posted by Xi Wang
            How do you define sequence coverage? If you take log values of the coverage, what are the results?
            Good question - I am just doing this "by eye" so to speak. So for example, the average bp coverage of a target region is 20X and then drops to 2 or 0 in high GC regions.
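
Instead of judging the drop "by eye", the relationship can be tabulated. The sketch below is only an illustration under stated assumptions: the target has already been cut into windows, and for each window you have its reference sequence and its mean depth. The function names and the example windows are hypothetical, not taken from any data in this thread.

```python
from collections import defaultdict


def gc_count(seq):
    """Number of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def coverage_by_gc_bin(windows, n_bins=10):
    """Average the depth of target windows grouped by GC content.

    windows: iterable of (sequence, mean_depth) pairs, one per window.
    Returns {lower edge of GC bin in percent: mean depth of windows in that bin}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, depth in windows:
        if not seq:
            continue  # skip empty windows rather than divide by zero
        # Integer arithmetic avoids floating-point surprises at bin edges;
        # 100% GC is folded into the top bin.
        idx = min(gc_count(seq) * n_bins // len(seq), n_bins - 1)
        sums[idx] += depth
        counts[idx] += 1
    return {100 * i // n_bins: sums[i] / counts[i] for i in sorted(counts)}


# Made-up windows for illustration only, not data from this thread.
example_windows = [
    ("ATATATATAT", 20),  # 0% GC
    ("ATGCATGCAT", 24),  # 40% GC
    ("GCGCGCGCAT", 2),   # 80% GC
    ("GGCCGGCCTA", 4),   # 80% GC
    ("GCGCGCGCGC", 0),   # 100% GC
]
profile = coverage_by_gc_bin(example_windows)
```

A table like `profile` makes it easier to compare runs, for example before and after adding extra baits to the high-GC targets.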


            • #7
              Originally posted by upenn_ngs
              The drop in coverage of high-GC regions is largely due to secondary structure formation. Adding formamide and increasing the temperature might tip the balance toward hybridization with the RNA oligo.
              The secondary structure issue would be my guess as well. My only concern with adding formamide and/or increasing the temp is the effect on lower GC targets. I like the idea, but shifting the binding energies might cause as many problems as it solves.

              The idea of adding more baits was to help increase coverage of a subset of targets without affecting the enrichment of other targets.


              • #8
                Another factor: many GC-rich regions are dropped from both whole-genome sequencing and exome capture. This image is from the Broad.

                Last edited by upenn_ngs; 02-17-2010, 08:05 AM.


                • #9
                  Originally posted by NGSfan
                  Good question - I am just doing this "by eye" so to speak. So for example, the average bp coverage of a target region is 20X and then drops to 2 or 0 in high GC regions.
                  I guess that if you take the log of the bp coverage for each region and then take the average, the picture will look different. I am just wondering whether amplification increases the number of DNA fragments exponentially.
                  Xi Wang
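
The log-averaging suggestion can be sketched as follows. This is one possible reading of it, not necessarily what the poster had in mind: average log2(depth + 1) over positions and compare with the plain arithmetic mean. The pseudocount of 1 is an assumption added here so that zero-depth positions stay finite, and the function names are hypothetical.

```python
import math


def arithmetic_mean(depths):
    """Plain average of per-base depths."""
    return sum(depths) / len(depths)


def mean_log2_coverage(depths, pseudocount=1.0):
    """Average of log2(depth + pseudocount) over positions."""
    return sum(math.log2(d + pseudocount) for d in depths) / len(depths)


def geometric_mean(depths, pseudocount=1.0):
    """Back-transform the mean log2 coverage to the depth scale."""
    return 2 ** mean_log2_coverage(depths, pseudocount) - pseudocount


# Made-up depths: three modest positions and one heavily amplified one.
example_depths = [3, 3, 3, 63]
plain = arithmetic_mean(example_depths)        # 18.0
log_mean = mean_log2_coverage(example_depths)  # 3.0
geo = geometric_mean(example_depths)           # 7.0
```

If amplification acts multiplicatively, a single over-amplified position pulls the arithmetic mean up much more than the log-scale average, which is why the two summaries can tell different stories.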


                  • #10
                    Originally posted by NGSfan
                    We have successfully run a targeted enrichment with SureSelect and achieved results similar to those of Tewhey et al. (2009, Genome Biol) for our own targeted subset.
                    @ NGSfan : What sequencer did you use? Illumina GA II?
                    I am interested in using Agilent's SureSelect for sequence enrichment to get the targets to sequence with a 454 FLX. Do you think using a long-fragment 454 library with SureSelect could cause any problems with the hybridization? Agilent does not provide any official protocol for 454 libraries, but I assume that their long baits could work well with our ~400-500 bp fragments.


                    • #11
                      As I understand it, the main platform-specific customization in SureSelect is the blocking oligos, which prevent daisy-chaining of products. Without these, a correctly hybridized fragment will sometimes hybridize to an off-target fragment via the adapter regions.


                      • #12
                        Originally posted by Xi Wang
                        I guess that if you take the log of the bp coverage for each region and then take the average, the picture will look different. I am just wondering whether amplification increases the number of DNA fragments exponentially.
                        Yes, that is a good point. Amplification is a concern and will certainly bias things. However, I am not seeing PCR duplicates to be a big issue in my data set.

                        I have talked to some big sequencing centers about the GC issue, and they have also encountered it. Their approach, however, is simply to bump up the sequencing to 70X coverage (we are at 30-40X).

                        I should have mentioned we did single end reads. We should be getting paired end reads soon, and I hope this might help a little, since we'll be able to sequence a GC-rich region which was partially bound at the other end with an average GC content. Maybe?
                        Last edited by NGSfan; 03-11-2010, 04:59 AM.


                        • #13
                          Originally posted by dottomarco
                          @ NGSfan : What sequencer did you use? Illumina GA II?
                          I am interested in using Agilent's SureSelect for sequence enrichment to get the targets to sequence with a 454 FLX. Do you think using a long-fragment 454 library with SureSelect could cause any problems with the hybridization? Agilent does not provide any official protocol for 454 libraries, but I assume that their long baits could work well with our ~400-500 bp fragments.
                          Their long baits (~120 bp) are quite reasonable, but I have no clue how hybridization behaves with longer fragments (400-500 bp). I suspect you might run into self-hybridization issues more often, but who really knows!

                          We generally followed the Agilent protocol, fragmenting to ~200 bp. Something to note: if you are after exons, you don't want too long a fragment, because you'll be sequencing at the ends and your aligned reads will more often be "off target", so to speak, in the sense that they will be around the exon rather than on it.


                          • #14
                            Originally posted by NGSfan
                            Yes, that is a good point. Amplification is a concern and will certainly bias things. However, I am not seeing PCR duplicates to be a big issue in my data set.

                            I have talked to some big sequencing centers about the GC issue, and they have also encountered it. Their approach, however, is simply to bump up the sequencing to 70X coverage (we are at 30-40X).

                            I should have mentioned we did single end reads. We should be getting paired end reads soon, and I hope this might help a little, since we'll be able to sequence a GC-rich region which was partially bound at the other end with an average GC content. Maybe?
                            Oh. But if you have the data, you can try what I just mentioned.

                            As for PE reads, I don't think they will improve things much, because it is the DNA fragments that are amplified. So the coverage should have some relationship with the GC content of the DNA fragments. On the other hand, read GC content and fragment GC content are highly correlated. As a result, the relationship between read GC content and coverage largely reflects reality.
                            Xi Wang


                            • #15
                              Another point I have not seen mentioned here is the number of reads that actually have to be sequenced to get 30x exome coverage with the Agilent capture.

                              We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
                              --
                              bioinfosm
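
To see what a 20% on-target rate means for the sequencing budget, here is a back-of-the-envelope sketch. It assumes perfectly uniform coverage and that every on-target read contributes its full length to the target, both of which are simplifications. The 30 Mb target size is a hypothetical figure chosen for illustration; the 30x depth, 75 bp read length, and 20% on-target rate come from the post above.

```python
def reads_needed(target_bp, mean_depth, read_length, on_target_fraction):
    """Rough number of reads to sequence for a given mean on-target depth.

    Assumes uniform coverage and that each on-target read places all of
    its bases inside the target.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    on_target_reads = target_bp * mean_depth / read_length
    return round(on_target_reads / on_target_fraction)


TARGET_BP = 30_000_000  # hypothetical target size, not from the thread

ideal = reads_needed(TARGET_BP, 30, 75, 1.0)     # every read on target
observed = reads_needed(TARGET_BP, 30, 75, 0.2)  # 20% of reads on target
```

Under these assumptions, a 20% on-target rate means sequencing five times as many reads as the ideal case for the same mean depth, before accounting for the GC-related dropouts discussed earlier in the thread.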
