  • rrr
    Junior Member
    • Aug 2011
    • 5

    sequencing a sample containing ONLY sequence repeats

    What's the best NGS approach (if it's possible at all) to produce a precise sequencing report from a sample containing ONLY sequence repeats? The sample has trinucleotide repeats of various lengths (e.g. 90, 150, 270 bases, etc.), so all lengths are multiples of 3. The goal is to get the distribution of each length in the assay. Is this a case for technologies like Helicos and PacBio, which use single-molecule sequencing? Or can Illumina help too?
    Thanks.
  • krobison
    Senior Member
    • Nov 2007
    • 734

    #2
    Helicos won't help; it has very short read lengths.

    I think PacBio is your best bet. It wouldn't work well for single-nucleotide repeats, but if you have di-, tri-, tetranucleotide or longer repeat units, I'm guessing it would work. 454 would be a candidate as well, though you would need to pay attention to any issues in the PCR step.


    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      If you are going to sequence "de novo", you will need to keep in mind PacBio's high error rate (10-15%).

      If you have a reference sequence available, then Illumina would likely work as well.
      Last edited by GenoMax; 08-31-2011, 11:18 AM. Reason: add info
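To put the 10-15% figure in perspective, here is a rough back-of-the-envelope sketch of how likely a single raw pass over one of the repeat arrays from the question is to be completely error-free. It assumes every base is miscalled independently at the same rate, which is a simplification of real long-read error profiles.

```python
def p_error_free(length_bp: int, per_base_error: float) -> float:
    """Probability that a single read of `length_bp` bases contains no errors,
    assuming each base is miscalled independently with `per_base_error`."""
    return (1.0 - per_base_error) ** length_bp

# Lengths from the original question; 10% and 15% are the ends of the quoted range.
for length in (90, 150, 270):
    for err in (0.10, 0.15):
        print(f"{length:>4} bp at {err:.0%} error: {p_error_free(length, err):.2e}")
```

Even the shortest array is very unlikely to come through a single pass untouched, so an exact count has to come from something other than one raw read.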


      • rrr
        Junior Member
        • Aug 2011
        • 5

        #4
        Yeah, PacBio's high error rate, combined with its reliance on assembling subreads from a single molecule, makes it unsuitable for this specific scenario because of the repeats. Regarding 454, as krobison suggested, the PCR step could ruin the sample, because the whole point is to get an accurate assessment of the distribution of each of the lengths.
        In a way, I'm starting to reconsider the Illumina option. Using GAII single reads I could get reliable reads up to 150 bases; maybe that's a partial solution.


        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          If you're only interested in the repeat number for a relatively small number of loci and samples, there's an old-fashioned technique called 'Sanger sequencing' :-).

          Seriously, unless the repeat array is longer than the read length, Illumina sequencing should work. A couple of PCR-free protocols have been published by (IIRC) Wash U. and the Broad Institute, so you can avoid amplification problems.
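A quick way to check the "array vs. read length" condition: a single read can only give an exact count if it covers the whole repeat array plus some unique anchor sequence on each side. The sketch below computes the largest trinucleotide array that fits in one read. The helper names are hypothetical, and the 10 bp anchor per side is an assumed value (borrowed from the example later in the thread), not a requirement of any particular aligner.

```python
def max_spannable_units(read_len: int, anchor_bp: int = 10, period: int = 3) -> int:
    """Largest number of repeat units a single read can fully span while still
    including `anchor_bp` bases of unique flank on each side."""
    usable = read_len - 2 * anchor_bp
    return max(usable, 0) // period

def read_spans_array(read_len: int, units: int, anchor_bp: int = 10, period: int = 3) -> bool:
    """True if a read of `read_len` bases can cover `units` repeat units plus both anchors."""
    return units * period + 2 * anchor_bp <= read_len

# 150-base GAII single reads, as mentioned in the thread:
print(max_spannable_units(150))   # 43 units, i.e. 129 bases of CAG
print(read_spans_array(150, 30))  # 90-base array: True
print(read_spans_array(150, 90))  # 270-base array: False
```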


          • krobison
            Senior Member
            • Nov 2007
            • 734

            #6
            I still think PacBio would be an interesting experiment, though for any of these you'll want to have some reference sequences of known length and see how accurately you can measure them. If PacBio mostly drops out single nucleotides, then in a di- or trinucleotide repeat array you could detect those and infer them: if you know the array should be pure AT repeats and you see ATATATTATATAT, then it is reasonable to infer that one A was dropped. At some frequency in that scenario, though, the real answer will be a spurious T insertion, so my solution would miscount by one repeat. For trinucleotide and longer units that shouldn't be a problem: the probability of the correct two nucleotides being falsely called in a row is low.

            It is also a question of what precision you require. If being off by one repeat unit isn't a problem, then this approach would definitely work.
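The di- vs. trinucleotide argument above can be illustrated with a deliberately simplified, length-only model: assume the only errors are isolated single-base insertions or deletions, and round the observed array length to the nearest whole number of repeat units. This is a sketch of the reasoning, not how any PacBio tool actually corrects reads, and `candidate_unit_counts` is a hypothetical helper.

```python
def candidate_unit_counts(observed_len: int, period: int) -> list[int]:
    """Repeat-unit counts consistent with `observed_len` if at most one
    single-base indel occurred. Exactly halfway means ambiguous."""
    q, r = divmod(observed_len, period)
    if 2 * r == period:
        # e.g. period 2: n units + 1 inserted base == (n + 1) units - 1 dropped base
        return [q, q + 1]
    if 2 * r > period:
        return [q + 1]
    return [q]

print(candidate_unit_counts(13, 2))   # ATATATTATATAT: [6, 7]
print(candidate_unit_counts(209, 3))  # (CAG)70 with one base dropped: [70]
print(candidate_unit_counts(211, 3))  # (CAG)70 with one base inserted: [70]
```

For a dinucleotide array, one extra or missing base lands exactly halfway between two unit counts, so length alone cannot tell a dropped A from a spurious T. For a trinucleotide such as CAG, a single-base error shifts the length by only a third of a unit, so rounding still recovers the count; two errors in the same array can still fool it.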


            • rrr
              Junior Member
              • Aug 2011
              • 5

              #7

              Added info:
              We can include some of the non-CAG flanking sequence on both sides of the DNA segments.
              I'm not sure this expands the range of possible solutions. Maybe GA2 PE becomes more relevant now ... and PacBio too, because now we have flanking regions. But we still have the main CAG-repeat region, which is the one we want to account for reliably.
              ------
              As for the question of whether +/- one triplet resolution is acceptable: I am not sure. That would mean, for example, that a few 270 b segments are counted as 273 b segments, or vice versa. Maybe it's fine, but I'd have to check.


              • HESmith
                Senior Member
                • Oct 2009
                • 512

                #8
                For accurate repeat counts, the read has to span the entire repeated sequence AND include the non-repeat flanking sequences on both sides. Paired-end reads won't help unless you know your insert size to base-pair resolution. Here's an example for clarity (assuming Illumina PE-151bp sequencing):

                Sample: 5'-10bp unique - 70 copies CAG - 10bp unique-3'

                Read 1 sequence: 5'-10bp unique-47 copies CAG
                Read 2 sequence: 3'-10bp unique-47 copies CAG

                Any CAG can align with any other, so the only information from this sequence is that you have at least 47 copies.
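The example above can be checked with a few lines of Python. The sketch builds the 230 bp fragment from the example (10 bp flank, 70 copies of CAG, 10 bp flank), takes 151 bases from each end as read 1 and read 2, and compares the result with an 80-copy fragment. The flank sequences are made-up placeholders, and the simulation ignores sequencing errors, adapters, and quality trimming.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def simulate_pair(n_copies: int, read_len: int = 151,
                  left: str = "ACGTTGCAAG", right: str = "TTGACCATGA") -> tuple[str, str]:
    """Build flank + (CAG)n + flank and return (read 1, read 2) taken from the
    two ends of the fragment. The flanks are placeholders, not real sequence."""
    fragment = left + "CAG" * n_copies + right
    read1 = fragment[:read_len]
    read2 = revcomp(fragment)[:read_len]  # read 2 comes off the opposite strand
    return read1, read2

pair_70 = simulate_pair(70)
pair_80 = simulate_pair(80)
print(pair_70 == pair_80)  # the two read pairs are identical
```

Both read pairs come out identical, which is why the reads alone can only say "at least 47 copies". Once the fragment is shorter than the read (e.g. 40 copies, 140 bp), read 1 runs through to the far flank and the count becomes recoverable.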


                • rrr
                  Junior Member
                  • Aug 2011
                  • 5

                  #9
                  Yes, indeed. As you said, it has to reliably span the entire sequence.
                  So now I'm back to PacBio vs. GA2 single-end reads (but those are limited to 150 b).
                  If I can get PacBio to run with very low error rate, maybe I'll get a reliable count.


                  • ELoomis
                    Junior Member
                    • Sep 2011
                    • 9

                    #10
                    PacBio is the way to go for trinucleotide repeats. As mentioned already, other methodologies will be limited by short read length, even if they can get past the problems that come from PCR-amplifying your repeats.
                    The high single-read error rate is a genuine concern, but with your insert size you should easily be able to get multiple passes over each molecule, giving a much higher-quality single-molecule consensus.
                    What is your goal, i.e., how many high-quality reads of your repeats would you consider a success?
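The benefit of multiple passes can be made concrete with a binomial estimate. The sketch assumes that errors at a given position are independent across passes and that the passes have already been aligned to each other (many long-read errors are insertions or deletions, so a per-position vote is only meaningful after alignment). Real consensus tools work differently; this is only a back-of-the-envelope bound.

```python
from math import comb

def p_majority_wrong(passes: int, per_pass_error: float) -> float:
    """Chance that more than half of `passes` independent reads of one position
    are errors. With an odd number of passes, a wrong consensus base needs at
    least that many errors, so this upper-bounds the consensus error rate."""
    need = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(need, passes + 1)
    )

for n in (1, 3, 5, 9, 15):  # odd pass counts avoid ties
    print(f"{n:>2} passes: {p_majority_wrong(n, 0.15):.1e}")
```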


                    • rrr
                      Junior Member
                      • Aug 2011
                      • 5

                      #11
                      Hi Loomis,
                      Thanks for the reply. The goal is to get an exact measure of the proportion of each sequence length in the sample, which contains these trinucleotide-repeat segments ranging from 200 to 500 bases, including the non-repeat flanking regions.
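Once every read spans a whole molecule including both flanks, the stated goal reduces to a tally. The sketch below assumes pure, uninterrupted CAG tracts and error-free full-length reads, and it treats each read as an unbiased sample of one molecule; the flank sequences and the helper names are hypothetical placeholders.

```python
import re
from collections import Counter

CAG_RUN = re.compile(r"(?:CAG)+")

def longest_cag_units(read: str) -> int:
    """Number of CAG units in the longest uninterrupted CAG run in `read`."""
    return max((len(run) // 3 for run in CAG_RUN.findall(read)), default=0)

def length_proportions(reads: list[str]) -> dict[int, float]:
    """Map repeat-unit count -> fraction of reads with that count."""
    counts = Counter(longest_cag_units(r) for r in reads)
    total = sum(counts.values())
    return {units: n / total for units, n in sorted(counts.items())}

left, right = "ACGTTGCAAG", "TTGACCATGA"  # placeholder flanks
reads = [left + "CAG" * n + right for n in (60, 60, 70, 70, 70, 90)]
print(length_proportions(reads))
```

An interruption inside the tract (e.g. a CAA) or a sequencing error would split the run, so real data would need more careful handling than a longest-run regex.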


                      • ELoomis
                        Junior Member
                        • Sep 2011
                        • 9

                        #12
                        I see. PacBio might not be able to answer the proportion question. The RS currently uses a passive loading approach for the SMRTcell, so smaller molecules have a competitive advantage.
                        In other words, you would have no problem sequencing 500 bp of repeats, but if you put in a sample containing 200-500 bp inserts, your distribution of reads would be skewed towards the smaller insert sizes.
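The skew described here can be sketched as a simple model: if a molecule of length L loads with relative efficiency w(L), the observed proportions are the true proportions multiplied by w(L) and renormalised. If w(L) could be calibrated (for example with the known-length reference sequences suggested earlier in the thread), it could in principle be divided back out. The 1/L curve below is entirely made up to make the effect visible; the actual RS loading behaviour would have to be measured, and whether it is stable enough to correct is an empirical question.

```python
from typing import Callable

Weight = Callable[[int], float]

def _normalise(raw: dict[int, float]) -> dict[int, float]:
    total = sum(raw.values())
    return {length: v / total for length, v in raw.items()}

def apply_bias(true_props: dict[int, float], weight: Weight) -> dict[int, float]:
    """Proportions observed if a molecule of length L loads with relative efficiency weight(L)."""
    return _normalise({length: p * weight(length) for length, p in true_props.items()})

def correct_bias(observed_props: dict[int, float], weight: Weight) -> dict[int, float]:
    """Undo a known length bias by dividing it back out and renormalising."""
    return _normalise({length: p / weight(length) for length, p in observed_props.items()})

def inverse_length(length: int) -> float:
    # HYPOTHETICAL loading curve, chosen only for illustration.
    return 1.0 / length

true_mix = {200: 0.25, 350: 0.25, 500: 0.50}   # made-up mixture (bp -> fraction)
observed = apply_bias(true_mix, inverse_length)
print(observed)                                # short molecules over-represented
print(correct_bias(observed, inverse_length))  # back to true_mix, up to float rounding
```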
                        Unfortunately I don't see a way to use current NGS platforms to answer your question...
                        Have you tried something like capillary electrophoresis, or PAGE? That could give you an accurate distribution in bp and then you could use PacBio sequencing to verify the exact repeat length of the major bands or sizing standards...
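If the sizing route is taken, converting peak sizes into repeat counts is simple arithmetic: subtract the combined flank length and divide by 3. The sketch below also turns peak areas into a length distribution. It assumes peak area is proportional to the amount of each species and that sizes are accurate to well under one repeat unit (3 bp); the helper names and the `flank_total_bp` parameter are hypothetical. Repeat-rich fragments may not size exactly against a generic ladder, which is one reason to sequence the major bands or sizing standards as suggested above.

```python
def fragment_size_to_units(size_bp: float, flank_total_bp: int, period: int = 3) -> float:
    """Repeat-unit count implied by a fragment size, given the combined length
    of the non-repeat flanks in the fragment."""
    return (size_bp - flank_total_bp) / period

def peaks_to_distribution(peaks: list[tuple[float, float]], flank_total_bp: int,
                          period: int = 3) -> dict[int, float]:
    """Turn (size_bp, peak_area) pairs into {repeat units: fraction of total area},
    rounding each size to the nearest whole unit."""
    total = sum(area for _, area in peaks)
    dist: dict[int, float] = {}
    for size, area in peaks:
        units = round(fragment_size_to_units(size, flank_total_bp, period))
        dist[units] = dist.get(units, 0.0) + area / total
    return dist

# 10 bp flank + 70 x CAG + 10 bp flank = 230 bp, as in the earlier example
print(fragment_size_to_units(230, 20))                       # 70.0
print(peaks_to_distribution([(230, 3.0), (260, 1.0)], 20))   # {70: 0.75, 80: 0.25}
```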

