  • Jannn
    Junior Member
    • Jun 2012
    • 4

    First and last nucleotides bad in sequencing?

    Dear all,
    I am pretty new to sequencing and have a newbie question:

    How come the last nucleotides and the first ones are always bad?

    I can understand the last ones being bad, since the polymerase may no longer be working well and there may be other inhibitory factors.
    But why are the first few nucleotides bad as well?
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    What sequencing machine are you using?
    Are all the "bad" beginning sub-sequences the same length?
    Is there only one or two distinct "bad" beginning sub-sequences?

    Something like this should tell you:

    awk 'NR % 4 == 2' SAMPLENAME.fastq | cut -b1-6 | sort | uniq -c

    Sometimes sample reads still have the primer attached. I've seen it, but it's pretty rare in my experience.
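The one-liner above can be wrapped into a small reusable function. This is a minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record; the function name `prefix_counts` and its optional prefix-length argument are hypothetical, not part of any tool. Sorting the counts numerically in reverse puts the most frequent prefixes (for example, leftover primer sequence) at the top.

```shell
#!/bin/sh
# Count how often each read prefix occurs in a FASTQ file.
# Assumes uncompressed FASTQ with 4 lines per record;
# prefix_counts is a hypothetical helper name.
prefix_counts() {
    fastq=$1        # path to the FASTQ file
    k=${2:-6}       # prefix length to tabulate (default 6)
    # Line 2 of every 4-line record (NR % 4 == 2) is the sequence.
    awk -v k="$k" 'NR % 4 == 2 { print substr($0, 1, k) }' "$fastq" |
        sort | uniq -c | sort -rn   # most frequent prefixes first
}
```

Usage would look like `prefix_counts SAMPLENAME.fastq 6 | head`. If one or two prefixes account for nearly all reads, that points to something systematic at the start of the reads rather than random low quality.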

    • Jannn
      Junior Member
      • Jun 2012
      • 4

      #3
      I am not doing the sequencing myself...
      But when we get sequencing results back, the last bases are always bad, and the first ones too.

      Co-workers tell me this is normal, but I can't figure out why.

      • pmiguel
        Senior Member
        • Aug 2008
        • 2328

        #4
        This is sequencing instrument dependent. What instrument is being used?

        My guess is that you are talking about what is now called "Sanger" or "first generation" sequencing. If so, the answer is that it is largely due to the limits of electrophoretic "sorting" of reaction products by length, which all Sanger sequencers rely on.

        For any of this to make sense you need a general idea of what electrophoresis does and how it does it. In general, electrophoresis works by applying an electric field through which charged molecules move. Resisting this motion is a sieving matrix of some sort. If you work in a lab you may run agarose or polyacrylamide gels. For this type of "gel" electrophoresis, the agarose or polyacrylamide provides the resisting medium through which your molecules of interest (probably RNA, DNA or protein) must migrate. Most Sanger sequencers now use some sort of pumpable sieving matrix to avoid having to cast slab gels and mount them before each run. These matrices tend to have the consistency of liquid detergent but, like agarose and polyacrylamide, nevertheless resist the movement of molecules through them. Further, the amount of resistance is roughly proportional to the molecular weight of the molecules traversing the sieving matrix. Hence short DNA reaction products migrate more quickly than long ones, and by the time the products reach the detector they are sorted by size. Charge also plays a role, since more highly charged molecules migrate more quickly in an electric field. But each nucleotide subunit of DNA or RNA carries roughly the same charge (one phosphate), so the charge-to-mass ratio is nearly constant and charge largely factors out.

        "End bases" are acquired from the longest sequencing reaction products. They are lower quality because those reaction product peaks are not well resolved and bleed into each other. To see why, consider that resolving a 100-nucleotide fragment from a 101-nucleotide fragment by electrophoresis requires your system to resolve fragments differing in length by 1%, whereas at 1000 versus 1001 nucleotides it must resolve fragments differing by 0.1%, a task at least 10-fold more difficult.

        This would suggest that your primer+1 peak should be easily resolvable from your primer+2 peak (possibly 20 bases versus 21 bases, a 5% difference in length!). However, every sieving matrix has its resolution optimum in some size range. It is quite possible to design a sieving matrix that would resolve 20 from 21 bases well and give you very high quality sequence right at the beginning of the read, but such a matrix would be very poor at resolving longer (300+ base) fragments from each other.
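The arithmetic behind this argument is just 1/n: adjacent fragments of n and n+1 nucleotides differ in length by 100/n percent. The sketch below tabulates that ratio for a few lengths, including the 20-versus-21 case. It only reproduces the percentages quoted in the post and does not model real electrophoretic resolution; the function name `rel_diff_pct` is hypothetical.

```shell
#!/bin/sh
# Percent length difference between fragments of n and n+1 nucleotides,
# measured relative to the shorter fragment (100 / n).
rel_diff_pct() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 100 / n }'
}

# Print a small table: the required resolution shrinks as fragments grow.
for n in 20 100 300 1000; do
    printf '%5d vs %5d nt: %s%%\n' "$n" "$((n + 1))" "$(rel_diff_pct "$n")"
done
```

The table shows 5.00% at 20 nt, 1.00% at 100 nt and 0.10% at 1000 nt, which is why a single matrix cannot be optimal at both ends of the read.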

        So, if you are optimizing for the largest number of high-quality bases, you lose some sequence from the beginning and some from the end.

        There are other complications as well that hinder obtaining high-quality sequence in the first few bases. For example, the most commonly used Sanger sequencing reagent is, I would guess, Applied Biosystems (Life Technologies) "BigDye". The "Big" implies bulky dyes attached to the terminator bases. Further, the molecular weights and even the charges of these dyes may differ for each "color". Hence these dye terminators cause shifts in the migration of the sequencing reaction products being electrophoresed. AB compensates for these shifts, to the extent it can, to produce the "processed" electropherogram you are used to seeing. But at very short fragment lengths these shifts are more dramatic, because the dye itself makes up a greater percentage of the total molecular weight and charge of the reaction product DNA strand.
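A rough illustration of why dye effects are largest for short products: if the dye adds a fixed mass, its share of the total shrinks as the strand grows. This sketch assumes about 330 Da per nucleotide (a common rough average) and a dye mass of 1000 Da, which is a hypothetical placeholder rather than a published BigDye figure. It treats n as the total product length and ignores charge; `dye_fraction_pct` is an illustrative name only.

```shell
#!/bin/sh
# Percent of a dye-terminated product's mass contributed by the dye.
# ASSUMPTIONS: ~330 Da per nucleotide (rough average) and a HYPOTHETICAL
# dye mass of 1000 Da; neither value is a BigDye specification.
NT_DA=330
DYE_DA=1000

dye_fraction_pct() {
    # $1 = total product length in nucleotides (primer included)
    awk -v n="$1" -v nt="$NT_DA" -v dye="$DYE_DA" \
        'BEGIN { printf "%.1f\n", 100 * dye / (dye + n * nt) }'
}

for n in 20 50 100 500; do
    printf '%4d nt: dye is ~%s%% of product mass\n' "$n" "$(dye_fraction_pct "$n")"
done
```

Under these assumptions the dye is roughly 13% of the mass of a 20-nt product but well under 1% at 500 nt, so any dye-specific mobility shift matters most near the start of the read.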

        This is only meant to be a general overview. I am glossing over many details, some of which (e.g., the physics of electrophoresis) I have only a passing familiarity with myself. But it should give you a rough sense of the issues in play.
        --
        Phillip

        • Dario1984
          Senior Member
          • Jun 2011
          • 166

          #5
          We had a problem where the first base of 100% of reads was the same as the last base of the multiplexing index. It turned out our provider's machine settings weren't set up correctly to mask the index.
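One way to spot this kind of problem is to tabulate the base composition at the first read position. In many libraries several bases are represented at each position, so a single base at or near 100% is worth investigating. This is a minimal sketch, assuming uncompressed four-line FASTQ records; `base_composition` is a hypothetical helper name.

```shell
#!/bin/sh
# Percent of each base at one read position in a 4-line FASTQ file.
# base_composition is a hypothetical helper name.
base_composition() {
    fastq=$1
    pos=${2:-1}     # 1-based read position to inspect (default: first base)
    awk -v pos="$pos" '
        # Sequence lines only, skipping reads shorter than pos.
        NR % 4 == 2 && length($0) >= pos {
            count[substr($0, pos, 1)]++
            total++
        }
        END {
            for (b in count)
                printf "%s %.1f\n", b, 100 * count[b] / total
        }' "$fastq" | sort -k2,2nr   # most common base first
}
```

Running `base_composition SAMPLENAME.fastq 1` and seeing one base at 100.0 would match the symptom described above.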

          • Jannn
            Junior Member
            • Jun 2012
            • 4

            #6
            Ok, thanks a lot, I see what you mean.
            I understand it now.

            • Jannn
              Junior Member
              • Jun 2012
              • 4

              #7
              Originally posted by Dario1984:
              "We had a problem where 100% of the first nucleotide was the same as the last base of the multiplexing index. Turned out our provider's machine settings weren't set up right for masking the index."

              This is what I find a bit weird: why doesn't the provider change the settings to make sure you get good reads for the first, middle and last bases?
              I mean: rather than sequencing it just once, sequence it three times.
              I guess this is because of what it would cost to get those few extra correct bases at the start and end?
