**Richard Finney** · 06-15-2012, 03:08 PM

What sequencing machine are you using?
Are all the "bad" beginning sub-sequences the same length?
Are there only one or two distinct "bad" beginning sub-sequences?

Something like this should tell you (it tallies the first six bases of every read):

awk 'NR % 4 == 2' SAMPLENAME.fastq | cut -b1-6 | sort | uniq -c

Sometimes sample reads still have the primer attached. I've seen it, but it's pretty rare in my experience.

**Jannn** · 06-16-2012, 03:51 AM

I am not sequencing at all...
But when we get sequencing results back, the last bases are always bad. And the first ones too.

Co-workers tell me this is normal, but I can't seem to figure out why that is.

**pmiguel** · 06-21-2012, 04:45 AM

This is sequencing instrument dependent. What instrument is being used?

My guess is that you are talking about what is now called "Sanger" or "first generation" sequencing. If so, the answer is that it is largely due to limits imposed by the electrophoretic "sorting" of reaction products by length, which all Sanger sequencers rely on.

For any of this to make sense you need some general idea of what "electrophoresis" does and how it does it. In general, electrophoresis works by applying an electric field through which charged molecules move. Resisting this motion is a sieving matrix of some sort. If you work in a lab you may run agarose or polyacrylamide gels. In this type of "gel" electrophoresis, the agarose or polyacrylamide provides the resisting medium through which your molecules of interest (probably RNA, DNA or protein) must migrate. Most Sanger sequencers now employ some sort of pumpable sieving matrix to avoid having to cast and mount slab gels prior to each run. These matrices tend to have the consistency of liquid detergent but, like agarose and polyacrylamide, nevertheless resist the movement of molecules through them.

Further, the amount of resistance is roughly proportional to the molecular weight of the molecules traversing the sieving matrix. Hence short DNA reaction products migrate more quickly than long ones, and by the time the products reach the detector they are sorted by size. Obviously charge also plays a role, since more highly charged molecules migrate more quickly in an electric field. But DNA and RNA polymers have subunits with roughly equivalent charge, so this factors out for the most part.

"End bases" are acquired from the longest sequencing reaction products. They are lower quality because the reaction product peaks are not well resolved and bleed into each other. As to why this would be, consider that resolving a 100 nucleotide fragment from a 101 nucleotide fragment by electrophoresis requires your system to resolve fragments differing in length by 1%, whereas at 1000 and 1001 nucleotides you would need to resolve fragments differing by 0.1%, a task at least 10-fold more difficult.

However, this would suggest that your primer+1 peak should be easily resolvable from your primer+2 peak (possibly 20 bases versus 21 bases, a 5% difference in length!). But electrophoresis employs a sieving matrix of some sort, and every sieving matrix has its resolution optimum in some range. It would easily be possible to design a sieving matrix that resolves 20 from 21 bases well and gives you very high quality sequence right at the beginning of the read. That matrix, however, would be very poor at resolving longer (300+ base) fragments from each other.

So, if you are optimizing for the largest number of high quality bases, you lose some sequence from the beginning and some from the end.
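
As a quick check of the relative length differences quoted above (20 vs 21, 100 vs 101 and 1000 vs 1001 bases), here is a minimal shell sketch. The function name `rel_diff_percent` is made up for illustration, and this only reproduces the arithmetic; it says nothing about where a real sieving matrix actually resolves best.

```shell
# Relative length difference (in percent) between an N-base fragment
# and an (N+1)-base fragment: one extra base out of N.
rel_diff_percent() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 100 / n }'
}

# The three fragment lengths mentioned in the post.
for n in 20 100 1000; do
    printf '%s vs %s bases: %s%%\n' "$n" "$((n + 1))" "$(rel_diff_percent "$n")"
done
```

The percentage shrinks in direct proportion to fragment length, which is the "10-fold more difficult" step from 100 to 1000 bases.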

There are some other complications as well that hinder obtaining high quality sequence in the first few bases. For example, the most commonly used Sanger sequencing reagent is, I would guess, Applied Biosystems (Life Technologies) "Big Dye". The "Big" implies bulky dyes attached to the terminator bases. Further, the molecular weights and even the charges of these dyes may be different for each "color". Hence these "dye" terminators cause shifts in the migration of the sequencing reaction products being electrophoresed. AB actually compensates for these shifts, to the extent it can, to produce the "processed" electropherogram you are used to seeing. But at very short fragment lengths these shifts will be more dramatic, because the dye itself makes up a greater percentage of the total molecular weight and charge of the reaction product DNA strand.
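
To get a rough feel for that last point, the sketch below estimates what share of a fragment's mass a dye label would contribute at different lengths. Both input numbers are assumptions for illustration only: about 330 Da per nucleotide is a common rule of thumb for single-stranded DNA, and the 1000 Da dye mass is a placeholder, not a Big Dye specification. The function name `dye_mass_percent` is likewise hypothetical.

```shell
# Percent of a dye-terminated fragment's mass contributed by the dye.
# Arguments: fragment length (nt), dye mass (Da), mean nucleotide mass (Da).
dye_mass_percent() {
    awk -v n="$1" -v dye="$2" -v nt="$3" \
        'BEGIN { printf "%.1f\n", 100 * dye / (dye + n * nt) }'
}

# ASSUMED illustrative values: 1000 Da dye (placeholder, not a real
# Big Dye figure) and 330 Da per nucleotide (rule of thumb).
for n in 20 100 500; do
    printf '%4d nt: dye is %s%% of the mass\n' "$n" "$(dye_mass_percent "$n" 1000 330)"
done
```

Whatever the real dye masses are, the trend is the same: the label's share of the total falls quickly as the fragment gets longer, so dye-specific mobility shifts matter most for the first few bases.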

This is only meant to be a general overview. I am glossing over many details, some of which (e.g., the physics of electrophoresis) I have only a passing familiarity with myself. But it should give you a vague sense of the issues in play.
--
Phillip

**Dario1984** · 06-21-2012, 04:00 PM

We had a problem where the first nucleotide of 100% of the reads was the same as the last base of the multiplexing index. It turned out our provider's machine settings weren't set up correctly for masking the index.
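
A problem like this one shows up clearly if you tally just the first base of every read. The following is a minimal sketch, assuming a plain 4-line-per-record FASTQ file (no wrapped sequence lines); `first_base_counts` is a made-up helper name and `SAMPLENAME.fastq` is a placeholder.

```shell
# Tally the first base of every read in a 4-line-per-record FASTQ file.
# Output columns: base, read count, percent of all reads.
first_base_counts() {
    awk 'NR % 4 == 2 { n[substr($0, 1, 1)]++; total++ }
         END {
             for (b in n)
                 printf "%s\t%d\t%.1f\n", b, n[b], 100 * n[b] / total
         }' "$1" | sort
}

# Usage (placeholder file name):
# first_base_counts SAMPLENAME.fastq
```

A single base sitting at or near 100% in the last column would be consistent with an unmasked index base or some other fixed leading sequence, though it does not by itself say which.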

**Jannn** · 06-27-2012, 01:38 AM

OK, thanks a lot, I see what you mean.
I understand it now.

**Jannn** · 06-27-2012, 01:40 AM

This is what I find a bit weird: why doesn't the provider change the settings to make sure you get a good reading for the first, middle and last bases?
I mean: rather than just sequencing it once, sequence it 3 times.
I guess this is because of the money it would cost to get those few extra correct (start/end) bases?

First and last nucleotides bad in sequencing?
