**westerman** · 11-19-2008, 10:37 AM

Originally posted by foolishbrat

Dear all,

What's the primary difference between Next Gen sequencing and SAGE in terms of sequencing error? In particular, the errors affecting the tag counts.

I think that you have to rephrase your question to be more specific. I presume you are asking about the difference between doing expression profiling with Next Gen sequencing versus doing profiling via SAGE with classical Sanger sequencing.

Hands down a single Sanger read will be more accurate than a Next Gen read. That is one answer.

However, since Next Gen platforms produce many more reads per unit cost than Sanger, you can sequence to a greater depth. If you are willing to throw away any Next Gen reads that do not have significant depth, then your accuracy will go way up. So that may be your answer.
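A minimal sketch of that depth filter, assuming the reads have already been reduced to tag strings. The function name `filter_tags_by_depth` and the cutoff `min_count=5` are illustrative assumptions, not values from any particular pipeline; the right cutoff depends on your platform and sequencing depth.

```python
from collections import Counter


def filter_tags_by_depth(tags, min_count=5):
    """Count each tag and keep only tags observed at least min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(tags)
    return {tag: n for tag, n in counts.items() if n >= min_count}


# Toy input: one well-supported tag, one singleton that is probably
# a miscalled copy of it, and a second well-supported tag.
reads = (
    ["ACGTACGTACGTACGTA"] * 6
    + ["ACGTACGTACGTACGTT"]
    + ["TTTTGGGGCCCCAAAAT"] * 5
)
kept = filter_tags_by_depth(reads, min_count=5)
print(kept)
```

The singleton is dropped while both well-supported tags survive. The trade-off is that genuinely rare transcripts are discarded along with the errors.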

The Next Gen technology that you use will greatly influence the answer. I suspect that the SOLiD, despite not being the most accurate platform on a per-read basis, will be a good expression profiling platform simply due to the large quantity of reads at a low cost.

Unfortunately I know of no papers that cover this question. I am not even sure if it is a question that someone would want to go through the effort of answering.

**Josliu** · 11-21-2008, 02:33 PM

There are several types of errors in sequence tag counts from Next Gen sequencing.
1. Base-calling error rates are high, 0.5%-1% per base. When we count 17-base tags for long SAGE, up to about 17% of tags (17 bases × 1% per base), or more, may contain an error. Since different systems may have different error profiles, it can be difficult to compare results between labs that use different systems.
2. Low-abundance gene tags may be contaminated by miscalled reads from high-abundance gene tags whose sequences differ by 1 bp, since expression levels may differ by up to seven orders of magnitude.
3. There is also shot noise of sqrt(N), where N is the count for a tag. This is a problem for low-abundance genes.
4. Two or more genes may share the same tag. We have no way to tell how much of the count comes from one gene and how much from the other gene(s).
5. One gene might have two tags because of multiple isoforms. It is a challenge to decide how to report them.
6. Many gene tags are shorter than 17 bp, for example 12 bp. Tag counts for those genes will have high error rates.
7. Errors may also arise from differences between channel locations in the flowcell.
8. Enzyme efficiency might depend on sequence content.
You may use NextGENe software to handle such problems. Generally, the error will be minimal once a tag reaches 500 counts.

josliu
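The arithmetic behind items 1 and 3 can be sketched as follows, assuming independent base-call errors at a uniform per-base rate and Poisson counting noise. Both are simplifying assumptions, and the function names are illustrative only. Under these assumptions the chance that a 17-base tag carries at least one error is 1 - (1 - p)^17, which comes out slightly below the 17% linear estimate at p = 1%, and the relative shot noise is sqrt(N)/N = 1/sqrt(N).

```python
import math


def tag_error_probability(per_base_error, tag_length):
    """Probability that a tag contains at least one miscalled base,
    assuming independent errors at a uniform per-base rate."""
    return 1.0 - (1.0 - per_base_error) ** tag_length


def relative_shot_noise(count):
    """Relative counting error sqrt(N)/N = 1/sqrt(N) for a tag seen N times,
    assuming Poisson statistics."""
    return 1.0 / math.sqrt(count)


for p in (0.005, 0.01):
    print(f"per-base error {p}: 17-base tag error {tag_error_probability(p, 17):.3f}")

for n in (10, 100, 500):
    print(f"count {n}: relative shot noise {relative_shot_noise(n):.3f}")
```

At 500 counts the relative shot noise is roughly 4.5%, which is consistent with the rule of thumb that errors become small once a tag reaches about 500 counts.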

**foolishbrat** · 11-23-2008, 06:21 AM

Thanks so much for the reply. This is truly invaluable.

Do you know of any existing programs or papers that correct such tag count errors?
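No specific correction program is named in this thread beyond NextGENe. As a rough illustration of the problem in item 2 of Josliu's list, here is a simple heuristic sketch (not a published method): flag any tag that is exactly one mismatch away from a tag at least `ratio` times more abundant. The function names and the `ratio=100` cutoff are assumptions chosen for the example.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def flag_suspect_tags(counts, ratio=100):
    """Map each suspect low-count tag to the abundant neighbor it may derive from.

    A tag is suspect if another tag of the same length differs by exactly
    one base and has at least `ratio` times its count (hypothetical cutoff).
    This is an O(n^2) comparison, fine for a sketch but slow for large tag sets.
    """
    suspects = {}
    for tag, n in counts.items():
        for other, m in counts.items():
            if (
                len(other) == len(tag)
                and m >= ratio * n
                and hamming_distance(tag, other) == 1
            ):
                suspects[tag] = other
                break
    return suspects


counts = {
    "ACGTACGTACGTACGTA": 10000,
    "ACGTACGTACGTACGTT": 12,
    "TTTTGGGGCCCCAAAAT": 40,
}
print(flag_suspect_tags(counts))
```

Whether a flagged tag should be discarded, merged into its neighbor, or kept is a judgment call, since a real low-abundance transcript can also sit one base away from an abundant one.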

Next Gen versus SAGE sequencing error