Originally posted by bioinfosm
It's not easy to compare, since throughput changes so fast on both instruments. For example, the latest Genome Biology RNA-seq paper used 38 lanes to get 138 M aligned reads, which is a number you can get from one SOLiD slide (1/2 run) today. I do not know what the current numbers are for the GA-II. What sort of apples are you interested in comparing?...
-
-
I am interested in the quality of data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
--
bioinfosm
-
-
Originally posted by new300:
How many raw and aligned reads per run do you get out of your Solid?
These numbers are from a project that I have been working on this week, since the data came off the sequencer Monday evening. This is one run: mate-paired 25-base reads mapped to a non-human eukaryotic organism. One region/plate.
Raw reads: ~142M
Mapped R3 reads: ~114M for unique & random at 3 mismatches
Mapped F3 reads: ~118M (ditto)
Mapped R3 reads: ~77M for uniquely placed reads at 3 mismatches
Mapped F3 reads: ~75M (ditto)
Paired F3-R3 reads: ~78M
So approximately 3900 Mbases (78M pairs times 50 bases per pair).
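For anyone who wants to rerun that arithmetic with their own counts, here is a minimal sketch. The function and variable names are made up for illustration, and treating the ~142M raw reads as the denominator for each tag (F3 and R3 separately) is an assumption, not something stated in the numbers above.

```python
# Approximate counts quoted above, in millions of reads.
RAW_READS_M = 142      # ASSUMPTION: used as the per-tag denominator
PAIRED_READS_M = 78    # paired F3-R3 reads
TAG_LENGTH = 25        # bases per tag in a mate-paired 25-base run


def paired_yield_mbases(pairs_m, tag_length):
    """Mbases covered by mapped pairs; each pair contributes two tags."""
    return pairs_m * 2 * tag_length


def mapped_fraction(mapped_m, raw_m):
    """Fraction of raw reads that were mapped."""
    return mapped_m / raw_m


print(f"Paired yield: ~{paired_yield_mbases(PAIRED_READS_M, TAG_LENGTH)} Mbases")
for label, mapped_m in [
    ("R3 unique & random", 114),
    ("F3 unique & random", 118),
    ("R3 unique", 77),
    ("F3 unique", 75),
]:
    print(f"{label}: {mapped_fraction(mapped_m, RAW_READS_M):.0%} of raw")
```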
SNP analysis is currently in progress on the paired reads. From my work with the mapped but not-paired reads we should obtain quite a few SNPs.
-
-
Originally posted by bioinfosm:
I am interested in the quality of data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
In theory color-space should give more accurate results for SNP calling. The concept is that it takes two adjacent color-space mismatches to indicate a SNP. If you see a single color-space mismatch, then you can flag that read as containing a sequencer error. Compare this to traditional base-space where, when you see a single mismatch, you have no idea whether it arises from a sequencer error or a SNP. Depth of coverage can help resolve the problem, but there are limits to that, especially for rare SNPs.
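To make the two-adjacent-mismatch idea concrete, here is a small sketch of 2-base encoding in which each color is the XOR of the 2-bit codes of two neighbouring bases. The sequences, function names and the `classify` rule are invented for illustration. This is a toy version of the idea, not what any real SOLiD pipeline does; in particular it does not check that a two-color change is consistent with an actual base substitution.

```python
# 2-bit base codes; the color between two neighbouring bases is their XOR.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as the colors between adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def mismatch_positions(ref_colors, read_colors):
    """Indices at which two equal-length color lists differ."""
    return [i for i, (r, c) in enumerate(zip(ref_colors, read_colors)) if r != c]


def classify(ref_colors, read_colors):
    """Toy rule: a lone color mismatch is an error, an adjacent pair a SNP."""
    pos = mismatch_positions(ref_colors, read_colors)
    if not pos:
        return "match"
    if len(pos) == 1:
        return "likely sequencer error"
    if len(pos) == 2 and pos[1] - pos[0] == 1:
        return "candidate SNP"
    return "ambiguous"


ref = to_colors("ACGTAC")        # [1, 3, 1, 3, 1]
snp_read = to_colors("ACTTAC")   # G->T substitution gives [1, 2, 0, 3, 1]
err_read = list(ref)
err_read[3] = 0                  # a single miscalled color

print(classify(ref, snp_read))   # candidate SNP
print(classify(ref, err_read))   # likely sequencer error
```

A single base change touches the two colors on either side of it, which is why the SNP read differs from the reference at two neighbouring color positions while the miscalled color shows up alone.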
In practice the rate of sequencer error could play a major role. Obviously if there is too much sequencer error then too much data will be thrown away and nothing will be found. The SOLiD's error rate may be higher than the Solexa's. I do not have firm numbers on this, however.
Let's do a couple of thought experiments. Say that there is a common SNP that occurs in 50% of the population. Furthermore, say that the SOLiD has a 0.5% error rate per base while the Solexa has 1/5 of that, 0.1% per base [note that I am just making up those numbers -- the actual rates are probably much different]. If we pool 100 individuals together in a run of 25-mers (counting one read per individual across the SNP) then -- very roughly since I am doing simple probability here --
The SOLiD run will -- for sequencer errors -- generate 12 - 13 reads with a single mismatch and 0 - 1 reads with adjacent mismatches.
Co-mingled with the above will be 50 reads with 2 adjacent mismatches that represent the SNP.
So overall there will be about:
44 reads without mismatches -- the non-SNPs
44 reads with adjacent mismatches -- the SNPs plus *maybe* 1 error read
12 reads with non-adjacent mismatch(es) -- errors for both non-SNPs and SNPs
When we look at the data we would toss out the non-adjacent mismatch reads as errors. We would then pick up 44 adjacent-mismatch reads representing the same SNP and maybe 1 read representing a different (and erroneous) SNP.
For the Solexa there would be:
52 reads with mismatch(es) -- 50 real SNPs and 2 or maybe 3 reads with errors.
48 reads without mismatches.
Once again it is easy to pick up the true SNP, since 50 of the reads all have a mismatch in the same location, while the 2 or 3 reads that indicate other SNPs are simply errors and could be tossed.
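The error counts in this thought experiment can be checked with a few lines of simple probability. This is only a sketch under the made-up rates used above (0.5% and 0.1% per position, 25 positions per read, 100 reads). It assumes errors are independent across positions, and the function names are hypothetical.

```python
def p_read_has_error(per_base_error, read_length):
    """Probability that a read contains at least one miscalled position."""
    return 1.0 - (1.0 - per_base_error) ** read_length


def expected_error_reads(n_reads, per_base_error, read_length):
    """Expected number of reads carrying at least one error."""
    return n_reads * p_read_has_error(per_base_error, read_length)


def expected_adjacent_error_pairs(n_reads, per_base_error, read_length):
    """Expected number of adjacent position pairs that are both miscalled."""
    return n_reads * (read_length - 1) * per_base_error ** 2


for platform, err in [("SOLiD", 0.005), ("Solexa", 0.001)]:
    with_error = expected_error_reads(100, err, 25)
    adjacent = expected_adjacent_error_pairs(100, err, 25)
    print(f"{platform}: ~{with_error:.1f} reads with an error, "
          f"~{adjacent:.3f} adjacent double errors")
```

With these assumed rates the result is roughly 12 error reads per 100 at 0.5% and between 2 and 3 at 0.1%, in line with the rough figures above. The expected number of adjacent double errors is far below 1 in both cases.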
Now ... for the rare variant that occurs in 2% of the population.
The SOLiD has
86 reads with no mismatches
12 reads with non-adjacent mismatch(es)
2 reads with adjacent mismatches and *maybe* 1 adjacent-mismatch error read
Those two adjacent-mismatch reads are the real SNP. The errors are simply tossed.
The Solexa has
96 reads with no mismatches
4 (maybe 5) reads with mismatches.
2 of the mismatches are the real SNP while 2 or 3 are errors.
In neither case does the platform pick up the real SNP unambiguously -- it is hard to do when sequencers generate errors -- but the SOLiD (and color space) does work, in theory, better with the rare variants. It works even better if we assume that the SOLiD's sequencer error is the same as the Solexa's.
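A rough way to put numbers on "works better with rare variants" is to compare, per platform, the expected number of reads showing the true SNP against the expected number of error reads that could be mistaken for a SNP. The sketch below uses the made-up rates from this post and a simplified model of my own: in color space, reads with any error are discarded and only adjacent double errors mimic a SNP, while in base space every erroneous read looks like a candidate. The function name and model are assumptions for illustration only.

```python
def snp_signal_vs_noise(n_reads, snp_fraction, per_base_error,
                        read_length, color_space):
    """Return (expected true-SNP reads, expected SNP-like error reads)."""
    p_err = 1.0 - (1.0 - per_base_error) ** read_length
    if color_space:
        # SNP reads that also carry an error are tossed with the errors;
        # only two adjacent miscalled colors can imitate a SNP.
        true_reads = n_reads * snp_fraction * (1.0 - p_err)
        false_reads = n_reads * (read_length - 1) * per_base_error ** 2
    else:
        # In base space the SNP stays visible, but any erroneous read
        # also looks like a candidate SNP.
        true_reads = n_reads * snp_fraction
        false_reads = n_reads * p_err
    return true_reads, false_reads


for label, err, color in [
    ("SOLiD, 0.5% error", 0.005, True),
    ("Solexa, 0.1% error", 0.001, False),
    ("SOLiD, 0.1% error", 0.001, True),
]:
    true_reads, false_reads = snp_signal_vs_noise(100, 0.02, err, 25, color)
    print(f"{label}: ~{true_reads:.2f} true vs ~{false_reads:.3f} SNP-like errors")
```

Under this model the color-space rows keep just under 2 true reads against well under 1 SNP-like error read, while the base-space row has about 2 true reads against 2 to 3 error reads.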
Next up: color space and indels. Once my head stops hurting.
-
-
Originally posted by westerman:
So Approximately 3900 Mbases. (78M times 50 bases).
So, I can't really see the throughput advantage of the SOLiD there. GA1 runs I've seen are around 4 Gb. If you look at the Short Read Archive, GA2 runs are around 7 Gb+ with 35 bp reads. For PhiX, around 95% of Illumina reads align within 2 errors. For human I think you tend to see about 80%. Those are 35 bp reads, I believe. There are 50 bp reads in the SRA which appear to go up to 14 Gb.