I am interested in the quality of the data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
bioinfosm
Originally posted by new300: How many raw and aligned reads per run do you get out of your SOLiD?
Raw reads: ~142M
Mapped R3 reads: ~114M for unique & random at 3 mismatches
Mapped F3 reads: ~118M (ditto)
Mapped R3 reads: ~77M for uniquely placed reads at 3 mismatches
Mapped F3 reads: ~75M (ditto)
Paired F3-R3 reads: ~78M
So approximately 3900 Mbases (78M times 50 bases).
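The arithmetic behind those figures can be sketched as below. This is only a rough check: the counts are copied from the list above, and using the ~142M raw reads as the denominator for every mapped category is an assumption, since the post does not say whether that figure is per tag. The helper names are made up for illustration.

```python
# Read counts in millions, copied from the post above.
RAW_READS = 142  # assumption: used as the denominator for every category
MAPPED = {
    "R3 unique & random": 114,
    "F3 unique & random": 118,
    "R3 unique": 77,
    "F3 unique": 75,
}
PAIRED_READS = 78
BASES_PER_PAIR = 50  # the post multiplies 78M by 50 bases


def mapped_fraction(count, total=RAW_READS):
    """Fraction of raw reads that fall into a mapped category."""
    return count / total


def yield_mbases(reads_millions, bases):
    """Yield in Mbases for a read count given in millions."""
    return reads_millions * bases


for label, count in MAPPED.items():
    print(f"{label}: {mapped_fraction(count):.0%} of raw reads")
print(f"Paired yield: {yield_mbases(PAIRED_READS, BASES_PER_PAIR)} Mbases")
```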
SNP analysis is currently in progress on the paired reads. Judging from my work with the mapped but unpaired reads, we should obtain quite a few SNPs.
Originally posted by bioinfosm: I am interested in the quality of the data. Using, say, 6 million 35 bp reads on the same sample, which instrument should one prefer, say for SNP calling? From a C. elegans comparison paper, it looks like SOLiD has a slight advantage in calling rare SNPs. Does its 2-base encoding really give more accurate results?
In practice the rate of sequencer error could play a major role. Obviously if there is too much sequencer error then too much data will be thrown away and nothing will be found. The SOLiD's error rate may be higher than the Solexa's. I do not have firm numbers on this, however.
Let's do a couple of thought experiments. Say that there is a common SNP that occurs in 50% of the population. Furthermore, say that the SOLiD has a 0.5% error rate per base while the Solexa has 1/5 of that, 0.1% per base [note that I am just making up those numbers -- the actual rates are probably much different]. If we pool 100 individuals together in a run of 25-mers and get 100 reads across the SNP position, then -- very roughly, since I am doing simple probability here --
The SOLiD run will -- from sequencer errors -- generate 12 - 13 reads with a single mismatch and 0 - 1 reads with adjacent mismatches.
Co-mingled with the above will be 50 reads with 2 adjacent mismatches that represent the SNP.
So overall there will be about:
44 reads without mismatches -- the non-SNPs
44 reads with adjacent mismatches -- the SNPs plus *maybe* 1 error read
12 reads with non-adjacent mismatch(es) -- errors for both non-SNPs and SNPs
When we look at the data we would toss out the reads with non-adjacent mismatches as errors. We would then pick up 44 reads with adjacent mismatches representing the same SNP and maybe 1 read representing a different (and erroneous) SNP.
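Why a real SNP shows up as two adjacent mismatches while a sequencer error shows up as one can be seen in a small sketch of 2-base encoding. It uses the usual color scheme in which each color is the XOR of the 2-bit codes of two neighboring bases (A=0, C=1, G=2, T=3). The sequences and function names are made up for illustration, and the sketch only covers a SNP in the interior of a read.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as colors: one color per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def color_mismatches(ref_colors, read_colors):
    """Positions where the read's colors differ from the reference's."""
    return [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]


ref = "ACGTACGT"
snp_read = "ACGAACGT"  # hypothetical SNP: T -> A at base index 3

ref_colors = to_colors(ref)

# A real base change alters the two colors that overlap the changed base.
print(color_mismatches(ref_colors, to_colors(snp_read)))  # [2, 3]

# A single color-call error alters one color only.
error_colors = list(ref_colors)
error_colors[5] ^= 1  # flip one color call
print(color_mismatches(ref_colors, error_colors))  # [5]
```

This is the property the thought experiment leans on: isolated color mismatches can be discarded as errors, while adjacent ones are SNP candidates.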
For the Solexa there would be:
52 reads with mismatch(es) -- 50 real SNPs and 2 or maybe 3 reads with errors.
48 reads without mismatches.
Once again it is easy to pick up the true SNP, since 50 of the reads all have a mismatch in the same location; the 2 or 3 reads with mismatches elsewhere are simply errors and could be tossed.
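The rough counts for the common SNP can be reproduced with simple probability. This is a sketch under the same made-up error rates as the post (0.5% and 0.1% per base, 25-mers, 100 reads), with the added assumption that errors are independent at each position. The function names are illustrative only.

```python
def p_read_has_error(per_base_error, read_length):
    """Probability that a read contains at least one sequencer error."""
    return 1 - (1 - per_base_error) ** read_length


def expected_counts(n_reads, snp_fraction, per_base_error, read_length=25):
    """Expected reads that are error-free (with or without the SNP) or hit by an error."""
    p_err = p_read_has_error(per_base_error, read_length)
    snp_reads = n_reads * snp_fraction
    return {
        "clean_snp": snp_reads * (1 - p_err),
        "clean_ref": (n_reads - snp_reads) * (1 - p_err),
        "with_error": n_reads * p_err,
    }


# Made-up per-base error rates from the post.
for name, rate in [("SOLiD", 0.005), ("Solexa", 0.001)]:
    counts = expected_counts(100, 0.5, rate)
    print(name, {key: round(value, 1) for key, value in counts.items()})
```

For the SOLiD numbers this gives about 44 error-free SNP reads, 44 error-free non-SNP reads and 12 reads with an error; for the Solexa numbers it gives 2 to 3 reads with an error.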
Now ... for the rare variant that occurs in 2% of the population.
The SOLiD has
86 reads with no mismatches
12 reads with non-adjacent mismatch(es)
2 reads with adjacent mismatches and *maybe* 1 error read with adjacent mismatches
Those two reads with adjacent mismatches are the real SNP. The errors are simply tossed.
The Solexa has
96 reads with no mismatches
4 (maybe 5) reads with mismatches.
2 of the reads with mismatches are the real SNP while 2 or 3 are errors.
In neither case does the platform pick up the real SNP unambiguously -- it is hard to do when sequencers generate errors -- but the SOLiD (and color space) does work, in theory, better with the rare variants. It works even better if we assume that the sequencer error is the same as the Solexa's.
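The rare-variant comparison comes down to how many error reads can be mistaken for the 2 real SNP reads. A rough sketch, using the same made-up error rates and assuming independent errors: in color space only reads with two errors in adjacent positions look like a SNP, while in base space any read with an error does. The adjacent-pair figure is a crude upper bound (it ignores whether the two wrong colors form a valid base change), and the function name is hypothetical.

```python
def confusable_error_reads(n_reads, per_base_error, read_length, color_space):
    """Expected error reads that could be mistaken for a SNP read."""
    if color_space:
        # Two errors in neighboring positions: (read_length - 1) adjacent pairs.
        p = (read_length - 1) * per_base_error ** 2
    else:
        # Any read with at least one error looks like a candidate SNP.
        p = 1 - (1 - per_base_error) ** read_length
    return n_reads * p


true_snp_reads = 100 * 0.02  # variant present in 2% of 100 reads

solid_noise = confusable_error_reads(100, 0.005, 25, color_space=True)
solexa_noise = confusable_error_reads(100, 0.001, 25, color_space=False)

print(f"SOLiD:  {true_snp_reads:.0f} SNP reads vs {solid_noise:.2f} confusable error reads")
print(f"Solexa: {true_snp_reads:.0f} SNP reads vs {solexa_noise:.2f} confusable error reads")
```

Under these assumptions the Solexa run has about as many confusable error reads as real SNP reads, while the SOLiD run has far fewer, even with the higher per-base error rate.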
Next up: color space and indels. Once my head stops hurting.
Originally posted by westerman: So approximately 3900 Mbases (78M times 50 bases).