Hi Andris,
Sorry for not logging in here for a while... we've been overloaded recently. Every customer tells their friends, and so forth... We're moving into larger offices, so I hope we'll be able to hire more people and reduce the load on us (in the short term it's just adding MORE work). Now, to answer your good question:
Is this still for SOLiD, or Illumina?
I'll assume SOLiD for now:
With 25mers, if you want to detect more than 2 substitutions, you need to go to VA (Valid Adjacent) mode. This detects up to 4 color changes and then applies VA rules to allow up to 2 SNPs (4 color-code substitutions). It takes almost twice as long as regular mode (2 color-code mismatches). For 25mers, it doesn't make sense to allow more than 2 mismatches without VA, because you then artificially create repeats that are not real repeats; in other words, you lose specificity. It does make sense to use 3 mismatches for longer read lengths.
In the run times below, "50,3" is shorthand for "readlength=50, MaxAllowedMismatches=3":
25,2 28 minutes
35,3 50 minutes
35,4 44 minutes
50,3 112 minutes
50,4
All the runs above are on a single old computer: 8 cores (dual-socket quad-core) 2.0GHz Xeon with 24GB of 667MHz RAM, but it does have a faster-than-normal hard disk (300MByte/sec). It is MUCH faster on the new Imagenix Genome Cruncher, which will be in production in about 2 weeks.
Also, why do I ask if SOLiD or Illumina? Illumina has much lower substitution rates for 2 reasons:
1. A legitimate SNP only causes 1 base change (vs. two color code changes)
2. The raw machine error rate is lower, or maybe they are just clever enough to filter out lower quality calls, which it doesn't look like SOLiD is doing (yet).
So you run with a lower (MaxAllowedSubstitution) / (ReadLength) ratio on Illumina data. Three mismatches for Illumina is probably good for around 65mers or so. If you're interested, we'll run a test.
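To compare settings by that ratio, a quick Python sketch (the helper name `mismatch_ratio` is made up for illustration; the settings are the ones mentioned in this post, with "65,3" being the rough Illumina suggestion):

```python
def mismatch_ratio(read_length, max_mismatches):
    """MaxAllowedSubstitution / ReadLength for one setting."""
    return max_mismatches / read_length

# (read length, max allowed mismatches) pairs taken from the post.
settings = [(25, 2), (35, 3), (35, 4), (50, 3), (50, 4), (65, 3)]

for read_length, max_mm in settings:
    ratio = mismatch_ratio(read_length, max_mm)
    print(f"{read_length},{max_mm}: {ratio:.3f}")
```

The "65,3" setting comes out lowest of the group, below any of the SOLiD settings listed.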