Originally posted by spirit
Thank you for your interest. I will answer these questions as best I can.
1. What are the longest and shortest reads it can handle effectively?
Currently, ZOOM can handle reads from 15bp to 64bp long. The core idea of ZOOM extends easily to longer reads; it is only the implementation that limits the length to 64bp. We will turn to 454 data later, once the version for Illumina/Solexa and ABI SOLiD is stable.
2. How does it compare to Eland or MAQ in reads aligned per minute?
Since ELAND is, as far as we know, the fastest software for Illumina/Solexa data, we compared ZOOM’s speed against ELAND in our benchmark. Mapping reads of 15bp to 32bp at the same sensitivity, ZOOM took half the time of ELAND, and as little as one third for the shorter reads. Furthermore, ELAND can handle no more than about 16 million reads, whereas ZOOM has no limit on the number of reads as long as they fit in your RAM. Both ELAND and ZOOM hash the reads and scan the reference sequence, so processing more reads in one scan pass saves even more time. Because the speed of ZOOM depends closely on the length of the reference sequence and the read length, it’s hard to give a single number of reads aligned per minute. To give you an impression, here is some data from our benchmark. When achieving full sensitivity for two mismatches:
It aligns 3.4 million 36bp BAC reads to the 162k region (where the BAC comes from) in 37 seconds with 1.1G RAM.
It aligns 24 million 36bp reads (5X coverage of human chromosome 6) to chromosome 6 in 17 minutes 17 seconds with 6.5G RAM.
It aligns 22 million 17bp ChIP-seq reads to the whole human genome in 4 hours and 22 minutes with 4.2G RAM.
For ABI/SOLiD data, the speed is lower than for Illumina/Solexa data. ZOOM aligns 28 million 25bp reads to the E. coli genome (4M) with automatic sequencing error correction in 5 minutes.
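For readers who still want a rough reads-per-minute figure, the benchmark timings quoted above can be converted with simple arithmetic. The short Python sketch below does only that; the labels and the helper name reads_per_minute are made up for illustration, and the numbers are the ones given in this post, not new measurements.

```python
# Benchmark figures quoted in the post: (label, number of reads, elapsed seconds).
BENCHMARKS = [
    ("36bp BAC reads vs 162k region", 3_400_000, 37),
    ("36bp reads vs human chromosome 6", 24_000_000, 17 * 60 + 17),
    ("17bp ChIP-seq reads vs whole human genome", 22_000_000, (4 * 60 + 22) * 60),
    ("25bp SOLiD reads vs E. coli genome", 28_000_000, 5 * 60),
]


def reads_per_minute(n_reads, seconds):
    """Convert a read count and a wall-clock time in seconds to reads per minute."""
    return n_reads / seconds * 60


for label, n_reads, seconds in BENCHMARKS:
    print(f"{label}: {reads_per_minute(n_reads, seconds):,.0f} reads/minute")
```

The results span roughly 84 thousand reads per minute (whole human genome) to about 5.5 million (162k BAC region), which illustrates why a single figure would be misleading: throughput depends heavily on the reference length.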
We tried to compare speed and sensitivity with MAQ, since it is well known. However, I was totally puzzled by its input and output formats, so lazy me gave up, since its website declares that it is slower than ELAND.
3. How many mismatches does it handle?
In principle, you can choose any number of mismatches as long as it is less than the read length. ZOOM guarantees 100% sensitivity for a large range of <read length, mismatch number> cases.
When the number of mismatches you require is larger than the number in the <read length, mismatch number> case ZOOM uses, sensitivity decreases slightly. For example, reads of length 50bp can be mapped with 100% sensitivity at 4 mismatches; if you require 5 mismatches, sensitivity decreases slightly. However, if you do need 100% sensitivity in these cases, feel free to contact us and we will accommodate you.
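To make the idea of a filter with guaranteed sensitivity concrete, here is a minimal Python sketch of the general "hash the reads, scan the reference" strategy. It is not ZOOM's actual seed design, which is not described in this post. It uses the simple pigeonhole principle instead: a read cut into k+1 pieces must have at least one piece that matches exactly if it has at most k mismatches. The function names, the toy sequences, and the assumption that all reads share one length are illustrative only, and only the forward strand is searched.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_reads(reads, reference, max_mismatches=2):
    """Hash the reads once, then scan the reference in a single pass.

    reads: dict of name -> sequence, all of the same length (assumption).
    Returns a sorted list of (read_name, position, mismatches).
    """
    read_len = len(next(iter(reads.values())))
    n_pieces = max_mismatches + 1
    # Cut every read into max_mismatches + 1 contiguous pieces.
    bounds = [(i * read_len // n_pieces, (i + 1) * read_len // n_pieces)
              for i in range(n_pieces)]

    # Hash table: (piece number, piece sequence) -> names of reads with that piece.
    index = defaultdict(list)
    for name, seq in reads.items():
        for piece_no, (start, end) in enumerate(bounds):
            index[(piece_no, seq[start:end])].append(name)

    hits = set()
    for pos in range(len(reference) - read_len + 1):
        window = reference[pos:pos + read_len]
        for piece_no, (start, end) in enumerate(bounds):
            # An exact piece match is only a candidate; verify the whole read.
            for name in index.get((piece_no, window[start:end]), ()):
                mismatches = hamming(reads[name], window)
                if mismatches <= max_mismatches:
                    hits.add((name, pos, mismatches))
    return sorted(hits)


toy_reference = "ACGTACGTTTGACCA"
toy_reads = {"r1": "ACGTTT", "r2": "TTGACG"}
hits = map_reads(toy_reads, toy_reference, max_mismatches=2)
print(hits)
```

With this kind of filter, asking for more mismatches than the seed layout was designed for means some true alignments no longer share an exact piece with the reference and are never verified, which is one way sensitivity can drop below 100%.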
4. Does it have a gapped mode?
Yes, ZOOM can handle insertions/deletions between reads and the reference sequence. For Illumina/Solexa data, one gap of any length you wish is allowed in addition to the required mismatches. However, ZOOM can’t guarantee 100% sensitivity when finding gapped alignments. I don’t think anybody using a filtering strategy could.
5. What format is required for the reference genome?
The reference genome should be a fasta file or multiple fasta files.
Illumina/Solexa read files can be in fasta, *_seq.txt or *_prb.txt format. The ABI SOLiD *.csfasta format is supported too.
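As an illustration of the reference input described above, the following Python sketch loads one or more FASTA sources into a single dictionary. It is a generic FASTA reader, not ZOOM code; the function names and the toy records are invented for the example, and it keeps only the first word of each header line as the sequence name.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file or any iterable of lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def load_reference(handles):
    """Merge the records of several FASTA sources into one name -> sequence dict."""
    reference = {}
    for handle in handles:
        for name, seq in read_fasta(handle):
            reference[name] = seq
    return reference


# Two in-memory "files" stand in for multiple FASTA files on disk.
demo = load_reference([
    io.StringIO(">chr_a demo record\nACGT\nacgt\n"),
    io.StringIO(">chr_b\nTTGA\n"),
])
print(demo)
```

For real files, the same function accepts open file objects, for example load_reference([open(path) for path in paths]).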
6. What format are the alignments reported in?
For Illumina/Solexa data, this release of ZOOM reports output in the format “read_name reference_seq_name: position_of_mapped +/- mismatch_number”. If assembly is requested, ZOOM will output the assembly consensus, the coverage, and the frequency of {A,C,T,G} at each position of the consensus.
For ABI/SOLiD data, besides the alignment information, ZOOM can output the reads decoded into base space, with polymorphisms in base space and sequencing errors in color space highlighted.
In our next release, we will show the alignment in a GUI view displaying the multiple alignment of mapped reads on the reference sequence, along with the heterozygous sites.
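For anyone planning to post-process the Illumina/Solexa output, here is a hedged Python sketch of a parser for the line format quoted above. The exact delimiters are an assumption: the post gives only the field order, so the sketch assumes whitespace-separated fields with a colon after the reference name. The sample line, the Alignment type, and the function name are hypothetical, not taken from real ZOOM output.

```python
import re
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    read_name: str
    reference_name: str
    position: int
    strand: str
    mismatches: int


# Assumed layout: read_name reference_seq_name: position +/- mismatch_number
LINE_PATTERN = re.compile(r"^(\S+)\s+(\S+?):\s*(\d+)\s+([+-])\s+(\d+)$")


def parse_alignment(line: str) -> Optional[Alignment]:
    """Parse one output line; return None if it does not fit the assumed layout."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    read_name, reference_name, position, strand, mismatches = match.groups()
    return Alignment(read_name, reference_name, int(position), strand, int(mismatches))


# Hypothetical sample line, written to follow the field order given in the post.
sample = parse_alignment("read_001 chr6: 12345 + 2")
print(sample)
```

If the real output uses different separators, only LINE_PATTERN should need adjusting.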
7. Can you comment on the cost/licenses it will be provided under?
As for the cost of the full version of ZOOM, it is probably best to ask the sales person once the website is ready next week. I think a free academic version for Illumina/Solexa data with limited functionality will be provided too.
8. Can you give us the link to the download when it's ready?
Sure. I will post the latest news when it’s ready.