  • First Helicos Publication! Single Molecule DNA Seq of a "Viral" Genome

    Science has just released the first publication detailing real data from the Heliscope.

    They undertook the extremely exciting project of resequencing the M13 genome (was I hoping for too much?) in collaboration with researchers at Ohio University and Stanford University.

    Single-Molecule DNA Sequencing of a Viral Genome

    Timothy D. Harris,1* Phillip R. Buzby,1 Hazen Babcock,1 Eric Beer,1 Jayson Bowers,1 Ido Braslavsky,2 Marie Causey,1 Jennifer Colonell,1 James DiMeo,1 J. William Efcavitch,1 Eldar Giladi,1 Jaime Gill,1 John Healy,1 Mirna Jarosz,1 Dan Lapen,1 Keith Moulton,1 Stephen R. Quake,3 Kathleen Steinmann,1 Edward Thayer,1 Anastasia Tyurina,1 Rebecca Ward,1 Howard Weiss,1 Zheng Xie1

    The full promise of human genomics will be realized only when the genomes of thousands of individuals can be sequenced for comparative analysis. A reference sequence enables the use of short read length. We report an amplification-free method for determining the nucleotide sequence of more than 280,000 individual DNA molecules simultaneously. A DNA polymerase adds labeled nucleotides to surface-immobilized primer-template duplexes in stepwise fashion, and the asynchronous growth of individual DNA molecules was monitored by fluorescence imaging. Read lengths of >25 bases and equivalent phred software program quality scores approaching 30 were achieved. We used this method to sequence the M13 virus to an average depth of >150x and with 100% coverage; thus, we resequenced the M13 genome with high-sensitivity mutation detection. This demonstrates a strategy for high-throughput low-cost resequencing.

    1 Helicos BioSciences Corporation, One Kendall Square, Cambridge, MA 02139, USA.
    2 Department of Physics and Astronomy, Ohio University, Athens, OH 45701, USA.
    3 Department of Bioengineering, Stanford University, and Howard Hughes Medical Institute, Stanford, CA 94305, USA.
    The paper abstract and full text (for subscribers) are located here:

    I'm reading it as we speak, and would welcome interpretations of the data!

  • #2
    Official press release here:

    Helicos BioSciences Announces Single Molecule DNA Sequence Data Published in Science Magazine

    Data Validates the World’s First Single Molecule Sequencing of an Organism

    CAMBRIDGE, Mass.--(BUSINESS WIRE)--Helicos BioSciences (NASDAQ: HLCS), a life science company focused on innovative genetic analysis technologies, today announced the publication of a report in Science Magazine demonstrating the first single molecule sequencing of an organism. The report depicts the use of Helicos’ proprietary True Single Molecule Sequencing (tSMS)™ technology to re-sequence the M13 viral genome. The report will appear in the April 4, 2008 print issue of Science Magazine.

    The report demonstrates that the tSMS technology can reliably re-sequence a moderately complex genome without the associated errors, cost, and experimental complexity of amplification. The tSMS process captures images of single dye labeled nucleotides as they are incorporated to determine the sequence of the individual DNA strands. In addition, the tSMS method simplifies the DNA sample preparation process and maximizes throughput by packing individual strands of DNA at high densities onto the sequencing surface.

    “The ability to sequence individual strands of genomic DNA has been a goal of the scientific community for more than 20 years,” said Timothy Harris, PhD, senior director of research at Helicos BioSciences and the report’s corresponding author. “The data in Science Magazine demonstrate the robustness of our single molecule method and demonstrate our ability to accurately detect single base mutations. Not only does this data represent the first of its kind, but a significant milestone in the genomics revolution.”
    To validate its technology, Helicos scientists sequenced the M13 virus genome, examining more than 280,000 strands of captured DNA, directly visualizing the sequential incorporation of individual labeled nucleotides. Overall per-base accuracy was better than 99% and the accuracy of the consensus sequence was 100%. To assess accuracy and robustness of mutation detection, Helicos’ scientists introduced in silico single nucleotide changes into the reference M13 virus genome sequence and compared them to Helicos DNA sequences. The tSMS technology correctly found 98% of 500 simulated mutations with zero false positive errors.

    “This data, remarkable as it is, was based on the first generation of our tSMS chemistry,” said Bill Efcavitch, PhD, senior vice president for product R&D at Helicos BioSciences. “We have since developed new generations of ‘one-base-at-a-time’ nucleotides which allow more accurate homopolymer sequencing, and lower overall error rates.”
    The report published in Science Magazine initiates the path to many other scientific reports Helicos plans to publish in the upcoming months. These reports will highlight data recently announced at the AGBT meeting in Marco Island further demonstrating single molecule sequencing being applied to both BAC sequencing accuracy, and the ability to count microRNAs as well as identify putative novel miRNAs.


    • #3
      And the stock is rising

      It does not take much to boost their stock price: up 40% within a week!

      Their paper is more a proof of concept than the presentation of a machine that can compete in the next-gen sequencing market.


      • #4
        Something else I noticed: in their introduction they bad-mouth the library preparation protocols for all the other platforms, basically saying that adding adapters is labor intensive, etc., then they go on to prove that they absolutely MUST use adapters to get bidirectional reads because their error rates are so high.

        Seems like C incorporations are a killer...


        • #5
          they "sold" a Heliscope to Expression Analysis

          Actually, their stock price first went up a week ago when they announced the sale of the first machine to Expression Analysis. It is actually the second time they have announced the first sale. They do not say what Expression Analysis paid, or how much Helicos had to pay to make them try the machine. Maybe they will resequence bacteriophage lambda soon. It is about four times the size of M13.
          At the current cash burn rate Helicos has enough cash for about a year or so, so they absolutely need positive news to drive the stock price up, at least temporarily.


          • #6
            Originally posted by terabase View Post
            Maybe they will resequence bacteriophage lambda soon. It is about four times the size of M13.
            Tiny genome resequencing service. No one ever said how big the $1000 genome had to be!


            • #7
              Originally posted by ECO View Post
              Tiny genome resequencing service. No one ever said how big the $1000 genome had to be!
              Lol, but remember that this experiment was done on a pre-production machine, using only one lane (out of 2x25 per run) with about 100x coverage per strand. And the obvious advantage is the lack of amplification bias, not that you don't have to ligate linkers. And multipass readings are not the same as bidirectional reads. I guess we will see more in the coming days, but if they could come close to what they say, it will be hard times for SOLiD / Solexa systems to compete at the current reagent costs...


              • #8
                Originally posted by ECO View Post
                Something else I noticed: in their introduction they bad-mouth the library preparation protocols for all the other platforms, basically saying that adding adapters is labor intensive, etc., then they go on to prove that they absolutely MUST use adapters to get bidirectional reads because their error rates are so high.
                I noticed that too. I still think the killer for them is going to be the expensive optics. There are other ways of detecting really small amounts that don't require a million dollars in instrumentation, ya know?


                • #9
                  seen helicos data


                  Helicos seems to be so popular here
                  Well, bells and whistles about M13 was maybe not such a wise decision...

                  However, I am currently analyzing a data set I received from Helicos: a DGE study from a human tissue. I have to say, it looks pretty good.

                  Can't tell more here... NDA!

                  But to summarize: the biological results are absolutely comparable to those derived from Solexa! I think they are getting their act together.




                  • #10
                    Hi Klaus,

                    good to see that they are generating usable data with the Heliscope. Could you share any numbers from the sequencing or is that also under NDA?...


                    • #11
                      I'll see


                      I'll see what I can do. But not before next week. I can't access our secure servers from here...



                      • #12

                        what I can share are numbers from our first-step analysis after mapping. Mapping was very stringent: best unique hit
                        (= at least one shortest unique sub-sequence contained, point mutations allowed, no indels allowed).

                        Here is the summary:
                        The data set contains reads from the following organism:
                        Homo sapiens 4020914

                        Read length (bp) number
                        11 86
                        12 1333
                        13 6904
                        14 20384
                        15 49695
                        16 91159
                        17 131835
                        18 153064
                        19 166469
                        20 164352
                        21 169349
                        22 171943
                        23 178216
                        24 185147
                        25 210388
                        26 230526
                        27 245471
                        28 178388
                        29 168223
                        30 143917
                        31 135030
                        32 122484
                        33 113991
                        34 106351
                        35 98137
                        36 90642
                        37 83322
                        38 77510
                        39 70555
                        40 63349
                        41 56651
                        42 50235
                        43 44120
                        44 38896
                        45 34378
                        46 30472
                        47 26112
                        48 22580
                        49 18993
                        50 15407
                        51 12211
                        52 9602
                        53 7529
                        54 5801
                        55 4115
                        56 2990
                        57 2174
                        58 1517
                        59 1204
                        60 953
                        61 821
                        62 623
                        63 664
                        64 700
                        65 464
                        66 572
                        67 514
                        68 655
                        69 356
                        70 400
                        71 287
                        72 158
                        73 140
                        74 103
                        75 80
                        76 55
                        77 33
                        78 25
                        79 26
                        80 7
                        81 15
                        82 4
                        83 7
                        84 1
                        85 1
                        86 4
                        87 3
                        88 2
                        89 2
                        90 3
                        92 2
                        93 3
                        94 6
                        95 1
                        96 4
                        98 1
                        101 1
                        102 2
                        107 1
                        110 1
                        112 1
                        113 3
                        116 2
                        123 1

                        Genome composition:
                        Intergenic regions 1810570558 bp 58.8%
                        Promoters 44676168 bp 1.5%
                        Exons 97616725 bp 3.2%
                        Introns 1172232197 bp 38.1%

                        Read distribution:
                        Intergenic regions 1694883 42.2%
                        Promoters 325016 8.1%
                        Exon 1079293 26.8%
                        Intron 1167470 29.0%
                        Partial 79268 2.0%
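
Comparing the read distribution with the genome composition listed just above gives a rough sense of enrichment per annotation class. A minimal sketch in Python, using only the percentages quoted in this post; the dictionary names are illustrative, and since the read percentages add up to more than 100% the classes presumably overlap, so the ratios are indicative only:

```python
# Share of the genome per annotation class (percent, from the post).
genome_pct = {"intergenic": 58.8, "promoter": 1.5, "exon": 3.2, "intron": 38.1}

# Share of mapped reads per annotation class (percent, from the post).
read_pct = {"intergenic": 42.2, "promoter": 8.1, "exon": 26.8, "intron": 29.0}

# Enrichment = fraction of reads / fraction of genome.
# Values above 1 mean the class attracts more reads than its size suggests.
enrichment = {name: read_pct[name] / genome_pct[name] for name in genome_pct}

for name, ratio in sorted(enrichment.items(), key=lambda item: -item[1]):
    print(f"{name:<11} {ratio:5.2f}x")
```

Exons and promoters come out well above 1 and intergenic regions and introns below 1, which is the pattern one would hope for in a DGE data set.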


                        Next step: clustering
                        summary output:

                        Cluster detection:
                        window size: 100
                        reads/window: 7
                        probability.: 1.1e-10
                        clusters detected: 35496
                        reads in clusters: 2118299 52.68%
                        min. cluster length: 13
                        max. cluster length: 5876
                        avg. cluster length: 117
                        min. number of reads: 7
                        max. number of reads: 251937
                        avg. number of reads: 59

                        intergenic regions 10945 30.8%
                        promoters 3369 9.5%
                        exon 10501 29.6%
                        intron 8883 25.0%
                        partial 5167 14.6%
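
The post does not say how the probability of 1.1e-10 is derived. One plausible reading, and it is only an assumption, is a Poisson tail: the chance of seeing at least 7 reads in a 100 bp window if the 4020914 mapped reads were spread uniformly over the genome. The sketch below estimates the genome size from the intergenic figures quoted earlier in the post (1810570558 bp being 58.8%); the function name `poisson_tail` is illustrative. Under these assumptions the result is on the order of 1e-10, in line with the reported value.

```python
import math


def poisson_tail(lam, k_min, terms=60):
    """P(X >= k_min) for X ~ Poisson(lam).

    The upper tail is summed directly (in log space per term) instead of
    computing 1 - CDF, which would lose precision for tiny probabilities.
    """
    return sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(k_min, k_min + terms)
    )


mapped_reads = 4_020_914           # from the mapping summary
window_bp = 100                    # "window size"
min_reads = 7                      # "reads/window"

# Assumption: genome size back-calculated from "intergenic = 58.8%".
genome_bp = 1_810_570_558 / 0.588

# Expected number of read starts per window under a uniform background.
lam = mapped_reads * window_bp / genome_bp
p = poisson_tail(lam, min_reads)

print(f"expected reads per window: {lam:.4f}")
print(f"P(at least {min_reads} reads in a window): {p:.2e}")
```

This is a per-window probability with no correction for the number of windows tested, so it should be read as a threshold setting rather than a genome-wide false-positive rate.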


                        expression analysis:

                        analyzed transcripts: 85562
                        expressed transcripts: 72514 84.8%
                        normalized expression value (NE):
                        minimum: 0.000
                        maximum: 95.675
                        average: 0.061
                        analyzed loci: 32514
                        expressed loci: 26160 80.5%

                        NE Transcripts
                        (0.000:0.020] 48993
                        (0.020:0.040] 9557
                        (0.040:0.060] 4390
                        (0.060:0.080] 2294
                        (0.080:0.100] 1465
                        (0.100:0.120] 1020
                        (0.120:0.140] 707
                        (0.140:0.160] 576
                        (0.160:0.180] 453
                        (0.180:0.200] 353
                        (0.200:0.220] 245
                        (0.220:0.240] 251
                        (0.240:0.260] 174
                        (0.260:0.280] 166
                        (0.280:0.300] 131
                        (0.300:0.320] 121
                        (0.320:0.340] 100
                        (0.340:0.360] 69
                        (0.360:0.380] 81
                        (0.380:0.400] 107
                        (0.400:95.675] 1261


                        This was a very crude first analysis, run with all parameters at their defaults.
                        Mapping on our mapping station took 10 minutes
                        (the parameters for best unique hit are the least time-consuming).

                        The rest of the analysis took 7 minutes on GGA.




                        • #13

                          thanks for sharing the numbers, it sure looks promising. Was this data from one lane only?


                          • #14
                            The raw reads were pooled from two channels.

                            Again, this was a quick and dirty first pass. Mapped tag numbers can be increased significantly with more relaxed mapping parameters. However, downstream pathway mining of the expressed transcripts 100% confirms the biological context of the sample.



                            • #15
                              thanks for the Helicos data. What number of mismatches was allowed in the alignment?
                              Basically there was no limit on the number of point mutations allowed. The "unique best match" setting in our method works like this:

                              There is a tree with the shortest unique word for each position in the genome. Each shortest unique word matches exactly once in the genome. E.g. one starts with a tuple of 5 and checks uniqueness, extends it by one bp and checks uniqueness again (6, 7, 8, ...), and so on until the "word" is unique. SNPs are taken into account. The words in this library therefore have variable length.

                              For mapping, parameters can be introduced: point mutations and indels within those shortest unique words.

                              For "unique best match" none of the above is allowed (= most stringent). Reads from Helicos were checked for whether they contain at least one exact shortest unique word in full. Then, around this position, the alignment grows into the read in both directions. Here point mutations were allowed, with no limit imposed. During this growth, in this case, SNPs were not taken into account, so several of the observed point mutations may originate from SNPs.
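
A minimal sketch of this idea in Python, under stated assumptions: the function names are hypothetical, a plain dictionary stands in for the tree, SNPs are ignored when building the word library, and the extension is ungapped (no indels) with no cap on point mutations. It is meant to illustrate the described scheme on toy sequences, not to reproduce the actual mapper.

```python
from collections import Counter


def shortest_unique_words(genome, k_start=5):
    """Map each start position to the shortest substring starting there
    that occurs exactly once in the genome (positions without one are
    left out). Naive: recounts all k-mers for every word length k."""
    words = {}
    unresolved = set(range(len(genome)))
    k = k_start
    while unresolved and k <= len(genome):
        counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
        for pos in sorted(unresolved):
            if pos + k > len(genome):
                unresolved.discard(pos)   # ran off the end, no unique word
                continue
            word = genome[pos:pos + k]
            if counts[word] == 1:
                words[pos] = word
                unresolved.discard(pos)
        k += 1
    return words


def map_read(read, genome, words):
    """Seed with exact shortest unique words, then extend without gaps.

    Returns (genome_start, point_mutations) for the placement with the
    fewest mismatches, or None if the read contains no unique word."""
    index = {word: pos for pos, word in words.items()}  # each word is unique
    best = None
    for k in sorted({len(word) for word in index}):
        for r in range(len(read) - k + 1):
            g = index.get(read[r:r + k])
            if g is None:
                continue
            start = g - r                 # where the read would begin
            if start < 0 or start + len(read) > len(genome):
                continue                  # sketch: skip overhanging reads
            target = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, target))
            if best is None or mismatches < best[1]:
                best = (start, mismatches)
    return best


genome = "ACGTTGCAAGGCTTACCGATAGCTAAGTCC"
words = shortest_unique_words(genome)
read = "AGGCTTACCGAC"                     # genome[8:20] with the last base changed
print(map_read(read, genome, words))      # (8, 1)
```

On a repetitive toy sequence such as "ACGACGT" with `k_start=2`, the words grow to different lengths per position, which is the variable-length library the post describes. A real implementation would need a far more compact index than a per-length k-mer count.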

                              Very basic statistics:
                              Point mutations # of reads
                              0 509622
                              1 369486
                              2 318733
                              3 313244
                              4 297974
                              5 297301
                              6 298140
                              7 344822
                              8 233730
                              9 191911
                              10 153682
                              11 131893
                              12 113460
                              13 98302
                              14 82719
                              15 69071
                              16 57540
                              17 46855
                              18 37505
                              19 30710
                              20 24214

                              Keep in mind that we have read lengths up to 123 bp. The above numbers need to be normalized to read length and to the length and count of the shortest unique words contained.
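
As a quick consistency check on the table above, the counts can be summed and turned into a cumulative distribution. A small Python sketch using only the numbers quoted in this post (the variable names are illustrative); it does not attempt the per-read-length normalization, which would need the individual alignments:

```python
from itertools import accumulate

# Reads per number of point mutations (0..20), copied from the table.
reads_by_mutations = [
    509622, 369486, 318733, 313244, 297974, 297301, 298140,
    344822, 233730, 191911, 153682, 131893, 113460, 98302,
    82719, 69071, 57540, 46855, 37505, 30710, 24214,
]

total = sum(reads_by_mutations)

# cumulative[k] = share of reads with at most k point mutations.
cumulative = [count / total for count in accumulate(reads_by_mutations)]

print(f"total reads in table: {total}")
for k in (0, 1, 2, 5, 10):
    print(f"<= {k:2d} point mutations: {cumulative[k]:.1%}")
```

The total equals the 4020914 mapped reads reported in the earlier summary, so the table accounts for every mapped read.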

