  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    Let's talk about ONT nanopore stuff!

    Per the request here, it seems time to create this forum! I'm really excited to see where this data goes and when I can get my hands on a MinION!
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Originally posted by ECO
    Per the request here, it seems time to create this forum! I'm really excited to see where this data goes and when I can get my hands on a MinION!
    More like "get my hand on a MinION"; they're tiny!

    That said, it's not entirely clear to me what users are allowed to disclose about their results, but I can describe my methodology for evaluating the data. I used both the 1D and 2D reads (converted to FASTQ) and mapped them with this command line:

    Code:
    mapPacBio.sh -Xmx30g k=7 in=reads.fastq ref=reference.fa maxreadlen=1000 minlen=200 idtag ow int=f qin=33 mhist=mhist1.txt idhist=idhist1.txt ehist=ehist1.txt indelhist=indelhist1.txt lhist=lhist1.txt gchist=gchist1.txt qhist=qhist1.txt qahist=qahist1.txt bhist=bhist1.txt out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400 nodisk bs=bs1.sh
    Then I pasted the histograms into Excel and examined their scatterplots. This command breaks reads longer than 1 kbp into 1 kbp pieces and maps them independently; you can set this higher (up to 6 kbp), but the mapping rate drops as the shred length increases. The output is in the same order as the input, so you can determine the mapped read length by counting the consecutive SAM lines with the same read name (the pieces get a name suffix of _1, _2, etc.) that map to consecutive genomic positions.

    If you run the resulting "bs1.sh" bash script and have samtools installed, it will turn the SAM output into a sorted, indexed BAM file ready for IGV.
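The read-length bookkeeping described above can also be scripted instead of counted by eye. The following is a minimal Python sketch, not part of BBMap: it assumes the `_1`, `_2` name suffixes described above, a piece length of 1000 (matching `maxreadlen=1000`), and a hypothetical tolerance (`slack`) for how far consecutive pieces may drift because of indels. It skips secondary and supplementary lines and reports, per read, the longest run of pieces that map to consecutive positions on the same contig and strand.

```python
from typing import Dict, Iterable, Optional, Tuple


def base_name(qname: str) -> str:
    """Strip a trailing _<number> shred suffix (assumed format, e.g. read7_2 -> read7)."""
    head, sep, tail = qname.rpartition("_")
    return head if sep and tail.isdigit() else qname


def longest_runs(sam_lines: Iterable[str], piece_len: int = 1000,
                 slack: int = 200) -> Dict[str, int]:
    """Longest run of consecutive, collinear mapped pieces for each original read.

    Relies on the SAM output being in input order, so pieces of one read are adjacent.
    piece_len and slack are illustrative defaults, not BBMap settings.
    """
    best: Dict[str, int] = {}
    prev: Optional[Tuple[str, str, bool, int]] = None  # (read, contig, reverse, pos)
    run = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x900:
            continue  # ignore secondary (0x100) and supplementary (0x800) records
        read = base_name(qname)
        best.setdefault(read, 0)
        if flag & 0x4:
            prev, run = None, 0  # an unmapped piece breaks the run
            continue
        reverse = bool(flag & 0x10)
        if prev is not None and prev[:3] == (read, rname, reverse):
            # On the reverse strand, later pieces of the read map to lower coordinates.
            step = prev[3] - pos if reverse else pos - prev[3]
            run = run + 1 if abs(step - piece_len) <= slack else 1
        else:
            run = 1
        prev = (read, rname, reverse, pos)
        best[read] = max(best[read], run)
    return best
```

Multiplying a run by the piece length gives only a rough mapped length, and read names that already end in `_<digits>` would be mis-grouped by `base_name`.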
    Last edited by Brian Bushnell; 09-25-2014, 05:44 PM.


    • WhatsOEver
      Senior Member
      • Apr 2012
      • 215

      #3
      Originally posted by Brian Bushnell
      This command breaks reads over 1kbp into 1kbp pieces and maps them independently; you can set this higher (up to 6kbp) but the mapping rate drops as the shred length increases.
      Can you explain why it drops? Too many errors in the alignment? Breaking the reads into small fragments sounds like a step backwards to me.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        So is this a one-off observation?

        Can people chime in (with non-specific comments, if they can't talk about specifics) if their experience is in-line/unlike the paper above?


        • NextGenSeq
          Senior Member
          • Apr 2009
          • 482

          #5
          By the time people post results the data is already obsolete. The protocols and software change every week if not sooner.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            @NextGenSeq: Are you implying that data from one week can't be trusted the next?


            • NextGenSeq
              Senior Member
              • Apr 2009
              • 482

              #7
              No, it is improving every week.

              If I were PacBio I would be worried.


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                Originally posted by WhatsOEver
                Can you explain why it drops? Too many errors in the alignment? Breaking the reads into small fragments sounds like a step backwards to me.
                The Nanopore reads I've seen have a sort of 'bistable' error model - lower for a while, then higher for a while, then lower for a while, etc. The higher-error mode is harder to map. Breaking the reads into pieces allows mapping the lower-error-mode pieces and discarding the higher-error-mode pieces; the shorter the piece, the more likely it will be entirely within a lower-error-mode region.
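To make the piece-length argument concrete, here is a small Python sketch with an invented read layout (the segment lengths are hypothetical, not measured from any MinION run). It shreds the read into fixed-length pieces and counts how many fall entirely inside a low-error block, treating those as the mappable ones. This is a simplification: real error-mode boundaries are not known in advance, and some pieces that straddle a boundary may still map.

```python
from typing import List, Tuple


def shred(read_len: int, piece_len: int) -> List[Tuple[int, int]]:
    """Split [0, read_len) into consecutive pieces; the last one may be shorter."""
    return [(s, min(s + piece_len, read_len)) for s in range(0, read_len, piece_len)]


def clean_fraction(segments: List[Tuple[int, bool]], piece_len: int) -> float:
    """Fraction of pieces lying entirely inside a low-error segment.

    segments: (length, is_low_error) blocks in read order (hypothetical layout).
    """
    low_intervals = []
    pos = 0
    for length, is_low in segments:
        if is_low:
            low_intervals.append((pos, pos + length))
        pos += length
    pieces = shred(pos, piece_len)
    clean = sum(
        1 for start, end in pieces
        if any(lo <= start and end <= hi for lo, hi in low_intervals)
    )
    return clean / len(pieces)


# Hypothetical 7 kbp read whose low- and high-error blocks alternate
READ = [(1500, True), (800, False), (2500, True), (700, False), (1500, True)]

for piece_len in (500, 1000, 2000, 7000):
    print(piece_len, round(clean_fraction(READ, piece_len), 2))
# 500 0.71
# 1000 0.43
# 2000 0.25
# 7000 0.0
```

In this toy layout the fraction of clean pieces falls steadily as the pieces get longer, and the unshredded read contains no clean piece at all.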


                • samanta
                  Senior Member
                  • Feb 2010
                  • 108

                  #9
                  About a month back, one of my collaborators asked me to look into Oxford Nanopore, because she was planning a large sequencing project with Illumina+PacBio and wanted to know whether waiting for ONT would save her money. She had heard good things from another colleague about the portability of MinIONs and was curious. I am not involved in the early access program, so I looked for any publicly available information. Based on what I found, I believe the company is advised by incompetent scientists, who are giving it a bad reputation.

                  My personal background: I have been working on nanotechnology since 1993, wrote the first (and highly cited) paper on calculating electrical current through small organic molecules in 1995, and worked with the NASA Nanotech group for several years in the early 2000s before moving on to genomics. At NASA, one of my closest collaborators worked on nanopore sequencing and another worked on computational modeling of current flow through the pore. However, I was never directly involved in either project, mainly because of the signal quality from the pores. So the first thing I wanted to find out about ONT was the error rate, because the electrical signal from molecules moving at room temperature tends to be noisy. This is basic quantum (and statistical) physics, which no amount of technology can overcome.


                  The error rate is very important in deciding about assembly projects. It is definitely possible to do assembly from long erroneous reads, but you will need more reads and that means your costs go up. At the end of the day, my collaborator is interested in comparative costs between various technologies.


                  I tried to find a straight answer for over a month and could not. For example, Michael Schatz, who is in the early access program, posted a figure showing 'assembly from nanopore' on Twitter. When I asked him about the error rate, he gave a philosophical answer: 'I do not care, because assembly is possible, as long as there is more signal than noise'. WTF? Based on his slides from a recent conference (see here), he had the numbers, but decided to stonewall. Then I learned that the assembly was done with nanopore+ILMN (hybrid), whereas PacBio assemblies are done with PacBio only. Nor did I get a straight answer about the error rate from Nick Loman, another scientist working closely with the ONT CEO to release data. Those frustrations led me to write this blog post about the company:

                  An infinite amount of propaganda being spread about Oxford Nanopore really troubles lowly janitors like us. We are not sure why this company and its ‘fanboys’ operate through innuendo passed around social media channels and do not deliver any real information, as other respectable companies would. Hopefully Nick Loman’s presentation tomorrow will be backed by the release of some real data, but until then, scientists’ job is not to cheerlead for companies but to find out the truth and represent it faithfully and accurately. The scientific community is failing to play its proper role, just as it failed to debunk Ewan Birney’s misleading media campaign about ‘killing the junk DNA’ (@ENCODE_NIH). In fact, in the case of ENCODE, scientists were so married to the propaganda from Birney and friends that even reputed journals attacked Dan Graur for simply telling the truth.



                  The situation seems to have improved somewhat after the company allowed Nick Loman to release his data (check our blog for the link), and Michael Schatz posted his slides with the kind of information one needs to make decisions:

                  At last we get the analysis of Oxford Nanopore data that we had been looking for since the first day. Michael Schatz posted the GI2014 slides of James Gurtowski from his lab on his website.

                  Hopefully, others will look at the data and come up with an objective answer about what is and is not possible. The technology shows promise, but the error rate is a critical concern.
                  http://homolog.us


                  • samanta
                    Senior Member
                    • Feb 2010
                    • 108

                    #10
                    Originally posted by Brian Bushnell
                    The Nanopore reads I've seen have a sort of 'bistable' error model - lower for a while, then higher for a while, then lower for a while, etc. The higher-error mode is harder to map. Breaking the reads into pieces allows mapping the lower-error-mode pieces and discarding the higher-error-mode pieces; the shorter the piece, the more likely it will be entirely within a lower-error-mode region.
                    That is possibly due to the molecule moving through the pore at different speeds, and the HMM (Viterbi) calculation for base-calling being fixed at one mode and miscalling in the other mode.

                    This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
                    http://homolog.us


                    • NextGenSeq
                      Senior Member
                      • Apr 2009
                      • 482

                      #11
                      Originally posted by samanta
                      Hopefully, others will take a look at the data and come up with an objective answer regarding what is possible and not possible. The technology has promises, but error rate is a critical concern.
                      There's a paper in press claiming that combining ONT data with Illumina data improves assembly quality tenfold.


                      • samanta
                        Senior Member
                        • Feb 2010
                        • 108

                        #12
                        Tenfold compared to what?

                        Check page 13 of Michael Schatz's slides I posted here.

                        At last we get the analysis of Oxford Nanopore data that we had been looking for since the first day. Michael Schatz posted the GI2014 slides of James Gurtowski from his lab on his website.


                        Illumina alone: N50 = 59 kbp
                        Illumina + Nanopore: N50 = 362 kbp
                        Illumina + PacBio: N50 = 811 kbp
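Since the exchange turns on what "tenfold" is measured against, here is a short Python sketch: a standard N50 function (the contig length L such that contigs of length L or longer contain at least half of the assembled bases) plus the fold changes implied by the three N50 values quoted from the slides. The ratios use only those numbers and say nothing about which comparison the in-press paper makes.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# N50 values (kbp) quoted from the GI2014 slides
illumina_only = 59
hybrids = {"Illumina + Nanopore": 362, "Illumina + PacBio": 811}

for name, value in hybrids.items():
    print(f"{name}: {value / illumina_only:.1f}x the Illumina-only N50")
# Illumina + Nanopore: 6.1x the Illumina-only N50
# Illumina + PacBio: 13.7x the Illumina-only N50
```

By these slide numbers, the Nanopore hybrid improves N50 about sixfold over Illumina alone, while the PacBio hybrid's N50 is about 2.2 times that of the Nanopore hybrid.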

                        So my collaborator would lose by going from PacBio to Nanopore. Moreover, the promise of carrying a USB stick into the field does not hold if she also has to carry a 90 kg Illumina machine.
                        http://homolog.us


                        • seqqeq
                          Junior Member
                          • Nov 2009
                          • 3

                          #13
                          Originally posted by samanta
                          That is possibly due to the molecule moving through the pore at different speeds, and the HMM (Viterbi) calculation for base-calling being fixed at one mode and miscalling in the other mode.

                          This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
                          The binary error mode is something I would expect from HMM basecalling. Not all levels are equally distinguishable. In difficult regions, once you get a base wrong, all the following bases have to be consistent with it, so they are wrong too. You get a string of very wrong calls, and only later recover to consistently correct calls.

                          Systematic error is likely to be troubling.
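The consistency effect described above can be seen in a toy k-mer HMM. This is a minimal Python sketch under loudly stated assumptions: 2-mers instead of the longer k-mers real pore models use, invented current levels (`LEVELS`), Gaussian noise with a made-up `SIGMA`, and uniform transitions between overlapping k-mers. It is not ONT's basecaller. Because each state may only move to a k-mer that shares K-1 bases with it, every decoded path is self-consistent, so one wrong state choice tends to drag neighbouring calls along with it rather than staying isolated.

```python
import itertools
import math

ALPHABET = "ACGT"
K = 2  # toy k-mer length; real pore models use longer k-mers

# Hypothetical mean current level for each k-mer, invented for illustration only.
KMERS = ["".join(p) for p in itertools.product(ALPHABET, repeat=K)]
LEVELS = {kmer: 50.0 + 2.5 * i for i, kmer in enumerate(KMERS)}
SIGMA = 1.0  # assumed Gaussian noise on each event


def log_emission(kmer: str, x: float) -> float:
    """Gaussian log-density of observing level x while kmer is in the pore."""
    z = (x - LEVELS[kmer]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def successors(kmer: str) -> list:
    """K-mers reachable in one step: drop the first base, append any base."""
    return [kmer[1:] + b for b in ALPHABET]


def viterbi_basecall(events: list) -> str:
    """Most likely base sequence for a non-empty list of event levels."""
    log_trans = math.log(1.0 / len(ALPHABET))  # uniform over the allowed successors
    # score[k] = best log-probability of any path ending in k-mer k
    score = {k: -math.log(len(KMERS)) + log_emission(k, events[0]) for k in KMERS}
    back = []  # back[t][k] = best predecessor of k-mer k at step t + 1
    for x in events[1:]:
        new_score = {k: -math.inf for k in KMERS}
        pointers = {}
        for prev, s in score.items():
            for nxt in successors(prev):
                cand = s + log_trans + log_emission(nxt, x)
                if cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointers[nxt] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final k-mer.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # Consecutive k-mers overlap by K-1 bases, so each later step adds one base.
    return path[0] + "".join(k[-1] for k in path[1:])
```

With noise-free levels as input, the true sequence is recovered; adding noise to events whose k-mer levels lie close together is where runs of miscalls would be expected to appear.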


                          • NextGenSeq
                            Senior Member
                            • Apr 2009
                            • 482

                            #14
                            Originally posted by samanta
                            ten fold compared to what?
                            So my collaborator would lose by going from PacBio to Nanopore. Moreover, the promise of carrying a USB stick into the field does not hold if she also has to carry a 90 kg Illumina machine.
                            Versus a 2-ton PacBio instrument?

                            Anyway, read the paper when it comes out. I can't post further info about it.

                            There is data, not yet publicly available, showing over 99% accuracy for ONT data aligned to reference genomes.


                            • robp
                              Member
                              • Aug 2013
                              • 13

                              #15
                              Originally posted by samanta
                              That is possibly due to the molecule moving through the pore at different speeds, and the HMM (Viterbi) calculation for base-calling being fixed at one mode and miscalling in the other mode.

                              This thing is definitely a physicist's paradise and would give rise to interesting physics papers, similar to what we used to do on current transport during the early 1990s.
                              I also think an algorithm for dealing with this would give rise to a very interesting CS paper. I'd be willing to bet that changes in molecular speed affect the resulting signal in detectable ways, and that modifying the underlying HMM to account for this is possible. ONP base-calling definitely seems like an interesting computational problem.
