  • mido1951
    Senior Member
    • May 2014
    • 123

    Find error rate in raw reads

    Hi,
    How can I find the overall error rate of my raw reads, and the separate rates of substitutions, deletions, and insertions?
    Thanks
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Look at @Brian's post (#18) here. You would either need a reference or you will have to assemble your data into a reference.

    • mido1951
      Senior Member
      • May 2014
      • 123

      #3
      I used BBMap for mapping. Is it a good tool?
      Here is my result:
      Code:
         ------------------   Results   ------------------
      
      Genome:                 1
      Key Length:             13
      Max Indel:              16000
      Minimum Score Ratio:    0.56
      Mapping Mode:           normal
      Reads Used:             108490  (52257243 bases)
      
      Mapping:                4967.598 seconds.
      Reads/sec:              21.84
      kBases/sec:             10.52
      
      
      Read 1 data:            pct reads       num reads       pct bases          num bases
      
      mapped:                  29.4433%           31943        29.9910%           15672446
      unambiguous:             29.0607%           31528        29.6165%           15476766
      ambiguous:                0.3825%             415         0.3745%             195680
      low-Q discards:           0.0000%               0         0.0000%                  0
      
      perfect best site:        0.0046%               5         0.0002%                126
      semiperfect site:         0.0046%               5         0.0002%                126
      
      Match Rate:                   NA               NA        85.4575%           14050523
      Error Rate:              40.9037%           31938        14.5421%            2390951
      Sub Rate:                40.9024%           31937         7.2366%            1189814
      Del Rate:                40.7756%           31838         4.6777%             769083
      Ins Rate:                40.7103%           31787         2.6278%             432054
      N Rate:                   0.0013%               1         0.0003%                 55
      
      Total time:             5126.959 seconds.
      In this example, is the overall error rate 40.9037% (per read) or 14.5421% (per base)?
      And likewise for Sub, Del, and Ins?
      Thanks

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        BBMap is a great tool but it needs to be applied appropriately.

        You need to provide additional information to get more help: what kind of data this is, what you are mapping against, and the command-line options you used. BTW, only about 30% of your reads are mapping, which is low to begin with (if you are mapping against a reference).

        • mido1951
          Senior Member
          • May 2014
          • 123

          #5
          Yes, I am mapping my raw reads to the reference, and I used the default parameters for BBMap.
          In this example, is the overall error rate 40.9037% (per read) or 14.5421% (per base)?
          And likewise for Sub, Del, and Ins?
          Thanks

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            It appears that you did not read post #18 in the thread I linked above, which explains how to plot the rates you are looking for.

            If you are working with MinION or PacBio type data (long reads), then you should be using mapPacBio.sh instead of bbmap.sh. That error rate may not be meaningful as it stands now.

            To calculate the error rate (for long reads) you may have to do something like this:

            Code:
            $ mapPacBio.sh in=your_reads.fa ref=ref.fa mhist=mhist.txt qhist=qhist.txt maxlen=2000
            @Brian is likely to swing by this thread later tonight (and may have specific suggestions). That example above was for PacBio data but I assume it may work for MinION data.

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              Well, I'm not really sure what those reads are, at an average length of ~481bp. Probably PacBio, though, considering the ~14.5% error rate.
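
For anyone wondering where the ~481bp figure comes from, here is a minimal sketch that derives it from the "Reads Used" line and the "mapped" row of the results in post #3. The variable names are only illustrative.

```python
# Totals copied from the BBMap results in post #3.
reads_used = 108490
bases_used = 52257243
mapped_reads = 31943
mapped_bases = 15672446

# Mean length over all input reads, and over the mapped reads only.
mean_read_length = bases_used / reads_used
mean_mapped_length = mapped_bases / mapped_reads

# Share of input reads that mapped (the "pct reads" value of the "mapped" row).
mapped_read_pct = 100.0 * mapped_reads / reads_used

print(f"mean read length:   {mean_read_length:.1f} bp")
print(f"mean mapped length: {mean_mapped_length:.1f} bp")
print(f"mapped reads:       {mapped_read_pct:.4f}%")
```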

              As GenoMax said, you should map PacBio (or MinION) reads with mapPacBio.sh. The usage and algorithm are the same as bbmap.sh, but it is designed for the PacBio error model.

              The error rates you want are in the "pct bases" column.
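
To make the "pct bases" column concrete, here is a minimal sketch that recomputes the per-base rates from the "num bases" counts in post #3. The denominator (match + sub + del + ins + N bases) is an assumption, not something taken from the BBMap documentation; it is used here because it appears to reproduce the printed percentages. Note that the sub, del, and ins counts add up exactly to the "Error Rate" count.

```python
# Base counts copied from the "num bases" column in post #3.
counts = {
    "match": 14050523,
    "sub": 1189814,
    "del": 769083,
    "ins": 432054,
    "n": 55,
}

# Errors are substitutions + deletions + insertions (2390951, as in the "Error Rate" row).
error_bases = counts["sub"] + counts["del"] + counts["ins"]

# ASSUMPTION: per-base rates are taken over all aligned bases, including N calls.
aligned_bases = sum(counts.values())


def pct(n):
    """Return n as a percentage of all aligned bases."""
    return 100.0 * n / aligned_bases


rates = {
    "error": pct(error_bases),
    "sub": pct(counts["sub"]),
    "del": pct(counts["del"]),
    "ins": pct(counts["ins"]),
}

for name, value in rates.items():
    print(f"{name:>5}: {value:.4f}%")
```

The "pct reads" column answers a different question (the share of reads that contain at least one such event), so it is not the per-base error rate.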

              • mido1951
                Senior Member
                • May 2014
                • 123

                #8
                I work with MinION reads with an average length of ~5000 bp.
                Should I use mapPacBio.sh for this?

                Sorry, to confirm: for the error rate, I should use the "pct bases" column?
                Thanks

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Originally posted by mido1951
                  I work with MinION reads with an average length of ~5000 bp.
                  Should I use mapPacBio.sh for this?
                  Yes. You may also need higher-than-default sensitivity if the data is particularly low quality; you can adjust sensitivity with the "minid" flag. E.g., "minid=0.5" will try to map reads down to 50% identity.

                  Originally posted by mido1951
                  Sorry, to confirm: for the error rate, I should use the "pct bases" column?
                  That's correct.
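
If you want to pull the "pct bases" values out of the printed results automatically, something like the sketch below may help. It assumes the row layout shown in post #3 (name, pct reads, num reads, pct bases, num bases); the regular expression and the `per_base_rates` helper are illustrative and not part of BBMap.

```python
import re

# A few rows copied from the results in post #3.
SAMPLE = """\
Match Rate:                   NA               NA        85.4575%           14050523
Error Rate:              40.9037%           31938        14.5421%            2390951
Sub Rate:                40.9024%           31937         7.2366%            1189814
Del Rate:                40.7756%           31838         4.6777%             769083
Ins Rate:                40.7103%           31787         2.6278%             432054
"""

# One "... Rate:" row: name, pct reads, num reads, pct bases, num bases.
ROW = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?Rate):\s+"
    r"(?P<pct_reads>\S+)\s+(?P<num_reads>\S+)\s+"
    r"(?P<pct_bases>[\d.]+)%\s+(?P<num_bases>\d+)\s*$"
)


def per_base_rates(text):
    """Map each '... Rate' row name to its 'pct bases' value (as a float)."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rates[m.group("name")] = float(m.group("pct_bases"))
    return rates


rates = per_base_rates(SAMPLE)
for name, value in rates.items():
    print(f"{name}: {value}% of bases")
```

Lines that do not match the row pattern (headers, timing lines, blank lines) are simply skipped, so the full results block can be passed in as is.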

                  • mido1951
                    Senior Member
                    • May 2014
                    • 123

                    #10
                    I used mapPacBio.sh and mapPacBio8k.sh, but they take much longer to run than bbmap.sh.
                    Is that expected?

                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      #11
                      Yep, mapPacBio is slower, because it supports higher sensitivity and longer reads. Note that "mapPacBio8k.sh" is not in the latest release, so you may be using an older version that might also be somewhat slower.
