  • mido1951
    Senior Member
    • May 2014
    • 123

    Error rate in BBMAP

    Hello,
    I want to know the equation for the error rate in BBMap.
    How do you calculate that rate?
    Thank you for your reply.
    @Brian?
    Last edited by mido1951; 02-18-2016, 01:35 PM.
  • mido1951
    Senior Member
    • May 2014
    • 123

    #2
    Any response to this?


    • HESmith
      Senior Member
      • Oct 2009
      • 512

      #3
      What do you mean by "equation error rate"? Are you asking what fraction of reads are aligned to the incorrect loci? Or how errors affect the alignment accuracy? Or something else?


      • mido1951
        Senior Member
        • May 2014
        • 123

        #4
        No, I want to know how you get the error rate.
        What distance is used to get the error rate?
        Formally, how is the error rate expressed in BBMap?

        Code:
        Error Rate:              23.1932%          281318         3.9989% (this)           24419957


        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          Again, to what error rate are you referring? There are multiple errors that can be measured, such as the two examples I provided.

          What command(s) did you use to obtain the line of code in your previous post?
          Last edited by HESmith; 02-18-2016, 04:57 PM.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Hmmm, I think mido is referring to the stderr output of BBMap after it finishes running. Those columns are:

            (name), (% of reads with any errors), (number of reads with any errors), (% of bases with any errors), (number of bases with any errors).
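As a minimal sketch, assuming the stderr line has exactly the five fields listed above (a name ending in a colon, then percent and count of reads with errors, then percent and count of bases with errors), it could be parsed like this. The names `parse_rate_line`, `RateLine` and `implied_denominators` are hypothetical, not part of BBMap; the sample line is the one posted earlier in the thread with the forum's bold/italic markup removed. Dividing a count by its percentage backs out the total the percentage was taken over; this is only approximate because the printed percentages are rounded, and for the base column the total is presumably alignment operations rather than read bases.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed layout (hypothetical pattern, inferred from the thread):
#   name:  %reads-with-errors  count  %bases-with-errors  count
LINE_RE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+"
    r"(?P<read_pct>[\d.]+)%\s+(?P<read_count>\d+)\s+"
    r"(?P<base_pct>[\d.]+)%\s+(?P<base_count>\d+)\s*$"
)


@dataclass
class RateLine:
    name: str
    read_pct: float   # % of reads with any errors
    read_count: int   # number of reads with any errors
    base_pct: float   # % of bases with any errors
    base_count: int   # number of bases with any errors

    def implied_denominators(self) -> tuple[float, float]:
        """Totals the two percentages were taken over (approximate: rounded %)."""
        return (self.read_count / (self.read_pct / 100),
                self.base_count / (self.base_pct / 100))


def parse_rate_line(line: str) -> RateLine:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return RateLine(
        name=m["name"].strip(),
        read_pct=float(m["read_pct"]),
        read_count=int(m["read_count"]),
        base_pct=float(m["base_pct"]),
        base_count=int(m["base_count"]),
    )


line = "Error Rate:              23.1932%          281318         3.9989%           24419957"
rec = parse_rate_line(line)
reads, ops = rec.implied_denominators()
print(rec)
print(f"~{reads:,.0f} reads, ~{ops:,.0f} operations in the denominators")
```

The regex is deliberately strict, so a line with a different layout raises an error instead of being silently misread.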

            Please note that the value can be a bit misleading if a lot of reads are mapped with long deletions. For Illumina reads, it's better to look at the substitution rate unless you reduce BBMap's "maxindel" flag from its default of 16000 down to a much lower value, perhaps 100.

            The way it is calculated is based on the number of total alignment operations and number of matching alignment operations. Internally, when BBMap aligns a read to a reference, it supports 5 operations:

            M: Match
            S: Substitution (a base in the read differs from the reference)
            I: Insertion (a base in the read not present in the reference)
            D: Deletion (a base in the reference not present in the read)
            N: No-call (undefined in the read or the reference)

            These roughly correspond to cigar strings, but cigar strings do not have an equivalent of the "N" symbol, and they do have a lot of strange, poorly-defined symbols, rendering them not very useful in computation.
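To make the correspondence concrete, here is a minimal sketch that run-length encodes a match string written with the symbols from this post (lowercase `m` for a match, plus `S`, `I`, `D`, `N`) into an extended CIGAR string. The function name is hypothetical. CIGAR has no no-call operation, so mapping `N` to `X` (mismatch) is an assumption; `S` also becomes `X`, because in CIGAR `S` means soft clipping and `N` means a skipped reference region. The SAM specification requires an explicit length before every operation, including runs of length 1.

```python
from itertools import groupby

# Assumed mapping from match-string symbols to extended CIGAR operations.
# N -> X is a choice, not a standard: CIGAR cannot express a no-call.
TO_CIGAR = {"m": "=", "S": "X", "I": "I", "D": "D", "N": "X"}


def match_string_to_cigar(match: str) -> str:
    """Run-length encode a match string as an extended CIGAR string."""
    try:
        ops = [TO_CIGAR[c] for c in match]
    except KeyError as e:
        raise ValueError(f"unsupported symbol {e.args[0]!r}") from None
    # groupby collapses consecutive identical ops into (op, run) pairs.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


print(match_string_to_cigar("mmmmSmmmmImmmmDDDDDDmmmmN"))  # 4=1X4=1I4=6D4=1X
```

Because both `S` and `N` map to `X`, an adjacent substitution and no-call merge into a single `2X` run, which is exactly the information loss described above.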

            Simply, the sub rate is calculated as S/(M+S+I+D+N). The del rate is D/(M+S+I+D+N), and so forth. The error rate is (S+I+D+N)/(M+S+I+D+N).
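The formulas can be written as a small function. This is a sketch of the arithmetic as described in the post, not BBMap's code; the name `alignment_rates` is hypothetical, and every operation counts once (unit weight). The usage line also shows why long deletions can make the value misleading: each deleted reference base is one D operation.

```python
from __future__ import annotations

# Match, Substitution, Insertion, Deletion, No-call; each op counts once.
OPS = ("M", "S", "I", "D", "N")


def alignment_rates(counts: dict[str, int]) -> dict[str, float]:
    """Return S, I, D, N rates and the overall error rate.

    All rates share the denominator M+S+I+D+N; the error rate is
    (S+I+D+N)/(M+S+I+D+N). Missing keys are treated as zero.
    """
    total = sum(counts.get(op, 0) for op in OPS)
    if total == 0:
        raise ValueError("no alignment operations")
    out = {op: counts.get(op, 0) / total for op in OPS[1:]}
    out["error"] = sum(counts.get(op, 0) for op in OPS[1:]) / total
    return out


# Hypothetical read: 150 matches, 1 substitution, and a single
# 1,000-base deletion. The deletion dominates the error rate.
r = alignment_rates({"M": 150, "S": 1, "D": 1000})
print(f"sub rate {r['S']:.2%}, error rate {r['error']:.1%}")
# sub rate 0.09%, error rate 87.0%
```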

            For example, this match string:

            mmmmSmmmmImmmmDDDDDDmmmmN

            ...has 16 matches, 1 sub, 1 insertion, 6 deletions, and 1 N, for 25 total operations. The cigar string would be 4=1X4=1I4=6D4=1X, or something like that (the specification is not fully defined). In this case the sub rate would be 4%, ins rate 4%, del rate 24%, and N rate 4%, giving an error rate of (1+1+6+1)/(16+1+1+6+1)=36%.
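The tally in this example can be checked mechanically. A minimal sketch, assuming the symbols used in the post (lowercase `m` for a match, one symbol per operation); `tally` is a hypothetical helper, not a BBMap function.

```python
from collections import Counter

# Symbols as used in this thread: "m" = match; S, I, D, N as listed above.
SYMBOLS = {"m": "M", "S": "S", "I": "I", "D": "D", "N": "N"}


def tally(match: str) -> Counter:
    """Count alignment operations in a match string (one per symbol)."""
    unknown = set(match) - set(SYMBOLS)
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    return Counter(SYMBOLS[c] for c in match)


counts = tally("mmmmSmmmmImmmmDDDDDDmmmmN")
total = sum(counts.values())
errors = total - counts["M"]  # S + I + D + N
print(dict(counts), total)    # {'M': 16, 'S': 1, 'I': 1, 'D': 6, 'N': 1} 25
print(f"error rate = {errors}/{total} = {errors / total:.0%}")  # 9/25 = 36%
```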
            Last edited by Brian Bushnell; 02-18-2016, 08:21 PM.


            • mido1951
              Senior Member
              • May 2014
              • 123

              #7
              Originally posted by Brian Bushnell
              (name), (% of reads with any errors), (number of reads with any errors), (% of bases with any errors), (number of bases with any errors).
              Thank you, Brian, for your explanations.
              I mapped my reads with BBMap and looked at the error rate (% of bases with any errors), because you told me the other day that we must take this error rate into account.
              I am looking for how you calculate that error rate and what the equation is, because I need to put it in my research.
              Thank you.


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                Since you mostly appear to be working with MinION data, is this question about data aligned with mapPacBio.sh?

                If you mapped your MinION data with regular bbmap.sh then the errors could be different.


                • mido1951
                  Senior Member
                  • May 2014
                  • 123

                  #9
                  Originally posted by GenoMax
                  Since you mostly appear to be working with MinION data is this question for data aligned with mapPacBio.sh?

                  If you mapped your MinION data with regular bbmap.sh then the errors could be different.
                  I did not use mapPacBio.sh because I have MinION reads.
                  But I have corrected reads, I mapped these reads to the reference genome, and I see the error rate.
                  I have to report the error rate in my research, and I cannot report an error rate without giving the equation behind it.


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    You should have used mapPacBio.sh with raw MinION reads, but that is not the main point here.

                    @Brian did provide an explanation of how the error rate is calculated above (with the equation). The numbers we see in the final output must be an average across all mapped reads.


                    • mido1951
                      Senior Member
                      • May 2014
                      • 123

                      #11
                      So the global error rate equation is:
                      error rate = (S+I+D+N)/(M+S+I+D+N)?


                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        Yes, I think so. Per @Brian:

                        The way it is calculated is based on the number of total alignment operations and number of matching alignment operations.


                        • mido1951
                          Senior Member
                          • May 2014
                          • 123

                          #13
                          What's the distance used for each of M, D, I, ...?


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            @Brian will have to chime in with a final word, but I think the rate is an average across all alignment operations, as he indicated above. Each read will have its own M, S, I, D, N values from its CIGAR string.
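"Average" can mean two different things here, and they can disagree. A minimal sketch contrasting pooling (sum every read's operations, then divide once) with averaging each read's own rate. The thread does not settle which one BBMap reports; the column description "% of bases with any errors" reads more like pooling, but treat that as an assumption. The per-read counts below are invented purely for illustration.

```python
from statistics import mean

# Hypothetical per-read operation counts; not real data.
reads = [
    {"M": 95, "S": 3, "I": 1, "D": 1, "N": 0},            # 100 ops, 5 errors
    {"M": 9_500, "S": 300, "I": 100, "D": 100, "N": 0},   # 10,000 ops, 500 errors
    {"M": 40, "S": 5, "I": 2, "D": 3, "N": 0},            # 50 ops, 10 errors
]


def error_ops(r: dict) -> int:
    return r["S"] + r["I"] + r["D"] + r["N"]


def total_ops(r: dict) -> int:
    return r["M"] + error_ops(r)


# Pooled: every alignment operation weighs the same, so long reads dominate.
pooled = sum(map(error_ops, reads)) / sum(map(total_ops, reads))
# Per-read mean: every read weighs the same, regardless of its length.
per_read = mean(error_ops(r) / total_ops(r) for r in reads)

print(f"pooled {pooled:.2%}, mean of per-read rates {per_read:.2%}")
# pooled 5.07%, mean of per-read rates 10.00%
```

With reads of very different lengths, as is typical for MinION data, the choice between the two can change the reported number substantially.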


                            • mido1951
                              Senior Member
                              • May 2014
                              • 123

                              #15
                              Is the distance M = N = I = D = S = 1?
                              Any response, Brian?
                              Last edited by mido1951; 02-19-2016, 03:22 PM.

