Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lwhitmore
    Member
    • Aug 2013
    • 70

    Mosaik Aligner/IUPAC codes

    Hey All,
    I am currently trying to use Mosaik to align reads to a reference with IUPAC codes. When I set the mismatch rate to 0 (meaning no mismatches are allowed) reads do not align. Looking at reads specifically they should align if the IUPAC code alignments were not counted as mismatches. My question is if there is a way to align to IUPAC codes and them not be counted as mismatches.

    Any help would be appreciated
    Leanne
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    BBMap has a special mode enabled with the flag "semiperfectmode". In this mode, reads will only align if they have zero mismatches and zero indels, however ambiguous bases in the reference are NOT considered mismatches. Also, reads that align off the end of a scaffold, with no mismatches, are considered semiperfect. So semiperfect is defined as: "At least 50% of the bases in the read match the reference, there are no mismatches, no indels, and no ambiguous bases in the read". As a result this mode is useful in mapping to bad assemblies with very short contigs.

    There's also perfectmode, which is very fast but does not allow any ambiguous bases in the reference.

    Comment

    • lwhitmore
      Member
      • Aug 2013
      • 70

      #3
      Thanks for your response I will definitely check that aligner out (is it published?)

      Comment

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        I'm working on a publication; hopefully it will be ready in a few months. I do have a powerpoint comparing it to other aligners here.

        Comment

        • lwhitmore
          Member
          • Aug 2013
          • 70

          #5
          Ok cool, one more question I was able to successfully align 2 reads to an extremely small reference genome using the following command


          ./bbmap.sh ref=/Users/debjit/Documents/PROJECT_piRNA/7_ALIGNMENT2L1FAMS_LW/29Family_L1MdA_I_LW.fa in=/Users/debjit/Documents/PROJECT_piRNA/7_ALIGNMENT2L1FAMS_LW/SRR519779test_LW.fastq nodisk semiperfectmode=F

          The results (below) appear to be correct. I am just wondering what the Error rate is referring to? Is it the IUPAC base? And is there anyway to change the mismatch rate?? Thanks again for all your help!


          Set genScaffoldInfo=true
          Set genome to 1

          Loaded Reference: 0.005 seconds.
          Loading index for chunk 1-1, build 1
          Indexing threads started for block 0-1
          Indexing threads finished for block 0-1
          Generated Index: 0.451 seconds.
          Analyzed Index: 3.652 seconds.
          Cleared Memory: 0.242 seconds.
          Processing reads in single-ended mode.
          Started read stream.
          Started 4 mapping threads.
          Detecting finished threads: 0, 1, 2, 3

          ------------------ Results ------------------

          Genome: 1
          Key Length: 13
          Max Indel: 16000
          Minimum Score Ratio: 0.56
          Mapping Mode: normal
          Reads Used: 2 (52 bases)

          Mapping: 3.542 seconds.
          Reads/sec: 0.56
          kBases/sec: 0.01


          Read 1 data:

          mapped: 100.0000% 2 reads
          unambiguous: 0.0000% 0 reads


          perfect best site: 0.0000%
          semiperfect site: 0.0000%
          ambiguousMapping: 100.0000% (Kept)
          low-Q discards: 0.0000%

          Match Rate: 96.1538% 50
          Error Rate: 3.8462% 2
          Sub Rate: 3.8462% 2
          Del Rate: 0.0000% 0
          Ins Rate: 0.0000% 0
          N Rate: 0.0000% 0

          Total time: 7.984 seconds.

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Leanne,

            "error rate" includes substitutions, deletions, and insertions. In this case, you had 2 substitutions, which were 3.84% of the total input bases.

            Any IUPAC code will register in the "N Rate", which does NOT count as an error.

            Comment

            • lwhitmore
              Member
              • Aug 2013
              • 70

              #7
              Upon visualization these reads are not aligning to the IUPAC base at all. I should mention that I tried setting the semiperfectmode=T and got no alignments.

              Again any help with this would be appreciated

              Comment

              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                Leanne,

                The alignments reported are the best alignments. If there are no alignments with zero mismatches and at least 50% of the bases matching (not including IUPAC symbols), it will report nothing in semiperfectmode. Can you post the two reads and the reference?

                If there are a lot of ambiguous bases in the reference to which you which to map, it is possible that there were no seeds, since you are using extremely short 25bp reads and the seed length is set at 13 by default. You can try running with the flag "k=7" to decrease the seed length to 7. There must be at least one seed hit to find the candidate mapping site.

                Comment

                • lwhitmore
                  Member
                  • Aug 2013
                  • 70

                  #9
                  Sure,
                  >Reference
                  CGACTCGAGACTCGAGCCCCGGGCTACCTTGCCAGCAGAGTCTTGCCCAACACCCGCAAGGGTCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGATCACAGTGCCTGCCCCAATCCAATCGCGCGGAACTTGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCCTGATCTGGGGCACAAGTCCCTTCCGCTCGACTCGWGACTCGAGCCCCGGGCTACCTTGCCAGCAGAGTCTTGCCCAACACCCGCAAGGGTCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGATCACAGTGCCTGCCCCAATCCAATCGCGCGGAACTYGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCCTGATCTGGGGCACAAGTCCCTTCCGCTCGACTCGAGACTCGAGCCCCGGGCTACCTTGMCAGCAGAGTCTTGCCCAACACCCGCAAGGGCCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGAACACAGCGCCTACCCCAATCCAATCGCGTGGAACTTGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCTTGATCTGGGGCACAAACCCCTTCCACTCCACTCGAGCCCCGGCTACCTTGCCAGCTGAGTCGCCTGACACCCGCAAGGGCCCACACAGGATTCCACACGTGATCCTAAGACCTCTAGTGAGTGGAACACAACTTCTGCCAGGAGTCTGGTTCGAACACCAGATATCTGGGTACCTGCCTTGCAAGAAGAGAGCTTGCCTGCAGAGAATACTCTGCCCACTGAAACTAAGGAGAGTGCTACCCTCCAGGTCTGCTCATAGAGGCTAACAGAGTCACCTGAAGAACAAGCTCTTAACAGTGACAACTAAAACAGCTAGCTTCAGAGATTACCAGATGGCGAAAGGCAAACGTAAGAATCCTACTAACAGAAATCAAGACCACTCACCATCATCAGAACGCAGCACTCCCACCCCACCTAGTCCTGGGCACCCCAACACAACCGAAAATCTAGACCCAGATTTAAAAACATTTCTCATGATGATGATAGAGGACATCAAGAAGGACTTTCATAAGTCACTTAAAGATTTACAGGAGAGCACTGCTAAAGAGTTACAGGCTCTTAAAGAAAAGCAGGAAAACACAGCCAAACAGGTGATGGAAATGAACAAAACCATACTAGAACTAAAAGGGGAAGTAGACACAATAAAGAAAACCCAAAGCGAGGCAACGCTGGAGATAGAAACCCTAGGAAAGAGATCTGGAACCATAGATGCGAGCATCAGCAACAGAATACAAGAAATGGAAGAGAGAATCTCAGGTGCAGAAGATTCCATAGAGAACATCGACACAACAGTCAAAGAAAATACAAAATGCAAAAGGATCCTAACTCAAAACATCCAGGTAATCCAGGACACAATGAGAAGACCAAACCTACGGATAATAGGAATTGATGAGAATGAAGATTTTCAACTTAAAGGGCCAGCTAATATCTTCAACAAAATAATAGAAGAAAACTTCCCAAACATAAAAAAAGAGATGCCCATGATCATACAAGAAGCATACAGAACTCCAAATAGACTGGACCAGAAAAGAAATTCCTCCCGACACATAATAATCAGAACAACAAATGCACTAAATAAAGATAGAATATTAAAAGCAGTAAGGGAGAAAGGTCAAGTAACATATAAAGGAAGGCCTATCAGAATTACACCAGACTTTTCACCAGAGACTATGAAAGCCAGAAGAGCCTGGACAGATGTTATACAGACACTAAGAGAACACAAATGCCAGCCCAGGCTACTATACCCGGCCAAACTCTCAATTACCATAGATGGAGAAACCAAAGTATTCCACGACAAAACCAAGTTCACACAATATCTTTCCACGAATCCAGCCCTTCAAAGGATAATAACAGAAAAGAAGCAATACAAGGACGGAAATCACGCCCTAGAACAACCAAGAAAGTAATCATTCAACAAACCAAAAAGAAGACAGCCACAAGAACAGAATGCCAACTCTAACAACAAAAATAAAAGGGAGCAACAATTACTTTTCCTTAATATCTCTTAATATCAATGGACTCAATTCCCCAATAAAAAGACATAGACTAACAGACTGGCTACACAAACAGGACCCAACATTCTGCTGCTTACAGGAAACCCATCTCAGGGAAAAAGACAGACACTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGACTGAAGAAACAAGCTGGAGTAGCCATTTTAATATCGGATAAAATCGACTTCCAACCCAAAGTTATCAAAAAAGACAAGGAGGGACACTTCATACTCATCAAAGGTAAAATCCTCCAAGAGGAACTCTCAATTCTGAATATCTACGCACCAAATGCAAGGGCAGCCACATTCATTAGAGACACTTTAGTAAAGCTCAAAGCATACATTGCACCTCACACAATAATAGTGGGAGACTTCAACACACCACTTTCTTCAAAGGACAGATCGTGGAAACAGAAACTAAACAGGGACACAGTGAAACTAACAGAAGTTATGAAACAAATGGACCTGACAGATATCTACAGAACATTTTATCCTAAAACAAAAGGATATACCTTCTTCTCAGCACCTCACGGGACCTTCTCCAAAATTGACCATATAATTGGTCACAAAACAGGCCTCAATAGATACAAAAATATTGAAATTGTCCCATGTATCCTATCAGACCACCATGGCCTAAGACTGATCTTCAATAACAACATAAATAATGGAAAGCCAACATTCACGTGGAAACTGAATAACACTCTTCTCAATGATACCTTGGTCAAGGAAGGAATAAAGAAAGAAATTAAAGACTTTTTAGAGTTTAATGAAAATGAAGCCACAACGTACCCAAACCTATGGGACACAATGAAAGCATTTCTAAGAGGGAAACTCATAGCGCTGAGTGCCTCCAAGAAGAAACGGGAGACAGCACATACTAGCAGCTTGACAACACATCTAAAAGCCCTAGAAAAAAAGGAAGCAAATTCACCCAAGAGGAGTAGACGGCAGGAAATAATCAAACTCAGGGGTGAAATCAACCAAGTGGAAACAAGAAGAACTATTCAAAGAATTAACCAAACGAGGAGTTGGTTCTTTGAGAAAATCAACAAGATAGATAAACCCTTAGCTAGACTCACTAAAGGGCACAGGGACAAAATCCTAATTAACAAAATCAGAAATGAAAAGGGAGACATAACAACAGATCCTGAAGAAATCCAAAACACCATCAGATCCTTCTACAAAAGGCTATACTCAACAAAACTGGAAAACCTGGACGAAATGGACAAATTTCTGGACAGATACCAGGTACCAAAGTTGAATCAGGATCAAGTTGACCATCTAAACAGTCCCATATCACCTAAAGAAATAGAAGCAGTTATTAATAGTCTCCCAACCAAAAAAAGCCCAGGACCAGATGGGTTTAGTGCAGAGTTCTATCAGACCTTCAAAGAAGATCTAATTCCAATTCTGCACAAACTATTTCACAAAATAGAAGTAGAAGGTACTCTACCCAACTCATTTTATGAAGCCACTATTACTCTGATACCTAAACCACAGAAAGATCCAACAAAGATAGAGAACTTCAGACCAATTTCTCTTATGAATATCGATGCAAAAATCCTCAATAAAATTCTCGCTAACCGAATCCAAGAACACATTAAAGCAATCATCCATCCTGACCAAGTAGGTTTTATTCCAGGGATGCAGGGATGGTTTAATATACGAAAATCCATCAATGTAATCCATTATATAAACAAACTCAAAGACAAAAACCACATGATCATCTCGTTAGATGCAGAAAAAGCATTTGACAAGATCCAACACCCATTCATGATAAAAGTTTTGGAAAGATCAGGAATTCAAGGCCCATACCTAAACATGATAAAAGCAATCTACAGCAAACCAGTAGCCAACATCAAAGTAAATGGAGAGAAGCTGGAAGCAATCCCACTAAAATCAGGGACTAGACAAGGCTGCCCACTTTCTCCCTACCTTTTCAACATAGTACTTGAAGTATTAGCCAGAGCAATTCGACAACAAAAGGAGATCAAGGGGATACAAATTGGAAAAGAGGAAGTCAAAATATCACTTTTTGCAGATGATATGATAGTATATATAAGTGACCCTAAAAATTCTACCAGAGAACTCCTAAACCTGATAAACAGCTTCGGTGAAGTAGCTGGATATAAAATAAACTCAAACAAGTCAATGGCCTTTCTCTATACAAAGAATAAACAGGCTGAGAAAGAAATTAGGGAAACAACACCCTTCTCAATAGTCACAAATAATATAAAATATCTTGGCGTGACTCTAACTAAGGAGGTGAAAGATCTGTATGATAAAAACTTCAAATCTCTGAAGAAAGAAATTAAAGAAGATCTCAGAAGATGGAAAGATCTCCCATGCTCATGGATTGGCAGGATCAACATTGTAAAAATGGCTATCTTGCCAAAAGCAATCTACAGATTCAATGCAATCCCCATCAAAATTCCAACTCAATTCTTCAACGAATTGGAAGGAGCAATTTGCAAATTTGTCTGGAATAACAAAAAACCTAGGATAGCAAAAAGTCTTCTCAAGGATAAAAGAACTTCTGGCGGAATCACCATGCCAGACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATGGTACTGGTATAGAGACAGACAAGTAGACCAATGGAATAGAATTGAAGATCCAGAAATGAACCCACACACCTATGGTCACTTGATCTTCGACAAGGGAGCTAAAACCATCCAGTGGAAGAAAGACAGCATTTTCAACAATTGGTGCTGGCACAACTGGTTGTTATCGTGTAGAAGAATGCGAATCGATCCATACTTATCTCCTTGTACTAAGGTCAAATCTAAGTGGATCAAGGAACTTCACATAAAACCAGAGACACTGAAACTTATAGAGGAGAAAGTGGGGAAAAGCCTTGAAGATATGGGCACAGGGGAAAAATTCCTGAACAGAACAGCAATGGCTTGTGCTGTAAGATCGAGAATCGACAAATGGGACCTAATGAAACTCCAAAGTTTCTGCAAGGCAAAAGACACCGTCAATAAGACAAAAAGACCACCAACAGATTGGGAAAGGATCTTTACCTATCCTAAATCAGATAGGGGACTAATATCCAACATATATAAAGAACTCAAGAAGGTGGACTTCAGAAAATCAAATAACCCCATTAAAAAATGGGGCTCAGAACTGAACAAAGAATTCTCACCTGAGGAATACCGAATGGCAGAGAAGCACTTGAAAAAATGTTCAACATCCTTAATCATCAGGGAAATGCAAATCAAAACAACCCTGAGATTCCACCTCACACCAGTCAGAATGGCTAAGATCAAAAATTCAGGTGACAGCAGATGCTGGCGAGGATGTGGAGAAAGAGGAACACTCCTCCATTGTTGGTGGGAGTGCAGGCTTGTACAACCACTCTGGAAATCAGTCTGGCGGTTCCTCAGAAAACTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCCCAACAGGTAAGAAGGACACATGCTCCACTATGTTCATAGCAGCCTTATTTATAATAGCCAGAAGCTGGAAAGAACCTAGATGCCCCTCAACAGAGGAATGGATACAGAAAATGTGGTACATCTACACAATGGAGTACTACTCAGCTATTAAAAAGAATGAATTTATGAAATTCCTAGCCAAATGGATGGACCTGGAGGGCATCATCCTGAGTGAGGTAACACATTCACAAAGAAACTCACACAATATGTATTCACTGATAAGTGGATATTAGCCCCAAACCTAGGATACCCAAGATATAAGATATAATTTGCTAAACACATGAAACTCAAGGAGAATGAAGACTGAAGTGTGGACACTATGCCCCTCCTTAGATTTGGGAACAAAACACCCATGGAAGGAGTTACAGAGACGGAGTTTGGAGCTGAGATGAAAGGATGGACCATGTAGAGACTGCCATAGCCAGGGATCCACCCCATAATCAGCATCCAAACGCTGACACCATTGCATACACTAGCAAGATTTTATTGAAAGGACGCAGATGTAGCTGTCTCTTGTGAGACTATGCCGGGGCCCAGCAAACACAGAAGTGGATGCTCACAGTCAGCTAATGGATGGATCATAGGGCTCCCAATGGAGGAGCTAGAGAAAGTAGCCAAGGAGCTAAAGGGATCTGCAACCCTATAGGTGGAACAACATTATGAGCTAACCAGTACCCCGGAGCTCTTGACTCTAGCTGCATATATATCAAAAGATGGCCTAGTCGGCCATCACTGGAAAGAGAGGCCCATTGGACTTGCAAACTTTATATGCCCCAGTACAGGGGAATACCAGGGCCAAAAAGGGGGAGTGGGTGGGCAGGGGAGTGGGGGTGGGTGGATATGGGGGACTTTTGGTATAGCATTGGAAATGTAAATGAGTTAAATACCTAATAAAAAATGGAAAAAAA



                  READS
                  @SRR519779.1063095ILLUMINA-8787AF:52:FC:1:29:10379:5418length=36
                  TACCTTGACAGCAGAGTCTTGCCCAAC
                  +SRR519779.1063095ILLUMINA-8787AF:52:FC:1:29:10379:5418length=36
                  IHIIIAIIHHIIIIGIIIGGGHIIDII
                  @SRR519779.2989698ILLUMINA-8787AF:52:FC:1:81:1791:16837length=36
                  ACAGCAGAGTCTTGCCCAACACCCG
                  +SRR519779.2989698ILLUMINA-8787AF:52:FC:1:81:1791:16837length=36
                  IIIIIIIIIGIIIIIIIBIIIIIII


                  These two reads should align to the M base in the reference
                  I did try decreasing the flag to k=7 but that did not change anything
                  Thanks!

                  Comment

                  • lwhitmore
                    Member
                    • Aug 2013
                    • 70

                    #10
                    Brian,
                    Just so you know it is the 8th base in the first read and the first base in the second base that should align to the M.

                    Comment

                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      #11
                      Leanne,

                      OK, looks like you found a bug The IUPAC characters are being counted as mismatches instead of Ns. I will fix that, and the next version of BBMap I upload will have the fix in place.

                      If you manually change the 'M' to an 'N', then semiperfectmode=T will map to the location with the 'N'.

                      Thanks, you now get your name in the BBMap credits

                      -Brian

                      Comment

                      • lwhitmore
                        Member
                        • Aug 2013
                        • 70

                        #12
                        Ok cool! Glad I could help!
                        If I change the M to an N that will mean any base can align at that base rather than just an A or a C, correct?


                        Let me know when your next version comes out

                        Thanks
                        Leanne

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          #13
                          Originally posted by lwhitmore View Post
                          Ok cool! Glad I could help!
                          If I change the M to an N that will mean any base can align at that base rather than just an A or a C, correct?
                          That's correct; I do not know of any aligner that handles the full complement of ambiguity codes as they are defined.

                          Let me know when your next version comes out

                          Thanks
                          Leanne
                          Will do.

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #14
                            Leanne,

                            It's fixed; the new version (32.06) will classify an alignment to an IUPAC code as an 'N' rather than a substitution, so they will be preferred in mapping.

                            Comment

                            • lwhitmore
                              Member
                              • Aug 2013
                              • 70

                              #15
                              ok cool
                              So this might be something with my computer but whenever I try and run the shell script bbmap.sh i get the following message
                              -bash: ./bbmap.sh: /bin/bash^M: bad interpreter: No such file or directory

                              this didn't happen with the last version
                              Thought I would just let you know
                              Thanks again!
                              Leanne

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 10:09 AM
                              0 responses
                              10 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              26 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              21 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...