Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lyw1
    Member
    • Sep 2014
    • 33

    #46
    Hi Brian,

    I download the newest edition (34.44) and test. It showed the same problem.

    Exception in thread "main" java.lang.RuntimeException: Unknown parameter restrictleft=25
    at jgi.BBDukF.<init>(BBDukF.java:427)
    at jgi.BBDukF.main(BBDukF.java:66)

    Thanks,

    Zheng

    Comment

    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #47
      Zheng,

      You appear to have downloaded an old version by mistake. The latest is now 34.22, and there is no 34.44 (only a 33.44); so please download 34.22 - I have verified that it is working correctly.

      Comment

      • lyw1
        Member
        • Sep 2014
        • 33

        #48
        Hi Brian,

        Some sequence has quality problelm. It may have one or two "N" or one nucleotide showed wrongly. Is it possible to count this kind sequence as the correct sequence? For example, aaggctctggattacaggat is the reference sequence. Now, is it possible to count aaggctcNggattacaggat to the same group of the reference sequence?

        Thank you,

        Zheng

        Comment

        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          #49
          Hi Zheng,

          Yes, you can do this with the "hdist=1" flag, which will allow up to 1 substitution error.

          -Brian

          Comment

          • lyw1
            Member
            • Sep 2014
            • 33

            #50
            Thanks, Brian.

            Currently the max of hdist is 3. Is there a way to allow hdist=6?

            Comment

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #51
              Zheng,

              A high value of hdist requires exponentially more time and memory to load the reference, but you can us an arbitrarily high value if you want, by adding the -da flag, which disables assertions.

              For example:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=6 -da

              However, the program will generate a factor of hdist^(3*K) additional kmers - for k=25 and hdist=6, taking 177,978,515,625 times as much time and memory to build the index, so that won't work. hdist=4 would probably still work though if you gave it enough memory.

              One alternative is "meet in the middle". I added a new flag "qhdist" which stands for "query hamming distance", which mutates the query kmers instead of the reference kmers. So you would get equivalent results from the above command and this command:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=3 qhdist=3

              ...but the second command would require 421,875 times less memory, so it's feasible to run. It might be very slow, though, as it would also run 421,875 times slower than normal.

              There is one last alternative, though - I added support in BBDuk for degenerate bases at specific locations. You can use it like this:

              bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined

              With the "copyundefined" flag, a copy of each reference sequence will be made representing every valid combination of defined letter. So instead of increasing memory or time use by 6^75, it only increases them by 4^6 or 4096 which is completely reasonable, but it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility - for example, hdist=2 qhdist=1 and 3 Ns in the reference would allow a hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.

              Edit: It might be fun to write something that handles arbitrary motifs with degenerate bases and a set number of mismatches, with no time or memory penalty. Maybe I'll do that; I'll let you know.
              Last edited by Brian Bushnell; 03-26-2015, 05:51 PM.

              Comment

              • lyw1
                Member
                • Sep 2014
                • 33

                #52
                Thanks, Brian. Very helpful.

                Comment

                • lyw1
                  Member
                  • Sep 2014
                  • 33

                  #53
                  comfuse about result

                  Hi Brian,

                  The total sequence length is 26 bp.

                  I used following command to get sequence with starting sequence "CACTTCTATAGT".

                  bbduk.sh -Xmx1g in=7_S7_L001_R1_001.fastq outm=7_S7_L001_R1_001matched.fq outu=7_S7_L001_R1_001unmatched.fastq restrictleft=12 k=12 literal=CACTTCTATAGT

                  Then I counted seqeunce frequency with following command,

                  kmercountexact.sh in=7_S7_L001_R1_001matched.fastq outk=7_S7_L001_R1_001_24new.txt mincount=26 k=26

                  From the first command, I saw every sequence include CACTTCTATAGT at leftest side when I opened 7_S7_L001_R1_001matched.fq

                  However, 7_S7_L001_R1_001_24new.txt result surprised me. Most of them did not include CACTTCTATAGT at its leftest side.

                  >32
                  CCCCCCCCCCCGCCACTATAGAAGTG
                  >30
                  TCCGGGGCTTCCCCACTATAGAAGTG
                  >72
                  CACTTCTATAGTGGGGAACCACCGTT
                  >101
                  CACTTCTATAGTGGGGATTTCCCCTT
                  >36
                  TGGGGAATTCCCTAACTATAGAAGTG
                  >40
                  TTGGGGAAGCCCCCACTATAGAAGTG
                  >33
                  TATGGGGAATCCCCACTATAGAAGTG
                  >47
                  TGGGGCTTTCCCATACTATAGAAGTG
                  >46
                  TAGGGGATTTCCCTACTATAGAAGTG
                  >27
                  CACTTCTATAGTGGGGAAGTACCGAT
                  >28
                  CACTTCTATAGTCGGTGGTTCCCCTT
                  >34
                  TCCCTTCTATTTCTACTATAGAAGTG
                  >231
                  TAGGTGCTTCCCCTACTATAGAAGTG
                  >52
                  TTGGGGGAAACCCCACTATAGAAGTG
                  >59
                  TCGGGATTTCCCCAACTATAGAAGTG
                  >43
                  TCGGGAAAACCCCCACTATAGAAGTG
                  >45
                  TTGGGATTTCCCCCACTATAGAAGTG
                  >58
                  CACTTCTATAGTGGGGAATCCCCCTT
                  >26
                  TCCCCCACCGTCCCACTATAGAAGTG
                  >61
                  CACTTCTATAGTAGGGAAACCACCTT
                  >26
                  TCCCCACTGGAGCTACTATAGAAGTG
                  >65
                  CGGGAATCCCCTAAACTATAGAAGTG
                  >46
                  TCGGGGATGCACCTACTATAGAAGTG

                  Please help. Thanks.

                  Zheng

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #54
                    Hi Zheng,

                    There are 3 compounding issues here that make things a little confusing. First, BBDuk, by default looks for both a kmer and its reverse-complement. You can disable this behavior with "rcomp=f" (though that probably won't affect anything in this case). Second, BBDuk also ignores the middle base of a kmer in order to increase sensitivity. You can disable this with "mm=f" (which stands for maskmiddle). Lastly, and most important in this case, kmercountexact stores only a kmer or its reverse complement, whichever is alphabetically higher. I forgot to add that to the help, but this too can be disabled with "rcomp=f". This shouldn't change the counts in your case, but it will make the orientation of the output the same as the input.

                    It looks like all of the sequences you posted either start with "CACTTCTATAGT" or end with "ACTATAGAAGTG", which means the 26-mer was stored as its reverse-complement.
                    Last edited by Brian Bushnell; 04-27-2015, 07:08 PM.

                    Comment

                    • lyw1
                      Member
                      • Sep 2014
                      • 33

                      #55
                      Thanks, Brian.

                      for command,
                      bbduk.sh -Xmx1g in=7_S7_L001_R1_001matched.fq out=7_S7_L001_R1_001matched_trimmed.fq ftr=2
                      The out file has nothing. ftr=10 and it works. Some sequences has a couple N at right side because of poor quality. How can I remove 2 bp at right side of the sequence?

                      Zheng

                      Comment

                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #56
                        Hi Zheng,

                        "ftr=2" will remove all bases after position 2 in the read (zero-based), meaning all reads will end up 3bp long, which will make them get discarded due to the default minlength=10. To remove the last 2 bases, you need "ftr2=2" instead. And you can set "minlength=0" if you don't want any reads to be discarded.

                        -Brian

                        Comment

                        • lyw1
                          Member
                          • Sep 2014
                          • 33

                          #57
                          Hi Brian,

                          Executing jgi.BBDukF [-Xmx1g, in=7_S7_L001_R1_001matched.fq, out=7_S7_L001_R1_001matched_trimmed.fq, ftr2=2]

                          Exception in thread "main" java.lang.RuntimeException: Unknown parameter ftr2=2
                          at jgi.BBDukF.<init>(BBDukF.java:392)
                          at jgi.BBDukF.main(BBDukF.java:62)

                          Is anything wrong here?

                          It seems it works to remove last 2 bp from 26 sequence by ftr=23?

                          Thanks,

                          Comment

                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #58
                            Zheng,

                            ftr=23 would work. The reason ftr2=2 failed is because you have an old version of BBTools; I added it fairly recently.

                            -Brian

                            Comment

                            • lyw1
                              Member
                              • Sep 2014
                              • 33

                              #59
                              Thanks, Brian. I test new version of BB Tools and it works.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                New Genomics Tools and Methods Shared at AGBT 2025
                                by seqadmin


                                This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                The Headliner
                                The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                03-03-2025, 01:39 PM
                              • seqadmin
                                Investigating the Gut Microbiome Through Diet and Spatial Biology
                                by seqadmin




                                The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                                02-24-2025, 06:31 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Today, 05:03 AM
                              0 responses
                              15 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 07:27 AM
                              0 responses
                              13 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-18-2025, 12:50 PM
                              0 responses
                              15 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-03-2025, 01:15 PM
                              0 responses
                              185 views
                              0 reactions
                              Last Post seqadmin  
                              Working...