Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    Hi Brian,

    I download the newest edition (34.44) and test. It showed the same problem.

    Exception in thread "main" java.lang.RuntimeException: Unknown parameter restrictleft=25
    at jgi.BBDukF.<init>(BBDukF.java:427)
    at jgi.BBDukF.main(BBDukF.java:66)

    Thanks,

    Zheng

    Comment


    • #47
      Zheng,

      You appear to have downloaded an old version by mistake. The latest is now 34.22, and there is no 34.44 (only a 33.44); so please download 34.22 - I have verified that it is working correctly.

      Comment


      • #48
        Hi Brian,

        Some sequence has quality problelm. It may have one or two "N" or one nucleotide showed wrongly. Is it possible to count this kind sequence as the correct sequence? For example, aaggctctggattacaggat is the reference sequence. Now, is it possible to count aaggctcNggattacaggat to the same group of the reference sequence?

        Thank you,

        Zheng

        Comment


        • #49
          Hi Zheng,

          Yes, you can do this with the "hdist=1" flag, which will allow up to 1 substitution error.

          -Brian

          Comment


          • #50
            Thanks, Brian.

            Currently the max of hdist is 3. Is there a way to allow hdist=6?

            Comment


            • #51
              Zheng,

              A high value of hdist requires exponentially more time and memory to load the reference, but you can us an arbitrarily high value if you want, by adding the -da flag, which disables assertions.

              For example:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=6 -da

              However, the program will generate a factor of hdist^(3*K) additional kmers - for k=25 and hdist=6, taking 177,978,515,625 times as much time and memory to build the index, so that won't work. hdist=4 would probably still work though if you gave it enough memory.

              One alternative is "meet in the middle". I added a new flag "qhdist" which stands for "query hamming distance", which mutates the query kmers instead of the reference kmers. So you would get equivalent results from the above command and this command:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=3 qhdist=3

              ...but the second command would require 421,875 times less memory, so it's feasible to run. It might be very slow, though, as it would also run 421,875 times slower than normal.

              There is one last alternative, though - I added support in BBDuk for degenerate bases at specific locations. You can use it like this:

              bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined

              With the "copyundefined" flag, a copy of each reference sequence will be made representing every valid combination of defined letter. So instead of increasing memory or time use by 6^75, it only increases them by 4^6 or 4096 which is completely reasonable, but it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility - for example, hdist=2 qhdist=1 and 3 Ns in the reference would allow a hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.

              Edit: It might be fun to write something that handles arbitrary motifs with degenerate bases and a set number of mismatches, with no time or memory penalty. Maybe I'll do that; I'll let you know.
              Last edited by Brian Bushnell; 03-26-2015, 05:51 PM.

              Comment


              • #52
                Thanks, Brian. Very helpful.

                Comment


                • #53
                  comfuse about result

                  Hi Brian,

                  The total sequence length is 26 bp.

                  I used following command to get sequence with starting sequence "CACTTCTATAGT".

                  bbduk.sh -Xmx1g in=7_S7_L001_R1_001.fastq outm=7_S7_L001_R1_001matched.fq outu=7_S7_L001_R1_001unmatched.fastq restrictleft=12 k=12 literal=CACTTCTATAGT

                  Then I counted seqeunce frequency with following command,

                  kmercountexact.sh in=7_S7_L001_R1_001matched.fastq outk=7_S7_L001_R1_001_24new.txt mincount=26 k=26

                  From the first command, I saw every sequence include CACTTCTATAGT at leftest side when I opened 7_S7_L001_R1_001matched.fq

                  However, 7_S7_L001_R1_001_24new.txt result surprised me. Most of them did not include CACTTCTATAGT at its leftest side.

                  >32
                  CCCCCCCCCCCGCCACTATAGAAGTG
                  >30
                  TCCGGGGCTTCCCCACTATAGAAGTG
                  >72
                  CACTTCTATAGTGGGGAACCACCGTT
                  >101
                  CACTTCTATAGTGGGGATTTCCCCTT
                  >36
                  TGGGGAATTCCCTAACTATAGAAGTG
                  >40
                  TTGGGGAAGCCCCCACTATAGAAGTG
                  >33
                  TATGGGGAATCCCCACTATAGAAGTG
                  >47
                  TGGGGCTTTCCCATACTATAGAAGTG
                  >46
                  TAGGGGATTTCCCTACTATAGAAGTG
                  >27
                  CACTTCTATAGTGGGGAAGTACCGAT
                  >28
                  CACTTCTATAGTCGGTGGTTCCCCTT
                  >34
                  TCCCTTCTATTTCTACTATAGAAGTG
                  >231
                  TAGGTGCTTCCCCTACTATAGAAGTG
                  >52
                  TTGGGGGAAACCCCACTATAGAAGTG
                  >59
                  TCGGGATTTCCCCAACTATAGAAGTG
                  >43
                  TCGGGAAAACCCCCACTATAGAAGTG
                  >45
                  TTGGGATTTCCCCCACTATAGAAGTG
                  >58
                  CACTTCTATAGTGGGGAATCCCCCTT
                  >26
                  TCCCCCACCGTCCCACTATAGAAGTG
                  >61
                  CACTTCTATAGTAGGGAAACCACCTT
                  >26
                  TCCCCACTGGAGCTACTATAGAAGTG
                  >65
                  CGGGAATCCCCTAAACTATAGAAGTG
                  >46
                  TCGGGGATGCACCTACTATAGAAGTG

                  Please help. Thanks.

                  Zheng

                  Comment


                  • #54
                    Hi Zheng,

                    There are 3 compounding issues here that make things a little confusing. First, BBDuk, by default looks for both a kmer and its reverse-complement. You can disable this behavior with "rcomp=f" (though that probably won't affect anything in this case). Second, BBDuk also ignores the middle base of a kmer in order to increase sensitivity. You can disable this with "mm=f" (which stands for maskmiddle). Lastly, and most important in this case, kmercountexact stores only a kmer or its reverse complement, whichever is alphabetically higher. I forgot to add that to the help, but this too can be disabled with "rcomp=f". This shouldn't change the counts in your case, but it will make the orientation of the output the same as the input.

                    It looks like all of the sequences you posted either start with "CACTTCTATAGT" or end with "ACTATAGAAGTG", which means the 26-mer was stored as its reverse-complement.
                    Last edited by Brian Bushnell; 04-27-2015, 07:08 PM.

                    Comment


                    • #55
                      Thanks, Brian.

                      for command,
                      bbduk.sh -Xmx1g in=7_S7_L001_R1_001matched.fq out=7_S7_L001_R1_001matched_trimmed.fq ftr=2
                      The out file has nothing. ftr=10 and it works. Some sequences has a couple N at right side because of poor quality. How can I remove 2 bp at right side of the sequence?

                      Zheng

                      Comment


                      • #56
                        Hi Zheng,

                        "ftr=2" will remove all bases after position 2 in the read (zero-based), meaning all reads will end up 3bp long, which will make them get discarded due to the default minlength=10. To remove the last 2 bases, you need "ftr2=2" instead. And you can set "minlength=0" if you don't want any reads to be discarded.

                        -Brian

                        Comment


                        • #57
                          Hi Brian,

                          Executing jgi.BBDukF [-Xmx1g, in=7_S7_L001_R1_001matched.fq, out=7_S7_L001_R1_001matched_trimmed.fq, ftr2=2]

                          Exception in thread "main" java.lang.RuntimeException: Unknown parameter ftr2=2
                          at jgi.BBDukF.<init>(BBDukF.java:392)
                          at jgi.BBDukF.main(BBDukF.java:62)

                          Is anything wrong here?

                          It seems it works to remove last 2 bp from 26 sequence by ftr=23?

                          Thanks,

                          Comment


                          • #58
                            Zheng,

                            ftr=23 would work. The reason ftr2=2 failed is because you have an old version of BBTools; I added it fairly recently.

                            -Brian

                            Comment


                            • #59
                              Thanks, Brian. I test new version of BB Tools and it works.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Understanding Genetic Influence on Infectious Disease
                                by seqadmin




                                During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

                                Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
                                09-09-2024, 10:59 AM
                              • seqadmin
                                Addressing Off-Target Effects in CRISPR Technologies
                                by seqadmin






                                The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
                                08-27-2024, 04:44 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Today, 02:44 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 09-06-2024, 08:02 AM
                              0 responses
                              143 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 09-03-2024, 08:30 AM
                              0 responses
                              150 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 08-27-2024, 04:40 AM
                              0 responses
                              158 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X