Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    Hi Brian,

    I download the newest edition (34.44) and test. It showed the same problem.

    Exception in thread "main" java.lang.RuntimeException: Unknown parameter restrictleft=25
    at jgi.BBDukF.<init>(BBDukF.java:427)
    at jgi.BBDukF.main(BBDukF.java:66)

    Thanks,

    Zheng

    Comment


    • #47
      Zheng,

      You appear to have downloaded an old version by mistake. The latest is now 34.22, and there is no 34.44 (only a 33.44); so please download 34.22 - I have verified that it is working correctly.

      Comment


      • #48
        Hi Brian,

        Some sequence has quality problelm. It may have one or two "N" or one nucleotide showed wrongly. Is it possible to count this kind sequence as the correct sequence? For example, aaggctctggattacaggat is the reference sequence. Now, is it possible to count aaggctcNggattacaggat to the same group of the reference sequence?

        Thank you,

        Zheng

        Comment


        • #49
          Hi Zheng,

          Yes, you can do this with the "hdist=1" flag, which will allow up to 1 substitution error.

          -Brian

          Comment


          • #50
            Thanks, Brian.

            Currently the max of hdist is 3. Is there a way to allow hdist=6?

            Comment


            • #51
              Zheng,

              A high value of hdist requires exponentially more time and memory to load the reference, but you can us an arbitrarily high value if you want, by adding the -da flag, which disables assertions.

              For example:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=6 -da

              However, the program will generate a factor of hdist^(3*K) additional kmers - for k=25 and hdist=6, taking 177,978,515,625 times as much time and memory to build the index, so that won't work. hdist=4 would probably still work though if you gave it enough memory.

              One alternative is "meet in the middle". I added a new flag "qhdist" which stands for "query hamming distance", which mutates the query kmers instead of the reference kmers. So you would get equivalent results from the above command and this command:

              bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=3 qhdist=3

              ...but the second command would require 421,875 times less memory, so it's feasible to run. It might be very slow, though, as it would also run 421,875 times slower than normal.

              There is one last alternative, though - I added support in BBDuk for degenerate bases at specific locations. You can use it like this:

              bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined

              With the "copyundefined" flag, a copy of each reference sequence will be made representing every valid combination of defined letter. So instead of increasing memory or time use by 6^75, it only increases them by 4^6 or 4096 which is completely reasonable, but it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility - for example, hdist=2 qhdist=1 and 3 Ns in the reference would allow a hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.

              Edit: It might be fun to write something that handles arbitrary motifs with degenerate bases and a set number of mismatches, with no time or memory penalty. Maybe I'll do that; I'll let you know.
              Last edited by Brian Bushnell; 03-26-2015, 05:51 PM.

              Comment


              • #52
                Thanks, Brian. Very helpful.

                Comment


                • #53
                  comfuse about result

                  Hi Brian,

                  The total sequence length is 26 bp.

                  I used following command to get sequence with starting sequence "CACTTCTATAGT".

                  bbduk.sh -Xmx1g in=7_S7_L001_R1_001.fastq outm=7_S7_L001_R1_001matched.fq outu=7_S7_L001_R1_001unmatched.fastq restrictleft=12 k=12 literal=CACTTCTATAGT

                  Then I counted seqeunce frequency with following command,

                  kmercountexact.sh in=7_S7_L001_R1_001matched.fastq outk=7_S7_L001_R1_001_24new.txt mincount=26 k=26

                  From the first command, I saw every sequence include CACTTCTATAGT at leftest side when I opened 7_S7_L001_R1_001matched.fq

                  However, 7_S7_L001_R1_001_24new.txt result surprised me. Most of them did not include CACTTCTATAGT at its leftest side.

                  >32
                  CCCCCCCCCCCGCCACTATAGAAGTG
                  >30
                  TCCGGGGCTTCCCCACTATAGAAGTG
                  >72
                  CACTTCTATAGTGGGGAACCACCGTT
                  >101
                  CACTTCTATAGTGGGGATTTCCCCTT
                  >36
                  TGGGGAATTCCCTAACTATAGAAGTG
                  >40
                  TTGGGGAAGCCCCCACTATAGAAGTG
                  >33
                  TATGGGGAATCCCCACTATAGAAGTG
                  >47
                  TGGGGCTTTCCCATACTATAGAAGTG
                  >46
                  TAGGGGATTTCCCTACTATAGAAGTG
                  >27
                  CACTTCTATAGTGGGGAAGTACCGAT
                  >28
                  CACTTCTATAGTCGGTGGTTCCCCTT
                  >34
                  TCCCTTCTATTTCTACTATAGAAGTG
                  >231
                  TAGGTGCTTCCCCTACTATAGAAGTG
                  >52
                  TTGGGGGAAACCCCACTATAGAAGTG
                  >59
                  TCGGGATTTCCCCAACTATAGAAGTG
                  >43
                  TCGGGAAAACCCCCACTATAGAAGTG
                  >45
                  TTGGGATTTCCCCCACTATAGAAGTG
                  >58
                  CACTTCTATAGTGGGGAATCCCCCTT
                  >26
                  TCCCCCACCGTCCCACTATAGAAGTG
                  >61
                  CACTTCTATAGTAGGGAAACCACCTT
                  >26
                  TCCCCACTGGAGCTACTATAGAAGTG
                  >65
                  CGGGAATCCCCTAAACTATAGAAGTG
                  >46
                  TCGGGGATGCACCTACTATAGAAGTG

                  Please help. Thanks.

                  Zheng

                  Comment


                  • #54
                    Hi Zheng,

                    There are 3 compounding issues here that make things a little confusing. First, BBDuk, by default looks for both a kmer and its reverse-complement. You can disable this behavior with "rcomp=f" (though that probably won't affect anything in this case). Second, BBDuk also ignores the middle base of a kmer in order to increase sensitivity. You can disable this with "mm=f" (which stands for maskmiddle). Lastly, and most important in this case, kmercountexact stores only a kmer or its reverse complement, whichever is alphabetically higher. I forgot to add that to the help, but this too can be disabled with "rcomp=f". This shouldn't change the counts in your case, but it will make the orientation of the output the same as the input.

                    It looks like all of the sequences you posted either start with "CACTTCTATAGT" or end with "ACTATAGAAGTG", which means the 26-mer was stored as its reverse-complement.
                    Last edited by Brian Bushnell; 04-27-2015, 07:08 PM.

                    Comment


                    • #55
                      Thanks, Brian.

                      for command,
                      bbduk.sh -Xmx1g in=7_S7_L001_R1_001matched.fq out=7_S7_L001_R1_001matched_trimmed.fq ftr=2
                      The out file has nothing. ftr=10 and it works. Some sequences has a couple N at right side because of poor quality. How can I remove 2 bp at right side of the sequence?

                      Zheng

                      Comment


                      • #56
                        Hi Zheng,

                        "ftr=2" will remove all bases after position 2 in the read (zero-based), meaning all reads will end up 3bp long, which will make them get discarded due to the default minlength=10. To remove the last 2 bases, you need "ftr2=2" instead. And you can set "minlength=0" if you don't want any reads to be discarded.

                        -Brian

                        Comment


                        • #57
                          Hi Brian,

                          Executing jgi.BBDukF [-Xmx1g, in=7_S7_L001_R1_001matched.fq, out=7_S7_L001_R1_001matched_trimmed.fq, ftr2=2]

                          Exception in thread "main" java.lang.RuntimeException: Unknown parameter ftr2=2
                          at jgi.BBDukF.<init>(BBDukF.java:392)
                          at jgi.BBDukF.main(BBDukF.java:62)

                          Is anything wrong here?

                          It seems it works to remove last 2 bp from 26 sequence by ftr=23?

                          Thanks,

                          Comment


                          • #58
                            Zheng,

                            ftr=23 would work. The reason ftr2=2 failed is because you have an old version of BBTools; I added it fairly recently.

                            -Brian

                            Comment


                            • #59
                              Thanks, Brian. I test new version of BB Tools and it works.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Essential Discoveries and Tools in Epitranscriptomics
                                by seqadmin




                                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                Yesterday, 07:01 AM
                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              58 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              54 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              45 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              55 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X