Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • lyw1
    replied
    Thanks, Brian. I test new version of BB Tools and it works.

    Leave a comment:


  • Brian Bushnell
    replied
    Zheng,

    ftr=23 would work. The reason ftr2=2 failed is because you have an old version of BBTools; I added it fairly recently.

    -Brian

    Leave a comment:


  • lyw1
    replied
    Hi Brian,

    Executing jgi.BBDukF [-Xmx1g, in=7_S7_L001_R1_001matched.fq, out=7_S7_L001_R1_001matched_trimmed.fq, ftr2=2]

    Exception in thread "main" java.lang.RuntimeException: Unknown parameter ftr2=2
    at jgi.BBDukF.<init>(BBDukF.java:392)
    at jgi.BBDukF.main(BBDukF.java:62)

    Is anything wrong here?

    It seems it works to remove last 2 bp from 26 sequence by ftr=23?

    Thanks,

    Leave a comment:


  • Brian Bushnell
    replied
    Hi Zheng,

    "ftr=2" will remove all bases after position 2 in the read (zero-based), meaning all reads will end up 3bp long, which will make them get discarded due to the default minlength=10. To remove the last 2 bases, you need "ftr2=2" instead. And you can set "minlength=0" if you don't want any reads to be discarded.

    -Brian

    Leave a comment:


  • lyw1
    replied
    Thanks, Brian.

    for command,
    bbduk.sh -Xmx1g in=7_S7_L001_R1_001matched.fq out=7_S7_L001_R1_001matched_trimmed.fq ftr=2
    The out file has nothing. ftr=10 and it works. Some sequences has a couple N at right side because of poor quality. How can I remove 2 bp at right side of the sequence?

    Zheng

    Leave a comment:


  • Brian Bushnell
    replied
    Hi Zheng,

    There are 3 compounding issues here that make things a little confusing. First, BBDuk, by default looks for both a kmer and its reverse-complement. You can disable this behavior with "rcomp=f" (though that probably won't affect anything in this case). Second, BBDuk also ignores the middle base of a kmer in order to increase sensitivity. You can disable this with "mm=f" (which stands for maskmiddle). Lastly, and most important in this case, kmercountexact stores only a kmer or its reverse complement, whichever is alphabetically higher. I forgot to add that to the help, but this too can be disabled with "rcomp=f". This shouldn't change the counts in your case, but it will make the orientation of the output the same as the input.

    It looks like all of the sequences you posted either start with "CACTTCTATAGT" or end with "ACTATAGAAGTG", which means the 26-mer was stored as its reverse-complement.
    Last edited by Brian Bushnell; 04-27-2015, 07:08 PM.

    Leave a comment:


  • lyw1
    replied
    comfuse about result

    Hi Brian,

    The total sequence length is 26 bp.

    I used following command to get sequence with starting sequence "CACTTCTATAGT".

    bbduk.sh -Xmx1g in=7_S7_L001_R1_001.fastq outm=7_S7_L001_R1_001matched.fq outu=7_S7_L001_R1_001unmatched.fastq restrictleft=12 k=12 literal=CACTTCTATAGT

    Then I counted seqeunce frequency with following command,

    kmercountexact.sh in=7_S7_L001_R1_001matched.fastq outk=7_S7_L001_R1_001_24new.txt mincount=26 k=26

    From the first command, I saw every sequence include CACTTCTATAGT at leftest side when I opened 7_S7_L001_R1_001matched.fq

    However, 7_S7_L001_R1_001_24new.txt result surprised me. Most of them did not include CACTTCTATAGT at its leftest side.

    >32
    CCCCCCCCCCCGCCACTATAGAAGTG
    >30
    TCCGGGGCTTCCCCACTATAGAAGTG
    >72
    CACTTCTATAGTGGGGAACCACCGTT
    >101
    CACTTCTATAGTGGGGATTTCCCCTT
    >36
    TGGGGAATTCCCTAACTATAGAAGTG
    >40
    TTGGGGAAGCCCCCACTATAGAAGTG
    >33
    TATGGGGAATCCCCACTATAGAAGTG
    >47
    TGGGGCTTTCCCATACTATAGAAGTG
    >46
    TAGGGGATTTCCCTACTATAGAAGTG
    >27
    CACTTCTATAGTGGGGAAGTACCGAT
    >28
    CACTTCTATAGTCGGTGGTTCCCCTT
    >34
    TCCCTTCTATTTCTACTATAGAAGTG
    >231
    TAGGTGCTTCCCCTACTATAGAAGTG
    >52
    TTGGGGGAAACCCCACTATAGAAGTG
    >59
    TCGGGATTTCCCCAACTATAGAAGTG
    >43
    TCGGGAAAACCCCCACTATAGAAGTG
    >45
    TTGGGATTTCCCCCACTATAGAAGTG
    >58
    CACTTCTATAGTGGGGAATCCCCCTT
    >26
    TCCCCCACCGTCCCACTATAGAAGTG
    >61
    CACTTCTATAGTAGGGAAACCACCTT
    >26
    TCCCCACTGGAGCTACTATAGAAGTG
    >65
    CGGGAATCCCCTAAACTATAGAAGTG
    >46
    TCGGGGATGCACCTACTATAGAAGTG

    Please help. Thanks.

    Zheng

    Leave a comment:


  • lyw1
    replied
    Thanks, Brian. Very helpful.

    Leave a comment:


  • Brian Bushnell
    replied
    Zheng,

    A high value of hdist requires exponentially more time and memory to load the reference, but you can us an arbitrarily high value if you want, by adding the -da flag, which disables assertions.

    For example:

    bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=6 -da

    However, the program will generate a factor of hdist^(3*K) additional kmers - for k=25 and hdist=6, taking 177,978,515,625 times as much time and memory to build the index, so that won't work. hdist=4 would probably still work though if you gave it enough memory.

    One alternative is "meet in the middle". I added a new flag "qhdist" which stands for "query hamming distance", which mutates the query kmers instead of the reference kmers. So you would get equivalent results from the above command and this command:

    bbduk.sh in=reads.fq outm=matched.fq ref=ref.fa k=25 hdist=3 qhdist=3

    ...but the second command would require 421,875 times less memory, so it's feasible to run. It might be very slow, though, as it would also run 421,875 times slower than normal.

    There is one last alternative, though - I added support in BBDuk for degenerate bases at specific locations. You can use it like this:

    bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined

    With the "copyundefined" flag, a copy of each reference sequence will be made representing every valid combination of defined letter. So instead of increasing memory or time use by 6^75, it only increases them by 4^6 or 4096 which is completely reasonable, but it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility - for example, hdist=2 qhdist=1 and 3 Ns in the reference would allow a hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.

    Edit: It might be fun to write something that handles arbitrary motifs with degenerate bases and a set number of mismatches, with no time or memory penalty. Maybe I'll do that; I'll let you know.
    Last edited by Brian Bushnell; 03-26-2015, 05:51 PM.

    Leave a comment:


  • lyw1
    replied
    Thanks, Brian.

    Currently the max of hdist is 3. Is there a way to allow hdist=6?

    Leave a comment:


  • Brian Bushnell
    replied
    Hi Zheng,

    Yes, you can do this with the "hdist=1" flag, which will allow up to 1 substitution error.

    -Brian

    Leave a comment:


  • lyw1
    replied
    Hi Brian,

    Some sequence has quality problelm. It may have one or two "N" or one nucleotide showed wrongly. Is it possible to count this kind sequence as the correct sequence? For example, aaggctctggattacaggat is the reference sequence. Now, is it possible to count aaggctcNggattacaggat to the same group of the reference sequence?

    Thank you,

    Zheng

    Leave a comment:


  • Brian Bushnell
    replied
    Zheng,

    You appear to have downloaded an old version by mistake. The latest is now 34.22, and there is no 34.44 (only a 33.44); so please download 34.22 - I have verified that it is working correctly.

    Leave a comment:


  • lyw1
    replied
    Hi Brian,

    I download the newest edition (34.44) and test. It showed the same problem.

    Exception in thread "main" java.lang.RuntimeException: Unknown parameter restrictleft=25
    at jgi.BBDukF.<init>(BBDukF.java:427)
    at jgi.BBDukF.main(BBDukF.java:66)

    Thanks,

    Zheng

    Leave a comment:


  • Brian Bushnell
    replied
    Zheng,

    Are you using the latest version (34.19) of BBTools? I added "restrictleft" and "restrictright" only a few weeks ago, so an old version won't support those flags...

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Multiomics Techniques Advancing Disease Research
    by seqadmin


    New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

    A major leap in the field has
    ...
    02-08-2024, 06:33 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 04:11 PM
1 response
19 views
0 likes
Last Post kim897
by kim897
 
Started by seqadmin, 02-21-2024, 08:52 AM
0 responses
35 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-20-2024, 08:57 AM
0 responses
24 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-14-2024, 09:19 AM
0 responses
57 views
0 likes
Last Post seqadmin  
Working...
X