  • Hi Brian,

    I have started using BBDuk as it seemed better documented and supported than the alternatives, which did not function as I expected. Thanks for writing these tools.

    Earlier in the thread you discuss kmer lengths:
    Originally posted by Brian Bushnell:
    If you want to determine the ideal kmer length, it may be useful to characterize the error rates of your data, or quantify your true and false positive assignment rates based on filtering when using synthetic reads with their origin tagged. "randomreads.sh" generates and tags reads from reference, with adjustable error rates; I use it a lot to validate approaches and determine optimal settings.
    I am wondering how I might use this in my context. I have methylation data that suffers from read-through into adapter: the insert sizes are small due to the fragmented nature of the DNA, so there is a lot of adapter contamination. I have tested a few kmer values (15, 20, 25). Instead of choosing arbitrarily, could you give me an example of what 'error rates' I can use in my context? I have a downstream plot to determine by eye whether adapter trimming was successful, but it would be nice to explain how we arrive at the kmer value that resolves the issue.

    Thanks again for the tools,

    Bruce.



    • Hi Bruce,

      I don't know an easy way to find the true error rate of bisulfite-treated reads. But if you assume the quality values are correct, you can do the following, starting with your real reads:

      addadapters.sh in=reads.fq right literal=TCGGATAAGGCGCTCGCGCCGCATCCGACAAATGTGTTCAGCGA rate=0.5 adderrors out=x.fq reads=1m

      This will add adapters to the reads and rename them to indicate where the adapters were placed. Then, errors will be added to the adapters based on the quality value at each position. The sequence I specified is just a random piece of E. coli; don't use a real adapter sequence, which would interfere with the test. Next, trim with BBDuk:

      bbduk.sh in=x.fq out=y.fq literal=TCGGATAAGGCGCTCGCGCCGCATCCGACAAATGTGTTCAGCGA k=23 mink=11 hdist=1

      Then grade the result:

      addadapters.sh grade in=y.fq


      It will print something like this:

      Code:
      Total output:                           3076 reads                      244458 bases
      Perfectly Correct (% of output):        2765 reads (89.889%)            213358 bases (87.278%)
      Incorrect (% of output):                311 reads (10.111%)             31100 bases (12.722%)
      
      Adapters Remaining (% of adapters):     311 reads (21.917%)             3518 bases (1.439%)
      Non-Adapter Removed (% of valid):       0 reads (0.0000%)               0 bases (0.0000%)
      Note that this will only test kmer-based trimming. In practice, for paired reads, I recommend adding the flags "tbo tpe" which will extinguish virtually all adapter sequence, but those are hard to test on real data since you don't know the insert size. They can be tested on synthetic data, though.

      At any rate, this will allow you to determine the expected true- and false-positive rates for your data for a given value of k and hdist.



      • Hi Brian,

        thanks for the reply. I have tested different kmer values to see how many bases each retains. My thinking was that a lower kmer should retain less sequence, and that the point where the amount of retained sequence plateaus indicates a good kmer value for the data. As it stands this was a reasonable assumption (see attached plot).

        However, I have neglected 'mink' in this regard. I had chosen mink=5 based on not calling methylation events within 5 bp of the 5' and 3' ends, which is standard practice given the extreme fluctuation of quality at read ends. But does mink actually mean that the kmer can shrink to 'mink' bp at the ends of reads for trimming? I.e., if mink=10, then adapter in the final 10 bp of a read can be trimmed, but adapter shorter than 10 bp is left untrimmed? I have tested one sample (12T1-10) for this and see a marginal increase (~1.7% more bases). Are these therefore more likely adapter?

        Sorry if this is obvious, just want to check method before running all samples.

        Thanks,

        Bruce.
        Attached Files
        Last edited by bruce01; 07-30-2015, 07:57 AM. Reason: readability



        • Originally posted by bruce01:
          But does mink actually mean that you can reduce kmer to 'mink' bp at ends of reads for trimming? I.e. if mink=10, then adapter in the final 10 bp of a read can be trimmed, but <10bp is untrimmed?
          That's correct. For "ktrim=r k=20 mink=10", a read will be trimmed if there is a 20-mer match anywhere in the read; but at the rightmost end of the read, it will also look for a 19-mer match, 18-mer match, etc. down to 10-mer but not below. So if the last 10bp are adapter they will be trimmed, but if only the last 9bp are adapter, they won't.
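The rule described above can be sketched in Python. This is an illustration of the stated behavior, not BBDuk's actual implementation; the adapter and insert sequences are hypothetical.

```python
# Illustration of the "ktrim=r k=20 mink=10" rule described above:
# a full-length k-mer match anywhere in the read trims, and at the
# rightmost end progressively shorter adapter-prefix matches, down
# to mink, also trim. A sketch of the rule, not BBDuk's code.

def ktrim_right(read, adapter, k=20, mink=10):
    """Return the read trimmed at the leftmost adapter hit, if any."""
    kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:           # full k-mer match anywhere
            return read[:i]
    for klen in range(k - 1, mink - 1, -1):  # shorter matches, right end only
        if len(read) >= klen and read[-klen:] == adapter[:klen]:
            return read[:-klen]
    return read

insert = "ACGT" * 7                       # 28 bp of hypothetical genomic DNA
adapter = "TCGGATAAGGCGCTCGCGCCGCATCCGA"  # hypothetical adapter sequence

print(len(ktrim_right(insert + adapter[:12], adapter)))  # 28: last 12 bp trimmed
print(len(ktrim_right(insert + adapter[:9], adapter)))   # 37: 9 bp < mink, kept
```

With mink=10, the read carrying 12 bp of adapter read-through is trimmed back to the 28 bp insert, while the read with only 9 bp of adapter is left untouched, matching the behavior described above.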

          I have tested one sample (12T1-10) for this and we see marginal increase (~1.7% bases more). Are these therefore more likely adapter?
          On your graph, 12T1-10 retains more than 12T1 (for longer kmer lengths). This is a bit counter-intuitive - normally, setting mink below k should strictly result in fewer bases retained - but the reason is due to the "mm" (maskmiddle) flag. When mink is set, this flag gets disabled, which reduces sensitivity slightly for full-length kmers. As a result, even though sensitivity is being increased at the end of the read, full-length kmer hits in the middle of the read are being missed due to errors. You can recover this sensitivity by increasing hdist by 1 to allow an additional mismatch. For an apples-to-apples comparison, you could run 12T1 and 12T1-10 both with the "mm=f" flag.

          I'm not sure what's going on at and below k=10 when mink=10; mink is supposed to always be less than k, so those configurations are untested. And double-checking my code, mink does in fact get disabled when mink>=k, so the results of 12T1-10 and 12T1 should be identical at K=10 or less.

          Judging by the effects of disabling "mm", it looks like this data has a sufficient error rate that it would benefit from increasing hdist; perhaps "k=21 hdist=2".
          Last edited by Brian Bushnell; 07-30-2015, 09:13 AM.



          • Hi Brian,

            to clarify, I did have mink=5 for the rest of the runs, based on my earlier decision not to call SNPs or methylation in the first/final 5 bp of a read, so 'mm' would have been disabled for both. mink=5 should therefore retain fewer bases than mink=10, since it also catches adapter fragments between 5 and 9 bp long. Right?

            I will test with an extra mismatch, since the error-estimation procedure you described suggests elevated error rates in this data. Comparing hdist=1 to hdist=2 should be a good way to settle the final parameter.

            Thanks for your help,

            Bruce.



            • Originally posted by bruce01:
              Hi Brian,

              to clarify, I did have mink=5 for the rest of the runs, based on my earlier decision not to call SNPs or methylation in the first/final 5 bp of a read, so 'mm' would have been disabled for both. mink=5 should therefore retain fewer bases than mink=10, since it also catches adapter fragments between 5 and 9 bp long. Right?
              Ah! OK, I understand now. Yes, the additional bases trimmed with mink=5 and not with mink=10 are adapter... or false positives. 5 is pretty short and can be expected to yield false positives roughly 2/(4^5) of the time, or about 0.2%; with hdist=1 each 5-mer also matches its 15 single-mismatch neighbors, so the rate rises about 16-fold, to roughly 3%. But only 5 extra bases would get trimmed, so that's not very important.
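That back-of-envelope arithmetic, spelled out (the factor of 2 reflects the original estimate's assumption that a spurious hit can occur at either end):

```python
# False-positive rate for a short kmer at hdist=0 and hdist=1.
# A random 5-mer equals one specific 5-mer with probability 1/4^5;
# with hdist=1 it may also match any of the 5*3 = 15 single-mismatch
# neighbors, i.e. 16 acceptable sequences in total.

k = 5
p_exact = 2 / 4 ** k            # ~0.2%: spurious exact 5-mer hit
neighbors = 1 + k * 3           # the kmer itself + 15 one-mismatch variants
p_hdist1 = p_exact * neighbors  # ~3% with hdist=1

print(f"exact: {p_exact:.3%}, hdist=1: {p_hdist1:.3%}")
```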

              Thanks for your help,

              Bruce.
              You're welcome! Remember, if you are using paired ends, I highly recommend the "tbo" and "tpe" flags.



              • Hi Brian,

                do BBMap/BBDuk by chance have a low complexity sequence filtering or masking option?

                Thanks!



                • Hi Luc,

                  BBDuk has an entropy filtering option:
                  Code:
                  Entropy/Complexity parameters:
                  entropy=-1          Set between 0 and 1 to filter reads with entropy below
                                      that value.  Higher is more stringent.
                  entropywindow=50    Calculate entropy using a sliding window of this length.
                  entropyk=5          Calculate entropy using kmers of this length.
                  For example:
                  bbduk.sh in=reads.fq out=filtered.fq entropy=0.01

                  That will probably only filter homopolymers... while this:

                  bbduk.sh in=reads.fq out=filtered.fq entropy=0.95

                  ...will filter anything that is not very high complexity.

                  BBMask, on the other hand, can do entropy-masking as well as masking of simple repeats. The parameters are similar:

                  Code:
                  Processing parameters:
                  maskrepeats=f       (mr) Mask areas covered by exact repeat kmers.
                  kr=5                Kmer size to use for repeat detection (1-15).  Use minkr and maxkr to sweep a range of kmers.
                  minlen=40           Minimum length of repeat area to mask.
                  mincount=4          Minimum number of repeats to mask.
                  masklowentropy=t    (mle) Mask areas with low complexity by calculating entropy over a window for a fixed kmer size.
                  ke=5                Kmer size to use for entropy calculation (1-15).  Use minke and maxke to sweep a range.  Large ke uses more memory.
                  window=80           (w) Window size for entropy calculation.
                  entropy=0.70        (e) Mask windows with entropy under this value (0-1).  0.0001 will mask only homopolymers and 1 will mask everything.
                  lowercase=f         (lc) Convert masked bases to lower case.  Default is to convert them to N.
                  split=f             Split into unmasked pieces and discard masked pieces.
                  Entropy is calculated in a standard way, using the counts of unique short kmers that occur in a window: the more unique kmers within the window - and the more even the distribution of counts - the closer the value approaches 1. You will probably have to play with the exact cutoff a bit to get the result you want. Kmer length and window size are not overly important, but window size should be shorter than read length and similar to the length of the shortest features you want to mask, and kmer length should be much shorter than window size. The defaults for BBMask are the settings I used to mask the human genome to prevent non-vertebrate reads from mapping to it.
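As a concrete sketch, windowed kmer entropy can be computed as the normalized Shannon entropy of the window's kmer counts. The exact normalization BBDuk/BBMask use may differ, so treat this as illustrative:

```python
# Normalized Shannon entropy of the k-mer count distribution in a
# window: 0 for a homopolymer, approaching 1 when every kmer in the
# window is unique. Illustrative; not BBDuk/BBMask's exact formula.
import math
import random
from collections import Counter

def window_entropy(seq, k=5):
    """Return entropy of k-mer counts in seq, normalized to [0, 1]."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    hmax = math.log(total)          # maximum: every kmer occurs once
    return h / hmax if hmax > 0 else 0.0

random.seed(0)
rand_seq = "".join(random.choice("ACGT") for _ in range(80))

print(window_entropy("A" * 80))     # 0.0: a homopolymer has one kmer
print(window_entropy("ACGT" * 20))  # low: only 4 distinct 5-mers
print(window_entropy(rand_seq))     # near 1: almost all kmers unique
```

This matches the behavior described above: entropy=0.01 would catch only the homopolymer-like cases, while a cutoff near 0.95 would reject everything but the random-looking sequence.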
                  Last edited by Brian Bushnell; 08-17-2015, 08:47 AM.



                  • Hi Brian,

                    thanks a lot for the detailed information and for providing this great tool! Looks like it will do (more than) everything I will need.



                    • Hi Brian,

                      I used bbduk.sh to remove adapters and quality-trim my fastq files, and for some reason it removes the x and y coordinates of the cluster from the read names and replaces them with 0. This makes most of my read names no longer unique. Is there a way to avoid this?

                      I ran it with:
                      Code:
                      bbduk.sh qtrim=rl trimq=20 in=../Gbar_Lib1_Index025_L001_R#_001.fastq out1=../Gbar_Lib1_Index025_L001_R#_001_bbduk_q20.fastq minlen=25 ktrim=r k=25 mink=11 ref=~/bbmap/resources/index25.fa
                      
                      head  ../../Gbar_Lib1_Index025_L001_R1_001.fastq
                      @SN677:200:H5FCKBCXX:1:1106:2130:10004 1:N:0:ACTGAT
                      GTTAAAAAGATTAAAGCTACAAGAGCGAATCTTACTCCCCAGGCGATGCAGTGGAGCAGATTAAAGCCACAACAGTGAATCTTGCTATCCCGACATTGCAGTTAAAAAGATTAAAGCTACAGCAGCGAATCTTACCTCCCAGGCGGTGCATTGGAACAGATTAAAGACACATCGGTGAATCTTATTTCCCCGGTATTGCAGTTAAAAAGATTAAAGCCACAGCGGCGAATCCTACTTCCCTAGTGGTGC
                      +
                      GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIGGIIIIIIIIIGIIIIAGGIGIIIIIIGGGIIGGIGIGGGGGGGGIGGGG.GGGAGGIA7GAAGIIIGGGGAGGGIGGIIIIIIGGGAG.7..<<<.GGGGGGGAAAA7.7AAA.
                      @SN677:200:H5FCKBCXX:1:1106:2046:10016 1:N:0:ACTGAT
                      AAATCGAGAGACGAAAAAAAGAAATGAAAGAAAAGGAACAAAAAATCAAAAAAGTGAGAGAAAAAGAAAATGAAGAGAAAGAATTCGAAAAAGAAATTGAGATTGAACAAGAATGCAAGCAAGAAGGAGAAATCGAAAGAAAAGAAGAAAAGATTGAAAAAGAAAAAGAAATCCAAAAAGAGAGAGAGGTCGAAAATAAAAGTGAAAATGAGAGAGGAAATGAAAAGAGAACAAAAGAAGTTGAGATG
                      +
                      <<GA<.<<AAGGGAGAGG<GGAGGIIIAGGGGGGGGGGGGGGGGAAGGGG.<GG<<<GGG.<A.GGGGGAAGGGGIGIGGGGGG.AGGGGGAGAGAA.<AA.<AGGGGGGIGGGGGGGGGGIIGAAGGIGGGAGGGG<AGAAGIGIIIGGGGG..<AGA.AGGGAGGGGGA<.77.7A.AGGAA.7GGA..7<7.<.7A.A7.77GA7.7G.G.A.....7A7.....G.7....7.7AA7..7....
                      @SN677:200:H5FCKBCXX:1:1106:2068:10031 1:N:0:ACTGAT
                      AGAGAGCTGTCTATACCACTGGCAAAGGGGCTTCTGCCGTGGGGCTGACAGCAGCAGTGCACAAGGATCCGGTAACCAGGGAGTGGACCCTTGAAGGAGGAGCCCTTGTTTTAGCTGACAAAGGGATTTGCCTTATTGACGAATTTGATAAAATGAATGATCAAGACAGGTAAGGGAAAGCCTGGCATAAATTTAGCCACTATAATTAGATAACTTCAGCAAACACCTTTCGTCGTTTGCTTTACTTTTT
                      
                      head  ../Gbar_Lib1_Index025_L001_R1_001_bbduk_q20.fastq
                      @SN677:200:H5FCKBCXX:1:1106:0:0 1:N:0:ACTGAT
                      CTTGTTAACCAATGCTATTATAGGTTTGATGTCTCATACAGGAGTATAGGATAAAGCTCTCACGTTTTGTTTCAAAGAAGGGCCTGTAACTCATTATGAACTGCTGCTTGCCAACACTTGTGTTGCATTGCTTCATAGATGTCAGCAGGTGAATCATTTGACTTAGCAGGAAGGTGAATCGTTTTGTTTCATATATTGCAGTAGTTCATAATGAGTTACAG
                      +
                      GGGAGIIIIIIIIIIIIIGIGIIGIIIIGIGIIGGGIIIIGGGGIGGGGIIIIIIIIIIIGIIIIIIIGGIIIIIGIIIGGGIIIGIIGIIIIGIIIIGGGGGGIIIIIIIGGAGIGGIIIIIIIGGGIIIIIIIAGGGGGIGGGGAGGGGIGGGGIIIIGIGGGGIGGGAGGIGIG<AGAGGAGG7AGAGGAGGGGAGAGGGGGGGGGIGGGGGGGAGAA
                      @SN677:200:H5FCKBCXX:1:1106:0:0 1:N:0:ACTGAT
                      AAAATCTGATTTAAAAAGAATATTTATTTTAAAAATTATTGCATGACTATTAATATGATAGGAAAGTCGTATAAAAATATCGATAGAAAAATTTTACCGATTTAGTGGTATTGAAAAAATTGGGTAAAAACAAGGTATCGAGACCTTGATCTCATAAAACTGAGTAGAAAATATTTTTATAAATATTTATGGAGTGTCATTTAGTTAGTATTAAAGTTTTGTTAGAAAATTTTAAC
                      +
                      GGGGGIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIGGIGGIGIIIIIIIGIIIIIIIGGIGGIGIIIGGIGIGGGGGIIIGGAGGGIIIGGIIIIGGGGGGIGIIGIGGGGGG.GGGGGGIIGGGGGGGAAAAG.<GGAGGGGIGGGGGIIGGGG.AGG7AGGGGAAG.7GGGG
                      @SN677:200:H5FCKBCXX:1:1106:0:0 1:N:0:ACTGAT
                      ATCTCTTGGCTTTTATCATGTTTAAATCATGATGGCGAGTTGGCTAAGACATGGCAACCATTTACTACTTCCAGTGCGTTAGCAGTCAACTTATGCTTTTAGGATTTTCTATTTCTCCAGTTGGGTCATGGTTTATATGCTTAGTGCTTCCACCTGCTTATAACATTTTTCATTATGTTGGTTGCTCTACTTTGGATAGCATATTGGCTGTTCACTAAGCTTTAATGATTTTGAAAGACTTGCCATGCA
                      Last edited by kcamnairb; 12-14-2015, 07:05 AM.



                      • I can't replicate this...

                        Code:
                        bushnell@gpint109:/global/projectb/scratch/bushnell/temp$ bbduk.sh qtrim=rl trimq=20 in=x.fq out1=y.fq minlen=25 ktrim=r k=25 mink=11 ref=z.fa
                        java -Djava.library.path=/usr/common/jgi/utilities/bbtools/prod-v35.78/lib -ea -Xmx46673m -Xms46673m -cp /usr/common/jgi/utilities/bbtools/prod-v35.78/lib/BBTools.jar jgi.BBDukF qtrim=rl trimq=20 in=x.fq out1=y.fq minlen=25 ktrim=r k=25 mink=11 ref=z.fa
                        Executing jgi.BBDukF [qtrim=rl, trimq=20, in=x.fq, out1=y.fq, minlen=25, ktrim=r, k=25, mink=11, ref=z.fa]
                        
                        BBDuk version 35.78
                        maskMiddle was disabled because useShortKmers=true
                        Initial:
                        Memory: max=46901m, free=46167m, used=734m
                        
                        Added 15 kmers; time:   0.031 seconds.
                        Memory: max=46901m, free=43965m, used=2936m
                        
                        Input is being processed as unpaired
                        Started output streams: 0.014 seconds.
                        Processing time:                0.013 seconds.
                        
                        Input:                          2 reads                 497 bases.
                        QTrimmed:                       2 reads (100.00%)       102 bases (20.52%)
                        KTrimmed:                       0 reads (0.00%)         0 bases (0.00%)
                        Result:                         2 reads (100.00%)       395 bases (79.48%)
                        
                        Time:                           0.063 seconds.
                        Reads Processed:           2    0.03k reads/sec
                        Bases Processed:         497    0.01m bases/sec
                        bushnell@gpint109:/global/projectb/scratch/bushnell/temp$ cat x.fq
                        @SN677:200:H5FCKBCXX:1:1106:2130:10004 1:N:0:ACTGAT
                        GTTAAAAAGATTAAAGCTACAAGAGCGAATCTTACTCCCCAGGCGATGCAGTGGAGCAGATTAAAGCCACAACAGTGAATCTTGCTATCCCGACATTGCAGTTAAAAAGATTAAAGCTACAGCAGCGAATCTTACCTCCCAGGCGGTGCATTGGAACAGATTAAAGACACATCGGTGAATCTTATTTCCCCGGTATTGCAGTTAAAAAGATTAAAGCCACAGCGGCGAATCCTACTTCCCTAGTGGTGC
                        +
                        GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIGGIIIIIIIIIGIIIIAGGIGIIIIIIGGGIIGGIGIGGGGGGGGIGGGG.GGGAGGIA7GAAGIIIGGGGAGGGIGGIIIIIIGGGAG.7..<<<.GGGGGGGAAAA7.7AAA.
                        @SN677:200:H5FCKBCXX:1:1106:2046:10016 1:N:0:ACTGAT
                        AAATCGAGAGACGAAAAAAAGAAATGAAAGAAAAGGAACAAAAAATCAAAAAAGTGAGAGAAAAAGAAAATGAAGAGAAAGAATTCGAAAAAGAAATTGAGATTGAACAAGAATGCAAGCAAGAAGGAGAAATCGAAAGAAAAGAAGAAAAGATTGAAAAAGAAAAAGAAATCCAAAAAGAGAGAGAGGTCGAAAATAAAAGTGAAAATGAGAGAGGAAATGAAAAGAGAACAAAAGAAGTTGAGATG
                        +
                        <<GA<.<<AAGGGAGAGG<GGAGGIIIAGGGGGGGGGGGGGGGGAAGGGG.<GG<<<GGG.<A.GGGGGAAGGGGIGIGGGGGG.AGGGGGAGAGAA.<AA.<AGGGGGGIGGGGGGGGGGIIGAAGGIGGGAGGGG<AGAAGIGIIIGGGGG..<AGA.AGGGAGGGGGA<.77.7A.AGGAA.7GGA..7<7.<.7A.A7.77GA7.7G.G.A.....7A7.....G.7....7.7AA7..7....
                        bushnell@gpint109:/global/projectb/scratch/bushnell/temp$ cat y.fq
                        @SN677:200:H5FCKBCXX:1:1106:2130:10004 1:N:0:ACTGAT
                        GTTAAAAAGATTAAAGCTACAAGAGCGAATCTTACTCCCCAGGCGATGCAGTGGAGCAGATTAAAGCCACAACAGTGAATCTTGCTATCCCGACATTGCAGTTAAAAAGATTAAAGCTACAGCAGCGAATCTTACCTCCCAGGCGGTGCATTGGAACAGATTAAAGACACATCGGTGAATCTTATTTCCCCGGTATTGCAGTTAAAAAGATTAAAGCCACAGC
                        +
                        GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIGGIIIIIIIIIGIIIIAGGIGIIIIIIGGGIIGGIGIGGGGGGGGIGGGG.GGGAGGIA7GAAGIIIGGGGAGGGIGGIIIIIIGGGAG
                        @SN677:200:H5FCKBCXX:1:1106:2046:10016 1:N:0:ACTGAT
                        AAATCGAGAGACGAAAAAAAGAAATGAAAGAAAAGGAACAAAAAATCAAAAAAGTGAGAGAAAAAGAAAATGAAGAGAAAGAATTCGAAAAAGAAATTGAGATTGAACAAGAATGCAAGCAAGAAGGAGAAATCGAAAGAAAAGAAGAAAAGATTGAAAAAGAAAAAGAAAT
                        +
                        <<GA<.<<AAGGGAGAGG<GGAGGIIIAGGGGGGGGGGGGGGGGAAGGGG.<GG<<<GGG.<A.GGGGGAAGGGGIGIGGGGGG.AGGGGGAGAGAA.<AA.<AGGGGGGIGGGGGGGGGGIIGAAGGIGGGAGGGG<AGAAGIGIIIGGGGG..<AGA.AGGGAGGGGGA<
                        bushnell@gpint109:/global/projectb/scratch/bushnell/temp$ cat z.fa
                        >1
                        AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                        BBDuk does not change read names. So, I can only imagine that many of the names already had zeroes going in. Note that BBDuk does reorder reads (unless you use the "ordered" flag), and some of the reads may have been filtered out, so the top 3 reads before and after are not necessarily the same.



                        • You are right, sorry about that.



                          • Hi Brian,

                            Would you mind explaining the sliding window option for quality trimming (qtrim) in bbduk? How big is the window, and is this similar to Trimmomatic's sliding-window approach? I notice I have fewer reads trimmed with the sliding window compared to rl trimming, although the % of bases trimmed is the same (so longer stretches of sequence are trimmed with the sliding window, I suppose). Is one approach preferred over the other?

                            Thanks!



                            • Originally posted by salamay:
                              Hi Brian,

                              Would you mind explaining the sliding window option for quality trimming (qtrim) in bbduk? How big is the window, and is this similar to Trimmomatic's sliding-window approach? I notice I have fewer reads trimmed with the sliding window compared to rl trimming, although the % of bases trimmed is the same (so longer stretches of sequence are trimmed with the sliding window, I suppose). Is one approach preferred over the other?

                              Thanks!
                              I added a sliding window just so that people could use it as a drop-in replacement for Trimmomatic, but I do not recommend using it. The normal "rl" trim mode guarantees optimal results (assuming the quality scores are accurate, of course), while window-based trimming is a heuristic that cannot ensure optimal results. It works by trimming until the average quality score in a user-specified window exceeds the threshold.

                              You can set the window size by adding a comma, like this:

                              "qtrim=window,5 trimq=10"

                              That will use a 5bp window and trim until the average quality in the window is at least 10. The default window size is 4.
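For illustration, the heuristic described above might be sketched like this (an assumed trailing-window interpretation; BBDuk's exact boundary handling may differ):

```python
# Sketch of window quality trimming as described above: trim bases
# from the right until the average quality in the trailing window
# reaches the threshold. Note it can stop while low-quality bases
# remain inside the window, which is why this heuristic cannot
# guarantee optimal results the way "rl" trimming can.

def qtrim_window(quals, window=5, trimq=10):
    """quals: phred scores; return the trimmed read length."""
    n = len(quals)
    while n >= window:
        if sum(quals[n - window:n]) / window >= trimq:
            return n
        n -= 1
    return 0

quals = [30] * 20 + [2, 3, 2, 5, 3]   # good read with a low-quality tail
print(qtrim_window(quals))            # 23: stops once the window average
                                      # clears 10, keeping 3 bad bases
```

On this example the trimming stops at length 23 even though three quality-2/3 bases remain inside the final window, which illustrates the weakness Brian notes above.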



                              • Dear Brian,

                                I have many reads from a very specific scRNA contaminant of length 299 bp. Would BBDuk's contaminant filtering be a suitable tool for removing these, and what parameters would you recommend? (my reads are 100 bp PE)

                                Would it be possible to do it in one run along with adapter and quality trimming, e.g. by adding this 299bp sequence to the current adapters.fa file that you have provided in the resources folder?

                                Many thanks for any help
                                Last edited by willd; 01-11-2016, 04:22 AM. Reason: Added 'along with adapter and quality trimming'

