Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • I'd like some guidance on bbduk.sh parameters for trimming and filtering raw reads that would fit best for meeting the following criterion. I'm dealing with PE 150bp raw reads and bbduk.sh version 38.84

    1) Discard a read pair if either one read contains adapter contamination;
    2) Discard a read pair if more than 10% of bases are uncertain in either one read;
    3) Discard a read pair if the proportion of low quality bases is over 50% in either one read.​

    From my understanding of the parameters, point 2 could be met by setting maxns=15. I am not sure on what paramters to use for points 1 and 3. Any help would be much appreciated.
    Last edited by brewseeker; 06-22-2023, 06:09 AM.

    Comment


    • Hi Brian,
      This question is regarding the parameter "mincovfraction or mcf" which is available in BBDuk but I notice that it is not available in Seal. Am I missing anything or was this a deliberate choice, perhaps due to computational considerations. Is there a way around? I know I could use "minkmerfraction" but I am a bit worried about missing a few corner cases. Any suggestions to accomplish this functionality using Seal would be very helpful.
      Thanks in advance.

      Comment


      • Hi Brian,

        Thanks for the super nice tool! I have been doing some mock tests to try to understand a bit better how BBDuk works, and found the results of one of them to be somewhat counterintuitive. In short, I was trying to right-trim a single adapter of length 19 from the same FASTQ file (coming from a 100bp single-read experiment), using two different values of kmin (10/19), while allowing a hamming distance of 1, and filtering out reads shorter than 50 bases after trimming. The exact commands that I ran were:

        Code:
        bbduk.sh in=SomeSample.fastq.gz out=SomeSample.Clean.10.fastq.gz ref=adapters.fa ktrim=r k=19 mink=10 hdist=1 minlength=50 ordered=t
        bbduk.sh in=SomeSample.fastq.gz out=SomeSample.Clean.19.fastq.gz ref=adapters.fa ktrim=r k=19 mink=19 hdist=1 minlength=50 ordered=t​
        ​
        where adapters.fa contains only my 19 base-long adapter. And here is the relevant output for both commands:

        A) For kmin=10:

        Added 832 kmers
        Input: 24597549 reads 2459754900 bases.
        KTrimmed: 792343 reads (3.22%) 14335014 bases (0.58%)
        Total Removed: 2618 reads (0.01%) 14335014 bases (0.58%)
        Result: 24594931 reads (99.99%) 2445419886 bases (99.42%)
        B) For kmin=19:
        Added 55 kmers
        Input: 24597549 reads 2459754900 bases.
        KTrimmed: 283169 reads (1.15%) 7480620 bases (0.30%)
        Total Removed: 2620 reads (0.01%) 7480620 bases (0.30%)
        Result: 24594929 reads (99.99%) 2452274280 bases (99.70%)​
        As expected, trimming with kmin=10 led to a larger number of reads being trimmed, due to the less stringent conditions imposed at the 3' end of the reads. What I found counterintuitive was the fact that trimming with kmin=19 actually resulted in 2 extra reads being discarded after trimming.

        Given that I am forcing right-trimming and the adapter sequence is only 19 bases long, while my input reads are all 100 bases long, I would have expected the same number of reads being discarded in both cases (or, if anything, more reads being discarded for kmin=10). My rationale for this is that, since k=19 in both cases, trimming at the 3' end of a read would in the worst case result in an 81 base-long trimmed read (regardless of kmin being 10, 19, or any other value lower than 19). Therefore, only trimming in the middle of a read (or, in the most extreme situation, trimming the entire read due to a kmer match at the 5' end) should potentially lead to a read being shorter than 50 bases (and thus discarded) after trimming. However, since again k=19 in both cases, any trimming happening in the middle of the reads should also be identical in both cases, whereas any potential 5' trimming should in fact be more aggressive in the case of kmin=10, correct?

        Is there something I am missing, specifically on how ktrim=r works? I would really appreciate it if you (or any of the good samaritans inhabiting this forum) could help me understand these results.

        Thanks a lot in advance for any insights you can provide, and thanks again for all your efforts in developing these awesome tools!

        Cheers,

        -Juan







        Last edited by razorofockham; 03-21-2024, 02:10 PM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Non-Coding RNA Research and Technologies
          by seqadmin




          Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

          Nobel Prize for MicroRNA Discovery
          This week,...
          10-07-2024, 08:07 AM
        • seqadmin
          Recent Developments in Metagenomics
          by seqadmin





          Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
          09-23-2024, 06:35 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Today, 06:55 AM
        0 responses
        8 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 10-02-2024, 04:51 AM
        0 responses
        105 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 10-01-2024, 07:10 AM
        0 responses
        113 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 09-30-2024, 08:33 AM
        1 response
        117 views
        0 likes
        Last Post EmiTom
        by EmiTom
         
        Working...
        X