  • bbduk.sh barcode filter

    Does the barcode filter in bbduk.sh (v37.66) only allow perfect matches or are mismatches allowed?
    Thanks,
    Lynn

  • #2
    Which barcode filter are you referring to? Generally, the "hdist=N" parameter lets you allow mismatches or disallow them (hdist=0).



    • #3
      I'm referring to these parameters:
      barcodefilter=t barcodes=TCTCGCGC
      As far as I can see from my results right now, only reads with exactly this sequence in the header are retained. I think this may be too stringent. Does the 'hdist' parameter affect the barcode given how short it is?



      • #4
        I see. I have not personally used this feature, since most barcode work is done at the bcl2fastq stage, where you can allow for errors in the sequence.

        What exactly are you trying to do? Eliminate reads with some barcodes? I don't think the hdist= parameter is going to apply to the barcodes. It is for errors in the main read. You may have to look for an alternate way to do this. "demuxbyname.sh" may be a better option. Take a look at that.



        • #5
          Thanks for the reply. I have a set of large paired-end fastq files that were preprocessed by a sequencing core. I suspect that they were never demultiplexed, because the file is quite large and may have had its own lane or flowcell. The headers contain barcodes that are mostly TCTCGCGC, or that string with one or two mismatches, but there are also barcodes that are wildly different. I want to strip those out without stripping what are probably legit barcodes with 1 or 2 mismatches. When I tried demuxbyname.sh, it started writing out 2x80,000 files, two for every barcode variant present.



          • #6
            Use the code I have in this post to enumerate all the different barcodes present in your file. That should give you an idea of the complexity of the problem. Then choose the ones you want (that actually should belong to your samples since you made them) to demux and use only those with demuxbyname.sh.
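
The code from the linked post is not reproduced in this thread. As a rough stand-in, here is a minimal Python sketch of the same idea: tally every barcode that appears in the FASTQ headers. It assumes Illumina-style headers where the barcode is the text after the last ':' (for example "... 1:N:0:TCTCGCGC") and four lines per record. The names `barcode_of` and `count_barcodes` are made up for illustration.

```python
from collections import Counter
from itertools import islice


def barcode_of(header: str) -> str:
    """Return the text after the last ':' in a FASTQ header line.

    Assumption: Illumina-style headers that end in ':<barcode>'.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def count_barcodes(lines):
    """Tally barcodes over the header lines of a FASTQ stream.

    Assumption: strict 4-line records, so headers are lines 0, 4, 8, ...
    """
    return Counter(barcode_of(h) for h in islice(lines, 0, None, 4))


# Tiny in-memory example; in practice pass an open file handle instead.
example = [
    "@r1 1:N:0:TCTCGCGC\n", "ACGT\n", "+\n", "IIII\n",
    "@r2 1:N:0:TCTCGCGA\n", "ACGT\n", "+\n", "IIII\n",
    "@r3 1:N:0:TCTCGCGC\n", "ACGT\n", "+\n", "IIII\n",
]
for barcode, n in count_barcodes(example).most_common():
    print(n, barcode)
```

For gzipped files, opening with `gzip.open(path, "rt")` and passing the handle to `count_barcodes` should work the same way.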



            • #7
              I have already looked at all the barcodes. I've got 76,959 different barcodes that are not exact matches, accounting for about 48 million reads. If I allow 1 mismatch, I could recover 16 million of those. If I allow 2 mismatches, I could recover 24 million.
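
A breakdown like this can be derived from a barcode tally. The sketch below is one way to do it in Python, not the method actually used here: it groups read counts by the number of mismatching positions (Hamming distance) against the expected barcode. The function name `reads_by_mismatches` and the input format (a mapping of barcode to read count) are assumptions for illustration.

```python
from collections import Counter


def reads_by_mismatches(barcode_counts, expected="TCTCGCGC"):
    """Map mismatch count -> total reads at exactly that many mismatches.

    barcode_counts: mapping of barcode string -> number of reads.
    Barcodes whose length differs from `expected` are skipped.
    """
    totals = Counter()
    for barcode, n in barcode_counts.items():
        if len(barcode) != len(expected):
            continue  # Hamming distance is only defined for equal lengths
        mismatches = sum(x != y for x, y in zip(barcode, expected))
        totals[mismatches] += n
    return totals


# Made-up counts, only to show the shape of the result.
counts = {"TCTCGCGC": 10, "TCTCGCGA": 3, "TCTCGAGA": 2, "ACTCGAGA": 1}
totals = reads_by_mismatches(counts)
for mismatches in sorted(totals):
    print(mismatches, totals[mismatches])
```

Summing the entries for 1 and 2 mismatches gives the number of reads that a two-mismatch tolerance would recover.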



              • #8
                I wrote a Perl script using the fuzzy match module (Text::Fuzzy) to pull all the entries with no more than two mismatches in the barcode. It's not much, but I can send it to anyone who is interested.
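
The Perl script itself is not shown in the thread. For readers who want something similar, here is a minimal Python sketch under stated assumptions: it uses plain Hamming distance (substitutions only) rather than Text::Fuzzy's edit distance, it assumes the barcode is the text after the last ':' in each header, and it assumes strict 4-line FASTQ records. The names `hamming` and `filter_by_barcode` are hypothetical.

```python
from itertools import islice


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; unequal lengths are treated as a full mismatch."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def filter_by_barcode(lines, expected="TCTCGCGC", max_mismatches=2):
    """Yield 4-line FASTQ records whose header barcode is close enough to `expected`."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        barcode = record[0].rstrip().rsplit(":", 1)[-1]
        if hamming(barcode, expected) <= max_mismatches:
            yield record


# Tiny in-memory example: the third read is 3 mismatches away and is dropped.
example = [
    "@r1 1:N:0:TCTCGCGC\n", "ACGT\n", "+\n", "IIII\n",
    "@r2 1:N:0:TCTCGCGA\n", "ACGT\n", "+\n", "IIII\n",
    "@r3 1:N:0:ACTCGAGA\n", "ACGT\n", "+\n", "IIII\n",
]
for record in filter_by_barcode(example):
    print("".join(record), end="")
```

For paired-end data, running the same filter on each file should keep the mates in sync only if both headers of a pair carry the same barcode. That is an assumption worth checking on a few records first.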



                • #9
                  In future just ask the sequence provider to re-do the demultiplexing with bcl2fastq. You are paying them for it anyway :-)
