  • bbduk.sh barcode filter

    Does the barcode filter in bbduk.sh (v37.66) only allow perfect matches or are mismatches allowed?
    Thanks,
    Lynn

  • #2
    Which barcode filter are you referring to? Generally, the "hdist=N" parameter lets you allow mismatches, or disallow them with hdist=0.


    • #3
      I'm referring to these parameters:
      barcodefilter=t barcodes=TCTCGCGC
      As far as I can see from my results right now, only reads with exactly this sequence in the header are retained. I think this may be too stringent. Does the 'hdist' parameter affect the barcode given how short it is?


      • #4
        I see. I have not personally used this feature, since most barcode work is done at the bcl2fastq stage, where you can allow for errors in the barcode sequence.

        What exactly are you trying to do? Eliminate reads with certain barcodes? I don't think the hdist= parameter applies to the barcodes; it is for errors in the main read. You may have to find an alternate way to do this. "demuxbyname.sh" may be a better option, so take a look at that.


        • #5
          Thanks for the reply. I have a set of large paired-end fastq files that were preprocessed by a sequencing core, and I suspect they were never demultiplexed: the files are quite large, and the sample may have had its own lane or flowcell. Most barcodes in the headers are TCTCGCGC or that string with one or two mismatches, but some are wildly different. I want to strip those out without discarding what are probably legitimate barcodes with 1 or 2 mismatches. When I tried demuxbyname.sh, it started writing out 2x80,000 files, two for every barcode variant present.


          • #6
            Use the code I have in this post to enumerate all the different barcodes present in your file. That should give you an idea of the complexity of the problem. Then choose the ones you want (that actually should belong to your samples since you made them) to demux and use only those with demuxbyname.sh.
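
For readers without access to the linked post, here is a minimal Python sketch of the enumeration step. It is not the code referred to above. It assumes Illumina-style headers in which the barcode is the text after the last ':' (for example "@r1 1:N:0:TCTCGCGC"); the function name count_barcodes and the example records are made up for illustration.

```python
from collections import Counter
import io

def count_barcodes(fastq_lines):
    """Tally barcodes from FASTQ header lines (every 4th line, starting with the first)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            # Assumption: the barcode is whatever follows the last ':' in the header.
            counts[line.rstrip().rsplit(":", 1)[-1]] += 1
    return counts

# Tiny in-memory example; for a real file, pass an open file handle instead.
example = io.StringIO(
    "@r1 1:N:0:TCTCGCGC\nACGT\n+\nIIII\n"
    "@r2 1:N:0:TCTCGCGA\nACGT\n+\nIIII\n"
    "@r3 1:N:0:TCTCGCGC\nACGT\n+\nIIII\n"
)
barcode_counts = count_barcodes(example)
for barcode, n in barcode_counts.most_common():
    print(barcode, n)
```

Sorting by count puts the intended barcodes at the top, which makes it easier to decide which ones to hand to demuxbyname.sh.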


            • #7
              I have already looked at all the barcodes. I've got 76,959 different barcodes that are not exact matches, accounting for about 48 million reads. If I allow 1 mismatch, I could recover 16 million of those. If I allow 2 mismatches, I could recover 24 million.


              • #8
                I wrote a Perl script using the fuzzy match module (Text::Fuzzy) to pull all the entries with no more than two mismatches in the barcode. It's not much, but I can send it to anyone who is interested.
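
The Perl script itself is not posted in this thread. As a rough illustration of the same idea, here is a Python sketch that keeps only reads whose header barcode is within two mismatches of the expected one. It is not the poster's script: it uses a plain Hamming distance (substitutions only) rather than the edit distance that Text::Fuzzy computes, and it assumes the barcode is the text after the last ':' in the header. The names hamming and filter_by_barcode and the example records are hypothetical.

```python
import io

def hamming(a, b):
    """Count mismatched positions; barcodes of unequal length never match."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))

def filter_by_barcode(fastq_lines, expected, max_mismatches=2):
    """Yield 4-line FASTQ records whose barcode is within max_mismatches of expected."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Assumption: the barcode follows the last ':' in the header line.
            barcode = record[0].rstrip().rsplit(":", 1)[-1]
            if hamming(barcode, expected) <= max_mismatches:
                yield record
            record = []

# Tiny in-memory example; for real data, pass an open file handle instead.
example = io.StringIO(
    "@r1 1:N:0:TCTCGCGC\nACGT\n+\nIIII\n"   # exact match
    "@r2 1:N:0:TCTCGAGA\nACGT\n+\nIIII\n"   # two mismatches
    "@r3 1:N:0:GGATTACA\nACGT\n+\nIIII\n"   # unrelated barcode
)
kept = list(filter_by_barcode(example, "TCTCGCGC"))
for rec in kept:
    print("".join(rec), end="")
```

For paired-end data, the same filter would be run on both files. That keeps the mates in sync only if each pair carries the same barcode string in both headers, which is worth checking before relying on it.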


                • #9
                  In the future, just ask the sequencing provider to redo the demultiplexing with bcl2fastq. You are paying them for it anyway :-)
