  • darthsequencer
    Member
    • Feb 2012
    • 35

    Is there a command to output the kmers of each sequence in a multifasta file?

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      @darthsequencer: You can use kmercountexact.sh from the BBMap suite.
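
For comparison, here is a minimal Python sketch of what listing the k-mers of each record in a multi-FASTA file could look like. It is not how kmercountexact.sh works internally. The function names (`read_fasta`, `kmers`, `kmers_per_sequence`) and the choice of k are illustrative assumptions, and only forward-strand k-mers are listed (no canonical or reverse-complement handling).

```python
import io
from typing import Dict, Iterator, List, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each record of a multi-FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping substring of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmers_per_sequence(handle, k: int) -> Dict[str, List[str]]:
    """Map each record name to the list of its k-mers."""
    return {name: kmers(seq, k) for name, seq in read_fasta(handle)}


if __name__ == "__main__":
    demo = io.StringIO(">seq1\nACGTA\n>seq2 some description\nGGC\nA\n")
    for name, ks in kmers_per_sequence(demo, 3).items():
        print(name, " ".join(ks))
```

For large files a dedicated tool will be much faster; the sketch is only meant to make the per-sequence output explicit.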

      • roselaw27
        Junior Member
        • Mar 2020
        • 2

        Trouble parsing header

        Dear BBMap team:

        I tried to use filterbytile.sh to remove low-quality reads, but I got an error message saying there was trouble parsing the header. I've read the description of the script, in which Brian Bushnell says this can happen when the reads have been renamed (such as in SRA) and asks to be contacted if the error occurs.

        I downloaded the sequencing data (SRA) from NCBI and used fastq-dump to get the FASTQ files. Is there a solution to this?

        Thank you very much!
        Rose

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          @roselaw27: You can use the `-F` option with fastq-dump to try to recreate the FASTQ headers in the original Illumina format.

          • roselaw27
            Junior Member
            • Mar 2020
            • 2

            Thank you for your response. Unfortunately, I tried using -F and filterbytile.sh still showed the same error message. The data were from a HiSeq 2500.

            Thank you again. I think I am going to try to find other ways to solve the problem.

            Rose
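
One way to narrow this down is to check whether the headers in the dumped FASTQ still carry lane, tile, and x/y fields at all. The Python sketch below assumes that a tile-based filter needs those fields and that the file uses four-line FASTQ records. The two regular expressions cover the common Illumina read-name layouts (seven colon-separated fields for Casava 1.8+, five for older pipelines); the function names are illustrative and this is not the parser filterbytile.sh uses.

```python
import re
from typing import Dict, Iterable, List, Optional

# Casava 1.8+: instrument:run:flowcell:lane:tile:x:y
CASAVA_18 = re.compile(
    r"^@?[^:\s]+:\d+:[^:\s]+:(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
)
# Older pipelines: instrument:lane:tile:x:y
OLDER = re.compile(
    r"^@?[^:\s]+:(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
)


def parse_coords(header: str) -> Optional[Dict[str, int]]:
    """Return lane/tile/x/y if the header looks like an Illumina read name."""
    for pattern in (CASAVA_18, OLDER):
        match = pattern.match(header)
        if match:
            return {key: int(value) for key, value in match.groupdict().items()}
    return None


def unparseable_headers(fastq_lines: Iterable[str], limit: int = 1000) -> List[str]:
    """Collect headers among the first `limit` records that lack coordinates."""
    bad = []
    for i, line in enumerate(fastq_lines):
        if i // 4 >= limit:
            break
        # In four-line FASTQ, every fourth line (0, 4, 8, ...) is a header.
        if i % 4 == 0 and parse_coords(line.strip()) is None:
            bad.append(line.strip())
    return bad


if __name__ == "__main__":
    print(parse_coords("@HWI-ST1234:100:C0ABCACXX:3:1101:1234:5678 1:N:0:ATCACG"))
    print(parse_coords("@SRR1234567.1 1 length=100"))
```

If every header comes back unparseable even with `-F`, the original read names were probably not kept in the SRA record, in which case the tile information cannot be recovered from the FASTQ alone.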

            • silask
              Junior Member
              • Oct 2017
              • 9

              BBsketch alltoall is incomplete

              Can I ask a question about bbsketch?

              I want to compute the pairwise ANI among many genomes (1000+).
              I ran:

              Code:
              bbsketch.sh perfile genome_folder/*.fasta out=sketch.gz k=31,24 threads=16
              
              comparesketch.sh alltoall sketch.gz k=31,24 prealloc=0.75 format=3 threads=16 out=table.tsv

              The log files seem correct:

              Code:
              Set threads to 16
              Loading sketches.
              Loaded 1157 sketches in 59.541 seconds.
              Total Time:     59.784 seconds.

              Code:
              Set threads to 16
              Loading sketches.
              Executing kmer.KmerTableSet [ways=31, tabletype=10, prealloc=0.75]
              
              Initial size set to 45218398
              Initial:
              Ways=31, initialSize=45218398, prefilter=f, prealloc=0.75
              Memory: max=91268m, total=91268m, free=90848m, used=420m
              
              3.713 seconds.
              Indexed 2880884 unique and 10513099 total hashcodes.
              Loaded 1157 sketches in 8.457 seconds.
              
              Ran 1225005 comparisons in 9.344 seconds.
              Total Time:     17.801 seconds.

              In the final output, some genome comparisons are missing (I do get them with Mash). If I run bbsketch on a subset, I get the expected comparisons.


              - Genomes are highly similar.

              #Query         Ref            ANI     QSize    RefSize  QBases   RBases   QTaxID  RTaxID  KID     WKID    SSU
              genome1.fasta  genome2.fasta  94.223  1984118  1796930  1987598  1797650  -1      -1      24.952  27.523  .

              - It is not simply a naming issue: I find neither "genome1 vs genome2" nor "genome2 vs genome1".


              Any idea?
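
To see exactly which pairs are absent, the table can be compared against the full list of expected pairs. The Python sketch below assumes the output has the query in the first column and the reference in the second (as in the header line shown above), that names contain no whitespace, and that a pair counts as present if it appears in either orientation. The function names are illustrative.

```python
import io
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def load_pairs(handle) -> Tuple[Set[FrozenSet[str]], Counter]:
    """Read the table; return unordered pairs and the row count per query."""
    pairs, per_query = set(), Counter()
    for line in handle:
        fields = line.split()
        # Skip blank lines, short lines, and the '#Query ...' header.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        query, ref = fields[0], fields[1]
        pairs.add(frozenset((query, ref)))
        per_query[query] += 1
    return pairs, per_query


def missing_pairs(names: Iterable[str], observed: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """List every expected pair that appears in neither orientation."""
    return [
        (a, b)
        for a, b in combinations(sorted(set(names)), 2)
        if frozenset((a, b)) not in observed
    ]


if __name__ == "__main__":
    table = io.StringIO(
        "#Query\tRef\tANI\n"
        "genome1.fasta\tgenome2.fasta\t94.223\n"
    )
    observed, per_query = load_pairs(table)
    names = ["genome1.fasta", "genome2.fasta", "genome3.fasta"]
    for a, b in missing_pairs(names, observed):
        print("missing:", a, b)
    print("rows per query:", dict(per_query))
```

The per-query row counts are included because, if every query tops out at the same number of rows, that would hint at a per-query reporting limit rather than lost comparisons. That is only a guess and should be checked against the tool's help text.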

              • AndrewP
                Member
                • Mar 2019
                • 11

                I'm trying to use BBMap to find all perfect hits, or hits with an indel of length 1.


                Code:
                bbmapskimmer.sh in=kmer.fasta out=result.sam ambiguous=all strictmaxindel=1

                I'm running a control experiment in which the query is a subsequence of a larger sequence in the reference. I can find the subsequence in the reference if it is an exact match. However, if I add an indel to either the subsequence or the reference, BBMap no longer finds the alignment. I thought that setting strictmaxindel to 1 would let it report an alignment with a single indel. I've also tried setting strictmaxindel to 10, and it still doesn't find the alignment.

                Is there something that I am doing wrong?
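
When checking a control like this, it can help to read the indel lengths straight from the CIGAR strings in the SAM output, independently of the mapper's settings. The Python sketch below assumes standard tab-separated SAM records (FLAG in column 2, CIGAR in column 6); the function names and the `limit` parameter are illustrative, and nothing here changes how BBMap aligns.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_indel(cigar: str) -> Optional[int]:
    """Longest insertion or deletion in a CIGAR string; None if unavailable."""
    if cigar == "*":
        return None
    lengths = [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]
    return max(lengths, default=0)


def hits_within(lines: Iterable[str], limit: int = 1) -> Iterator[Tuple[str, str, int, str, int]]:
    """Yield (query, ref, pos, cigar, longest_indel) for mapped records
    whose longest indel is at most `limit`."""
    for line in lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        if int(fields[1]) & 4:  # FLAG bit 0x4: read is unmapped
            continue
        longest = max_indel(fields[5])
        if longest is not None and longest <= limit:
            yield fields[0], fields[2], int(fields[3]), fields[5], longest


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.4",
        "q1\t0\tref\t101\t40\t50M\t*\t0\t0\t*\t*",
        "q2\t0\tref\t201\t40\t20M1I29M\t*\t0\t0\t*\t*",
        "q3\t0\tref\t301\t40\t10M3D40M\t*\t0\t0\t*\t*",
        "q4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    for hit in hits_within(sam, limit=1):
        print(hit)
```

If the control read shows up as unmapped (FLAG bit 4 set) rather than mapped with a longer indel, the alignment is being lost before any indel filter applies.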
