I am not sure why BBMap is not reporting the two sites in the output SAM. Perhaps that is by design.
Unfortunately @Brian has been missing from online forums for last couple of months. I pinged him again yesterday with ref to this question. We will need him to chime in on these questions.
Unconfigured Ad
Collapse
X
-
@sdriscoll thanks for the tip! with these two options bbmap does report one 100% semiperfect site read but it's not in the output file. Anything else I'm missing?
the exact command was:
Code:bbmap.sh secondary=t sssr=0.80 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all maxsites=10 in=sample.read.fa out=mapped.sam vslow
Leave a comment:
-
-
@milan.molbio-
bbmap looks for the best alignment and reports only that one by default. If there are multiple "best" then you can get them all by setting 'ambig=all'.
What you're looking for should be possible by enabling 'secondary=t' and then setting both 'sssr' and 'maxsites' to your liking. "secondary" alignments are those with lower mapping scores relative to the best alignment. "sssr" controls the ratio of the best score that is acceptable (default is 0.95 to allow secondary alignments with mapping scores as low as 95% of the best score). Finally "maxsites" sets a cap on the number of secondary alignment sites to report. You can probably just set that to a very high value to be sure you have them "all" - or at least all that bbmap can find.
Leave a comment:
-
-
Hi guys,
I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:
expected output:Code:# file: sample.genome.fa > 10 GRCz10 TTTCCTGAGGATGTGAGATTACTTCCGGGTTTTACAACAACATGGCAGCGCCCTTAGGTCTGCATCTGGAGTTTGGGTTTGTATTTATTGTTTCCTTTATCATTAATCTATTTTTTTTTTCTTTCTCGTGTTTTTTAAGAACATGGCAGTTTTGTTTTACATCAACTTGGCAGTAAT # file sample.reads.fa: > r1 GTTTTACAACAACATGGCAG GTTTTTTAAGAACATGGCAG # run.bash bbmap.sh ref=sample.genome.fa k=3 bbmap.sh minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow # results: mapped.sam mapped.sam @HD VN:1.4 SO:unsorted @SQ SN: 10 GRCz10 LN:177 @PG ID:BBMap PN:BBMap VN:37.85 CL:java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow r1 0 10 GRCz10 29 40 20= * 0 0 GTTTTACAACAACATGGCAG * NM:i:0 AM:i:40 NH:i:1
- 1 perfect hit (position 28)
- 1 hit with 2 substitutions (position 153, 8:T>A,13:T>A - GTTTTACATCAACTTGGCAG)
- 1 hit with 3 substitutions (position 129, 5:T>A,6:T>C,9:G>C - GTTTTTTAAGAACATGGCAG)
Code:# stdout: Max memory cannot be determined. Attempting to use 3200 MB. If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command, or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value. java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m -cp /Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Version 37.85 [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250 Retaining all best sites for ambiguous mappings. Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.449 NOTE: Ignoring reference file because it already appears to have been processed. NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt Set genome to 1 Loaded Reference: 0.075 seconds. Loading index for chunk 1-1, build 1 Generated Index: 0.004 seconds. Analyzed Index: 0.002 seconds. Started output stream: 0.021 seconds. Cleared Memory: 0.116 seconds. Processing reads in single-ended mode. Started read stream. Started 8 mapping threads. Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7 ------------------ Results ------------------ Genome: 1 Key Length: 3 Max Indel: 0 Minimum Score Ratio: 0.4486 Mapping Mode: normal Reads Used: 1 (20 bases) Mapping: 0.208 seconds. Reads/sec: 4.81 kBases/sec: 0.10 Read 1 data: pct reads num reads pct bases num bases mapped: 100.0000% 1 100.0000% 20 unambiguous: 100.0000% 1 100.0000% 20 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 100.0000% 1 100.0000% 20 semiperfect site: 100.0000% 1 100.0000% 20 Match Rate: NA NA 100.0000% 20 Error Rate: 0.0000% 0 0.0000% 0 Sub Rate: 0.0000% 0 0.0000% 0 Del Rate: 0.0000% 0 0.0000% 0 Ins Rate: 0.0000% 0 0.0000% 0 N Rate: 0.0000% 0 0.0000% 0 Total time: 0.571 seconds.
Leave a comment:
-
Hello-
I'm trying to understand this example in the original post:
It is provided as an example of how to deal with the non-deterministic behavior of bbmap when aligning paired-end reads. However what is described (map reads as single-end and then re-pair them using repair.sh) doesn't seem to be what these code lines show. If the goal is to get bbmap to produce deterministic paired-end alignments wouldn't the final outcome of the method be an alignment file and not just a FASTQ? Why would I be using the un-aligned reads as part of the final product?Code:bbmap.sh in=reads.fq outu=unmapped.fq int=f repair.sh in=unmapped.fq out=paired.fq fint outs=singletons.fq
Leave a comment:
-
-
Hi guys,
I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:
expected output:Code:# file: sample.genome.fa > 10 GRCz10 TTTCCTGAGGATGTGAGATTACTTCCGGGTTTTACAACAACATGGCAGCGCCCTTAGGTCTGCATCTGGAGTTTGGGTTTGTATTTATTGTTTCCTTTATCATTAATCTATTTTTTTTTTCTTTCTCGTGTTTTTTAAGAACATGGCAGTTTTGTTTTACATCAACTTGGCAGTAAT # file sample.reads.fa: > r1 GTTTTACAACAACATGGCAG GTTTTTTAAGAACATGGCAG # run.bash bbmap.sh ref=sample.genome.fa k=3 bbmap.sh minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow # results: mapped.sam mapped.sam @HD VN:1.4 SO:unsorted @SQ SN: 10 GRCz10 LN:177 @PG ID:BBMap PN:BBMap VN:37.85 CL:java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow r1 0 10 GRCz10 29 40 20= * 0 0 GTTTTACAACAACATGGCAG * NM:i:0 AM:i:40 NH:i:1
- 1 perfect hit (position 28)
- 1 hit with 2 substitutions (position 153, 8:T>A,13:T>A - GTTTTACATCAACTTGGCAG)
- 1 hit with 3 substitutions (position 129, 5:T>A,6:T>C,9:G>C - GTTTTTTAAGAACATGGCAG)
Code:# stdout: Max memory cannot be determined. Attempting to use 3200 MB. If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command, or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value. java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m -cp /Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Version 37.85 [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250 Retaining all best sites for ambiguous mappings. Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.449 NOTE: Ignoring reference file because it already appears to have been processed. NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt Set genome to 1 Loaded Reference: 0.075 seconds. Loading index for chunk 1-1, build 1 Generated Index: 0.004 seconds. Analyzed Index: 0.002 seconds. Started output stream: 0.021 seconds. Cleared Memory: 0.116 seconds. Processing reads in single-ended mode. Started read stream. Started 8 mapping threads. Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7 ------------------ Results ------------------ Genome: 1 Key Length: 3 Max Indel: 0 Minimum Score Ratio: 0.4486 Mapping Mode: normal Reads Used: 1 (20 bases) Mapping: 0.208 seconds. Reads/sec: 4.81 kBases/sec: 0.10 Read 1 data: pct reads num reads pct bases num bases mapped: 100.0000% 1 100.0000% 20 unambiguous: 100.0000% 1 100.0000% 20 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 100.0000% 1 100.0000% 20 semiperfect site: 100.0000% 1 100.0000% 20 Match Rate: NA NA 100.0000% 20 Error Rate: 0.0000% 0 0.0000% 0 Sub Rate: 0.0000% 0 0.0000% 0 Del Rate: 0.0000% 0 0.0000% 0 Ins Rate: 0.0000% 0 0.0000% 0 N Rate: 0.0000% 0 0.0000% 0 Total time: 0.571 seconds.
Leave a comment:
-
how to find all ambiguous reads?
Hi guys,
I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:
expected output:Code:# file: sample.genome.fa > 10 GRCz10 TTTCCTGAGGATGTGAGATTACTTCCGGGTTTTACAACAACATGGCAGCGCCCTTAGGTCTGCATCTGGAGTTTGGGTTTGTATTTATTGTTTCCTTTATCATTAATCTATTTTTTTTTTCTTTCTCGTGTTTTTTAAGAACATGGCAGTTTTGTTTTACATCAACTTGGCAGTAAT # file sample.reads.fa: > r1 GTTTTACAACAACATGGCAG GTTTTTTAAGAACATGGCAG # run.bash bbmap.sh ref=sample.genome.fa k=3 bbmap.sh minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow # results: mapped.sam mapped.sam @HD VN:1.4 SO:unsorted @SQ SN: 10 GRCz10 LN:177 @PG ID:BBMap PN:BBMap VN:37.85 CL:java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow r1 0 10 GRCz10 29 40 20= * 0 0 GTTTTACAACAACATGGCAG * NM:i:0 AM:i:40 NH:i:1
- 1 perfect hit (position 28)
- 1 hit with 2 substitutions (position 153, 8:T>A,13:T>A - GTTTTACATCAACTTGGCAG)
- 1 hit with 3 substitutions (position 129, 5:T>A,6:T>C,9:G>C - GTTTTTTAAGAACATGGCAG)
Code:# stdout: Max memory cannot be determined. Attempting to use 3200 MB. If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command, or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value. java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m -cp /Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Version 37.85 [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250 Retaining all best sites for ambiguous mappings. Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.449 NOTE: Ignoring reference file because it already appears to have been processed. NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt Set genome to 1 Loaded Reference: 0.075 seconds. Loading index for chunk 1-1, build 1 Generated Index: 0.004 seconds. Analyzed Index: 0.002 seconds. Started output stream: 0.021 seconds. Cleared Memory: 0.116 seconds. Processing reads in single-ended mode. Started read stream. Started 8 mapping threads. Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7 ------------------ Results ------------------ Genome: 1 Key Length: 3 Max Indel: 0 Minimum Score Ratio: 0.4486 Mapping Mode: normal Reads Used: 1 (20 bases) Mapping: 0.208 seconds. Reads/sec: 4.81 kBases/sec: 0.10 Read 1 data: pct reads num reads pct bases num bases mapped: 100.0000% 1 100.0000% 20 unambiguous: 100.0000% 1 100.0000% 20 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 100.0000% 1 100.0000% 20 semiperfect site: 100.0000% 1 100.0000% 20 Match Rate: NA NA 100.0000% 20 Error Rate: 0.0000% 0 0.0000% 0 Sub Rate: 0.0000% 0 0.0000% 0 Del Rate: 0.0000% 0 0.0000% 0 Ins Rate: 0.0000% 0 0.0000% 0 N Rate: 0.0000% 0 0.0000% 0 Total time: 0.571 seconds.
Leave a comment:
-
Ultimately, i'd like to do variant calling on a combined pac bio / illumina whole viral genome dataset. I am working with BBMap right now as it has the intuitive minid flag, which seems desirable. As a first step, I'm trying to optimize my mapping as much as possible on one of the samples that is most divergent to the reference.
Here is my working command:
bbmap/mapPacBio.sh in=200335185_usedreads.fastq.gz ref=200303013.fa maxindel=40 minid=0.4 vslow k=8 out=200335185.sam overwrite=t bamscript=bs.sh; sh bs.sh
It optimizes the number of reads mapped (4148/4164) and minimizes the number of ambiguous mapping reads (1).
Given, k=8 and minid=0.4, geneious mapper maps all 4164 reads for maxindel ranging from 20-500. If it is in the cards, I'd like to be able to map the remaining stragglers but don't know what other BBMap flags I should try in this endeavor. Also, I'm curious why bbmap is so much more sensitive to the valueof maxindel... here are select bbmap results:
maxindel num reads num ambiguous
20 4145 2
40 4148 1
60 4137 1
100 4130 3
200 4125 4
Leave a comment:
-
-
Thanks! I'm surprised there's something BBTools doesn't doOriginally posted by GenoMax View PostBBTools currently has no count utilities. They may be on the wish list since many have asked Brian. For now, your best bet is to use featureCounts.
Leave a comment:
-
-
BBTools currently has no count utilities. They may be on the wish list since many have asked Brian. For now, your best bet is to use featureCounts.Originally posted by mcmc View PostIs there a way to use BBtools to summarize reads mapped to a genome (using BBmap/BBsplit, in a sam file) by orf? I see that pileup.sh will take a prodigal-output fasta file with orf info, but I've got a genome downloaded from refseq with all the ncbi files (gff, cds fasta, gb). Can BBtools parse one of these to summarize my sam file by orf?
While I could map to the orfs.fna instead, I'm interested in intergenics too, e.g. for orf/RNA discovery.
Thanks,
MCMC
Leave a comment:
-
-
summarizing mapped reads by orf
Is there a way to use BBtools to summarize reads mapped to a genome (using BBmap/BBsplit, in a sam file) by orf? I see that pileup.sh will take a prodigal-output fasta file with orf info, but I've got a genome downloaded from refseq with all the ncbi files (gff, cds fasta, gb). Can BBtools parse one of these to summarize my sam file by orf?
While I could map to the orfs.fna instead, I'm interested in intergenics too, e.g. for orf/RNA discovery.
Thanks,
MCMC
Leave a comment:
-
-
BBSplit ambig=toss
Hi Brian et al.,
When I run BBsplit with ambig=toss, the ambiguous reads are not written to unmapped.fq; but when I run BBmap, they are. Is this the expected behavior? I'd like to be able to retrieve the ambiguous reads from BBsplit (both within/between two references).
Thanks,
MC
Leave a comment:
-
-
Have you looked through the logs and such to see if there is any indication of any issues? There is always the possibility that @Brian may not have checked extreme usage case like this for randomreads.sh and this may be a genuine bug that is clearly a road-block.
Since you have said that 100 genomes seem to work fine you could do 10 runs of 100 genomes each and then perhaps merge the data. A thought.
Leave a comment:
-
-
I will try that, thank you.
And regarding the third question, that is actually a problem. I have no idea where these reads come from. I tried to search them or parts of them in the original genomes, but apparently with no success. Could be chimeras made up of short sequences, but I can't say for sure.
The first thought was that it could be some memory problem, since it gets worse when increasing the size of the initial file, but it's just a random idea.
Leave a comment:
-
-
@Brian will likely have to weigh in on this (especially "positions stated in read header with the actual read sequence, for most of the reads they were too different ") but be aware that he has been behind on support of late.
A few things to check that I can think of:
1. If you are only going to check the + strands then perhaps you should have used the samestrand=t option when generating the reads.
2. Default value for BBMap is ambig=best. Can you try mapping with ambig=all to see if that improves alignments?
3. Do you know why the remaining reads are not mapping (are they chimeras)?
Leave a comment:
-
Latest Articles
Collapse
-
by SEQadmin2
Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.
The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
...-
Channel: Articles
06-02-2026, 10:05 AM -
-
by SEQadmin2
With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.
Introduction
Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...-
Channel: Articles
05-22-2026, 06:42 AM -
ad_right_rmr
Collapse
News
Collapse
| Topics | Statistics | Last Post | ||
|---|---|---|---|---|
|
Started by SEQadmin2, 06-05-2026, 10:09 AM
|
0 responses
14 views
0 reactions
|
Last Post
by SEQadmin2
06-05-2026, 10:09 AM
|
||
|
Started by SEQadmin2, 06-04-2026, 08:59 AM
|
0 responses
28 views
0 reactions
|
Last Post
by SEQadmin2
06-04-2026, 08:59 AM
|
||
|
Started by SEQadmin2, 06-02-2026, 12:03 PM
|
0 responses
33 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 12:03 PM
|
||
|
Started by SEQadmin2, 06-02-2026, 11:40 AM
|
0 responses
23 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 11:40 AM
|
Leave a comment: