how to find all ambiguous reads?
Hi guys,
I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:
expected output:
Hi guys,
I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:
Code:
# file: sample.genome.fa > 10 GRCz10 TTTCCTGAGGATGTGAGATTACTTCCGGGTTTTACAACAACATGGCAGCGCCCTTAGGTCTGCATCTGGAGTTTGGGTTTGTATTTATTGTTTCCTTTATCATTAATCTATTTTTTTTTTCTTTCTCGTGTTTTTTAAGAACATGGCAGTTTTGTTTTACATCAACTTGGCAGTAAT # file sample.reads.fa: > r1 GTTTTACAACAACATGGCAG GTTTTTTAAGAACATGGCAG # run.bash bbmap.sh ref=sample.genome.fa k=3 bbmap.sh minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow # results: mapped.sam mapped.sam @HD VN:1.4 SO:unsorted @SQ SN: 10 GRCz10 LN:177 @PG ID:BBMap PN:BBMap VN:37.85 CL:java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow r1 0 10 GRCz10 29 40 20= * 0 0 GTTTTACAACAACATGGCAG * NM:i:0 AM:i:40 NH:i:1
- 1 perfect hit (position 28)
- 1 hit with 2 substitutions (position 153, 8:T>A,13:T>A - GTTTTACATCAACTTGGCAG)
- 1 hit with 3 substitutions (position 129, 5:T>A,6:T>C,9:G>C - GTTTTTTAAGAACATGGCAG)
Code:
# stdout: Max memory cannot be determined. Attempting to use 3200 MB. If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command, or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value. java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m -cp /Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Version 37.85 [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam] Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250 Retaining all best sites for ambiguous mappings. Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.449 NOTE: Ignoring reference file because it already appears to have been processed. NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt Set genome to 1 Loaded Reference: 0.075 seconds. Loading index for chunk 1-1, build 1 Generated Index: 0.004 seconds. Analyzed Index: 0.002 seconds. Started output stream: 0.021 seconds. Cleared Memory: 0.116 seconds. Processing reads in single-ended mode. Started read stream. Started 8 mapping threads. Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7 ------------------ Results ------------------ Genome: 1 Key Length: 3 Max Indel: 0 Minimum Score Ratio: 0.4486 Mapping Mode: normal Reads Used: 1 (20 bases) Mapping: 0.208 seconds. Reads/sec: 4.81 kBases/sec: 0.10 Read 1 data: pct reads num reads pct bases num bases mapped: 100.0000% 1 100.0000% 20 unambiguous: 100.0000% 1 100.0000% 20 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 100.0000% 1 100.0000% 20 semiperfect site: 100.0000% 1 100.0000% 20 Match Rate: NA NA 100.0000% 20 Error Rate: 0.0000% 0 0.0000% 0 Sub Rate: 0.0000% 0 0.0000% 0 Del Rate: 0.0000% 0 0.0000% 0 Ins Rate: 0.0000% 0 0.0000% 0 N Rate: 0.0000% 0 0.0000% 0 Total time: 0.571 seconds.
Comment