BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

valeng replied

05-03-2024, 11:42 AM
Hi there!

I've been working with 16S DNA metagenomics sequences recently. Before proceeding with DADA2 and Qiime2-amplicon-2023.9, I performed quality filtering using the following code:

for file in "${INPUT_DIR}"/*.fastq; do
base_name=$(basename "${file}" .fastq)
bbduk.sh in="${file}" out="${OUTPUT_DIR}/${base_name}.fastq" qtrim=rl trimq=10
echo "Processed file: ${file}"
done

Afterwards, I conducted a FastQC analysis, and the results looked promising. Initially, I encountered an issue where setting the trimq parameter to 15 or 20 resulted in forward and reverse sequences being of different lengths, making it impossible to proceed with DADA2. To resolve this, I set trimq to 10, and adjusted the parameters in DADA2 accordingly:

qiime dada2 denoise-paired --i-demultiplexed-seqs ${INPUT_DIR}/demux-paired-end.qza --p-trunc-len-f 250 --p-trim-left-f 8 --p-trunc-len-r 230 --p-trim-left-r 8 --o-representative-sequences ${OUTPUT_DIR}/dada2-rep-seqs.qza --o-table ${OUTPUT_DIR}/dada2-table.qza --o-denoising-stats ${OUTPUT_DIR}/dada2-stats.qza

However, despite these adjustments, I encountered a significant decrease in the percentage of merged reads.

Do you have any suggestions on how to improve this situation?
Leave a comment:

nicolask replied

03-29-2024, 02:23 AM

Hello, thank you very much for the tool! I have used it in the past on my older intel-based mac. However, I just downloaded the latest version onto my M3 silicon mac, and I am running into some issues...

Essentially, there are errors when trying to create an index, and when fails when trying to align some fasta sequences.

Code:

$ java -ea -Xmx32G -Xms32G -cp /Users/nicolas/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=testmap.fa out=nate.bam ref=/Users/nicolas/genomes/mm39/mm39_ref.fa.gz

Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=testmap.fa, out=nate.bam, ref=/Users/nicolas/genomes/mm39/mm39_ref.fa.gz]

Version 39.06

Code:

$ ~/bin/bbmap/bbmap.sh in=testmap.fa out=nate.bam ref=~/genomes/mm39/mm39_ref.fa.gz -Xmx32G

/Users/nicolas/bin/bbmap//calcmem.sh: line 78: [: -v: unary operator expected

java -ea -Xmx32G -Xms32G -cp /Users/nicolas/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=testmap.fa out=nate.bam ref=/Users/nicolas/genomes/mm39/mm39_ref.fa.gz -Xmx32G

Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=testmap.fa, out=nate.bam, ref=/Users/nicolas/genomes/mm39/mm39_ref.fa.gz, -Xmx32G]

Version 39.06

Code:

Exception in thread "Thread-10" java.lang.AssertionError
at align2.IndexMaker4$BlockMaker$CountThread.fillArrays(IndexMaker4.java:413)
at align2.IndexMaker4$BlockMaker$CountThread.run(IndexMaker4.java:300)

Exception in thread "Thread-13" java.lang.AssertionError
at align2.IndexMaker4$BlockMaker$CountThread.fillArrays(IndexMaker4.java:413)
at align2.IndexMaker4$BlockMaker$CountThread.run(IndexMaker4.java:300)

Indexing threads finished for block 0-3
Indexing threads finished for block 4-6

Generated Index: 65.699 seconds.
Finished Writing: 0.824 seconds.

Exception in thread "main" java.lang.AssertionError
at align2.BBIndex.analyzeIndex(BBIndex.java:130)
at align2.BBMap.loadIndex(BBMap.java:420)
at align2.BBMap.main(BBMap.java:32)

From what I can tell, the indexing seems to be complete as I see the relevant files, but fails completely when trying to do the alignment. For now, I am only trying to align a simple fasta file with about 12 different entries. I do not know if this is a java issue, an M3 issue, or something completely wrong that I'm doing.

Thank you in advance.

Sincerely,

NK

Leave a comment:

susanklein replied

11-27-2023, 05:33 PM
Hi,

Getting this strange error running mapPacBio.sh. The cluster node I'm running it on has 104 cpus, 500GB mem, and 400GB storage. I was getting the same error using 104 cpus so tried lowering it to 48.

The read fasta is 56 GB and the reference fasta is 95 MB.

I've cut down the length of the sequence and 'Sm' string shown.

Thanks,

S.

mapPacBio.sh ref=$CPREF ambig=all nodisk=t nzo=t mappedonly=t minid=0.85 out=bact-bbmap.sam scafstats=cp_bb.mapstats in=$reads fastareadlen=6000 threads=48 maxindel=100 k=15

Loading python3/3.9.2
Loading requirement: intel-mkl/2020.3.304
java -ea -Xmx229788m -cp /g/data/nm31/bin/bbmap/current/ align2.BBMapPacBio build=1 overwrite=true minratio=0.40 fastareadlen=6000 ambiguous=best minscaf=100 startpad=10000 stoppad=10000 midpad=6000 ref=/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta ambig=all nodisk=t nzo=t mappedonly=t minid=0.75 out=bact-bbmap.sam scafstats=bact_bb.mapstats in=/home/554/ta0341/dy44/r12.18_Roepera_similis/reads.fasta fastareadlen=6000 threads=48 maxindel=100 k=15
Executing align2.BBMapPacBio [build=1, overwrite=true, minratio=0.40, fastareadlen=6000, ambiguous=best, minscaf=100, startpad=10000, stoppad=10000, midpad=6000, ref=/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta, ambig=all, nodisk=t, nzo=t, mappedonly=t, minid=0.75, out=bact-bbmap.sam, scafstats=bact_bb.mapstats, in=/home/554/ta0341/dy44/r12.18_Roepera_similis/reads.fasta, fastareadlen=6000, threads=48, maxindel=100, k=15]
Version 39.01

Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.400
Set OUTPUT_MAPPED_ONLY to true
Scaffold statistics will be written to bact_bb.mapstats
Set threads to 48
Retaining all best sites for ambiguous mappings.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.367
Executing dna.FastaToChromArrays2 [/home/554/ta0341/dy44/r12.18_Roepera_similis/cp-fabids.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=false, minscaf=100, midpad=6000, startpad=10000, stoppad=10000, nodisk=true]

Set genScaffoldInfo=true
Set genome to 1

Loaded Reference: 0.003 seconds.
Loading index for chunk 1-1, build 1
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 10.393 seconds.
Analyzed Index: 33.211 seconds.
Started output stream: 1.183 seconds.
Creating scaffold statistics table: 0.036 seconds.
Cleared Memory: 0.175 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 48 mapping threads.
Exception in thread "Thread-43" java.lang.AssertionError: -24, -24
4908336, 1,0,99886506,99894071,0,00,151,154695,351771,0,263804,,mSSSmSSSmSmmmSSSSSSmSSSmmSSmmmSmmmSSmmSSSSSmmmSSSSSmSSSmmSSSmmSSmSSSSSmSSSSSSmSSSSSSmSSSSSSSSSSSmmmSSSSSSSmSSmmSSmSSSSSSSSSSSSSSmmmmmmSmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmSSmmmmmmmmmmmmmmmmmmmmSSmSSSSSmmSSSSmmSSmSmSSmSSSSSSSmSSSSmSmSSSSSSSSmmmSSmSSSSSSSSSSSSmSSSmmSSmSSSmmSSSmmSSmmmmmmmSSmmmmSmSmmSmSSSSSmSSSmmSSSSSSmSSSSmmSSSmSSSSmmSmSSSmmSmSSSmSSSmmmSSSSSmSSSmSSSSmmSSmSSSSSSSmSmSSmSSSSSmSmmSSSSmSSSSSSSmSSSmSmmSSSSSmSSSmSSSmmmSSSmSmSSSmmSmSSmSSSSSmSSSmSmmSSmSSSSmSSmSSmmmmSSSSSSSmSmSSSmSmSSSSmmSmSmSmmmmSSSSSSSSSSSSmmSSmSSSmSmSSSSSSmmSmmSSmSSSmmmSSmSSSSSmmSSSSSmSSSmmSSSSmSSSmSmmSS....SSmSm, ATGAAAGAAAATATTCACATAGCTTTGTTTGTTTGATTGGCCT....CCACAAACTACGAAAATTTCTCTTTTTCCTATCTCTATCCAGATCAGCCGAAGCCAAAGCTCTTATTATTTGCTGAGAAATAGCTCGAGTAAG
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:407)
at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1043)
at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:923)
at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:564)
at align2.AbstractMapThread.run(AbstractMapThread.java:522)
Exception in thread "Thread-20" java.lang.AssertionError: -19, -19
9787423, 1,0,91446916,91454472,0,00,84,35773,287063,0,286984,,SSSSmSSSmSSm....SmSm, AGATAGATTAAAATAAAAA....AATATTCATCGTTTAATTCTATAATAGAATGC
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:407)
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:638)
at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1026)
at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:923)
at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:564)
at align2.AbstractMapThread.run(AbstractMapThread.java:522)
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47

**************************************************************************
Warning! 2 mapping threads did not terminate normally.
Check the error log; the output may be corrupt or incomplete.
Please submit the full stderr output as a bug report, not just this message.
**************************************************************************

------------------ Results ------------------

Genome: 1
Key Length: 15
Max Indel: 100
Minimum Score Ratio: 0.367
Mapping Mode: normal
Reads Used: 12661524 (57423401385 bases)

Mapping: 27513.481 seconds.
Reads/sec: 460.19
kBases/sec: 2087.10

Read 1 data: pct reads num reads pct bases num bases

mapped: 1.9671% 249070 1.9269% 1106468960
unambiguous: 1.8172% 230081 1.8567% 1066207675
ambiguous: 0.1500% 18989 0.0701% 40261285
low-Q discards: 0.0000% 0 0.0000% 0

perfect best site: 0.0023% 288 0.0000% 20624
semiperfect site: 0.0023% 288 0.0000% 20624

Match Rate: NA NA 78.4035% 941243945
Error Rate: 86.9937% 248782 20.7111% 248639475
Sub Rate: 86.8864% 248475 5.4915% 65925963
Del Rate: 83.3997% 238504 7.8336% 94043737
Ins Rate: 83.6599% 239248 7.3860% 88669775
N Rate: 3.2615% 9327 0.8854% 10629277
Exception in thread "main" java.lang.AssertionError:
The number of reads out does not add up to the number of reads in.
This may indicate that a mapping thread crashed.
If you submit a bug report, include the entire console output, not just this error message.
125046+124024+0+12412452+0+0 = 12661522 != 12661524
at align2.AbstractMapper.printOutputStats(AbstractMapper.java:1962)
at align2.AbstractMapper.printOutput(AbstractMapper.java:1060)
at align2.BBMapPacBio.testSpeed(BBMapPacBio.java:491)
at align2.BBMapPacBio.main(BBMapPacBio.java:36)
Leave a comment:
ajverster replied

08-10-2023, 09:05 AM
Hello,

I'm noticing some non-reproducibility when bbduk is writing gzipped files. The reads appear to be the same, but the output order has changed. This leads to subtle changes in metaphlan abundance levels (changes at the 4th significant didget). Is this intended behavior or a bug? Is there anyway to force bbduk to run in a fully reproducible way?

```
bbduk.sh in=DC001_R1.fastq.gz out=DC001_R1.rep1.fastq.gz ftm=5 tpe tbo qtrim=rl trimq=25 minlen=50 ref=adapters,phix -Xmx16g
bbduk.sh in=DC001_R1.fastq.gz out=DC001_R1.rep2.fastq.gz ftm=5 tpe tbo qtrim=rl trimq=25 minlen=50 ref=adapters,phix -Xmx16g
```
Leave a comment:
Kermit replied

07-04-2023, 01:07 PM
Thanks for the tool. In comparison to `cutadapt`, I noticed that `bbduk` (using the options below) ran 20x faster, and excluded all of the over-represented sequences that were flagged by FastQC.

```
bbmap/bbduk.sh in1=file_1.fq.gz in2=file_2.fq.gz out1=file_1_trim.fq.gz out2=file_2_trim.fq.gz ref=adapters ktrim=r k=23 mink=11 hdist=1 tpe tbo
```

Are the above options sensible for trimming paired end Illumina RNA-seq?

Should I be using `ktrim=rl` or a lower `k`?

I did not see adapters.fa file in the bbmap folder, so I just used the 'adapters' string as mentioned in the --help description of ref.

I am new to processing this omic at the read level. Thank you

Last edited by Kermit; 07-04-2023, 01:15 PM.
Leave a comment:
Dee Yoo replied

06-05-2023, 05:37 PM
Good day,

I am trying to use BBDuk to remove adapter and to perform quality trimming. The sequencing facility told me that they used the following:

Adapter (Index)
UDI0003

Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Read 2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Can you help me what command should I use to detect these adapters and to trim the reads @Q10?

Thank you!
Leave a comment:
bpeacock44 replied

04-10-2023, 05:51 PM
Hello, I am trying to use bbnorm to normalize a very large dataset - 657G per paired-end file. I've been given access to a node with 208 cores and 5888 GB RAM and I assumed it would be able to process the files, but I keep getting the following error:

Exception in thread "Thread-1407" java.lang.RuntimeException: java.io.IOException: No space left on device at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:32)
Caused by: java.io.IOException: No space left on device (more lines follow - I can share them if they'd be helpful.)

It continues to run and later on prints again

Exception in thread "Thread-1408" java.lang.RuntimeException: java.io.IOException: No space left on device at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:32)
Caused by: java.io.IOException: No space left on device

Then later on I get the following:

Exception in thread "Thread-1497" java.lang.RuntimeException: Writing to a terminated thread.

I ended the run at this point because I wasn't sure if the output would be viable or what exactly was happening. I was wondering a few things:

1) Should I let it keep running? Will the output be usable?
2) If the node I'm using isn't sufficient, is there a way to figure out what would be? When I ran loglog.sh, I got the following: Cardinality: 45491636111 which I calculate to mean that I'd need roughly 500 GB of RAM with the default parameters. (Assuming I made the correct calculation.)
3) Are there changes I can make to the command to lessen the load? I tried increasing the min from 5 to 10 with no success, and then I tried decreasing the hash number from 3 to 2 and then to 1 - again with no change in result.

Here is the command I used:

HTML Code:

bbnorm.sh prefilter=TRUE in1=${WDIR}/${JB}_R1_001.clean.fastq in2=${WDIR}/${JB}_R2_001.clean.fastq out1=${WDIR}/${JB}_R1_001.norm.fastq out2=${WDIR}/${JB}_R2_001.norm.fastq target=100 min=5

I originally ran it with BBMap version 38.63 and then I tried updating it to the latest and got the same result.

Thank you so much!
Leave a comment:
santiagorevale replied

11-17-2022, 06:59 AM
Hi Brian,

I ran filterbyname.sh to extract some FastQ reads from a bigger FastQ file and I noticed that on the quality line of the output, the # (hashtag symbol) gets replaced by an ! (exclamation mark symbol). I checked the documentation and the forums about this behaviour but couldn't find anything.

Is this expected? If so, why?

Mine were latest FastQ files produced by BaseSpace from an Illumina MiSeq instrument.

Thanks in advance.

Best,
Santiago
Leave a comment:
Esteban Gongora replied

11-02-2022, 12:45 PM
Hi Brian Bushnell, thank you very much for developing BBMap. It is such a useful tool!

I am trying to map sequencing reads to a metagenome I had previously assembled and I am getting a very strange output. After running the alignment, I also ran pileup.sh to obtain coverage stats and I got the following output:

Code:

pileup.sh in=mapped.sam out=stats.txt Reads: 18093134 Mapped reads: 3652730 Mapped bases: 331276393 Ref scaffolds: 134515 Ref bases: 43282856 Percent mapped: 20.188 Percent proper pairs: 12.341 Average coverage: 7.654 Average coverage with deletions: 7.689 Standard deviation: 9.676 Percent scaffolds with any coverage: 99.51 Percent of reference bases covered: 99.34

If I interpreted this summary correctly, while most of my contigs/scaffolds are being covered, only ~20% of my input reads are being mapped to said contigs. However, I do no understand what happened with the other ~80% since those were the same reads I used to assemble the contigs. Does this mean that my assembler (metaSPAdes) was not able to assemble that ~80% into any kind of contigs? And if that is the case, do you have any idea why this happens? Sorry if this last question is beyond the scope of this forum.

Thanks in advance with any help you might be able to provide,

Esteban
Likes 2
Leave a comment:
KenC replied

10-18-2022, 12:23 AM
Brian Bushnell Thanks for providing this very useful toolset.

I have a few questions wrt the use of BBSplit, and I guess it's OK to put them in a single post...

[Edited: well, there has been no reply for a while and I figured out some of the answers myself, and I add them below just in case someone else run into the same problems...]

1. Is there any recommendation in terms of computation resource allocation and speeding up a run? For human+mouse genome (a total of 5.6GB .fasta files) and decompressed fastq files at about 50GB each (paired-end), I found that nearly 200GB memory was needed when using 1 thread, and it can take several days or even up to a week to run. Is this normal/expected?

[Edited: in one of my later applications where I only needed to map one of the read-pairs (decompressed at around 50G), I managed to increase the number of threads to 16, with 10G memory per thread, setting java heap size around 50G, and BBSplit finished running overnight.]

2. From the user guide I read that the `maxindel` parameter should be set to a large number for RNA-seq (e.g. 200k for human according to the guide). Another example in the guide for BBMap says:

Code:

To map vertebrate RNA-seq reads to a genome: bbmap.sh in=reads.fq out=mapped.sam maxindel=200k ambig=random intronlen=20 xstag=us

I could not find additional information on properly different parameter settings for e.g. whole exome sequencing (WES) vs RNA-seq data. Is the above a good default for human+mouse RNA-seq while running BBSplit? (except for that intronlen is probably irrelevant for BBSplit, and xstag may be application-specific)

3. I have this problem on re-using genome index as follows:

In this post, someone asked:

After running bbsplit once using the syntax: ref=ref1.fa,ref2.fa, is it possible to re-use that index on subsequent runs using the path= parameter?

And the reply was:

Yes, it is. Also, with BBSplit, I think it will try to regenerate the index every time as long as "ref=" is specified, even if it already exists, so only do that once.

However, I have some trouble figuring out how exactly to reuse pre-built genome index when running BBSplit. Below is a toy example.

Let's say I first build an index for two reference genomes, x.fa and y.fa as follows:

Code:

bbsplit.sh build=1 ref=x.fa,y.fa path=./test

Next I would like to reuse the pre-built index for splitting many fastq files; for one pair of files, I tried, e.g.:

Code:

bbsplit.sh in1=r1.fq in2=r2.fq basename=o%_#.fq build=1 path=./test

However, when I run it, it seems that the index was re-built; contents under ./test/ref/genome/1 and ./test/ref/index/1 were removed and then re-generated from scratch. Indeed, there is this message below in the log.

Code:

NOTE: Deleting contents of ./test/ref/genome/1 because reference is specified and overwrite=true NOTE: Deleting contents of ./test/ref/index/1 because reference is specified and overwrite=true Writing reference. Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_1939486580590361.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false] Set genScaffoldInfo=true Writing chunk 1 Writing chunk 2 ......

Even if I do not specify `path=./test` in both commands and just leave it to the default (i.e. the current dir), the above still happens. I also tried to specify `overwrite=f` or `ow=f`, but that did not seem to have an effect either.

Besides, I also have a related question, which is whether I can pre-build indices separately for two genomes x.fa and y.fa, saved in different paths or with different build numbers, and then later reuse them for aligning with BBSplit (potentially with something like `build=1,2` or `path=<first_path>,<second_path>`).

[Edited: I figured this out. I first build the index with `bbsplit.sh ref=x.fa,y.fa`, which generated the index in a folder named `ref`, then in the subsequent BBSplit run for each sample organized in its own sample-specific dir, I first link the `ref` folder to the sample dir, then from the sample dir run with something like `bbsplit.sh ref=x.fa,y.fa in1=r1.fq in2=r2.fq basename=o%_#.fq`, now BBSplit can find the index without any issue.]

I am using BBMap version 39.01, the latest version to date.

I appreciate your kind help in advance.
Last edited by KenC; 11-14-2022, 12:43 AM.
Leave a comment:
jmw86069 replied

10-11-2022, 10:54 AM
Is there anything else I need to do to test some Java changes to debug where the issue is happening?
For example, I would like to turn on verboseS=t and have it spit out debug info. I think the pairlen (MAX_PAIR_DIST) is just not being applied from what I can tell from verbose=t output.
Leave a comment:

jmw86069 replied

10-07-2022, 06:37 AM

Thank you for the response Brian Bushnell !
We are using the "pairedonly=t" flag, however the reads are still returned with PROPER_PAIR in the SAM file (sam flags 83 and 163).
It appears the MAPQ is penalized, probably due to the repetitive sequence in read2, however the reads are still shown as properly paired.

Maybe you'll spot something in the arguments below?
I've tried all combinations and all SAM alignments have the tag PROPERLY_PAIRED, and are included in the mapped output.
You can see I also added killbadpairs=t and pairedonly=t, but maybe something else has gone wrong.

Code:

bbmap.sh \
   in=testA_1.subset.fq.gz in2=testA_2.subset.fq.gz \
   ref=/reference_genomes/mm39/mm39canonical.fa \
   path=/reference_genomes/mm39/bbmap_mm39canonical/ \
   out=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bam \
   outu=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.unmapped.bam \
   mappedonly=t \
   trimreaddescriptions=t \
   t=3 \
   maxindel=20 \
   strictmaxindel=t \
   inserttag=t \
   matelen=33000 \
   pairlen=35000 \
   intronlen=999999999 \
   ambig=random \
   minid=0.76 \
   killbadpairs=t \
   pairedonly=t \
   covstats=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bbmap_covstats.txt \
   covhist=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bbmap_covhist.txt \
   ihist=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bbmap_inserthist.txt \
   statsfile=testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bbmap_alignmentReport.txt \
   2>&1 | tee -a output_testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped_bbmap.txt

The SAM output is shown below, and all variations of options above appear to have no effect on the SAM flags and tags:

Code:

testA.subset.mm39.maxindel1400.pairlen1200.matelen1000.mapped.bam
QNAME                                                                FLAG  RNAME  POS       MAPQ  CIGAR                  RNEXT  PNEXT     TLEN     SEQ                                                  QUAL                                                 FLAGS
testA.subset.mm39.maxindel20.pairlen35000.matelen33000.mapped.bam
NS500489:195:HLKT3BGXX:1:12109:18917:8143                            83    chr5   11805066  2     3X4=1X2=1X4=2X2=3X29=  =      11375902  -429215  ACATGCCAAAATAACAACCCACTTTTAAGTGTATAATTCATTACACAATGC  //AE//E//E/E/E///EE///EEEEEEEEEEEEEEEEEEEEEEEEAAAAA  XT:A:R  NM:i:10  AM:i:2    X8:Z:429215
NS500489:195:HLKT3BGXX:1:12109:18917:8143                            163   chr5   11375902  3     51=                    =      11805066  429215   GACAGAGACAGAGAGAGGAACATAGACAGAGACAGAGAGAGGAACATAGAC  AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/E/E/E/EE/E  XT:A:R  NM:i:0   AM:i:3    X8:Z:429215
NS500489:195:HLKT3BGXX:1:11102:2182:13318                            99    chr17  20345318  42    51=                    =      20345469  202      ACTGGTGGCCTTATAACTTGTGTAATAAGAAATGTTTGAGTTTAACTATAA  AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE  NM:i:0  AM:i:42  X8:Z:202
NS500489:195:HLKT3BGXX:1:11102:2182:13318                            147   chr17  20345469  42    51=                    =      20345318  -202     AGCTAAGAAATAGTGAGAGCTGTGAAAATTATTCTTTCCCAGGAAGGGTAG  AEEEEAEEEEEEEEEEEEEEEEAEEEEEEEEEEE6EEEEEEE/EEEAAAAA  NM:i:0  AM:i:42  X8:Z:202

The end goal is not to include that alignment in the "mapped" output file. I can pass it through awk but that's like adding a speed bump to a racetrack, really awk-ward. Yeah, I said it.

Topics	Statistics	Last Post
Gene Misexpression in the Healthy Human Population by seqadmin Started by seqadmin, 07-25-2024, 06:46 AM	0 responses 9 views 0 likes	Last Post by seqadmin 07-25-2024, 06:46 AM
New Method for Rapid Genetic Diagnosis of Mendelian Disorders by seqadmin Started by seqadmin, 07-24-2024, 11:09 AM	0 responses 28 views 0 likes	Last Post by seqadmin 07-24-2024, 11:09 AM
Advancing Nanopore Technology for Portable Sensing Devices by seqadmin Started by seqadmin, 07-19-2024, 07:20 AM	0 responses 161 views 0 likes	Last Post by seqadmin 07-19-2024, 07:20 AM
New RNA-Based Gene Writing Technology Achieves Precise Gene Integration by seqadmin Started by seqadmin, 07-16-2024, 05:49 AM	0 responses 127 views 0 likes	Last Post by seqadmin 07-16-2024, 05:49 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News