Seqanswers Leaderboard Ad

**Brian Bushnell** · 05-15-2015, 02:18 PM

First off, I'm using v34.94 and you're using v33.64, so there may be a bit of a discrepancy. But it looks like BBMap is refusing to map to that location (which has an 894bp exact match, cigar 14S894=245S) because there is another location that scores so much higher: 1073 matches and 1bp deletion, cigar 14S765=1D308=66S.

BBMap has some heuristics that skip searching for sites that are substantially lower scoring than others, and it considers 1073 matches, 1 deletion, and 80 soft-clipped bases to be much higher scoring than 894 matches and 259 soft-clipped bases. You can make the read align there several ways:

Code:

mapPacBio.sh in=lbc1_read.fasta out=mapped2.sam ow strictmaxindel=0

mapPacBio.sh in=lbc1_read.fasta out=mapped2.sam ow semiperfectmode

bbmapskimmer.sh in=lbc1_read.fasta out=mapped2.sam ow kfilter=800

Any of those will give you the alignment in question, but they each have other disadvantages - the first one will not allow any indels, the second does not allow any indels or mismatches, just clipping (it's very fast, though), and the third requires a minimum of 800bp consecutive exact match. So they work in this case but none will ensure you get all 99% identity alignments.

You can also forcibly align a single query sequence to everything in the reference like this:

Code:

msa.sh in=full_reference.fasta out=mapped.sam msa=MultiStateAligner9PacBio query=AGCTTGGAGACAACATGTGGTTCTTGACAGCTCTGCTCCTTTGGGTTCCAGTTGATGGGCAAGTGGATACCACAAAGGCAGTGATCACTTTGCAGCCTCCATGGGTCAGCGTGTTCCAAGAGGAAACTGTAACCTTACAGTGTGAGGTGCCCCGTCTGCCTGGGAGCAGCTCCACACAGTGGTTTCTCAATGGCACAGCCACTCAGACCTCGACTCCCAGCTACAGAATCACCTCTGCCAGTGTCAAGGACAGTGGTGAATACAGGTGCCAGAGAGGTCCCTCAGGGCGAAGTGACCCCATACAGCTGGAAATCCACAGAGACTGGCTACTACTGCAGGTATCCAGCAGAGTCTTCACAGAAGGAGAACCTCTGGCCTTGAGGTGTCATGCATGGAAGGATAAGCTGGTGTACAATGTGCTTTACTATCAAAATGGCAAAGCCTTTAAGTTTTTCTACCGGAATTCTCAACTCACCATTCTGAAAACCAACATAAGTCACAACGGCGCCTACCACTGCTCAGGCATGGGAAAGCATCGCTACACATCAGCAGGAGTATCTGTCACTGTGAAAGAGCTATTTCCAGCTCCAGTGCTGAATGCATCCGTGACATCCCCGCTCCTGGAGGGGAATCTGGTCACCCTGAGCTGTGAAACAAAGTTGCTTCTGCAGAGGCCTGGTTTGCAGCTTTACTTCTCCTTCTACATGGGCAGCAAGACCCTGCGAGGCAGGAACACGTCCTCTGAATACCAAATACTAACTGCTAGAAGAGAAGACTCTGGTTTTACTGGTGCGAGGCCACCACAGAAGACGGAAATGTCCTTAAGCGCAGCCCTGAGTTGGAGCTTCAAGTGCTTGGCCTCCAGTTACCAACTCCTGTCTGGCTTCATGTCCTTTTCTATCTGGTAGTGGGAATAATGTTTTTAGTGAACACTGTTCTCTGGGTGACAATACGTAAAGAACTGAAAAGAAAGAAAAAGTGGAATTTAGAAATCTCTTTGGATTCTGCTCATGAGAAGAAGGTAACTTCCAGCCTTCAAGAAGACAGACATTTAGAAGAAGAGCTGAAGAGTCAGGAACAAGAATAAAGAACAGCTGCAGGAAGGGGTGCACCGGAAGGAGCCTGAGGAGGCCAAGTAGCAGCAGCTCAGTGG

...although this is not really what I designed that program for.

So, overall, I don't think you can make BBMap do exactly what you want in general because its opinions of "best alignment" are slightly different than yours, but you could map everything in semiperfectmode and then just re-map the unmapped reads (which you separate using outm and outu streams) allowing 99% identity. That's probably the closest you can get.

**lankage** · 05-22-2015, 10:29 AM

Hi Brian,
Thanks for helping with the PacBio mapping issues. I have another, unrelated question about usage of bbmap. Is there a way to have bbmap output ALL mappings that meet a certain threshold specified with options such as subfilter=X? Using the ambiguous=all parameter, I cant get all equivalent mappings reported for a read, but can I use options to return all mappings that score >= a threshold specified with options like subfilter? Although this isn't the intended usage of the program, I would like to be able to search large sequence sets with primer sequences and look for opportunities where primer pairs sit in proximity to each other while allowing some mismatching.
For example, if a set of primers is designed to perfectly match an intended target site, can I use bbmap to explore less perfect sites for hybridization opportunities?

**Brian Bushnell** · 05-22-2015, 10:40 AM

BBMap is designed to find the best mapping, and heuristics will cause it to ignore mappings that are valid but substantially worse. Therefore, I made a different version of it, BBMapSkimmer, which is designed to find all of the mappings above a certain threshold. The shellscript is bbmapskimmer.sh and the usage is similar to bbmap.sh or mapPacBio.sh. For primers, which I assume will be short, you may wish to use a lower than default K of, say, 10 or 11, and add the "slow" flag.

I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.

So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences.

I should say, though, that the primer sites identified are based on the normal BBMap scoring, which is not necessarily the same as where the primers would bind naturally, though with highly conserved regions there should be no difference.

**lankage** · 05-26-2015, 08:11 AM

BBmapskimmer.sh

Hi Brian,
I have been experimenting with bbmapskimmer.sh and had a question about flags in the SAM output. I have been looking at a pair of primers matched against the Macaca fasicularis reference genome V5 for the potential to amplify unwanted sites. I have some python code that reads in sam output and draws a visualization of a primer to subject sequence alignment. I noticed in one case, the sam output has the 256 flag for a sub alignment of a sequence that would seem to suggest that the reverse complement of the query sequence (primer) maps to chromosome 12 at the given position. When i pull out Macaca reference genomic sequence at the given coordinates, I see a match to the 5' - 3' FWD orientation of the primer, rather than the reverse complement of the primer. Is there not a way to tell what orientation the query sequence matches the reference if the query hit is stored as a sub mapping?

Sam output:
@HD VN:1.4 SO:unsorted
@SQ SN:10 LN:96509753
@SQ SN:11 LN:137757926
@SQ SN:12 LN:132586672
@SQ SN:13 LN:111193037
@SQ SN:14 LN:130733371
@SQ SN:15 LN:112612857
@SQ SN:16 LN:80997621
@SQ SN:17 LN:96864807
@SQ SN:18 LN:75711847
@SQ SN:19 LN:59248254
@SQ SN:1 LN:227556264
@SQ SN:20 LN:78541002
@SQ SN:2 LN:192460366
@SQ SN:3 LN:192294377
@SQ SN:4 LN:170955103
@SQ SN:5 LN:189454096
@SQ SN:6 LN:181584905
@SQ SN:7 LN:171882078
@SQ SN:8 LN:146850525
@SQ SN:9 LN:133195287
@SQ SN:MT LN:16575
@SQ SN:X LN:152835861
@PG ID:BBMap PN:BBMap VN:34.92 CL:java -Djava.library.path=/slipstream/home/graham/bbmap/jni/ -ea -Xmx160531m align2.BBMapPacBioSkimmer build=1 overwrite=true minratio=0.40 fastareadlen=6000 ambig=all minscaf=100 startpad=10000 stoppad=10000 midpad=6000 in=primers.fas ref=mafa5.fa out=primerhits.4.sam k=6 slow subfilter=4
FCGR2A.3F 16 1 90182861 40 19= * 0 0 ACTGTGCCAACGTCCAGTC * NM:i:0 AM:i:40 NH:i:2
FCGR2A.3F 256 12 127476172 32 1=1X17= * 0 0 * * NM:i:1 AM:i:32 NH:i:2
FCGR2A.3R 0 1 90169340 40 20= * 0 0 TTGTCATCCACTCAGCAAGC * NM:i:0 AM:i:40 NH:i:2
FCGR2A.3R 256 3 184441394 32 9=1X10= * 0 0 * * NM:i:1 AM:i:32 NH:i:2

This line in question:
FCGR2A.3F 256 12 127476172 32 1=1X17= * 0 0 * * NM:i:1 AM:i:32 NH:i:2

My output:
[PRE]
*** FCGR2A.3F 1 RC 90182861 19=
ACTGTGCCAACGTCCAGTC
|||||||||||||||||||
ACTGTGCCAACGTCCAGTCGGGTT
*** FCGR2A.3F 12 RC 127476172 1=1X17=
ACTGTGCCAACGTCCAGTC
| |||||||||||||||||
GGCTGGACGTTGGCACAGTGACCA
*** FCGR2A.3R 1 FWD 90169340 20=
TTGTCATCCACTCAGCAAGC
||||||||||||||||||||
TTGTCATCCACTCAGCAAGCTGAGA
*** FCGR2A.3R 3 FWD 184441394 9=1X10=
TTGTCATCCACTCAGCAAGC
||||||||| ||||||||||
TTGTCATCCTCTCAGCAAGCCCAAG[/PRE]

**Brian Bushnell** · 05-26-2015, 09:35 AM

The first alignment has bitflag 16, indicating it is reverse-complemented, and the second has bitflag 256, indicating it is forward (but a secondary alignment). Since the primary alignment is reverse-complemented, the bases are displayed reverse-complemented (required by SAM specification). So, they need to be reverse-complemented again before displaying them for the second alignment.

You can optionally add the flag "saa=f" (secondaryalignmentasterisks=false) which will make BBMap print out the query bases in the correct orientation for every alignment, rather than just the primary.

**lankage** · 05-28-2015, 10:36 AM

mapPacBio.sh - soft clipped mappings

Hi Brian, yet again another question about mapPacBio.sh. I would like to use mapPacBio.sh to map full length MHC allele sequences from pac bio sequencing against a reference dataset which contains partial MHC allele sequences, and separate out those that represent extensions of alleles in the dataset, and those that are putative novels. I would like to be able to specify perfect identity to a reference within the overlap portion but when i use the "idfilter=1 local" parameter, I get no mappings. If i use idfilter=.99, i can see the type of mapping i expect to find as below with only soft clipped bases, but also have mappings with mismatches. Is there a way to restrict the mapping results to those with ONLY soft clipped bases?

bbmap/mapPacBio.sh in=filtered_contigs.fasta ref=Mafa_MHC-I_coding_throughPacBio5-15.05.22.fasta outm=filtered_contigs_mapped.sam outu=filtered_contigs_unmapped.fasta idfilter=.99 local -Xmx4g

m150505_221415_42134_c100814192550000001823181311031594_s1_p0/161219/ccs/lbc13.merged.zerolen.fastq_1;size=22; 0 Mafa_B_148_01_01 1 43 9S1080= *

**Brian Bushnell** · 05-28-2015, 10:45 AM

Ah... that's unintended. When you set "idfilter=1" it turns on "perfectmode", which was unintentional; I'll fix that. It SHOULD turn on "semiperfectmode", which is what you need - no substitutions or indels, but going off the end of a reference is allowed.

So, in this case, take out the "idfilter" flag and add "semiperfectmode", and you'll get the results you want. Although 100% perfect, non-soft-clipped reads will still be present. You could remove those with a double-mapping, though, like this:

mapPacBio.sh in=reads.fq out=semiperfect.fq semiperfectmode

mapPacBio.sh in=semiperfect.fq outm=perfect.sam outu=clipped.sam perfectmode

-Brian

**blsfoxfox** · 05-31-2015, 11:34 AM

Originally posted by Brian Bushnell View Post

Sure. For simplicity, I will assume the reads are interleaved in a single file initially (you can transform between interleaved and dual-file reads with reformat.sh if you want). First thing is to trim the adapters with BBDuk:

Code:

bbduk.sh in=reads.fq out=trimmed.fq ktrim=r k=23 mink=12 hdist=1 tbo tpe ref=nextera_LMP_adapter.fa.gz

nextera_LMP_adapter.fa.gz is in /bbmap/resources/.

Next, if you want to remove contaminants/spikeins such as PhiX, you can do so now with a filtering pass of BBDuk, but that's optional.

Finally, split the reads into LMP pairs, short pairs, and singletons:

Code:

splitnextera.sh in=trimmed.fq out=longmatepairs.fq outf=fragmentpairs.fq outu=unknownpairs.fq outs=singletons.fq mask=t

At this point you could do quality-trimming if you want to on the individual sub-libraries; it's important not to do so sooner as it interferes with the ability to detect junctions. If all you care about is the LMP data you can just keep longmatepairs.fq and throw away the others, but theoretically, you might be able to get enough fragment pairs and singletons to assemble the contigs from those, then use the LMP for scaffolding, and thus get a complete single-library assembly. That's what Illumina is encouraging, at any rate.

The majority of the reads in the "unknown" bin were LMP, not short-fragment reads, in the one library I tested (and also according to Illumina's data), but I would not really consider it safe to use for scaffolding OR contig-building, and would probably throw it away unless you don't have enough data.

Hi Brian,

Thesedays I am thinking of several assembly questions related to what you said above.
1) After removing those "pair-end" and "single-end" contamination, why can't we reverse_complement those LMP reads and treat them as pair-end reads for the contig assembly stage? Maybe due to the larger variance of the LMP insert sizes?
2) For those "unknown" reads, I think it is also helpful to assemble them as single ends, right?

**Brian Bushnell** · 06-01-2015, 09:15 AM

Originally posted by blsfoxfox View Post

Hi Brian,

Thesedays I am thinking of several assembly questions related to what you said above.
1) After removing those "pair-end" and "single-end" contamination, why can't we reverse_complement those LMP reads and treat them as pair-end reads for the contig assembly stage? Maybe due to the larger variance of the LMP insert sizes?

You can... whether that's a good idea depends on the assembler. Some assemblers probably use LMP reads for kmers just like the pair-end reads. Particularly if you don't have enough coverage of PE reads, using the LMP reads that way is probably fine, though they will probably have a more biased coverage distribution than the PE reads, which could interfere with the heuristics of some assemblers.

2) For those "unknown" reads, I think it is also helpful to assemble them as single ends, right?

Virtually all of the unknown-binned reads are non-overlapping LMP reads that you can't assemble into single reads (in the data I have examined). They end up in the unknown bin because the junction adapter was not visible in either read. But that usually means that the junctions were in the unsequenced portion between the two reads. This really depends on your insert size distribution (the physical insert size of the sequenced fragments, not the insert size of the long transposased pieces). If you fragment to substantially longer than 2x read length, a lot of LMP pairs will end up in the unknown bin because the junction is in the unsequenced middle. If you fragment to a shorter insert size, such that most of your pairs overlap, the unknown bin would consist more of PE reads.

The latest version of splitnextera has an option to attempt to merge the reads by overlap before looking for the junction. That way, it is better able to determine whether a pair belongs in the unknown bin - if they overlap, and do not contain a junction, they go to singleton rather than unknown; if they do contain a junction, they go to LMP; so the only pairs that end up in unknown are the ones that don't overlap AND don't have a junction adapter. So, fewer reads end up in unknown; but it will only be useful on libraries that have overlapping fragments.

**blsfoxfox** · 06-01-2015, 09:56 AM

Originally posted by Brian Bushnell View Post

You can... whether that's a good idea depends on the assembler. Some assemblers probably use LMP reads for kmers just like the pair-end reads. Particularly if you don't have enough coverage of PE reads, using the LMP reads that way is probably fine, though they will probably have a more biased coverage distribution than the PE reads, which could interfere with the heuristics of some assemblers.

Virtually all of the unknown-binned reads are non-overlapping LMP reads that you can't assemble into single reads (in the data I have examined). They end up in the unknown bin because the junction adapter was not visible in either read. But that usually means that the junctions were in the unsequenced portion between the two reads. This really depends on your insert size distribution (the physical insert size of the sequenced fragments, not the insert size of the long transposased pieces). If you fragment to substantially longer than 2x read length, a lot of LMP pairs will end up in the unknown bin because the junction is in the unsequenced middle. If you fragment to a shorter insert size, such that most of your pairs overlap, the unknown bin would consist more of PE reads.

The latest version of splitnextera has an option to attempt to merge the reads by overlap before looking for the junction. That way, it is better able to determine whether a pair belongs in the unknown bin - if they overlap, and do not contain a junction, they go to singleton rather than unknown; if they do contain a junction, they go to LMP; so the only pairs that end up in unknown are the ones that don't overlap AND don't have a junction adapter. So, fewer reads end up in unknown; but it will only be useful on libraries that have overlapping fragments.

Hi Brian,

Thanks for your reply!
I understand your points of the differences between "unknown" and "single-end" reads. However, I am not going to merge the "unknown" pairs into one single read, but treat one pair of reads as two single reads only for the contig assembly step. By doing this, we neither throw the "unknown" reads away nor rely on their mate-pair information to do scaffolding as insurance. Is it valid to you?

**Brian Bushnell** · 06-01-2015, 10:03 AM

Oh, yes, that sounds perfectly fine. You may want to adapter-trim the unknown reads first (treating them as singletons) by specifying the junction-adapter as the adapter sequence, like this:

bbduk.sh in=unknown.fq int=f out=trimmed.fq adapter=CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG k=21 mink=5 ktrim=r hdist=1

That will ensure you trim junction adapter from the end of the read, in case it was present but not detected because it was just a few bases. Otherwise the unknown bin may be enriched for reads with junction adapter at the end, that was just too short to be positively identified. Technically it could be present on the left side, as well. Making a base-frequency histogram of the unknown bin might be useful; I have not done that.

**luc** · 06-04-2015, 05:24 PM

Hi Brian,

what are the applications/assembly operations for which tadpole.sh is designed? In part it replaces BBmerge?

Thanks!

**Brian Bushnell** · 06-05-2015, 09:47 AM

Hi Luc,

Tadpole does not replace BBMerge, though I plan to further integrate them in the future - mainly, so that BBMerge can first attempt to merge, then if unsuccessful extend the reads using Tadpole, then attempt to merge again, and if still unsuccessful, undo all the changes.

The main reason I made "Tadpole" was because I need to quantify the insert size of libraries, to determine whether they are acceptable; for example, if a project needs a 2x150bp library with a 500bp insert size, for an unknown organism... how do you determine whether it passes? BBMerge only works when the reads are largely overlapping. So, I wrote Tadpole to extend the right end of non-overlapping reads so that they will overlap and can be merged. So far, it works really well on 2x150bp single-cell data with a 350bp insert size, but I have not tested it further than that.

Tadpole is a complete (and very fast) assembler; you can run "tadpole.sh in=reads.fq out=contigs.fa" and it will give you a conservative assembly with a low error rate. The main drawback is that currently the max kmer length is 31, and as such the continuity is poor. I'm evaluating it for use in making a quick assembly for mapping reads to recalibrate their quality, prior to feeding them to a more sophisticated assembler; for recalibration, a low error rate and low misassembly rate is more important than continuity.

For extending reads, the command would be:
tadpole.sh in=reads.fq extend=reads.fq oute=extended.fq mode=extend extendleft=100 extendright=100

That will extend the reads by up to 100bp in each direction, stopping early if a branch is hit. For extending paired reads so that they overlap, only “extendright” is needed, so “extendleft” should be set to zero.

**lankage** · 06-08-2015, 12:02 PM

Hi Brian, got another one for you. This time im working with Illumina miseq reads and mapping them to a HIV reference sequence for the purpose of determining the presence or absence of a variety of HIV drug resistance associated mutations. One step in my workflow requires the use of the mapping information to determine which mutation sites are spanned by a mapped read. I have a list of base coordinates based on the HIV reference sequence. I noticed that in some instances with a large insert relative to the reference, that the mapping start coordinate does not begin with the first base of the read sequence. In the example provided, the start coordinate of 3804 is actually 29 bases in from the start of the read. This is following a 27 base insert per the cigar string. I would like to be able to reliably determine the complete interval with which a read overlaps the reference sequence so I can know which mutation sites are covered. Can i tweak the parameters to get the start coordinate to reflect the start of the read sequence in a non-soft clipped mapping?

Example SAM record:

Code:

M00196:47:000000000-A1CEL:1:1103:11833:21618 1:N:0:1	16	NC_001802DRannotations_(modified)	3804	19	2=27I2=1X5=1X23=1X27=1X82=	*	0	0	ATCACTAGCCATTGCTCTCCAATTACTGTGAGCATGAAAAATATCACAGTAATTGGAGAGCTATGGCTAGTGATTTTAACCTGCCACCTATAGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCC	????ABBBDDDDDEEDGGGGGGIIIIIIHHIHHHHIHIIIIIIIHIIIIHHHHHIIIIIHHHHHHIHIIIIIHHIIHHGHIIIHFFHIIIHGGGIHHHIIHHGFIHIIIIHIHHHFHGHHIHHHHHHFFHHHHHHIHHHHHHHHIIHHFIHIHHFFFFFFFDDDDDDDDBBB	NM:i:31	AM:i:19

Read with expanded cigar string:
ATCACTAGCCATTGCTCTCCAATTACTGTGAGCATGAAAAATATCACAGTAATTGGAGAGCTATGGCTAGTGATTTTAACCTGCCACCTATAGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCC
MMIIIIIIIIIIIIIIIIIIIIIIIIIIIMMXMMMMMXMMMMMMMMMMMMMMMMMMMMMMMXMMMMMMMMMMMMMMMMMMMMMMMMMMMXMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

HTML Code:

[PRE]
>>NC_001802DRannotations_(modified)                       (9181 nt)
 initn: 652 init1: 652 opt: 684  Z-score: 836.6  bits: 167.0 E(1): 8.7e-45
banded Smith-Waterman score: 684; 97.2% identity (97.2% similar) in 144 nt overlap (29-172:3805-3948)

                 10        20        30        40        50        
offend   ATCACTAGCCATTGCTCTCCAATTACTGTGAGCATGAAAAATATCACAGTAATTGGAG
                                     ::: ::::: ::::::::::::::::::::
NC_001 AUUUUUAGAUGGAAUAGAUAAGGCCCAAGAUGAACAUGAGAAAUAUCACAGUAAUUGGAG
         3780      3790      3800      3810      3820      3830    

       60        70        80        90       100       110        
offend AGCTATGGCTAGTGATTTTAACCTGCCACCTATAGTAGCAAAAGAAATAGTAGCCAGCTG
       ::: ::::::::::::::::::::::::::: ::::::::::::::::::::::::::::
NC_001 AGCAAUGGCUAGUGAUUUUAACCUGCCACCUGUAGUAGCAAAAGAAAUAGUAGCCAGCUG
         3840      3850      3860      3870      3880      3890    

      120       130       140       150       160       170        
offend TGATAAATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCC      
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::      
NC_001 UGAUAAAUGUCAGCUAAAAGGAGAAGCCAUGCAUGGACAAGUAGACUGUAGUCCAGGAAU
         3900      3910      3920      3930      3940      3950    

NC_001 AUGGCAACUAGAUUGUACACAUUUAGAAGGAAAAGUUAUCCUGGUAGCAGUUCAUGUAGC
         3960      3970      3980      3990      4000      4010[/PRE]

**Brian Bushnell** · 06-08-2015, 02:29 PM

You can change gap penalties and get a different alignment by adding the flag "msa=MultiStateAligner9Flat". That will generally reduce the rate of long indels by flattening the affine transforms. The resulting cigar string is "2=1D1X1=1X3=2X1=2X2=3I4=1I1X1=3I3=1X5=1X23=1X27=1X82=", which may not be exactly what you are looking for, but is much closer.

Alternatively, depending on what you are doing with the data, you can enable soft-clipping with BBMap's "local" flag, then calculate coverage using pileup.sh with the "includesoftclip" flag. This will calculate coverage including soft-clipped sequences.

Please let me know if neither of those is adequate...

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 49 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News