Seqanswers Leaderboard Ad

**jweger1988** · 12-05-2017, 07:30 AM

Hi Brian,

I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?

Thanks in advance!

James

**mcmc** · 01-11-2018, 01:48 PM

bbmap local + idfilter

Brian et al., I am using the 'local' option with bbmap because I sometimes have a couple bases of adapter remaining (after bbduk mink=6) and my understanding is that 'local' will trim the ends to achieve a better score.

My question: does the idfilter option work on the local or global alignment? I have reads with 100% ID after trimming ends but they are not passing the idfilter. I'm curious whether there is another way to filter based on the local alignment percent id?

thanks!
MCMC

**sdriscoll** · 01-11-2018, 03:53 PM

mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.

**mcmc** · 01-16-2018, 03:58 PM

Originally posted by sdriscoll View Post

mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.

yes I know the scores may be different... but I'm simply wondering whether the idfilter and subfilter post-filtering options work on the soft-clipped alignments or the non-clipped alignments. I want to be able to filter with, say, 2 MM or 98% ID *after* the soft clipping.

Thanks,
MCMC

**kimng** · 03-15-2018, 12:52 AM

Bug in call variants with relation to sam files created from minimap2

Hello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?

I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.

-Kim

Error:
java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]

Loading reference.
Time: 0.304 seconds.
Processing input files.
Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)

**GenoMax** · 03-15-2018, 04:21 AM

You can try reformatting your alignment file using "reformat.sh in=you.bam out=new.bam sam=1.4" to see if that helps.

**kimng** · 03-15-2018, 04:34 AM

Appreciate the reply but unfortunately it leads to the same error:

java -ea -Xmx200m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ jgi.ReformatReads in=minimap.sam out=new.bam sam=1.4
Executing jgi.ReformatReads [in=minimap.sam, out=new.bam, sam=1.4]

Input is being processed as unpaired
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadInputStream.toReadList(SamReadInputStream.java:118)
at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)

**GenoMax** · 03-15-2018, 05:25 AM

It was worth a try.

Based on the `TODO` tag I see perhaps this is something @Brian intends to work on. Unfortunately he has not been participating in forums of late so there is no guarantee as to when this question may be addressed. You could try creating a ticket at BBMap SF repository for this question.

**kimng** · 03-15-2018, 05:36 AM

Thanks for the info didn't realise that was the site for tickets so appreciate it, I'll create a ticket there.

**HESmith** · 03-15-2018, 06:48 AM

You can try the reformat.sh command per @GenoMax but with 'sam=1.3' flag. That worked for me with a different variant caller (FreeBayes).

**kimng** · 03-15-2018, 06:52 AM

Thanks for the input but same issue using sam=1.3, I think Geno's comment on something he intended to do is on the mark.

**darthsequencer** · 03-15-2018, 12:38 PM

Hi Brian - What are the right flags to set if I want all of the top scoring mappings for paired end reads?

Thanks

**GenoMax** · 03-15-2018, 04:00 PM

Have you tried ambig=all?

**darthsequencer** · 03-15-2018, 09:03 PM

Originally posted by GenoMax View Post

Have you tried ambig=all?

Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.

**SNPsaurus** · 03-16-2018, 09:41 AM

Originally posted by darthsequencer View Post

Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.

You might try the align2.BBMapPacBioSkimmer mapper with ambig=all and expectedsites= some high number. Not exactly what you were asking but I get dozens of hits with that.

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 48 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News