Seqanswers Leaderboard Ad

**Brian Bushnell** · 11-18-2016, 09:51 AM

I will update the description; it's a bit misleading. What it means is, reads with average quality below the setting will not be mapped. All of the reads will still be printed to "out", even if they failed the quality threshold. However, if you use the "outm" and "outu" streams, only mapped reads will go to outm, and only unmapped reads will go to outu. So, maq would suppress low-quality reads from going to outm.

Thanks for noting that the long version of the flag does not work. It's actually "minavgquality"; I'll fix that.

**Brian Bushnell** · 11-23-2016, 01:34 PM

Seem my response in this thread: http://seqanswers.com/forums/showthr...313#post201313

But basically, yes. And no, you don't need to be in the /ref/ directory, you need to be in the directory containing the ref directory.

By default, there is only one alignment per read.

**nicklisbon** · 11-25-2016, 06:11 AM

Good afternoon Brian,
In my lab we are dealing with a couple of thousands of whole genome seq. data. We are running some tests with BBtools and its performance looks really promising.
We would like to use Lumpy for structural variants call and this tool requires as inputs beside a "normal" BAM file also a bam file containing unpaired and split reads.
I have seen that it is possible to run BBmap in local alignment mode but is there any way to identify chimeras with BBmap?
Thanks in advance.

**Brian Bushnell** · 11-25-2016, 10:36 AM

BBMap does not have any explicit chimera identification capabilities; that would need to be done in post-processing. Also, it does not produce split alignments, but it does do a very good job of identifying long deletions, which it will capture in a single alignment.

You can run BBMap like this:

Code:

bbmap.sh in=reads.fq outm=normal.bam outu=unmapped.fq pairedonly ambig=all maxindel=400k

That will produce a bam file of the normally-mapped pairs. Then, mapping with a second pass:

Code:

bbmap.sh in=unmapped.fq out=odd.bam ambig=all local maxindel=400k

That will produce a file containing all of the unmapped and unpaired reads. I have not used Lumpy so I'm not sure if this is the best way to do it, but that sounds approximately like what it wants.

**nicklisbon** · 11-26-2016, 08:48 AM

So, the first alignment is run in global alignment mode, while the second one is run in local mode allowing indels up to 400 kb, isn't it?
Sounds like a good solution thanks.
We will try this option and check whether Lumpy likes it.

**Brian Bushnell** · 11-28-2016, 10:32 AM

The default is roughly 76% identity. You can adjust this with the "minid" flag (e.g. "minid=0.80" for 80% identity.) If you want to restrict alignments to a maximum number of substitutions, you can use "subfilter"; e.g., "subfilter=5" will discard alignments with more than 5 substitutions.

That said, what exactly are you trying to do?

**Brian Bushnell** · 11-28-2016, 01:31 PM

BBMap can output coverage, by the way. Just add the flag "covstats=covstats.txt". You may also want to add the flag "delcov=f" for this purpose.

**Brian Bushnell** · 11-28-2016, 02:34 PM

Pileup won't accept multiple input files, but it will accept standard in. So if the files were sam rather than bam, you could do this:

Code:

cat lib1.sam lib2.sam | pileup.sh in=stdin.sam out=stats.txt hist=histogram.txt

or for gzipped files:

Code:

cat lib1.sam.gz lib2.sam.gz | pileup.sh in=stdin.sam.gz out=stats.txt hist=histogram.txt

However, I don't see a way to do that with bam files. BBMap will produce sam files instead of bam files if you name the output *.sam instead of *.bam.

**Brian Bushnell** · 11-29-2016, 10:22 AM

Pileup uses the header lines to determine the names and lengths of the sequences in the reference. So, they must be present. However, it doesn't really care whether there are extra ones randomly in the file; they just get ignored. So, cat works fine.

**Brian Bushnell** · 12-11-2016, 09:36 AM

Originally posted by moistplus

Can you confirm that pileup doesn't need sam files to be sorted ?

Yes, that's correct.

**skbrimer** · 12-14-2016, 06:21 AM

Hi Brian,

I like to use freebayes for the vcf creation because all of our stuff is haploid, however I do not know how to work with the ref file that BBMap makes during a mapping. Freebayes accepts a fasta for the ref. The ref folder contains a compressed chromosome file which freebayes doesn't like and when I use the fasta that I feed to BBmap it doesn't like the header mismatch.

Can you tell me how to overcome this issue?

**Brian Bushnell** · 12-14-2016, 09:31 AM

Unfortunately, it sounds like Freebayes is trimming the names of the sequences after the first whitespace when loading them from the fasta file, but it's not doing that when looking at the sam file. To fix that problem, unfortunately you'd need to rerun the mapping, adding the "trd" flag (trim read description). Additionally, Freebayes is not compliant with the current Sam specification and does not understand cigar strings with "=" and "X" symbols, so you need to output in legacy Sam format with M symbols. Therefore, for Freebayes, you need to add two extra flags when mapping:

Code:

trd sam=1.3

**skbrimer** · 12-14-2016, 09:37 AM

Thank you Brain!

Is there a variant calling pipeline that works best with BBmap?

**GenoMax** · 12-14-2016, 10:00 AM

If you are going to re-do mapping then clean up the fasta ID's (and regenerate BBMap index) in your reference genome, if there are spaces in the words there.

BBMap suite now has a variant caller (callvariants.sh) :-)

**Brian Bushnell** · 12-14-2016, 10:09 AM

Originally posted by skbrimer View Post

Thank you Brain!

Is there a variant calling pipeline that works best with BBmap?

Funny you should ask! I just finished developing a variant-caller, because I was unhappy with the speed, usability, and some other aspects of Samtools mpileup and GATK (I've never used Freebayes, though). Also, it has a default ploidy of 1 which better reflects projects at JGI, though the ploidy can be changed to an arbitrary number.

I just uploaded a new version of BBTools (36.71) with the variant caller (older versions do not support vcf output). You can run it like this:

Code:

callvariants.sh in=mapped.sam vcf=vars.vcf ref=ref.fasta ploidy=1

CallVariants works on sam or bam, sorted or unsorted, and it's really fast. Also, it does not care what format the cigar strings are in or whether you trimmed sequence names after the first whitespace. I'd be eager to hear how it compares to FreeBayes, if you'd like to try it out. So far, other people at JGI have indicated that it does a much better job of identifying low-depth variants in high-ploidy organisms than Samtools or GATK (if you adjust the ploidy and minallelefraction flags).

That said, no, I don't have a recommended variant-calling workflow for BBMap, because I have not done rigorous benchmarking of all the possibilities yet. Oh, and to modify my previous post, for variant calling I suggest you also add the "mdtag" flag, which some variant-callers like to have. So, for Freebayes:

Code:

bbmap.sh in=reads.fq ref=ref.fa out=mapped.sam mdtag trd sam=1.3

Originally posted by GenoMax View Post

BBMap suite now has a variant caller (callvariants.sh) :-)

Indeed

But if you use it, I highly recommend downloading BBMap 36.71, as prior versions lack proper VCF output.

Topics	Statistics	Last Post
ASHG 2024 Highlights – Part Two by seqadmin Started by seqadmin, Today, 11:09 AM	0 responses 22 views 0 likes	Last Post by seqadmin Today, 11:09 AM
ASHG 2024 Highlights – Part One by seqadmin Started by seqadmin, Today, 06:13 AM	0 responses 20 views 0 likes	Last Post by seqadmin Today, 06:13 AM
Seq-Scope Expands Possibilities for High-Resolution Gene Expression Analysis by seqadmin Started by seqadmin, 11-01-2024, 06:09 AM	0 responses 30 views 0 likes	Last Post by seqadmin 11-01-2024, 06:09 AM
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, 10-30-2024, 05:31 AM	0 responses 21 views 0 likes	Last Post by seqadmin 10-30-2024, 05:31 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News