Introducing BBMap, a new short-read aligner for DNA and RNA

This topic is closed.

Brian Bushnell replied

07-07-2014, 09:37 AM
sdmoore,

By default, BBMap will look for much longer indels than BWA/Bowtie2, over 16000bp. You can limit this with the maxindel flag (e.g. "maxindel=40"). Soft-clipping (via "local" flag) can also reduce erroneous variation calls from chimeric or low-quality reads.
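One way to check whether long indels are behind noisy variant calls is to look at the CIGAR strings in the SAM output before and after setting maxindel. The following is a minimal Python sketch and not part of BBMap; the function name longest_indel is made up for illustration, and it only counts I and D operations.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_indel(cigar):
    """Return the length of the longest insertion or deletion in a CIGAR string.

    Hypothetical helper for illustration; returns 0 when there is no indel.
    """
    lengths = (int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID")
    return max(lengths, default=0)


# An alignment with a 1200 bp deletion versus a clean 100 bp match.
print(longest_indel("50M1200D50M"))
print(longest_indel("100M"))
```

Applied to the sixth column of each alignment line, this gives a rough picture of how many reads carry indels longer than the limit you intend to set.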
sdmoore replied

07-07-2014, 05:53 AM
Thanks Brian and dpryan.
I had to give up on BBMap for now, though not because of this problem (I later found the AddOrReplaceReadGroups tool and edited the SAMs/BAMs). Rather, the VCF produced by mpileup from the BBMap alignments was "all over the place" (and took forever to process too). I don't know how else to describe it: large insert calls at a bunch of positions. Viewing the file was no help (tons of inserts/asterisks displayed). Same mess from FreeBayes. BWA-MEM and Bowtie2 alignments don't show this, and I can easily identify known errors in the reference file with either mpileup or FreeBayes. The alignment looked more like what I got from cushaw2 (which I also dropped). We are at a stage now where we will Sanger sequence a few loci to clear things up (e.g., BWA never shows a collection of mutations that Bowtie2 does). I was hoping to have a third aligner "take sides", but I think it's faster for us to sequence and be sure.
dpryan replied

07-07-2014, 12:50 AM
@sdmoore: You actually just want AddOrReplaceReadGroups from Picard tools. The command I expect you were going for is "samtools reheader", though that won't really do what you want since read group information is also added to each alignment.

@Brian: It would be great if you could add read group support. That'll be needed by anyone doing SNP calling.
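To illustrate why editing the header alone is not enough, here is a minimal Python sketch of what adding a read group involves: one @RG line in the header plus an RG:Z: tag on every alignment. It is not a replacement for AddOrReplaceReadGroups; the function name and the ID/sample values are hypothetical, and it assumes the alignments do not already carry an RG tag.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Return SAM lines with an @RG header line and an RG:Z: tag on each alignment.

    Hypothetical helper for illustration. Expects lines without trailing
    newlines and assumes no alignment already has an RG tag.
    """
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if not line.startswith("@")]
    # The header declares the read group once...
    header.append(f"@RG\tID:{rg_id}\tSM:{sample}")
    # ...and every alignment must point back to it through its own tag.
    tagged = [f"{record}\tRG:Z:{rg_id}" for record in records]
    return header + tagged


example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
for line in add_read_group(example, "rg1", "sampleA"):
    print(line)
```

For real data, a maintained tool such as Picard's AddOrReplaceReadGroups is the safer route, since it works on BAM directly and validates the fields.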
Brian Bushnell replied

07-06-2014, 08:25 PM
sdmoore,

BBMap does not have an option for setting the readgroup, since I never encountered a situation where I needed it. But if it's useful, I can add it to the next release. The solution in your linked thread looks reasonable and I'm not sure why it didn't work for you; I will let you know if I find a better solution.
sdmoore replied

07-05-2014, 09:25 AM
Possible to add Read Group in BBmap header?

*sorry, probably wrong thread, I found more activity in the release announcement thread*

Hello,
I used BBDuk to process my read pairs and then mapped them using BBMap, then converted the SAM to BAM and sorted it.
I plan to use an alternative to mpileup to process this set (for comparison of the outputs), so I am trying to use GATK tools.

When running a GATK tool, it reports an error that the read group is not found in the header. With other mappers, this is an option (like -R for BWA). I found a method to manually add read group information to the header (such as here), but I have limited Linux skills and get errors when trying that approach (command "header" not found). I am also concerned that if I put in the wrong RG info, I may pooch a downstream tool.

Is there a way to make the BBmap output compatible with GATK?

Last edited by sdmoore; 07-05-2014, 09:34 AM. Reason: wrong thread?
muol replied

06-27-2014, 04:11 PM
Excellent, just did a test run. This is very useful software!

Olaf
Brian Bushnell replied

06-27-2014, 03:26 PM
Olaf,

This has been fixed in the latest release, 33.04.
muol replied

06-23-2014, 07:14 PM
Thanks for the info Brian, it wasn't a big issue.

Olaf
Brian Bushnell replied

06-23-2014, 06:02 PM
Olaf,

Currently, BBNorm uses single interleaved files for temporary storage when using multiple passes. And I have not implemented any way to specify dual files in intermediate stages, since everyone at JGI uses interleaved files for everything.

You have two options.
1) You could set "passes=1", which is faster, but I don't recommend it because it doesn't give as good results as 2-pass normalization.
or
2) You could specify only a single output file, which will get interleaved reads:

bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out=R12.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40

...Then, if you need to, de-interleave it afterward:

reformat.sh in=R12.bbnorm.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz
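For anyone unsure what "interleaved" means here: the single file holds each pair as an R1 record immediately followed by its R2 record, four lines per record. The Python sketch below shows the idea behind de-interleaving on an in-memory list of lines; it is only an illustration with a made-up function name, not how reformat.sh is implemented, and it assumes plain four-line FASTQ records.

```python
def deinterleave(fastq_lines):
    """Split interleaved FASTQ lines (R1 record, R2 record, ...) into two lists.

    Hypothetical helper for illustration; assumes 4 lines per record.
    """
    if len(fastq_lines) % 8 != 0:
        raise ValueError("expected whole read pairs (8 lines per pair)")
    r1, r2 = [], []
    for i in range(0, len(fastq_lines), 8):
        r1.extend(fastq_lines[i:i + 4])      # first mate of the pair
        r2.extend(fastq_lines[i + 4:i + 8])  # second mate of the pair
    return r1, r2


interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
]
reads1, reads2 = deinterleave(interleaved)
print(reads1[0], reads2[0])
```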

Sorry for the inconvenience! I'll try to fix that by the next release, though unlike documenting the "qin" flag, this will take more work so no guarantees. Thanks for bringing it to my attention. FYI, the flag "interleaved" has no effect on output, only input.

-Brian

Last edited by Brian Bushnell; 06-23-2014, 06:05 PM.

muol replied

06-23-2014, 05:07 PM

Brian,

I ran into a smaller issue with bbnorm. When trying to input and output separate files for a PE library like this:

Code:

bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40

I receive this error during pass 2:

Code:

Exception in thread "main" java.lang.AssertionError: Please do not set 'interleaved=true' with dual input files.
	at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:132)
	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:661)
	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:641)
	at kmer.KmerCount7MTA.countFastq(KmerCount7MTA.java:355)
	at kmer.KmerCount7MTA.makeKca(KmerCount7MTA.java:222)
	at jgi.KmerNormalize.runPass(KmerNormalize.java:1006)
	at jgi.KmerNormalize.main(KmerNormalize.java:736)

Setting interleaved=false doesn't change that. Outputting to a single, interleaved file (in1=xxx in2=xxx out=xxx) on the other hand works fine. Any ideas?

Olaf


muol replied

06-23-2014, 03:54 PM
Indeed, just tried it and it works well with bbnorm.

Thanks
Olaf
Brian Bushnell replied

06-23-2014, 03:44 PM
Olaf,

It's there, I just forgot to document it; sorry! I'll add that to the shellscript in the next release. I think that all of the programs in the package that read fastq input allow the "qin" flag.

-Brian
muol replied

06-23-2014, 03:28 PM
Hi Brian,

Is there an option to set read quality encoding in bbnorm? I had to set qin=33 in bbduk for some Illumina 1.9 paired end libraries, but this option doesn't seem to exist in bbnorm (used BBMap v. 32.32 for Java 7).

Thanks
Olaf
Corydoras replied

06-20-2014, 12:06 AM
Hi Brian,

Thanks so much for that explanation. I thought I wouldn't be able to go past 31 but it is best to double check.

Sorry as well for just deleting my post (and bombarding you with simple questions, new to the world of NGS!), I played around with updating the Java on our Linux machine and that did the trick.

Thanks again for your help! And the fantastic and easy to use script!!

Sarah
Brian Bushnell replied

06-19-2014, 09:48 AM
Sarah,

It might be better to normalize using a kmer length of 41, but BBNorm only supports a maximum of 31. In practice, it should make very little difference, though. Using long kmers is important for assembly, as it helps span short repeats that would otherwise cause contigs to terminate. But normalization is much less sensitive to that issue, and very long kmers can cause problems in the presence of errors. With k=31, a 100bp read with 1 error could yield 31 kmers with a depth of 1, out of a total of 70 kmers - in that case, the median depth would not be impacted. With k=63, the same read has only 38 kmers, and an error near the middle would be spanned by all of them, each with a depth of 1, so the median depth of the read would look like 1 instead of its correct value. And BBNorm normalizes based on the median kmer depth of a read.
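The arithmetic can be checked with a small Python sketch. A read of length L contains L - k + 1 kmers (70 for k=31 and 38 for k=63 at L=100), and a single error is contained in at most k of them. The function name, the error position of 50, and the true depth of 40 are assumptions chosen for illustration; this is not BBNorm's code.

```python
from statistics import median


def kmer_depths_with_one_error(read_len, k, error_pos, true_depth):
    """Depth of each kmer in a read that has one erroneous base.

    Hypothetical model for illustration: kmers containing the error are
    assumed to be unique (depth 1), all others have the true depth.
    """
    n_kmers = read_len - k + 1
    depths = []
    for start in range(n_kmers):
        spans_error = start <= error_pos < start + k
        depths.append(1 if spans_error else true_depth)
    return depths


for k in (31, 63):
    depths = kmer_depths_with_one_error(100, k, error_pos=50, true_depth=40)
    # Columns: k, total kmers, kmers with depth 1, median depth of the read
    print(k, len(depths), depths.count(1), median(depths))
# prints:
# 31 70 31 40.0
# 63 38 38 1.0
```

With k=31 fewer than half of the kmers contain the error, so the median stays at the true depth; with k=63 every kmer contains it, so the median collapses to 1.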

It's a lot more computationally efficient to use a max kmer length of 31, so that's how I designed it. I've tried shorter kmers down to about k=25 and not noticed an appreciable difference in normalization or error correction.

As for your prior (deleted) post, sorry for not responding - I think the problem was that you were running Java 6 instead of Java 7. Most of the programs in BBTools work fine in Java 6 but it looks like BBNorm requires Java 7 (or higher).