sdmoore,
By default, BBMap will look for much longer indels than BWA/Bowtie2, over 16000bp. You can limit this with the maxindel flag (e.g. "maxindel=40"). Soft-clipping (via "local" flag) can also reduce erroneous variation calls from chimeric or low-quality reads.
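For anyone who wants to try the two flags mentioned above, here is a minimal sketch. The `maxindel` and `local` flags come from this post; the file names, the `map_conservative` helper, and the `BBMAP` override variable are hypothetical and only there for illustration.

```shell
# Path to the BBMap launcher; override BBMAP if bbmap.sh is not on your PATH.
# (BBMAP is a hypothetical convenience variable, not a BBTools setting.)
BBMAP="${BBMAP:-bbmap.sh}"

# Usage: map_conservative REF READS OUT_SAM [MAXINDEL]
# Maps reads with a capped indel length (default 40) and soft-clipping enabled.
map_conservative() {
    "$BBMAP" ref="$1" in="$2" out="$3" maxindel="${4:-40}" local=t
}
```

A call would look like `map_conservative ref.fa reads.fq mapped.sam 40`. Whether 40 is a sensible cap depends on the indel sizes you expect in your data, so treat the default here as a starting point rather than a recommendation.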
This topic is closed.
-
Thanks Brian and dpryan.
I had to give up on BBMap for now, though not because of this problem (I found the AddOrReplaceReadGroups tool later: I edited the SAMs or the BAMs). Rather, the VCF that mpileup produced from the BBMap alignments was "all over the place" (and took forever to process, too). I don't know how else to describe it: large insert calls at a bunch of positions. Viewing the file was no help (tons of inserts/asterisks displayed). FreeBayes gave the same mess. The BWA-MEM and Bowtie2 assemblies don't show this, and with them I can easily identify known errors in the reference file using either mpileup or FreeBayes. The BBMap assembly looked more like what I got from cushaw2 (which I also dropped). We are now at a stage where we will Sanger sequence a few loci to clear things up (e.g., BWA never shows a collection of mutations that Bowtie2 does). I was hoping to have a third assembler "take sides", but I think it's faster for us to sequence and be sure.
-
@sdmoore: You actually just want AddOrReplaceReadGroups from Picard tools. The command I expect you were going for is "samtools reheader", though that won't really do what you want since read group information is also added to each alignment.
@Brian: It would be great if you could add read group support. That'll be needed by anyone doing SNP calling.
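A rough sketch of the Picard route, in case it helps. It assumes a Picard release that ships as a single `picard.jar` (older releases used one jar per tool); the helper names, the `PICARD_CMD` variable, the file names, and the choice to reuse the sample name for every read-group field are all hypothetical. Pick RG values that actually describe your library and run.

```shell
# Command used to launch Picard; adjust the jar path for your install.
# (PICARD_CMD is a hypothetical convenience variable.)
PICARD_CMD="${PICARD_CMD:-java -jar picard.jar}"

# Usage: add_read_group IN_BAM OUT_BAM SAMPLE
# Writes OUT_BAM with one read group in the header and an RG tag on every read.
add_read_group() {
    # $PICARD_CMD is left unquoted on purpose so it splits into separate words.
    $PICARD_CMD AddOrReplaceReadGroups \
        I="$1" O="$2" \
        RGID="$3" RGLB="$3" RGPL=illumina RGPU="$3" RGSM="$3"
}

# Succeeds if the SAM header text on stdin contains an @RG line.
has_read_group() {
    grep -q '^@RG'
}
```

For example: `add_read_group mapped.sorted.bam mapped.rg.bam sample1`, then `samtools view -H mapped.rg.bam | has_read_group && echo "read group present"` to confirm the header before handing the file to GATK.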
-
sdmoore,
BBMap does not have an option for setting the readgroup, since I never encountered a situation where I needed it. But if it's useful, I can add it to the next release. The solution in your linked thread looks reasonable and I'm not sure why it didn't work for you; I will let you know if I find a better solution.
-
Possible to add Read Group in BBmap header?
*sorry, probably wrong thread, I found more activity in the release announcement thread*
Hello,
I used BBDuk to process my read pairs and then mapped them using BBMap, then converted the SAM to BAM and sorted it.
I plan to use an alternative to mpileup to process this set (for comparison of the outputs), so I am trying to use GATK tools.
When I run a GATK tool, it reports an error that the read group is not found in the header. With other mappers, setting it is an option (like -R for BWA). I found a method to manually add read group information to the header (such as here), but I have limited Linux skills and get errors when trying that approach (command "header" not found). I am also concerned that if I put in the wrong RG info, I may pooch a downstream tool.
Is there a way to make the BBmap output compatible with GATK?
-
Olaf,
Currently, BBNorm uses single interleaved files for temporary storage when using multiple passes. And I have not implemented any way to specify dual files in intermediate stages, since everyone at JGI uses interleaved files for everything.
You have two options.
1) You could set "passes=1", which is faster, but I don't recommend it because it doesn't give as good results as 2-pass normalization.
or
2) You could specify only a single output file, which will get interleaved reads:
bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out=R12.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40
...Then, if you need to, de-interleave it afterward:
reformat.sh in=R12.bbnorm.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz
Sorry for the inconvenience! I'll try to fix that by the next release, though unlike documenting the "qin" flag, this will take more work so no guarantees. Thanks for bringing it to my attention. FYI, the flag "interleaved" has no effect on output, only input.
-Brian
Last edited by Brian Bushnell; 06-23-2014, 06:05 PM.
-
Brian,
I ran into a smaller issue with bbnorm. When trying to input and output separate files for a PE library like this:
Code:
bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40
Code:
Exception in thread "main" java.lang.AssertionError: Please do not set 'interleaved=true' with dual input files.
    at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:132)
    at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:661)
    at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:641)
    at kmer.KmerCount7MTA.countFastq(KmerCount7MTA.java:355)
    at kmer.KmerCount7MTA.makeKca(KmerCount7MTA.java:222)
    at jgi.KmerNormalize.runPass(KmerNormalize.java:1006)
    at jgi.KmerNormalize.main(KmerNormalize.java:736)
Olaf
-
Olaf,
It's there, I just forgot to document it; sorry! I'll add that to the shellscript in the next release. I think that all of the programs in the package that read fastq input allow the "qin" flag.
-Brian
-
Hi Brian,
Is there an option to set read quality encoding in bbnorm? I had to set qin=33 in bbduk for some Illumina 1.9 paired end libraries, but this option doesn't seem to exist in bbnorm (used BBMap v. 32.32 for Java 7).
Thanks
Olaf
-
Hi Brian,
Thanks so much for that explanation. I thought I wouldn't be able to go past 31 but it is best to double check.
Sorry as well for just deleting my post (and bombarding you with simple questions, new to the world of NGS!), I played around with updating the Java on our Linux machine and that did the trick.
Thanks again for your help! And the fantastic and easy to use script!!
Sarah
-
Sarah,
It might be better to normalize using a kmer length of 41, but BBNorm only supports a maximum of 31. In practice, it should make very little difference, though. Using long kmers is important for assembly, as it helps span short repeats that would otherwise cause contigs to terminate. But normalization is much less sensitive to that issue, and very long kmers can cause problems in the presence of errors. With k=31, a 100bp read with 1 error could yield 31 kmers with a depth of 1, out of a total of 70 kmers (100 - 31 + 1). That is fewer than half, so the median depth would not be impacted. With k=63, the same read contains only 38 kmers (100 - 63 + 1), and an error near the middle of the read would fall inside all 38 of them, each with a depth of 1, so the median depth of the read would look like 1 instead of its correct value. And BBNorm normalizes based on the median kmer depth of a read.
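The arithmetic above can be checked with a few lines of shell. This is only a sketch of the counting argument, not BBNorm's actual code, and the function names are made up for illustration. It assumes a single error per read and that every kmer overlapping the error has a depth of 1.

```shell
# Number of kmers in a read: read_length - k + 1
# Usage: kmers_per_read READ_LENGTH K
kmers_per_read() {
    echo $(( $1 - $2 + 1 ))
}

# Worst case number of kmers overlapping a single erroneous base:
# at most k, and never more than the read contains.
max_kmers_hit() {
    total=$(kmers_per_read "$1" "$2")
    if [ "$2" -lt "$total" ]; then echo "$2"; else echo "$total"; fi
}

# The median depth is distorted once more than half of the kmers carry the error.
median_at_risk() {
    total=$(kmers_per_read "$1" "$2")
    hit=$(max_kmers_hit "$1" "$2")
    if [ $(( 2 * hit )) -gt "$total" ]; then echo yes; else echo no; fi
}

# Compare a few kmer lengths for a 100bp read.
for k in 25 31 63; do
    echo "k=$k kmers=$(kmers_per_read 100 "$k") worst_case_hit=$(max_kmers_hit 100 "$k") median_at_risk=$(median_at_risk 100 "$k")"
done
```

For 100bp reads, the crossover happens once k exceeds roughly a third of the read length, which is one way to see why short kmers are the safer choice for normalization.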
It's a lot more computationally efficient to use a max kmer length of 31, so that's how I designed it. I've tried shorter kmers down to about k=25 and not noticed an appreciable difference in normalization or error correction.
As for your prior (deleted) post, sorry for not responding - I think the problem was that you were running Java 6 instead of Java 7. Most of the programs in BBTools work fine in Java 6 but it looks like BBNorm requires Java 7 (or higher).