This topic is closed.

  • Brian Bushnell
    replied
    sdmoore,

    By default, BBMap will look for much longer indels than BWA/Bowtie2, over 16000bp. You can limit this with the maxindel flag (e.g. "maxindel=40"). Soft-clipping (via "local" flag) can also reduce erroneous variation calls from chimeric or low-quality reads.
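One rough way to see whether long indels are behind noisy variant calls is to scan the CIGAR strings of the alignments. The sketch below is an illustration only, not part of BBMap: `longest_indel` and `flag_long_indels` are hypothetical helper names, and it assumes standard SAM text lines with the CIGAR string in the sixth column.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_indel(cigar: str) -> int:
    """Return the longest insertion (I) or deletion (D) in a CIGAR string."""
    lengths = [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]
    return max(lengths, default=0)


def flag_long_indels(sam_lines, limit=40):
    """Yield (read name, longest indel) for alignments exceeding `limit`."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment line
        size = longest_indel(fields[5])
        if size > limit:
            yield fields[0], size
```

Running `flag_long_indels` over a SAM file with the same limit you would pass to "maxindel" shows how many alignments the flag would affect.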


  • sdmoore
    replied
    Thanks Brian and dpryan.
    I had to give up on BBMap for now, though not because of this problem (I found the AddOrReplaceReadGroups tool later: I edited the SAMs or the BAMs). Rather, the resulting VCF from mpileup on the BBMap alignments was "all over the place" (and took forever to process, too). I don't know how else to describe it: large insert calls for a bunch of positions. Viewing the file was no help (tons of insert/asterisks displayed). FreeBayes gave the same mess. The BWA-MEM and Bowtie2 assemblies don't show this, and I can easily identify known errors in the reference file with either mpileup or FreeBayes. The assembly looked more like what I got from cushaw2 (which I also dropped). We are now at a stage where we will Sanger sequence a few loci to clear things up (e.g., BWA never shows a collection of mutations that Bowtie2 does). I was hoping to have a third assembler "take sides", but I think it's faster for us to sequence and be sure.


  • dpryan
    replied
    @sdmoore: You actually just want AddOrReplaceReadGroups from Picard tools. The command I expect you were going for is "samtools reheader", though that won't really do what you want since read group information is also added to each alignment.

    @Brian: It would be great if you could add read group support. That'll be needed by anyone doing SNP calling.
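To illustrate why editing the header alone is not enough, here is a minimal sketch of what adding a read group involves: one `@RG` header line plus an `RG:Z:` tag on every alignment. This is not how Picard implements AddOrReplaceReadGroups. The function name `add_read_group` is hypothetical, the input is assumed to be SAM text with no existing read group, and only the `ID` and `SM` fields are written, although downstream tools may expect more (library, platform, and so on).

```python
def add_read_group(sam_lines, rg_id, sample):
    """Return SAM lines with an @RG header line and an RG:Z: tag per alignment.

    Assumes the input has no read group yet (illustrative sketch only).
    """
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    out = []
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            out.append(line)  # keep existing header lines unchanged
            continue
        if not header_written:
            out.append(rg_header)  # place @RG at the end of the header
            header_written = True
        out.append(f"{line}\tRG:Z:{rg_id}")  # tag the alignment itself
    if not header_written:
        out.append(rg_header)  # header-only input: still add @RG
    return out
```

For real data, a maintained tool such as AddOrReplaceReadGroups is the safer choice; the sketch only shows which two places in the file change.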


  • Brian Bushnell
    replied
    sdmoore,

    BBMap does not have an option for setting the read group, since I never encountered a situation where I needed it. But if it's useful, I can add it to the next release. The solution in your linked thread looks reasonable and I'm not sure why it didn't work for you; I will let you know if I find a better solution.


  • sdmoore
    replied
    Possible to add Read Group in BBmap header?

    *sorry, probably wrong thread, I found more activity in the release announcement thread*

    Hello,
    I used BBDuk to process my read pairs and then mapped them using BBMap, then converted SAM to BAM and sorted.
    I plan to use an alternative to mpileup to process this set (for comparison of the outputs), so I am trying to use GATK tools.

    When running a GATK tool, it reports an error that the read group is not found in the header. With other mappers, this is an option (like -R for BWA). I found a method to manually add read group information to the header (such as here), but I have limited Linux skills and get errors when trying that approach (command "header" not found). I am also concerned that if I put in the wrong RG info, I may pooch a downstream tool.

    Is there a way to make the BBMap output compatible with GATK?
    Last edited by sdmoore; 07-05-2014, 09:34 AM. Reason: wrong thread?


  • muol
    replied
    Excellent, just did a test run. This is very useful software!

    Olaf


  • Brian Bushnell
    replied
    Olaf,

    This has been fixed in the latest release, 33.04


  • muol
    replied
    Thanks for the info Brian, it wasn't a big issue.

    Olaf


  • Brian Bushnell
    replied
    Olaf,

    Currently, BBNorm uses single interleaved files for temporary storage when using multiple passes. And I have not implemented any way to specify dual files in intermediate stages, since everyone at JGI uses interleaved files for everything.

    You have two options.
    1) You could set "passes=1", which is faster, but I don't recommend it because it doesn't give as good results as 2-pass normalization.
    or
    2) You could specify only a single output file, which will get interleaved reads:

    bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out=R12.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40

    ...Then, if you need to, de-interleave it afterward:

    reformat.sh in=R12.bbnorm.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz

    Sorry for the inconvenience! I'll try to fix that by the next release, though unlike documenting the "qin" flag, this will take more work so no guarantees. Thanks for bringing it to my attention. FYI, the flag "interleaved" has no effect on output, only input.
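For readers unfamiliar with interleaved files, the sketch below shows what de-interleaving amounts to: records alternate read 1, read 2, read 1, read 2, so splitting is just dealing them out in turn. It is an illustration, not the reformat.sh implementation. The function names are hypothetical, and it assumes plain four-line FASTQ records that are already decompressed.

```python
from itertools import islice


def read_fastq_records(lines):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(lines):
    """Split interleaved FASTQ lines into (read1 records, read2 records)."""
    r1, r2 = [], []
    for i, record in enumerate(read_fastq_records(lines)):
        # Even-numbered records are read 1, odd-numbered records are read 2.
        (r1 if i % 2 == 0 else r2).append(record)
    if len(r1) != len(r2):
        raise ValueError("odd number of records: not a complete interleaved file")
    return r1, r2
```

Each returned list can then be written to its own file, which is the same result the out1/out2 pair produces.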

    -Brian
    Last edited by Brian Bushnell; 06-23-2014, 06:05 PM.


  • muol
    replied
    Brian,

    I ran into a smaller issue with bbnorm. When trying to input and output separate files for a PE library like this:

    Code:
    bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40
    I receive this error during pass 2:

    Code:
    Exception in thread "main" java.lang.AssertionError: Please do not set 'interleaved=true' with dual input files.
    	at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:132)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:661)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:641)
    	at kmer.KmerCount7MTA.countFastq(KmerCount7MTA.java:355)
    	at kmer.KmerCount7MTA.makeKca(KmerCount7MTA.java:222)
    	at jgi.KmerNormalize.runPass(KmerNormalize.java:1006)
    	at jgi.KmerNormalize.main(KmerNormalize.java:736)
    Setting interleaved=false doesn't change that. Outputting to a single, interleaved file (in1=xxx in2=xxx out=xxx) on the other hand works fine. Any ideas?

    Olaf


  • muol
    replied
    Indeed, just tried it and it works well with bbnorm.

    Thanks
    Olaf


  • Brian Bushnell
    replied
    Olaf,

    It's there, I just forgot to document it; sorry! I'll add that to the shellscript in the next release. I think that all of the programs in the package that read fastq input allow the "qin" flag.
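As background on what a quality offset such as "qin=33" refers to: FASTQ quality characters are Phred scores stored as ASCII codes plus an offset, 33 for Sanger and Illumina 1.8+ data and 64 for some older Illumina pipelines. The sketch below is a generic illustration with a hypothetical function name, not code from BBTools.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    `offset` is the ASCII offset of the encoding (33 or 64).
    """
    scores = [ord(char) - offset for char in quality_string]
    if any(score < 0 for score in scores):
        # A negative score means the assumed offset is too high for this data.
        raise ValueError(f"quality string is not valid for offset {offset}")
    return scores
```

Decoding offset-33 data with an offset of 64 typically produces negative scores, which is one reason the encoding sometimes has to be stated explicitly.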

    -Brian


  • muol
    replied
    Hi Brian,

    Is there an option to set read quality encoding in bbnorm? I had to set qin=33 in bbduk for some Illumina 1.9 paired end libraries, but this option doesn't seem to exist in bbnorm (used BBMap v. 32.32 for Java 7).

    Thanks
    Olaf


  • Corydoras
    replied
    Hi Brian,

    Thanks so much for that explanation. I thought I wouldn't be able to go past 31 but it is best to double check.

    Sorry as well for just deleting my post (and bombarding you with simple questions, new to the world of NGS!), I played around with updating the Java on our Linux machine and that did the trick.

    Thanks again for your help! And the fantastic and easy to use script!!

    Sarah


  • Brian Bushnell
    replied
    Sarah,

    It might be better to normalize using a kmer length of 41, but BBNorm only supports a maximum of 31. In practice, it should make very little difference, though. Using long kmers is important for assembly, as it helps span short repeats that would otherwise cause contigs to terminate. But normalization is much less sensitive to that issue, and very long kmers can cause problems in the presence of errors. With k=31, a 100bp read with 1 error could yield 31 kmers with a depth of 1, out of a total of 70 kmers - in that case, the median depth would not be impacted. With k=63, the same read has only 38 kmers, and an error near the middle would fall in all of them, so every kmer would have a depth of 1 and the median depth of the read would look like 1 instead of its correct value. And BBNorm normalizes based on the median kmer depth of a read.
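The kmer arithmetic here can be checked with a short sketch. It is not BBNorm's code, and the function names are made up for illustration. It assumes only that a read of length L contains L - k + 1 kmers and that any kmer containing the error has a depth of 1.

```python
def kmers_in_read(read_len: int, k: int) -> int:
    """Number of kmers of length k in a read of length read_len."""
    return max(0, read_len - k + 1)


def kmers_spanning(read_len: int, k: int, pos: int) -> int:
    """Number of kmers that contain the base at 0-based position pos."""
    if kmers_in_read(read_len, k) == 0:
        return 0
    first_start = max(0, pos - k + 1)   # leftmost kmer start covering pos
    last_start = min(pos, read_len - k)  # rightmost kmer start covering pos
    return max(0, last_start - first_start + 1)


def median_survives(read_len: int, k: int, pos: int) -> bool:
    """True if fewer than half of the read's kmers contain the error,
    so the median kmer depth still reflects the correct kmers."""
    return 2 * kmers_spanning(read_len, k, pos) < kmers_in_read(read_len, k)
```

For a 100bp read with an error at position 50, `kmers_spanning(100, 31, 50)` is 31 out of `kmers_in_read(100, 31)` = 70, so the median survives. With k=63 the read holds 38 kmers and all of them contain the error.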

    It's a lot more computationally efficient to use a max kmer length of 31, so that's how I designed it. I've tried shorter kmers down to about k=25 and not noticed an appreciable difference in normalization or error correction.

    As for your prior (deleted) post, sorry for not responding - I think the problem was that you were running Java 6 instead of Java 7. Most of the programs in BBTools work fine in Java 6 but it looks like BBNorm requires Java 7 (or higher).
