Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
This topic is closed.
X
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Brian,

    I ran into a smaller issue with bbnorm. When trying to input and output separate files for a PE library like this:

    Code:
    bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40
    I receive this error during pass 2:

    Code:
    Exception in thread "main" java.lang.AssertionError: Please do not set 'interleaved=true' with dual input files.
    	at stream.ConcurrentGenericReadInputStream.<init>(ConcurrentGenericReadInputStream.java:132)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:661)
    	at stream.ConcurrentGenericReadInputStream.getReadInputStream(ConcurrentGenericReadInputStream.java:641)
    	at kmer.KmerCount7MTA.countFastq(KmerCount7MTA.java:355)
    	at kmer.KmerCount7MTA.makeKca(KmerCount7MTA.java:222)
    	at jgi.KmerNormalize.runPass(KmerNormalize.java:1006)
    	at jgi.KmerNormalize.main(KmerNormalize.java:736)
    Setting interleaved=false doesn't change that. Outputting to a single, interleaved file (in1=xxx in2=xxx out=xxx) on the other hand works fine. Any ideas?

    Olaf

    Comment


    • #17
      Olaf,

      Currently, BBNorm uses single interleaved files for temporary storage when using multiple passes. And I have not implemented any way to specify dual files in intermediate stages, since everyone at JGI uses interleaved files for everything.

      You have two options.
      1) You could set "passes=1", which is faster, but I don't recommend it because it doesn't give as good results as 2-pass normalization.
      or
      2) You could specify only a single output file, which will get interleaved reads:

      bbnorm.sh in1=R1.fastq.gz in2=R2.fastq.gz out=R12.bbnorm.fastq.gz prefilter=t tossbadreads=t ecc=t fixspikes=t qin=33 -Xmx72g target=40

      ...Then, if you need to, de-interleave it afterward:

      reformat.sh in=R12.bbnorm.fastq.gz out1=R1.bbnorm.fastq.gz out2=R2.bbnorm.fastq.gz

      Sorry for the inconvenience! I'll try to fix that by the next release, though unlike documenting the "qin" flag, this will take more work so no guarantees. Thanks for bringing it to my attention. FYI, the flag "interleaved" has no effect on output, only input.

      -Brian
      Last edited by Brian Bushnell; 06-23-2014, 06:05 PM.

      Comment


      • #18
        Thanks for the info Brian, it wasn't a big issue.

        Olaf

        Comment


        • #19
          Olaf,

          This has been fixed in the latest release, 33.04

          Comment


          • #20
            Excellent, just did a test run. This is very useful software!

            Olaf

            Comment


            • #21
              Possible to add Read Group in BBmap header?

              *sorry, probably wrong thread, I found more activity in the release announcement thread*

              Hello,
              I used BBduk to process my read pairs and then mapped them using BBmap, then sam/bam and sorted.
              I plan to use an alternative to mpileup to process this set (for comparison of the outputs), so I am trying to use GATK tools.

              When running a GATK tool, it reports the error that the readgoup is not found in the header. With other mappers, this is an option (like -R for BWA). I found a methods to manually add readgoup information to the header (such as here), but I have limited linux skills and get errors when trying that approach (command "header" not found). I am also concerned that if I put the wrong RG info, I may pooch a downstream tool.

              Is there a way to make the BBmap output compatible with GATK?
              Last edited by sdmoore; 07-05-2014, 09:34 AM. Reason: wrong thread?

              Comment


              • #22
                sdmoore,

                BBMap does not have an option for setting the readgroup, since I never encountered a situation where I needed it. But if it's useful, I can add it to the next release. The solution in your linked thread looks reasonable and I'm not sure why it didn't work for you; I will let you know if I find a better solution.

                Comment


                • #23
                  @sdmoore: You actually just want AddOrReplaceReadGroups from Picard tools. The command I expect you were going for is "samtools reheader", though that won't really do what you want since read group information is also added to each alignment.

                  @Brian: It would be great if you could add read group support. That'll be needed by anyone doing SNP calling.

                  Comment


                  • #24
                    Thanks Brian and dpryan.
                    I had to give up on bbmap for now, not for this problem (I found the AddOrReplaceReadGroups tool later: I edited the sams or the bams). Rather, the resulting vcf from mpileup on the BBmap alignments were "all over the place" (and took forever to process too), I don't know how else to describe it, large insert calls for a bunch of positions. Viewing the file was no help (tons of insert/asterisks displayed). Same mess from FreeBayes. BWA-mem and Bowtie2 assemblies don't show this and I can easily identify known errors in the reference file with either mpileup or freebayes. The assembly looked more like what I got from cushaw2 (and also dropped). We are at a stage now where we will Sanger sequence a few loci to clear things up (e.g., BWA never shows a collection of mutations that Bowtie2 does). I was hoping to have a third assembler "take sides", but I think it's faster for us to sequence and be sure.

                    Comment


                    • #25
                      sdmoore,

                      By default, BBMap will look for much longer indels than BWA/Bowtie2, over 16000bp. You can limit this with the maxindel flag (e.g. "maxindel=40"). Soft-clipping (via "local" flag) can also reduce erroneous variation calls from chimeric or low-quality reads.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Best Practices for Single-Cell Sequencing Analysis
                        by seqadmin



                        While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                        06-06-2024, 07:15 AM
                      • seqadmin
                        Latest Developments in Precision Medicine
                        by seqadmin



                        Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                        Somatic Genomics
                        “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                        05-24-2024, 01:16 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 06-14-2024, 07:24 AM
                      0 responses
                      12 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 06-13-2024, 08:58 AM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 06-12-2024, 02:20 PM
                      0 responses
                      17 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 06-07-2024, 06:58 AM
                      0 responses
                      184 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X