Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • I will update the description; it's a bit misleading. What it means is, reads with average quality below the setting will not be mapped. All of the reads will still be printed to "out", even if they failed the quality threshold. However, if you use the "outm" and "outu" streams, only mapped reads will go to outm, and only unmapped reads will go to outu. So, maq would suppress low-quality reads from going to outm.

    Thanks for noting that the long version of the flag does not work. It's actually "minavgquality"; I'll fix that.

    Comment


    • Seem my response in this thread: http://seqanswers.com/forums/showthr...313#post201313

      But basically, yes. And no, you don't need to be in the /ref/ directory, you need to be in the directory containing the ref directory.

      By default, there is only one alignment per read.

      Comment


      • Good afternoon Brian,
        In my lab we are dealing with a couple of thousands of whole genome seq. data. We are running some tests with BBtools and its performance looks really promising.
        We would like to use Lumpy for structural variants call and this tool requires as inputs beside a "normal" BAM file also a bam file containing unpaired and split reads.
        I have seen that it is possible to run BBmap in local alignment mode but is there any way to identify chimeras with BBmap?
        Thanks in advance.

        Comment


        • BBMap does not have any explicit chimera identification capabilities; that would need to be done in post-processing. Also, it does not produce split alignments, but it does do a very good job of identifying long deletions, which it will capture in a single alignment.

          You can run BBMap like this:

          Code:
          bbmap.sh in=reads.fq outm=normal.bam outu=unmapped.fq pairedonly ambig=all maxindel=400k
          That will produce a bam file of the normally-mapped pairs. Then, mapping with a second pass:

          Code:
          bbmap.sh in=unmapped.fq out=odd.bam ambig=all local maxindel=400k
          That will produce a file containing all of the unmapped and unpaired reads. I have not used Lumpy so I'm not sure if this is the best way to do it, but that sounds approximately like what it wants.

          Comment


          • So, the first alignment is run in global alignment mode, while the second one is run in local mode allowing indels up to 400 kb, isn't it?
            Sounds like a good solution thanks.
            We will try this option and check whether Lumpy likes it.

            Comment


            • The default is roughly 76% identity. You can adjust this with the "minid" flag (e.g. "minid=0.80" for 80% identity.) If you want to restrict alignments to a maximum number of substitutions, you can use "subfilter"; e.g., "subfilter=5" will discard alignments with more than 5 substitutions.

              That said, what exactly are you trying to do?
              Last edited by Brian Bushnell; 11-28-2016, 10:35 AM.

              Comment


              • BBMap can output coverage, by the way. Just add the flag "covstats=covstats.txt". You may also want to add the flag "delcov=f" for this purpose.

                Comment


                • Pileup won't accept multiple input files, but it will accept standard in. So if the files were sam rather than bam, you could do this:

                  Code:
                  cat lib1.sam lib2.sam | pileup.sh in=stdin.sam out=stats.txt hist=histogram.txt
                  or for gzipped files:

                  Code:
                  cat lib1.sam.gz lib2.sam.gz | pileup.sh in=stdin.sam.gz out=stats.txt hist=histogram.txt
                  However, I don't see a way to do that with bam files. BBMap will produce sam files instead of bam files if you name the output *.sam instead of *.bam.

                  Comment


                  • Pileup uses the header lines to determine the names and lengths of the sequences in the reference. So, they must be present. However, it doesn't really care whether there are extra ones randomly in the file; they just get ignored. So, cat works fine.

                    Comment


                    • Originally posted by moistplus
                      Can you confirm that pileup doesn't need sam files to be sorted ?
                      Yes, that's correct.

                      Comment


                      • Hi Brian,

                        I like to use freebayes for the vcf creation because all of our stuff is haploid, however I do not know how to work with the ref file that BBMap makes during a mapping. Freebayes accepts a fasta for the ref. The ref folder contains a compressed chromosome file which freebayes doesn't like and when I use the fasta that I feed to BBmap it doesn't like the header mismatch.

                        Can you tell me how to overcome this issue?

                        Comment


                        • Unfortunately, it sounds like Freebayes is trimming the names of the sequences after the first whitespace when loading them from the fasta file, but it's not doing that when looking at the sam file. To fix that problem, unfortunately you'd need to rerun the mapping, adding the "trd" flag (trim read description). Additionally, Freebayes is not compliant with the current Sam specification and does not understand cigar strings with "=" and "X" symbols, so you need to output in legacy Sam format with M symbols. Therefore, for Freebayes, you need to add two extra flags when mapping:

                          Code:
                          trd sam=1.3

                          Comment


                          • Thank you Brain!

                            Is there a variant calling pipeline that works best with BBmap?

                            Comment


                            • If you are going to re-do mapping then clean up the fasta ID's (and regenerate BBMap index) in your reference genome, if there are spaces in the words there.

                              BBMap suite now has a variant caller (callvariants.sh) :-)

                              Comment


                              • Originally posted by skbrimer View Post
                                Thank you Brain!

                                Is there a variant calling pipeline that works best with BBmap?
                                Funny you should ask! I just finished developing a variant-caller, because I was unhappy with the speed, usability, and some other aspects of Samtools mpileup and GATK (I've never used Freebayes, though). Also, it has a default ploidy of 1 which better reflects projects at JGI, though the ploidy can be changed to an arbitrary number.

                                I just uploaded a new version of BBTools (36.71) with the variant caller (older versions do not support vcf output). You can run it like this:

                                Code:
                                callvariants.sh in=mapped.sam vcf=vars.vcf ref=ref.fasta ploidy=1
                                CallVariants works on sam or bam, sorted or unsorted, and it's really fast. Also, it does not care what format the cigar strings are in or whether you trimmed sequence names after the first whitespace. I'd be eager to hear how it compares to FreeBayes, if you'd like to try it out. So far, other people at JGI have indicated that it does a much better job of identifying low-depth variants in high-ploidy organisms than Samtools or GATK (if you adjust the ploidy and minallelefraction flags).

                                That said, no, I don't have a recommended variant-calling workflow for BBMap, because I have not done rigorous benchmarking of all the possibilities yet. Oh, and to modify my previous post, for variant calling I suggest you also add the "mdtag" flag, which some variant-callers like to have. So, for Freebayes:

                                Code:
                                bbmap.sh in=reads.fq ref=ref.fa out=mapped.sam mdtag trd sam=1.3

                                Originally posted by GenoMax View Post
                                BBMap suite now has a variant caller (callvariants.sh) :-)
                                Indeed But if you use it, I highly recommend downloading BBMap 36.71, as prior versions lack proper VCF output.
                                Last edited by Brian Bushnell; 12-14-2016, 10:12 AM.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Exploring the Dynamics of the Tumor Microenvironment
                                  by seqadmin




                                  The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                                  07-08-2024, 03:19 PM
                                • seqadmin
                                  Exploring Human Diversity Through Large-Scale Omics
                                  by seqadmin


                                  In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                                  06-25-2024, 06:43 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 07-10-2024, 07:30 AM
                                0 responses
                                30 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-03-2024, 09:45 AM
                                0 responses
                                202 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-03-2024, 08:54 AM
                                0 responses
                                212 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-02-2024, 03:00 PM
                                0 responses
                                194 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X