Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • You can reduce kmer length and increase sensitivity - change "k=13" to "k=11" and add "slow". But ultimately, since BBMap is a global aligner, it does not like large structural rearrangements when aligning long queries, since they do not fit into the model of "match/substitution/insertion/deletion" reported in single alignments.

    Comment


    • High error rate with BBmap?

      Hi Brian,

      while trying to call variants on a human MT DNA I noticed the unusually high error rate in mapping with bbmap (37.10 and 36.92 tested) compared to bwa (0.7.12-r1039). Both were run with the default parameters, and below is the output of samtools stats (after sorting and indexing bams). IGV shows many more inverted pairs with bbmap mapping, and this came as a surprise.

      I can provide details, however I prefer to share them offline due to the sensitivity of some of the data. Any hints on how to track down the issue?

      Many thanks!

      BBmap
      Code:
      BBM: raw total sequences:       537764
      BBM: filtered sequences:        0
      BBM: sequences: 537764
      BBM: is sorted: 1
      BBM: 1st fragments:     268882
      BBM: last fragments:    268882
      BBM: reads mapped:      256366
      BBM: reads mapped and paired:   230042  # paired-end technology bit set + both mates mapped
      BBM: reads unmapped:    281398
      BBM: reads properly paired:     228748  # proper-pair bit set
      BBM: reads paired:      537764  # paired-end technology bit set
      BBM: reads duplicated:  0       # PCR or optical duplicate bit set
      BBM: reads MQ0: 0       # mapped and MQ=0
      BBM: reads QC failed:   4282
      BBM: non-primary alignments:    0
      BBM: total length:      67353071        # ignores clipping
      BBM: bases mapped:      32229667        # ignores clipping
      BBM: bases mapped (cigar):      29307   # more accurate
      BBM: bases trimmed:     0
      BBM: bases duplicated:  0
      BBM: mismatches:        1672477 # from NM fields
      BBM: error rate:        5.706749e+01    # mismatches / bases mapped (cigar)
      BBM: average length:    125
      BBM: maximum length:    151
      BBM: average quality:   30.4
      BBM: insert size average:       255.7
      BBM: insert size standard deviation:    117.0
      BBM: inward oriented pairs:     60121
      BBM: outward oriented pairs:    2977
      BBM: pairs with other orientation:      48
      BBM: pairs on different chromosomes:    0
      BWA
      Code:
      BWA: raw total sequences:       537764
      BWA: filtered sequences:        0
      BWA: sequences: 537764
      BWA: is sorted: 1
      BWA: 1st fragments:     268882
      BWA: last fragments:    268882
      BWA: reads mapped:      266324
      BWA: reads mapped and paired:   248064  # paired-end technology bit set + both mates mapped
      BWA: reads unmapped:    271440
      BWA: reads properly paired:     245908  # proper-pair bit set
      BWA: reads paired:      537764  # paired-end technology bit set
      BWA: reads duplicated:  0       # PCR or optical duplicate bit set
      BWA: reads MQ0: 178     # mapped and MQ=0
      BWA: reads QC failed:   0
      BWA: non-primary alignments:    0
      BWA: total length:      67353071        # ignores clipping
      BWA: bases mapped:      33736838        # ignores clipping
      BWA: bases mapped (cigar):      32160973        # more accurate
      BWA: bases trimmed:     0
      BWA: bases duplicated:  0
      BWA: mismatches:        603462  # from NM fields
      BWA: error rate:        1.876380e-02    # mismatches / bases mapped (cigar)
      BWA: average length:    125
      BWA: maximum length:    151
      BWA: average quality:   30.4
      BWA: insert size average:       384.2
      BWA: insert size standard deviation:    1040.7
      BWA: inward oriented pairs:     67383
      BWA: outward oriented pairs:    3645
      BWA: pairs with other orientation:      112
      BWA: pairs on different chromosomes:    0

      Comment


      • I noticed this line:
        BBM: bases mapped (cigar): 29307 # more accurate
        I'm guessing you are not counting "=" and "X" symbols as mapped, just "M"... generally, I would classify "M", "=", "X", and "I" as mapped bases. What version of samtools are you using?

        Also, I find it odd that the insert size average and stdev differ so much. Often the mode or median is more stable as it is less influenced by outliers.

        By inverted pairs, do you mean both reads map to the same side of the reference? If not, I wonder if the fact that the reference is very small and circular could play some role in the differences.

        Anyway, please feel free to email me both bam files and the reference, and I'll look at them. It's possible that the circularity, or the fact that bwa looks for local rather than global alignments, is the primary factor behind the difference.

        Comment


        • Originally posted by Brian Bushnell View Post
          I noticed this line:
          Anyway, please feel free to email me both bam files and the reference, and I'll look at them.
          Thanks Brian, I sent you a PM with the details. I'm using samtools 1.3.1 (and the above is the result of samtools stats

          Comment


          • Hi Kristian,

            Is this a Nextera long-mate pair library? Those need special processing before they can be mapped. Or... can you give me any more information about the library construction, and the trimming methodology? The library has an extremely high error rate (particularly with read 2), less than half of the reads map to the mito, and it appears that both adapters and transposase are still present.... also, I'm measuring the median insert size as 159 (BBMap) or 133 (BBMerge), so there are a lot of pairs with insert size shorter than the sequenced read length; those might be displayed differently in IGV depending on whether the adapter portion was soft-clipped (which bwa would do by default) or not (bbmap does not soft-clip by default).

            I adapter-trimmed the reads and error-corrected them, but still under 50% map. I'm not really sure what's wrong with the library. But, I don't see anything unusual about the pairing orientations. I get 45.5670% properly paired with "rcs=f" (require correct strand = false) and 45.5481% with "rcs=t", so only 0.02% map in the wrong orientation.
            Last edited by Brian Bushnell; 04-17-2017, 02:38 PM.

            Comment


            • Hi Brian,

              Originally posted by Brian Bushnell View Post
              Is this a Nextera long-mate pair library?
              This is Nextera XT prep, and the adapters/linker were trimmed along with demultplexing directly through Illumina RTA pipeline. On my end the bbduk2 left and right trimming had almost no effect, and the splitnextera also had only a few hits (0.1%) to the adapter, so I'd guess the dataset is (for the most part) trimmed.
              Last edited by Kristian; 04-18-2017, 05:21 AM.

              Comment


              • Hi Brian,

                I'm wondering if there is any way you can get bbmap (or another of your tools) to give a consensus sequence of an alignment? I went to map to a ref sequence and then use that to create a consensus sequence and then use that to map again and then call variants.

                Thanks!

                Comment


                • Hi Jweger,

                  Sorry, I don't have anything that will do that. Clumpify allows something sort of similar; you can create consensus sequence from raw reads, and map those. But that loses per-base depth information which is important for variant-calling, so I don't think it's what you want.

                  Comment


                  • I was testing something with BBMap today and I realized I had forgotten something about how to use it and couldn't figure it out. I was mapping RNA-Seq and I thought that it would report spliced alignment cigars (using sam 1.3) as /[0-9]+M[0-9]+N[0-9]+M/ but it was reporting the introns as deletions with the [D] cigar value. Is that right or is there a way to get it to not do that?

                    Thanks-
                    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                    Salk Institute for Biological Studies, La Jolla, CA, USA */

                    Comment


                    • You can use the flag "intronlen" to control this. For example, "intronlen=100" will change the reporting so that all deletions of at least 100bp will reported using 'N' instead of 'D' in the cigar string.

                      Comment


                      • Oh yes! Thanks Brian.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */

                        Comment


                        • Hi Brian,

                          Is there anyway to use bbmap (or any other of your tools) to map a read to a reference file and then trim anything to the left of the reference sequence?

                          For example

                          My reference is
                          XXXXXXXXXXXXXXXXXX

                          And my reads are

                          NNNNXXXXXXXXXXXXXXXXXX
                          NNNXXXXXXXXXXXXXXXXXX
                          NNXXXXXXXXXXXXXXXXXX

                          I want them to be
                          XXXXXXXXXXXXXXXXXX
                          XXXXXXXXXXXXXXXXXX
                          XXXXXXXXXXXXXXXXXX

                          I basically want to just trim anything of the left of the reads that doesn't match my reference? Thanks in advance.

                          Comment


                          • To do that, you'd need to trim all soft-clipped bases. I don't have any programs that will do so, but it looks like I could add it to Reformat without too much difficulty.

                            Comment


                            • Hi Brian,
                              Is there a way to set a minimum input fragment length when mapping with BBMap?

                              Comment


                              • You could pre-filter input data with
                                Code:
                                reformat.sh in=file.fq out=filt.fq minlength=N

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM
                                • seqadmin
                                  Recent Developments in Metagenomics
                                  by seqadmin





                                  Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                                  09-23-2024, 06:35 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Today, 06:35 AM
                                0 responses
                                7 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 02:44 PM
                                0 responses
                                7 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-11-2024, 06:55 AM
                                0 responses
                                15 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-02-2024, 04:51 AM
                                0 responses
                                111 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X