Seqanswers
  • #76
    pairlen= broken in 33.41

    Hi Brian

    I was using pairlen=1200 to limit insert sizes for paired-end mapping. With the newest version (33.41), BBMap reports an unknown option, while in 33.40b it worked. What happened?

    Cheers
    Harald



    • #77
      Harald,

      Thanks for noting that! I accidentally deleted the line that parsed that command. I've uploaded the fixed version, 33.42.


      On another topic, I'd like to announce that a second developer, Jon Rood, has begun porting certain aspects of BBTools over to C using JNI calls. The currently ported classes are BBMerge, BBMap, and Dedupe. This is optional (you can enable it at runtime with the 'usejni' flag), and the output is identical, but there is a substantial speedup:
      BBMap: +30%
      BBMerge: +60%
      Dedupe: up to +200% (when allowing an edit distance)

      If you are interested in a free speed increase, instructions for compiling the C code for OS X or Linux are in /bbmap/jni/README.txt



      • #78
        Hello,

        I hope this is the right place to post BBMap questions, since the last reply here wasn't too long ago.

        I've been using BBMap to map paired-end FASTQ reads whose headers were renamed for downstream analysis ("_1" and "_2" appended to the names of the forward and reverse reads, respectively). In the resulting SAM file, the mapped forward and reverse reads have a nonzero POS field, but the PNEXT field is always zero. Is this caused by my renaming of the reads? Bowtie2 doesn't have this problem, and assembly with SPAdes and IDBA-UD worked normally with the renamed reads.

        Example of the SAM entries for a read pair:
        HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_1 16 NODE_207_length_5463_cov_10917.5_ID_413 125 44 13=1X137= * 0 0 [...] [...] NM:i:1 AM:i:44
        HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_2 0 NODE_207_length_5463_cov_10917.5_ID_413 5275 45 151= * 0 0 [...] [...] NM:i:0 AM:i:45

        Thanks a lot in advance for your help.
        Brandon
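The pairing status is visible in the FLAG column (column 2) of the entries above: 16 sets only the "reverse strand" bit and 0 sets no bits, so neither record carries the 0x1 "read paired" bit, which is consistent with RNEXT being `*` and PNEXT being 0. The minimal Python sketch below decodes FLAG bits as defined in the SAM specification and extracts the mate columns; the function names are illustrative and are not part of any BBTools or samtools API.

```python
from __future__ import annotations

# FLAG bit meanings from the SAM format specification (column 2).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def mate_fields(sam_line: str) -> dict:
    """Extract the pairing-related columns from one SAM alignment line.

    Splits on any whitespace because the forum copy lost its tabs; the first
    eight SAM columns (QNAME..PNEXT) never contain spaces, so this is safe.
    """
    cols = sam_line.split(maxsplit=8)
    flag = int(cols[1])
    return {
        "qname": cols[0],
        "flag": flag,
        "paired": bool(flag & 0x1),  # 0x1 unset: the read was treated as unpaired
        "rnext": cols[6],            # "*" means no mate reference
        "pnext": int(cols[7]),       # 0 means no mate position
    }


example = ("HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_1 16 "
           "NODE_207_length_5463_cov_10917.5_ID_413 125 44 13=1X137= * 0 0 "
           "[...] [...] NM:i:1 AM:i:44")
print(mate_fields(example))
print(decode_flag(16))
```

For a properly paired record, the flag would include 0x1 (for example, `decode_flag(99)` contains "read paired"), and RNEXT/PNEXT would point at the mate.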



        • #79
          Brandon,

          Those were not recognized as paired. BBMap recognizes only the normal Illumina naming schemes:

          "* /1"
          and
          "* 1:"

          If those reads are interleaved in a single file, use the "int=t" flag, which forces BBMap to treat them as interleaved.
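One way to read the two naming conventions above is as the classic `/1`, `/2` suffix and the Casava 1.8+ style, where the mate number follows a space (`name 1:N:0:...`). The hypothetical Python helpers below check whether two headers follow either convention, show that names ending in `_1`/`_2` match neither, and sketch renaming them back to `/1`, `/2` as an alternative to `int=t`. This is an interpretation for illustration only, not BBMap's actual name parser.

```python
import re

# Hypothetical patterns for the two Illumina conventions (an interpretation,
# not BBMap's parser):
#   "<name>/1", "<name>/2"              older style, mate number after a slash
#   "<name> 1:N:0:...", "<name> 2:..."  Casava 1.8+ style, mate number after a space
SLASH_STYLE = re.compile(r"^(?P<base>\S+)/(?P<mate>[12])(?:\s|$)")
SPACE_STYLE = re.compile(r"^(?P<base>\S+) (?P<mate>[12]):")


def pair_key(header: str):
    """Return (base_name, mate_number) for a recognized header, else None."""
    header = header.lstrip("@")
    for pattern in (SLASH_STYLE, SPACE_STYLE):
        m = pattern.match(header)
        if m:
            return m.group("base"), int(m.group("mate"))
    return None


def is_mate_pair(h1: str, h2: str) -> bool:
    """True if both headers share a base name and carry mate numbers 1 and 2."""
    k1, k2 = pair_key(h1), pair_key(h2)
    return (k1 is not None and k2 is not None
            and k1[0] == k2[0] and {k1[1], k2[1]} == {1, 2})


def underscore_to_slash(header: str) -> str:
    """Rewrite a trailing _1/_2 on the read name to /1 or /2, keeping any comment."""
    name, sep, rest = header.partition(" ")
    return re.sub(r"_([12])$", r"/\1", name) + sep + rest


a = underscore_to_slash("HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_1")
b = underscore_to_slash("HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_2")
print(is_mate_pair(a, b))  # True
```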



          • #80
            bbmap hitstats - unambiguous Hits

            Hi Brian
            I was looking at the hitstats files and realized that the %unambiguousReads values can add up to more than 100%, and similarly the unambiguousReads count can be higher than the total number of input reads. How can that be?

            Cheers
            Harald



            • #81
              Harald,

              Thanks for noticing that. It works correctly for single-ended reads, but it appears that improper pairs (where one read maps to one scaffold, and the other maps to a different scaffold) are double-incrementing the counts on both scaffolds. I'll fix that in the next release.

              -Brian
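To see how the totals can exceed 100%, here is a toy Python model (hypothetical; not BBMap's hitstats code). Normally each mapped read increments only the scaffold it hit; the `double_count_improper` switch is a guess at the described bug, adding 2 to both scaffolds of an improper pair.

```python
from collections import Counter


def per_scaffold_read_counts(pairs, double_count_improper=False):
    """Count mapped reads per scaffold.

    pairs: iterable of (scaffold_of_read1, scaffold_of_read2); None means unmapped.
    Normal behaviour: each mapped read adds 1 to the scaffold it hit, so the
    counts can never sum to more than the number of input reads.
    double_count_improper=True is a hypothetical model of the bug: mates on
    different scaffolds add 2 to BOTH scaffolds.
    """
    counts = Counter()
    for s1, s2 in pairs:
        improper = s1 is not None and s2 is not None and s1 != s2
        if double_count_improper and improper:
            counts[s1] += 2
            counts[s2] += 2
        else:
            for s in (s1, s2):
                if s is not None:
                    counts[s] += 1
    return counts


pairs = [("A", "A"), ("A", "B"), ("B", None)]  # 3 pairs = 6 input reads
ok = per_scaffold_read_counts(pairs)
bad = per_scaffold_read_counts(pairs, double_count_improper=True)
print(sum(ok.values()), sum(bad.values()))  # 5 7
```

With the double increment, 6 input reads produce 7 counted hits, which is the kind of overcount reported above.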



              • #82
                Thanks for the quick reply, Brian!



                • #83
                  Originally posted by kbseah:
                      Thanks for the quick reply, Brian!
                  You're welcome!

                  Originally posted by HGV:
                      Hi Brian
                      I was looking at the hitstats files and I realized that the %unambiguousReads can add up to more than 100%, and similarly the unambiguousReads count can be higher than the total input reads...
                  Fixed now, as of v33.46.



                  • #84
                    Hi,

                    I'm developing a tool for analyzing sequence reads from viral genetic material, and mapping to reference viral genomes is part of the process. The first version used Bowtie2, but now I'm building a new version with more options and greater flexibility for users, and I'd like to use BBMap, as it seems best suited to this purpose. Its user manual is more straightforward, and people I talked to who have used both mappers told me BBMap is generally better.

                    I ran some tests, e.g. running BBMap on made-up sequences to see how it performs. I created a FASTQ file with one read and two FASTA files as references. One FASTA file contained only one sequence similar to the read, while the other contained four such sequences (one of them identical to the sequence in the first file, and also the most similar to the read). As expected, the read was always mapped to the most similar sequence, and the mapping quality was the same in both cases. From that, it seems that mapping quality is completely independent of the other sequences in the reference and depends only on the read and the particular reference sequence it maps to. I'm not sure, though, so I wanted to ask whether this assumption is actually true. Thanks a lot.



                    • #85
                      The mapping quality is dependent on the other reference sequences, but only if they are within some threshold of similarity (roughly 6 edits) to the best site. If you copy the same reference sequence twice in the fasta file, the read will map ambiguously and get a score of 3 or less. If you add one or two edits to one copy, the read will map to the unedited one but get a reduced score. But if they differ by, say, 10 edits, then the best mapping location will not get any score penalty. The penalty is also influenced by the number of alternative sites; for example, if there are 5 sites that are each 2 edits worse than the best site, that will give a greater score penalty than if there is only one alternative site.
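The qualitative rules in this answer (a tie gives 3 or less, near-equal sites within roughly 6 edits reduce the score, more such sites reduce it further, distant sites do not matter) can be illustrated with a toy Python function. The formula, the ceiling of 45, and the linear penalty are invented for illustration and are not BBMap's actual mapping-quality calculation.

```python
from __future__ import annotations

EDIT_THRESHOLD = 6   # "roughly 6 edits", per the answer above
MAX_Q = 45           # hypothetical ceiling, chosen only for illustration


def toy_mapq(best_edits: int, alt_edits: list[int]) -> int:
    """Toy mapping-quality model (NOT BBMap's formula).

    best_edits: edits at the best site; alt_edits: edits at each other site.
    """
    # Alternative sites with no more edits than the best site make it ambiguous.
    ties = sum(1 for e in alt_edits if e <= best_edits)
    if ties:
        return max(0, 4 - ties)  # 3 for one tie, lower for more ties
    penalty = 0.0
    for e in alt_edits:
        gap = e - best_edits             # how much worse this site is
        if gap < EDIT_THRESHOLD:         # only near-equal sites are penalized
            penalty += MAX_Q * (EDIT_THRESHOLD - gap) / EDIT_THRESHOLD
    # Penalties add up, so several nearby sites cost more than one.
    return max(4, round(MAX_Q - penalty))


print(toy_mapq(0, []), toy_mapq(0, [0]), toy_mapq(0, [2]), toy_mapq(0, [10]))
```

In this toy, an alternative 10 edits away leaves the score untouched, while five sites 2 edits away push it lower than a single such site.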



                      • #86
                        pileup.sh inconsistent with samtools pileup

                        Hi Brian,

                        I tried using the "fastaorf" option in pileup.sh to look at the read depth for a set of ORFs predicted with Prodigal. The input is in Prodigal's output format, as specified. However, the per-ORF coverage results (the depthSum field) are inconsistent with the output from samtools mpileup.

                        Briefly: I produced a pileup file with samtools mpileup and for each orf, simply summed the read depth (4th column of the pileup file) for each position that falls within the orf.
                        I double-checked this by converting the Prodigal output to a BED feature table and using bedtools multicov with the original BAM file to produce per-feature read depths. These are not identical to the summed depths, but they are roughly proportional to them (plotting the per-ORF values from both methods against each other gives an approximately linear relationship).
                        The output from pileup.sh, on the other hand, doesn't give anything close to a linear relationship.

                        Is there something different in how pileup.sh calculates the per-orf coverage?

                        Thanks a lot,
                        Brandon
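For reference, the summing method described above can be sketched in Python. It assumes Prodigal-style FASTA headers (`>contig_N # start # end # strand # ID=...`, 1-based inclusive coordinates) and tab-separated `samtools mpileup` output with depth in column 4; the function names are hypothetical. A sum of per-base depths is roughly the number of overlapping reads times their aligned length, which may explain why it is only proportional to per-feature read counts such as those from bedtools multicov; mpileup's default filters (for example, a minimum base quality) can also make its depths differ from other tools.

```python
from collections import defaultdict


def parse_prodigal_header(header):
    """Parse e.g. '>contig_7_3 # 1021 # 2145 # -1 # ID=7_3;partial=00'
    into (contig, start, end). Assumes the ORF id is '<contig>_<n>'."""
    fields = [f.strip() for f in header.lstrip(">").split("#")]
    orf_id, start, end = fields[0], int(fields[1]), int(fields[2])
    contig = orf_id.rsplit("_", 1)[0]  # drop the trailing ORF number
    return contig, start, end


def depth_sums(mpileup_lines, orfs):
    """Sum column 4 (depth) of samtools mpileup output over each ORF.

    orfs: list of (contig, start, end), 1-based inclusive.
    Positions missing from the pileup (zero coverage) contribute nothing.
    """
    depth = defaultdict(dict)  # contig -> {position: depth}
    for line in mpileup_lines:
        cols = line.rstrip("\n").split("\t")
        depth[cols[0]][int(cols[1])] = int(cols[3])
    sums = []
    for contig, start, end in orfs:
        per_pos = depth.get(contig, {})
        sums.append(sum(per_pos.get(p, 0) for p in range(start, end + 1)))
    return sums


pile = ["c1\t1\tA\t5\t.....\tIIIII", "c1\t2\tC\t7\t.......\tIIIIIII",
        "c1\t4\tG\t2\t..\tII"]
print(depth_sums(pile, [("c1", 1, 3), ("c1", 2, 4)]))  # [12, 9]
```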



                        • #87
                          Brandon,

                          I did introduce a bug recently when I added support for tracking only read start positions rather than total coverage, which manifested in some situations. It's fixed now and I just uploaded the fixed version (33.57). Would you mind downloading that and confirming whether it works correctly?

                          Thanks!



                          • #88
                            Hi Brian,

                            Thanks for your reply. Unfortunately the output seems to be the same. Should I send you the output from pileup.sh vs. the output from bedtools so that you can see what I mean?

                            Best,
                            Brandon




                            • #89
                              Originally posted by kbseah:
                                  Hi Brian,
                                  Thanks for your reply. Unfortunately the output seems to be the same. Should I send you the output from pileup.sh vs. the output from bedtools so that you can see what I mean?
                                  Best,
                                  Brandon
                              Yes, please do, as well as the command line and stdout/stderr messages.



                              • #90
                                Hi Brian,

                                I would be interested in knowing when you intend to publish BBMap in a paper. Can you enlighten us?

                                Cheers
