  • BBMAP mapping tool

    Hello,
    I wanted to know whether BBMap provides the following information, because I could not find it in the tool's description.
    Does BBMap only give statistics on the mapping of reads to the reference genome?
    Can it also tell me whether the reads cover the entire reference genome?
    Thanks

  • #2
    You can determine how much of your genome is covered, and at what read depth, using the BEDTools 'genomecov' command (see here for description).


    • #3
      So BBMap does not do that?
      Do I have to use BEDTools?
      Thanks


      • #4
        Try BBMap's 'covstats' command - I think it provides a summary of coverage.


        • #5
          Yep - you can use "covstats=covstats.txt covhist=covhist.txt" to print the coverage information of the whole genome as well as on a per-scaffold basis, and also give a distribution of base coverage over the genome. For more information see the "Coverage output parameters" section of bbmap.sh. Once the reads are mapped, using the sam or bam file, you can do the same thing with pileup.sh.
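
The two routes described in #5 (coverage output while mapping, or coverage from an existing sam/bam file with pileup.sh) can be sketched as command lines. This is a minimal Python sketch that only assembles and prints the commands. The file names (reads.fq, ref.fa, mapped.sam) are placeholders that do not come from this thread, and the `in=`, `ref=`, `out=` and `hist=` parameters are assumptions to check against the help text of bbmap.sh and pileup.sh for your version.

```python
def bbmap_cmd(reads, ref, sam_out):
    """Assemble a bbmap.sh call that also writes the coverage files from #5."""
    return [
        "bbmap.sh",
        f"in={reads}",            # placeholder input reads
        f"ref={ref}",             # placeholder reference fasta
        f"out={sam_out}",         # placeholder alignment output
        "covstats=covstats.txt",  # per-scaffold coverage summary
        "covhist=covhist.txt",    # distribution of base coverage
    ]


def pileup_cmd(sam_in):
    """Assemble a pileup.sh call for reads that are already mapped."""
    return [
        "pileup.sh",
        f"in={sam_in}",
        "out=covstats.txt",   # assumed name of the coverage-stats parameter
        "hist=covhist.txt",   # assumed name of the histogram parameter
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even where BBMap is not installed.
    print(" ".join(bbmap_cmd("reads.fq", "ref.fa", "mapped.sam")))
    print(" ".join(pileup_cmd("mapped.sam")))
```

Printing rather than executing keeps the sketch harmless; paste the printed line into a shell once the paths point at real files.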


          • #6
            Thank you for your response.
            Can you explain this output to me, please?

            Average coverage: 25,49
            Standard deviation: 6,38
            Percent scaffolds with any coverage: 100,00
            Percent of reference bases covered: 99,85


            and in covstats.txt:

            Code:
            #ID	Avg_fold	Length	Ref_GC	Covered_percent	Covered_bases	Plus_reads	Minus_reads	Read_GC	Median_fold	Std_Dev
            gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome	26,6251	4641652	0,5079	99,8479	4634590	123276	118657	0,5042	26	6,38


            • #7
              Did you inspect the alignment using IGV or a genome browser of some kind? It should be reasonably clear what those numbers mean. Looks like you have a small fraction of the genome (0.15%) that does not have at least one read covering it.

              Differences like this could be due to your strain being slightly different than the reference sequence.
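
The 0.15% figure can be checked directly from the covstats.txt row posted in #6. Below is a minimal Python sketch, assuming the file is tab-separated with the header shown above and that numbers use a decimal comma (as in this poster's locale). The helper names `to_float` and `uncovered_summary` are made up for illustration.

```python
import csv
import io

# The covstats.txt content from #6, tab-separated.
COVSTATS = (
    "#ID\tAvg_fold\tLength\tRef_GC\tCovered_percent\tCovered_bases\t"
    "Plus_reads\tMinus_reads\tRead_GC\tMedian_fold\tStd_Dev\n"
    "gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. "
    "MG1655, complete genome\t"
    "26,6251\t4641652\t0,5079\t99,8479\t4634590\t123276\t118657\t"
    "0,5042\t26\t6,38\n"
)


def to_float(text):
    """Parse a number written with a decimal comma, e.g. '99,8479'."""
    return float(text.replace(",", "."))


def uncovered_summary(covstats_text):
    """Return per-scaffold uncovered base counts and covered percentages."""
    reader = csv.DictReader(io.StringIO(covstats_text), delimiter="\t")
    rows = []
    for row in reader:
        length = int(row["Length"])
        covered = int(row["Covered_bases"])
        rows.append({
            "id": row["#ID"],
            "uncovered_bases": length - covered,
            "covered_percent": 100.0 * covered / length,
            "reported_percent": to_float(row["Covered_percent"]),
        })
    return rows


if __name__ == "__main__":
    for entry in uncovered_summary(COVSTATS):
        print(entry["uncovered_bases"], round(entry["covered_percent"], 4))
```

For this row, Length minus Covered_bases is 4641652 - 4634590 = 7062 bases with no read over them, which is the 0.15% mentioned above.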


              • #8
                I mapped long reads to a reference genome using BBMap (with the covstats option). I wanted to know whether the entire reference genome is covered by the long reads.

                I do not understand what the different terms mean: "Average coverage", "Standard deviation", "Percent scaffolds with any coverage", "Percent of reference bases covered".

                How can I know whether the whole reference genome is covered by long reads?

                Thanks.


                • #9
                  Since "Percent of reference bases covered" is 99,85 rather than 100%, about 0.15% of the reference bases are not covered by any read, long or otherwise.
                  Last edited by GenoMax; 06-08-2017, 04:02 AM.


                  • #10
                    Bacteria have circular genomes, represented in fasta files as linear with a breakpoint somewhere. Mapping (and thus coverage calculation) is less accurate at the ends due to the artificial break; BBMap will place a read spanning the break on either the left end or the right end, but not both. You might want to look at the ends to see if the uncovered bases are there.
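
One way to follow this suggestion is to list the zero-coverage stretches and see whether they sit near the two ends of the linearized sequence. This is a minimal Python sketch, assuming you already have a per-base depth list for the scaffold (how to produce it is not covered in this thread). The function names and the `margin` parameter are hypothetical.

```python
def zero_coverage_runs(depths):
    """Return half-open (start, end) intervals where the depth is 0."""
    runs = []
    start = None
    for pos, depth in enumerate(depths):
        if depth == 0 and start is None:
            start = pos                      # a zero-coverage run begins
        elif depth != 0 and start is not None:
            runs.append((start, pos))        # the run ended before pos
            start = None
    if start is not None:                    # run reaches the last base
        runs.append((start, len(depths)))
    return runs


def near_ends(run, genome_length, margin):
    """True if the run overlaps the first or last `margin` bases."""
    start, end = run
    return start < margin or end > genome_length - margin


if __name__ == "__main__":
    # Toy depth vector, not real data.
    depths = [0, 0, 5, 5, 0, 5, 0]
    for run in zero_coverage_runs(depths):
        print(run, near_ends(run, len(depths), margin=2))
```

If most of the uncovered bases fall inside the margin at either end, the artificial break is a likely cause. Gaps in the middle point more toward real differences between the strain and the reference.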


                    • #11
                      Hello Brian,
                      Can you explain the BBMap algorithm to me, please?
                      Thanks


                      • #12
                        That would require weeks of work and is out of the scope of this forum... but I suggest you read this paper:


                        • #13
                          OK, thank you for the link.
                          I have one more question about the BBMap output.
                          This is the output:
                          Code:
                          Read 1 data:            pct reads       num reads       pct bases          num bases
                          
                          mapped:                  99,7746%         2365043        99,8117%         1168097245
                          unambiguous:             96,5758%         2289219        96,6551%         1131155596
                          ambiguous:                3,1988%           75824         3,1566%           36941649
                          low-Q discards:           0,0000%               0         0,0000%                  0
                          
                          perfect best site:       36,0494%          854511        36,4545%          426626981
                          semiperfect site:        36,0505%          854537        36,4556%          426639736
                          
                          Match Rate:                   NA               NA        99,5357%         1166772129
                          Error Rate:              63,8415%         1510509         0,4634%            5431521
                          Sub Rate:                13,2884%          314407         0,0677%             794173
                          Del Rate:                56,8578%         1345272         0,3512%            4117267
                          Ins Rate:                15,1778%          359111         0,0444%             520081
                          N Rate:                   0,0128%             302         0,0009%              10862
                          
                          Average coverage:                       96,08
                          Standard deviation:                     26,28
                          Percent scaffolds with any coverage:    100,00
                          Percent of reference bases covered:     99,28
                          Can you explain "perfect best site" and "semiperfect site"?

                          Also, if "pct reads" is the percentage of mapped reads, is "num reads" the number of mapped reads? If so, that cannot be right, because I do not have 2365043 reads.

                          And why is the Error Rate under "pct reads" so high?
                          Thanks for your help.
                          Last edited by mido1951; 06-11-2017, 08:34 AM.


                          • #14
                            "perfect" means every base in the read matched the reference (no mismatches, indels, or Ns). "semiperfect" is similar but it allows Ns in the reference.

                            The "num reads" column in "mapped" row indicates the number of reads that were mapped.

                            Don't worry about the high error rate. That just indicates that 63.8% of your reads had at least one error (mismatching base, indel, etc). The more important number is the % of bases, which is only 0.46%, so the reads have roughly 99.5% identity to the reference.
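
The difference between the two columns can be illustrated with a toy calculation. This is a minimal Python sketch with made-up reads (not the poster's data), assuming each error affects exactly one base, which is a simplification for indels. The function name `error_rates` is hypothetical.

```python
def error_rates(reads):
    """reads: list of (length, error_count) pairs.

    Returns (percent of reads with at least one error,
             percent of bases that are errors).
    """
    total_reads = len(reads)
    total_bases = sum(length for length, _ in reads)
    reads_with_error = sum(1 for _, errors in reads if errors > 0)
    error_bases = sum(errors for _, errors in reads)
    return (100.0 * reads_with_error / total_reads,
            100.0 * error_bases / total_bases)


if __name__ == "__main__":
    # Four hypothetical 500-base reads with 0, 1, 2 and 1 errors.
    toy = [(500, 0), (500, 1), (500, 2), (500, 1)]
    pct_reads, pct_bases = error_rates(toy)
    print(f"reads with an error: {pct_reads:.1f}%")
    print(f"bases in error:      {pct_bases:.2f}%")
    print(f"identity:            {100.0 - pct_bases:.2f}%")
```

Three of the four toy reads contain an error (75% in the "pct reads" sense), yet only 4 of 2000 bases are wrong (0.2%). The same effect is why a 63.8% per-read error rate is compatible with roughly 99.5% identity.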


                            • #15
                              Originally posted by Brian Bushnell
                              The "num reads" column in "mapped" row indicates the number of reads that were mapped.
                              Sorry, but I had only ~27000 reads, not 2365043?
                              Thank you for your response.
