**HESmith** · 06-07-2017, 06:49 AM

You can determine how much of your genome is covered, and at what read depth, using the BEDTools 'genomecov' command (see here for a description).

**mido1951** · 06-07-2017, 06:59 AM

So BBMap does not do that?
Must I use BEDTools?
Thanks

**HESmith** · 06-07-2017, 07:05 AM

Try BBMap's 'covstats' command - I think it provides a summary of coverage.

**Brian Bushnell** · 06-07-2017, 08:44 AM

Yep - you can use "covstats=covstats.txt covhist=covhist.txt" to print the coverage information of the whole genome as well as on a per-scaffold basis, and also give a distribution of base coverage over the genome. For more information see the "Coverage output parameters" section of bbmap.sh. Once the reads are mapped, using the sam or bam file, you can do the same thing with pileup.sh.
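
For anyone scripting this, here is a minimal sketch of the two command lines described above, built as argument lists in Python. The file names (ref.fa, reads.fq, mapped.sam) are placeholders, not anything from this thread, and the helper function names are made up for illustration. Check the flags against the help text of your own bbmap.sh and pileup.sh before relying on them.

```python
import shlex


def bbmap_coverage_cmd(ref, reads, sam_out,
                       covstats="covstats.txt", covhist="covhist.txt"):
    """Build a bbmap.sh command that maps reads and writes coverage files."""
    return [
        "bbmap.sh",
        f"ref={ref}",            # reference fasta (placeholder name)
        f"in={reads}",           # reads to map (placeholder name)
        f"out={sam_out}",        # alignments, reusable later with pileup.sh
        f"covstats={covstats}",  # per-scaffold coverage summary
        f"covhist={covhist}",    # distribution of base coverage
    ]


def pileup_coverage_cmd(sam_in, covstats="covstats.txt", covhist="covhist.txt"):
    """Build a pileup.sh command that computes coverage from existing alignments."""
    return [
        "pileup.sh",
        f"in={sam_in}",     # sam or bam file from an earlier mapping run
        f"out={covstats}",  # per-scaffold coverage summary
        f"hist={covhist}",  # distribution of base coverage
    ]


if __name__ == "__main__":
    # Print the commands; pass the lists to subprocess.run(..., check=True) to execute.
    print(shlex.join(bbmap_coverage_cmd("ref.fa", "reads.fq", "mapped.sam")))
    print(shlex.join(pileup_coverage_cmd("mapped.sam")))
```

Building the list first and printing it makes it easy to inspect the exact command before anything is run.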

**mido1951** · 06-08-2017, 02:04 AM

Thank you for your response.
Can you explain this output to me, please?

Average coverage: 25,49
Standard deviation: 6,38
Percent scaffolds with any coverage: 100,00
Percent of reference bases covered: 99,85

and in covstats.txt:

Code:

#ID	Avg_fold	Length	Ref_GC	Covered_percent	Covered_bases	Plus_reads	Minus_reads	Read_GC	Median_fold	Std_Dev
gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome	26,6251	4641652	0,5079	99,8479	4634590	123276	118657	0,5042	26	6,38

**GenoMax** · 06-08-2017, 03:43 AM

Did you inspect the alignment using IGV or a genome browser of some kind? It should be reasonably clear what those numbers mean. Looks like you have a small fraction of the genome (0.15%) that does not have at least one read covering it.

Differences like this could be due to your strain being slightly different than the reference sequence.

**mido1951** · 06-08-2017, 03:50 AM

I mapped the long reads to a reference genome using BBMap (with the covstats option). I wanted to know whether the whole reference genome is covered by the long reads.

I do not mean what the different words mean: "Average coverage, Standard deviation, Percent scaffolds with any coverage, Percent of reference bases covered".??

How can I know if the whole reference genome is covered by long reads?

Thanks.

**GenoMax** · 06-08-2017, 03:54 AM

Since the Percent of reference bases covered: 99,85 is not 100% there are at least 0.15% of bases that are not covered by one read, long or otherwise.
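
As a worked check of that figure, here is a small sketch that takes the covstats.txt row posted above and works out the uncovered bases from the Length and Covered_bases columns. It assumes the columns are tab-separated in the order shown in the header line, and that decimals use a comma as in the posted output. The function names are just for illustration.

```python
def parse_decimal(text):
    """Parse a number written with a decimal comma, e.g. '99,8479'."""
    return float(text.replace(",", "."))


def uncovered_summary(covstats_row):
    """Return (uncovered_bases, uncovered_percent) for one covstats.txt row.

    Assumes tab-separated columns in the order:
    ID, Avg_fold, Length, Ref_GC, Covered_percent, Covered_bases, ...
    """
    fields = covstats_row.rstrip("\n").split("\t")
    length = int(fields[2])
    covered_bases = int(fields[5])
    uncovered = length - covered_bases
    return uncovered, 100.0 * uncovered / length


# The row from the post above, rebuilt with explicit tabs.
row = "\t".join([
    "gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome",
    "26,6251", "4641652", "0,5079", "99,8479", "4634590",
    "123276", "118657", "0,5042", "26", "6,38",
])

uncovered, uncovered_pct = uncovered_summary(row)
covered_pct = parse_decimal(row.split("\t")[4])

print(f"uncovered bases: {uncovered}")
print(f"uncovered percent: {uncovered_pct:.4f}")
print(f"reported covered percent: {covered_pct}")
```

So the 0.15% corresponds to 4641652 - 4634590 = 7062 reference bases with no read over them.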

**Brian Bushnell** · 06-08-2017, 09:04 AM

Bacteria have circular genomes, represented in fasta files as linear with a breakpoint somewhere. Mapping (and thus coverage calculation) is less accurate at the ends due to the artificial break; BBMap will place a read spanning the break on either the left end or the right end, but not both. You might want to look at the ends to see if the uncovered bases are there.
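
One way to look at the ends is to list the zero-coverage stretches and see which ones sit near position 1 or near the last base. The sketch below works on a plain list of per-base depths. Where that list comes from is up to you (for example a per-base coverage file, if your mapper can write one); the function names and the margin parameter are made up for illustration.

```python
def zero_coverage_runs(depths):
    """Return 1-based inclusive (start, end) intervals where depth is 0."""
    runs = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth == 0:
            if start is None:
                start = pos          # a new uncovered stretch begins here
        elif start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:            # stretch runs to the end of the sequence
        runs.append((start, len(depths)))
    return runs


def runs_near_ends(runs, length, margin):
    """Keep intervals that touch the first or last `margin` bases."""
    return [(s, e) for s, e in runs if s <= margin or e > length - margin]


if __name__ == "__main__":
    # Toy depths for a 7-base "genome"; real input would have one value per reference base.
    depths = [0, 0, 3, 4, 0, 5, 0]
    runs = zero_coverage_runs(depths)
    print(runs)
    print(runs_near_ends(runs, len(depths), margin=2))
```

If most of the uncovered bases fall inside the end margins, the artificial break is a likely explanation; uncovered stretches in the middle point to something else, such as a real difference from the reference.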

**mido1951** · 06-08-2017, 11:44 PM

Hello Brian,
Can you explain the BBMap algorithm to me, please?
Thanks

**Brian Bushnell** · 06-09-2017, 09:52 AM

That would require weeks of work and is out of the scope of this forum... but I suggest you read this paper:

http://bib.irb.hr/datoteka/773712.Igor_Jerkovic_diplomski.pdf

**mido1951** · 06-11-2017, 08:31 AM

OK, thank you for the link.
I have one more question about the BBMap output.
This is the output:

Code:

Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  99,7746%         2365043        99,8117%         1168097245
unambiguous:             96,5758%         2289219        96,6551%         1131155596
ambiguous:                3,1988%           75824         3,1566%           36941649
low-Q discards:           0,0000%               0         0,0000%                  0

perfect best site:       36,0494%          854511        36,4545%          426626981
semiperfect site:        36,0505%          854537        36,4556%          426639736

Match Rate:                   NA               NA        99,5357%         1166772129
Error Rate:              63,8415%         1510509         0,4634%            5431521
Sub Rate:                13,2884%          314407         0,0677%             794173
Del Rate:                56,8578%         1345272         0,3512%            4117267
Ins Rate:                15,1778%          359111         0,0444%             520081
N Rate:                   0,0128%             302         0,0009%              10862

Average coverage:                       96,08
Standard deviation:                     26,28
Percent scaffolds with any coverage:    100,00
Percent of reference bases covered:     99,28

Can you explain "perfect best site" and "semiperfect site"?
Also, if "pct reads" is the percentage of mapped reads, is "num reads" the number of mapped reads? Because if it is the number of mapped reads, I don't have 2365043 reads.

And why is the Error Rate for "pct reads" so high?
Thanks for your help.

**Brian Bushnell** · 06-11-2017, 11:39 AM

"perfect" means every base in the read matched the reference (no mismatches, indels, or Ns). "semiperfect" is similar but it allows Ns in the reference.

The "num reads" column in "mapped" row indicates the number of reads that were mapped.

Don't worry about the high error rate. That just indicates that 63.8% of your reads had at least one error (mismatching base, indel, etc). The more important number is the % of bases, which is only 0.46%, so the reads have roughly 99.5% identity to the reference.
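
To make the per-base numbers concrete, here is a small sketch that recomputes them from the "num bases" column of the output posted above. The denominator (match + error + N bases) is an assumption inferred from the posted numbers, since it reproduces the reported percentages; it is not taken from the BBMap documentation.

```python
# "num bases" values copied from the posted output.
match_bases = 1166772129
sub_bases = 794173
del_bases = 4117267
ins_bases = 520081
n_bases = 10862

# Substitutions, deletions and insertions add up to the "Error Rate" base count.
error_bases = sub_bases + del_bases + ins_bases

# Assumed denominator: every base counted as a match, an error, or an N.
total_bases = match_bases + error_bases + n_bases

error_pct = 100.0 * error_bases / total_bases
match_pct = 100.0 * match_bases / total_bases

print(f"error bases: {error_bases}")
print(f"error rate (pct bases): {error_pct:.4f}")
print(f"match rate (pct bases): {match_pct:.4f}")
```

The "pct reads" error rate counts a read as soon as it has a single error, so it says little about identity, while the per-base rate is what reflects how closely the reads match the reference.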

**mido1951** · 06-11-2017, 01:11 PM

Originally posted by Brian Bushnell:

The "num reads" column in "mapped" row indicates the number of reads that were mapped.

Sorry, but I had just ~27000 reads, not 2365043??
Thank you for your response.

BBMAP mapping tool
