How to estimate genome size
Hello folks,
I am trying to understand the genome size estimation method, and here is the part that is not clear to me.
BGI estimates coverage depth from the peak of the k-mer frequency histogram, but in highly repetitive genomes the peak seems to shift down. E.g. check the second chart here, where we generated simulated reads from the sea urchin genome at 50X coverage, but the histogram peak came at ~40, which I attributed to repeats.
If that is true, how does BGI estimate genome size so precisely? In a later section of the bat paper, they claim that the assembly size is close to the 'estimated size', which is puzzling given the imprecision in genome size estimation.
Maybe I am missing something. Please help!
---------------
Edit: I am rerunning the above simulation to make sure everything was done correctly. Results will be reported here.
Last edited by samanta; 04-10-2013, 04:59 PM.
Lizhenyu from BGI explained where I made the error in thinking. When a read has 100 nucleotides and is split into 21-mers, the read will produce only 80 k-mers, not 100. Here is his full response, which agrees with aaronrjex's post above.
"Hi,
I think you mixed the concepts of base coverage depth and kmer coverage depth.
When you referred to 50X genome coverage, it meant base coverage, which is obtained by
total_base_num/genome_size=(read_num*read_length)/genome_size.
Similarly, the kmer coverage depth, the peak value in the kmer frequency curve, is calculated by
total_kmer_num/genome_size=read_num*(read_length-kmer_size+1)/genome_size.
So the relationship between base coverage depth and kmer coverage depth is:
kmer_coverage_depth = base_coverage_depth*(read_length-kmer_size+1)/read_length.
In your case, kmer_coverage_depth = 50 * (100 - 21 + 1)/100 = 40, which is exactly
the peak value in your plot.
best,"
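For anyone who wants to check the arithmetic in the reply, here is a minimal Python sketch of the same formulas. The function names and the 1 Mb toy genome are my own assumptions for illustration, not anything from BGI's pipeline, and the sketch ignores sequencing errors, heterozygosity and repeats, all of which distort a real k-mer histogram.

```python
def kmers_per_read(read_length: int, kmer_size: int) -> int:
    """Number of k-mers one read yields with a sliding window of step 1."""
    if kmer_size > read_length:
        return 0
    return read_length - kmer_size + 1


def kmer_coverage_depth(base_coverage_depth: float,
                        read_length: int,
                        kmer_size: int) -> float:
    """Convert base coverage depth to the expected k-mer coverage depth."""
    return base_coverage_depth * kmers_per_read(read_length, kmer_size) / read_length


def estimate_genome_size(total_kmer_num: int, peak_depth: float) -> float:
    """Rearranged from the reply: peak_depth = total_kmer_num / genome_size."""
    return total_kmer_num / peak_depth


# Sanity check: explicitly enumerate the 21-mers of a 100-nt read.
read = "ACGT" * 25                      # hypothetical 100-nt read
k = 21
enumerated = [read[i:i + k] for i in range(len(read) - k + 1)]
assert len(enumerated) == kmers_per_read(len(read), k) == 80

# The case from this thread: 50X base coverage, 100-nt reads, 21-mers.
peak = kmer_coverage_depth(50, 100, 21)

# Toy example (assumed numbers): a 1 Mb genome at 50X with 100-nt reads.
genome_size = 1_000_000
read_num = 50 * genome_size // 100      # 500,000 reads
total_kmer_num = read_num * kmers_per_read(100, 21)
estimated = estimate_genome_size(total_kmer_num, peak)

print(peak, estimated)
```

So the shift from 50 to 40 is what the formula predicts for 100-nt reads and 21-mers, and dividing the total k-mer count by the k-mer peak depth (not the base coverage) recovers the genome size in this idealized case.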