  • Brian Bushnell
    replied
    Hmmm, I wrote a program that does this. Well, two, actually. Their usage is about the same.

    khist.sh in=reads.fq khist=khist.txt peaks=peaks.txt
    or
    kmercountexact.sh in=reads.fq khist=khist.txt peaks=peaks.txt

    The first uses approximate counts, while the second uses exact counts (and thus potentially more memory). The peaks file header contains estimates of genome size and heterozygosity. You can also add the flag "ploidy=2" for diploid organisms, so that the program does not need to autodetect the ploidy (and thus potentially make a mistake).

    These are both distributed with BBTools.
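    For a batch of many samples, the two commands above can be generated per sample rather than typed by hand. The following is a minimal sketch, not part of BBTools: the helper names, the output file naming scheme, and the assumption that the sample name is the part of the file name before the first dot are all hypothetical. Only the `in=`, `khist=`, `peaks=`, and `ploidy=` arguments come from the post above, and the sketch assumes `khist.sh` and `kmercountexact.sh` are on the PATH.

```python
import subprocess
from pathlib import Path


def build_khist_command(reads, out_dir, ploidy=2, exact=False):
    """Build the argument list for one sample (hypothetical helper).

    Only in=, khist=, peaks= and ploidy= are used, as described in the post.
    """
    reads = Path(reads)
    out_dir = Path(out_dir)
    # Assumption: the sample name is everything before the first dot.
    sample = reads.name.split(".")[0]
    tool = "kmercountexact.sh" if exact else "khist.sh"
    return [
        tool,
        f"in={reads}",
        f"khist={out_dir / (sample + '.khist.txt')}",
        f"peaks={out_dir / (sample + '.peaks.txt')}",
        f"ploidy={ploidy}",
    ]


def run_all(read_files, out_dir, ploidy=2, exact=False):
    """Run the chosen tool once per read file, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for reads in read_files:
        cmd = build_khist_command(reads, out_dir, ploidy=ploidy, exact=exact)
        subprocess.run(cmd, check=True)
```

    Calling `run_all(sorted(Path("data").glob("*.fq")), "khist_out")` would then write one histogram and one peaks file per sample; the directory names here are placeholders.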


  • WormSeeq
    replied
    I just came across this paper on arXiv, "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects":

    Background: With the fast development of next generation sequencing technologies, increasing numbers of genomes are being de novo sequenced and assembled. However, most are in fragmental and incomplete draft status, and thus it is often difficult to know the accurate genome size and repeat content. Furthermore, many genomes are highly repetitive or heterozygous, posing problems to current assemblers utilizing short reads. Therefore, it is necessary to develop efficient assembly-independent methods for accurate estimation of these genomic characteristics.

    Results: Here we present a framework for modeling the distribution of k-mer frequency from sequencing data and estimating the genomic characteristics such as genome size, repeat structure and heterozygous rate. By introducing novel techniques of k-mer individuals, float precision estimation, and proper treatment of sequencing error and coverage bias, the estimation accuracy of our method is significantly improved over existing methods. We also studied how the various genomic and sequencing characteristics affect the estimation accuracy using simulated sequencing data, and discussed the limitations on applying our method to real sequencing data.

    Conclusion: Based on this research, we show that the k-mer frequency analysis can be used as a general and assembly-independent method for estimating genomic characteristics, which can improve our understanding of a species genome, help design the sequencing strategy of genome projects, and guide the development of assembly algorithms. The programs developed in this research are written using C/C++, and freely accessible at Github URL (https://github.com/fanagislab/GCE) or BGI ftp (ftp://ftp.genomics.org.cn/pub/gce).

    I have not tried their tool though! It is available at ftp://ftp.genomics.org.cn/pub/gce/
    Best,
    ~wormSeeq.


  • Melissa
    replied
    Perhaps you want to look into Ka/Ks estimation.


  • ebioman
    replied
    push

    Actually, I have no answer either, I'm afraid.
    I am asking myself the same question and wondered whether you were able to solve it.


  • Estimating heterozygosity from kmer frequency distribution

    Is there a program that can estimate the heterozygosity of a sample using the kmer frequency distribution of the raw reads? I have whole-genome Illumina data (100bp PE reads, from 300bp fragments). The kmer frequency plot has a clear bimodal distribution, so I can get a rough estimate by eyeballing the areas under the curves for the two peaks. I am hoping to find a more robust and more automated method, since I have over 100 samples.
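    The "areas under the two peaks" estimate described in the question can be written down as a small calculation. This is a rough sketch under strong assumptions, not the method used by any of the tools mentioned in this thread: it assumes a diploid sample, a mostly unique genome, heterozygosity caused by isolated SNPs, a known homozygous peak depth, and window boundaries (0.25x, 0.75x, and 1.5x of that depth) chosen here purely for illustration. The function names and the dictionary input format are hypothetical.

```python
def het_kmer_fraction(hist, hom_peak):
    """Fraction of genomic k-mer positions that overlap a heterozygous site.

    hist maps k-mer depth -> number of distinct k-mers at that depth.
    hom_peak is the depth of the homozygous (full-coverage) peak.
    """
    lo, mid, hi = 0.25 * hom_peak, 0.75 * hom_peak, 1.5 * hom_peak
    # Distinct k-mers around half coverage (heterozygous peak).
    het = sum(c for d, c in hist.items() if lo <= d < mid)
    # Distinct k-mers around full coverage (homozygous peak).
    hom = sum(c for d, c in hist.items() if mid <= d < hi)
    # Each heterozygous position contributes two distinct k-mers
    # (one per haplotype), so halve the heterozygous count.
    het_positions = het / 2
    return het_positions / (het_positions + hom)


def per_base_heterozygosity(hist, hom_peak, k):
    """Convert the k-mer fraction f to a per-base rate h.

    Assumes independent SNPs, so f = 1 - (1 - h)**k.
    """
    f = het_kmer_fraction(hist, hom_peak)
    return 1 - (1 - f) ** (1 / k)


# Toy histogram: error k-mers at depth 2, peaks at 20 and 40, repeats at 80.
toy_hist = {2: 5000, 20: 200, 40: 900, 80: 50}
rate = per_base_heterozygosity(toy_hist, hom_peak=40, k=21)
print(f"rough per-base heterozygosity: {rate:.4f}")
```

    Low-depth error k-mers and high-depth repeat k-mers fall outside both windows and are ignored, which is also why the estimate degrades when the peaks overlap heavily or the genome is repetitive.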
