using khist to generate kmer coverage histogram

  • using khist to generate kmer coverage histogram

    I used khist from BBMap to generate a kmer coverage histogram; the output and the log file are attached. A few things are unclear to me. What is the difference between "Raw_Count" and "Unique_Kmers" in the output file?
    Is a "unique kmer" defined as a kmer that appears only once in the dataset? What is the difference between unique kmers and kmers whose depth is one?
    When I plot the coverage for this dataset, which column should I use: raw count or unique kmers?

    The data is from a metagenome.

    Thanks

  • #2
    The program (conceptually) allocates one counter per kmer, and increments it every time that kmer is seen. So, let's say K=2 and the input string is "CATTATTT".

    That breaks down into these kmers:
    CA, AT, TT, TA, AT, TT, TT

    After reverse-complementing to store only a single canonical copy of each kmer, either forward or reverse, we get this (TT becomes AA; AT and TA are each their own reverse complement, so they stay distinct):

    CA, AT, AA, TA, AT, AA, AA

    So the kmer counts stored by the program would be:

    AA: 3
    AT: 2
    CA: 1
    TA: 1

    This would equate to 4 unique kmers. The histogram would look like this:
    Code:
    #Depth	Raw_Count	Unique_Kmers
    1	2	2
    2	2	1
    3	3	1
    Generally, (column 3) = (column 2)/(column 1).

    So line 1 means there were 2 unique kmers (CA and TA) that each occurred exactly once, for a total of 2 occurrences. Line 2 means there was a single unique kmer (AT) that occurred twice, and line 3 means there was a single unique kmer (AA) that occurred 3 times.

    Therefore, if you want to plot the coverage with respect to the genome, I suggest plotting the "unique" column. And to clarify, the number of "unique kmers" is not the same as the number of kmers that only occur once (I would call those "singleton kmers"); the second number of row 1 gives you the number of singleton kmers counted (2, in this case).
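
    For anyone who wants to experiment with the counting scheme on a small string, here is a minimal Python sketch. It is not khist's actual implementation: the function names, the sample input "AAAACC", and the choice of the lexicographically smaller string as the canonical form are all assumptions made for illustration. It keeps one counter per canonical kmer and then groups the counters by depth, so that (column 3) = (column 2)/(column 1) holds by construction.

```python
from collections import Counter

# Complement table used when reverse-complementing DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement.

    Assumption: picking the smaller string is one common convention;
    khist may select its canonical form differently.
    """
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical_kmers(seq, k):
    """Keep one counter per canonical kmer and increment it on every occurrence."""
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))


def depth_histogram(counts):
    """Return (depth, raw_count, unique_kmers) rows sorted by depth."""
    unique_by_depth = Counter(counts.values())
    # raw_count is depth * unique_kmers: every unique kmer at this depth
    # contributes `depth` occurrences.
    return [(depth, depth * n, n) for depth, n in sorted(unique_by_depth.items())]


if __name__ == "__main__":
    counts = count_canonical_kmers("AAAACC", 2)  # hypothetical input, K=2
    print("#Depth\tRaw_Count\tUnique_Kmers")
    for depth, raw, unique in depth_histogram(counts):
        print(f"{depth}\t{raw}\t{unique}")
```

    To plot coverage, you would put the first element of each row on the x axis and the third (unique kmers) on the y axis. A real dataset needs a far more memory-efficient structure than a Python Counter, so treat this only as a way to check your understanding of the columns.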
    Last edited by Brian Bushnell; 08-29-2014, 12:52 PM.
