  • cyyuan
    Junior Member
    • Jan 2012
    • 2

    Kmer Distribution Problem

    Hi there,

    I have sequenced four fungal strains on an Illumina HiSeq. Three of them assembled well, but one did not. I made a k-mer distribution of the reads with Jellyfish and found that there is no peak on the curve. The total amount of data is more than 50X. What problem could cause this result? Can anyone give a suggestion? Thanks!
    [Image: A5_GCCAAT_L004_R1_001.fastq.hist.png, the Jellyfish k-mer frequency histogram]
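    For context on what a peak normally looks like: `jellyfish histo` prints a two-column table of k-mer multiplicity and the number of distinct k-mers with that multiplicity. Below is a minimal sketch, assuming that format, that skips the steep error tail and reports the main coverage peak (or None when the curve never rises again, as described above). It also includes the standard conversion from base coverage C to expected k-mer coverage C_k = C * (L - k + 1) / L, where L is the read length. The read length, k, and toy histogram in the usage lines are hypothetical; the post gives only the ~50X base coverage. The function names are illustrative and not part of Jellyfish.

```python
"""Locate the coverage peak in a k-mer histogram (sketch).

Assumes the two-column text written by `jellyfish histo`:
"<multiplicity> <number of distinct k-mers>" on each line.
"""
from typing import Dict, Optional, Tuple


def parse_histogram(text: str) -> Dict[int, int]:
    """Parse 'multiplicity count' lines into {multiplicity: count}."""
    hist: Dict[int, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            hist[int(parts[0])] = int(parts[1])
    return hist


def find_coverage_peak(hist: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Return (multiplicity, count) of the highest bin after the error trough.

    Error k-mers make counts fall steeply from multiplicity 1; the first bin
    where counts rise again marks the end of that trough. Returns None when
    counts never rise, i.e. the curve has no peak at all.
    """
    xs = sorted(hist)
    for i in range(1, len(xs)):
        if hist[xs[i]] > hist[xs[i - 1]]:
            beyond_trough = xs[i - 1:]
            best = max(beyond_trough, key=lambda x: hist[x])
            return best, hist[best]
    return None


def expected_kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Convert base coverage C to k-mer coverage: C_k = C * (L - k + 1) / L."""
    return base_cov * (read_len - k + 1) / read_len


if __name__ == "__main__":
    # Toy histogram (made up for illustration, not the poster's data).
    toy = "1 9000\n2 3000\n3 800\n4 500\n5 700\n6 900\n7 600\n8 300\n"
    print(find_coverage_peak(parse_histogram(toy)))  # (6, 900)
    # Hypothetical: 100 bp reads, k = 31, 50X base coverage.
    print(expected_kmer_coverage(50, 100, 31))  # 35.0
```

    The trough detection is deliberately simple; a noisy error tail with small upward blips would fool it, so treat the result as a rough check rather than a model fit.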
  • Wallysb01
    Senior Member
    • Feb 2011
    • 286

    #2
    It looks like you have a slight peak at about 15. But have you done any quality trimming? If so, what kind? You might want to experiment with different quality cutoffs, simply trimming off the last X bp, or some combination of both. A high number of unique or low-occurrence k-mers is usually just a product of sequencing errors.
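    A minimal sketch of the two trimming options mentioned above: chop a fixed number of 3' bases, then trim trailing bases whose quality falls below a cutoff. It assumes Phred+33 quality encoding (standard for Illumina CASAVA 1.8 and later; verify this for your files). The function names and the default cutoff of 20 are illustrative choices, and dedicated trimming tools handle more cases (sliding windows, adapters).

```python
from typing import List, Tuple


def trim_read(seq: str, qual: str, min_q: int = 20,
              chop_3prime: int = 0, offset: int = 33) -> Tuple[str, str]:
    """Chop a fixed number of 3' bases, then drop trailing bases below min_q."""
    if chop_3prime > 0:
        seq, qual = seq[:-chop_3prime], qual[:-chop_3prime]
    end = len(qual)
    # Walk back from the 3' end while the Phred score is below the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq(text: str, **kwargs) -> str:
    """Apply trim_read to every 4-line record of an uncompressed FASTQ string."""
    lines = text.strip("\n").split("\n")
    out: List[str] = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        seq, qual = trim_read(seq, qual, **kwargs)
        out.extend([header, seq, plus, qual])
    return "\n".join(out) + "\n"
```

    For example, `trim_read("ACGTACGTAC", "IIIIIIII##")` returns `("ACGTACGT", "IIIIIIII")`, because under Phred+33 `#` encodes Q2 and `I` encodes Q40. Reads trimmed very short contribute few or no k-mers, so the cutoff trades error removal against usable coverage.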

    The other possibility is heterozygosity/ploidy. If you're sequencing a very diverse set of individuals, your k-mers will, in general, occur fewer times each. Depending on what you're sequencing, you might not be able to get around this, but people usually sequence one individual, or a clonal set of individuals, to create their reference genome.
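    To see why heterozygosity lowers k-mer multiplicities, here is a toy sketch (not a model of any real fungal genome): it builds a random haplotype, copies it with a single SNP, and counts k-mers across both copies on the forward strand only (Jellyfish's `-C` option counts canonical k-mers instead). The k-mers spanning the SNP occur half as often as the shared ones, so with real reads they form a secondary peak at about half the main coverage; when heterozygous sites lie closer together than k, most k-mers are affected and the main peak shrinks. All names and parameters here are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every length-k substring (forward strand only)."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def het_spectrum(length: int = 200, k: int = 21, snp_pos: int = 100,
                 seed: int = 1) -> Dict[int, int]:
    """k-mer spectrum {multiplicity: distinct k-mers} of a toy diploid with one SNP."""
    rng = random.Random(seed)
    hap1 = "".join(rng.choice("ACGT") for _ in range(length))
    # Second haplotype differs from the first at a single position.
    alt = rng.choice([b for b in "ACGT" if b != hap1[snp_pos]])
    hap2 = hap1[:snp_pos] + alt + hap1[snp_pos + 1:]
    counts = kmer_counts([hap1, hap2], k)
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # 159 shared k-mers seen twice; 21 SNP-spanning k-mers per haplotype seen once.
    print(het_spectrum())  # {1: 42, 2: 159}
```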


    • cyyuan
      Junior Member
      • Jan 2012
      • 2

      #3
      Thanks for your reply!

      Originally posted by Wallysb01
      It looks like you have a little bit of a peak at about 15. But have you done any quality trimming? If so, what kind? You might want to play around with different quality cut offs, or simply taking off the last X-bps, or some combination of both. Usually the high numbers of unique or low occurrence kmers is simply a product of sequencing errors.
      This is the raw data; I haven't done any quality trimming on it. The read quality is similar to that of the other three strains.

      The other option is heterozygosity/ploidy. So, if you're sequencing a very diverse set of individuals, you'll have lower occurrence kmers, in general. Of course, depending on what you're sequencing, you might not be able to get around this. But usually people try to sequence one individual, or a clonal set of individuals in order to create their reference genome.
      We always extract DNA from a single colony, but I am not sure whether heterozygosity is involved; I will check that later. Could the problem be caused by a poorly constructed sequencing library?


      • Wallysb01
        Senior Member
        • Feb 2011
        • 286

        #4
        It's hard to know without more information, though it's interesting that the other libraries aren't showing the same thing, given that you're confident their quality is similar.

        I can only guess that something less than ideal happened during the Illumina library prep or during the run itself (which is not at all uncommon), and that you are getting some strange bias that won't show up in the quality scores. So you might look at the nucleotide distribution along the length of the reads. If the composition bounces around at certain positions, you should trim off those bases.
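        One way to check the nucleotide distribution along the reads: a minimal sketch that computes the fraction of each base at every read position and flags positions that deviate from the per-base median across positions. The median baseline and the 0.1 tolerance are arbitrary, hypothetical choices; FastQC's "Per base sequence content" plot shows the same information graphically.

```python
from collections import Counter
from statistics import median
from typing import Dict, List

BASES = "ACGTN"


def base_composition(seqs: List[str]) -> List[Dict[str, float]]:
    """Fraction of each base at every read position (sequencing cycle)."""
    max_len = max((len(s) for s in seqs), default=0)
    per_pos = [Counter() for _ in range(max_len)]
    for s in seqs:
        for i, base in enumerate(s.upper()):
            per_pos[i][base] += 1
    comp = []
    for counts in per_pos:
        total = sum(counts.values())
        comp.append({b: counts[b] / total for b in BASES})
    return comp


def unstable_positions(comp: List[Dict[str, float]],
                       tolerance: float = 0.1) -> List[int]:
    """Positions whose base fractions differ from the per-base median by > tolerance."""
    if not comp:
        return []
    baseline = {b: median(p[b] for p in comp) for b in BASES}
    return [i for i, p in enumerate(comp)
            if any(abs(p[b] - baseline[b]) > tolerance for b in BASES)]


if __name__ == "__main__":
    # Toy reads: position 0 is always G, later positions are balanced.
    reads = ["GACGT", "GCGTA", "GGTAC", "GTACG"]
    print(unstable_positions(base_composition(reads)))  # [0]
```

        The median is used instead of the mean so that a few strongly biased positions do not drag the baseline and cause neighbouring, well-behaved positions to be flagged.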

        I might be able to help more if you can give me information about each Illumina run (i.e., whether you barcoded, and what went into each lane) and some basic quality stats. I know absolutely nothing about fungus-specific issues, however, so if the problem is related to that, you'll have to hope someone else stops by.
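        For the run information and basic quality stats requested above, a minimal sketch that reads the sample, index, lane, and read number out of a CASAVA 1.8-style file name such as A5_GCCAAT_L004_R1_001.fastq, and summarizes FASTQ quality strings (Phred+33 assumed). The naming pattern is an assumption about how these files were produced, and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed CASAVA 1.8-style name: <sample>_<index>_L<lane>_R<read>_<chunk>.fastq[.gz]
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+|NoIndex)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq(\.gz)?$"
)


def parse_run_name(filename: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a CASAVA 1.8-style FASTQ file name into its fields, or return None."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {"sample": m["sample"], "index": m["index"],
            "lane": int(m["lane"]), "read": int(m["read"])}


def quality_summary(quals: List[str], offset: int = 33) -> Dict[str, object]:
    """Mean Phred, fraction of bases >= Q30, and mean Phred per read position."""
    scores = [[ord(c) - offset for c in q] for q in quals]
    flat = [s for read in scores for s in read]
    max_len = max((len(r) for r in scores), default=0)
    per_position = []
    for i in range(max_len):
        column = [r[i] for r in scores if len(r) > i]
        per_position.append(sum(column) / len(column))
    return {
        "reads": len(quals),
        "mean_q": sum(flat) / len(flat) if flat else 0.0,
        "frac_q30": sum(s >= 30 for s in flat) / len(flat) if flat else 0.0,
        "per_position_mean_q": per_position,
    }
```

        Comparing these numbers lane by lane against the three strains that assembled well would show whether the problem library stands out.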
