
**pallevillesen** · 12-12-2012, 01:34 AM

Your 250 bp library looks a little weird (nearly bimodal - or a very "broad" peak).

Other than that you're right, though you should expect up to 4 peaks:

1. Depth 1-2: sequencing errors
2. Heterozygous positions (small peak): a small bump at low depth (roughly half the depth of the main peak) from kmers covering heterozygous positions
3. The large peak: the typical coverage (the one you clearly see in your 500 bp library), used for the genome size estimate
4. The repeat peak: a small bump at high depth from kmers covering repeat regions
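If it helps, here is a minimal Python sketch of the genome size estimate from peak 3. It assumes the histogram is the two-column "depth count" text that `jellyfish histo` writes; the error cutoff is taken as the first local minimum (the valley after peak 1), and the toy histogram in the example is made up, not from your data.

```python
# Sketch: rough genome size from a kmer histogram ("depth count" per line).
# Assumes input shaped like the text output of `jellyfish histo`.
# The toy histogram in the __main__ example is invented for illustration.


def read_histo(lines):
    """Parse 'depth count' lines into {depth: number_of_distinct_kmers}."""
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            histo[int(parts[0])] = int(parts[1])
    return histo


def error_cutoff(histo):
    """First local minimum in depth; kmers below it are treated as errors."""
    depths = sorted(histo)
    for prev, cur, nxt in zip(depths, depths[1:], depths[2:]):
        if histo[cur] <= histo[prev] and histo[cur] < histo[nxt]:
            return cur
    return depths[0]  # no valley found: keep everything


def main_peak(histo, cutoff):
    """Depth with the most distinct kmers at or above the error cutoff."""
    kept = {d: c for d, c in histo.items() if d >= cutoff}
    return max(kept, key=kept.get)


def genome_size(histo):
    """Total kmer instances above the cutoff divided by the main peak depth."""
    cutoff = error_cutoff(histo)
    peak = main_peak(histo, cutoff)
    total = sum(d * c for d, c in histo.items() if d >= cutoff)
    return total / peak, cutoff, peak


if __name__ == "__main__":
    toy = read_histo(["1 1000", "2 300", "3 50", "4 60",
                      "5 100", "6 200", "7 100", "8 40"])
    size, cutoff, peak = genome_size(toy)
    print(f"cutoff={cutoff} peak={peak} genome size ~ {size:.0f} bp")
```

Caveat: if the heterozygous bump (peak 2) is taller than the main peak, `main_peak` picks it and the estimate roughly doubles, so it's worth eyeballing the plot before trusting the number.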

This assumes that a random 17mer is typically unique in the genome. But no matter what: two libraries may scale differently (i.e. different coverage due to library size differences), but the shape of the kmer spectrum should NOT be different, and in your case it is.

What about a quality check of the two libraries (FastQC?)

**Gorgarian** · 12-12-2012, 02:24 PM

OK, thanks for the advice. I installed FastQC and ran the 250 and 500 fastq files through it, and everything looks good. I have attached examples of the FastQC output.

Maybe there is a double peak hidden in the 250 set (causing the "broad peak"), and it becomes better defined as I add more data from the 500 and 800 bp reads. Maybe there is no problem at all?

Maybe I could subtract the 250 kmer spectrum from the 500 kmer spectrum to see whether there is any systematic difference in representation. Might that clear up the issue? Any idea how to do such a subtraction on jellyfish output files?
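One way such a subtraction could be sketched, working on the text output of `jellyfish histo` rather than on the binary count files: divide each library's depth axis by its own main peak (which removes the coverage difference), turn counts into fractions, bin, and subtract bin by bin. `min_depth` and `bin_width` are arbitrary assumptions to tune, and the toy histograms are made up.

```python
# Sketch: subtract two kmer spectra after removing the coverage difference.
# Input: two-column "depth count" text as written by `jellyfish histo`.
# min_depth and bin_width are arbitrary assumptions, not established values.
from collections import defaultdict


def read_histo(lines):
    """Parse 'depth count' lines into {depth: number_of_distinct_kmers}."""
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            histo[int(parts[0])] = int(parts[1])
    return histo


def normalise(histo, min_depth=5, bin_width=0.1):
    """Rescale depth by the main peak and counts to fractions of kept kmers.

    Returns {bin_index: fraction}; bin_index * bin_width is the depth relative
    to the main peak (1.0 = main peak, 0.5 = heterozygous bump, 2.0 = 2-copy repeats).
    """
    kept = {d: c for d, c in histo.items() if d >= min_depth}  # drop error peak
    peak = max(kept, key=kept.get)
    total = sum(kept.values())
    binned = defaultdict(float)
    for d, c in kept.items():
        binned[round(d / peak / bin_width)] += c / total
    return dict(binned)


def subtract(a, b):
    """Per-bin a - b; positive bins are over-represented in spectrum a."""
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in sorted(set(a) | set(b))}


if __name__ == "__main__":
    BIN = 0.1
    lib250 = read_histo(["5 10", "10 100", "15 10", "20 40"])  # toy data
    lib500 = read_histo(["10 10", "20 100", "30 10"])          # toy data
    diff = subtract(normalise(lib250, bin_width=BIN), normalise(lib500, bin_width=BIN))
    for idx, value in diff.items():
        print(f"relative depth {idx * BIN:.1f}: {value:+.3f}")
```

If the two libraries really only differ in coverage, the differences should hover around zero; a consistent positive bump at some relative depth in one library is the kind of systematic difference you're asking about.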

Attached Files

qc.pdf (46.4 KB)

**pallevillesen** · 12-13-2012, 12:06 AM

I think there is a problem - but maybe that relates to the genome of your sample(?)

Is it a secret organism, or can you reveal anything? I thought about it a little more, and I have an uglier suggestion: contamination (if you're sampling two genomes with different coverage, you'll also get two peaks).

I would probably try to assemble it (if it's an unknown organism) and then remap all the 500 bp library reads to the assembly: the scaffolds that receive reads are from your target organism.

If you also map the 250 bp reads, the scaffolds that only get hits from the 250 bp library and NOT the 500 bp library are the likely "contaminant"; then you can BLAST them and check.
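A minimal sketch of that last filtering step, assuming each library has been mapped to the assembly separately and `samtools idxstats` was run on each sorted, indexed BAM (tab-separated: scaffold name, length, mapped reads, unmapped reads). The thresholds `min_reads` and `max_other` are made-up values, and the raw counts are not normalised for library size.

```python
# Sketch: scaffolds that collect reads from the 250 bp library but (almost)
# none from the 500 bp library, using `samtools idxstats` text output.
# min_reads / max_other are arbitrary thresholds, not established values.


def read_idxstats(lines):
    """Parse idxstats text (name, length, mapped, unmapped; tab-separated)."""
    stats = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "*":  # '*' = reads with no reference
            continue
        stats[fields[0]] = (int(fields[1]), int(fields[2]))  # (length, mapped)
    return stats


def candidate_contaminants(lib250, lib500, min_reads=10, max_other=0):
    """Return (name, length, reads_250, reads_500), most 250 bp reads first."""
    rows = []
    for name, (length, n250) in lib250.items():
        n500 = lib500.get(name, (length, 0))[1]
        if n250 >= min_reads and n500 <= max_other:
            rows.append((name, length, n250, n500))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # toy idxstats lines, not real data
    s250 = read_idxstats(["scaf1\t5000\t300\t0", "scaf2\t2000\t50\t0", "*\t0\t0\t12"])
    s500 = read_idxstats(["scaf1\t5000\t600\t0", "scaf2\t2000\t0\t0", "*\t0\t0\t5"])
    for row in candidate_contaminants(s250, s500):
        print(*row, sep="\t")  # these are the scaffolds worth BLASTing
```

A scaffold with zero 500 bp hits could also just be short or poorly covered, so the list is a starting point for the BLAST check rather than a verdict.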

It's a lot of work and maybe not worth it; it depends on your question/project.

On topic: I don't know how to subtract two jellyfish kmer spectra.

Kmer spectrum question

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News