Hi all,
This is my first post on the forum, and I am new to genomic analysis so please bear with me.
I am running a k-mer analysis (k = 17) with Jellyfish on Illumina reads from 250 bp, 500 bp and 800 bp insert libraries:
jellyfish count -m 17 -o a.out -C -c 7 -s 1000000000 -t 24 a.fas
The k-mer spectra look normal when I run the analysis separately for each library: a huge number of k-mers represented only once or twice (read errors?), a single mode, and a long tail out to the right (non-orthologous k-mers arising from repetitive elements?).
Here is the rub. Only the 250 bp analysis yields a genome size estimate (calculated as number of k-mers / peak / 2) that agrees with the independently estimated 1.8 Gb, and when I combine the spectra
jellyfish merge -o 250+500.out 250.out 500.out
I get two peaks. I would have thought that Illumina runs on the same samples using different short-insert libraries would be sampling the same overall sequence, so the unimodal spectra should combine to yield a unimodal spectrum.
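For anyone who wants to check the arithmetic, here is a minimal Python sketch of the size estimate described above. It assumes the spectrum is available as two-column "depth count" text, which is what `jellyfish histo` prints. The function names, the `min_depth` cutoff for skipping the error bins, and the toy numbers are made up for illustration; the division by 2 is exposed as the `divisor` parameter and follows the formula as stated in this post.

```python
def parse_histogram(text):
    """Parse two-column 'depth count' lines into (depth, count) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        depth, count = line.split()
        pairs.append((int(depth), int(count)))
    return pairs


def estimate_genome_size(histogram, min_depth=3, divisor=1):
    """Return (estimated size, peak depth) from a k-mer spectrum.

    min_depth is a hypothetical cutoff that drops the low-depth bins
    dominated by read errors; divisor=2 reproduces the
    'number of k-mers / peak / 2' formula used in the post.
    """
    kept = [(d, n) for d, n in histogram if d >= min_depth]
    # Peak: the depth at which the most distinct k-mers are observed.
    peak_depth = max(kept, key=lambda pair: pair[1])[0]
    # Total k-mer occurrences: each distinct k-mer counted depth times.
    total_kmers = sum(d * n for d, n in kept)
    return total_kmers / peak_depth / divisor, peak_depth


# Toy spectrum (made-up numbers, not from the attached PDF).
toy = parse_histogram("""
1 1000
2 200
3 10
9 50
10 100
11 50
""")
size, peak = estimate_genome_size(toy, min_depth=3, divisor=2)
print(peak, size)
```

With real data the cutoff would have to be chosen by looking at where the error bins end and the main peak begins, so treat the default as a placeholder.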
Any Illumina buffs or bioinformaticists out there who can shed some light on what might be happening here?
I have attached a file with the spectra.
kmer_spectra.pdf