Hi,
I want to use digital normalization on a set of single-cell sequencing data as well as metagenomic data from low-complexity communities. I'm probably missing some really obvious point, but I'm just not sure how to apply the recommended diginorm cutoffs to my relatively long MiSeq reads.
Both our single-cell and our low-complexity metagenomic sequencing data were produced on a MiSeq, yielding several million paired-end reads of ~250-300 bp each.
The general recommendation in the khmer documentation is to normalize to a coverage of 1x to 5x using three-pass normalization and a k-mer size of 20.
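For reference, here's roughly what I'm planning to run, pieced together from the khmer protocols as I understand them (khmer 2.x-style flags, which may differ between versions; the file names and memory limit are just placeholders):

    # pass 1: normalize to C=20 and keep the countgraph for the next step
    normalize-by-median.py -p -k 20 -C 20 -M 8e9 --savegraph graph.ct reads.pe.fq.gz
    # pass 2: trim low-abundance k-mers from the now high-coverage reads
    filter-abund.py -V graph.ct reads.pe.fq.gz.keep
    # pass 3: normalize down to C=5
    normalize-by-median.py -p -k 20 -C 5 reads.pe.fq.gz.keep.abundfilt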
My question is: are those recommendations really suited to modern "long read" Illumina data? If I reduce the coverage of all k-mers of length 20 to 5x or less, won't that reduce the coverage of larger k-mers far too drastically?
Without diginorm, the optimal k-mer size with e.g. MetaVelvet is usually around k=81-101 for my datasets. How can there be enough k-mer coverage left at those sizes for de Bruijn graph based assemblies if the k-mers of length 20 are already reduced to less than 5x coverage?
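Here's a quick back-of-the-envelope version of my worry, assuming the usual relation C_k ~ C_read * (L - k + 1) / L between per-base coverage and k-mer coverage (L = 300 bp from our reads, the rest is just arithmetic):

    # expected 101-mer coverage if the median 20-mer coverage is normalized to 5x:
    # 5 * (L - 101 + 1) / (L - 20 + 1)
    echo "5 * (300 - 101 + 1) / (300 - 20 + 1)" | bc -l    # ~3.56

So by this estimate I'd be left with roughly 3.5x at k=101, before even accounting for sequencing errors.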
My version of khmer doesn't seem to support k-mers larger than 31, so apparently larger k-mer sizes are simply not needed for diginorm. I just don't understand why...