Hello,
We are going to sequence about 180 different strains of bacteria. These strains are all from the same genus but cover about 7 different species.
The idea is to study phylogeny within this genus, compare whole-genome data with other typing/identification systems, and find genomic markers of virulence and markers for species identification.
We may want to map these genomes to a reference. However, there are no good reference genomes for these particular species, some strains can barely be assigned to a species using current methods, and we want to compare several different species with each other, so we may also want to do de novo assembly of the genomes.
Mean genome size is about 2.5 Mb. We asked for a mean coverage of 50x. The sequencing center suggested a couple of options:
a. Paired-end libraries, sequenced on MiSeq with 2x250bp or 2x300bp reads, two sequencing runs for each group of 96 strains (or four runs of 48 strains each, I guess).
b. Paired-end libraries, sequenced on HiSeq with 2x150bp reads (they say longer reads are possible but could have some quality issues); a single run for each group of 96 strains would suffice for the requested coverage.
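For reference, here is a rough sketch of the yield arithmetic behind the 50x request, in Python. The function names are just illustrative, and it assumes reads are spread evenly across strains with nothing lost to trimming, duplicates or contamination, so real requirements would be somewhat higher.

```python
# Rough yield arithmetic for the figures in this post (2.5 Mb genome, 50x coverage).
# Assumption: perfectly even read distribution and no reads lost to QC.

def required_bases(genome_size_bp, coverage):
    """Total bases needed to reach the target mean coverage for one genome."""
    return genome_size_bp * coverage


def required_read_pairs(genome_size_bp, coverage, read_length_bp):
    """Read pairs needed per genome, rounded up (each pair yields 2 x read length)."""
    bases = required_bases(genome_size_bp, coverage)
    bases_per_pair = 2 * read_length_bp
    return -(-bases // bases_per_pair)  # integer ceiling division


GENOME_SIZE_BP = 2_500_000
COVERAGE = 50
STRAINS_PER_GROUP = 96

per_strain = required_bases(GENOME_SIZE_BP, COVERAGE)
per_group = per_strain * STRAINS_PER_GROUP

print(f"Bases per strain: {per_strain:,}")
print(f"Bases per group of {STRAINS_PER_GROUP}: {per_group:,}")
for read_length in (150, 250, 300):
    pairs = required_read_pairs(GENOME_SIZE_BP, COVERAGE, read_length)
    print(f"2x{read_length}bp: {pairs:,} read pairs per strain")
```

So the target works out to about 125 Mb per strain, or 12 Gb per group of 96 strains, whichever read length is used; only the number of read pairs changes.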
I guess that for mapping purposes it won't make a difference. But bearing in mind that we may want to do de novo assemblies, I have some questions:
1) What would be the best option? Longer reads with lower coverage or shorter reads with higher coverage? Does going from 2x150bp to 2x300bp make a real difference during assembly?
2) How "essential" would it be to sequence the same genomes with different library insert sizes? Again, this would have to stay within what Illumina offers (no PacBio, 454 etc.).
Thank you in advance for suggestions :-)