Hi,
I am asking for thoughts and suggestions on a re-sequencing project comparing pooled samples of mosquitoes that are highly susceptible to malaria with pooled samples of resistant mosquitoes.
Here is my general experimental design:
Infect 500 genetically diverse mosquitoes with malaria. (These mosquitoes come from a colony that was founded from 100 wild-caught mated females and has been kept in the lab at a reasonable population size for 5 years.)
Take the phenotypic extremes from these 500 individuals:
40 with very high infections,
20 that have very low infections, and
20 that didn't get infected at all.
(Exclude the 420 mosquitoes that fall in the middle).
Carry out DNA extractions on all 80 individual mosquitoes.
Quantify the DNA with PicoGreen so that equimolar amounts can be pooled.
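The pooling step is simple arithmetic: for each individual, the volume to add is the target DNA mass divided by that sample's PicoGreen concentration. This is a minimal sketch. It assumes that equal mass is a good enough proxy for equimolar contribution, since every individual has the same ~220 Mb genome. The sample IDs, concentrations, and 50 ng target are hypothetical.

```python
from typing import Dict


def pooling_volumes(
    concentrations_ng_per_ul: Dict[str, float],
    target_ng_per_individual: float,
) -> Dict[str, float]:
    """Volume (uL) of each extraction that delivers the same DNA mass to the pool.

    Equal mass per individual stands in for equimolar pooling, which is
    reasonable when all samples share the same genome size.
    """
    volumes: Dict[str, float] = {}
    for sample_id, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample_id}: concentration must be positive, got {conc}")
        volumes[sample_id] = target_ng_per_individual / conc
    return volumes


# Hypothetical PicoGreen readings (ng/uL) and a hypothetical 50 ng target per mosquito.
readings = {"hi_01": 12.5, "hi_02": 8.0, "hi_03": 20.0}
for sample_id, vol in pooling_volumes(readings, target_ng_per_individual=50.0).items():
    print(f"{sample_id}: {vol:.2f} uL")
```

Very small computed volumes are hard to pipette accurately. Diluting the most concentrated extractions first may therefore make equal contributions easier to achieve.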
That is where I am now. My goal is to identify alleles (SNPs) whose frequencies differ strongly between phenotypes (e.g., highly susceptible vs. resistant).
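One common way to flag SNPs with differing allele frequencies between two pools is a Fisher's exact test on the reference and alternative read counts at each site. The sketch below implements a two-sided test using only the standard library. The read counts are hypothetical. With replicate pools, a stratified test such as Cochran-Mantel-Haenszel would be the natural extension. Treating reads as independent draws ignores the extra sampling of individuals into each pool, so p-values from this test are likely anti-conservative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are pools (e.g. HiPool, ResistantPool); columns are read counts
    supporting the reference and alternative allele at one SNP.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x reference reads in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one (tolerance for float ties).
    p_value = sum(
        table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical read counts at one SNP: (ref, alt) in each pool.
hi_ref, hi_alt = 18, 6
res_ref, res_alt = 5, 19
print(fisher_exact_two_sided(hi_ref, hi_alt, res_ref, res_alt))
```

Because this test would run at every SNP genome-wide, the resulting p-values also need a multiple-testing correction.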
Here is what I am struggling with.
I have limited resources, so I can only afford one lane (possibly two, if it makes a huge difference) of HiSeq PE100.
My gut instinct is that the best approach would be to split my DNA preps into three separate library pools per phenotype, each made from DIFFERENT individuals. That gives nine pools in total (three (sort of) biological replicates per phenotype):
HiPoolA=13 individuals
HiPoolB=13 individuals
HiPoolC=14 individuals
LowPoolA=6 individuals
LowPoolB=7 individuals
LowPoolC=7 individuals
ResistantPoolA=6 individuals
ResistantPoolB=7 individuals
ResistantPoolC=7 individuals
Each of the 9 pools could then be sequenced to ~8x on a single lane (the mosquito genome is ~220 Mb).
Alternatively, I could forget this "biological replicate" idea, pool all individuals by phenotype, and sequence each of the 3 pools to ~25x.
HiPool=40 individuals
LowPool=20 individuals
ResistantPool=20 individuals
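The two designs spend the same sequencing budget, divided differently. The check below assumes a usable lane yield back-calculated from the "9 pools at ~8x" figure above. That yield is an assumption, not a HiSeq specification, and it gives 24x rather than ~25x for the three-pool design. The script also reports mean depth per chromosome copy, since each diploid individual contributes two chromosomes to a pool.

```python
GENOME_SIZE_BP = 220_000_000  # ~220 Mb mosquito genome, as stated in the post

# Assumption: usable lane yield back-calculated from "9 pools at ~8x"; not a vendor figure.
ASSUMED_LANE_YIELD_BP = 9 * 8 * GENOME_SIZE_BP  # ~15.8 Gb


def per_pool_depth(lane_yield_bp: float, n_pools: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean sequencing depth per pool when the lane is split evenly across pools."""
    return lane_yield_bp / (n_pools * genome_size_bp)


def depth_per_chromosome(pool_depth: float, n_individuals: int) -> float:
    """Mean depth per chromosome copy in a pool of diploid individuals."""
    return pool_depth / (2 * n_individuals)


designs = {
    "replicated (HiPoolA, 13 individuals)": (9, 13),
    "combined (HiPool, 40 individuals)": (3, 40),
}
for label, (n_pools, n_individuals) in designs.items():
    depth = per_pool_depth(ASSUMED_LANE_YIELD_BP, n_pools)
    per_chrom = depth_per_chromosome(depth, n_individuals)
    print(f"{label}: {depth:.1f}x per pool, {per_chrom:.2f}x per chromosome")
```

Under these assumptions, both designs give roughly 0.3x per chromosome.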
Given my small sample sizes, the low coverage I can achieve per phenotype, and the resampling issue (sequencing the same individual repeatedly just by chance, which skews allele frequencies), which design is better? Or is there another option I am not considering? (Sequencing each individual to 1x is not feasible because of the high cost of indexing.)
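The resampling concern can be made concrete with the usual two-stage binomial model of pool-seq. First, chromosomes are sampled from the population into the pool. Then reads are drawn from the pooled chromosomes with replacement. This sketch rests on strong simplifying assumptions: perfectly equimolar pooling, no PCR duplicates or sequencing errors, and independent reads at a site. Under that model it computes two things: the expected number of distinct chromosomes seen at a site, and the variance of the read-based allele-frequency estimate around the population frequency p. The value p = 0.5 is hypothetical.

```python
def expected_distinct_chromosomes(n_individuals: int, depth: float) -> float:
    """Expected number of distinct chromosomes hit by `depth` reads at one site.

    Each read is assumed to come from one of the 2n pooled chromosomes uniformly
    at random, with replacement (i.e. perfectly equimolar pooling).
    """
    n_chrom = 2 * n_individuals
    return n_chrom * (1 - (1 - 1 / n_chrom) ** depth)


def allele_freq_variance(p: float, n_individuals: int, depth: int) -> float:
    """Variance of the read-based allele-frequency estimate around population frequency p.

    Two-stage binomial model: Var = p(1-p) * [1/(2n) + (1 - 1/(2n)) / depth].
    """
    n_chrom = 2 * n_individuals
    return p * (1 - p) * (1 / n_chrom + (1 - 1 / n_chrom) / depth)


p = 0.5  # hypothetical population allele frequency
# Replicated design: three pools at ~8x, estimates averaged (14-individual pool approximated as 13).
replicated = allele_freq_variance(p, 13, 8) / 3
# Combined design: one pool of 40 at 24x (keeps total reads equal to 9 pools x 8x).
combined = allele_freq_variance(p, 40, 24)
print(f"distinct chromosomes, 13 individuals at 8x: {expected_distinct_chromosomes(13, 8):.1f} of 26")
print(f"distinct chromosomes, 40 individuals at 24x: {expected_distinct_chromosomes(40, 24):.1f} of 80")
print(f"variance, replicated (mean of 3 pools): {replicated:.4f}")
print(f"variance, combined: {combined:.4f}")
```

For example, at ~8x a 13-individual pool is expected to show reads from only about 7 of its 26 chromosomes at a given site. Under this simplified model, the two designs give almost the same precision for a single phenotype, because both sample the same total number of chromosomes and reads. Replicate pools would, however, allow between-pool variation (for example, from unequal DNA contributions) to be estimated directly, which this model ignores.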
I am grateful for any thoughts/suggestions.
Thank you!
Mara