As the starting point of a new project, I want to characterize the sequences of perhaps a dozen genes in a species of fish. While I could make a BAC library and try to fish those genes out of it, I'm thinking I could instead obtain the genome sequence with an Illumina run and pull out the relevant gene fragments by alignment to orthologs in characterized fish genomes. Based on the information in the “Field guide to next-generation DNA sequencers” paper (http://www.ncbi.nlm.nih.gov/pubmed/21592312), I have made the following calculations:
While not known, the likely size of the fish genome is ~1x10^9 bp. A single lane (cell) on a GAIIx using 150+150 paired reads should produce 1.3x10^10 bases of sequence, giving me ~13x coverage. Similarly, on a HiSeq using 100+100 paired reads, ~1.2x10^10 bases of sequence should be produced per lane, for ~12x coverage. Alternatively, if HiSeq version 3 is available, a lane should yield 3.6x10^10 bases of sequence, for ~36x coverage.
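The coverage figures above are simply lane yield divided by genome size. Below is a minimal Python sketch of that arithmetic. It also adds an idealized Lander-Waterman (Poisson) estimate of how much of the genome would get no reads at all. The function names are hypothetical, and the 1x10^9 bp genome size is the estimate given above. The Poisson model assumes reads land uniformly at random. Real Illumina data do not behave that way, because GC bias and repeats make coverage uneven, so treat the "uncovered" number as an optimistic lower bound. It also says nothing about how contiguous an assembly would be.

```python
import math

# Assumed genome size in bp (an estimate; the true size is unknown).
GENOME_SIZE_BP = 1e9

# Per-lane yields in bases, as quoted above.
LANE_YIELDS = {
    "GAIIx 150+150": 1.3e10,
    "HiSeq 100+100": 1.2e10,
    "HiSeq v3": 3.6e10,
}


def fold_coverage(yield_bases: float, genome_bp: float) -> float:
    """Mean fold coverage = total sequenced bases / genome size."""
    return yield_bases / genome_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized Lander-Waterman/Poisson estimate of the fraction of the
    genome with zero reads, e^(-c). Assumes uniformly random read placement."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for platform, yield_bases in LANE_YIELDS.items():
        c = fold_coverage(yield_bases, GENOME_SIZE_BP)
        missed_bp = expected_uncovered_fraction(c) * GENOME_SIZE_BP
        print(f"{platform}: ~{c:.0f}x, ~{missed_bp:,.0f} bp expected uncovered (ideal case)")
```

Because the uncovered fraction falls off exponentially, the difference between 12x and 36x looks tiny under this model. In practice, uneven coverage is what decides whether a particular gene assembles.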
So, my questions for those familiar with this technology are:
1) Are these numbers realistic in terms of the output I can expect, or would I likely see lower sequence yields?
2) Will the indicated levels of coverage provide a high enough likelihood that I will be able to assemble each of the genes of interest (and hopefully immediately adjacent genes as well)?
3) The paper cited above indicates a cost of about $3,000 to $3,500 at an academic core facility. My institution does not have such a core facility, so I would have to use a commercial provider. Any idea what my proposed sequencing would likely cost? Any recommendations for a facility I could use?
4) I assume I would only provide the facility with some amount of genomic DNA, and the facility would shear and prep the DNA samples. I usually use the Qiagen DNeasy tissue kit for genomic DNA isolation. Is this acceptable for Illumina sequencing, or is there a recommended purification kit/process?
5) Finally, any recommendations for free software that would allow me to do the targeted alignments and assembly?
Thanks in advance for any words of wisdom.
Leos