Hi,
My name is Dan Bolser, and I recently started a Post Doc. in Dundee working on the potato genome sequencing project. Although this is a really exciting and far-reaching project, I should say that I am very new to the field of sequencing!
I have a degree in biochemistry, and subsequently I did a master's, PhD and Post Doc. in structural bioinformatics and interactomics. So, although I studied 'DNA' and molecular genetics during my degree, it's all a bit hazy these days ;-)
The first 'bulk' of sequencing data that we have here in the UK / Ireland consortium (Chromosome 4) has been generated from several 'interesting' BACs using "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers). There are a few things that I am keen to learn more about. (I'll ask similar questions in the other appropriate forums, but I may as well list the main issues that I am facing as a beginner.)
1) What are "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers)? ;-)
OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de-novo 'assemble-ability', etc.?
2) What kinds of questions should I be asking of the sequence data? So far I just have a bunch of chromatogram files (ABI format) broken down into groups by BAC. I think I need to know (or it would be useful to know) the following basic things about the data:
* sub-cloning (sequencing) vector sequence
* cloning vector sequence
* insert size
* BAC size
* ...
What else should I be asking (before starting the assembly)?
3) What kind of assembly pipelines are routinely used on this kind of data?
Currently I am playing with phred/phrap, but perhaps this is considered old hat? Not that I want (or need) to be pushing the bleeding edge, but I would like to be doing something relatively 'standard'. For this kind of sequencing data, is phred/phrap more or less a popular choice?
4) Once I have run (vanilla) phred/phrap, how should I be visualizing the results? I had a look at consed, but it gives me very detailed views of individual contigs. I would like to be able to compare different sets of contigs in 'overview'. While I think it should be relatively easy to parse the phred/phrap output and produce some visual assembly and quality reports, I don't want to start coding something that has already been done. What are common visualization methods for sets of similar 'contigs'? For example, if I vary the assembly stringency, how should I compare the different outputs of the assembler?
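To make question 4 a bit more concrete, here is a minimal sketch (in Python) of the kind of 'overview' I have in mind: it reads contigs in FASTA format and reports the contig count, total length, longest contig and N50 for one assembly. The function names and the choice of summary statistics are my own assumptions for illustration, not part of phred/phrap or consed, and it only looks at contig lengths, not at base qualities.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the name;
            # fall back to a running number if the header is empty.
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def summarize(lengths: Dict[str, int]) -> Dict[str, int]:
    """Collect a few overview statistics for one set of contigs."""
    values = list(lengths.values())
    return {
        "contigs": len(values),
        "total_bp": sum(values),
        "longest_bp": max(values, default=0),
        "n50_bp": n50(values),
    }


if __name__ == "__main__":
    # Tiny made-up example; real input would be a contigs FASTA file per run.
    demo = [">Contig1", "ACGTACGT", "ACGT", ">Contig2", "ACGT", ">Contig3", "AC"]
    print(summarize(read_fasta_lengths(demo)))
```

As far as I understand, phrap writes its contigs to a FASTA file with a `.contigs` suffix, so the idea would be to pass each run's file through `read_fasta_lengths` (an open file handle works as the `lines` argument) and put the summaries side by side, one row per stringency setting.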
5) What other questions should I be asking? I know it's not easy to assess, but what kinds of things do beginners tend to be ignorant of? What are the 'key texts' that I should read before asking anything else?
Well, there are my '5 potatoes of ignorance' - I'd be delighted for any kind of feedback on any of them!
Dan.