This topic is closed.

  • Hello from Dundee! (Scotland!)

    Hi,

    My name is Dan Bolser, and I recently started a postdoc in Dundee working on the potato genome sequencing project. Although this is a really exciting and far-reaching project, I should say that I am very new to the field of sequencing!

    I have a degree in biochemistry, and subsequently I did a master's, PhD and postdoc in structural bioinformatics and interactomics. So, although I studied 'DNA' and molecular genetics during my degree, it's all a bit hazy these days ;-)

    The first 'bulk' of sequencing data that we have here in the UK / Ireland consortium (Chromosome 4) has been generated from several 'interesting' BACs using "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers). There are a few things that I am keen to learn more about. (I'll ask similar questions in the other appropriate forums, but I may as well list the main issues that I am facing as a beginner.)


    1) What is "capillary based Big Dye chemistries" (ABI 3730 DNA Analyzers)? ;-)

    OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de novo 'assemble-ability', etc.?


    2) What kinds of questions should I be asking of the sequence data? So far I just have a bunch of chromatogram files (ABI format) broken down into groups by BAC. I think I need to know (or it would be useful to know) the following basic things about the data:

    * sub-cloning (sequencing) vector sequence
    * cloning vector sequence
    * insert size
    * BAC size
    * ...

    What else should I be asking (before starting the assembly)?


    3) What kind of assembly pipelines are routinely used on this kind of data?

    Currently I am playing with phred/phrap, but perhaps this is considered old hat? Not that I want (or need) to be pushing the bleeding edge, but I would like to be doing something relatively 'standard'. For this kind of sequencing data, is phred/phrap more or less a popular choice?


    4) Once I have run (vanilla) phred/phrap, how should I be visualizing the results? I had a look at consed, but it gives me very detailed views of the contigs. I would like to be able to compare different sets of contigs in 'overview'. While I think it should be relatively easy to parse the phred/phrap output and produce some visual assembly and quality reports, I don't want to start coding something that has already been done. What are common visualization methods for sets of similar 'contigs', for example when I am varying assembly stringency and want to compare the output of the assembler?
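
    To make the kind of 'overview' comparison in question 4 concrete, here is a minimal Python sketch. It assumes each assembly run leaves behind a FASTA file of contigs (phrap normally writes one with a .contigs suffix) and reduces each file to a few summary numbers: contig count, total bases, longest contig and N50. The function names and the two in-memory example runs are hypothetical and only for illustration; this is not part of phred/phrap or consed.

```python
import io
from typing import Dict, Iterable, List


def read_fasta_lengths(handle: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which half of the total bases are reached,
    walking from the longest contig downwards."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect a few overview statistics for one set of contigs."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical stand-ins for the contig files of two assembly runs.
    runs = {
        "relaxed": ">Contig1\nACGTACGTAC\n>Contig2\nACGTAC\n",
        "stringent": ">Contig1\nACGTAC\n>Contig2\nACGT\n>Contig3\nACGTAC\n",
    }
    for name, text in runs.items():
        print(name, summarize(read_fasta_lengths(io.StringIO(text))))
```

    In practice the in-memory strings would be replaced by open file handles, one per assembly run, so that runs at different stringency settings can be lined up side by side in a small table.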


    5) What other questions should I be asking? I know it's not easy to assess, but what kinds of things do beginners tend to be ignorant of? What are the 'key texts' that I should read before asking anything else?


    Well, there are my '5 potatoes of ignorance' - I'd be delighted for any kind of feedback on any of them!

    Dan.
    Homepage: Dan Bolser
    MetaBase the database of biological databases.

  • #2
    Hi Dan, welcome to SEQanswers.

    I'll take a quick shot at answering your first question, though I should mention that this forum, in general, is for people using the Next-Gen platforms, so this might be the wrong place to be asking these questions. (I've never really done any serious capillary based sequencing or assembly.) Regardless, we're all friendly here, I believe, and I'm certain that some of the members migrated to the next-gen after years on the "Big Dye chemistry" platforms. Hopefully one of them will chime in and give you better answers than I can.

    Originally posted by dan
    Hi,
    OK, it's not that bad, I do have the broad idea, but where can I find out more? What books should I be reading? Which websites have the best information? How does this kind of sequencing compare to NextGen sequencing in terms of speed, throughput, cost, coverage, de novo 'assemble-ability', etc.?
    If you're asking about non-Next-Gen sequencing, you're basically referring to all of the sequencing done before the next-gen platforms arrived in 2006/2007. If you pick up any reasonable molecular biology or biotech textbook, it'll probably have a few paragraphs on it. (Look up dideoxy or Sanger DNA sequencing for the chemistry, and capillary sequencing for the machines.) I'd be surprised if you couldn't find a few hundred web pages on it - there are nearly a million on capillary sequencing.

    In terms of cost, Sanger sequencing can be an order of magnitude (or more) more expensive per base, but it has some very good features: it's accurate, it's targeted (using primer pairs), and it's a trusted method. The key is that it's not competing with Next-Gen sequencing; they have very different applications. Now that Next-Gen is available, I don't think it's particularly cost effective to sequence a genome using Sanger sequencing, although it has been done (e.g. the human genome). That said, pretty much everyone doing next-gen work will use Sanger sequencing to verify any predictions they make.

    Generally, I would sum it up like this: Sanger sequencing is used to look at a single stretch of DNA (e.g. a BAC) with great specificity, with reads of about 1000 bp in length. Next-Gen sequencing is more of the "pick X million random locations" type (the length and number of sequences depend on the technology used), which wouldn't make sense if you wanted to look at a single BAC.

    (Of course, if you have access to next-gen sequencing, you wouldn't be making a BAC library in the first place.)
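
    To put rough numbers on that comparison, here is a small Python sketch of the usual back-of-the-envelope coverage estimate (coverage = number of reads x read length / target size). The 1000 bp read length comes from the paragraph above; the 100 kb BAC size, the 8x target and the short-read figures (one million 36 bp reads) are assumptions picked purely for illustration, not values from this thread.

```python
import math


def expected_coverage(num_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average depth if reads are spread evenly over the target."""
    return num_reads * read_length_bp / target_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, target_size_bp: int) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(target_coverage * target_size_bp / read_length_bp)


if __name__ == "__main__":
    bac_size_bp = 100_000  # assumed BAC insert size, for illustration only

    # Capillary-style reads of about 1000 bp, aiming for an assumed 8x depth.
    print("Sanger reads needed:", reads_needed(8, 1000, bac_size_bp))

    # Assumed short-read run: one million 36 bp reads aimed at the same BAC.
    print("Short-read depth:", expected_coverage(1_000_000, 36, bac_size_bp))
```

    Under these assumed numbers, a single BAC needs only a modest number of long reads, while a whole short-read run pointed at it would give far more depth than the assembly could use.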

    As for information, I suspect another good place to start is PubMed. Papers from before 2006 will all be discussing how they did assembly with this type of sequencing data. I'm certain there are many applications out there to assist with this task, and their manuals should also be full of helpful hints.

    Hopefully that's enough to get you pointed in the right direction, though I've left most of your questions unanswered.

    Good luck
    The more you know, the more you know you don't know. —Aristotle
