  • saima
    Member
    • Feb 2010
    • 15

    How many gaps???

    Hi,

    While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

    Thanks
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Originally posted by saima
    While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?
    It depends on the size and complexity of your genome. See http://www.ncbi.nlm.nih.gov/pubmed/20064230

    If you are working on a human or related genome, this is a good guide: http://www.genome.gov/10001812


    • saima
      Member
      • Feb 2010
      • 15

      #3
      Hi Torst,
      Thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me on it?

      Thanks


      • Torst
        Senior Member
        • Apr 2008
        • 275

        #4
        Originally posted by saima
        Thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me on it? Thanks
        A genome is never finished until it is closed into its constituent chromosomes and plasmids. It all depends on the purpose you have for it. There are already three closed S. agalactiae genomes in GenBank (2603, A909, NEM316). If you are only interested in SNPs, there is no need to close/finish the genome. If large-scale structure and repeat distribution are important, then you'll need to do more Sanger sequencing / PCRs to disambiguate.


        • Torst
          Senior Member
          • Apr 2008
          • 275

          #5
          Originally posted by saima
          If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?
          By any chance was this done with 454 sequencing? The Newbler assembler tends to put in VERY LONG runs of "N"s where it thinks contigs are joined, but in practice there isn't really a 40,000 N gap.
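To check whether the 20,000 to 40,000 bp "gaps" are real or just long runs of N written between joined contigs, one can list every N run in the assembly FASTA. This is a minimal sketch, assuming the assembly is plain FASTA with gaps written as N/n characters; the function names are hypothetical. Keep in mind that the length of an N run is whatever the assembler chose to write, not necessarily a measured distance.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # record ID = first word of the header
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def n_runs(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every run of N/n at least min_len long."""
    runs = []
    for m in re.finditer(r"[Nn]+", seq):
        length = m.end() - m.start()
        if length >= min_len:
            runs.append((m.start(), length))
    return runs


def summarize(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (record_id, number_of_N_runs, total_N, longest_N_run) per record."""
    for name, seq in read_fasta(fasta_lines).items():
        lengths = [length for _, length in n_runs(seq)]
        yield name, len(lengths), sum(lengths), max(lengths, default=0)
```

With a real file this could be run as `for row in summarize(open("assembly.fasta")): print(*row, sep="\t")` (the file name is a placeholder). A single very long N run between two contigs points to an assembler placeholder rather than 40,000 bp of missing sequence.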


          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            Originally posted by saima
            Hi,

            While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

            Thanks
            It's been observed for many species of bacteria that there exists a "pan-genome", which includes the entire genomic content of the species, but that any single strain contains only a subset of it. There is always a core genome, which every strain shares. Other sections of the genome may be present or absent. These optional parts of the genome may confer optional phenotypes, e.g. virulence factors, alternative metabolic potential, etc.
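The core/accessory split described above amounts to set arithmetic over gene content. This is a toy sketch, assuming each strain has already been reduced to a set of gene-family identifiers (for example by an ortholog clustering step, not shown); the strain and gene names used with it are hypothetical.

```python
from typing import Dict, Set


def pan_genome(strains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory and pan-genome sets."""
    if not strains:
        raise ValueError("need at least one strain")
    gene_sets = list(strains.values())
    core = set.intersection(*gene_sets)  # present in every strain
    pan = set().union(*gene_sets)        # present in at least one strain
    return {"core": core, "accessory": pan - core, "pan": pan}


def absent_from(sample: Set[str], reference: Set[str]) -> Set[str]:
    """Reference gene families with no counterpart in the sample strain.

    In a mapping assembly against that reference, these are the regions
    one would expect to show up as gaps.
    """
    return reference - sample
```

For example, with hypothetical strains `{"ref": {"geneA", "geneB", "geneC"}, "s2": {"geneA", "geneB", "geneD"}}`, the core is `{"geneA", "geneB"}` and `absent_from(s2, ref)` is `{"geneC"}`: an accessory gene in the reference that would leave a gap when `s2` reads are mapped to it.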

            If the strain you are sequencing is different from the strain you are using as the reference sequence, it is quite possible that these large gaps represent sequence elements that are simply absent from your strain. Among the three completed genomes of S. agalactiae, the difference in genome size between the largest and smallest is > 80,000 bp. Is there a large number of reads that you could not map to the reference genome? If so, these may represent sequence elements present in your strain but absent from the reference. If there is a large pile of unmapped reads, try performing a de novo assembly of them.
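Counting and collecting the unmapped reads can be done directly from the alignment file using the SAM FLAG bit 0x4 ("segment unmapped"), which is the same bit that `samtools view -f 4` selects on. This is a minimal sketch, assuming the alignments are available as uncompressed SAM text (a BAM file would first need converting, e.g. with `samtools view -h`). Secondary and supplementary records are skipped so each read is counted once. Paired-end mates share a QNAME, so the FASTA may contain repeated names.

```python
from typing import Iterable, Iterator, List, Tuple

UNMAPPED = 0x4                      # SAM FLAG: segment unmapped
SECONDARY_OR_SUPP = 0x100 | 0x800   # secondary or supplementary alignment


def sam_records(sam_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each alignment line, skipping headers."""
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            yield line.rstrip("\n").split("\t")


def count_unmapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unmapped_reads, total_reads), counting each read once."""
    total = unmapped = 0
    for fields in sam_records(sam_lines):
        flag = int(fields[1])
        if flag & SECONDARY_OR_SUPP:  # extra placement of an already-counted read
            continue
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    return unmapped, total


def unmapped_fasta(sam_lines: Iterable[str]) -> str:
    """Return the unmapped reads as FASTA text, ready for a de novo assembler."""
    out = []
    for fields in sam_records(sam_lines):
        name, flag, seq = fields[0], int(fields[1]), fields[9]  # QNAME, FLAG, SEQ
        if (flag & UNMAPPED) and not (flag & SECONDARY_OR_SUPP) and seq != "*":
            out.append(f">{name}\n{seq}\n")
    return "".join(out)
```

If `count_unmapped` reports a sizeable fraction of reads, the output of `unmapped_fasta` is the "pile" worth assembling on its own.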


            • Torst
              Senior Member
              • Apr 2008
              • 275

              #7
              I have to apologise: I misread the original post, which said 'mapping assembly', as 'de novo assembly'.

              I agree with everything kmcarr has said. An 80 kbp difference between strains is not unusual; in fact, it is what makes it a "different strain" (one could argue about taxonomy for hours, of course). A mapping assembly needs to be treated carefully: if you allowed reads to map to multiple places, or allowed partial reads to map (local alignment), etc., some inferences could be faulty.
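The caution about multi-mapping and partially aligned reads can be applied as a filter before gaps or coverage are interpreted. This is a minimal sketch over SAM text lines that keeps only primary, mapped alignments with no soft (S) or hard (H) clipping in the CIGAR string. The MAPQ cut-off of 20 is an arbitrary assumption, and MAPQ scales differ between aligners; in the SAM specification 255 means "not available", and such records would pass this filter.

```python
import re
from typing import Iterable, Iterator, List

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
UNMAPPED = 0x4
SECONDARY_OR_SUPP = 0x100 | 0x800


def is_confident(fields: List[str], min_mapq: int = 20) -> bool:
    """True for a primary, end-to-end alignment with MAPQ >= min_mapq."""
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    if flag & (UNMAPPED | SECONDARY_OR_SUPP):
        return False  # unmapped, or an extra placement of a multi-mapping read
    if mapq < min_mapq:
        return False  # low MAPQ usually means several similarly good locations
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    return not (ops & {"S", "H"})  # clipped ends: only part of the read aligned


def filter_sam(sam_lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield header lines unchanged, plus only the confident alignment lines."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        elif line.strip() and is_confident(line.rstrip("\n").split("\t"), min_mapq):
            yield line
```

Whether to discard clipped alignments outright is a judgment call; near real structural differences between strains, clipped reads are often the evidence worth keeping, so filtered and unfiltered views are worth comparing.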
