  • How many gaps???

    Hi,

    While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

    Thanks

  • #2
    Originally posted by saima
    While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?
    It depends on the size and complexity of your genome. See http://www.ncbi.nlm.nih.gov/pubmed/20064230

    If you are working on a human or related genome, this is a good guide: http://www.genome.gov/10001812


    • #3
      Hi Torst,
      thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me for it?

      thanks


      • #4
        Originally posted by saima
        thanks for your reply, but my genome is a bacterial genome, Streptococcus agalactiae. Can you guide me for it? Thanks
        A genome is never finished until it is closed to its constituent chromosomes and plasmids. It all depends on what purpose you have for it. There are already three closed S. agalactiae genomes in GenBank (2603, A909, NEM316). If you are only interested in SNPs, there is no need to close/finish the genome. If large-scale structure and repeat distribution are important, then you'll need to do more Sanger sequencing / PCRs to disambiguate.


        • #5
          Originally posted by saima
          If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?
          By any chance was this done with 454 sequencing? The Newbler assembler tends to put in VERY LONG runs of "N"s where it thinks contigs are joined, but in practice there isn't really a 40,000 N gap.
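To check whether the long gaps are really placeholder runs of "N" in the assembly output, one can list every N run and its length. The following is a minimal sketch that assumes the assembly is available as FASTA text; the function names `read_fasta` and `n_runs` are illustrative and not part of Newbler or any other tool.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for each run of N/n of at least min_len."""
    runs = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start(), length))
    return runs
```

For a real file, something like `for name, seq in read_fasta(open("assembly.fasta")): print(name, n_runs(seq, min_len=1000))` would report only the long runs; the file name here is a placeholder.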


          • #6
            Originally posted by saima
            Hi,

            While doing a mapping assembly, how many gaps are acceptable before finishing the sequence? If there are a lot of gaps in the assembly output, and a few gaps are very long, even 20,000 to 40,000 bp, what should we do?

            Thanks
            It's been observed for many species of bacteria that there exists a "pan-genome", which includes the entirety of the genomic content for the species, but any single strain will contain only a subset of this. There is always a core genome, which every strain shares. Other sections of the genome may be present or absent. These optional parts of the genome may provide optional phenotypes, e.g. virulence factors, alternate metabolic potential, etc.

            If the strain you are sequencing is different from the strain you are using as a reference sequence, it is quite possible that these large gaps represent sequence elements that are simply absent from your strain. Among the three completed genomes of S. agalactiae, the difference in genome size between the largest and smallest is > 80,000 bp. Is there a large number of reads which you could not map to the reference genome? If so, these may represent sequence elements present in your strain which are absent in the reference. If there is a large pile of unmapped reads, try performing a de novo assembly of these.
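Pulling out the unmapped reads for a separate de novo assembly can be sketched as below, assuming the mapper wrote its alignments in SAM text format. In SAM, bit 0x4 of the FLAG field marks a read as unmapped. The function names are illustrative, and the output is plain FASTQ that an assembler of your choice could take as input.

```python
from typing import Iterable, Iterator, Tuple


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per SAM alignment line whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


def count_reads(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unmapped, total) counts over the alignment lines."""
    unmapped = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue
        total += 1
        if int(fields[1]) & 0x4:
            unmapped += 1
    return unmapped, total
```

If only a small fraction of reads is unmapped, a de novo assembly of them is unlikely to explain gaps of tens of kilobases, and the gaps more probably reflect regions of the reference that are absent from the sequenced strain.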


            • #7
              If the strain you are sequencing is different from the strain you are using as a reference sequence, it is quite possible that these large gaps represent sequence elements that are simply absent from your strain. Among the three completed genomes of S. agalactiae, the difference in genome size between the largest and smallest is > 80,000 bp. Is there a large number of reads which you could not map to the reference genome? If so, these may represent sequence elements present in your strain which are absent in the reference. If there is a large pile of unmapped reads, try performing a de novo assembly of these.
              I have to apologise, I mis-read the original post which stated 'mapping assembly' rather than 'de novo assembly'.

              I agree with everything kmcarr has said. An 80 kbp difference between strains is not unusual - in fact it is what makes it a "different strain" (one could argue on taxonomy for hours, of course). A mapping assembly needs to be treated carefully - if you allowed reads to map to multiple places, and allowed partial reads to map (local alignment) etc., some inferences could be faulty.
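One way to guard against those faulty inferences is to keep only alignments that are primary, reasonably unique, and not clipped before judging coverage and gaps. The sketch below assumes SAM text input; the thresholds `min_mapq=20` and `max_clip=0` are arbitrary illustrative defaults, and the meaning of MAPQ varies between aligners, so check how your mapper reports multi-mapped reads.

```python
import re

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_bases(cigar: str) -> int:
    """Total soft-clipped (S) and hard-clipped (H) bases in a CIGAR string."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")


def is_confident(sam_line: str, min_mapq: int = 20, max_clip: int = 0) -> bool:
    """True for a mapped, primary alignment with adequate MAPQ and little clipping."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    if flag & 0x4:    # read is unmapped
        return False
    if flag & 0x100:  # secondary alignment, i.e. an alternative placement
        return False
    return mapq >= min_mapq and clipped_bases(cigar) <= max_clip
```

Regions that lose all coverage once such alignments are filtered out were supported only by ambiguous or partial matches, so they should be counted as gaps rather than as confirmed sequence.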
