  • Genome assembly - 2 similar samples - one good, one bad

    Hi, currently doing genome assemblies on 2 very similar samples.

    - 1 assembled brilliantly very quickly.
    - 1 is highly fragmented

    Any reason why one sample should behave so differently from the other? Both had the same sequencing (Illumina HiSeq), the same heterozygosity and repeat content, and the same assembly methodology. Both were collected together, screened for contaminants and adapter trimmed, and the FastQC reports are very similar.

    Thoughts:
    - adapters in the middle of reads?
    - could a virus have inserted itself?

    Any comments welcomed.

  • #2
    Hmm.. interesting.

    Some thoughts:
    1. Was one more inbred than the other perhaps?
    2. Did you check the insert sizes of the libraries? I'm thinking perhaps the mate pair library for the poorly assembled sample wasn't as good as the other one.
    3. Also, I have seen adapters in the middle of reads. You can quickly check for this if you know the adapter sequence.
    4. I'm thinking that if there was a virus, the k-mer coverage of the virus sequence would've been high enough for the assembler (the de Bruijn graph based ones) to screen it out.
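
The adapter check in point 3 can be sketched roughly as follows: scan the sequence line of each FASTQ record for the adapter and tally where the hits start. The function name `adapter_hit_offsets` is hypothetical, and the adapter string in the usage comment is only a placeholder assumption; substitute the sequence from your own library prep kit. This finds exact matches only, so adapter copies carrying sequencing errors will be missed.

```python
from collections import Counter
from typing import Iterable, Tuple


def adapter_hit_offsets(fastq_lines: Iterable[str], adapter: str) -> Tuple[int, Counter]:
    """Count reads and tally the start offset of the first exact adapter hit.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    Returns (total_reads, Counter mapping start offset -> number of reads).
    """
    adapter = adapter.upper()
    offsets: Counter = Counter()
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # only the second line of each record holds the sequence
            continue
        total += 1
        pos = line.strip().upper().find(adapter)
        if pos != -1:
            offsets[pos] += 1
    return total, offsets


# Hypothetical usage (file name and adapter are placeholders):
# with open("sample_R1.fastq") as handle:
#     total, offsets = adapter_hit_offsets(handle, "AGATCGGAAGAGC")
#     print(total, sum(offsets.values()), sorted(offsets.items()))
```

Hits that start well before the end of the read are the ones worth a closer look, and comparing the offset tallies between the good and the bad sample should show whether the two libraries differ in this respect.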

    • #3
      Thanks Smurali.

      These are field samples, not inbred. Everything is pointing towards a virus being integrated into the chromosome. Will do some more work on this and if I find out anything useful, will add another post.

      • #4
        Originally posted by Elsie
        Thoughts:
        - adapters in the middle of reads?
        - could a virus have inserted itself?

        Any comments welcomed.
        Even if there were adapters in the middle of reads, they still would have been trimmed. And I don't see why a virus would cause a poor assembly, unless it randomly inserted itself into a different place in every cell. If it inserted itself once, then the cell replicated, you'd still get a good assembly.

        It sounds more like cancer to me (depending on the organism), or degraded DNA. Have you looked at the insert size distribution and actual error rates of mapped reads (as opposed to just the quality scores)? Also, what is the read length, target insert size, and specific Illumina platform (e.g. HS2500) and run mode, and what kind of organism is it? Diploid or haploid? ...etc.
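
As a minimal sketch of the insert size check, assuming the reads from each sample have already been mapped and written out as SAM text: collect the template length (TLEN, column 9) of properly paired reads and summarise it. The function names and the `max_size` cutoff are hypothetical choices for illustration. Only positive TLEN values are kept so that each pair is counted once.

```python
import statistics
from typing import Dict, Iterable, List


def insert_sizes(sam_lines: Iterable[str], max_size: int = 10_000) -> List[int]:
    """Collect TLEN from properly paired records, one value per pair."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 = "properly paired"; the mate with positive TLEN represents the pair
        if flag & 0x2 and 0 < tlen <= max_size:
            sizes.append(tlen)
    return sizes


def summarize(sizes: List[int]) -> Dict[str, float]:
    """Basic summary of an insert size list (must be non-empty)."""
    return {
        "pairs": len(sizes),
        "median": statistics.median(sizes),
        "stdev": statistics.pstdev(sizes),
    }
```

Running this on both samples and comparing the medians and spreads would show whether the fragmented sample has a shorter or broader insert size distribution than the good one.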
        Last edited by Brian Bushnell; 09-21-2015, 08:19 AM.

        • #5
          Thanks for the comments Brian.
          The reads are 100 bp PE, and the samples belong to the order Hymenoptera. Current evidence is pointing towards polydnaviruses.

          • #6
            Wow, this is certainly interesting.
            I can only think of something external that somehow passed through your contamination screening and made it into the sequencing, so a virus looks quite plausible here.
            When we sequenced and assembled a bunch of arthropods before, the final assembly sometimes did have a lot of contamination from Homo sapiens (in blood feeders), plants and viruses, so this is expected.
            However, I am still intrigued by the fact that it is causing the assembly to be so highly fragmented. Are you going to try and assemble after removing the reads belonging to the virus, Elsie?
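
Removing the viral reads before reassembly could be sketched like this, assuming you already have a set of read IDs that hit the viral reference (for example from a mapping step done elsewhere). The function name `filter_fastq` and the handling of `/1` and `/2` mate suffixes are assumptions for illustration, not a description of any particular tool.

```python
from typing import Iterable, Iterator, List, Set


def filter_fastq(lines: Iterable[str], exclude_ids: Set[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records whose read ID is not in exclude_ids.

    A trailing /1 or /2 on the read name is ignored, so an ID listed once
    removes both mates when the function is run on each file of the pair.
    """
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            read_id = record[0][1:].split()[0]  # drop '@' and any description
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            if read_id not in exclude_ids:
                yield record
            record = []


# Hypothetical usage (file names are placeholders):
# with open("sample_R1.fastq") as src, open("sample_R1.noviral.fastq", "w") as dst:
#     for rec in filter_fastq(src, viral_ids):
#         dst.write("\n".join(rec) + "\n")
```

Applying the same ID set to both files of a pair keeps the mates in sync, which paired-end assemblers generally expect.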
