  • Identifying collapsed repeats and other misassemblies

    I have been using R to browse and plot the alignment info file produced by Newbler.
    Plots of consensus depth have revealed some interesting features. For example, read depth has a mean of 21, but there are peaks in the plots where read depth climbs to over 400.
    The contigs where this occurs are always very short (less than 300 bp).
    I am curious how I should interpret these features. I am assuming that they are collapsed repeats.
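
A minimal Python sketch of the same screen, in case it helps anyone reproduce it outside R. The file layout is an assumption, not something confirmed in this thread: a line starting with `>` opens a new contig, and each following tab-separated row holds the depth in column `depth_col`. The function names and the `fold` and `max_len` cut-offs are hypothetical and should be tuned to your own data.

```python
from statistics import mean, median


def parse_depths(lines, depth_col=3):
    """Collect per-position depth values for each contig.

    Assumed layout: '>' lines open a contig; later rows are tab-separated
    with the depth in column `depth_col` (0-based).
    """
    depths = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            depths[current] = []
        elif current is not None:
            fields = line.split("\t")
            try:
                depths[current].append(float(fields[depth_col]))
            except (IndexError, ValueError):
                continue  # skip column headers or malformed rows
    return depths


def flag_suspect_contigs(depths, fold=5.0, max_len=300):
    """Return (name, length, mean depth) for short, unusually deep contigs.

    A contig is flagged when it has at most `max_len` positions and its mean
    depth is at least `fold` times the median depth over all positions.
    """
    all_positions = [d for vals in depths.values() for d in vals]
    if not all_positions:
        return []
    typical = median(all_positions)
    flagged = []
    for name, vals in depths.items():
        if not vals:
            continue
        contig_mean = mean(vals)
        if len(vals) <= max_len and contig_mean >= fold * typical:
            flagged.append((name, len(vals), contig_mean))
    return flagged
```

With a typical depth around 21, a `fold` of 5 would flag short contigs averaging roughly 100x or more, which comfortably includes the 400x peaks described above.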

    regards

    Brian

  • #2
    Originally posted by coldturkey:
    "I am curious as to how I should interpret these features. I am assuming that they are collapsed repeats."

    It helps to BLAST those contigs quickly at NCBI.

    Most of the time they are repeats: rRNA (in bacteria), pseudogenes, SINEs, LINEs etc.

    Sometimes it's also some weird kind of contamination. For example, be wary of herpes simplex virus sequence in your bacterial 454 sequencing project.

    B.


    • #3
      Thanks BaCH,

      Why HSV in particular or is it just an example?


      • #4
        Originally posted by coldturkey:
        "Why HSV in particular or is it just an example?"

        Just an example I have seen. Other examples include sequences from bacteria that carry "gingivitis" in the name, and so on. I suspect that in these cases contamination occurred because someone breathed onto the culture medium, during sample preparation, or at some other point ... the high-throughput sequencing machines really will sequence everything.

        I've seen this in data from at least three different sources (US and Europe). I'm sure I'd find more if I really searched for it.

        Though the contamination rates are usually pretty low, and the contaminating reads can be easily filtered out in bacterial sequencing projects, I wonder whether the instrument vendors should update their workflow recommendations toward higher "standards" (like wearing masks when preparing the DNA) when working with eukaryotic samples (plants excepted).

        B.


        • #5
          So, given that these reads are all on small contigs (under 300 bp), is it safe to exclude them from the assembly?


          • #6
            I check with BLAST against the NCBI nucleotide collection. If the contigs clearly don't belong to the organism being analysed, discard them. If there's a remote possibility that they do, keep them (but perhaps annotate them as dubious).
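
The keep/discard rule above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the input is tabular BLAST text with the columns qseqid, sseqid, pident, length, evalue, bitscore, stitle (a custom column selection, not the default), the target organism is recognised by a substring match on the hit title, and `min_identity` is an arbitrary cut-off for calling a foreign hit "clear". The function names are hypothetical.

```python
def best_hits(blast_tsv):
    """Keep the highest-bitscore hit per query from tabular BLAST text.

    Assumed columns: qseqid, sseqid, pident, length, evalue, bitscore, stitle.
    Returns {query: (bitscore, percent identity, hit title)}.
    """
    best = {}
    for line in blast_tsv.splitlines():
        row = line.split("\t")
        if len(row) < 7:
            continue  # skip blank or incomplete lines
        query, identity, bitscore, title = row[0], float(row[2]), float(row[5]), row[6]
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, identity, title)
    return best


def triage(contig_names, best, expected_terms, min_identity=90.0):
    """Label each contig 'keep', 'dubious', or 'discard'."""
    labels = {}
    for name in contig_names:
        hit = best.get(name)
        if hit is None:
            labels[name] = "dubious"  # no hit: cannot rule it in or out
            continue
        _, identity, title = hit
        if any(term.lower() in title.lower() for term in expected_terms):
            labels[name] = "keep"  # best hit matches the target organism
        elif identity >= min_identity:
            labels[name] = "discard"  # confident hit to something else
        else:
            labels[name] = "dubious"  # weak foreign hit: keep, but annotate
    return labels
```

Contigs labelled "dubious" stay in the assembly with an annotation, which mirrors the "remote possibility, keep" advice. A title match is a crude proxy for taxonomy, so borderline cases still deserve a manual look.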


            • #7
              thanks again

              Brian


              • #8
                Originally posted by coldturkey:
                "So given that these reads are all on small contigs (under 300bp) is it safe to exclude them from the assembly?"

                Small contigs are useful for BLAST QC.

                But even if the small contigs are not contamination, they are less trustworthy and of poorer quality.

                We do QC on small contigs, but in the end we usually discard them and use the large contigs for downstream work.
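
A minimal Python sketch of that split, assuming the contigs are in a plain FASTA file. The `min_len` default of 500 bp is only an illustrative cut-off, and the function names are hypothetical. Small contigs are returned separately so they can still go through BLAST QC before being set aside.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the contig name.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_len=500):
    """Separate contigs into (large, small) lists by sequence length."""
    large, small = [], []
    for name, seq in records:
        if len(seq) >= min_len:
            large.append((name, seq))
        else:
            small.append((name, seq))
    return large, small
```

Typical use would be `large, small = split_by_length(read_fasta(open(path)))`, where `path` points at your own contig file.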


                • #9
                  By the way, I believe that in highly repetitive regions, particularly short tandem repeat regions, Newbler will either not assemble the reads, label them as repeat reads, or put them into lots of small contigs.

                  But the chance of contamination in small contigs is also high. Small traces of other species usually end up in small contigs, because there are not enough reads to form big contigs.


                  • #10
                    amosvalidate might help you analyze the problematic regions in your assembly.


                    • #11
                      Yeah, I tried amosvalidate, but I couldn't get it to identify my mate pairs. I was then told that amosvalidate did not support 454 mate pairs at the moment and that I should skip this part of the validation.
