  • HESmith
    Senior Member
    • Oct 2009
    • 512

    de novo assembly of repeat elements

    As part of a de novo assembly project, I'd like to try to identify repeat elements - everything from single gene duplications (difficult) to transposons (less difficult). The data are Illumina PE-101 reads, ~50X coverage. My (admittedly unsophisticated) approach is to assemble contigs (I'll try both de Bruijn and overlap assemblers), then flag those with >2X average read depth.

    Two questions:
    1) are there any tools designed for this application?
    2) any suggestions for alternative strategies (e.g., candidate identification by sequence conservation, branch counting of de Bruijn graphs, etc.)?

    Thanks,
    Harold
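A minimal sketch of the depth-flagging idea described in the post above, assuming the mean read depth of each contig has already been computed (for example from a read-to-contig alignment). The function name, the contig names, and the depth values are all hypothetical. The post speaks of "average read depth"; this sketch swaps in the median across contigs as the single-copy baseline, on the assumption that a few very deep repeat contigs would otherwise inflate a mean.

```python
from statistics import median


def flag_high_depth_contigs(mean_depth, fold=2.0):
    """Return {contig: depth / baseline} for contigs deeper than `fold` x baseline.

    mean_depth: dict mapping contig name -> mean read depth (hypothetical input).
    The baseline is the median depth over all contigs, used here as a rough
    stand-in for single-copy coverage (an assumption, not a tested method).
    """
    baseline = median(mean_depth.values())
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    flagged = {}
    for name, depth in mean_depth.items():
        ratio = depth / baseline
        if ratio > fold:
            # The ratio doubles as a crude copy-number estimate.
            flagged[name] = round(ratio, 2)
    return flagged


# Toy data: three contigs near 50X, one ~3-copy candidate, one high-copy repeat.
example = {
    "contig_1": 48.0,
    "contig_2": 52.0,
    "contig_3": 50.0,
    "contig_4": 155.0,
    "contig_5": 1010.0,
}
flagged = flag_high_depth_contigs(example)
print(flagged)
```

Note that assemblers often collapse repeats into one contig, so a ratio like this is only a rough copy-number hint; short contigs and GC-biased regions would likely need separate handling.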
  • jimmybee
    Senior Member
    • Sep 2010
    • 119

    #2
    What's your species?

    • Hobbe
      Member
      • Apr 2010
      • 29

      #3
      RepeatMasker both identifies and masks repeat elements. Some assembly programs, like MIRA, also mark repeat regions with tags. In MIRA you can check the sequences identified as repeats in the projectname_info_readrepeats.lst file.

      • HESmith
        Senior Member
        • Oct 2009
        • 512

        #4
        Originally posted by jimmybee
        What's your species?
        Nematodes for now, but there are likely to be others in the future.

        Harold

        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          Thanks for the recommendations, Hobbe. I'll look into them.

          Harold

          • saemi
            Junior Member
            • Oct 2010
            • 5

            #6
            Hi Harold

            How is your project regarding the de novo assembly of transposons going? I'm interested in doing a similar project to compare transposons among closely related plant species using Illumina sequencing. What programs are you using?

            Cheers,
            Saemi

            • pmiguel
              Senior Member
              • Aug 2008
              • 2328

              #7
              For plants you will see lots of LTR retrotransposons. These might assemble with the LTRs in the middle, since the LTRs are, well, long (0.2-5 kb, for the most part) and are also repeats that flank the internal domains of these ubiquitous transposable elements.

              There was even a program designed by Jeremy DeBarry, when he was at UGA in the Bennetzen lab, that took advantage of this to pull LTR retros from full genome assemblies and reconstruct their LTRs in the correct positions. He called it the "AAARF" algorithm. (Get it? UGA Bulldogs, aaarf?)

              Ah, here it is. Also a publication. Looks like it is also usable for other sorts of elements as well. There you go: code from a maize lab -- you know maize, the organism where transposable elements were discovered? Worth a look.

              --
              Phillip

              • HESmith
                Senior Member
                • Oct 2009
                • 512

                #8
                Hi Saemi,

                I'm still waiting to obtain the sequence data for this project, so I don't have any results to report. I'll keep you posted regarding my progress.

                Harold

                • saemi
                  Junior Member
                  • Oct 2010
                  • 5

                  #9
                  Hi

                  @Harold, OK great, I'm looking forward to hearing from you.

                  @Phillip, great, thank you very much for the information and the paper. I'll take a close look at it. One of the things I'm concerned about is that I plan to use an Illumina HiSeq in my project, on species that don't have a reference genome. Most of the available methods I've seen for looking at transposons in a shotgun library work on 454 sequences. I guess one way to go is to do a de novo assembly on the data first, but then I'm afraid of losing information from my dataset.

                  Thank you guys
                  Saemi

                  • Claudia34
                    Junior Member
                    • Sep 2010
                    • 9

                    #10
                    Hi,

                    We have a project to identify genome-wide transposition events in flies after knock-down of a protein of interest. We think sequencing and de novo assembly are the best way to do it. Do you agree?
                    I don't know if Illumina is the best technology for this kind of analysis because of the read length. Do you have any recommendations?

                    Thanks,
                    Claudia

                    • HESmith
                      Senior Member
                      • Oct 2009
                      • 512

                      #11
                      Hi Claudia,

                      If you already have a genome assembly for your species and you know the sequences of your transposons, you can use the strategy described here. Briefly, use paired-end sequencing and map the different ends to genomic and transposon sequences to identify insertion sites. Let me know if you want more details.

                      Harold
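A rough sketch of the paired-end idea described in the post above: keep read pairs in which one mate maps to the genome and the other to a known transposon sequence, then cluster the genomic mates to suggest candidate insertion sites. This is not the method of the linked paper, only an illustration under assumptions. The function names, the `window` and `min_support` parameters, and the toy alignments are all hypothetical, and the alignments are taken as already parsed into `(reference_name, position)` tuples.

```python
from collections import defaultdict


def _emit(sites, chrom, te, cluster, min_support):
    """Append a candidate site if the cluster has enough supporting pairs."""
    if len(cluster) >= min_support:
        sites.append((chrom, cluster[0], cluster[-1], te, len(cluster)))


def call_insertion_sites(pairs, te_names, window=500, min_support=3):
    """Cluster genome-anchored mates whose partner maps to a transposon.

    pairs: iterable of ((ref1, pos1), (ref2, pos2)), one entry per read pair.
    te_names: set of reference names that are transposon sequences.
    Returns tuples (chrom, first_pos, last_pos, te_name, supporting_pairs).
    """
    anchors = defaultdict(list)  # (chrom, te) -> genomic mate positions
    for (ref1, pos1), (ref2, pos2) in pairs:
        mate1_in_te = ref1 in te_names
        mate2_in_te = ref2 in te_names
        if mate1_in_te == mate2_in_te:
            continue  # both genomic or both transposon: uninformative here
        if mate1_in_te:
            anchors[(ref2, ref1)].append(pos2)
        else:
            anchors[(ref1, ref2)].append(pos1)

    sites = []
    for (chrom, te), positions in sorted(anchors.items()):
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)  # close enough: same candidate site
            else:
                _emit(sites, chrom, te, cluster, min_support)
                cluster = [pos]
        _emit(sites, chrom, te, cluster, min_support)
    return sites


# Toy alignments (made up): three pairs support one site on chr2L.
pairs = [
    (("chr2L", 1000), ("roo", 50)),
    (("chr2L", 1100), ("roo", 120)),
    (("roo", 8900), ("chr2L", 1350)),
    (("chr2L", 50000), ("chr2L", 50300)),  # ordinary concordant pair
    (("chr3R", 7000), ("roo", 10)),        # single pair, below min_support
]
sites = call_insertion_sites(pairs, te_names={"roo"})
print(sites)
```

A real analysis would probably also use mate orientation, mapping quality, and split reads to pin down the breakpoint, and would need to separate new insertions from copies already present in the reference.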

                      • Claudia34
                        Junior Member
                        • Sep 2010
                        • 9

                        #12
                        Hi Harold,

                        Thanks a lot for your answer. I think we can use this strategy because we are working on Drosophila melanogaster. I will read this paper carefully.

                        Thanks again,
                        Claudia

                        • adaptivegenome
                          Super Moderator
                          • Nov 2009
                          • 436

                          #13
                          Originally posted by Claudia34
                          Hi Harold,

                          Thanks a lot for your answer. I think we can use this strategy because we are working on Drosophila melanogaster. I will read this paper carefully.

                          Thanks again,
                          Claudia
                          What are you suppressing in the flies, Hsp90?
