  • Merging transposable element libraries

    Hi!
    I am trying to annotate the transposable elements of a fly genome, and I am particularly interested in making the annotation as comprehensive as possible.

    I am using programs like RepeatModeler and REPET to find and classify transposable elements. I am using more than one program to maximize the sequences recovered.

    My problem is that I don’t know how to merge the outputs. At first I did it with RepeatMasker, using one library as the genome and another as the library, and vice versa, and then went through the outputs looking for the most complete and best-classified copy of each element. But this is a very long process and I am sure there are better options.

    Now I am thinking of letting RepeatModeler build the consensus sequences: I would just concatenate all the libraries and use the result as the genome. The drawback is that I lose the option of keeping the better classification.
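A minimal sketch of one way to concatenate libraries without losing the better classification. It only collapses consensus sequences that are exactly identical (on either strand), so near-identical consensi would still need a clustering step that is not shown here. It assumes headers follow the RepeatModeler-style `name#Class/Family` convention; the function names are hypothetical and not part of any of the tools mentioned.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_classified(header):
    """True if the header carries a class other than 'Unknown'.

    Assumption: headers look like 'name#Class/Family'.
    """
    name = header.split()[0] if header.split() else ""
    if "#" not in name:
        return False
    cls = name.split("#", 1)[1]
    return bool(cls) and not cls.lower().startswith("unknown")


def merge_libraries(paths):
    """Merge FASTA libraries, keeping one entry per distinct sequence.

    Sequences identical on either strand are treated as the same element.
    When duplicates disagree, a classified header replaces an unclassified one.
    """
    best = {}
    for path in paths:
        for header, seq in read_fasta(path):
            seq = seq.upper()
            key = min(seq, revcomp(seq))  # strand-independent key
            if key not in best:
                best[key] = (header, seq)
            elif is_classified(header) and not is_classified(best[key][0]):
                best[key] = (header, seq)
    return list(best.values())
```

Calling `merge_libraries(["repeatmodeler.fa", "repet.fa"])` (hypothetical file names) would return a list of `(header, sequence)` pairs that can be written back out as a single library.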

    Have you had a similar problem? Or do you think the programs are not different enough to justify the effort of using more than one?

    Thank you for your help

    Nuria

  • #2
    Hi Nuria - don't pipelines such as REPET already do this?


    • #3
      Hi Mike,

      I wasn't sure I could use my own libraries (the ones I built manually and the ones resulting from other pipelines) with REPET. I am still working with RepeatModeler and REPCLASS, but I will try it as soon as I install REPET.

      Thank you for your comment


      • #4
        Hi!
        After some time I've managed to install REPET and Torque (using this: http://willworkforscience.blogspot.c...llelizing.html).
        But when I ran TEdenovo with the REPET example (DmelChr4) I got an error. My problem is that I don't know whether something is wrong with REPET or with Torque.

        Any help is really appreciated!


        This is the output. It seems the job stopped, but I don't know why it couldn't finish.

        START TEdenovo.py (2013-04-23 11:10:03)
        version 2.0
        project name = DmelChr4
        project directory = /home/nuria/work/DmelChr4_TEdenovo
        beginning of step 1
        submitting job(s) with groupid 'DmelChr4_TEdenovo_prepareBatches' (2013-04-23 11:10:03)
        waiting for 1 job(s) with groupid 'DmelChr4_TEdenovo_prepareBatches' (2013-04-23 11:10:03)
        Traceback (most recent call last):
          File "/home/nuria/REPET/REPET_linux-x64_2.0//bin/TEdenovo.py", line 2527, in <module>
            main()
          File "/home/nuria/REPET/REPET_linux-x64_2.0//bin/TEdenovo.py", line 278, in main
            iPrepareBatches.run()
          File "/home/nuria/REPET/REPET_linux-x64_2.0//bin/TEdenovo.py", line 366, in run
            self._launchCommands(sectionName)
          File "/home/nuria/REPET/REPET_linux-x64_2.0//bin/TEdenovo.py", line 458, in _launchCommands
            iLauncher.runLauncherForMultipleJobs(acronym, lCmdsTuples, cleanMustBeDone)
          File "/home/nuria/REPET/REPET_linux-x64_2.0/pyRepetUnit/commons/launcher/Launcher.py", line 162, in runLauncherForMultipleJobs
            self.endRun(nodesMustBeCleaned)
          File "/home/nuria/REPET/REPET_linux-x64_2.0/pyRepetUnit/commons/launcher/Launcher.py", line 96, in endRun
            self.jobdb.waitJobGroup(self.job.groupid)
          File "/home/nuria/REPET/REPET_linux-x64_2.0/pyRepetUnit/commons/sql/TableJobAdaptator.py", line 189, in waitJobGroup
            nbFinishedJobs = self.getCountStatus(groupid, "finished")
          File "/home/nuria/REPET/REPET_linux-x64_2.0/pyRepetUnit/commons/sql/TableJobAdaptator.py", line 138, in getCountStatus
            self._iDb.execute(qry)
          File "/home/nuria/REPET/REPET_linux-x64_2.0/pyRepetUnit/commons/sql/DbMySql.py", line 170, in execute
            self.cursor.execute(qry)
          File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
            self.errorhandler(self, exc, value)
          File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
            raise errorclass, errorvalue
        _mysql_exceptions.OperationalError: (1317, 'Query execution was interrupted')



        Also, I'm running this on a virtual machine with OS: Ubuntu Server 12.04.2 LTS, HDD: 10 GB, 14 cores and RAM: 20 GB.

        Thank you


        Nuria


        • #5
          This is an error with Torque. I got a similar one, and it was fixed by restarting the PBS scheduler, which had stopped.
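A quick way to see whether the scheduler has stopped is to check which Torque daemons are actually running. The sketch below is a Linux-only illustration that reads process names from `/proc`. It assumes a single-machine setup where `pbs_server`, `pbs_sched` and `pbs_mom` all run on the same host (a site using a different scheduler would need another name in the list), and the function names are hypothetical.

```python
import os

# Standard Torque daemon names; assumes the stock pbs_sched scheduler is used.
TORQUE_DAEMONS = ("pbs_server", "pbs_sched", "pbs_mom")


def running_process_names(proc_root="/proc"):
    """Return the set of process names currently visible under /proc (Linux only)."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                names.add(handle.read().strip())
        except OSError:
            continue  # the process exited while we were reading
    return names


def missing_daemons(running, required=TORQUE_DAEMONS):
    """List the required daemons that do not appear in the running set."""
    return [name for name in required if name not in running]


if __name__ == "__main__":
    missing = missing_daemons(running_process_names())
    if missing:
        print("Not running: " + ", ".join(missing))
    else:
        print("All Torque daemons appear to be running.")
```

If `pbs_sched` shows up as not running, submitted jobs stay queued and TEdenovo keeps waiting on them, which would be consistent with the behaviour described above.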


          • #6
            Originally posted by HeyIamNuria
            Hi!
            I am trying to annotate the transposable elements of a fly genome. But I am particularly interested in doing a comprehensive work.

            I am using programs like RepeatModeler and REPET to find and classify transposable elements. I am using more than one program to maximize the sequences recovered.

            If I understand correctly, you are working with Drosophila. I would not use REPET in this case. The species you are investigating already has a library of annotated repeats because there have been many TE studies in this system. In fact, I believe REPET was developed by people who work on Drosophila, so that data should be available. Just download the sequences and use something like RepeatMasker. Let me know if you are working with an unassembled genome (I could make other suggestions); my assumption is that you have an assembly.


            • #7
              Hi Nuria,

              I am also starting up a project with annotating transposable elements but in a fungus. Do you have any experience in how many and what programs are enough if you want to do a more comprehensive annotation of the TEs?

              Kajsa


              • #8
                Originally posted by kajsa
                Hi Nuria,

                I am also starting up a project with annotating transposable elements but in a fungus. Do you have any experience in how many and what programs are enough if you want to do a more comprehensive annotation of the TEs?

                Kajsa

                What kind of data do you have? The correct approach depends on whether or not you have a genome assembly.


                • #9
                  I have one 30Mb fungal genome assembled by JGI and another four closely related 30 Mb genomes that we will assemble ourselves.


                  • #10
                    Do you know whether it is better to find TEs directly from unassembled reads or from an already assembled genome?

                    Thank you in advance!
