
  • RepeatModeler

    Hi,

    I want to use RepeatModeler.
    I created my database without error, but when I launch the RepeatModeler script, I get this error in the output file:


    nohup: ignoring input
    RepeatModeler Version open-1.0.5
    ================================
    Search Engine = ncbi
    Database = PM ..
    - Sequences = 12940
    - Bases = 93195127
    Using temporary directory = /home/chris/ReapeatModeler/PM/RM_15527.TueOct161418482012


    RepeatModeler Round # 1
    ========================
    Searching for Repeats
    -- Sampling from the database...
    BeGINNING...
    - Gathering up to 40000000 bp

    RepeatModeler::sampleFromDB() Could not obtain sequence ncbi ( entry = 1-0-5, start = 1 end = 46138 ) from the database!





    Do you have any idea?
    Thanks in advance

    Chris

  • #2
    I have the same problem and cannot figure it out.



    • #3
      RepeatModeler

      Any luck with figuring out your problem? I'm similarly lost.



      • #4
        Originally posted by chrisbioinfo
        Hi, I think you could try changing the search engine with "-engine abblast". I don't know why, but it worked for me when I had a similar problem.
        lyn



        • #5
          I also got the same problem using the NCBI rpsblast engine.
          After the fixes below, RepeatModeler started to run, though I don't know whether other problems remain, and it is still possible that my FASTA input was incorrect. At least randomly selected genomic DNA sequences were generated for the statistical calculation of repeats.
          Anyway, the main problem was the call to "blastdbcmd" from the RepeatModeler Perl script.

          (The line numbers below might be inaccurate because I modified the file.)

          Line 281: Modification
          `$RepModelConfig::NCBIDBCMD_PRGM -db $genomeDB -entry all -outfmt "%g %l"`
          ( "%t %l" -> "%g %l" )
          # In my environment, the outfmt %t produced no output, so I used %g instead.

          Line 1779: Modification
          my $openCoordStart = $start;
          ( $start - 1 -> $start )
          # In my database, $start often came out as zero (0), but blastdbcmd does not accept zero in the -range option, so I deleted the "- 1" in the script.

          Line 1780: Insertion
          $seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
          # It seems that the program does not accept the input unless the rmblast database was registered with gi| tags, so I ignored the " if ( $seqID =~ /gi\|(\d+)/ ) { ..." statement and inserted another input line.

          Line 1783: Modification
          `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $seqID -range $openCoordStart-$openCoordEnd`;
          ( -range $openCoordEnd-$openCoordStart -> -range $openCoordStart-$openCoordEnd )
          # The correct format of the coordinate values for the "-range" option of blastdbcmd is "Start"-"End", but the order was reversed in the script.
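The two coordinate fixes above (no zero start, start before end) could also be done defensively in one place. This is only a minimal sketch, not the change applied above: `blastdbcmd_range` is a hypothetical helper name, and it assumes blastdbcmd's -range option takes 1-based "start-end" coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: build the value for blastdbcmd's -range option.
# Assumes -range expects 1-based "start-end" with start <= end.
sub blastdbcmd_range {
    my ( $start, $end ) = @_;

    # Put the smaller coordinate first if the pair arrives reversed.
    ( $start, $end ) = ( $end, $start ) if $start > $end;

    # Zero is not a valid 1-based position, so clamp it to 1.
    $start = 1 if $start < 1;

    return "$start-$end";
}

print blastdbcmd_range( 0, 46138 ), "\n";    # 1-46138
print blastdbcmd_range( 500, 100 ), "\n";    # 100-500
```

The returned string would then be passed after -range in the backtick command, in place of the hand-built `$openCoordStart-$openCoordEnd`.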



          • #6
            I think you might be my new hero Tando, thanks!

            I will point out that my version of the script (1.0.5) is slightly different.
            For me these changes got the program to work:

            line 281: change ( %t --> %g )

            line 1775: remove the "- 1" ( $start - 1 --> $start )

            line 1776: remove the if condition
            It seems that when I use BuildDatabase the seqID takes the form gi|1:3333 (as opposed to gi|1). I just removed the statement, so my $seqID is a full gi|1:333 and not just a number. If this becomes a problem then I should just redefine $seqID.

            line 1778: my script was
            $seq = `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
            I changed ( -entry $1 --> -entry $seqID ).
            $1 is defined as the seqID earlier in the script, but that value doesn't get passed to the subroutine sampleFromDB(). Instead it uses some other definition of $1, and it ended up using "1-0-5" (the script version number) as the entry. My Perl skills are pretty weak and I couldn't determine exactly what was happening here, but your version makes more sense and seems to work.

            Thanks again, I for one, appreciate it!



            • #7
              $1, $2, $3 ... are the special variables that receive the text captured by the 1st, 2nd, 3rd ... groups of a regular expression, respectively.

              The script assumes that $1 receives the sequence ID from the regular expression match at line 1780 ( if ( $seqID =~ /gi\|(\d+)/...).

              However, when nothing matches on this line (no gi| tag), $1 is not updated, and $seq in the following if statement is not set. Unfortunately, $1 still holds the text from an earlier match on the script version, "1-0-5".

              This makes the script abort at the next "die if ($seq eq "") ..." lines and print the "1-0-5" message.
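The stale-capture behaviour described above can be reproduced in a few lines. This is a minimal sketch, not code from RepeatModeler: the version string and both function names are hypothetical stand-ins. Whether blastdbcmd wants the bare number or the full ID as -entry depends on how the database was built, so the fallback to the full ID is an assumption too.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an earlier, successful match on the version.
"RepeatModeler-open-1-0-5" =~ /open-(\d+-\d+-\d+)/;

# Unguarded: uses $1 whether or not the match succeeded.
sub entry_unguarded {
    my ($seqID) = @_;
    $seqID =~ /gi\|(\d+)/;    # on failure, $1 keeps its previous value
    return $1;
}

# Guarded: use the capture only when the match succeeded,
# otherwise fall back to the full sequence ID.
sub entry_guarded {
    my ($seqID) = @_;
    return $seqID =~ /gi\|(\d+)/ ? $1 : $seqID;
}

print entry_unguarded("scaffold_12"), "\n";    # 1-0-5 (stale capture)
print entry_guarded("scaffold_12"),   "\n";    # scaffold_12
print entry_guarded("gi|42"),         "\n";    # 42
```

Capture variables are dynamically scoped, which is why a match made in the calling code is still visible inside the subroutine.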



              • #8
                Hi guys,

                I got the same problem with RepeatModeler_1.0.5

                I changed the script as you proposed, except for lines 1776 and 1778:
                if ( $seqID =~ /gi\|(\d+)/ ) {
                $seq =
                `$RepModelConfig::NCBIDBCMD_PRGM -db $dbFile -entry $1 -range $openCoordStart-$openCoordEnd`;
                }


                Tando, you said that you ignored the first line and inserted another one.

                What exactly should these lines look like then?

                I would be very grateful for some help.
                Thanks in advance!



                • #9
                  Hi guys!
                  I am trying to install RepeatModeler, but when I give it the RepeatMasker path it returns:
                  “RepeatMasker is too old. Must be open-4.0.0 or later. Install a newer version of RepeatMasker and re-run configure.”

                  So I re-installed the latest version of RepeatMasker (Latest Released Version: 1/10/2013: RepeatMasker-open-4-0-0.tar.gz) and tried again with RepeatModeler, but it keeps saying the same thing, even though this is the version it is asking for.

                  It may be because of the directory name. Mine doesn’t include the version number (open-4.0.0); when I unpacked the archive, it became just RepeatMasker. But that may not be the cause.

                  Any ideas?
                  Thanks in advance

                  Nuria



                  • #10
                    Originally posted by HeyIamNuria
                    Hi Nuria,

                    Modify line 214 of the configure script:
                    '$version <= 400' should be '$version < 400'
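To see why the comparison operator matters, here is a minimal sketch. It assumes the configure script reduces a version string such as "open-4.0.0" to the number 400 before comparing. How the real script derives $version is not shown in this thread, so `version_number` and both check functions are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical: reduce "open-4.0.0" to 400 by dropping non-digits.
sub version_number {
    my ($v) = @_;
    ( my $digits = $v ) =~ s/\D//g;
    return $digits;
}

# Original test: "<=" also rejects exactly 4.0.0.
sub too_old_original { return version_number( $_[0] ) <= 400 }

# Corrected test: "<" rejects only versions older than 4.0.0.
sub too_old_fixed { return version_number( $_[0] ) < 400 }

for my $v ( "open-3.3.0", "open-4.0.0" ) {
    printf "%s original=%s fixed=%s\n", $v,
      ( too_old_original($v) ? "too old" : "ok" ),
      ( too_old_fixed($v)    ? "too old" : "ok" );
}
```

With "<=", open-4.0.0 is reported as too old even though the message asks for "open-4.0.0 or later".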

                    Stephane



                    • #11
                      Thank you

                      Thank you very much for your help Stephane

                      I changed it and it worked!!

                      Nuria



                      • #12
                        RepeatScout fails in RepeatModeler

                        Hello all,

                        I am able to successfully run RepeatModeler (1-0-7) and it returns several hundred repeat models in my genome. However, all of these models are a result of RECON; nothing is returned by RepeatScout. RepeatScout is called during RepeatModeler round 1 but at the end it says "NOTE: RepeatScout did not return any models." RepeatScout is not called again by RepeatModeler. However, when I run RepeatScout directly on my genome it returns several hundred repeat models.

                        Has anybody successfully gotten RepeatScout to return repeat models within RepeatModeler? I don't understand why this would happen since RepeatScout works when I run it outside of RepeatModeler.

                        Any ideas?

                        Thanks,
                        Ben



                        • #13
                          Hi Everyone,
                          I have exactly the same problem that Ben described above: no models are returned by 'RepeatScout' in a 'RepeatModeler' run, but an independent 'RepeatScout' run returns many repeat models.
                          Also, is there any option to make 'RepeatModeler' run faster (e.g. parallel processing like that of RepeatMasker)?

                          Cheers.



                          • #14
                            Hi Ben,

                            RepeatScout is a great program for finding highly conserved repetitive elements. As a consequence, we run RepeatScout first (and for only one round) in order to find and remove the young elements before moving on to RECON. RepeatScout will often find tandem repeats and low-complexity sequences in its return set. These are filtered out in RepeatModeler. You may want to check that your hand-run result set isn't completely simple/low complexity by running nseg/trf on it. Another consideration is your choice of lmer size for RepeatScout. To fairly compare the results from both programs, you need to use the same lmer size and the same sample sequence (from the input). I rarely check seqanswers, so please feel free to contact us through our website if you have further questions ( www.repeatmasker.org ).

                            -R



                            • #15
                              Thank you for your input, Robert. My problem turned out to be with RepeatScout, not RepeatModeler. Lines 26 and 27 of the RepeatScout script "filter-stage-1.prl" are:

                              my $TRF_COMMAND = $ENV{'TRF_COMMAND'} || "trf";
                              my $NSEG_COMMAND = $ENV{'NSEG_COMMAND'} || "nseg";

                              I changed this to:

                              my $TRF_COMMAND = "trf";
                              my $NSEG_COMMAND = "nseg";

                              Note that both "trf" and "nseg" are executables in my path.

                              I don't know Perl, so I don't fully understand what is going on, but I think that RepeatScout was failing to find Tandem Repeats Finder (TRF) and, with nothing coming back from TRF, it decided that everything was a tandem repeat and filtered it all out. However, this must have something to do with calling TRF from within RepeatModeler, since RepeatScout returned models for me when I used it independently, so something odd appears to be happening with paths. Regardless, the RepeatModeler pipeline is now fully functional for me and recovers repeat models from RepeatScout as well as RECON.
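For anyone puzzled by the original two lines, here is a minimal sketch of what `$ENV{...} || "trf"` does. The helper `resolve_command` and the example path are hypothetical, and it takes a plain hash instead of the real %ENV so the cases are easy to compare. The idea that TRF_COMMAND was set to an unusable value during the RepeatModeler run is only an assumption that would be consistent with the behaviour above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring: $ENV{'TRF_COMMAND'} || "trf"
# $env is a hash reference standing in for %ENV.
sub resolve_command {
    my ( $env, $name, $default ) = @_;
    return $env->{$name} || $default;
}

# Variable not set: the default "trf" is used.
print resolve_command( {}, 'TRF_COMMAND', 'trf' ), "\n";

# Variable set (here to a made-up path): the default is never used,
# even if nothing executable exists at that path.
print resolve_command( { TRF_COMMAND => '/no/such/dir/trf' },
    'TRF_COMMAND', 'trf' ), "\n";
```

Hard-coding "trf" and "nseg" removes the environment lookup entirely, so the commands are always found through the PATH.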
