RepeatMasker & RepeatScout


  • #31
    Please help me, guys. Any reply would be appreciated.



    • #32
      It may be a good idea to try a subset of your data (select a few large contigs and/or a known sequence with the right repeats) before you start running a large genome file through some of these tools. Depending on the size of the data set, run times can increase sharply.
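One way to pull out such a subset is to keep only the contigs above a length cutoff. The following is a minimal sketch, not a tool from the RepeatScout or RepeatMasker distributions: the function name `copy_long_records` is hypothetical, the cutoff is whatever you choose, and it assumes each record fits in memory and that any text before the first `>` header can be treated as a headerless record.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the buffered record to out if it has at least min_len residues.
 * Returns 1 if the record was written, 0 otherwise. */
static int flush_record(FILE *out, const char *record, size_t used,
                        size_t residues, size_t min_len)
{
    if (used > 0 && residues >= min_len) {
        fwrite(record, 1, used, out);
        return 1;
    }
    return 0;
}

/* Copy FASTA records with at least min_len non-whitespace sequence characters
 * from in to out. Returns the number of records kept, or -1 if memory for a
 * record could not be allocated. */
long copy_long_records(FILE *in, FILE *out, size_t min_len)
{
    char chunk[4096];
    char *record = NULL;              /* current record, header included */
    size_t used = 0, cap = 0;         /* bytes used / allocated in record */
    size_t residues = 0;              /* sequence characters in the record */
    long kept = 0;
    int line_start = 1, in_header = 0;

    assert(in != NULL && out != NULL);

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        if (n == 0)
            continue;
        if (line_start && chunk[0] == '>') {
            /* A new header closes the previous record. */
            kept += flush_record(out, record, used, residues, min_len);
            used = 0;
            residues = 0;
            in_header = 1;
        }
        if (used + n > cap) {
            size_t new_cap = (used + n) * 2;
            char *grown = realloc(record, new_cap);
            if (grown == NULL) {
                free(record);
                return -1;
            }
            record = grown;
            cap = new_cap;
        }
        memcpy(record + used, chunk, n);
        used += n;
        if (!in_header) {
            for (size_t i = 0; i < n; i++)
                if (!isspace((unsigned char)chunk[i]))
                    residues++;
        }
        int ends_line = (chunk[n - 1] == '\n');
        if (ends_line)
            in_header = 0;            /* a long header may span several chunks */
        line_start = ends_line;
    }
    kept += flush_record(out, record, used, residues, min_len);
    free(record);
    return kept;
}
```

Called with the genome file opened for reading, an output file opened for writing, and a cutoff such as 100000, this writes only the contigs of at least that length, which gives a smaller file for a first test run.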



      • #33
        Thank you, GenoMax.

        I did that and got a result. I have one more problem: I have installed RepeatModeler, but when I build the database it shows an error:

        ./BuildDatabase -name test test.fa

        RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
        BEGIN failed--compilation aborted at ./BuildDatabase line 146.

        Can you tell me why this error occurs?



        • #34
          Originally posted by tnguyen
          Hi Rahul,

          How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:

          "Could not allocate space for sequence"

          Please change the code in build_repeat_families.c from

          sequence = (char *) malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) );
          if( NULL == sequence ) {
              fprintf(stderr, "Could not allocate space for sequence\n");
              exit(1);
          }

          to

          sequence = (char *) malloc( (2 * (size_t)MAXLENGTH + 3 * (size_t)PADLENGTH) * sizeof(char) );
          if( NULL == sequence ) {
              fprintf(stderr, "Could not allocate space for sequence\n");
              exit(1);
          }

          Otherwise the size calculation is not correct for big numbers (files larger than about 1 GB) and results in a much larger memory allocation request than necessary. I have previously hit this under FreeBSD, Linux and Solaris. The change helped me to overcome the allocation error. It is now running under FreeBSD :-)

          Cheers, sunnyseq
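A minimal sketch of why the cast matters, assuming MAXLENGTH and PADLENGTH are plain `int` values (the thread does not show their declarations). The helper names `requested_bytes`, `overflows_int` and `alloc_sequence` are hypothetical and not part of RepeatScout. Where `int` is 32 bits, `2 * MAXLENGTH` alone exceeds `INT_MAX` once MAXLENGTH passes about 1.07 billion, which is consistent with the "about 1 GB" threshold mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* The size the original expression is meant to produce. It is computed in
 * long long here so that the example itself cannot overflow. */
long long requested_bytes(int maxlength, int padlength)
{
    assert(maxlength >= 0 && padlength >= 0);
    return 2LL * maxlength + 3LL * padlength;
}

/* Returns 1 if 2 * maxlength + 3 * padlength does not fit in an int, which is
 * the case where the uncast malloc argument can no longer be trusted. */
int overflows_int(int maxlength, int padlength)
{
    return requested_bytes(maxlength, padlength) > INT_MAX;
}

/* Allocation with the casts from the post: the arithmetic happens in size_t
 * instead of int. Returns NULL if the allocation fails. */
char *alloc_sequence(int maxlength, int padlength)
{
    assert(maxlength >= 0 && padlength >= 0);
    return malloc((2 * (size_t)maxlength + 3 * (size_t)padlength) * sizeof(char));
}
```

For example, `overflows_int(1200000000, 10000)` reports an overflow because the intended size, 2400030000 bytes, is larger than the 32-bit `INT_MAX` of 2147483647, while `alloc_sequence` computes the same size correctly on a 64-bit system.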



          • #35
            Hi guys, I still have the same problem that people in this list previously had.

            I followed the suggestions above, and here is my command for running step 2 of RepeatScout:

            RepeatScout
            -sequence genome.fasta
            -output genome_repeat.fasta
            -freq genome.freq
            -l 14

            I get this error: "Could not allocate space for sequence".

            I ran the test file and it runs, so the installation is not the problem. I did notice that the genome.fasta file in the test is only one consensus FASTA sequence, whereas my genome.fasta is an assembly containing multiple contigs, also in FASTA format. I should add that the machine has plenty of memory, so I doubt that is the problem.

            Does anybody have a suggestion?

            Thanks a lot, Solidether



            • #36
              Originally posted by solidether
              I get this error: "Could not allocate space for sequence".

              I have had the same experience. It happens with genomes larger than roughly 2 GB. I guess the problem is with the allocation within RepeatScout itself. You can give it as much RAM as you want, but I think one of the variables is declared with the wrong type, so it cannot hold a larger value. So I guess it's a bug.



              • #37
                The error message "Could not allocate space for sequence"

                The reason for this error is in the RepeatScout software itself.

                In the source file build_repeat_families.c, memory is allocated in two places with the call:
                malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) )

                This call tries to allocate an amount of memory based on the size of your input file. However, for some reason the allocation fails when the input file is larger than 2 GB.

                I don't know enough about C programming to say why there is this 2 GB limit. Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory allocation is always 5 GB: malloc( 5000000000 ).

                After these modifications I was able to run the RepeatScout analysis.



                • #38
                  It's probably because (2 * MAXLENGTH + 3 * PADLENGTH) is computed as a signed int and overflows before it is multiplied by sizeof(char). I suspect casting the terms to 64-bit integers would work.



                  • #39
                    Originally posted by solidether
                    Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory allocation is always 5 GB: malloc( 5000000000 ).

                    I've changed three instances of this allocation, two in build_repeat_families.c and one in build_lmer_table. While I no longer see the allocation error, build_lmer_table finishes almost immediately, with:

                    Done allocating headptr
                    Done building headptr
                    There are 0 l-mers
                    Done sorting headptr
                    OOPS no good lmers

                    Any ideas?



                    • #40
                      Hello everyone, I get an error when I run the second RepeatScout command. If anyone has an idea, please share.

                      $ ./RepeatScout -sequence Ca_dromedarius_kacst.fna -output output_repeats -freq output -l 14

                      RepeatScout(9531,0x7fff9faf2380) malloc: *** mach_vm_map(size=18446744073479073792) failed (error code=3)
                      *** error: can't allocate region
                      *** set a breakpoint in malloc_error_break to debug
                      Could not allocate space for sequence
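The size in this log looks like what the overflow discussed earlier in the thread would produce: a negative 32-bit `int` converted to a 64-bit `size_t` becomes a value just below 2^64. The sketch below checks that arithmetic. It is an interpretation of the log, not something confirmed in the thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 32-bit signed value to an unsigned 64-bit one, as happens when a
 * negative int is passed where a 64-bit size_t is expected.
 * For negative input the result is 2^64 + value. */
uint64_t as_unsigned_64(int32_t value)
{
    return (uint64_t)value;
}

/* Recover the negative 32-bit value whose conversion produced size.
 * Only meaningful for sizes within 2^31 of the top of the 64-bit range. */
int32_t wrapped_int_from_size(uint64_t size)
{
    assert(size >= UINT64_MAX - (uint64_t)INT32_MAX);
    int64_t value = -(int64_t)(UINT64_MAX - size) - 1;
    return (int32_t)value;
}

/* The low 32 bits of the size, i.e. the intended byte count modulo 2^32. */
uint32_t low_32_bits(uint64_t size)
{
    return (uint32_t)size;
}
```

Applied to the reported size, `wrapped_int_from_size(18446744073479073792ULL)` gives -230477824 and `low_32_bits` gives 4064489472. If the interpretation holds and the value wrapped only once, the program meant to request about 4.06 GB, which is more than a 32-bit signed `int` can represent.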



                      • #41
                        Hello. I know this is an old thread, but I can't find anyone able to answer.
                        I'm running RepeatScout. I built the l-mer table, myfile.freq, from myfile.fa.
                        Can anyone tell me what the second and third columns of the output mean?
                        Here is an example:

                        ```
                        AAAAAAAAGCGGGA 3 107776875
                        AAAAAAACTGTATG 10 83440519
                        AAAAAAAAGGCGTA 3 41037187
                        AAAAAAACTTGAAT 7 94493612
                        CATACATGCATGCA 1065 125671338
                        CATACATGCTTGAA 7 121799834
                        AAAAAAATCATGCA 10 95493021
                        AAAAAAAGTCCAGT 3 125127980
                        AATTCACATGTATG 7 102505668
                        ```
                        Thank you
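The thread does not answer this question. For anyone who wants to inspect such a file programmatically, here is a minimal sketch that reads one row into a struct. The field names `count` and `position` are assumptions only (the second column resembles an occurrence count and the third a coordinate in the sequence); nothing in this thread confirms them, and `parse_lmer_row` is a hypothetical helper, not part of RepeatScout.

```c
#include <assert.h>
#include <stdio.h>

#define LMER_MAX 64

/* One row of the l-mer table shown above. */
struct lmer_row {
    char lmer[LMER_MAX]; /* first column: the l-mer itself */
    long count;          /* second column: ASSUMED to be an occurrence count */
    long position;       /* third column: ASSUMED to be a sequence position */
};

/* Parse one whitespace-separated line into row.
 * Returns 1 if all three columns were read, 0 otherwise. */
int parse_lmer_row(const char *line, struct lmer_row *row)
{
    assert(line != NULL && row != NULL);
    return sscanf(line, "%63s %ld %ld",
                  row->lmer, &row->count, &row->position) == 3;
}
```

With the fifth row of the example, `parse_lmer_row("CATACATGCATGCA 1065 125671338", &row)` fills in the l-mer, 1065 and 125671338, so rows can be sorted or filtered on the second column to find the most frequent l-mers.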
