RepeatMasker & RepeatScout




  • RepeatMasker & RepeatScout

    Hello there,

    I was wondering whether anybody on this list knows how to run RepeatScout (1.0.5) and RepeatMasker (3.2.8).

    Basically I have a new genome, and want to use RepeatScout to make a
    library for RepeatMasker.

    Here is what I do:

    build_lmer_table -sequence genome.fa -freq genome.fq
    RepeatScout -sequence genome.fa -output repeats.fa -freq genome.fq
    filter-stage-1.prl repeats.fa &> repeats.fa.filter_1
    RepeatMasker genome.fa -e abblast -lib repeats.fa.filter_1

    Am I using the correct file for -lib?
    RepeatMasker is still complaining about not finding Libraries/RepeatMasker.lib
    and Libraries/RepeatMaskerLib.embl.

    Thanks a lot in advance for any help.

  • #2
    Here is a recipe for installing and running RepeatScout:


    Hope it helps,



    • #3
      I actually followed those instructions.

      RepeatMasker is still complaining about missing libraries (i.e. Libraries/RepeatMasker.lib etc.) and advises me to get something from www.girinst.org.

      The whole point of running RepeatScout for me is to build my own library. Is there a flag that tells RepeatMasker not to look for those libraries, or is there a reason RepeatMasker must have them?


      • #4
        Hi Zimbobo,

        Can I ask how you edited the Perl script filter-stage-1.prl so that it points to the TRF path we installed?
        Which line of filter-stage-1.prl needs to be edited to set the TRF path?
        My server keeps showing the message below:
        "No such file or directory at ./filter-stage-1.prl line 110"
        Thanks a lot for sharing and for your guidance.


        • #5

          Hello everyone,
          I am trying to work with RepeatScout, but every time I run filter-stage-1.prl the filtered library it generates is empty (no data). Any solution?


          • #6
            The same thing happened to me: the filtered output file is empty after running for a very long time. It was run on a repeat-rich genome.

            Could it be that I don't have nseg and TRF properly installed? There is no output about those two programs that I can see.


            • #7
              I don't know if it's still a current problem, but I had it too and was able to solve it on my system (Ubuntu 11, 64-bit).
              The libs RepeatMasker is looking for are not the downloaded ones but the BLAST databases that should have been created by rmblast. rmblast itself looks for a libpcre.so.0 file, which it could not find on my system. This file is known to cause problems with some programs because symlinks are not created correctly during updates.
              Therefore I just created symlinks manually in my /lib/ and /lib32/ folders pointing to the actual file (just type "sudo ln -s /lib/libpcre.so.3 /lib/libpcre.so.0" and "sudo ln -s /lib32/libpcre.so.3 /lib32/libpcre.so.0"), and afterwards everything worked fine for me.
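
Before creating symlinks by hand, it may help to confirm which shared libraries are actually unresolved on your machine. A minimal sketch, assuming a Linux system with `ldd`; the function name `missing_libs` is made up for illustration, and `rmblastn` is assumed to be the name of the RMBlast binary on your PATH:

```shell
# List shared libraries that the dynamic linker cannot resolve for a binary.
# Prints nothing when every dependency is found.
missing_libs() {
    bin=$1
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd not available on this system" >&2
        return 0
    fi
    # ldd marks unresolved dependencies with "not found"
    ldd "$bin" 2>/dev/null | grep 'not found' || true
}

# Example: check the RMBlast binary if it is on PATH (binary name is an assumption)
if command -v rmblastn >/dev/null 2>&1; then
    missing_libs "$(command -v rmblastn)"
fi
```

If a line mentioning libpcre.so.0 shows up in the output, the symlink workaround described in this post is likely relevant to your setup; if nothing is printed, the library problem probably lies elsewhere.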

              @edge: you don't need to change anything in the .prl file, but you need to rename the trf404-linux64 (or whichever) executable to simply trf.
              Last edited by WhatsOEver; 04-20-2012, 01:27 AM.
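
An alternative to renaming is to leave the downloaded file untouched and expose it under the name `trf` through a symlink in a directory on your PATH. A rough sketch, assuming (as the rename advice implies) that the filter script finds its helpers by the names `trf` and `nseg` on PATH; the helper names `link_tool` and `check_tools` and the example paths are hypothetical:

```shell
# Expose an executable under a different name via a symlink in a bin directory.
link_tool() {
    src=$1; name=$2; bindir=$3
    mkdir -p "$bindir"
    ln -sf "$src" "$bindir/$name"
}

# Report which of the named tools cannot be found on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return $status
}

# Example (paths are placeholders):
#   link_tool /opt/trf/trf404-linux64 trf "$HOME/bin"
#   export PATH="$HOME/bin:$PATH"
#   check_tools trf nseg
```

Running `check_tools trf nseg` before filter-stage-1.prl gives a quick answer to the "are nseg and TRF properly installed?" question raised earlier in the thread.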


              • #8

                This is not a direct answer to your question, but there is a tool from the RepeatMasker group.
                It's called RepeatModeler; this tool integrates RepeatScout, RECON and TRF.
                It creates a de novo repeat library and then annotates the sequences.
                RepeatModeler



                • #9
                  That's true and it works fine, but RepeatModeler also uses RepeatMasker and ultimately the rmblast package, so you might face the same problems as described before.


                  • #10
                    Dear All,
                    I ran RepeatScout successfully. Commands I used:
                    ## RepeatScout run
                    # step 1: build the l-mer frequency table
                    build_lmer_table -l 14 -sequence Final_assembly.fasta -freq Final_assembly.freq
                    # step 2: build the de novo repeat library
                    RepeatScout -sequence Final_assembly.fasta -output Final_assembly_repeats.fasta -freq Final_assembly.freq -l 14
                    # step 3: first filtering stage
                    cat Final_assembly_repeats.fasta | filter-stage-1.prl > Final_assembly_repeats_filtered_stg1.fasta
                    # step 4: mask with the stage-1 library (wait for this job to finish; step 5 reads its Final_assembly.fasta.out)
                    RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg1.fasta Final_assembly.fasta &
                    # step 5: second filtering stage, based on the RepeatMasker output of step 4
                    cat Final_assembly_repeats_filtered_stg1.fasta | filter-stage-2.prl --cat=Final_assembly.fasta.out --thresh=3 > Final_assembly_repeats_filtered_stg2_thresh3.fasta
                    # step 6: mask again with the stage-2 library
                    RepeatMasker -pa 20 -s -lib Final_assembly_repeats_filtered_stg2_thresh3.fasta Final_assembly.fasta &
                    Rahul Sharma,
                    Frankfurt am Main, Germany
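
Since several posters in this thread ended up with an empty filtered library, it may be worth counting FASTA records between steps before launching a long RepeatMasker run. A small sketch; the function names are made up here, and the file name in the example follows the commands in this post:

```shell
# Count FASTA records (header lines starting with '>') in a file.
# Prints 0 for a missing or empty file.
count_records() {
    if [ ! -s "$1" ]; then
        echo 0
        return 0
    fi
    grep -c '^>' "$1" || true
}

# Fail early if a library is empty; otherwise report how many sequences it holds.
require_records() {
    n=$(count_records "$1")
    if [ "$n" -eq 0 ]; then
        echo "ERROR: no sequences in $1" >&2
        return 1
    fi
    echo "$1: $n sequences"
}

# Example use between steps 3 and 4:
#   require_records Final_assembly_repeats_filtered_stg1.fasta || exit 1
```

The same check can be repeated after step 5, so that RepeatMasker is never started with a library that contains no consensus sequences.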


                    • #11
                      Hi Rahul,

                      How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:

                      "Could not allocate space for sequence"
                      Last edited by tnguyen; 09-22-2012, 07:01 AM.


                      • #12
                        Sorry the full error message was:

                        "Could not allocate space for sequence"
                        Last edited by tnguyen; 09-22-2012, 07:02 AM.


                        • #13
                          Hi tnguyen,
                          Sorry for the late reply. One genome was ~20 Mb and the other was in the Gb range. I actually ran it on the cluster and didn't check how much memory it used.
                          Best wishes,
                          Rahul Sharma,
                          Frankfurt am Main, Germany


                          • #14
                            Thank you Rahul,
                            My genome size is ~1.7 Gb. Any idea how to make RepeatScout work for a large genome?


                            • #15
                              You probably don't need to use the whole genome for RepeatScout. Just use a few chromosomes or supercontigs. If repeats are distributed across all the chromosomes in the genome, scanning just a few of them with RepeatScout should be enough to find them and create consensus sequences that you can input to RepeatMasker. Then mask the whole genome with RepeatMasker.
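
One way to act on this is to pull a handful of sequences out of the assembly into a smaller FASTA and run build_lmer_table and RepeatScout on that. A minimal sketch using awk; `first_n_records` is a hypothetical helper, and taking the first N records is only an assumption about what makes a representative subset (in many assemblies the first records are the largest scaffolds, but that is not guaranteed):

```shell
# Print the first N records of a (possibly multi-line) FASTA file.
first_n_records() {
    n=$1; fasta=$2
    awk -v n="$n" '
        /^>/ { seen++ }          # a header line starts a new record
        seen > n { exit }        # stop once N records have been printed
        { print }
    ' "$fasta"
}

# Example (file names are placeholders):
#   first_n_records 5 genome.fa > subset.fa
#   build_lmer_table -sequence subset.fa -freq subset.freq
#   RepeatScout -sequence subset.fa -output subset_repeats.fa -freq subset.freq
```

The resulting library would then be passed to RepeatMasker with -lib against the full genome, as suggested in this post.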

