  • How to construct a combined library for repeatmasker

    Thanks for your attention.
    I am constructing a repeat library for a genome of ~970 Mb.
    First, I used RepeatModeler to generate a de novo repeat consensus library (libA.fas).
    At the same time, I used ltr_struc and ltr_finder to generate an LTR sequence library (libB.fas).
    Then I concatenated (cat) libA.fas, libB.fas, the RepBase library and another library from MIPS into one file (LIB.fas).
    But I got a weird result.
    When I used LIB.fas as the input for the "-lib" option of RepeatMasker, 24.45 % of the genome was masked.
    But when I used libA.fas (the output of RepeatModeler) as the input library, 47.78 % was masked.

    Can anyone tell me why the smaller library masks a larger repeat region?
    Some parameters differ between the two runs, but I cannot tell which one could cause such a large difference.

    Thanks a lot!

    My commands for RepeatMasker are:
    For libA.fas:
    RepeatMasker -pa 10 genome.fa -no_is -nolow -norna -lib libA.fas

    For LIB.fas:
    RepeatMasker -lib database/LIB.fas -xsmall -no_is -nolow -pa 10 -frag 4000000 -a -gff genome.fa >Rmask_genome.out
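
One way to narrow this down is to list the options that appear in only one of the two command lines. The following is a minimal sketch in POSIX shell. The function name only_in_first is hypothetical, and it simply compares whitespace-separated tokens, so option values such as file names show up as differences too.

```shell
#!/bin/sh
# only_in_first CMD_A CMD_B  (hypothetical helper)
# Print the whitespace-separated tokens of CMD_A that do not occur in CMD_B.
only_in_first() {
    tok_a=$(mktemp) || return 1
    tok_b=$(mktemp) || return 1
    # One token per line, sorted in the C locale so that comm accepts the order.
    printf '%s\n' "$1" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$tok_a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | LC_ALL=C sort -u > "$tok_b"
    # -23 suppresses lines unique to the second file and lines common to both.
    LC_ALL=C comm -23 "$tok_a" "$tok_b"
    rm -f "$tok_a" "$tok_b"
}

# The two command lines from the post, copied as plain strings (nothing is executed).
run_a='RepeatMasker -pa 10 genome.fa -no_is -nolow -norna -lib libA.fas'
run_b='RepeatMasker -lib database/LIB.fas -xsmall -no_is -nolow -pa 10 -frag 4000000 -a -gff genome.fa'

echo "Only in the libA.fas run:"
only_in_first "$run_a" "$run_b"
echo "Only in the LIB.fas run:"
only_in_first "$run_b" "$run_a"
```

Each option that shows up in only one list is a candidate, and can be tested by rerunning with just that one option changed.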

  • #2
    Well, now the reason is found.
    I ran two more test runs that differ only in the "-frag" parameter.
    The run without "-frag 4000000" masked 45.60 % of the genome, close to the expected value.

    So in the future I will use the "-frag" option carefully!

    PS: I did not check the script to see why it has this effect. I cannot find a reason in the help document, where "-frag" is explained as a maximum limit: "Maximum sequence length masked without fragmenting".
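
To compare test runs like these, the masked fraction can also be computed directly from a soft-masked FASTA file. This is a minimal sketch, assuming the run used "-xsmall" so that masked bases are lowercase. The function name softmasked_percent and the file name in the usage comment are hypothetical, and bases that were already lowercase in the input would be counted as masked too.

```shell
#!/bin/sh
# softmasked_percent FASTA  (hypothetical helper)
# Print the percentage of sequence characters that are lowercase a/c/g/t/n.
softmasked_percent() {
    awk '
        !/^>/ {
            total  += length($0)            # count all bases on this sequence line
            masked += gsub(/[acgtn]/, "")   # gsub returns the number of lowercase bases
        }
        END {
            if (total > 0) printf "%.2f\n", 100 * masked / total
            else           print "0.00"
        }
    ' "$1"
}

# Hypothetical usage, assuming the masked output file is named like this:
#   softmasked_percent genome.fa.masked
```

Running it on the masked output of each test run gives numbers that can be compared side by side.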

    • #3
      But there is still a question: why does setting "-frag 4000000" not matter with a library as small as 940 KB?
      I might look into it in the future.

      • #4
        Hi sunhh,

        I have some problems with RepeatModeler and ltr_finder. Can you explain how you constructed the libraries with RepeatModeler, ltr_struc and ltr_finder? ltr_finder has been running for the last 3 days, but the output file size is not increasing. Please guide me...

        Thanks...

        • #5
          Originally posted by amitbik
          Hi sunhh,

          I have some problems with RepeatModeler and ltr_finder. Can you explain how you constructed the libraries with RepeatModeler, ltr_struc and ltr_finder? ltr_finder has been running for the last 3 days, but the output file size is not increasing. Please guide me...

          Thanks...
          Hi amitbik,

          Could you show what problems you ran into? I simply followed the instructions for RepeatModeler and ltr_finder, and they work.
          I didn't use ltr_struc.

          Well, there is one small problem in RepeatModeler: you need to correct the path to RECON in one of its files. Also, after I changed the -num_threads parameter of blastn from 4 to 30, the running time dropped by half.
          I cannot access my computing server now; maybe I can post more details later.

          • #6
            Thank you, sunhh, for your reply.

            Actually, I have installed RepeatModeler, but when I build the database it shows an error:

            ./BuildDatabase -name test test.fa

            RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
            BEGIN failed--compilation aborted at ./BuildDatabase line 146.

            One more thing: the RepModelConfig.pm file is empty.

            For ltr_finder, I am running this command and getting output like this:

            ltr_finder -p 30 -w -C file.fa > ltr.fa

            Output:

            Predict protein Domains 0.000 second
            >Sequence: Contig2 Len:9055
            No LTR Retrotransposons Found


            Do I have to give my assembly file directly to RepeatModeler and ltr_finder, or does it need some filtering first?
            Last edited by amitbik; 02-05-2014, 10:21 PM.

            • #7
              Originally posted by amitbik
              Thank you, sunhh, for your reply.

              Actually, I have installed RepeatModeler, but when I build the database it shows an error:

              ./BuildDatabase -name test test.fa

              RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
              BEGIN failed--compilation aborted at ./BuildDatabase line 146.

              One more thing: the RepModelConfig.pm file is empty.

              Do I have to give my assembly file directly to RepeatModeler and ltr_finder, or does it need some filtering first?
              Hi,
              For building the database, I think you might need to add "-engine ncbi" to the command if your alignment engine is BLAST, as mine is.

              The "line 146" error is most likely caused by the same problem with RepModelConfig.pm.
              That file should not be empty. I advise you to re-download the package and install it again.
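
Perl reports "did not return a true value" when a required file does not end with a true expression, which is what happens with an empty .pm file. A quick check before rerunning BuildDatabase could look like the following. It is a minimal sketch: the function name check_config is hypothetical, and the path assumes the command is run from the RepeatModeler directory.

```shell
#!/bin/sh
# check_config FILE  (hypothetical helper)
# Succeed only if FILE exists and has a size greater than zero.
check_config() {
    if [ -s "$1" ]; then
        echo "ok: $1 exists and is not empty"
    else
        echo "problem: $1 is missing or empty" >&2
        return 1
    fi
}

# Assumed location: the current directory is the RepeatModeler install directory.
check_config RepModelConfig.pm || echo "fix the configuration before running BuildDatabase"
```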

              • #8
                Originally posted by amitbik
                Thank you, sunhh, for your reply.

                Actually, I have installed RepeatModeler, but when I build the database it shows an error:

                ./BuildDatabase -name test test.fa

                RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
                BEGIN failed--compilation aborted at ./BuildDatabase line 146.

                One more thing: the RepModelConfig.pm file is empty.

                For ltr_finder, I am running this command and getting output like this:

                ltr_finder -p 30 -w -C file.fa > ltr.fa

                Output:

                Predict protein Domains 0.000 second
                >Sequence: Contig2 Len:9055
                No LTR Retrotransposons Found


                Do I have to give my assembly file directly to RepeatModeler and ltr_finder, or does it need some filtering first?
                And for ltr_finder, I used a command like this:
                ltr_finder -w 0 -s ref_tRNAs.fa -a /path/to/ps_scan in_genome.fa 1>in_genome.fa.ltrF 2>in_genome.fa.ltrF.err

                It looks different from yours, especially the "-w 0" parameter. I am not sure what "-C" means.

                Best

                • #9
                  Originally posted by sunhh
                  Hi,
                  For building the database, I think you might need to add "-engine ncbi" to the command if your alignment engine is BLAST, as mine is.

                  The "line 146" error is most likely caused by the same problem with RepModelConfig.pm.
                  That file should not be empty. I advise you to re-download the package and install it again.
                  Before I configured RepeatModeler, the RepModelConfig.pm file was not empty. After I configured RepeatModeler and the database, the RepModelConfig.pm file became empty. When I start building the database, it shows the error.

                  • #10
                    Originally posted by amitbik
                    Before I configured RepeatModeler, the RepModelConfig.pm file was not empty. After I configured RepeatModeler and the database, the RepModelConfig.pm file became empty. When I start building the database, it shows the error.
                    Please redo the configuration of RepeatModeler, and record everything this time.
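
One simple way to record a session is to send the combined output through tee. A minimal sketch follows. run_logged is a hypothetical helper, and the commented usage line assumes the configuration script is started with "perl ./configure", which may differ between RepeatModeler versions.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND, show its stdout and stderr on screen, and save a copy in LOGFILE.
# Note: the return status is that of tee, not of COMMAND.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Hypothetical usage (the configure invocation is an assumption):
#   run_logged configure.log perl ./configure
```

The saved log can then be posted together with the error message.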

                    • #11
                      Originally posted by sunhh
                      And for ltr_finder, I used a command like this:
                      ltr_finder -w 0 -s ref_tRNAs.fa -a /path/to/ps_scan in_genome.fa 1>in_genome.fa.ltrF 2>in_genome.fa.ltrF.err

                      It looks different from yours, especially the "-w 0" parameter. I am not sure what "-C" means.

                      Best
                      By mistake I left out the 0 in my command. "-C" is for deleting highly repeated regions.
                      Can you tell me about the three files you gave: in_genome.fa, in_genome.fa.ltrF and in_genome.fa.ltrF.err?
                      What are these files?

                      • #12
                        Originally posted by amitbik
                        By mistake I left out the 0 in my command. "-C" is for deleting highly repeated regions.
                        Can you tell me about the three files you gave: in_genome.fa, in_genome.fa.ltrF and in_genome.fa.ltrF.err?
                        What are these files?
                        Only in_genome.fa is an input file; the other two are output files.
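
In that command, "1>" sends standard output to the first file and "2>" sends standard error to the second, so the shell creates both files. A minimal sketch with a stand-in function (fake_tool is hypothetical and only imitates a program that writes results to stdout and messages to stderr):

```shell
#!/bin/sh
# fake_tool INPUT  (hypothetical stand-in, not ltr_finder itself)
fake_tool() {
    echo "result for $1"            # goes to standard output
    echo "message about $1" >&2     # goes to standard error
}

# run_split INPUT: write stdout to INPUT.ltrF and stderr to INPUT.ltrF.err
run_split() {
    fake_tool "$1" 1>"$1.ltrF" 2>"$1.ltrF.err"
}
```

After run_split in_genome.fa, the file in_genome.fa.ltrF holds the results and in_genome.fa.ltrF.err holds the messages. With a plain ">" redirection, only standard output is captured.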

                        • #13
                          Thanks, sunhh, for your help.

                          My RepeatModeler is working now, and I can build the database. This time I ran RepeatModeler from a different path and changed the paths of RECON, RepeatScout, etc., and now it works.

                          • #14
                            Hi sunhh,

                            I have a problem with ltr_finder. I am using this command:

                            ltr_finder -w 0 -s trna.fa -a ./ps_scan/ uni.fa > uni_ltr.txt

                            It ran for around 16 hours, and the two files uni.fa.ltrf and uni.fa.ltrf.err are empty. It also showed an error: "cannot find resonable bandwith: continue anyway".

                            Can you tell me why this error occurred and why the two files are empty?

                            Thank you...

                            • #15
                              Can anyone help me find the cause of this error?
