RepeatMasker & RepeatScout

  • fripeki
    replied
    Hello. I know this is an old thread, but I haven't found anyone who can answer.
    I'm running RepeatScout. I built the l-mer table, myfile.freq, from myfile.fa.
    Can anyone tell me what the second and third columns of the output mean?
    Here is an example:

    ```
    AAAAAAAAGCGGGA 3 107776875
    AAAAAAACTGTATG 10 83440519
    AAAAAAAAGGCGTA 3 41037187
    AAAAAAACTTGAAT 7 94493612
    CATACATGCATGCA 1065 125671338
    CATACATGCTTGAA 7 121799834
    AAAAAAATCATGCA 10 95493021
    AAAAAAAGTCCAGT 3 125127980
    AATTCACATGTATG 7 102505668
    ```
    Thank you

  • bioinfo441
    replied
    Hello everyone, I get an error when I run the second RepeatScout command. If anyone has an idea, please share.

    $ ./RepeatScout -sequence Ca_dromedarius_kacst.fna -output output_repeats -freq output -l 14

    RepeatScout(9531,0x7fff9faf2380) malloc: *** mach_vm_map(size=18446744073479073792) failed (error code=3)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Could not allocate space for sequence

  • bryantd
    replied
    Originally posted by solidether
    The error message "Could not allocate space for sequence":
    The reason for this error is in the RepeatScout software itself.

    In the source code file "build_repeat_families.c" there are two
    places where memory is allocated with the call:
    malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) )

    This call tries to allocate an amount of memory based on the size of your input file. However, for some reason the allocation fails when the input file is larger than 2 GB.

    I don't know enough about C programming to say why there is
    this 2 GB limit. Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory
    allocation is always 5 GB. ( malloc( 5000000000 ) )

    After these modifications I was able to run the RepeatScout analysis.

    I've changed three instances of this allocation, two in build_repeat_families.c and one in build_lmer_table. While I no longer see the allocation error, build_lmer_table finishes almost immediately, with:

    Done allocating headptr
    Done building headptr
    There are 0 l-mers
    Done sorting headptr
    OOPS no good lmers

    Any ideas?

  • Brian Bushnell
    replied
    It's probably because ((2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) ) is a signed int. I suspect casting the terms as 64-bit integers would work.
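
    To illustrate that suspicion, here is a minimal sketch, assuming MAXLENGTH and PADLENGTH are 32-bit ints (an assumption, not checked against the RepeatScout source). The function names and the sample values are hypothetical. Signed overflow is undefined behaviour in C, so the 32-bit wraparound is emulated with unsigned arithmetic and then compared with the same expression computed in 64 bits.

    ```c
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Emulates (2 * maxlength + 3 * padlength) as evaluated in 32-bit int
     * arithmetic. Unsigned math wraps in a well-defined way, so it is used
     * here instead of triggering real signed overflow. */
    int32_t size_expr_32bit(int32_t maxlength, int32_t padlength)
    {
        uint32_t wrapped = (uint32_t)(2u * (uint32_t)maxlength
                                      + 3u * (uint32_t)padlength);
        if (wrapped <= (uint32_t)INT32_MAX)
            return (int32_t)wrapped;
        /* Map the upper half of the unsigned range to negative values. */
        return (int32_t)((int64_t)wrapped - INT64_C(4294967296));
    }

    /* The same expression with every term widened to 64 bits first. */
    uint64_t size_expr_64bit(int32_t maxlength, int32_t padlength)
    {
        return 2 * (uint64_t)maxlength + 3 * (uint64_t)padlength;
    }

    /* What a 64-bit malloc() receives when a (possibly negative) int is
     * converted to an unsigned 64-bit size. */
    uint64_t as_malloc_request(int32_t n)
    {
        return (uint64_t)(int64_t)n;
    }

    #ifdef DEMO_MAIN
    int main(void)
    {
        /* Hypothetical values, chosen only so that the sum exceeds INT32_MAX. */
        int32_t maxlength = 1500000000, padlength = 20000;
        int32_t narrow = size_expr_32bit(maxlength, padlength);

        printf("32-bit result:  %" PRId32 "\n", narrow);
        printf("malloc request: %" PRIu64 "\n", as_malloc_request(narrow));
        printf("64-bit result:  %" PRIu64 "\n",
               size_expr_64bit(maxlength, padlength));
        return 0;
    }
    #endif
    ```

    Compile with -DDEMO_MAIN to get a small demo program. As a cross-check, as_malloc_request(-230477824) gives 18446744073479073792, the size shown in the mach_vm_map error quoted in this thread, which is consistent with a negative 32-bit value being passed to malloc.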

  • solidether
    replied
    The error message "Could not allocate space for sequence"

    The error message "Could not allocate space for sequence":
    The reason for this error is in the RepeatScout software itself.

    In the source code file "build_repeat_families.c" there are two
    places where memory is allocated with the call:
    malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) )

    This call tries to allocate an amount of memory based on the size of your input file. However, for some reason the allocation fails when the input file is larger than 2 GB.

    I don't know enough about C programming to say why there is
    this 2 GB limit. Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory
    allocation is always 5 GB. ( malloc( 5000000000 ) )

    After these modifications I was able to run the RepeatScout analysis.

  • mke
    replied
    Originally posted by solidether
    Hi guys, I still have the same problem that other people in this thread had.

    I followed the suggestions above, and here is my command for running step 2 of RepeatScout:

    RepeatScout
    -sequence genome.fasta
    -output genome_repeat.fasta
    -freq genome.freq
    -l 14

    I get this error: "Could not allocate space for sequence".

    I ran the test file and it runs, so the installation is not the problem. However, I noticed that the genome.fasta file in the test is a single consensus FASTA sequence, whereas my genome.fasta is an assembly containing multiple contigs in FASTA format. I should also add that I am giving the machine plenty of memory, so I doubt that is the problem.

    Does anybody have a suggestion?

    Thanks a lot, Solidether

    I have had the same experience. It happens with genomes bigger than roughly 2 GB. The problem, I guess, is with the allocation within RepeatScout itself. You can give it as much RAM as you want, but I think one of the variables is declared with the wrong type, so it cannot hold a larger value. So I guess it's a bug.

  • solidether
    replied
    Hi guys, I still have the same problem that other people in this thread had.

    I followed the suggestions above, and here is my command for running step 2 of RepeatScout:

    RepeatScout
    -sequence genome.fasta
    -output genome_repeat.fasta
    -freq genome.freq
    -l 14

    I get this error: "Could not allocate space for sequence".

    I ran the test file and it runs, so the installation is not the problem. However, I noticed that the genome.fasta file in the test is a single consensus FASTA sequence, whereas my genome.fasta is an assembly containing multiple contigs in FASTA format. I should also add that I am giving the machine plenty of memory, so I doubt that is the problem.

    Does anybody have a suggestion?

    Thanks a lot, Solidether

  • sunnyseq
    replied
    Originally posted by tnguyen
    Hi Rahul,

    How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:

    "Could not allocate space for sequence"

    Please change the code in build_repeat_families.c from

    sequence = (char *) malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) );
    if( NULL == sequence ) {
        fprintf(stderr, "Could not allocate space for sequence\n");
        exit(1);
    }

    to

    sequence = (char *) malloc( (2 * (size_t)MAXLENGTH + 3 * (size_t)PADLENGTH) * sizeof(char) );
    if( NULL == sequence ) {
        fprintf(stderr, "Could not allocate space for sequence\n");
        exit(1);
    }

    Otherwise the size calculation for big inputs (files larger than about 1 GB) is not correct and results in a much, much bigger memory allocation than necessary. I have had this situation before under FreeBSD, Linux and Solaris. That change helped me overcome the allocation error... It is now running under FreeBSD :-)

    Cheers, sunnyseq
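
    A slightly more defensive variant of the cast above, as a sketch only: do the arithmetic in size_t and reject sizes that cannot be represented before calling malloc. The helper names sequence_buffer_size and alloc_sequence_buffer are hypothetical and are not part of RepeatScout; the formula is the one quoted in this thread.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Returns 2 * maxlength + 3 * padlength computed in size_t, or 0 if an
     * input is negative or the result would not fit in size_t. */
    size_t sequence_buffer_size(long long maxlength, long long padlength)
    {
        unsigned long long m, p;
        size_t a, b;

        if (maxlength < 0 || padlength < 0)
            return 0;
        m = (unsigned long long)maxlength;
        p = (unsigned long long)padlength;
        /* Check each product before computing it. */
        if (m > SIZE_MAX / 2 || p > SIZE_MAX / 3)
            return 0;
        a = 2 * (size_t)m;
        b = 3 * (size_t)p;
        /* Check the sum before computing it. */
        if (a > SIZE_MAX - b)
            return 0;
        return a + b;
    }

    /* Allocates the sequence buffer, or returns NULL if the size is invalid
     * or malloc fails. The caller frees the result with free(). */
    char *alloc_sequence_buffer(long long maxlength, long long padlength)
    {
        size_t n = sequence_buffer_size(maxlength, padlength);
        return (n == 0) ? NULL : malloc(n);
    }
    ```

    With this shape, a bad size shows up as a NULL return that can be reported clearly, instead of as an enormous request handed to malloc.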

  • amitbik
    replied
    Thank you, GenoMax.

    I did that and got the result. I have one more problem:
    I have installed RepeatModeler, but when I build the database it shows an error:

    ./BuildDatabase -name test test.fa

    RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
    BEGIN failed--compilation aborted at ./BuildDatabase line 146.

    Can you tell me why this error occurs?

  • GenoMax
    replied
    It may be a good idea to try a subset of your data (select a few large contigs and/or a known sequence with the right repeats) before you start running a large genome file through some of these tools. Depending on the size of the data set, the run times can increase dramatically.
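
    One way to pull out such a subset, as a rough sketch: keep only the FASTA records whose sequence is at least a given length. The function name filter_fasta and the default threshold are hypothetical choices for illustration, and each record is held in memory before it is written.

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Copies FASTA records whose sequence has at least min_len residues
     * from `in` to `out`. Returns the number of records kept. */
    size_t filter_fasta(FILE *in, FILE *out, size_t min_len)
    {
        char *rec = NULL;        /* text of the current record, header included */
        size_t len = 0, cap = 0; /* bytes used / allocated in rec */
        size_t residues = 0;     /* sequence characters in the current record */
        size_t kept = 0;
        int in_header = 0, at_line_start = 1, have_record = 0;
        int c;

        for (;;) {
            c = fgetc(in);
            /* A '>' at the start of a line, or EOF, closes the current record. */
            if (c == EOF || (c == '>' && at_line_start)) {
                if (have_record && residues >= min_len) {
                    fwrite(rec, 1, len, out);
                    kept++;
                }
                if (c == EOF)
                    break;
                len = 0;
                residues = 0;
                in_header = 1;
                have_record = 1;
            }
            if (!have_record) {          /* skip any text before the first header */
                at_line_start = (c == '\n');
                continue;
            }
            if (len == cap) {            /* grow the record buffer as needed */
                size_t new_cap = cap ? cap * 2 : 4096;
                char *tmp = realloc(rec, new_cap);
                if (tmp == NULL) {
                    free(rec);
                    return kept;
                }
                rec = tmp;
                cap = new_cap;
            }
            rec[len++] = (char)c;
            if (c == '\n')
                in_header = 0;
            else if (!in_header && c != '\r')
                residues++;
            at_line_start = (c == '\n');
        }
        free(rec);
        return kept;
    }

    #ifdef DEMO_MAIN
    int main(int argc, char **argv)
    {
        /* Usage: ./filter 100000 < genome.fasta > subset.fasta */
        size_t min_len = (argc > 1) ? (size_t)strtoull(argv[1], NULL, 10) : 100000;
        size_t kept = filter_fasta(stdin, stdout, min_len);
        fprintf(stderr, "kept %zu records\n", kept);
        return 0;
    }
    #endif
    ```

    Compile with -DDEMO_MAIN to use it as a command-line filter. A length cutoff is only one way to choose a test set; picking contigs known to carry the repeats of interest may be more informative.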

  • amitbik
    replied
    Please help me, guys. Any reply would be appreciated.

  • amitbik
    replied
    Hi DFJ111,

    I followed your steps and it worked fine, but in the .tbl file I am getting this output:

    file name: file.fa
    sequences:        336145
    total length:  330872632 bp  (330872632 bp excl N/X-runs)
    GC level:          39.43 %
    bases masked:  199587278 bp ( 60.32 %)
    ==================================================
                     number of       length   percentage
                     elements*     occupied  of sequence
    --------------------------------------------------
    SINEs:                   0            0 bp    0.00 %
          ALUs               0            0 bp    0.00 %
          MIRs               0            0 bp    0.00 %

    LINEs:                   0            0 bp    0.00 %
          LINE1              0            0 bp    0.00 %
          LINE2              0            0 bp    0.00 %
          L3/CR1             0            0 bp    0.00 %

    LTR elements:            0            0 bp    0.00 %
          ERVL               0            0 bp    0.00 %
          ERVL-MaLRs         0            0 bp    0.00 %
          ERV_classI         0            0 bp    0.00 %
          ERV_classII        0            0 bp    0.00 %

    DNA elements:            0            0 bp    0.00 %
          hAT-Charlie        0            0 bp    0.00 %
          TcMar-Tigger       0            0 bp    0.00 %

    Unclassified:       866174    216405375 bp   65.40 %

    Total interspersed repeats:   216405375 bp   65.40 %


    Small RNA:               0            0 bp    0.00 %

    Satellites:              0            0 bp    0.00 %
    Simple repeats:      51195      2109015 bp    0.64 %
    Low complexity:          0            0 bp    0.00 %
    ==================================================

    * most repeats fragmented by insertions or deletions
      have been counted as one element


    The query species was assumed to be homo
    RepeatMasker version open-4.0.3 , sensitive mode

    run with rmblastn version 2.2.27+
    The query was compared to unclassified sequences in ".../repeats_1.fa"
    RepBase Update 20130422, RM database version 20130422

    Can you explain why most of the output shows 0?

    Thanks in advance...

  • amitbik
    replied
    RepeatModeler error when building the database

    I have installed RepeatModeler, but when I build the database with

    ./BuildDatabase -name test test.fa

    it shows an error, and the RepModelConfig.pm file is empty:

    RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
    BEGIN failed--compilation aborted at ./BuildDatabase line 146.

    Can anyone help me find the cause of this error?

    Thanks..
    Last edited by amitbik; 01-22-2014, 11:18 PM.

  • Lyn Hsiong
    replied
    Originally posted by sunhh
    Changing the threads value for rmblast helps little, but you can do it in the .pm file (you can find that file by running grep for "threads" on the .pm files). Just wait for less than two weeks, and you will get the final result.
    Good luck!

    Thank you very much! But I don't know how to deal with the .pm file (I suppose you meant the file "RepModelConfig.pm"). That file only contains the paths of pre-installed programs (perl, recon, repeatmasker and so on), so where can I set the threads value? And could you please tell me what "grep threads" means exactly? Thank you!

  • sunhh
    replied
    Originally posted by Lyn Hsiong
    Hi, my RepeatModeler run is very slow too, and the input genome is 300M. Maybe abblast also uses only 1 CPU by default, so I tried to assign 10 with "-num_threads 10" like you did; however, RepeatModeler has no such option. Could you please tell me how to set this parameter in RepeatModeler/abblast?
    Thank you very much!
    lyn

    Changing the threads value for rmblast helps little, but you can do it in the .pm file (you can find that file by running grep for "threads" on the .pm files). Just wait for less than two weeks, and you will get the final result.
    Good luck!
