Hello. I know this is an old thread, but I haven't found anyone who can answer this.
I'm running RepeatScout. I built the l-mer table, myfile.freq, from myfile.fa.
Can anyone tell me what the second and third columns of the output mean?
Here is an example:
```
AAAAAAAAGCGGGA 3 107776875
AAAAAAACTGTATG 10 83440519
AAAAAAAAGGCGTA 3 41037187
AAAAAAACTTGAAT 7 94493612
CATACATGCATGCA 1065 125671338
CATACATGCTTGAA 7 121799834
AAAAAAATCATGCA 10 95493021
AAAAAAAGTCCAGT 3 125127980
AATTCACATGTATG 7 102505668
```
Thank you
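The thread itself does not answer this question. For anyone who wants to inspect such a table programmatically, here is a minimal C sketch that parses rows of this shape and picks the row with the largest second column. Reading the second column as an occurrence count and the third as a position in the input sequence is an assumption, not something confirmed in this thread; the struct and function names are hypothetical.

```c
#include <stdio.h>

#define LMER_MAX 64  /* hypothetical upper bound on l-mer length for this sketch */

/* One row of the table. The field meanings are assumptions. */
struct lmer_row {
    char lmer[LMER_MAX + 1];
    long col2;  /* assumed: number of occurrences of the l-mer */
    long col3;  /* assumed: a position in the input sequence */
};

/* Parse one whitespace-separated row; returns 1 on success, 0 otherwise. */
int parse_lmer_row(const char *line, struct lmer_row *row) {
    return sscanf(line, "%64s %ld %ld", row->lmer, &row->col2, &row->col3) == 3;
}

/* Store the row with the largest second column in *best and return its
   index, or return -1 if no line could be parsed. */
int most_frequent(const char *const *lines, int n, struct lmer_row *best) {
    int best_idx = -1;
    for (int i = 0; i < n; i++) {
        struct lmer_row row;
        if (!parse_lmer_row(lines[i], &row)) {
            continue;  /* skip malformed lines */
        }
        if (best_idx < 0 || row.col2 > best->col2) {
            *best = row;
            best_idx = i;
        }
    }
    return best_idx;
}
```

In the example rows above, CATACATGCATGCA stands out with a second-column value of 1065.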
RepeatMasker & RepeatScout
-
Hello everyone, I get an error when I run the second RepeatScout command. If anyone has an idea, please share.
$ ./RepeatScout -sequence Ca_dromedarius_kacst.fna -output output_repeats -freq output -l 14
RepeatScout(9531,0x7fff9faf2380) malloc: *** mach_vm_map(size=18446744073479073792) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Could not allocate space for sequence
-
Originally posted by solidether:
The error message "Could not allocate space for sequence":
The reason for this error is in the RepeatScout software itself.
In the source file "build_repeat_families.c" there are two places where memory is allocated with:
malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) )
This call tries to allocate the right amount of memory based on the size of your input file. However, for some reason the allocation fails when the input file is larger than 2 GB.
I don't know enough about C programming to say why there is this 2 GB limit. Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory allocation is always 5 GB ( malloc( 5000000000 ) ).
After these modifications I was able to run the RepeatScout analysis.
Done allocating headptr
Done building headptr
There are 0 l-mers
Done sorting headptr
OOPS no good lmers
Any ideas?
-
It's probably because the terms in ((2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) ) are signed ints, so the sum can overflow before it is multiplied by sizeof(char). I suspect casting the terms to 64-bit integers would work.
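To make the suspected overflow concrete, here is a small C sketch. It assumes MAXLENGTH and PADLENGTH are 32-bit signed ints (not verified against the RepeatScout source), and the function names are hypothetical. Because signed overflow is undefined behavior in C, the 32-bit wraparound is emulated with unsigned arithmetic instead of being triggered directly. For reference, the size in the malloc error earlier in this thread, 18446744073479073792, equals 2^64 - 230477824, which is the pattern a negative int produces when it is converted to a 64-bit size_t.

```c
#include <stdint.h>

/* The size expression evaluated in 64-bit arithmetic, i.e. the intended
   value. Both arguments are assumed to be non-negative. */
uint64_t size_expr_64(int32_t maxlength, int32_t padlength) {
    return 2 * (uint64_t)maxlength + 3 * (uint64_t)padlength;
}

/* The value the same expression takes in wrapping 32-bit two's-complement
   arithmetic. The wraparound is emulated with unsigned math because real
   signed overflow is undefined behavior. */
int64_t size_expr_32(int32_t maxlength, int32_t padlength) {
    uint32_t wrapped = 2u * (uint32_t)maxlength + 3u * (uint32_t)padlength;
    return wrapped >= 0x80000000u ? (int64_t)wrapped - 4294967296LL
                                  : (int64_t)wrapped;
}

/* What a 64-bit size_t parameter receives when handed that int value:
   a negative number becomes a value just below 2^64. */
uint64_t as_size_t_64(int64_t int_value) {
    return (uint64_t)int_value;
}
```

With the hypothetical values MAXLENGTH = 2032000000 and PADLENGTH = 20000, the intended size is 4064060000 bytes, the wrapped 32-bit result is -230907296, and the request that reaches malloc is 18446744073478644320 bytes.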
-
The error message "Could not allocate space for sequence"
The reason for this error is in the RepeatScout software itself.
In the source file "build_repeat_families.c" there are two places where memory is allocated with:
malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) )
This call tries to allocate the right amount of memory based on the size of your input file. However, for some reason the allocation fails when the input file is larger than 2 GB.
I don't know enough about C programming to say why there is this 2 GB limit. Anyhow, for testing purposes I created a modified RepeatScout version (RepeatScout_fixmem) where the memory allocation is always 5 GB ( malloc( 5000000000 ) ).
After these modifications I was able to run the RepeatScout analysis.
-
Hi guys, I still have the same problem that others in this thread had.
I followed the suggestions above; here is my command for step 2 of RepeatScout:
RepeatScout -sequence genome.fasta -output genome_repeat.fasta -freq genome.freq -l 14
I get this error: "Could not allocate space for sequence".
I ran the test file and it runs, so the installation is not the problem. However, I realized that the genome.fasta file in the test is only a single consensus FASTA sequence, whereas my genome.fasta is an assembly containing multiple contigs (still in FASTA format). I should also add that I am giving the machine plenty of memory, so I doubt that is the problem.
Does anybody have a suggestion?
Thanks a lot, Solidether
-
Originally posted by tnguyen:
Hi Rahul,
How large was your genome? How much memory was needed for your run? I received this error message at the start of Step 2:
"Could not allocate space for sequence"
sequence = (char *) malloc( (2 * MAXLENGTH + 3 * PADLENGTH) * sizeof(char) );
if( NULL == sequence ) {
fprintf(stderr, "Could not allocate space for sequence\n");
exit(1);
}
should be changed to
sequence = (char *) malloc( (2 * (size_t)MAXLENGTH + 3 * (size_t)PADLENGTH) * sizeof(char) );
if( NULL == sequence ) {
fprintf(stderr, "Could not allocate space for sequence\n");
exit(1);
}
Otherwise the size calculation for large inputs (files larger than about 1 GB) is incorrect and results in a much, much larger memory allocation than necessary. I previously ran into this on FreeBSD, Linux, and Solaris. This change got me past the allocation error. It is currently running under FreeBSD :-)
Cheers, sunnyseq
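As a variation on the cast above, the size can also be computed entirely in size_t with an explicit overflow check before malloc is called. This is only a sketch under the assumption that the buffer size is 2 * MAXLENGTH + 3 * PADLENGTH bytes, as in the snippet above; the helper names are hypothetical and are not part of RepeatScout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Compute 2 * maxlength + 3 * padlength in size_t.
   Returns 1 and stores the result in *out on success,
   or returns 0 if the result would not fit in size_t. */
int sequence_buffer_size(size_t maxlength, size_t padlength, size_t *out) {
    if (maxlength > SIZE_MAX / 2 || padlength > SIZE_MAX / 3) {
        return 0;  /* one of the products alone would overflow */
    }
    size_t a = 2 * maxlength;
    size_t b = 3 * padlength;
    if (a > SIZE_MAX - b) {
        return 0;  /* the sum would overflow */
    }
    *out = a + b;
    return 1;
}

/* Allocate the sequence buffer, or return NULL if the size
   overflows or malloc itself fails. The caller frees the result. */
char *alloc_sequence(size_t maxlength, size_t padlength) {
    size_t n;
    if (!sequence_buffer_size(maxlength, padlength, &n)) {
        return NULL;
    }
    return malloc(n);
}
```

The caller can keep the existing NULL check and error message unchanged, since both failure modes are reported as NULL.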
-
Thank you, GenoMax.
I did that and got the result. I have one more problem:
I have installed RepeatModeler, but when I build the database it shows an error:
./BuildDatabase -name test test.fa
RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
BEGIN failed--compilation aborted at ./BuildDatabase line 146.
Can you tell me why this error occurs?
-
It may be a good idea to try a subset of your data (select a few large contigs and/or a known sequence with the right repeats) before you start running a large genome file through some of these tools. Depending on the size of the data set, run times can increase dramatically.
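One way to build such a subset is to keep only contigs above a length cutoff. The following C sketch copies FASTA records whose sequence is at least min_len bases long from one stream to another. The function name and the cutoff are hypothetical, and in practice an existing FASTA toolkit may be more convenient.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the buffered record if it is long enough; returns 1 if written. */
static int flush_record(const char *rec, size_t rec_len, size_t seq_len,
                        size_t min_len, FILE *out) {
    if (rec_len == 0 || seq_len < min_len) {
        return 0;
    }
    fwrite(rec, 1, rec_len, out);
    return 1;
}

/* Copy FASTA records with at least min_len sequence characters from `in`
   to `out`. Returns the number of records written, or -1 if memory runs
   out. Text before the first '>' header is ignored. */
long filter_fasta(FILE *in, FILE *out, size_t min_len) {
    char chunk[4096];
    char *rec = NULL;  /* header and sequence lines of the current record */
    size_t rec_len = 0, rec_cap = 0, seq_len = 0;
    long written = 0;
    int at_line_start = 1, in_header = 0, in_record = 0;

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        if (at_line_start && chunk[0] == '>') {
            /* A new record starts: decide the fate of the previous one. */
            written += flush_record(rec, rec_len, seq_len, min_len, out);
            rec_len = 0;
            seq_len = 0;
            in_header = 1;
            in_record = 1;
        }
        if (in_record) {
            if (rec_len + n > rec_cap) {
                size_t new_cap = (rec_len + n) * 2;
                char *tmp = realloc(rec, new_cap);
                if (tmp == NULL) {
                    free(rec);
                    return -1;
                }
                rec = tmp;
                rec_cap = new_cap;
            }
            memcpy(rec + rec_len, chunk, n);
            rec_len += n;
            if (!in_header) {
                /* Count sequence characters, ignoring line endings. */
                for (size_t i = 0; i < n; i++) {
                    if (chunk[i] != '\n' && chunk[i] != '\r') {
                        seq_len++;
                    }
                }
            }
        }
        /* Lines longer than the chunk buffer arrive in several pieces. */
        at_line_start = (n > 0 && chunk[n - 1] == '\n');
        if (at_line_start) {
            in_header = 0;
        }
    }
    written += flush_record(rec, rec_len, seq_len, min_len, out);
    free(rec);
    return written;
}
```

Each record is buffered whole because its length is only known once the next header (or the end of the file) is reached, so memory use is bounded by the largest contig rather than the whole assembly.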
-
Hi DFJ111,
I followed your steps and it worked fine, but in the .tbl file I am getting this output:
file name: file.fa
sequences:        336145
total length:  330872632 bp  (330872632 bp excl N/X-runs)
GC level:          39.43 %
bases masked:  199587278 bp ( 60.32 %)
==================================================
                number of       length   percentage
                elements*     occupied  of sequence
--------------------------------------------------
SINEs:                  0            0 bp    0.00 %
      ALUs              0            0 bp    0.00 %
      MIRs              0            0 bp    0.00 %
LINEs:                  0            0 bp    0.00 %
      LINE1             0            0 bp    0.00 %
      LINE2             0            0 bp    0.00 %
      L3/CR1            0            0 bp    0.00 %
LTR elements:           0            0 bp    0.00 %
      ERVL              0            0 bp    0.00 %
      ERVL-MaLRs        0            0 bp    0.00 %
      ERV_classI        0            0 bp    0.00 %
      ERV_classII       0            0 bp    0.00 %
DNA elements:           0            0 bp    0.00 %
      hAT-Charlie       0            0 bp    0.00 %
      TcMar-Tigger      0            0 bp    0.00 %
Unclassified:      866174    216405375 bp   65.40 %
Total interspersed repeats:  216405375 bp   65.40 %
Small RNA:              0            0 bp    0.00 %
Satellites:             0            0 bp    0.00 %
Simple repeats:     51195      2109015 bp    0.64 %
Low complexity:         0            0 bp    0.00 %
==================================================
* most repeats fragmented by insertions or deletions
  have been counted as one element
The query species was assumed to be homo
RepeatMasker version open-4.0.3 , sensitive mode
run with rmblastn version 2.2.27+
The query was compared to unclassified sequences in ".../repeats_1.fa"
RepBase Update 20130422, RM database version 20130422
Can you tell me why most of the output shows 0?
Thanks in advance...
-
RepeatModeler error in building database
I have installed RepeatModeler, but when I build the database with
./BuildDatabase -name test test.fa
it shows the following error, and the RepModelConfig.pm file is empty:
RepModelConfig.pm did not return a true value at ./BuildDatabase line 146.
BEGIN failed--compilation aborted at ./BuildDatabase line 146.
Can anyone help me find the cause of this error?
Thanks.
Last edited by amitbik; 01-22-2014, 11:18 PM.
-
Originally posted by sunhh:
Modifying the threads value for rmblast helps only a little, but you can do it in the .pm file (you can find that file by grepping for "threads" in the .pm files). Just wait less than two weeks and you will get the final result.
Good luck!
-
Originally posted by Lyn Hsiong:
Hi, my RepeatModeler run is very slow too, and the input genome is 300M. Maybe abblast also uses only 1 CPU by default, so I tried to assign 10 with "-num_threads 10" as you did; however, RepeatModeler has no such option. Could you please tell me how to set this parameter in RepeatModeler/abblast?
Thank you very much!
lyn
Good luck!