Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
I think you are running out of memory (@Brian will confirm). How much RAM do you have (looks like you are asking for ~18G)?
-
Hi Brian,
I have a some error with BBsplit.sh (BBmap35.66). I get the following error.,
Module slurm/15.08.1 loaded
java -Djava.library.path=/home/software/BBMap/bbmap/jni/ -ea -Xmx18891m -cp /home/software/BBMap/bbmap/current/ align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta basename=PST_407_%.fq outu=PST_407_unmapped.fq
Executing align2.BBSplitter [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta, basename=PST_407_%.fq, outu=PST_407_unmapped.fq]
Converted arguments to [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, basename=PST_407_%.fq, outu=PST_407_unmapped.fq, ref_Cmm382_genome=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta, ref_Xcitri_jx6=/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta]
Creating merged reference file ref/genome/1/merged_ref_1588668246460521.fa.gz
Ref merge time: 1.421 seconds.
Executing align2.BBMap [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, outu=PST_407_unmapped.fq, ref=ref/genome/1/merged_ref_1588668246460521.fa.gz, out_Cmm382_genome=PST_407_Cmm382_genome.fq, out_Xcitri_jx6=PST_407_Xcitri_jx6.fq]
BBMap version 35.66
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.560
Retaining first best site only for ambiguous mappings.
NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
Writing reference.
Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_1588668246460521.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1
Loaded Reference: 0.005 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/ref/index/1/chr1_index_k13_c7_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 4.556 seconds.
Analyzed Index: 2.561 seconds.
Started output stream: 0.526 seconds.
Cleared Memory: 0.209 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 16 mapping threads.
java.lang.ArrayIndexOutOfBoundsException: 0
at stream.FASTQ.quadToRead(FASTQ.java:723)
at stream.FASTQ.toReadList(FASTQ.java:653)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:659)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:638)
Exception in thread "Thread-22" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-23" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-17" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-21" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-25" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-30" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-16" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Detecting finished threads: 0, 1Exception in thread "Thread-19" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-20" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-28" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-29" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-26" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-18" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 2, 3, 4, 5, 6, 7Exception in thread "Thread-27" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-24" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 8, 9, 10, 11, 12, 13, 14Exception in thread "Thread-31" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 15
**************************************************************************
Warning! 16 mapping threads did not terminate normally.
Check the error log; the output may be corrupt or incomplete.
Please submit the full stderr output as a bug report, not just this message.
**************************************************************************
------------------ Results ------------------
Genome: 1
Key Length: 13
Max Indel: 20
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 3200 (446131 bases)
Mapping: 2.454 seconds.
Reads/sec: 1303.92
kBases/sec: 181.79
Read 1 data: pct reads num reads pct bases num bases
mapped: 0.9688% 31 0.7738% 3452
unambiguous: 0.3438% 11 0.3373% 1505
ambiguous: 0.6250% 20 0.4364% 1947
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 0.0313% 1 0.0043% 19
semiperfect site: 0.0313% 1 0.0043% 19
Match Rate: NA NA 85.1230% 2941
Error Rate: 17.0455% 30 14.8770% 514
Sub Rate: 17.0455% 30 14.4718% 500
Del Rate: 1.7045% 3 0.0868% 3
Ins Rate: 3.4091% 6 0.3184% 11
N Rate: 0.0000% 0 0.0000% 0
Exception in thread "main" java.lang.RuntimeException: BBMap terminated in an error state; the output may be corrupt.
at align2.BBMap.testSpeed(BBMap.java:489)
at align2.BBMap.main(BBMap.java:34)
at align2.BBSplitter.main(BBSplitter.java:46)
Is there any thing you help with?
Thanks
Leave a comment:
-
Thanks for finding the specific part that triggers the error, that will be really helpful! I'll try to address that as soon as possible (hopefully tomorrow).
Edit: It's fixed now; I'll upload the new version soon.Last edited by Brian Bushnell; 11-12-2015, 03:42 PM.
Leave a comment:
-
cutprimers.sh error
Hello Brian!
I tried to extract a region of interest from the SILVA database with cutprimers.sh (from BBMap v.35.66) and encountered the following error:
Code:Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at java.util.Arrays.copyOfRange(Arrays.java:2551) at jgi.CutPrimers.process(CutPrimers.java:166) at jgi.CutPrimers.main(CutPrimers.java:35)
Reproducible example:
Code:msa.sh in=ERR.fasta out=pr1.sam literal="CCGTTCTTAGTTGGTGG" msa.sh in=ERR.fasta out=pr2.sam literal="TAACAAGGTTTCCGTAGGTG" cutprimers.sh in=ERR.fasta out=ERR_cut.fasta sam1=pr1.sam sam2=pr2.sam
Edit: Thanks a lot for the fix! With v.35.68 everything works fine!Attached FilesLast edited by vmikk; 11-17-2015, 09:49 PM.
Leave a comment:
-
BBSplit is ideal for purpose of getting one file per organism, and there's no upper limit to the number of genomes it can handle. Note that BBSplit needs to have each reference in its own file (e.g. ref=virus1.fa,virus2.fa, etc), and also it should not be used to produce sam files, just fasta/fastq. There's also Seal, which can function similarly to BBSplit, and produces either one output file per reference file or one per reference sequence, so all your organisms can be in the same reference file if you want. It also has no upper limit on the number of reference sequences.
I'll add a flag to tell BBMap/Pileup to ignore deletions when calculating coverage. It does seem fairly useful, so I'll probably do that soon.
...OK, I added it. I'll upload the new version next week, probably on Monday.Last edited by Brian Bushnell; 10-30-2015, 05:11 PM.
Leave a comment:
-
Thanks Brian!
What if I use BBSplit? How many genomes can BBsplit handle? What I ultimately want to do is to figure out how many reads mapped to each genome and separate those reads by genome so that we can evaluate them later if necessary. For instance, we generate a fastq file with MiSeq, we bbsplit on say 1000 viral genomes, and see which virus has more hits. We then somehow report what percentage of the genome is covered with reads and on average at what fold coverage.
Your kind input is highly appreciated.
James
Leave a comment:
-
It looks like you are doing it correctly. Bases covered by deletion events are treated as covered for "percent covered", but "fold coverage" is calculated by the total number of mapped bases divided by the length, so that is not affected by long deletions. So, only the percent covered column is affected by long deletions. You simply ban long deletions with a flag like "strictmaxindel=10" if you don't want those reads.
Alternately, produce a sam file, and then run it through a different program that treats bases covered by deletion events as not covered. I'm not sure which tool would do that, but probably one exists. Maybe samtools or bedtools?
Leave a comment:
-
Hi Brian,
Thanks a lot for your quick response!
Perhaps you can give me some advice: I have a metagenomic sequencing fast file. I want to know how many reads are mapped to each of the viral genomes and what % of each viral genome is covered at what fold. What is the best way to do that?
Thanks in advance!
James
Leave a comment:
-
Hi James,
BBMap and Pileup consider bases covered by a deletion event as covered. So presumably one or both of your reads mapped with a long deletion ("D" symbol in the cigar string). Please let me know if that's not the case.
Covered_percent is simply 100*Covered_bases/Length.
Leave a comment:
-
Hi Brian,
I am using bbmap.sh to evaluate coverage on a collection of viral genomes (viral all.fna downloaded from NCBI). The command I am using is as follow:
bbmap.sh in=XXX.fastq ref=XXX/all.fna nodisk covstats=xxx.txt
I got results in the XXX.txt file but I am not sure if I understand it correctly. For instance, for one of the gi| entry, I got the following information:
#ID Avg_fold Length Ref-GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC
gi|XXXXXX 0.0926 195859 0.6661 9.2561 18129 1 1 0.9444
How come with one Plus_reads and one minus_reads you can get covered_base=18129? How does the Covered_percent calculated?
Your kind help is highly appreciated.
James
Leave a comment:
-
Unfortunately, I'm not sure how this could be accomplished without changing the scoring matrix, which is currently hard-coded, or else doing custom post-processing the output.
It's an odd request, though, as PacBio reads have substitution errors as well as indels. If not for the indels, the "semiperfectmode" might accomplish what you want - it requires perfect mappings, aside from the parts that go off the ends of contigs.
Perhaps the best bet would be to align with "bbmapskimmer.sh", which tries to report all alignments above a certain score cutoff, rather than looking for the single best alignment like normal BBMap (and it is designed for PacBio reads). Then, subfilter=0 would get rid of sites with substitutions, but shorter sub-free alignments would be retained.
Leave a comment:
-
mapPacBio.sh - weighting soft clipped mappings higher than mappings with substitution
Hi Brian,
I was wondering if there was a way to adjust the mapping score weights such as to favor a mapping that perfectly maps to a reference with soft clipped ends over a mapping with a number of substitutions. Our problem is that we have allele databases of variable length sequences that closely resemble one another (in this case Major Histocompatibility complex class II - macaques) where our pacbio reads end up extending well past the reference sequence, but map perfectly to the reference. We want to include gapped mappings but not mappings with substitutions for the purpose of counting reads to individual ref sequences taking into account pacbio's tendency to introduce random indels.
We have been using bbmap - with the pacbio option and the subfilter=0 option as packaged within Geneious 9.2. This has the result of discarding perfect mappings to allele references where there is a higher scoring hit to a longer reference sequence with substitutions that subsequently gets filtered out post-mapping with the subfilter=0 option.
This looks to give the correct result if we use a full length allele reference file, however many of the alleles we need to count for genotyping don't have a full-length ref sequence identified as of yet.
Leave a comment:
-
BBMap needs a fairly large amount of memory dependent on kmer size, regardless of how small the reference is. You can estimate it like this:
Code:C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=13 A C G T N IUPAC Other GC GC_stdev 0.2462 0.2542 0.2537 0.2459 0.0000 0.0000 0.0000 0.5079 0.0000 Main genome scaffold total: 1 Main genome contig total: 1 Main genome scaffold sequence total: 4.640 MB [I](stuff)...[/I] [B]BBMap minimum memory estimate at k=13: [COLOR="DarkRed"]-Xmx900m[/COLOR] (at least 1000 MB physical RAM)[/B]
Code:C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=12 A C G T N IUPAC Other GC GC_stdev 0.2462 0.2542 0.2537 0.2459 0.0000 0.0000 0.0000 0.5079 0.0000 Main genome scaffold total: 1 Main genome contig total: 1 Main genome scaffold sequence total: 4.640 MB [I](stuff)...[/I] [B]BBMap minimum memory estimate at k=12: [COLOR="DarkRed"]-Xmx480m[/COLOR] (at least 540 MB physical RAM)[/B]
Leave a comment:
-
out of memory error
even with a very small ref and input file, I got the out of memory error. Would you check what is happen?
bbmap.sh in=~/Documents/align/test.fa out=~/Documents/align/outputu.sam
java -Djava.library.path=/home/local/bbmap/jni/ -ea -Xmx700m -cp /home/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=/home/Documents/align/HLAtest.fa out=/home/Documents/align/outputu.sam
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=/home/Documents/align/test.fa, out=/home/Documents/align/outputu.sam]
BBMap version 35.51
Retaining first best site only for ambiguous mappings.
Set genome to 1
Loaded Reference: 0.023 seconds.
Loading index for chunk 1-1, build 1
Generated Index: 0.817 seconds.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at align2.BBIndex.analyzeIndex(BBIndex.java:106)
at align2.BBMap.loadIndex(BBMap.java:410)
at align2.BBMap.main(BBMap.java:32)
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
-
by seqadmin
Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...-
Channel: Articles
03-22-2024, 06:39 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
29 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
52 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Leave a comment: