BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

GenoMax replied

11-13-2015, 10:49 AM
I think you are running out of memory (@Brian will confirm). How much RAM do you have (looks like you are asking for ~18G)?
Leave a comment:
Bioinform replied

11-13-2015, 10:20 AM
Hi Brian,

I have a some error with BBsplit.sh (BBmap35.66). I get the following error.,

Module slurm/15.08.1 loaded
java -Djava.library.path=/home/software/BBMap/bbmap/jni/ -ea -Xmx18891m -cp /home/software/BBMap/bbmap/current/ align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta basename=PST_407_%.fq outu=PST_407_unmapped.fq
Executing align2.BBSplitter [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta, basename=PST_407_%.fq, outu=PST_407_unmapped.fq]

Converted arguments to [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, basename=PST_407_%.fq, outu=PST_407_unmapped.fq, ref_Cmm382_genome=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta, ref_Xcitri_jx6=/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta]
Creating merged reference file ref/genome/1/merged_ref_1588668246460521.fa.gz
Ref merge time: 1.421 seconds.
Executing align2.BBMap [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, outu=PST_407_unmapped.fq, ref=ref/genome/1/merged_ref_1588668246460521.fa.gz, out_Cmm382_genome=PST_407_Cmm382_genome.fq, out_Xcitri_jx6=PST_407_Xcitri_jx6.fq]

BBMap version 35.66
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.560
Retaining first best site only for ambiguous mappings.
NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
Writing reference.
Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_1588668246460521.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1

Loaded Reference: 0.005 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/ref/index/1/chr1_index_k13_c7_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 4.556 seconds.
Analyzed Index: 2.561 seconds.
Started output stream: 0.526 seconds.
Cleared Memory: 0.209 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 16 mapping threads.
java.lang.ArrayIndexOutOfBoundsException: 0
at stream.FASTQ.quadToRead(FASTQ.java:723)
at stream.FASTQ.toReadList(FASTQ.java:653)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:659)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:638)
Exception in thread "Thread-22" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-23" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-17" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-21" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-25" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-30" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-16" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Detecting finished threads: 0, 1Exception in thread "Thread-19" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-20" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-28" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-29" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-26" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-18" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 2, 3, 4, 5, 6, 7Exception in thread "Thread-27" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
Exception in thread "Thread-24" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 8, 9, 10, 11, 12, 13, 14Exception in thread "Thread-31" java.lang.NullPointerException
at align2.AbstractMapThread.run(AbstractMapThread.java:563)
, 15

**************************************************************************
Warning! 16 mapping threads did not terminate normally.
Check the error log; the output may be corrupt or incomplete.
Please submit the full stderr output as a bug report, not just this message.
**************************************************************************

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 20
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 3200 (446131 bases)

Mapping: 2.454 seconds.
Reads/sec: 1303.92
kBases/sec: 181.79

Read 1 data: pct reads num reads pct bases num bases

mapped: 0.9688% 31 0.7738% 3452
unambiguous: 0.3438% 11 0.3373% 1505
ambiguous: 0.6250% 20 0.4364% 1947
low-Q discards: 0.0000% 0 0.0000% 0

perfect best site: 0.0313% 1 0.0043% 19
semiperfect site: 0.0313% 1 0.0043% 19

Match Rate: NA NA 85.1230% 2941
Error Rate: 17.0455% 30 14.8770% 514
Sub Rate: 17.0455% 30 14.4718% 500
Del Rate: 1.7045% 3 0.0868% 3
Ins Rate: 3.4091% 6 0.3184% 11
N Rate: 0.0000% 0 0.0000% 0
Exception in thread "main" java.lang.RuntimeException: BBMap terminated in an error state; the output may be corrupt.
at align2.BBMap.testSpeed(BBMap.java:489)
at align2.BBMap.main(BBMap.java:34)
at align2.BBSplitter.main(BBSplitter.java:46)

Is there any thing you help with?

Thanks
Leave a comment:
Brian Bushnell replied

11-11-2015, 09:06 AM
Thanks for finding the specific part that triggers the error, that will be really helpful! I'll try to address that as soon as possible (hopefully tomorrow).

Edit: It's fixed now; I'll upload the new version soon.

Last edited by Brian Bushnell; 11-12-2015, 03:42 PM.
Leave a comment:
vmikk replied

11-10-2015, 11:39 PM
cutprimers.sh error

Hello Brian!
I tried to extract a region of interest from the SILVA database with cutprimers.sh (from BBMap v.35.66) and encountered the following error:

Code:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at java.util.Arrays.copyOfRange(Arrays.java:2551) at jgi.CutPrimers.process(CutPrimers.java:166) at jgi.CutPrimers.main(CutPrimers.java:35)

I split the database and found that only a small part causes this error, but I can not figure out what is wrong with it. I would appreciate your help.

Reproducible example:

Code:

msa.sh in=ERR.fasta out=pr1.sam literal="CCGTTCTTAGTTGGTGG" msa.sh in=ERR.fasta out=pr2.sam literal="TAACAAGGTTTCCGTAGGTG" cutprimers.sh in=ERR.fasta out=ERR_cut.fasta sam1=pr1.sam sam2=pr2.sam

File is in the attachment.

Edit: Thanks a lot for the fix! With v.35.68 everything works fine!
Attached Files

ERR.zip (17.9 KB, 2 views)
Last edited by vmikk; 11-17-2015, 09:49 PM.
Leave a comment:
Brian Bushnell replied

10-30-2015, 03:52 PM
BBSplit is ideal for purpose of getting one file per organism, and there's no upper limit to the number of genomes it can handle. Note that BBSplit needs to have each reference in its own file (e.g. ref=virus1.fa,virus2.fa, etc), and also it should not be used to produce sam files, just fasta/fastq. There's also Seal, which can function similarly to BBSplit, and produces either one output file per reference file or one per reference sequence, so all your organisms can be in the same reference file if you want. It also has no upper limit on the number of reference sequences.

I'll add a flag to tell BBMap/Pileup to ignore deletions when calculating coverage. It does seem fairly useful, so I'll probably do that soon.

...OK, I added it. I'll upload the new version next week, probably on Monday.

Last edited by Brian Bushnell; 10-30-2015, 05:11 PM.
Leave a comment:
rocksd replied

10-30-2015, 02:44 PM
Thanks Brian!

What if I use BBSplit? How many genomes can BBsplit handle? What I ultimately want to do is to figure out how many reads mapped to each genome and separate those reads by genome so that we can evaluate them later if necessary. For instance, we generate a fastq file with MiSeq, we bbsplit on say 1000 viral genomes, and see which virus has more hits. We then somehow report what percentage of the genome is covered with reads and on average at what fold coverage.

Your kind input is highly appreciated.

James
Leave a comment:
Brian Bushnell replied

10-30-2015, 11:37 AM
It looks like you are doing it correctly. Bases covered by deletion events are treated as covered for "percent covered", but "fold coverage" is calculated by the total number of mapped bases divided by the length, so that is not affected by long deletions. So, only the percent covered column is affected by long deletions. You simply ban long deletions with a flag like "strictmaxindel=10" if you don't want those reads.

Alternately, produce a sam file, and then run it through a different program that treats bases covered by deletion events as not covered. I'm not sure which tool would do that, but probably one exists. Maybe samtools or bedtools?
Leave a comment:
rocksd replied

10-30-2015, 11:10 AM
Hi Brian,

Thanks a lot for your quick response!

Perhaps you can give me some advice: I have a metagenomic sequencing fast file. I want to know how many reads are mapped to each of the viral genomes and what % of each viral genome is covered at what fold. What is the best way to do that?

Thanks in advance!

James
Leave a comment:
Brian Bushnell replied

10-30-2015, 09:44 AM
Hi James,

BBMap and Pileup consider bases covered by a deletion event as covered. So presumably one or both of your reads mapped with a long deletion ("D" symbol in the cigar string). Please let me know if that's not the case.

Covered_percent is simply 100*Covered_bases/Length.
Leave a comment:
rocksd replied

10-30-2015, 07:10 AM
Hi Brian,

I am using bbmap.sh to evaluate coverage on a collection of viral genomes (viral all.fna downloaded from NCBI). The command I am using is as follow:

bbmap.sh in=XXX.fastq ref=XXX/all.fna nodisk covstats=xxx.txt

I got results in the XXX.txt file but I am not sure if I understand it correctly. For instance, for one of the gi| entry, I got the following information:

#ID Avg_fold Length Ref-GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC

gi|XXXXXX 0.0926 195859 0.6661 9.2561 18129 1 1 0.9444

How come with one Plus_reads and one minus_reads you can get covered_base=18129? How does the Covered_percent calculated?

Your kind help is highly appreciated.

James
Leave a comment:
Brian Bushnell replied

10-28-2015, 09:41 AM
Unfortunately, I'm not sure how this could be accomplished without changing the scoring matrix, which is currently hard-coded, or else doing custom post-processing the output.

It's an odd request, though, as PacBio reads have substitution errors as well as indels. If not for the indels, the "semiperfectmode" might accomplish what you want - it requires perfect mappings, aside from the parts that go off the ends of contigs.

Perhaps the best bet would be to align with "bbmapskimmer.sh", which tries to report all alignments above a certain score cutoff, rather than looking for the single best alignment like normal BBMap (and it is designed for PacBio reads). Then, subfilter=0 would get rid of sites with substitutions, but shorter sub-free alignments would be retained.
Leave a comment:
lankage replied

10-28-2015, 08:57 AM
mapPacBio.sh - weighting soft clipped mappings higher than mappings with substitution

Hi Brian,
I was wondering if there was a way to adjust the mapping score weights such as to favor a mapping that perfectly maps to a reference with soft clipped ends over a mapping with a number of substitutions. Our problem is that we have allele databases of variable length sequences that closely resemble one another (in this case Major Histocompatibility complex class II - macaques) where our pacbio reads end up extending well past the reference sequence, but map perfectly to the reference. We want to include gapped mappings but not mappings with substitutions for the purpose of counting reads to individual ref sequences taking into account pacbio's tendency to introduce random indels.

We have been using bbmap - with the pacbio option and the subfilter=0 option as packaged within Geneious 9.2. This has the result of discarding perfect mappings to allele references where there is a higher scoring hit to a longer reference sequence with substitutions that subsequently gets filtered out post-mapping with the subfilter=0 option.

This looks to give the correct result if we use a full length allele reference file, however many of the alleles we need to count for genotyping don't have a full-length ref sequence identified as of yet.
Leave a comment:
mhgseq replied

10-14-2015, 07:08 AM
Thank you very much. After specify K in both command, it works.
Leave a comment:

Brian Bushnell replied

10-13-2015, 02:49 PM

BBMap needs a fairly large amount of memory dependent on kmer size, regardless of how small the reference is. You can estimate it like this:

Code:

C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=13
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2462  0.2542  0.2537  0.2459  0.0000  0.0000  0.0000  0.5079  0.0000

Main genome scaffold total:             1
Main genome contig total:               1
Main genome scaffold sequence total:    4.640 MB
[I](stuff)...[/I]

[B]BBMap minimum memory estimate at k=13:     [COLOR="DarkRed"]-Xmx900m[/COLOR]     (at least 1000 MB physical RAM)[/B]

So, it looks like K=13 with a tiny genome like E.coli still needs 900MB! Let's try K=12:

Code:

C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=12
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2462  0.2542  0.2537  0.2459  0.0000  0.0000  0.0000  0.5079  0.0000

Main genome scaffold total:             1
Main genome contig total:               1
Main genome scaffold sequence total:    4.640 MB
[I](stuff)...[/I]

[B]BBMap minimum memory estimate at k=12:     [COLOR="DarkRed"]-Xmx480m[/COLOR]     (at least 540 MB physical RAM)[/B]

That's more reasonable; K=12 will work with -Xmx480m. Note that you must specify K both when building the index AND mapping.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 29 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News