Wow, that's actually... a lot nicer. Thanks! I'll put it in the next release (and credit you of course).
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Hi,
sorry, the Javva 6 version is not working for me either. If I check my version I get:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
all29c@aahl-02-mel:~/data/019/19.5$
The error is:
Exception in thread "main" java.lang.UnsupportedClassVersionError: align2/BBMap : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: align2.BBMap. Program will exit.
Comment
-
Cool! I did bbduk.sh as well. I've been using those two the most the last couple of days. FYI I had to make sure there wasn't any mixture of tabs and spaces (fortunately Sublime Text 2 can make that swap for me). Also it's a lot easier to write those tedious usage things in a single echo instead of multiple.
Code:usage(){ echo " Written by Brian Bushnell Last modified October 27, 2014 Description: Compares reads to the kmers in a reference dataset, optionally allowing an edit distance. Splits the reads into two outputs - those that match the reference, and those that don't. Can also trim (remove) the matching parts of the reads rather than binning the reads. Usage: bbduk.sh in=<input file> out=<output file> ref=<contaminant files> Input may be stdin or a fasta, fastq, or sam file, compressed or uncompressed. Output may be stdout or a file. If you pipe via stdin/stdout, please include the file type; e.g. for gzipped fasta input, set in=stdin.fa.gz Optional parameters (and their defaults) Input parameters: in=<file> Main input. in=stdin.fq will pipe from stdin. in2=<file> Input for 2nd read of pairs in a different file. ref=<file,file> Comma-delimited list of reference files. touppercase=f (tuc) Change all bases upper-case. interleaved=auto (int) t/f overrides interleaved autodetection. qin=auto Input quality offset: 33 (Sanger), 64, or auto. reads=-1 If positive, quit after processing X reads or pairs. Output parameters: out=<file> (outnonmatch) Write reads here that do not contain kmers matching the database. 'out=stdout.fq' will pipe to standard out. out2=<file> (outnonmatch2) Use this to write 2nd read of pairs to a different file. outmatch=<file> (outm or outb) Write reads here that contain kmers matching the database. outmatch2=<file> (outm2 or outb2) Use this to write 2nd read of pairs to a different file. outsingle=<file> (outs) Use this to write singleton reads whose mate was trimmed shorter than minlen. stats=<file> Write statistics about which contamininants were detected. duk=<file> Write statistics in duk's format. overwrite=t (ow) Grant permission to overwrite files. showspeed=t (ss) 'f' suppresses display of processing speed. ziplevel=2 (zl) Compression level; 1 (min) through 9 (max). fastawrap=80 Length of lines in fasta output. qout=auto Output quality offset: 33 (Sanger), 64, or auto. Histogram output parameters: bhist=<file> Base composition histogram by position. qhist=<file> Quality histogram by position. aqhist=<file> Histogram of average read quality. bqhist=<file> Quality histogram designed for box plots. lhist=<file> Read length histogram. gchist=<file> Read GC content histogram. Histograms for sam files only (requires sam format 1.4 or higher): ehist=<file> Errors-per-read histogram. qahist=<file> Quality accuracy histogram of error rates versus quality score. indelhist=<file> Indel length histogram. mhist=<file> Histogram of match, sub, del, and ins rates by read location. idhist=<file> Histogram of read count versus percent identity. Processing parameters: threads=auto (t) Set number of threads to use; default is number of logical processors. k=27 Kmer length used for finding contaminants. Contaminants shorter than k will not be found. k must be at least 1. rcomp=t Look for reverse-complements of kmers in addition to forward kmers. maskmiddle=t (mm) Treat the middle base of a kmer as a wildcard, to increase sensitivity in the presence of errors. maxbadkmers=0 (mbk) Reads with more than this many contaminant kmers will be discarded. hammingdistance=0 (hdist) Maximum Hamming distance from ref kmers (subs only). Memory use is proportional to (3*K)^hdist. editdistance=0 (edist) Maximum edit distance from ref kmers (subs and indels). Memory use is proportional to (8*K)^edist. forbidn=f (fn) Forbids matching of read kmers containing N. By default, these will match a reference 'A' if hdist>0 or edist>0, to increase sensitivity. maxns=-1 If non-negative, reads with more Ns than this (before trimming) will be discarded. minskip=1 (mns) Force minimal skip interval when indexing reference kmers. 1 means use all, 2 means use every other kmer, etc. maxskip=99 (mxs) Restrict maximal skip interval when indexing reference kmers. Normally all are used for scaffolds<100kb, but with longer scaffolds, up to K-1 are skipped. removeifeitherbad=t (rieb) Paired reads get sent to 'outmatch' if either is match (or either is trimmed shorter than minlen). Set to false to require both. findbestmatch=f (fbm) If multiple matches, associate read with sequence sharing most kmers. prealloc=f Preallocate memory in table. Allows faster table loading and more efficient memory usage, for a large reference. Trimming/Masking parameters: ktrim=f Trim reads to remove bases matching reference kmers. Values: t (trim) f (don't trim), r (trim right end), l (trim left end), n (convert to N instead of trimming). Any non-whitespace character other than t, f, r, l, n: convert to that symbol rather than trimming, and process short kmers on both ends. useshortkmers=f (usk) Look for shorter kmers at read tips (only for k-trimming). Enabling this will disable maskmiddle. mink=6 Minimum length of short kmers. Setting this automatically sets useshortkmers=t. qtrim=f Trim read ends to remove bases with quality below minq. Performed AFTER looking for kmers. Values: t (trim both ends), f (neither end), r (right end only), l (left end only). trimq=6 Trim quality threshold. minlength=10 (ml) Reads shorter than this after trimming will be discarded. Pairs will be discarded only if both are shorter. maxlength= Reads longer than this after trimming will be discarded. Pairs will be discarded only if both are longer. minavgquality=0 (maq) Reads with average quality (before trimming) below this will be discarded. otm=f (outputtrimmedtomatch) Output reads trimmed to shorter than minlength to outm rather than discarding. tp=0 (trimpad) Trim this much extra around matching kmers. tbo=f (trimbyoverlap) Trim adapters based on where paired reads overlap. minoverlap=24 Require this many bases of overlap for overlap detection. mininsert=25 Require insert size of at least this much for overlap detection (will automatically set minoverlap too). tpe=f (trimpairsevenly) When kmer right-trimming, trim both reads to the minimum length of either. forcetrimleft=0 (ftl) If positive, trim bases to the left of this position (exclusive, 0-based). forcetrimright=0 (ftr) If positive, trim bases to the right of this position (exclusive, 0-based). restrictleft=0 If positive, only look for kmer matches in the leftmost X bases. restrictright=0 If positive, only look for kmer matches in the rightmost X bases. Java Parameters: -Xmx This will be passed to Java to set memory usage, overriding the program's automatic memory detection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. There is a changelog at /global/projectb/sandbox/gaag/bbtools/docs/changelog_bbduk.txt Please contact Brian Bushnell at [email protected] if you encounter any problems. " }
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Comment
-
Originally posted by susanklein View PostHi,
sorry, the Javva 6 version is not working for me either. If I check my version I get:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
all29c@aahl-02-mel:~/data/019/19.5$
For Java 6 .. last release is 45. Looks like you are on release 27.
Comment
-
Originally posted by susanklein View PostHi,
sorry, the Javva 6 version is not working for me either. If I check my version I get:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
all29c@aahl-02-mel:~/data/019/19.5$
The error is:
Exception in thread "main" java.lang.UnsupportedClassVersionError: align2/BBMap : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: align2.BBMap. Program will exit.
Sorry, that was my fault; I was able to replicate that error. The wrong binaries were somehow in the java6 package. I've fixed it for real this time with BBMap_33.89b_java6.tar.gz which is now uploaded.
-Brian
Comment
-
Originally posted by sdriscoll View PostCool! I did bbduk.sh as well. I've been using those two the most the last couple of days. FYI I had to make sure there wasn't any mixture of tabs and spaces (fortunately Sublime Text 2 can make that swap for me). Also it's a lot easier to write those tedious usage things in a single echo instead of multiple.
Comment
-
Hi,
this is very wierd.. it must be a machine problem, but bbmap (and bowtie1/2) are not working for me for certain file sizes (~200k). The error is below. The file is there, looks normal. If I double or half its size, then the index builds. Very strange.
Thanks for the help though!
5$ ~/bin/bbmap/bbmap.sh ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
java -Djava.library.path=/OSM/HOME-MEL/all29c/bin/bbmap/jni/ -ea -Xmx32955m -cp /OSM/HOME-MEL/all29c/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna]
BBMap version 33.89
Retaining first best site only for ambiguous mappings.
No output file.
Writing reference.
Executing dna.FastaToChromArrays2 [/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gzip=true, chromc=false, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1
Loaded Reference: 0.012 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /OSM/HOME-MEL/all29c/data/019/19.5/ref/index/1/chr1_index_k13_c13_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 2.162 seconds.
No reads to process; quitting.Attached Files
Comment
-
Although.. not sure now.. it seems to have made two files chr1_index_k13_c13_b1.block and chr1_index_k13_c13_b1.block2.gz
is this right? Maybe it has worked.. The 'no reads to process' threw me.
Thanks.
S
Originally posted by susanklein View PostHi,
this is very wierd.. it must be a machine problem, but bbmap (and bowtie1/2) are not working for me for certain file sizes (~200k). The error is below. The file is there, looks normal. If I double or half its size, then the index builds. Very strange.
Thanks for the help though!
5$ ~/bin/bbmap/bbmap.sh ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
java -Djava.library.path=/OSM/HOME-MEL/all29c/bin/bbmap/jni/ -ea -Xmx32955m -cp /OSM/HOME-MEL/all29c/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna]
BBMap version 33.89
Retaining first best site only for ambiguous mappings.
No output file.
Writing reference.
Executing dna.FastaToChromArrays2 [/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gzip=true, chromc=false, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1
Loaded Reference: 0.012 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /OSM/HOME-MEL/all29c/data/019/19.5/ref/index/1/chr1_index_k13_c13_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 2.162 seconds.
No reads to process; quitting.
Comment
-
Originally posted by susanklein View PostAlthough.. not sure now.. it seems to have made two files chr1_index_k13_c13_b1.block and chr1_index_k13_c13_b1.block2.gz
is this right? Maybe it has worked.. The 'no reads to process' threw me.
Thanks.
S
Use path=/OSM/HOME-MEL/all29c/data/019/19.5 when you do the mapping.
Comment
-
how do you control the quality filtering threshold in bbmap? i see the maq threshold setting in reformat.sh - is that same setting usable in bbmap? thanks./* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Comment
-
I must be misunderstanding something then. I aligned some PE 100 reads from 2011 that had some crappy qualities with bbmap and in the output bam it had flagged many reads with x200 (qc failed). Is bbmap not filtering by base qualities?/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Comment
-
Oh... that's a heuristic that cannot be adjusted, though it CAN be disabled (with the "usequality=f" or "uq=f" flag). I'll add that to the help menu.
Not every kmer in a read is used to search the index. Most are discarded, and they are discarded based on several factors:
1) How close they are to other retained kmers (to get a good distribution across the read).
2) The probability that the kmer is correct, based on base qualities; by default, kmers with a 94%+ chance of containing an error are usually discarded, and kmers with a 99.99%+ chance of containing an error are always discarded - this includes any kmer that spans an 'N'.
3) Kmers that are more common in that specific genome are more likely to be discarded.
If a read ends up with fewer than "minhits" kmers (default 1) then it will be discarded as "low quality".
"uq=f" will turn off rule 2, so quality scores will not be considered. However, a read will still be discarded as low quality if there are so many Ns in it that there are not K consecutive defined bases anywhere.
Comment
-
Ah thanks for the explanation! Very interesting. I'm not used to aligners making those kind of decisions but I think it's useful. I have another question for you.
Originally posted by sdriscoll View PostOK one more question. What would the correct alignment option setting be to get alignments with 0 mismatches and still allow splices? If it's not possible I can just do downstream filtering.
Thanks.Originally posted by Brian Bushnell View PostThere is no option for that, unfortunately.
Thanks./* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Comment
-
First, let me describe "minid" and "idfilter", which are slightly different:
"minid" sets a threshold for the minimum allowable alignment score, which is based on a custom hard-coded affine-transform weight matrix, and only loosely correlated with the actual percent identity, depending on the types and lengths of mutation events. It directly affects the mapping process (and speed), as sites that can be proven to have a score below the minimum ratio are not further considered.
"idfilter" does not affect mapping at all, but rather, sites that have an true percent identity below "idfilter" are not printed. You can use both or either; the difference is subtle, but important in some cases.
Now, as for scoring - the affine score matrix is deeply embedded in BBMap. A 100bp read that maps perfectly with only matches has a 100% identity and BBMap score of 9970 (100 points per match, but only 70 points for the first match). If a 100bp read can map somewhere with 1 mismatch (99% identity, BBMap score of 9713: 9970+1*(-127-100-100+70)), that site will always be chosen over a site with 50bp match, 20k intron, 50bp match (4.7% identity, BBMap score of roughly 8841). But if the 100bp read can map somewhere with 5 mismatches (95% identity, BBMap score of 8685: 9970 +5*(-127-100-100+70)), the spliced alignment will be chosen, because 8841>8685. However they are very close so it would get a low mapping score.
These site-selection priorities will be made no matter what you set for idfilter or minid; they could only be changed by a new weight matrix, which is nontrivial because it affects a lot of the proofs scattered through the code. A couple mismatches will always be preferred over a really long splice.
So, you can post-filter based on the number of mismatches, but there's currently no way to adjust what BBMap thinks is the best alignment beyond using the "maxindel" flag. However, I could make a new weight matrix that penalizes substitutions more and deletions less, as long as a single deletion strictly has a greater score penalty than a single substitution. Would that be helpful?
Comment
Latest Articles
Collapse
-
by seqadmin
The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...-
Channel: Articles
11-06-2024, 07:24 PM -
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 11-08-2024, 11:09 AM
|
0 responses
35 views
0 likes
|
Last Post
by seqadmin
11-08-2024, 11:09 AM
|
||
Started by seqadmin, 11-08-2024, 06:13 AM
|
0 responses
28 views
0 likes
|
Last Post
by seqadmin
11-08-2024, 06:13 AM
|
||
Started by seqadmin, 11-01-2024, 06:09 AM
|
0 responses
32 views
0 likes
|
Last Post
by seqadmin
11-01-2024, 06:09 AM
|
||
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, 10-30-2024, 05:31 AM
|
0 responses
23 views
0 likes
|
Last Post
by seqadmin
10-30-2024, 05:31 AM
|
Comment