Seqanswers Leaderboard Ad

**Brian Bushnell** · 06-22-2014, 10:00 PM

Woody,

It sounds like Java is not installed on your cluster. Try running this command:

java -version

It should print some kind of message, like this:

java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) Client VM (build 10.0-b23, mixed mode)

If it just says "command not found" then java is not installed. You should contact the system administrator, or download it from here:

Download the Latest Java LTS Free

http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

Subscribe to Java SE and get the most comprehensive Java support available, with 24/7 global access to the experts.

...and put 'java' in your path. Or explicitly change the path of 'java' in my shellscript to wherever you install it.

**woodydon** · 06-22-2014, 10:16 PM

Hi Brian,

Thank you for the quick reply. It indeed was the problem of not finding java. I have consulted with the administrator about this since I used java before without a problem.

Sincerely,
Woody

**woodydon** · 06-22-2014, 11:29 PM

Originally posted by Brian Bushnell View Post

Woody,

Did you try error-correcting and using the 'vloose' setting? Both of those can substantially improve the merge rate.

If you want to see what the actual, unbiased insert size distribution of your data, then you can generate a histogram by mapping:

bbmap.sh in1=read1.fq in2=read2.fq ref=reference.fa ihist=ihist.txt nodisk

Then plot a graph of 'ihist.txt', so you can see how many reads should be expected to assemble.

I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!

**GenoMax** · 06-23-2014, 03:41 AM

Originally posted by woodydon View Post

I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!

With bbmap.sh out=filename.sam should produce a sam file (http://seqanswers.com/forums/showpos...69&postcount=1).

**Brian Bushnell** · 06-23-2014, 08:18 AM

Originally posted by woodydon View Post

I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!

Woody,

bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

ecc.sh in=reads.fq out=corrected.fq

And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.

**woodydon** · 06-23-2014, 06:15 PM

Originally posted by Brian Bushnell View Post

Woody,

bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

ecc.sh in=reads.fq out=corrected.fq

And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.

I see~ thanks for GenoMax's and your explanations!

Woody

**woodydon** · 06-24-2014, 03:06 PM

Originally posted by Brian Bushnell View Post

Woody,

bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

ecc.sh in=reads.fq out=corrected.fq

And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.

Hi Brain,

I used ecc.sh to correct my raw .fq files and then use bbmerge.sh again. I found the assemble rate is much lower:

Code:

Reads:          136095860
Joined:         13084199        9.614%
Ambiguous:      31319           0.023%
No Solution:    122968112       90.354%
Too Short:      12230           0.009%
Avg Insert:                     120.9
Standard Deviation:             12.9

Insert range:           26 - 138
90th percentile:        134
50th percentile:        123
10th percentile:        107

Without ecc.sh, the joined reads were about 30~40%. Now, it is only 9%...

Woody

**Brian Bushnell** · 06-24-2014, 03:20 PM

Woody,

That's very strange. The only explanation I can come up with is that the order of the reads is getting changed so read pairs are no longer always together.

If you ran ecc.sh twice on different files, that would make sense; unless you add the "ordered" flag, the reads will not necessarily come out in the same order. But it's best to correct all the reads together so that there is more coverage. So, assuming your reads are in two files (rather than interleaved), you should run it like this:

ecc.sh in1=read1.fq in2=read2.fq out1=corrected1.fq out2=corrected2.fq

And if your reads are interleaved in 1 file, you can add the "int=t" flag to ensure the reads are treated as interleaved. This is autodetected but the autodetection will fail if the read names are different than expected.

ecc.sh in=reads.fq out=corrected.fq int=t

If you do not think that was the problem, then please post the command line and standard out / standard error output of ecc. Thanks!

**woodydon** · 06-24-2014, 05:41 PM

My command was:

Code:

~/bbmap/ecc.sh threads=8 -Xmx14g in=~/reads/n_R1
.fastq out=~/reads/n_eccR1.fastq

Ecc.sh's log:

Code:

Settings:
threads:                8
k:                      31
deterministic:          false
toss error reads:       false
passes:                 1
bits per cell:          16
cells:                  3147.19M
hashes:                 3
prefilter bits:         2
prefilter cells:        13.56B
prefilter hashes:       2
base min quality:       5
kmer min prob:          0.5

target depth:           40
min depth:              6
max depth:              40
min good kmers:         15
depth percentile:       54.0
ignore dupe kmers:      true
fix spikes:             false

Changed from ASCII-33 to ASCII-64 on input quality 94 while prescanning.
Made prefilter:         hashes = 2       mem = 3.15 GB          cells = 13.54B 
       used = 3.739%
Made hash table:        hashes = 3       mem = 5.86 GB          cells = 3144.33M   
     used = 6.197%

Estimated kmers of depth 1-3:   203133322
Estimated kmers of depth 4+ :   54945575
Estimated unique kmers:         258078898

Table creation time:            1887.618 seconds.
Started output threads.
Table read time:                1510.451 seconds.       6757.71 kb/sec
Total reads in:                 136095860       100.000% Kept
Total bases in:                 10207189500     100.000% Kept
Error reads in:                 12059846        8.861%
Error type 1:                   3939140         2.894%
Error type 2:                   6350473         4.666%
Error type 3:                   9031666         6.636%

During Error Correction:
Errors Suspected:               31378616
Errors Corrected:               15385594
Errors Marked:                  0

Total kmers counted:            6085312047
Total unique kmer count:        408064601

Includes forward kmers only.
The unique kmer estimate can be more accurate than the unique count, if the tables are
 very full.
The most accurate value is the greater of the two.

Percent unique:                  6.71%
Depth average:                  14.91   (unique kmers)
Depth median:                   1       (unique kmers)
Depth standard deviation:       535.10  (unique kmers)

Depth average:                  6532.20 (all kmers)
Depth median:                   2141    (all kmers)
Depth standard deviation:       26317.70        (all kmers)

Approx. read depth median:      3568.33

Total time:                     3399.627 seconds.       3002.44 kb/sec

Hope it helps.

Woody

**Brian Bushnell** · 06-25-2014, 09:05 AM

Woody,

Yep, the problem was error-correcting the two files independently, which will reorder the reads on a multithreaded machine. So, this command will solve it:

ecc.sh threads=8 -Xmx14g in=~/reads/n_R1.fastq in2=~/reads/n_R2.fastq out=~/reads/n_eccR1.fastq out2=~/reads/n_eccR2.fastq

**salamay** · 07-09-2014, 11:48 AM

Hi Brian,

Could you try benchmarking against SeqPreP (https://github.com/jstjohn/SeqPrep)?

**Brian Bushnell** · 07-09-2014, 05:15 PM

salamay,

I will certainly add it to my list of things to do

**Elsie** · 07-15-2014, 03:24 PM

This is a very nice, very useful thread, thank you, and thank you Brian for the details on BBmerge. I have just started using PEAR so it was good to read these comments.

**zhangjiajie** · 10-06-2014, 04:53 AM

Hi,
I am one of the authors of PEAR. I must say this is a very unfair benchmark for PEAR. If you really want to show your program is better, please run PEAR with the default parameters like this:
pear -f r1.fq -r r2.fq -o outpear

The statistical test in an integrated part of PEAR, it makes no sense to switch it off. Also, you should not limit the minimal merge length in PEAR.

**Brian Bushnell** · 10-06-2014, 10:40 AM

I tried to make the comparison as fair as possible, but I will retest this with your suggested parameters and post the results sometime later this week.

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 48 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News