**Brian Bushnell** · 03-09-2016, 06:50 PM

java.lang.AssertionError: Seems odd so I added this assertion. I don't see anywhere it was being used. Use -da flag to override.

Wow, that's not a very good error message.

I will examine that carefully this weekend. Anyway, you can try running with the -da flag, which may fix things. It looks like I made some changes to the code related to processing custom read names since I designed that workflow. The -da flag will allow you to circumvent that crash, but I'm not certain whether it will have a different problem later (though that should become apparent quickly).

Alternatively, you can go here:
https://sourceforge.net/projects/bbmap/files/

and download this version:
BBMap_33.41_java7.tar.gz
...which was released closest in time to that post. I don't generally recommend using old versions, but the custom naming schema has changed substantially since then so I'm not sure whether the current state supports that mode. But I'll fix it if possible.

-Brian

**Illusive Man** · 03-09-2016, 11:26 PM

Hi Brian - Your -da flag seemed to do the trick. One thing I noticed is that the majority of my read 1 data didn't seem to map very well to the reference archaeal sequence (I picked one of the three genera in the phylum Thaumarchaeota, the ammonia-oxidizing archaea).

For instance, the sequences in read 1 and read 2 totaled about 250 MB apiece. The mapped read 1s came to about 10 MB, compared with 180 MB for the read 2s. I was wondering whether that could be because of the reference file. If so, is it possible to add multiple references? Or does the workflow have to be run separately for each reference I am interested in (each of the 3 genera)?

Despite most of my reads being unmapped, I was able to go into Geneious, import the merged-by-mapping output, and demultiplex the dual-indexed samples as if they had been merged using the normal overlapping methods. So that is actually great news.

You mentioned concatenating the merge-by-mapping and merge-by-overlap sequences. How exactly do I do that? I know Geneious can do it, but it asks me which order they should be in. Thoughts on this?

I think your tool would definitely be a huge asset in processing paired-end reads that do not overlap (ideally they would overlap, but there are obvious limitations on sequence length for Illumina sequencing; the upper limit for the insert is probably around 550 bp with the v3 2x300 kits). The workflow works. Anyhow, I'd like to hear your thoughts on what I stated above.

Thanks again!!

**Brian Bushnell** · 03-10-2016, 08:06 PM

For concatenating, the order does not matter in this case. However, the results are strange. When mapping paired reads, BBMap displays the mapping and error rates for read 1 and read 2 independently; normally, the mapping rate is higher for read 1. Can you post the screen output of the mapping phase?
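As a rough sketch of the concatenation step outside Geneious: plain `cat` is enough, since the order does not matter. The file names below are hypothetical stand-ins for the merge-by-mapping and merge-by-overlap outputs, and the tiny demo records exist only so the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the two sets of merged reads; use your real files instead.
workdir=$(mktemp -d)
printf '@m1\nACGT\n+\nIIII\n' > "$workdir/merged_by_mapping.fq"
printf '@o1\nTTGA\n+\nIIII\n@o2\nGGCA\n+\nIIII\n' > "$workdir/merged_by_overlap.fq"

# Concatenate the two FASTQ files; either order gives an equivalent read set.
cat "$workdir/merged_by_mapping.fq" "$workdir/merged_by_overlap.fq" > "$workdir/all_merged.fq"

# Sanity check: a FASTQ record is 4 lines when sequences are not line-wrapped.
total_lines=$(wc -l < "$workdir/all_merged.fq")
n_reads=$(( total_lines / 4 ))
echo "all_merged.fq contains $n_reads reads"
```

The read count after concatenation should equal the sum of the two inputs; if it does not, one of the files probably lacks a trailing newline or has wrapped sequence lines.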

**Illusive Man** · 03-13-2016, 10:51 AM

Bubbas-Mac-Pro:bbmap bubba$ bash bbmap.sh -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da
java -Djava.library.path=/Users/bubba/Downloads/bbmap/jni/ -da -Xmx10g -cp /Users/bubba/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, -Xmx10g, threads=4, ref=amoaref1.fasta, in=all_reads.fastq, outm=mapped_ref1.fq, outu=unmapped.fq, nodisk, po, int, rbm, don, -da]

BBMap version 35.85
Set threads to 4
Set INTERLEAVED to true
Retaining first best site only for ambiguous mappings.
Executing dna.FastaToChromArrays2 [amoaref1.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=false, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=true]

Set genScaffoldInfo=true
Set genome to 1

Loaded Reference: 0.005 seconds.
Loading index for chunk 1-1, build 1
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 0.539 seconds.
Analyzed Index: 4.642 seconds.
Started output stream: 0.057 seconds.
Started output stream: 0.049 seconds.
Cleared Memory: 0.154 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 4 mapping threads.
Detecting finished threads: 0, 1, 2, 3

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 16000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 424758 (125427725 bases)

Mapping: 323.766 seconds.
Reads/sec: 1311.93
kBases/sec: 387.40

Pairing data:           pct reads    num reads    pct bases    num bases

mated pairs:            0.0245%      52           0.0250%      31304
bad pairs:              0.0000%      0            0.0000%      0
insert size avg:        648.12

Read 1 data:            pct reads    num reads    pct bases    num bases

mapped:                 0.0245%      52           0.0249%      15652
unambiguous:            0.0245%      52           0.0249%      15652
ambiguous:              0.0000%      0            0.0000%      0
low-Q discards:         0.0057%      12           0.0007%      420

perfect best site:      0.0000%      0            0.0000%      0
semiperfect site:       0.0000%      0            0.0000%      0
rescued:                0.0000%      0

Match Rate:             NA           NA           79.1619%     12392
Error Rate:             0.1757%      52           18.8450%     2950
Sub Rate:               0.1757%      52           18.8322%     2948
Del Rate:               0.0068%      2            0.0128%      2
Ins Rate:               0.0000%      0            0.0000%      0
N Rate:                 0.1757%      52           1.9931%      312

Read 2 data:            pct reads    num reads    pct bases    num bases

mapped:                 0.0245%      52           0.0250%      15634
unambiguous:            0.0245%      52           0.0250%      15634
ambiguous:              0.0000%      0            0.0000%      0
low-Q discards:         0.2067%      439          0.0245%      15365

perfect best site:      0.0000%      0            0.0000%      0
semiperfect site:       0.0000%      0            0.0000%      0
rescued:                0.0000%      0

Match Rate:             NA           NA           76.9157%     12025
Error Rate:             98.1132%     52           23.0459%     3603
Sub Rate:               98.1132%     52           23.0459%     3603
Del Rate:               0.0000%      0            0.0000%      0
Ins Rate:               0.0000%      0            0.0000%      0
N Rate:                 11.3208%     6            0.0384%      6

Total time: 329.537 seconds.

**Illusive Man** · 03-16-2016, 01:54 PM

Brian or anyone else -

Can you tell me if I can use more than one reference sequence to map my reads to? There are a total of 3 possible genera for the ammonia-oxidizing archaea.

**GenoMax** · 03-16-2016, 01:59 PM

Yes, you can. Make a multi-FASTA file with your references and create an index to align against, or you may also be able to do ref="ref1.fa,ref2.fa,ref3.fa".

**Illusive Man** · 03-16-2016, 07:40 PM

What do you mean make an index?

Do you mean a file containing multiple reference sequences that looks like:

>seq1 TAAATGA

>seq2 CCGTTAAA

If that's what you meant then I'm set as I already ran that file.

I tried using ref="ref1.fa, ref2.fa" and it returned the following error:

Bobbys-Mac-Pro:bbmap twpierson$ bash bbmap.sh -Xmx10g threads=4 ref1="amoa_1.fas,amoa_2.fas" in=all_reads.fastq outm=mapped_ref_1.fq outu=unmapped.fq nodisk po int rbm don -da
java -Djava.library.path=/Users/tbobby/Downloads/bbmap/jni/ -da -Xmx10g -cp /Users/twpierson/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx10g threads=4 ref1=amoa_1.fas,amoa_2.fas in=all_reads.fastq outm=mapped_ref_1.fq outu=unmapped.fq nodisk po int rbm don -da
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, -Xmx10g, threads=4, ref1=amoa_1.fas,amoa_2.fas, in=all_reads.fastq, outm=mapped_ref_1.fq, outu=unmapped.fq, nodisk, po, int, rbm, don, -da]

BBMap version 35.85
Set threads to 4
Exception in thread "main" java.lang.RuntimeException: Unknown parameter: ref1=amoa_1.fas,amoa_2.fas
at align2.AbstractMapper.parse(AbstractMapper.java:627)
at align2.AbstractMapper.<init>(AbstractMapper.java:51)
at align2.BBMap.<init>(BBMap.java:41)
at align2.BBMap.main(BBMap.java:29)

**Brian Bushnell** · 03-16-2016, 08:41 PM

You had a slight syntax error there - "ref1=" should be "ref=". Even with that fixed, though, BBMap won't accept a comma-delimited list of references (although some other tools do). You have to first concatenate them:

cat amoa_1.fas amoa_2.fas > all.fasta

Then align:

bbmap.sh ref=all.fasta other parameters

BBSplit does accept comma-delimited references, though its usage syntax is a bit different.
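Put together, the two steps above might look like the sketch below. The tiny reference files are hypothetical stand-ins so the snippet runs on its own, and the BBMap call simply reuses the parameters from the earlier command in this thread; it is skipped unless bbmap.sh is on the PATH and all_reads.fastq exists in the current directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in references; substitute your real files (e.g. amoa_1.fas, amoa_2.fas).
workdir=$(mktemp -d)
printf '>seq1\nTAAATGA\n' > "$workdir/amoa_1.fas"
printf '>seq2\nCCGTTAAA\n' > "$workdir/amoa_2.fas"

# Step 1: concatenate the references into one multi-FASTA file.
cat "$workdir/amoa_1.fas" "$workdir/amoa_2.fas" > "$workdir/all.fasta"

# Sanity check: there should be one '>' header line per input sequence.
n_records=$(grep -c '^>' "$workdir/all.fasta")
echo "all.fasta contains $n_records sequences"

# Step 2: map against the combined reference, only if BBMap and the reads are available.
if command -v bbmap.sh >/dev/null 2>&1 && [ -f all_reads.fastq ]; then
    bbmap.sh -Xmx10g threads=4 ref="$workdir/all.fasta" in=all_reads.fastq \
        outm=mapped.fq outu=unmapped.fq nodisk po int rbm don -da
fi
```

One thing to watch with `cat`: if an input file does not end with a newline, the next file's header gets glued onto the previous sequence line, so the header count is worth checking before mapping.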

There is definitely something wrong here, as only 52 read pairs of 424000 aligned to the reference, which is even lower than the first run. Hard to say what it is... did you expect most of the reads to align?

And have you BLASTed some of these reads to see what they are?

**Illusive Man** · 03-16-2016, 09:45 PM

I have tried the second command you listed:

bash bbmap.sh ref=all.fasta other parameters
Max memory cannot be determined. Attempting to use 3200 MB.
If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
java -Djava.library.path=/Users/twpierson/Downloads/bbmap/jni/ -ea -Xmx3200m -cp /Users/twpierson/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=all.fasta other parameters
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=all.fasta, other, parameters]

BBMap version 35.85
Exception in thread "main" java.lang.RuntimeException: Unknown parameter: other
at align2.AbstractMapper.parse(AbstractMapper.java:627)
at align2.AbstractMapper.<init>(AbstractMapper.java:51)
at align2.BBMap.<init>(BBMap.java:41)
at align2.BBMap.main(BBMap.java:29)

I have BLASTed the reads that mapped, and they appear correct: mostly archaeal amoA gene clones. This is actually a friend's sequencing data. Maybe the issue has something to do with the primers amplifying more than archaea. I'm not exactly sure myself.

Thanks again guys!

**GenoMax** · 03-17-2016, 03:54 AM

@Illusive Man: A multi fasta format file looks like this

Code:

>Genome_1
ACGATCTAGC
>Genome_2
ACGCCTAGCTAGCGCTA
>Genome_3
CGCTCGATCGATCGA

You get the idea.

The cat command @Brian provided combines your genomes into a single file in multi-FASTA format.

As for the other command, you literally tried to run what @Brian wrote. What he meant was:

Code:

$ bash bbmap.sh ref=all.fasta other parameters

Replace "other parameters" with the optional BBMap parameters you want to use for the alignment.

**Illusive Man** · 03-17-2016, 06:55 PM

I think I did exactly what you guys stated in the earlier scripts I posted. My reference file was already a multi-fasta file containing the three sequences concatenated. When I performed the alignment my script looked something like this...

bash bbmap.sh -Xmx10g threads=4 ref=amoaref2.fasta in=all_reads.fastq outm=mapped.fq outu=unmapped.fq nodisk po int rbm don -da

So I am fairly certain we are saying the same thing. That still leaves the question of why so few of the read 1 sequences are aligning to the reference sequences. Perhaps I need to add some of the unclassified archaeal amoA sequences in the NCBI database to my reference multi-FASTA file (amoaref2.fasta).

Thanks!

**Brian Bushnell** · 03-17-2016, 09:19 PM

Your command:

bash bbmap.sh -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da

...is syntactically correct.

It looks like everything is OK except for the low mapping rate. Note that both read 1 and read 2 mapped at similar (low) rates in all cases, so basically, they just don't match the reference, and both fail to match it about equally badly.

**GenoMax** · 03-18-2016, 02:53 AM

Originally posted by Illusive Man

That still leaves the question of why so few of read1 is aligning to the reference sequences. Perhaps I need to add some of the unclassified sequences in the archaea amoa NCBI database to my reference multi-fasta file (amoaref2.fasta).

Thanks!

Perhaps. But you should also check a sample of reads that do not align (you can collect those easily with BBMap) and check them with Blast @NCBI. That would rule out sample contamination.
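One way to pull a few unaligned reads for BLAST is sketched below. It assumes the unmapped reads were written with outu=unmapped.fq as in the commands above and that the FASTQ uses 4-line records. The helper name fastq_head_to_fasta is made up for this example, and the demo file is a tiny stand-in so the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fastq_head_to_fasta FILE N: print the first N reads of FILE as FASTA.
# Assumes 4-line FASTQ records (no line-wrapped sequences).
fastq_head_to_fasta() {
    local fastq=$1 n=$2
    head -n "$(( n * 4 ))" "$fastq" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap leading @ for >
             NR % 4 == 2 { print }'                     # sequence line: keep as-is
}

# Tiny stand-in for unmapped.fq; point the function at your real file instead.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$demo"

sample=$(fastq_head_to_fasta "$demo" 2)
echo "$sample"
```

The resulting FASTA text can be pasted into the NCBI BLAST web form. Note that this takes the first reads in the file rather than a random sample, and with interleaved output read 1 and read 2 of each pair alternate.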
