BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

SNPsaurus replied

09-25-2018, 09:47 AM
Originally posted by juanita:

I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

I am using a scaffold-level genome as the reference, with paired-end reads...

Have you trimmed adapters from the reads? Short fragments produce reads that are part genomic and part adapter, and these may not map. You could use the related BBMap tool sendsketch to get a sense of what is in your reads (after trimming). When we genotype samples, many of them contain contaminating species, so sendsketch can help figure out what is in there. You can give sendsketch the entire fastq file, or switch to read mode and get a result on a per-read basis.

You can also grab 100 reads, convert them to fasta format and run blastn on them (if online, use the blastn rather than the megablast option) to see read by read what is in there.
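As a rough illustration of the "grab 100 reads and convert them to fasta" step, here is a minimal sketch using only head and awk. The function name and the file names in the usage comment are made up for the example, and it assumes an uncompressed fastq file with exactly four lines per record.

```shell
# fastq_head_to_fasta FILE [N]: print the first N (default 100) FASTQ records as FASTA.
# Assumption: FILE is uncompressed and every record is exactly four lines.
fastq_head_to_fasta() {
    fq=$1
    n=${2:-100}
    head -n "$((n * 4))" "$fq" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap "@" for ">"
             NR % 4 == 2 { print }'                     # sequence line, copied as-is
}

# Usage (placeholder file names):
#   fastq_head_to_fasta sample_R1.fq 100 > sample_R1.first100.fa
```

The resulting fasta file can then be pasted into the blastn web form or passed to a local blastn.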

Other possibilities: your sample is not closely related to the reference; the reference is incomplete and missing regions; or the reference lacks high-copy content such as mtDNA or chloroplast DNA, and many reads come from those.
juanita replied

09-25-2018, 08:01 AM
ref input for BBMap and paired ends

I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

I am using a scaffold-level genome as the reference, with paired-end reads...
raw937 replied

09-07-2018, 09:24 AM
bbmap for demultiplexing dual barcodes.

Hello,
I need to demultiplex using dual indexes, if possible.

For example (the dual barcode is the first four bases of each read):

#R1 read
@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/1
GACTAACCGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG
+
BMMQNTWSWWb_____b_bb__________Y_________YYYYY[[[Y[__________XXRWXVVVVTYYYYYT

#R2 read
@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/2
CTGAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
+
ghgaggfghhhhhhhhhhghhhhhhhhhhfhhhghfWffch[hhgahhedffddR[^W^Zc^_cac[Wb]^W^

Here are the 16 possible combinations in the file I am working on:
TCAG-TCAG
CTGA-CTGA
TCAG-GACT
GACT-GACT
AGTC-AGTC
GACT-TCAG
GACT-AGTC
GACT-CTGA
TCAG-CTGA
AGTC-TCAG
AGTC-GACT
CTGA-AGTC
CTGA-GACT
AGTC-CTGA
TCAG-AGTC
CTGA-TCAG

The first four nucleotides of each read are the barcode, so the example above would go to:
GACT-CTGA_R1.fq
GACT-CTGA_R2.fq

But you would need both reads to tell you that it's GACT-CTGA and not something else.
What would the command look like for this? Does this demux script do the dual barcoding?
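The binning described above can be sketched with paste and awk. This is not a BBTools command and says nothing about what the demux script supports; it is a minimal illustration that assumes uncompressed four-line fastq files, R1 and R2 in the same read order, and no tab characters in the headers. The function name and output layout are invented for the example.

```shell
# demux_dual R1.fq R2.fq OUTDIR
# Bins each read pair by the first four bases of R1 and of R2, appending to
# OUTDIR/<bc1>-<bc2>_R1.fq and OUTDIR/<bc1>-<bc2>_R2.fq.
# Assumptions: uncompressed four-line FASTQ, mates in the same order in both files.
# Barcodes are left on the reads; there is no mismatch tolerance and no whitelist.
demux_dual() {
    r1=$1; r2=$2; outdir=$3
    mkdir -p "$outdir"
    paste "$r1" "$r2" | awk -F '\t' -v dir="$outdir" '
        { a[NR % 4] = $1; b[NR % 4] = $2 }      # buffer the current record of each mate
        NR % 4 == 0 {                           # fourth line reached: record is complete
            key = substr(a[2], 1, 4) "-" substr(b[2], 1, 4)
            f1 = dir "/" key "_R1.fq"
            f2 = dir "/" key "_R2.fq"
            printf "%s\n%s\n%s\n%s\n", a[1], a[2], a[3], a[0] >> f1
            printf "%s\n%s\n%s\n%s\n", b[1], b[2], b[3], b[0] >> f2
            close(f1); close(f2)                # avoid holding many files open
        }'
}

# Usage (placeholder file names):
#   demux_dual reads_R1.fq reads_R2.fq demux_out
```

Because the files are opened in append mode, the output directory should be empty before a rerun. Pairs whose barcodes are not among the 16 expected combinations end up in their own files, which makes unexpected combinations easy to spot.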
ellybelly replied

08-23-2018, 06:44 AM
bbmap aborts after mapping some reads

Hello Brian,

We are using bbmap to see to what extent it is possible to quantify gene expression by mapping Illumina RNA-seq reads to the genome of a closely related species, e.g. mapping chimpanzee reads to human or, as in this example, macaque reads.

To this end, we generated macaque Illumina SE reads using flux-simulator and mapped them to hg38 and, for comparison, also to Mmul8, downloaded from Ensembl (wget ftp://ftp.ensembl.org/pub/release-92...toplevel.fa.gz).

Everything mapped fine to hg38, but not to Mmul8.

Exception in thread "Thread-12" java.lang.AssertionError
at align2.BBIndex.extendScore(BBIndex.java:2612)
at align2.BBIndex.slowWalk3(BBIndex.java:1389)
at align2.BBIndex.find(BBIndex.java:777)
at align2.BBIndex.find(BBIndex.java:623)
at align2.BBIndex.findAdvanced(BBIndex.java:400)
at align2.AbstractMapThread.quickMap(AbstractMapThread.java:750)
at align2.BBMapThread.processRead(BBMapThread.java:408)
at align2.AbstractMapThread.run(AbstractMapThread.java:508)

I tried to run on one thread, increased memory to 101G, removed small contigs of <100kb ... but the error message remains the same.

We are running a Debian system with java version "1.8.0_181" and have BBMap version 38.02 -- the detailed error output is in the attached file.

The false mapping rates of bbmap are so much better than those of STAR and GSNAP that we definitely want to use bbmap for our paper. We are nearly done with all the other species (marmoset, gorilla, chimpanzee and orangutan), and the simulations ran through; the only missing piece is the mapping to Mmul8.

Any help would be greatly appreciated.

Best, Ines
Attached Files

Mmul1.701837.txt (4.0 KB, 48 views)
JenBarb replied

08-15-2018, 09:45 AM
mkf argument in bbduk.sh (bbmap tool)

Hello,
I am trying to use the flag mkf (minkmerfraction) and I am getting an error that that argument does not exist.
sh /data/barbj/bbmap/bbduk.sh in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
java -Djava.library.path=/data/barbj/bbmap/jni/ -ea -Xmx39767m -Xms39767m -cp /data/barbj/bbmap/current/ jgi.BBDukF in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
Executing jgi.BBDukF [in=./../Stool_001-01.fastq, outm=v2fstoolfq.fa, literal=CTCAAACTTGGGTAATTAAACC, k=17, mkf=0.8]

Exception in thread "main" java.lang.RuntimeException: Unknown parameter mkf=0.8
at jgi.BBDukF.<init>(BBDukF.java:402)
Any ideas why this is not working?

Jen
Meyana replied

08-14-2018, 09:05 PM
Hi,
Hoping somebody can help me with this.

I used BBMap and now I would like to extract the reads from my .bam file that are split (or chimeric?), i.e. reads that indicate a deletion.

I tried to use samblaster, but it doesn't recognize any reads as split...
(samtools view -h in.bam | samblaster -a -s split.sam -o /dev/null)
Are the split reads marked differently in BBMap compared to other aligners causing samblaster to fail?

IGV shows a good amount of reads with deletions and I can also call deletions using BBTools callvariants.sh - so I know they are in there. I just have a feeling callvariants is calling fewer deletions and with lower coverage than what IGV suggests, so I want to check up on it.
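One way to check what is in the file is to pull out the alignments whose CIGAR string contains a deletion. The sketch below assumes the deletions are written inside a single alignment with the D operator, which would also explain why a tool that looks for split or supplementary records finds nothing; that is an assumption about this particular .bam, not something confirmed in the thread. The function name is invented for the example.

```shell
# sam_with_deletions [MINLEN]: read SAM on stdin, keep header lines plus alignments
# whose CIGAR (column 6) contains a deletion (D) of at least MINLEN bases (default 1).
# Assumption: deletions are encoded as D; if they were written as N, change op below.
sam_with_deletions() {
    awk -F '\t' -v min="${1:-1}" '
        /^@/ { print; next }                          # pass SAM header lines through
        {
            cigar = $6; keep = 0
            while (match(cigar, /[0-9]+[A-Z=]/)) {    # walk the CIGAR one operation at a time
                len = substr(cigar, RSTART, RLENGTH - 1) + 0
                op  = substr(cigar, RSTART + RLENGTH - 1, 1)
                if (op == "D" && len >= min) keep = 1
                cigar = substr(cigar, RSTART + RLENGTH)
            }
            if (keep) print
        }'
}

# Usage (placeholder file names):
#   samtools view -h in.bam | sam_with_deletions 20 > deletions.sam
```

Counting the reads this returns over a region gives a number to compare against the coverage callvariants.sh reports for the same deletion.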
JenBarb replied

08-09-2018, 07:09 AM
Thank you! Love the tool!
HESmith replied

08-09-2018, 06:08 AM
@JenBarb see this thread in Biostars.
JenBarb replied

08-09-2018, 05:56 AM
pull out sequences with matching primers

Hi Brian,
I was wondering if bbmap has a tool that will pull out reads matching a particular primer sequence? I have fastq files with amplicons from 12 different primers in the same file, so I want to make subsets of the reads that have specific primers of interest.

I have used your tool for other tasks, so I figured I would ask if it also has this capability.

Thank you,
Jen
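For reference, the subsetting being asked about can be sketched in plain awk. This is not a BBTools command; it is a minimal illustration that assumes four-line fastq records and keeps only reads whose sequence begins with the primer exactly (no mismatches, forward orientation only). The function name is invented for the example.

```shell
# reads_with_primer PRIMER: read four-line FASTQ on stdin and print only the
# records whose sequence starts with PRIMER.
# Assumptions: exact match, forward strand only, primer at the very start of the read.
reads_with_primer() {
    awk -v primer="$1" '
        { rec[NR % 4] = $0 }                               # buffer the current record
        NR % 4 == 0 && index(rec[2], primer) == 1 {        # primer found at position 1
            print rec[1]; print rec[2]; print rec[3]; print rec[0]
        }'
}

# Usage (placeholder file names and primer):
#   reads_with_primer ACGTACGTACGT < amplicons.fq > primer1.fq
```

Running it once per primer gives one subset file per primer; reads that match none of the 12 primers are simply not written anywhere.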
sunnycqcn replied

08-07-2018, 07:45 AM
Hello Brian,
After running mapPacBio.sh, how can I combine the sequences that share the same ID?
For example, I want to combine the following sequences:
m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_2_6
m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_3

Thanks,
Fuyou

olgabot replied

07-25-2018, 04:24 PM

Add hg19 masked reference to distribution

Hello,
I'm using BBTools via bioconda and the corresponding quay.io docker container. The image has the necessary resources, e.g. the adapters fasta file:

Code:

$ docker run -it -v $PWD:/data quay.io/biocontainers/bbmap:38.06--2 bash
bash-4.2# find . -name adapters.fa
./usr/local/opt/bbmap-38.06/resources/adapters.fa
bash-4.2# cd ./usr/local/opt/bbmap-38.06/resources
bash-4.2# ll
bash: ll: command not found
bash-4.2# ls 
adapters.fa                          blacklist_silva_species_500.sketch   lambda.fa.gz                         nextera_LMP_linker.fa.gz             primes.txt.gz                        sequencing_artifacts.fa.gz
adapters_no_transposase.fa.gz        contents.txt                         lfpe.linker.fa.gz                    pJET1.2.fa                           remote_files.txt                     short.fa
blacklist_img_species_300.sketch     crelox.fa.gz                         mtst.fa                              phix174_ill.ref.fa.gz                remote_files_old.txt                 truseq.fa.gz
blacklist_nt_species_1000.sketch     favicon.ico                          nextera.fa.gz                        phix_adapters.fa.gz                  sample1.fq.gz                        truseq_rna.fa.gz
blacklist_refseq_species_250.sketch  kapatags.L40.fa                      nextera_LMP_adapter.fa.gz            polyA.fa.gz                          sample2.fq.gz

However, the removehuman.sh script uses a hardcoded path for the masked human genome posted in the RemoveHuman thread.

Code:

	local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 $@"

Can the masked genome be included in the distribution?

Thank you!
Warmest,
Olga


StephCarr replied

07-18-2018, 06:08 AM
You're right. It wasn't downloaded correctly. At first I used git to download the bbmap package. But when I just downloaded with wget from https://sourceforge.net/projects/bbm...p_38.12.tar.gz everything was organized correctly.

Thanks for the help.
GenoMax replied

07-17-2018, 01:59 PM
Is your bbmap installed correctly? Have you moved any files around? I am able to run "bbfakereads.sh" and generate fastq and fasta files without a problem.
StephCarr replied

07-17-2018, 10:50 AM
Hello! Thanks for all of the wonderful bbmap scripts. Today I was trying to use bbfakereads.sh, but the script cannot locate or open the jgi/FakeReads file. Any thoughts? I noticed a FakeReads file in the current/jgi directory. Do you think the path in the script is incorrect?

Thanks for your time and help!

bbfakereads.sh in=scaffolds.fasta out=fakePE_R1.fasta out2=fakePE.R2.fasta length=150
java -ea -Xmx600m -cp /media/bioinformaticprograms/BBMap/sh/current/ jgi.FakeReads in=scaffolds.fasta out=fakePE_R1.fasta out2=fakePE.R2.fasta length=150
Error: Could not find or load main class jgi.FakeReads
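Java looks for jgi.FakeReads as jgi/FakeReads.class underneath the directory given to -cp, so a quick check is whether that compiled file is really there (a source file of the same name does not count). The helper below is a minimal sketch with an invented name; it only tests for the file and does not inspect its contents.

```shell
# class_on_path CLASSPATH_DIR CLASS_NAME
# Succeeds (exit 0) if CLASSPATH_DIR holds the compiled file for CLASS_NAME,
# e.g. jgi.FakeReads -> CLASSPATH_DIR/jgi/FakeReads.class
class_on_path() {
    rel=$(printf '%s' "$2" | tr '.' '/').class     # package dots become directories
    [ -f "${1%/}/$rel" ]                            # tolerate a trailing slash on the dir
}

# Usage, with the classpath taken from the command shown above:
#   class_on_path /media/bioinformaticprograms/BBMap/sh/current/ jgi.FakeReads \
#       && echo "class file found" || echo "class file missing"
```

If the check fails, the install is incomplete or the files have been moved relative to the script.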
gaohanlisa replied

07-07-2018, 07:24 AM
I am trying to run BBmap on the cluster, but got the error below. Can anyone help me to solve the error?
Thanks,
[hgx080@quser10 DNA_all]$ /home/hgx080/bbmap/bbmap.sh ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa out=RNA-Ac-1_S7_filter.test.sam minid=0.95 ambig=random reads=100000 -Xmx100g -eoom
java -Djava.library.path=/home/hgx080/bbmap/jni/ -ea -Xmx100g -cp /home/hgx080/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa out=RNA-Ac-1_S7_filter.test.sam minid=0.95 ambig=random reads=100000 -Xmx100g -eoom
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa, in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa, out=RNA-Ac-1_S7_filter.test.sam, minid=0.95, ambig=random, reads=100000, -Xmx100g, -eoom]
Version 38.11

Choosing a site randomly for ambiguous mappings.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.908
NOTE: Ignoring reference file because it already appears to have been processed.
NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
Max reads: 100000
Set genome to 1

Exception in thread "Thread-0" java.lang.RuntimeException: java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
at align2.ChromLoadThread.run(ChromLoadThread.java:79)
Caused by: java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
at fileIO.ReadWrite.readObject(ReadWrite.java:806)
at fileIO.ReadWrite.read(ReadWrite.java:1246)
at dna.ChromosomeArray.read(ChromosomeArray.java:65)
at align2.ChromLoadThread.run(ChromLoadThread.java:76)
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:3031)
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:3061)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1914)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at fileIO.ReadWrite.readObject(ReadWrite.java:802)
... 3 more
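The trace ends in "Unexpected end of ZLIB input stream" while a previously built index file is being read, which suggests (but does not prove) that one of the gzip-compressed files under ref/ was cut short, for example by an interrupted indexing run. A minimal sketch for locating such files with gzip's own integrity test follows; the function name is invented for the example.

```shell
# find_truncated_gz DIR: print every *.gz file under DIR that fails gzip -t.
# gzip -t decompresses to nowhere and returns non-zero on truncated or corrupt input.
find_truncated_gz() {
    find "$1" -type f -name '*.gz' | while IFS= read -r f; do
        gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
    done
}

# Usage, run from the directory that contains the ref/ folder:
#   find_truncated_gz ref/
```

If anything is listed, deleting ref/genome/1/summary.txt as the NOTE in the log describes should make bbmap regenerate the index from the reference fasta on the next run.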