BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

lmusgrove replied

09-22-2021, 05:39 PM
Finding mapped rate for rpkm output

Hi Brian,

I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:

bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/A_1.clean.fq.gz \
in2=data/clean/A_2.clean.fq.gz \
rpkm=data/fpkm/A.fpkm \
t=5 &

bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/B_1.clean.fq.gz \
in2=data/clean/B_2.clean.fq.gz \
rpkm=data/fpkm/B.fpkm \
t=5

etc etc

So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.

Is there a flag I need to add to ensure I can see which stats are for which files?

Thanks!

Lisa
Leave a comment:
mewu3 replied

09-14-2021, 11:45 PM
pileup.sh explication

Hello,

Can some please kindly explain the output file of pileup.sh ?
basecov

bincove

covstat

How the coverage is calculated ?
Leave a comment:
mewu3 replied

09-14-2021, 08:18 AM
Originally posted by HESmith View Post

"Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."

Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.

Thank you !
Leave a comment:
GenoMax replied

09-11-2021, 04:40 AM
@mewu3: Since paired-end reads are aligned together you should use a single "out=output.sam". If you wanted to capture unmapped reads into separate files then you would want to do that as "outu1=R1.unmapped.fq outu2=R2.unmapped.fq".

You may be able to write them out as an unmapped sam file "outu=unmapped.sam" but then again you should use only one output for that. This is untested.
Leave a comment:
kesmarl replied

09-10-2021, 09:27 AM
Thank you very so much !

I downloaded it
Leave a comment:
HESmith replied

09-10-2021, 08:04 AM
"Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."

Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.
Leave a comment:
mewu3 replied

09-10-2021, 12:13 AM
[help]

Hello,

I am using bbmap on HPC and I get the fallowing message :

Aligning C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001 reads to fasta file ...
java -Djava.library.path=/opt/apps/bbtools-37.97/jni/ -ea -Xmx50G -cp /opt/apps/bbtools-37.97/current/ align2.BBWrap build=1 overwrite=true fastareadlen=500 build=1 in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null trimreaddescriptions=t outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam threads=8 pairlen=1000 pairedonly=t minid=0.9 mdtag=t xstag=fs nmtag=t sam=1.3 ambiguous=best secondary=t saa=f maxsites=10 -Xmx50G
Executing align2.BBWrap [build=1, overwrite=true, fastareadlen=500, build=1, in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null, trimreaddescriptions=t, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, -Xmx50G]

Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]
Version 37.97 [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]

Set threads to 8
Retaining first best site only for ambiguous mappings.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.816
Set genome to 1

Loaded Reference: 3.463 seconds.
Loading index for chunk 1-1, build 1
Generated Index: 2.556 seconds.
Analyzed Index: 7.028 seconds.
Started output stream: 0.043 seconds.
Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files.
at stream.ReadStreamWriter.<init>(ReadStreamWriter.java:51)
at stream.ReadStreamByteWriter.<init>(ReadStreamByteWriter.java:17)
at stream.ConcurrentGenericReadOutputStream.<init>(ConcurrentGenericReadOutputStream.java:40)
at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:52)
at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:29)
at align2.AbstractMapper.openStreams(AbstractMapper.java:873)
at align2.BBMap.testSpeed(BBMap.java:437)
at align2.BBMap.main(BBMap.java:34)
at align2.BBWrap.execute(BBWrap.java:144)
at align2.BBWrap.main(BBWrap.java:22)

I get the impression that bbmap is stuck on something and i don't know what's wrong with it. Please help !

mewu3
Leave a comment:
GenoMax replied

09-07-2021, 08:30 AM
@mewu3: Unfortunately it can't. You will need to restart the job.

Originally posted by mewu3 View Post

Hello, Brian,

I am wondering whether bbmap.sh could resume an unfished job.

mewu3
Leave a comment:
mewu3 replied

09-06-2021, 12:14 AM
bbmap.sh unfished job

Hello, Brian,

I am wondering whether bbmap.sh could resume an unfished job.

mewu3
Leave a comment:
GenoMax replied

05-12-2021, 05:10 AM
Originally posted by pck0 View Post

Hey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:

1st run:

>one: 47,699
>two: 330

2nd run:

>three: 47,688
>four: 338

BBmap options were all default, just minidentity = 90 and T=12

How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?

Just curious, my apologies if this was addressed elsewhere!

cheers

Most NGS aligners are non-deterministic i.e. they will not produce exactly identical results if run multiple times.

Fortunately, BBMap does have an option to run in deterministic mode.

Code:

deterministic=f Run in deterministic mode. In this case it is good to set averagepairdist. BBMap is deterministic without this flag if using single-ended reads, or run singlethreaded.

You could also run the analysis using just a single thread.
Leave a comment:
pck0 replied

05-11-2021, 12:03 PM
Is mapping stochastic?

Hey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:

1st run:

>one: 47,699
>two: 330

2nd run:

>three: 47,688
>four: 338

BBmap options were all default, just minidentity = 90 and T=12

How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?

Just curious, my apologies if this was addressed elsewhere!

cheers
Leave a comment:
OwenLeiser replied

03-02-2021, 12:42 PM
mapPacBio strange behavior

Hi there, I really like this tool for Illumina reads and am trying it on some PacBio/Nanopore reads using the mapPacBio.sh function.

I am "successfully" getting aligned bam files out the other end, but the log file lists >99% of reads as Low-Q Discards. I also get unmapped fastq files out (I do this for my own sanity) and these are usually larger than the input files themselves.

Should I be suspicious of this behavior? I have also tried using the low-quality data suggestion (setting the key to a lower value, adjusting the minimum score ratio, etc), but this did not result in any improvement. I know the input files themselves - subread fastq.gz - are "good enough" for Canu assembly but for I'd also like to align the reads for consensus sequence generation.

Thanks!
Leave a comment:
GenoMax replied

12-12-2019, 09:36 AM
@Robinsleith: I suggest that you post this as a ticket on BBMap SF page. Brian no longer visits SeqAnswers regularly. He would be the only person who can answer this.
Leave a comment:
robinsleith replied

12-12-2019, 08:08 AM
Change in minid implementation?

Hello, I recently noticed that newer versions >=38.68 do not seem to implement minid in the same way as previous versions.

For the same command (same minid, 99), same reads, same reference, with versions <38.68 I get 0 mated pairs mapped, 4 Read 1 mapped, and 6 Read 2 mapped.

Code:

bbmap.sh nodisk minid=99 mappedonly=T threads=4 ref=parcu18S.fasta in=LKH421_FPE_q24_minlen100.fastq.gz in2=LKH421_RPE_q24_minlen100.fastq.gz out=stdout.sam | reformat.sh in=stdin.sam out=stdout.sam minlength=80 | samtools view -h -b -S | samtools sort >67_99parcu.bam

For versions >=38.68 I get 94 mated pairs mapped, 94 Read 1 reads mapped, and 94 Read 2 reads mapped.

Is the minid command just not working in newer versions or is there a new detail I am missing?
Thanks!
Leave a comment:
SNPsaurus replied

12-09-2019, 09:14 PM
Did you try the suggested "Please manually set qin=33 or qin=64"?
Leave a comment:

Previous 1 2 3 4 5 6 13 34 template Next

Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM
Strategies for Sequencing Challenging Samples

by seqadmin

Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
- Channel: Articles
03-22-2024, 06:39 AM

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News