**HGV** · 09-17-2014, 01:16 PM

pairlen= broken in 33.41

Hi Brian

I was using pairlen=1200 to limit insert sizes for paired-end mapping. With the newest version, 33.41, bbmap reports an unknown option, whereas in 33.40b it used to work. What has happened?

Cheers
Harald

**Brian Bushnell** · 09-17-2014, 01:53 PM

Harald,

Thanks for noting that! I accidentally deleted the line that parsed that command. I've uploaded the fixed version, 33.42.

On another topic, I'd like to announce that a second developer, Jon Rood, has begun porting certain aspects of BBTools over to C using JNI calls. The currently ported classes are BBMerge, BBMap, and Dedupe. This is optional (you can enable it at runtime with the 'usejni' flag), and the output is identical, but there is a substantial speedup:
BBMap: +30%
BBMerge: +60%
Dedupe: up to +200% (when allowing an edit distance)

If you are interested in a free speed increase, instructions for compiling the C code for OS X or Linux are in /bbmap/jni/README.txt

**kbseah** · 09-22-2014, 08:00 AM

Hello,

I hope this is the right place to post questions related to BBMap, but since the last reply wasn't too long ago....

I've been using BBMap to map paired-end Fastq reads where the headers have been renamed for downstream analysis ("_1" and "_2" have been added to header names for forward and reverse reads respectively). When I look at the SAM file from the mapping, the forward and reverse reads that map have nonzero POS fields, but the PNEXT fields are always zero. Is this caused by my editing the read names? Bowtie2 doesn't have the same problem, and assembling with SPAdes and IDBA-UD worked normally with the edited read names.

Example of the SAM entries for a read pair:

HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_1 16 NODE_207_length_5463_cov_10917.5_ID_413 125 44 13=1X137= * 0 0 [...] [...] NM:i:1 AM:i:44
HWI-ST863:279:H03F7ADXX:1:1101:7656:2184_2 0 NODE_207_length_5463_cov_10917.5_ID_413 5275 45 151= * 0 0 [...] [...] NM:i:0 AM:i:45

Thanks a lot in advance for your help.
Brandon

**Brian Bushnell** · 09-22-2014, 09:38 AM

Brandon,

Those were not recognized as paired. BBMap recognizes only the normal Illumina naming schemes:

"* /1"
and
"* 1:"

If those reads are interleaved in a single file, use the "int=t" flag, which will force BBMap to recognize them as being interleaved.
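
For reads that are already renamed, one option is to rewrite the headers before mapping. The following is a minimal sketch, not part of BBTools: it assumes strict 4-line FASTQ records and assumes that a trailing "/1" or "/2" corresponds to the first naming scheme listed above (whether a space is expected before the slash is not clear from this thread). The function names are hypothetical.

```python
import re

# Matches a trailing "_1" or "_2" at the very end of a header line.
_SUFFIX = re.compile(r"_([12])$")


def rename_header(header: str) -> str:
    """Turn a trailing '_1' or '_2' into '/1' or '/2'; leave other headers unchanged."""
    return _SUFFIX.sub(r"/\1", header)


def rename_fastq_lines(lines):
    """Yield FASTQ lines with renamed headers.

    Assumption: every record is exactly 4 lines (header, bases, '+', qualities),
    so only lines 0, 4, 8, ... are treated as headers.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            line = rename_header(line)
        yield line
```

Only the header line of each record is touched, so a quality string that happens to end in "_1" is left alone.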

**HGV** · 09-22-2014, 10:47 AM

bbmap hitstats - unambiguous Hits

Hi Brian
I was looking at the hitstats files and I realized that the %unambiguousReads can add up to more than 100%, and similarly the unambiguousReads count can be higher than the total input reads... How can that be?

Cheers
Harald

**Brian Bushnell** · 09-22-2014, 11:26 AM

Harald,

Thanks for noticing that. It works correctly for single-ended reads, but it appears that improper pairs (where one read maps to one scaffold, and the other maps to a different scaffold) are double-incrementing the counts on both scaffolds. I'll fix that in the next release.
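
To see why this kind of double-increment can push the totals past 100%, here is a toy model. It is not BBMap's code; the function name, the `buggy` switch, and the pair representation are all made up for illustration, and it only mimics the behaviour described above.

```python
from collections import Counter


def count_reads(pairs, buggy=False):
    """Count reads per scaffold for a list of (scaffold_of_read1, scaffold_of_read2).

    With buggy=True, an improper pair (mates on different scaffolds) credits
    two reads to each of the two scaffolds, imitating the reported symptom.
    """
    counts = Counter()
    for s1, s2 in pairs:
        if buggy and s1 != s2:
            counts[s1] += 2  # both mates credited here...
            counts[s2] += 2  # ...and again here
        else:
            counts[s1] += 1
            counts[s2] += 1
    return counts


pairs = [("A", "A"), ("A", "B")]  # 4 reads in total
correct_total = sum(count_reads(pairs).values())
buggy_total = sum(count_reads(pairs, buggy=True).values())
```

With one proper and one improper pair, the correct per-scaffold counts sum to the 4 input reads, while the buggy variant sums to 6, which is how a per-scaffold percentage column could exceed 100%.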

-Brian

**kbseah** · 09-22-2014, 12:14 PM

Thanks for the quick reply, Brian!

**Brian Bushnell** · 09-22-2014, 03:55 PM

Originally posted by kbseah

Thanks for the quick reply, Brian!

You're welcome!

Originally posted by HGV

Hi Brian
I was looking at the hitstats files and I realized that the %unambiguousReads can add up to more than 100%, and similarly the unambiguousReads count can be higher than the total input reads...

Fixed now, as of v33.46.

**andrej-gnip** · 09-24-2014, 09:32 AM

Hi,

I'm developing a tool for analysis of sequence reads from viral genetic material, and mapping to reference viral genomes is part of the process. The first version used bowtie2, but now I'm building a new version with more flexibility and more options, and I'd like to use bbmap, as it seems to be the best for this purpose. The user manual for bbmap seems more straightforward, and people I talked to who have used both mappers told me bbmap is generally better.

I ran some tests with bbmap on made-up sequences to see how it would perform. I created a fastq file with one read, and two fasta files as references. One fasta file had only one sequence similar to the read, while the other had 4 such sequences (one of which was identical to the sequence in the first file, and was also the most similar to the read). As expected, the read was always mapped to the sequence with the highest similarity. The mapping quality was the same in both cases. From that, I would conclude that mapping quality is completely independent of the other sequences in the reference and depends only on the read and the particular reference sequence to which it was mapped. I'm not sure, though, so I wanted to ask whether this assumption is actually true. Thanks a lot.

**Brian Bushnell** · 09-24-2014, 09:59 AM

The mapping quality is dependent on the other reference sequences, but only if they are within some threshold of similarity (roughly 6 edits) to the best site. If you copy the same reference sequence twice in the fasta file, the read will map ambiguously and get a score of 3 or less. If you add one or two edits to one copy, the read will map to the unedited one but get a reduced score. But if they differ by, say, 10 edits, then the best mapping location will not get any score penalty. The penalty is also influenced by the number of alternative sites; for example, if there are 5 sites that are each 2 edits worse than the best site, that will give a greater score penalty than if there is only one alternative site.
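
The experiment described above (a reference plus a copy that differs by a chosen number of edits) can be set up with a small script. This is only a sketch for building test input: the function names and FASTA record names are hypothetical, it assumes an uppercase ACGT sequence, and it introduces substitutions only, whereas "edits" in general could also include indels.

```python
import random


def mutate(seq, n_edits, seed=0):
    """Return seq with exactly n_edits substitutions at distinct positions."""
    rng = random.Random(seed)  # fixed seed so the test reference is reproducible
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_edits):
        # Always pick a base different from the current one, so every edit counts.
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def two_copy_fasta(seq, n_edits):
    """Build FASTA text holding the original sequence and one edited copy."""
    edited = mutate(seq, n_edits)
    return f">original\n{seq}\n>copy_{n_edits}_edits\n{edited}\n"
```

Writing `two_copy_fasta(seq, 0)`, `two_copy_fasta(seq, 2)` and `two_copy_fasta(seq, 10)` to separate files and mapping the same read against each gives the three cases mentioned: identical copies, a near copy, and a copy beyond the similarity threshold.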

**kbseah** · 10-06-2014, 07:47 AM

pileup.sh inconsistent with samtools pileup

Hi Brian,

I tried using the "fastaorf" function in pileup.sh, to look at the read depth for a bunch of ORFs predicted with Prodigal. The input is in Prodigal's output format as specified. However, I get per-orf coverage results (the depthSum field) that are inconsistent with the output from samtools mpileup.

Briefly: I produced a pileup file with samtools mpileup and for each orf, simply summed the read depth (4th column of the pileup file) for each position that falls within the orf.
I double-checked this by converting the Prodigal output to a BED feature table, and used bedtools multicov with the original BAM file to produce a per-feature read depth. These per-feature read depths are not identical to what I got by summing the depths, but they are roughly a multiple of it (i.e. plotting the depths per orf from both methods against each other gives an approximately linear relationship).
The output from pileup.sh, on the other hand, doesn't give anything close to a linear relationship.

Is there something different in how pileup.sh calculates the per-orf coverage?

Thanks a lot,
Brandon
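
For reference, the per-orf summation described in this post can be sketched as follows. This is an illustrative reconstruction, not the script actually used: it assumes tab-separated samtools mpileup lines (contig, 1-based position, reference base, depth, ...) and ORF coordinates given as 1-based inclusive (contig, start, end) tuples; the function name is hypothetical.

```python
from collections import defaultdict


def depth_sum_per_orf(pileup_lines, orfs):
    """Sum the mpileup depth column (4th field) over each ORF.

    orfs maps an ORF name to (contig, start, end), 1-based and inclusive.
    Positions missing from the pileup are counted as depth 0.
    """
    depth = defaultdict(dict)  # contig -> {position: depth}
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        contig, pos, cov = fields[0], int(fields[1]), int(fields[3])
        depth[contig][pos] = cov
    return {
        name: sum(depth[contig].get(p, 0) for p in range(start, end + 1))
        for name, (contig, start, end) in orfs.items()
    }
```

Because bedtools multicov counts the alignments overlapping each feature rather than summing per-base depth, a roughly linear relationship between the two numbers (with a slope on the order of the read length) is what one would expect.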

**Brian Bushnell** · 10-06-2014, 09:11 AM

Brandon,

I did introduce a bug recently when I added support for tracking only read start positions rather than total coverage, which manifested in some situations. It's fixed now and I just uploaded the fixed version (33.57). Would you mind downloading that and confirming whether it works correctly?

Thanks!

**kbseah** · 10-07-2014, 12:10 AM

Hi Brian,

Thanks for your reply. Unfortunately the output seems to be the same. Should I send you the output from pileup.sh vs. the output from bedtools so that you can see what I mean?

Best,
Brandon

**Brian Bushnell** · 10-07-2014, 08:36 AM

Originally posted by kbseah

Hi Brian,

Thanks for your reply. Unfortunately the output seems to be the same. Should I send you the output from pileup.sh vs. the output from bedtools so that you can see what I mean?

Best,
Brandon

Yes, please do, as well as the command line and stdout/stderr messages.

**holmrenser** · 10-20-2014, 05:38 AM

Hi Brian,

I would be interested in knowing when you intend to publish BBMap in a paper. Can you enlighten us?

Cheers
