BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

Brian Bushnell replied

11-25-2016, 10:36 AM
BBMap does not have any explicit chimera identification capabilities; that would need to be done in post-processing. Also, it does not produce split alignments, but it does do a very good job of identifying long deletions, which it will capture in a single alignment.

You can run BBMap like this:

Code:

bbmap.sh in=reads.fq outm=normal.bam outu=unmapped.fq pairedonly ambig=all maxindel=400k

That will produce a bam file of the normally-mapped pairs. Then, mapping with a second pass:

Code:

bbmap.sh in=unmapped.fq out=odd.bam ambig=all local maxindel=400k

That will produce a file containing all of the unmapped and unpaired reads. I have not used Lumpy so I'm not sure if this is the best way to do it, but that sounds approximately like what it wants.
Leave a comment:
nicklisbon replied

11-25-2016, 06:11 AM
Good afternoon Brian,
In my lab we are dealing with a couple of thousands of whole genome seq. data. We are running some tests with BBtools and its performance looks really promising.
We would like to use Lumpy for structural variants call and this tool requires as inputs beside a "normal" BAM file also a bam file containing unpaired and split reads.
I have seen that it is possible to run BBmap in local alignment mode but is there any way to identify chimeras with BBmap?
Thanks in advance.
Leave a comment:
Brian Bushnell replied

11-23-2016, 01:34 PM
Seem my response in this thread: http://seqanswers.com/forums/showthr...313#post201313

But basically, yes. And no, you don't need to be in the /ref/ directory, you need to be in the directory containing the ref directory.

By default, there is only one alignment per read.
Leave a comment:
Brian Bushnell replied

11-18-2016, 09:51 AM
I will update the description; it's a bit misleading. What it means is, reads with average quality below the setting will not be mapped. All of the reads will still be printed to "out", even if they failed the quality threshold. However, if you use the "outm" and "outu" streams, only mapped reads will go to outm, and only unmapped reads will go to outu. So, maq would suppress low-quality reads from going to outm.

Thanks for noting that the long version of the flag does not work. It's actually "minavgquality"; I'll fix that.
Leave a comment:
ShellfishGene replied

11-18-2016, 03:02 AM
qtrim and maq in BBMap

Hi Brian,

I'm using BBMap to get reads that do not align to a genome. I want the output quality trimmed, and use

Code:

bbmap.sh in=raw_reads.fastq.gz outu=stdout.sam qin=auto qout=33 qtrim=lr maq=30 mintrimlength=30

The BBDuk docs state that the minimum average quality is checked after clipping. I assume this is the same in BBMap. However with this command I have output such as this:

Code:

1600:7:1:1413:2208 653 * 0 0 * * 0 0 AAATGAAGAGAAAGATTACAGGGAAGTAAC ##############################

The read should clearly be removed, I would think? Or does that work only for the mapped reads?
Also, is there an option to drop all reads that are clipped to below a certain length?

Btw, the option "minaveragequality" is not recognized, only the short version "maq". Not sure if that's intentional.

Cheers,

Till
Leave a comment:
darthsequencer replied

11-08-2016, 03:22 PM
Great - Thanks a bunch!
Leave a comment:
Brian Bushnell replied

11-08-2016, 03:19 PM
I just uploaded BBMap v36.59 which adds "coverage" and "metagenome" flags.

coverage=X will automatically set "reads" to a level that will give X average coverage (decimal point is allowed).

metagenome will assign each scaffold a random exponential variable, which decides the probability that a read be generated from that scaffold. So, if you concatenate together 20 bacterial genomes, you can run randomreads and get a metagenomic-like distribution. It could also be used for RNA-seq when using a transcriptome reference.

The coverage is decided on a per-reference-sequence level, so if a bacterial assembly has more than one contig, you may want to glue them together first with fuse.sh before concatenating them with the other references.
Leave a comment:
darthsequencer replied

11-08-2016, 12:03 PM
That would be awesome Brian! Most of the tools out there are bit wonky, slow, or have fallen to the wayside.

More of a nuts and bolts question but how do you handle fragmented reference genomes in a simulated metagenome? In the sense that one reference but is in multiple fasta files but will get assigned different coverages because most programs think each file is a different genome.
Leave a comment:
Brian Bushnell replied

11-08-2016, 10:43 AM
You can simulate a mixed dataset just fine that way, but unfortunately metagenomes tend to have an exponential distribution of coverage. So to really simulate a metagenome, you need to mimic that, which would mean (for now) running randomreads on each genome individually with an exponential-random coverage target, then combining all the outputs together.

I did write a program "mutate.sh" that I use in simulating metagenomes, which takes a genome fasta and produces a mutant version of it at a specified error rate; I use it to do things like verify that error-correction is not erasing strain differences in metagenomes.

Anyway, I think it will be pretty easy to modify randomreads for a metagenome mode, so I'll try to do that today. And a coverage (rather than #reads) threshold is pretty easy as well.
Leave a comment:
GenoMax replied

11-08-2016, 10:36 AM
Originally posted by Brian Bushnell View Post

Unfortunately, no to both. And you are correct, simulating metagenomes is a need we have at the JGI so I probably should have developed one by now!

I'll look into making a wrapper for the purpose; it shouldn't take very long.

Starting with a bunch of bacterial genomes in a file and using randomreads.sh should produce something acceptable, correct? I have used it in the past with a defined set of genomes to simulate a mixed dataset.
Leave a comment:
Brian Bushnell replied

11-08-2016, 10:25 AM
Originally posted by Carol_B View Post

Hi Brian,

I'm also struggling with the selection of unique alignments. I had previously used the option ambiguous=toss, but for some pair-end reads I only get the forward to be considered aligned. I saw your suggestions of removing the "ambiguos=toss" flag and later filter the alignment based on their mapq value. Nevertheless, I am curious on how to select a correct threshold for my filtering. I am using local=t alignment, is the maximum mapq score different when you use local as compared to global alignment? why do you suggest 4 as a good threshold?

Thanks in advance !!!

Carol

Hi Carol,

The definition of the mapq is the Phred-scaled (decibel) probability that the mapping is incorrect; q=10 means a 10% chance of being incorrect, q=20 means 1%, q=30 means 0.1%, etc. q=3 indicates 50% - so q=3 or below indicate that the read is at least 50% likely to have come from somewhere else, while q=4 or above means the mapping is probably correct. Therefore, reads that map equally well to multiple locations should be assigned a mapping score of 3 or less, and a mapping score of 4 or more indicates the read cannot map anywhere else as well as the indicated position.

The local alignments will give different scores than global alignments because mismatches toward the ends of the reads are ignored. In practice, though, there will not be much difference in most cases; mainly just near the ends of contigs, or when dealing with reads that are chimeric/low-quality/not-adapter-trimmed, etc.
Leave a comment:
Brian Bushnell replied

11-08-2016, 10:19 AM
Originally posted by darthsequencer View Post

Hi Brian,
Is there a way to set a desired coverage for randomreads.sh rather than the total number of reads?

On that note - I'm trying to simulate a metagenome with bbmap tools. Knowing BBMap there's probably a way to do that (especially since JGI has probably needed it too). So can BBmap simulate a metagenome (in terms of abundance profile) from a bunch of reference genomes?

Unfortunately, no to both. And you are correct, simulating metagenomes is a need we have at the JGI so I probably should have developed one by now!

I'll look into making a wrapper for the purpose; it shouldn't take very long.
Leave a comment:
darthsequencer replied

11-07-2016, 07:54 PM
Hi Brian,
Is there a way to set a desired coverage for randomreads.sh rather than the total number of reads?

On that note - I'm trying to simulate a metagenome with bbmap tools. Knowing BBMap there's probably a way to do that (especially since JGI has probably needed it too). So can BBmap simulate a metagenome (in terms of abundance profile) from a bunch of reference genomes?
Leave a comment:
Carol_B replied

10-15-2016, 03:33 AM
Originally posted by Brian Bushnell View Post

You can filter proper pairs only with the "po" flag; unpaired reads will then be classified as unmapped. You can use the 'outm' stream to just get mapped reads. You can also filter with samtools after the fact.

From these, you could filter only the pairs where at least one read is uniquely aligned (mapq>3 or some higher threshold for greater confidence), but this can't be done in BBMap currently; you could, however, do this with a custom script or with samtools.

Hi Brian,

I'm also struggling with the selection of unique alignments. I had previously used the option ambiguous=toss, but for some pair-end reads I only get the forward to be considered aligned. I saw your suggestions of removing the "ambiguos=toss" flag and later filter the alignment based on their mapq value. Nevertheless, I am curious on how to select a correct threshold for my filtering. I am using local=t alignment, is the maximum mapq score different when you use local as compared to global alignment? why do you suggest 4 as a good threshold?

Thanks in advance !!!

Carol
Leave a comment:
GenoMax replied

10-10-2016, 03:50 AM
Originally posted by sebl View Post

Hi,

I would like to calculate the average coverage of some genomes. I used:

Code:

bbmap.sh in1=reads_R1.fq in2=reads_R2.fq ref=scaffolds.fasta nodisk covstats=stats.txt covhist=histogram.txt

I get nice output files with coverage etc. for each scaffold, but the average for the whole genome is shown only in the terminal. How could I output it to a file, and preferably in a way that I can get tens of genomes coverages summarized in the same file?

Thanks!

Capture the standard out/error streams to a file (would depend on your shell but examples for bash are here) and you will have that information in a file.
Leave a comment:

Previous 1 14 21 22 23 24 25 26 27 34 template Next

Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM
Strategies for Sequencing Challenging Samples

by seqadmin

Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
- Channel: Articles
03-22-2024, 06:39 AM

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 31 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News