BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

susanklein replied

04-12-2018, 09:22 PM
Originally posted by GenoMax View Post

Since you are using ambig=all you are going to get multi-mappers with their MAPQ being set to a small value (3). You could post filter your sam file or set ambig=toss (or ambig=random/best to get only one alignment).

ambig=best worked, Thanks.

S.
Leave a comment:
GenoMax replied

04-12-2018, 03:38 AM
Since you are using ambig=all you are going to get multi-mappers with their MAPQ being set to a small value (3). You could post filter your sam file or set ambig=toss (or ambig=random/best to get only one alignment).
Leave a comment:
susanklein replied

04-11-2018, 07:40 PM
bbmap SAM output question

Hi,

I don't quite understand the SAM output of bbmap.sh

here is my line:

bbmap.sh ref=ref.fasta ambig=all nodisk=t nzo=t mappedonly=t outu=unmapped.fasta scafstats=stats.map in=reads.fastq out=mapped.sam

I usually just use the stats file to get read counts per feature using a ref with features as a multifasta, but I need to annotate from a .gff3 file using a draft genome ref and the only way I can see to do that is from the sam (to get the mapping coordinate. However, the sam seems to have low quality mappings too (MAPQ=3)..

What is the threshold actually used for mapping quality? I can see that in the options, and how can I get the sam to only contain the mapped reads above that threshold?

Thanks,

S.
Leave a comment:
GenoMax replied

04-09-2018, 11:54 AM
Originally posted by Coyk View Post

Hello,
I've been trying to get any basic command to work on my windows 7 machine. I've installed bbmap in the C:\Users\me directory. My .fastq.gz files are in the bbmap folder. When I type the command
C:\Users\me>java -ea -Xmx1g -cp \bbmap\current\jgi.BBDukF in1=inputR1.fastq.gz in2=inputR2.fastq.gz out1=test1.fastq out2=test2.fastq ktrim=r mink=8 k=15

I get "could not find or load main class in1=inputR1.fastq.gz"

I've messed with the type of slash, putting the infiles in the current directory, and taking away parts of the directory names. I even truncated the argument after -cp to just jgi.BBDukF, which I know is wrong, and it still wants the main class of in1. I don't know a thing about java and when I read instructions about paths, classpaths, path environments, etc, I get confused quickly. When I type "echo %CLASSPATH% into the command prompt, I get CLASSPATH back so that's not set. I tried using Git Bash but when I typed "bbduk.sh" I got "command not found" so I'm stuck, but I would really like to use this tool on my windows machine. Anyone willing to give me very basic instructions on the exact commands that get bbduk to run on a windows machine?

Try the following. Make sure you change /path/to/ to a real value on your computer.

Code:

java -ea -Xmx200m -cp /path/to/bbmap/current/ jgi.BBDukF in=reads.fq out=processed.fq
Leave a comment:
GenoMax replied

04-09-2018, 11:41 AM
Originally posted by JenBarb View Post

Hello,
Any suggestions as to why filterbyname is not pulling out reads that I know are there.

Help??

Jen

Are you including the entire fastq/fasta header in your retrieval command?
Leave a comment:
JenBarb replied

04-09-2018, 10:00 AM
filterbyname.sh is not pulling out reads

Hello,
Any suggestions as to why filterbyname is not pulling out reads that I know are there.

Help??

Jen
Leave a comment:
Coyk replied

04-09-2018, 09:16 AM
basic java windows error help

Hello,
I've been trying to get any basic command to work on my windows 7 machine. I've installed bbmap in the C:\Users\me directory. My .fastq.gz files are in the bbmap folder. When I type the command
C:\Users\me>java -ea -Xmx1g -cp \bbmap\current\jgi.BBDukF in1=inputR1.fastq.gz in2=inputR2.fastq.gz out1=test1.fastq out2=test2.fastq ktrim=r mink=8 k=15

I get "could not find or load main class in1=inputR1.fastq.gz"

I've messed with the type of slash, putting the infiles in the current directory, and taking away parts of the directory names. I even truncated the argument after -cp to just jgi.BBDukF, which I know is wrong, and it still wants the main class of in1. I don't know a thing about java and when I read instructions about paths, classpaths, path environments, etc, I get confused quickly. When I type "echo %CLASSPATH% into the command prompt, I get CLASSPATH back so that's not set. I tried using Git Bash but when I typed "bbduk.sh" I got "command not found" so I'm stuck, but I would really like to use this tool on my windows machine. Anyone willing to give me very basic instructions on the exact commands that get bbduk to run on a windows machine?
Leave a comment:
kevin199011 replied

04-07-2018, 10:17 AM
Hi Brian, can BBduk deal with trimming adapter sequence only but retain other bases? Forexample:
SEQUENCEADAPTERSEQUENCE; trimming adapter leaves: SEQUENCESEQUENCE?

Thank you!

Kevin
Leave a comment:
GenoMax replied

04-07-2018, 06:20 AM
Originally posted by luc View Post

Convenient way to avoid re-indexing references?

When working with big genomes one would really like to avoid re-indexing reference sequences. BBmap is always looking for the reference indices in the working directory and not in the directory where the reference resides. Is there any way to point BBMap to the index? When using "path=" it seems to automatically re-index i this location.
Thanks in advance.

Index the genome independently by doing

Code:

bbmap.sh ref=your_genome.fa

This will produce the index in the local directory. There will be a top level "ref" directory with several things in it. Don't worry about the exact organization/names of files in it. This is the pre-made index.

When you want to do the alignments instead of "ref=" you pass "path=/path_to_dir_containing_ref_dir". That should use the pre-made index.
Last edited by GenoMax; 04-07-2018, 09:37 AM.
Leave a comment:
GenoMax replied

04-07-2018, 06:17 AM
Originally posted by bwubb View Post

Greetings,

First time bbmap user and Im having some odd compatibility issues with certain tools using a bbmap generated bam. I do not think I did anything particularly odd in my bam workflow:

Left trim and right trim using bbduk | bbmerge | bbmap | samtools addreplacerg

From this I was able to successfully make variant calls and generate some metrics using GATK/Picard. However bedtools was reporting zero coverage at all intervals and I am unable to view reads restricting to regions using samtools.

Code:

[bwubb@node063 GRCh37]$ samtools view 4721.ready.bam 1 [main_samview] region "1" specifies an unknown reference name. Continue anyway. [bwubb@node063 GRCh37]$ samtools view -H 4721.ready.bam | head -5 @HD VN:1.4 SO:coordinate @SQ SN:1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 LN:249250621 @SQ SN:2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 LN:243199373 @SQ SN:3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 LN:198022430 @SQ SN:4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 LN:191154276

Are they SN incorrect? Nothing looks really unusual. I guess the first line of my fastq does look like

Code:

>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1

If this is the issue can I "correct" the bbmap index or should I just modify the fasta? Thanks.

-bwubb

BBMap is one of the aligners that uses the full header present in your fasta file when creating the index and passes it along to alignment file. If there are spaces in the header name they are written to alignment. Some downstream programs have a problem with this.

You can use the option "trd=t" to truncate the fasta header names after the first space in the name. There is an option for reformat.sh that can do this after the fact for aligned data. I can look this up later if you don't find it.
Leave a comment:
luc replied

04-06-2018, 03:23 PM
Convenient way to avoid re-indexing references?

When working with big genomes one would really like to avoid re-indexing reference sequences. BBmap is always looking for the reference indices in the working directory and not in the directory where the reference resides. Is there any way to point BBMap to the index? When using "path=" it seems to automatically re-index i this location.
Thanks in advance.
Leave a comment:
bwubb replied

04-04-2018, 08:29 AM
Greetings,

First time bbmap user and Im having some odd compatibility issues with certain tools using a bbmap generated bam. I do not think I did anything particularly odd in my bam workflow:

Left trim and right trim using bbduk | bbmerge | bbmap | samtools addreplacerg

From this I was able to successfully make variant calls and generate some metrics using GATK/Picard. However bedtools was reporting zero coverage at all intervals and I am unable to view reads restricting to regions using samtools.

Code:

[bwubb@node063 GRCh37]$ samtools view 4721.ready.bam 1 [main_samview] region "1" specifies an unknown reference name. Continue anyway. [bwubb@node063 GRCh37]$ samtools view -H 4721.ready.bam | head -5 @HD VN:1.4 SO:coordinate @SQ SN:1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 LN:249250621 @SQ SN:2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 LN:243199373 @SQ SN:3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 LN:198022430 @SQ SN:4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 LN:191154276

Are they SN incorrect? Nothing looks really unusual. I guess the first line of my fastq does look like

Code:

>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1

If this is the issue can I "correct" the bbmap index or should I just modify the fasta? Thanks.

-bwubb
Leave a comment:
tabwalsh replied

03-29-2018, 11:25 PM
Thanks, I've submitted a ticket here: https://sourceforge.net/p/bbmap/tickets/7/
Leave a comment:
GenoMax replied

03-29-2018, 08:02 AM
Originally posted by tabwalsh View Post

For the latest version of BBMap (BBMap_37.95.tar.gz), it looks like this issue might be fixed by changing line 183 of file 'bbmap/current/jgi/FindPrimers.java' to:

score2=ss.score=ss.quickScore;

I suggest that you create a ticket on BBMap SF site since you have specifically identified a fix.
Leave a comment:
tabwalsh replied

03-29-2018, 06:28 AM
Bias towards forward strand in msa.sh

Hello Brian,

Thanks for the lovely tool. I've been using msa.sh with cutprimers.sh (as described here) to extract sequences between pairs of primers.

However, I got some unexpectedly large sequences as output, and when I took a look at the SAM alignments output by msa.sh, none of them are mapping to the reverse strand.

For example, with the following template sequence…

>template
CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCACCCAACACCCTCCAGGCTTGACCGGCCAGGGTGTCCCCTTGACACCCGTCAAGCCTCACGA

…it should be possible to use the following commands to extract the sequence between the given primers:

msa.sh in=template.fa primer=GGGGCCTAGACATTGCCCTCCA out=fwd.sam
msa.sh in=template.fa primer=GACACCCTGGCCGGTCAAGCCT out=rev.sam
cutprimers.sh in=template.fa sam1=fwd.sam sam2=rev.sam include=t out=amplicon.fa

This outputs the following (primer alignments shown in uppercase):

GGGGCCTAGACATTGCCCTCCAgagagagcacccaacaccctccaggcttgaccggccagggtgtccccttGACACCCGTCAAGCCT

The sequence GACACCCGTCAAGCCT is the best match for the reverse primer on the forward strand, but not the best match overall.

That is on the reverse strand, which would result in the following (primer alignments again in uppercase):

GGGGCCTAGACATTGCCCTCCAgagagagcacccaacaccctccAGGCTTGACCGGCCAGGGTGTC

For the latest version of BBMap (BBMap_37.95.tar.gz), it looks like this issue might be fixed by changing line 183 of file 'bbmap/current/jgi/FindPrimers.java' to:

score2=ss.score=ss.quickScore;

Last edited by tabwalsh; 03-29-2018, 06:31 AM.
Leave a comment:

Previous 1 5 6 7 8 9 10 11 18 34 template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Today, 11:49 AM	0 responses 11 views 0 likes	Last Post by seqadmin Today, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News