How can I use BBSplit to see the match details?
For example, I would like to know which reference sequence a given read matched. Is that possible?
Cheers,
a
Yes, it is. Also, I think BBSplit will try to regenerate the index every time "ref=" is specified, even if the index already exists, so only specify "ref=" once.
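A minimal dry-run sketch of "build the index once, then reuse it with path=". It assumes that an existing index can be detected by a subdirectory named ref inside the path= directory; that check is an assumption, so adjust it to whatever your BBMap version actually writes. The helper bbsplit_cmd is hypothetical and only prints the command, so it can be tried without BBSplit installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) a bbsplit.sh command that
# includes ref= only when no index appears to exist yet under path=.
# ASSUMPTION: an existing index is detected by the directory "$path/ref".
bbsplit_cmd() {
  local path=$1 refs=$2 reads=$3
  if [[ -d "$path/ref" ]]; then
    # Index already built: reuse it via path= and omit ref= to avoid a rebuild.
    echo "bbsplit.sh in=$reads path=$path basename=out_%.fq outu=clean.fq"
  else
    # First run: build the index from the references and store it under path=.
    echo "bbsplit.sh in=$reads ref=$refs path=$path basename=out_%.fq outu=clean.fq"
  fi
}
```

Printing rather than executing keeps the "only do that once" decision easy to inspect before anything is rebuilt.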
After running BBSplit once with ref=ref1.fa,ref2.fa, is it possible to reuse that index on subsequent runs via the path= parameter?
Thanks, Brian, for your effort in providing bioinformatics applications.
As you mentioned in your post, BBSplit can take two paired-end files as input. In addition to those two files, can I add a third file (e.g., a file of merged reads) as input?
Thanks again
P.S. In the previous post, "java -Xmx29g -cp /path/to/current align2.BBSplit" should say "align2.BBSplitter".
Last edited by vingomez; 09-16-2014, 07:22 AM.
Introducing BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries
BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using BBMap. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. Paired reads will always be kept together.
For example, if you had a library of something that was contaminated with E. coli and Salmonella, you could do this:
bbsplit.sh in=reads.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu=clean.fq int=t
This will produce 3 output files:
out_ecoli.fq (ecoli reads)
out_salmonella.fq (salmonella reads)
clean.fq (unmapped reads)
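The example suggests that the % in basename= is replaced by each reference's name (ecoli.fa becomes out_ecoli.fq). The sketch below predicts the per-reference output names under that assumption, taking the name to be the reference file name with its directory and last extension stripped. predict_outputs is a hypothetical helper; compressed references such as ecoli.fa.gz may be named differently, so check against the files BBSplit actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predicts output file names from a basename= pattern.
# ASSUMPTION: "%" is replaced by the reference file name minus its directory
# and last extension, matching ecoli.fa -> out_ecoli.fq in the example.
predict_outputs() {
  local pattern=$1 refs=$2 ref name
  local prefix=${pattern%%'%'*}   # text before the first "%"
  local suffix=${pattern#*'%'}    # text after the first "%"
  local -a ref_list
  IFS=',' read -r -a ref_list <<< "$refs"   # split the comma-separated ref= list
  for ref in "${ref_list[@]}"; do
    name=${ref##*/}               # drop any directory part
    name=${name%.*}               # drop the last extension (.fa)
    echo "${prefix}${name}${suffix}"
  done
}
```

For example, predict_outputs out_%.fq ecoli.fa,salmonella.fa prints out_ecoli.fq and out_salmonella.fq, one per line.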
In this case, "int=t" means that the input file is paired and interleaved. For single-end reads you would leave that out. For paired reads in 2 files, you would do this:
bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq
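When scripting, the choice between the two input forms above can be wrapped in a small dry-run function: one file gives the interleaved form (in= with int=t), two files give the in1=/in2= form with outu1=/outu2=. build_split_cmd is a hypothetical helper that uses only the flags shown in the two commands above; for single-end reads, drop int=t as described.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) a bbsplit.sh command.
# Usage: build_split_cmd REFS READS1 [READS2]
# One file  -> treated as paired and interleaved (int=t).
# Two files -> paired reads in separate files (in1=/in2=, outu1=/outu2=).
build_split_cmd() {
  local refs=$1 r1=$2 r2=${3:-}
  if [[ -n "$r2" ]]; then
    echo "bbsplit.sh in1=$r1 in2=$r2 ref=$refs basename=out_%.fq outu1=clean1.fq outu2=clean2.fq"
  else
    echo "bbsplit.sh in=$r1 ref=$refs basename=out_%.fq outu=clean.fq int=t"
  fi
}
```

Both branches reproduce the two commands in the post exactly, so the printed line can be checked by eye before running it.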
You can get more information about parameters by running bbsplit.sh with no arguments, or reading /bbmap/docs/readme.txt. But I will mention here the inter-reference ambiguity modes, which decide what to do with reads that map to multiple references and pairs where each read maps to a different reference:
ambig2=best
Default. Ambiguous reads go to the first best site.
ambig2=toss
Ambiguous reads are considered unmapped.
ambig2=all
Write a copy to the output for each reference to which it maps.
ambig2=split
Write a copy to the AMBIGUOUS_ output file for each reference to which it maps.
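The four modes can be illustrated with a small decision function. This is a conceptual sketch, not BBSplit's implementation: ambig_bins is a hypothetical helper whose arguments are the references a read (or pair) maps to equally well, listed in the order that decides which one is "first", and AMBIGUOUS_<ref> is an assumed label for the per-reference ambiguous file rather than the exact file name BBSplit writes.

```shell
#!/usr/bin/env bash
# Conceptual illustration of ambig2= (not BBSplit's actual code).
# Usage: ambig_bins MODE REF [REF...]
# Prints the bin(s) the read would go to, one per line.
# "unmapped" stands for the outu= file; "AMBIGUOUS_<ref>" is a hypothetical label.
ambig_bins() {
  local mode=$1; shift
  if (( $# == 1 )); then
    echo "$1"        # unambiguous: exactly one best reference
    return 0
  fi
  case $mode in
    best)  echo "$1" ;;                      # first best hit only
    toss)  echo "unmapped" ;;                # treated as unmapped
    all)   printf '%s\n' "$@" ;;             # one copy per reference
    split) printf 'AMBIGUOUS_%s\n' "$@" ;;   # one copy per ambiguous file
    *)     echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For a read that maps equally well to ecoli and salmonella, ambig2=best keeps only ecoli, toss sends it to the unmapped output, all writes it to both bins, and split writes it to both ambiguous files.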
If your OS cannot run bash shell scripts, replace "bbsplit.sh" with "java -Xmx29g -cp /path/to/current align2.BBSplitter", where /path/to/current is the location of the 'current' directory (a subdirectory of bbmap), and -Xmx29g specifies the amount of memory to use (so this would be the command line for a 32GB computer). This should be set to about 85% of physical memory.
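The "about 85% of physical memory" rule can be turned into a quick calculation. xmx_for_kb is a hypothetical helper that takes total memory in kB (the unit /proc/meminfo reports on Linux) and rounds down to whole gigabytes; reading /proc/meminfo is Linux-specific and is shown only as a commented example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggests a -Xmx value of about 85% of physical memory.
# Input: total memory in kB. Integer arithmetic rounds down to whole GiB.
xmx_for_kb() {
  local total_kb=$1
  local gb=$(( total_kb * 85 / 100 / 1024 / 1024 ))
  printf '%s\n' "-Xmx${gb}g"
}

# Example on Linux (assumes /proc/meminfo exists):
# xmx_for_kb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

For 32 GiB (33554432 kB) this prints -Xmx27g, a little below the -Xmx29g in the example above, so treat 85% as a guideline rather than an exact rule.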
BBSplit is extremely fast and highly sensitive because it uses BBMap for the mapping, so all flags and features supported by BBMap can also be used with BBSplit (aside from SAM output).
BBSplit is available as part of the BBMap package, which includes BBMap (a short read aligner) and various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher).
P.S. Some people have asked why BBSplit has a lower alignment rate than BBMap. That is because it has a lower default sensitivity, as the original intent was to bin reads using known assemblies. The sensitivity can be raised to be equivalent to BBMap with these flags: "minratio=0.56 minhits=1 maxindel=16000"
Last edited by Brian Bushnell; 09-16-2014, 08:29 AM.
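A dry-run sketch that adds the three BBMap-equivalent sensitivity flags from the P.S. to a single-end version of the first example. sensitive_split_cmd and BBMAP_LIKE_FLAGS are hypothetical names; only flags quoted in this thread are used.

```shell
#!/usr/bin/env bash
# Flags quoted above for raising BBSplit's sensitivity to match BBMap.
BBMAP_LIKE_FLAGS=(minratio=0.56 minhits=1 maxindel=16000)

# Hypothetical helper: prints (does not run) the command with the extra flags.
sensitive_split_cmd() {
  local reads=$1 refs=$2
  local -a cmd=(bbsplit.sh "in=$reads" "ref=$refs" basename=out_%.fq outu=clean.fq "${BBMAP_LIKE_FLAGS[@]}")
  echo "${cmd[*]}"   # joined with spaces for display
}
```

Keeping the command in an array means it can be executed as "${cmd[@]}" instead of printed, without any re-quoting.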