Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Newbie here! I have unzipped and untared bbmap but it wont run any commands. I have a Linux virtual box in windows 10. Am I missing some software to use BBMap?
-
Hi
I came across this when searching for a way to demultiplex non-overlapping paired end reads that were sequenced using combinatorial barcodes. I don't suppose there is a way of doing that somehow using seal (or other tools?).
Originally posted by Brian Bushnell View PostIt is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.
seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f
That would require a file "barcodes.fa" like this:
>AACTGA
AACTGA
>GGCCTT
GGCCTT
etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?
However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.
Leave a comment:
-
Reformat should never run out of memory with the default settings and short (<200kbp) reads. I think the input file is corrupt, and should be re-downloaded. The corruption probably occurs somewhere around the 32.77 millionth read, but it's hard to be sure...
Leave a comment:
-
Looks like you only ran this with 200MB of RAM. Can you try with -Xmx2g?
How old is this data BTW (in years)?
Leave a comment:
-
See below. I run it for only one fo the files because doing both would go over my storage quota.
java -da -Xmx200m -cp /home/cslamovi/CLARKSCV1.2.2-b/bbmap/current/ jgi.ReformatReads -da ibq qin=33 in=scratch/s_3_1_sequence.fastq out=scratch/fixed_1.fq
Executing jgi.ReformatReads [-da, ibq, qin=33, in=scratch/s_3_1_sequence.fastq, out=scratch/fixed_1.fq]
Input is being processed as unpaired
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at fileIO.ByteFile1.fillBuffer(ByteFile1.java:180)
at fileIO.ByteFile1.nextLine(ByteFile1.java:136)
at stream.FASTQ.toReadList(FASTQ.java:648)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
Input: 32775200 reads 3539721600 bases
Output: 32775200 reads (100.00%) 3539721600 bases (100.00%)
Time: 807.191 seconds.
Reads Processed: 32775k 40.60k reads/sec
Bases Processed: 3539m 4.39m bases/sec
Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1032)
at jgi.ReformatReads.main(ReformatReads.java:45)
Leave a comment:
-
Hmm... can you post the errors? Reformat by default uses very little memory, which is all that should be needed for a correctly-formatted file containing reads. It will run out of memory if you use it without increasing the default memory allocation on extremely long sequences (over tens of megabases) such as the human genome. It will never run out of memory on a correctly-formatted Illumina fastq file.
So, it would also be helpful if you could post the results of "head" (the first 10 lines of the file).
Leave a comment:
-
testformat says: illumina fastq raw single-ended 108bp
As far as I remember this was a HiSeq run.
I tried the reformat line suggested by Brian but the process stops after a while with errors. Apparently short of memory. Will try to improve that and try again.
Leave a comment:
-
Originally posted by DNA Sorcerer View PostThank Brian,
You are probably right in your suspicion. The read files are old, probably 2011 or so, and were originally .txt and I changed to .fastq. Would this have confused BBmap?
@Brian also has a program to identify the correct encoding for your files. That example is in post # 1 of this thread.
Code:$ testformat.sh in=seq.fq.gz
Leave a comment:
-
Thank Brian,
You are probably right in your suspicion. The read files are old, probably 2011 or so, and were originally .txt and I changed to .fastq. Would this have confused BBmap?
Leave a comment:
-
Your error message indicates a problem with the quality scores (admittedly, I should improve the error message from that assertion, but I've never seen it happen before!). Specifically, there are some N bases (no-calls) that have a quality of greater than 0. This could mean that you have an ASCII-64 file being processed as ASCII-33, or it could mean... well, something else, like bug in the base-calling software.
If the problem is just Ns with nonzero quality scores, you can fix the files by running them through Reformat (this command assumes the file is ASCII-33/Sanger):
reformat.sh -da ibq qin=33 in=scratch/s_3_#_sequence.fastq out=fixed_#.fq
Do you happen to know what the quality encoding is of these files? For example, what platform are they from, and how old are they?
Leave a comment:
-
You don't have to use all of the cores/threads available on a physical server (some clusters may be setup in a way that require you to reserve exclusive access to a node to do that).
I generally find that threads=6 works plenty fast (BBMap is well written). You may run into some I/O limits unless you have a high performance file system that goes along with your HPC cluster.
Leave a comment:
-
Originally posted by GenoMax View PostHaving a multi-threaded job spread out across physical nodes would not be advisable and may lead to the kind of errors you are seeing above.
Leave a comment:
-
Do you know how many cores there are on the node that you are running this on (16 was an example, you may have more or less available on the hardware you have access to)? Are you providing the necessary request for using all available cores on a node in your SGE command line?
Having a multi-threaded job spread out across physical nodes would not be advisable and may lead to the kind of errors you are seeing above.
Leave a comment:
-
Thanks Brian,
I got it run on 16 threads, but after a little while I get these sort of errors:
Started 16 mapping threads.
Exception in thread "Thread-7" java.lang.AssertionError
at stream.Read.expectedErrors(Read.java:2099)
at stream.Read.avgQualityByProbability(Read.java:1713)
at stream.Read.avgQualityByProbability(Read.java:1706)
at stream.Read.avgQuality(Read.java:1700)
at align2.AbstractMapThread.quickMap(AbstractMapThread.java:688)
at align2.BBMapThread.processReadPair(BBMapThread.java:953)
at align2.AbstractMapThread.run(AbstractMapThread.java:508)
Detecting finished threads: 0Exception in thread "Thread-13" java.lang.AssertionError
at stream.Read.expectedErrors(Read.java:2099)
at stream.Read.avgQualityByProbability(Read.java:1713)
at stream.Read.avgQualityByProbability(Read.java:1706)
at stream.Read.avgQuality(Read.java:1700)
at align2.AbstractMapThread.quickMap(AbstractMapThread.java:688)
at align2.BBMapThread.processReadPair(BBMapThread.java:953)
at align2.AbstractMapThread.run(AbstractMapThread.java:508)
Exception in thread "Thread-12" java.lang.AssertionError
at stream.Read.expectedErrors(Read.java:2099)
at stream.Read.avgQualityByProbability(Read.java:1713)
at stream.Read.avgQualityByProbability(Read.java:1706)
at stream.Read.avgQuality(Read.java:1700)
at align2.AbstractMapThread.quickMap(AbstractMapThread.java:688)
at align2.BBMapThread.processReadPair(BBMapThread.java:953)
at align2.AbstractMapThread.run(AbstractMapThread.java:508)
Leave a comment:
-
Also, note this line:
Code:Started 1 mapping thread.
I recommend you ssh into the node it's running on and run "top -c", then look for a java process and ensure it is using CPU resources (should be at 100%). You can also examine the output file; it should gradually be growing.Last edited by Brian Bushnell; 02-18-2016, 07:55 PM.
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...-
Channel: Articles
05-06-2024, 07:48 AM -
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 05-14-2024, 07:03 AM
|
0 responses
24 views
0 likes
|
Last Post
by seqadmin
05-14-2024, 07:03 AM
|
||
Started by seqadmin, 05-10-2024, 06:35 AM
|
0 responses
44 views
0 likes
|
Last Post
by seqadmin
05-10-2024, 06:35 AM
|
||
Started by seqadmin, 05-09-2024, 02:46 PM
|
0 responses
58 views
0 likes
|
Last Post
by seqadmin
05-09-2024, 02:46 PM
|
||
Started by seqadmin, 05-07-2024, 06:57 AM
|
0 responses
44 views
0 likes
|
Last Post
by seqadmin
05-07-2024, 06:57 AM
|
Leave a comment: