Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
The reference in this case is a list of RAD loci sequences, each at 150 bp. The reads are 150 bp and should match one of the RAD loci. Since there are no N (or degenerate nucleotides) in the reference, and only a small number of reads with an N, it must have to do with the padding. A trimmed read would not have N padding, would it? This sample was degraded DNA, so plenty of reads were trimmed to less than 150 nucleotides. I doubt that 72% of the loci from this sample had an insertion that made the read extend past the reference (generated from consensus of many samples). Odd, but I guess it is not critical for me to fully figure this out.
-
Originally posted by kimng View PostHello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?
I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.
-Kim
Error:
java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]
Loading reference.
Time: 0.304 seconds.
Processing input files.
Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
Prior to that you could probably add MD tags with "samtools calmd".
Leave a comment:
-
This means 72 percent of the reads mapped with an "N" symbol in the match string, an internal data structure similar to a cigar string. The "N" symbol denotes either an N in the read or an N in the reference, which can include sequences that go off the end of scaffolds (the ends are internally padded with Ns). Additionally, degenerate IUPAC codes are counted as Ns.
Leave a comment:
-
I have a question about the bbmap summary output. Here is an example:
Code:Reads Used: 1061591 (151284618 bases) Mapping: 15.250 seconds. Reads/sec: 69611.57 kBases/sec: 9920.17 Read 1 data: pct reads num reads pct bases num bases mapped: 76.2432% 809391 77.6086% 117409909 unambiguous: 75.3885% 800318 76.8084% 116199316 ambiguous: 0.8547% 9073 0.8002% 1210593 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 11.4049% 121073 9.7282% 14717263 semiperfect site: 58.9222% 625513 60.0540% 90852423 Match Rate: NA NA 99.1429% 116413381 Error Rate: 20.6634% 183594 0.2527% 296692 Sub Rate: 19.9279% 177059 0.2369% 278137 Del Rate: 0.6244% 5548 0.0084% 9819 Ins Rate: 0.6302% 5599 0.0074% 8736 N Rate: 72.0298% 639985 0.6044% 709655
Leave a comment:
-
Originally posted by darthsequencer View PostYeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.
Leave a comment:
-
Originally posted by GenoMax View PostHave you tried ambig=all?
Leave a comment:
-
Hi Brian - What are the right flags to set if I want all of the top scoring mappings for paired end reads?
Thanks
Leave a comment:
-
Thanks for the input but same issue using sam=1.3, I think Geno's comment on something he intended to do is on the mark.
Leave a comment:
-
You can try the reformat.sh command per @GenoMax but with 'sam=1.3' flag. That worked for me with a different variant caller (FreeBayes).
Leave a comment:
-
Thanks for the info didn't realise that was the site for tickets so appreciate it, I'll create a ticket there.
Leave a comment:
-
It was worth a try.
Based on the `TODO` tag I see perhaps this is something @Brian intends to work on. Unfortunately he has not been participating in forums of late so there is no guarantee as to when this question may be addressed. You could try creating a ticket at BBMap SF repository for this question.
Leave a comment:
-
Appreciate the reply but unfortunately it leads to the same error:
java -ea -Xmx200m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ jgi.ReformatReads in=minimap.sam out=new.bam sam=1.4
Executing jgi.ReformatReads [in=minimap.sam, out=new.bam, sam=1.4]
Input is being processed as unpaired
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadInputStream.toReadList(SamReadInputStream.java:118)
at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)
Leave a comment:
-
You can try reformatting your alignment file using "reformat.sh in=you.bam out=new.bam sam=1.4" to see if that helps.
Leave a comment:
-
Bug in call variants with relation to sam files created from minimap2
Hello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?
I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.
-Kim
Error:
java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]
Loading reference.
Time: 0.304 seconds.
Processing input files.
Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
at stream.SamLine.toShortMatch(SamLine.java:1200)
at stream.SamLine.toRead(SamLine.java:1972)
at stream.SamLine.toRead(SamLine.java:1832)
at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 08:47 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
Yesterday, 08:47 AM
|
||
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
60 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
60 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
54 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
Leave a comment: