Is bbmap missing mappings of bbmapskimmer ?
Hello Brian,
I will be glad to get your help with the next issue.
I mapped a large DB of reads(over 100K) against the human genome, and found different results between bbmap's and bbmapskimmer's mappings.
I used bbmap to map reads against hg19, then I was interested finding all the mappings over the genome- so I used bbmapskimmer.
Except this concept I do not know any main differences between bbmap and bbmapskimmer codes (am I right?).
And still, bbmap mapped 98,425 reads and bbmapskimmer mapped 107,708. While 96,987 reads are shared.
Means bbmap mapped 1,438 reads that wasn't mapped by bbmapskimmer, which did mapped another 10,721 that wasn't mapped by bbmap- some of them with perfect minid score of 1.0.
Is there any possible reason for that?
I attached SAM files, with the relevant mappings for reads with maximum minid score in each algorithem (best minid for 1,438 reads of bbmap is 0.96, and for bbmapskimmer is 1.0).
- I used the next reference genome: hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
- All mappings were done with next parameters: minid=0.9 ambig=all
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
filter reads based on GC% content
Hello Brian and everyone,
I'm wondering whether it could be possible (and with which tool) to perform a read filter based on its GC% content.
For example, working with a metagenome with fungal and bacterial DNA fragments, how can I retain only paired-end reads with a GC > 50% and isolate bacterial reads?
Thanks for the attention
Stefano
Leave a comment:
-
Error running BBMap
Hi!
I already used BBMap successfully in the past to map RNA-seq data to a multi-FASTA transcriptome. Now I was trying to map to the human genome (downloaded from human genome browser of NCBI). I used the following command:
bbmap.sh ref=$mapping_ref build="$BBmap_ref_ID" in=$BBduked_reads1 in2=$BBduked_reads2 \
outu=unmapped_"$name1"_"$date".sam outm=mapped_"$name1"_"$date".sam \
maxindel=200k mdtag=true sam=1.4 subfilter=3 pairedonly=true
Unfortunately I´m running into an error:
[...]
Retaining first best site only for ambiguous mappings.
Writing reference.
Executing dna.FastaToChromArrays2 [/scratch/hpc-prf-ptma2/ptma2001/bbmap/GRCh38_genome_16032022.fna, 6, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Writing chunk 2
Exception in thread "main" java.lang.NumberFormatException: For input string: "ӌ�jHN�H��¬���l��,ΚK��/S��ͩm���T
�!�l��X��o�ã’*s$�s^��m�h
J@�
�m��$֧����@�H,���xI%_ Uz�?� S�B�2[*i�_����)i�NEր�ٜ�?Z�i£�� "
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at dna.Data.setGenome2(Data.java:1016)
at dna.Data.setGenome(Data.java:769)
at align2.BBMap.loadIndex(BBMap.java:316)
at align2.BBMap.main(BBMap.java:32)
Any idea what causes the problem and how I could fix it?
Thanks a lot in advance and have a great day
Ella
Leave a comment:
-
BBmap Error
I am getting this error how to solve it.
java -ea -Xmx4393m -Xms4393m -cp /home/szweda/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=PY53_contigs.fasta out=PY53_bbmapped.sam bamscript=bs.sh
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=PY53_contigs.fasta, out=PY53_bbmapped.sam, bamscript=bs.sh]
Version 38.95
Retaining first best site only for ambiguous mappings.
Found samtools 1.10
Exception in thread "main" java.lang.RuntimeException: Can't find file ref/genome/1/summary.txt
at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:933)
at fileIO.ReadWrite.getInputStream(ReadWrite.java:898)
at fileIO.TextFile.open(TextFile.java:280)
at fileIO.TextFile.<init>(TextFile.java:123)
at dna.Data.setGenome2(Data.java:823)
at dna.Data.setGenome(Data.java:769)
at align2.BBMap.loadIndex(BBMap.java:316)
at align2.BBMap.main(BBMap.java:32)
Leave a comment:
-
Entropy filtering
Hello,
I am working with shotgun metagenomic sequencing data from gut microbiome samples. We're planning to do taxonomic abundance estimates. For data preprocessing we are going to trim and filter sequences for quality and adapter content with bbduk, as well as remove host sequences with bbsplit. We are also considering an additional entropy filtering step with bbduk after our other preprocessing steps. I was hoping I could ask you for some more information about how this entropy filtering process works, and if you might have a recommendation in our case.- In some of our samples, we have an overrepresentation of G homopolymers. We’re confident that these are technical artifacts from the NextSeq sequencing protocol. I know we can filter these out with an entropy threshold of 0.1. However, I’ve seen in some metagenomic studies they filter out repetitive sequences that are not sequencing artifacts. Would you recommend raising this entropy threshold in our case, and if so to what new value?
- On the more technical side of how bbduk implements entropy filtering…
- How does the sliding window traverse a read? Is it one base at a time, or does it move by the whole window size?
- If a read has a region of low complexity sequences at the beginning/end, are only these sections filtered or is the entire read removed?
- How might a read with an internal region of low complexity be treated?
Leave a comment:
-
How is coverage calculated in bbmap?
Hi Brian and everyone else,
I am using bbmap for my metagenomes. Currently, I am comparing the coverage of different genes across metagenomes. BBmap has the flag -rpkm, which calculates fold coverage, RPKM and FPKM. But I couldn´t find the respective formulas, how exactly it is calculated. When I tried to re-calculate these parameters, I came to different results.
Thank you very much in advance! I hope this thread is at the right position in SEQanswers.
Best,
Franz
Leave a comment:
-
Originally posted by HESmith View PostThose genotype calls are based on the read data, which are summarized in the INFO fields of the VCF. If you change the genotypes, those field data will be inconsistent.
It would be useful if you explained more clearly what you're trying to accomplish.
I've used a published simulation framework to simulate pedigrees having genotype data, and what I'm trying to do is specify a rate at which I want to mutate genotypes, i.e., to mimic allele drop-out or drop-in, each at specified rates. Mimicking a het to hom-ref would be easy if variant alleles were all I cared about (ie just deleting random rows from the VCF), but I _do_ care about genotypes at invariant sites.
My plan was to write some custom bash/python/bcftoolsy things to get out genotypes and mutate at a specified rate then stick them back into the VCF. I was just wondering if there was some existing tool to do something like this in bbmap (or elsewhere). I can't seem to find anything.
Leave a comment:
-
Originally posted by turnersd View PostNot exactly. I literally want to replace the 0/1s or 0/0s or 1/1s in the VCF, not generate reads.
It would be useful if you explained more clearly what you're trying to accomplish.
Leave a comment:
-
Originally posted by HESmith View PostYou can do it in two steps: 1) use mutate.sh on the reference genome; 2) use randomreads.sh of the mutated reference to generate the data (I assume that's what you mean by sample).
Leave a comment:
-
You can do it in two steps: 1) use mutate.sh on the reference genome; 2) use randomreads.sh of the mutated reference to generate the data (I assume that's what you mean by sample).
Leave a comment:
-
Insert mutations in a VCF
I see mutate.sh takes a vcf and inserts mutations in a reference genome. I need something a little different. I want to specify a mutation rate and potentially a sample, and insert mutations at that specified rate in the specified sample. Anything in bbmap that might help with this?
Leave a comment:
-
All "N" reads in paired reads not being filtered if other read is "good"
Version 38.94
This thread details the "bug":
Leave a comment:
-
bbmap ignores minid
bbmap ignores minid parameter
This is for version 38.93 and 38.92
Download BBMap for free. BBMap short read aligner, and other bioinformatic tools. This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher).Last edited by silask; 09-28-2021, 11:44 PM.
Leave a comment:
-
Originally posted by lmusgrove View PostHi Brian,
I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/A_1.clean.fq.gz \
in2=data/clean/A_2.clean.fq.gz \
rpkm=data/fpkm/A.fpkm \
t=5 &
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/B_1.clean.fq.gz \
in2=data/clean/B_2.clean.fq.gz \
rpkm=data/fpkm/B.fpkm \
t=5
etc etc
So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.
Is there a flag I need to add to ensure I can see which stats are for which files?
Thanks!
LisaCode:covstats=<file> Per-scaffold coverage info.
Leave a comment:
-
Finding mapped rate for rpkm output
Hi Brian,
I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/A_1.clean.fq.gz \
in2=data/clean/A_2.clean.fq.gz \
rpkm=data/fpkm/A.fpkm \
t=5 &
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/B_1.clean.fq.gz \
in2=data/clean/B_2.clean.fq.gz \
rpkm=data/fpkm/B.fpkm \
t=5
etc etc
So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.
Is there a flag I need to add to ensure I can see which stats are for which files?
Thanks!
Lisa
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...-
Channel: Articles
03-22-2024, 06:39 AM -
-
by seqadmin
The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.
Avian Conservation
Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...-
Channel: Articles
03-08-2024, 10:41 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 06:37 PM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:37 PM
|
||
Started by seqadmin, Yesterday, 06:07 PM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:07 PM
|
||
Started by seqadmin, 03-22-2024, 10:03 AM
|
0 responses
49 views
0 likes
|
Last Post
by seqadmin
03-22-2024, 10:03 AM
|
||
Started by seqadmin, 03-21-2024, 07:32 AM
|
0 responses
67 views
0 likes
|
Last Post
by seqadmin
03-21-2024, 07:32 AM
|
Leave a comment: