Originally posted by GenoMax
View Post
Header Leaderboard Ad
Collapse
BBMap (aligner for DNA/RNAseq) is now open-source and available for download.
Collapse
Announcement
Collapse
SEQanswers June Challenge Has Begun!
The competition has begun! We're giving away a $50 Amazon gift card to the member who answers the most questions on our site during the month. We want to encourage our community members to share their knowledge and help each other out by answering questions related to sequencing technologies, genomics, and bioinformatics. The competition is open to all members of the site, and the winner will be announced at the beginning of July. Best of luck!
For a list of the official rules, visit (https://www.seqanswers.com/forum/sit...wledge-and-win)
For a list of the official rules, visit (https://www.seqanswers.com/forum/sit...wledge-and-win)
See more
See less
X
-
-
Thank you for developing BBMap. Could you please advise how to preprocess the output VCF to make it compatible with VCFAnnotator, SNPEff. I recieve "No gene feature" and "Chromososme is missing" errors on annotation. Tried Pilon also VCFs and errors persist.
Comment
-
Hi Brian,
I have a discontinuous reference genome in a single fasta file (6000 coding sequences) and I would like to use bbmap to align my paired reads to the coding sequence only. I want to know how many reads align to each ORF.
Now I realised that the order of the reference genome affects the alignment because bbmap seems to ignore the ORF borderes in the reference.
Do you have a suggestion how to constrain the alignment within each ORF?
Thank you!
Comment
-
Originally posted by Mat29 View PostThank you for developing BBMap. Could you please advise how to preprocess the output VCF to make it compatible with VCFAnnotator, SNPEff. I recieve "No gene feature" and "Chromososme is missing" errors on annotation. Tried Pilon also VCFs and errors persist.
reformat.sh in=ref.fa out=fixed.fa trd
...then use fixed.fa for mapping and everything else.
Comment
-
Originally posted by YeastGuy View PostHi Brian,
I have a discontinuous reference genome in a single fasta file (6000 coding sequences) and I would like to use bbmap to align my paired reads to the coding sequence only. I want to know how many reads align to each ORF.
Now I realised that the order of the reference genome affects the alignment because bbmap seems to ignore the ORF borderes in the reference.
Do you have a suggestion how to constrain the alignment within each ORF?
Thank you!
Comment
-
Originally posted by Brian Bushnell View PostHow are the ORFs indicated? Do you have them in a separate file (gff/gtf/bed)? BBMap doesn't use any of those, though you could probably use a different tool to mask the reference with them. Generally, I don't recommend constraining alignment to a specific subset of the genome, which incurs confirmation bias.
>lcl||1|YAL001C|TFC3|1|145
TAGTTACTATGGTCGTTAACGAAATAATATTTCATCCAGGGA
>lcl||2|YAL002W|VPS8|1|145
CTGGTCTGGACCCATTACTTTTTCTAGCTTGGGAAAATGTACAG
But after randomising the order withing the ref file the coverage changed.
Comment
-
Oh, that file should be fine; BBMap sees the individual sequences as unconnected. It's still possible for read 1 to map to one csequence and read 2 to a different sequence, though. The coverage may be different for various reasons, typically due to ambiguity (two different reference sequences that are very similar). You can use "ambig=toss" or "ambig=all" to change the behavior of reads mapping to multiple possible locations. But bear in mind that BBMap is slightly nondeterministic when mapping paired reads.
Comment
-
Originally posted by Brian Bushnell View PostOh, that file should be fine; BBMap sees the individual sequences as unconnected. It's still possible for read 1 to map to one csequence and read 2 to a different sequence, though. The coverage may be different for various reasons, typically due to ambiguity (two different reference sequences that are very similar). You can use "ambig=toss" or "ambig=all" to change the behavior of reads mapping to multiple possible locations. But bear in mind that BBMap is slightly nondeterministic when mapping paired reads.
Now that I know what the problem is, it should be possible to find a solution.
Comment
-
Difficulties using bbmap to generate sorted, indexed bam file
Hello people,
I am trying to generate sorted, indexed bam file with the bbmap tool, but I have an error message.
tanshim[email protected]:~/Documents/ACDC-trial/shimingles$ bbmap.sh in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.sam -Xmx30g ref=reformatted-contigs.fasta bamscript=bs.sh; sh bs.sh
java -Djava.library.path=/home/tanshiming/tools/bbmap/jni/ -ea -Xmx30g -cp /home/tanshiming/tools/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.sam -Xmx30g ref=reformatted-contigs.fasta bamscript=bs.sh
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq, in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq, out=mapped.sam, -Xmx30g, ref=reformatted-contigs.fasta, bamscript=bs.sh]
BBMap version 36.02
Retaining first best site only for ambiguous mappings.
Writing reference.
Executing dna.FastaToChromArrays2 [reformatted-contigs.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1
Loaded Reference: 0.084 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/tanshiming/Documents/ACDC-trial/shimingles/ref/index/1/chr1_index_k13_c7_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 4.146 seconds.
Analyzed Index: 3.074 seconds.
Started output stream: 0.015 seconds.
Cleared Memory: 0.307 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 32 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
------------------ Results ------------------
Genome: 1
Key Length: 13
Max Indel: 16000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 13403154 (2943817149 bases)
Mapping: 634.915 seconds.
Reads/sec: 21110.16
kBases/sec: 4636.56
Pairing data: pct reads num reads pct bases num bases
mated pairs: 82.0206% 5496676 85.8858% 2528321686
bad pairs: 1.6892% 113202 1.6858% 49625732
insert size avg: 339.43
Read 1 data: pct reads num reads pct bases num bases
mapped: 85.2928% 5715965 85.4106% 1312819237
unambiguous: 85.0524% 5699850 85.1949% 1309504251
ambiguous: 0.2405% 16115 0.2157% 3314986
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 56.5028% 3786577 57.4090% 882415075
semiperfect site: 57.7478% 3870016 58.6870% 902059233
rescued: 0.7551% 50604
Match Rate: NA NA 95.4737% 1291210481
Error Rate: 32.2729% 1844876 3.9447% 53348516
Sub Rate: 31.0425% 1774536 0.7285% 9852710
Del Rate: 2.5010% 142970 2.9285% 39606112
Ins Rate: 4.1940% 239750 0.2876% 3889694
N Rate: 2.6675% 152488 0.5816% 7866352
Read 2 data: pct reads num reads pct bases num bases
mapped: 85.4634% 5727397 85.6329% 1204639477
unambiguous: 85.1581% 5706936 85.3792% 1201071323
ambiguous: 0.3053% 20461 0.2536% 3568154
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 54.6069% 3659525 55.8135% 785155749
semiperfect site: 55.7207% 3734167 56.9902% 801709710
rescued: 1.0234% 68587
Match Rate: NA NA 95.6569% 1184979976
Error Rate: 34.8039% 1993488 3.7863% 46904205
Sub Rate: 33.7245% 1931661 0.7674% 9506079
Del Rate: 2.2309% 127782 2.7561% 34141479
Ins Rate: 3.7482% 214688 0.2629% 3256647
N Rate: 2.4683% 141381 0.5567% 6896775
Total time: 643.704 seconds.
bs.sh: 2: bs.sh: module: not found
bs.sh: 3: bs.sh: module: not found
Note: This script is designed to run with the amount of memory detected by BBMap.
If Samtools crashes, please ensure you are running on the same platform as BBMap,
or reduce Samtools' memory setting (the -m flag).
Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
-@, --threads INT
Set number of sorting and compression threads [1]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
[E::hts_open_format] fail to open file 'mapped_sorted.bam'
samtools index: failed to open "mapped_sorted.bam": No such file or directory
Could someone please advice me on this?
Thank You.
Comment
-
Do you have samtools installed and available in your $PATH and which version?
If yes then you can create bam files directly in bbmap command and then sort/index them. Following example is for samtools (v.1.3.1).
Code:bbmap.sh in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.bam -Xmx30g ref=reformatted-contigs.fasta samtools sort -o sorted.bam mapped.bam samtools index sorted.bam
Comment
-
Actually, I think the problem in this case is the samtools version. Samtools changed the default command line flags going from 0.x to 1.x, which causes headaches, and I didn't realize this until recently. If you download the latest version of BBMap (36.92), it will look at the version of samtools installed and change the samtools command appropriately.
Alternately, you can just run these commands manually. For samtools 0.x:
Code:samtools view -bSh1 mapped.sam.gz | samtools sort -m 1G -@ 4 - mapped_sorted samtools index mapped_sorted.bam
Code:samtools view -bSh1 mapped.sam | samtools sort -m 1G -@ 4 - -o mapped_sorted.bam samtools index mapped_sorted.bam
Comment
Latest Articles
Collapse
-
by seqadmin
Developments in sequencing technologies and methodologies have transformed the field of epigenetics, giving researchers a better way to understand the complex world of gene regulation and heritable modifications. This article explores some of the diverse sequencing methods employed in the study of epigenetics, ranging from classic techniques to cutting-edge innovations while providing a brief overview of their processes, applications, and advances.
Methylation Detect...-
Channel: Articles
05-31-2023, 10:46 AM -
-
Differential Expression and Data Visualization: Recommended Tools for Next-Level Sequencing Analysisby seqadmin
After covering QC and alignment tools in the first segment and variant analysis and genome assembly in the second segment, we’re wrapping up with a discussion about tools for differential gene expression analysis and data visualization. In this article, we include recommendations from the following experts: Dr. Mark Ziemann, Senior Lecturer in Biotechnology and Bioinformatics, Deakin University; Dr. Medhat Mahmoud Postdoctoral Research Fellow at Baylor College of Medicine;...-
Channel: Articles
05-23-2023, 12:26 PM -
-
by seqadmin
Continuing from our previous article, we share variant analysis and genome assembly tools recommended by our experts Dr. Medhat Mahmoud, Postdoctoral Research Fellow at Baylor College of Medicine, and Dr. Ming "Tommy" Tang, Director of Computational Biology at Immunitas and author of From Cell Line to Command Line.
Variant detection and analysis tools
Mahmoud classifies variant detection work into two main groups: short variants (<50...-
Channel: Articles
05-19-2023, 10:03 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 06-01-2023, 08:56 PM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
06-01-2023, 08:56 PM
|
||
Deep Sequencing Unearths Novel Genetic Variants: Enhancing Precision Medicine for Vascular Anomalies
by seqadmin
Started by seqadmin, 06-01-2023, 07:33 AM
|
0 responses
8 views
0 likes
|
Last Post
by seqadmin
06-01-2023, 07:33 AM
|
||
Unveiling Genetic Associations Through Transcription Factor Binding Quantitative Trait Loci
by seqadmin
Started by seqadmin, 05-31-2023, 07:50 AM
|
0 responses
4 views
0 likes
|
Last Post
by seqadmin
05-31-2023, 07:50 AM
|
||
Exploring French-Canadian Ancestry: Insights into Migration, Settlement Patterns, and Genetic Structure
by seqadmin
Started by seqadmin, 05-26-2023, 09:22 AM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
05-26-2023, 09:22 AM
|
Comment