Originally posted by GenoMax
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
-
Thank you for developing BBMap. Could you please advise how to preprocess the output VCF to make it compatible with VCFAnnotator, SNPEff. I recieve "No gene feature" and "Chromososme is missing" errors on annotation. Tried Pilon also VCFs and errors persist.
Comment
-
Hi Brian,
I have a discontinuous reference genome in a single fasta file (6000 coding sequences) and I would like to use bbmap to align my paired reads to the coding sequence only. I want to know how many reads align to each ORF.
Now I realised that the order of the reference genome affects the alignment because bbmap seems to ignore the ORF borderes in the reference.
Do you have a suggestion how to constrain the alignment within each ORF?
Thank you!
Comment
-
Originally posted by Mat29 View PostThank you for developing BBMap. Could you please advise how to preprocess the output VCF to make it compatible with VCFAnnotator, SNPEff. I recieve "No gene feature" and "Chromososme is missing" errors on annotation. Tried Pilon also VCFs and errors persist.
reformat.sh in=ref.fa out=fixed.fa trd
...then use fixed.fa for mapping and everything else.
Comment
-
Originally posted by YeastGuy View PostHi Brian,
I have a discontinuous reference genome in a single fasta file (6000 coding sequences) and I would like to use bbmap to align my paired reads to the coding sequence only. I want to know how many reads align to each ORF.
Now I realised that the order of the reference genome affects the alignment because bbmap seems to ignore the ORF borderes in the reference.
Do you have a suggestion how to constrain the alignment within each ORF?
Thank you!
Comment
-
Originally posted by Brian Bushnell View PostHow are the ORFs indicated? Do you have them in a separate file (gff/gtf/bed)? BBMap doesn't use any of those, though you could probably use a different tool to mask the reference with them. Generally, I don't recommend constraining alignment to a specific subset of the genome, which incurs confirmation bias.
>lcl||1|YAL001C|TFC3|1|145
TAGTTACTATGGTCGTTAACGAAATAATATTTCATCCAGGGA
>lcl||2|YAL002W|VPS8|1|145
CTGGTCTGGACCCATTACTTTTTCTAGCTTGGGAAAATGTACAG
But after randomising the order withing the ref file the coverage changed.
Comment
-
Oh, that file should be fine; BBMap sees the individual sequences as unconnected. It's still possible for read 1 to map to one csequence and read 2 to a different sequence, though. The coverage may be different for various reasons, typically due to ambiguity (two different reference sequences that are very similar). You can use "ambig=toss" or "ambig=all" to change the behavior of reads mapping to multiple possible locations. But bear in mind that BBMap is slightly nondeterministic when mapping paired reads.
Comment
-
Originally posted by Brian Bushnell View PostOh, that file should be fine; BBMap sees the individual sequences as unconnected. It's still possible for read 1 to map to one csequence and read 2 to a different sequence, though. The coverage may be different for various reasons, typically due to ambiguity (two different reference sequences that are very similar). You can use "ambig=toss" or "ambig=all" to change the behavior of reads mapping to multiple possible locations. But bear in mind that BBMap is slightly nondeterministic when mapping paired reads.
Now that I know what the problem is, it should be possible to find a solution.
Comment
-
Difficulties using bbmap to generate sorted, indexed bam file
Hello people,
I am trying to generate sorted, indexed bam file with the bbmap tool, but I have an error message.
tanshiming@S620100019205:~/Documents/ACDC-trial/shimingles$ bbmap.sh in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.sam -Xmx30g ref=reformatted-contigs.fasta bamscript=bs.sh; sh bs.sh
java -Djava.library.path=/home/tanshiming/tools/bbmap/jni/ -ea -Xmx30g -cp /home/tanshiming/tools/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.sam -Xmx30g ref=reformatted-contigs.fasta bamscript=bs.sh
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq, in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq, out=mapped.sam, -Xmx30g, ref=reformatted-contigs.fasta, bamscript=bs.sh]
BBMap version 36.02
Retaining first best site only for ambiguous mappings.
Writing reference.
Executing dna.FastaToChromArrays2 [reformatted-contigs.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1
Loaded Reference: 0.084 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/tanshiming/Documents/ACDC-trial/shimingles/ref/index/1/chr1_index_k13_c7_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 4.146 seconds.
Analyzed Index: 3.074 seconds.
Started output stream: 0.015 seconds.
Cleared Memory: 0.307 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 32 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
------------------ Results ------------------
Genome: 1
Key Length: 13
Max Indel: 16000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 13403154 (2943817149 bases)
Mapping: 634.915 seconds.
Reads/sec: 21110.16
kBases/sec: 4636.56
Pairing data: pct reads num reads pct bases num bases
mated pairs: 82.0206% 5496676 85.8858% 2528321686
bad pairs: 1.6892% 113202 1.6858% 49625732
insert size avg: 339.43
Read 1 data: pct reads num reads pct bases num bases
mapped: 85.2928% 5715965 85.4106% 1312819237
unambiguous: 85.0524% 5699850 85.1949% 1309504251
ambiguous: 0.2405% 16115 0.2157% 3314986
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 56.5028% 3786577 57.4090% 882415075
semiperfect site: 57.7478% 3870016 58.6870% 902059233
rescued: 0.7551% 50604
Match Rate: NA NA 95.4737% 1291210481
Error Rate: 32.2729% 1844876 3.9447% 53348516
Sub Rate: 31.0425% 1774536 0.7285% 9852710
Del Rate: 2.5010% 142970 2.9285% 39606112
Ins Rate: 4.1940% 239750 0.2876% 3889694
N Rate: 2.6675% 152488 0.5816% 7866352
Read 2 data: pct reads num reads pct bases num bases
mapped: 85.4634% 5727397 85.6329% 1204639477
unambiguous: 85.1581% 5706936 85.3792% 1201071323
ambiguous: 0.3053% 20461 0.2536% 3568154
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 54.6069% 3659525 55.8135% 785155749
semiperfect site: 55.7207% 3734167 56.9902% 801709710
rescued: 1.0234% 68587
Match Rate: NA NA 95.6569% 1184979976
Error Rate: 34.8039% 1993488 3.7863% 46904205
Sub Rate: 33.7245% 1931661 0.7674% 9506079
Del Rate: 2.2309% 127782 2.7561% 34141479
Ins Rate: 3.7482% 214688 0.2629% 3256647
N Rate: 2.4683% 141381 0.5567% 6896775
Total time: 643.704 seconds.
bs.sh: 2: bs.sh: module: not found
bs.sh: 3: bs.sh: module: not found
Note: This script is designed to run with the amount of memory detected by BBMap.
If Samtools crashes, please ensure you are running on the same platform as BBMap,
or reduce Samtools' memory setting (the -m flag).
Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
-@, --threads INT
Set number of sorting and compression threads [1]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
[E::hts_open_format] fail to open file 'mapped_sorted.bam'
samtools index: failed to open "mapped_sorted.bam": No such file or directory
Could someone please advice me on this?
Thank You.
Comment
-
Do you have samtools installed and available in your $PATH and which version?
If yes then you can create bam files directly in bbmap command and then sort/index them. Following example is for samtools (v.1.3.1).
Code:bbmap.sh in1=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R1_001.fastq in2=UPWRP0508161KA_TCCGGAGA-ATAGAGGC_L1L2_R2_001.fastq out=mapped.bam -Xmx30g ref=reformatted-contigs.fasta samtools sort -o sorted.bam mapped.bam samtools index sorted.bam
Comment
-
Actually, I think the problem in this case is the samtools version. Samtools changed the default command line flags going from 0.x to 1.x, which causes headaches, and I didn't realize this until recently. If you download the latest version of BBMap (36.92), it will look at the version of samtools installed and change the samtools command appropriately.
Alternately, you can just run these commands manually. For samtools 0.x:
Code:samtools view -bSh1 mapped.sam.gz | samtools sort -m 1G -@ 4 - mapped_sorted samtools index mapped_sorted.bam
Code:samtools view -bSh1 mapped.sam | samtools sort -m 1G -@ 4 - -o mapped_sorted.bam samtools index mapped_sorted.bam
Comment
Latest Articles
Collapse
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, Yesterday, 05:31 AM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
Yesterday, 05:31 AM
|
||
Started by seqadmin, 10-24-2024, 06:58 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
10-24-2024, 06:58 AM
|
||
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types
by seqadmin
Started by seqadmin, 10-23-2024, 08:43 AM
|
0 responses
48 views
0 likes
|
Last Post
by seqadmin
10-23-2024, 08:43 AM
|
||
Started by seqadmin, 10-17-2024, 07:29 AM
|
0 responses
58 views
0 likes
|
Last Post
by seqadmin
10-17-2024, 07:29 AM
|
Comment