Seqanswers Leaderboard Ad

**r.rosati** · 02-08-2017, 11:14 AM

BBmap on Ion data

Hello,
I tried to run BBmap to call variants on four BAM files (2 parents, 2 children) generated from exome sequencing on an Ion Proton machine (TMAP software).

The bbmap command was:

Code:

/opt/bbmap/callvariants.sh in=IonXpress_009_rawlib.bam,IonXpress_010_rawlib.bam,IonXpress_011_rawlib.bam,IonXpress_012_rawlib.bam ref=./hg19/hg19.fasta out=vars.vcf multisample ploidy=2 prefilter 1> err.txt 2> out.txt &

I got an error as following (I'm including also the last lines of output):

Code:

Finished second pass.
Writing output.
Merging [(IonXpress_009_rawlib, individual_IonXpress_009_rawlib.vcf.gz), (IonXpress_010_rawlib, individual_IonXpress_010_rawlib.vcf.gz), (IonXpress_011_rawlib, individual_IonXpress_011_rawlib.vcf.gz), (IonXpress_012_rawlib, individual_IonXpress_012_rawlib.vcf.gz)]
Exception in thread "main" java.lang.NumberFormatException: For input string: "162,590"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
	at java.lang.Float.parseFloat(Float.java:451)
	at var2.MergeSamples.processHeader(MergeSamples.java:233)
	at var2.MergeSamples.processRow(MergeSamples.java:177)
	at var2.MergeSamples.mergeFiles(MergeSamples.java:151)
	at var2.MergeSamples.mergeSamples(MergeSamples.java:117)
	at var2.CallVariants2.process(CallVariants2.java:380)
	at var2.CallVariants2.main(CallVariants2.java:52)
	at var2.CallVariants.main(CallVariants.java:48)

I unzipped the vcf files and used grep to find "162,590". I got a match at line 9 of the vcf for one individual:

Code:

##readLengthAvg=162,590

The other individuals also had similar formatting:
(IonXpress_010)

Code:

##readLengthAvg=160,055

(IonXpress_011)

Code:

##readLengthAvg=167,107

(IonXpress_012)

Code:

##readLengthAvg=169,228

This is the full header for barcode 009:

Code:

##fileformat=VCFv4.2
##BBMapVersion=36.92
##ploidy=2
##rarity=1,00000
##minallelefraction=0,10000
##reads=43874924
##pairedReads=0
##properlyPairedReads=0
##readLengthAvg=162,590
##properPairRate=0,00000
##totalQualityAvg=23,525
##mapqAvg=77,583
##reference=./hg19/hg19.fasta
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##contig=<ID=chrY,length=59373566>
##contig=<ID=chrM,length=16569>
##FORMAT=<ID=FAIL,Description="Fail">
##FORMAT=<ID=PASS,Description="Pass">
##INFO=<ID=SN,Number=1,Type=Integer,Description="Scaffold Number">
##INFO=<ID=STA,Number=1,Type=Integer,Description="Start">
##INFO=<ID=STO,Number=1,Type=Integer,Description="Stop">
##INFO=<ID=TYP,Number=1,Type=String,Description="Type">
##INFO=<ID=R1P,Number=1,Type=Integer,Description="Read1 Plus Count">
##INFO=<ID=R1M,Number=1,Type=Integer,Description="Read1 Minus Count">
##INFO=<ID=R2P,Number=1,Type=Integer,Description="Read2 Plus Count">
##INFO=<ID=R2M,Number=1,Type=Integer,Description="Read2 Minus Count">
##INFO=<ID=PPC,Number=1,Type=Integer,Description="Paired Count">
##INFO=<ID=LS,Number=1,Type=Integer,Description="Length Sum">
##INFO=<ID=MQS,Number=1,Type=Integer,Description="MAPQ Sum">
##INFO=<ID=MQM,Number=1,Type=Integer,Description="MAPQ Max">
##INFO=<ID=BQS,Number=1,Type=Integer,Description="Base Quality Sum">
##INFO=<ID=BQM,Number=1,Type=Integer,Description="Base Quality Max">
##INFO=<ID=EDS,Number=1,Type=Integer,Description="End Distance Sum">
##INFO=<ID=EDM,Number=1,Type=Integer,Description="End Distance Max">
##INFO=<ID=IDS,Number=1,Type=Integer,Description="Identity Sum">
##INFO=<ID=IDM,Number=1,Type=Integer,Description="Identity Max">
##INFO=<ID=COV,Number=1,Type=Integer,Description="Coverage">
##INFO=<ID=MCOV,Number=1,Type=Integer,Description="Minus Coverage">
##INFO=<ID=CED,Number=1,Type=Integer,Description="Contig End Distance">
##INFO=<ID=HMP,Number=1,Type=Integer,Description="Homopolymer Count">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##INFO=<ID=RAF,Number=1,Type=Float,Description="Revised Allele Fraction">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele Fraction">
##FORMAT=<ID=SC,Number=1,Type=Float,Description="Score">
##FORMAT=<ID=PF,Number=1,Type=String,Description="Pass Filter">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	IonXpress_009_rawlib

Anything I should have done differently to not cause the error?
Thank you!

**Brian Bushnell** · 02-08-2017, 11:24 AM

Oh, that's really strange. It's due to your locale settings, I guess. In the US, decimals are printed like "##readLengthAvg=162.590" rather than "##readLengthAvg=162,590". For some reason Java is using commas as the delimiter when printing the floating point numbers, but not when trying to read floating point numbers, which is extremely irritating.

This will take a little research to fix (other than you changing your computer's locale to US or something like that); I'll get back to you.

**GenoMax** · 02-08-2017, 12:08 PM

@r.rosati: See if this helps. https://docs.oracle.com/cd/E23824_01...033/glmha.html

**r.rosati** · 02-09-2017, 11:12 AM

Originally posted by GenoMax View Post

@r.rosati: See if this helps. https://docs.oracle.com/cd/E23824_01...033/glmha.html

Thank you... Indeed I ran it on a ssh session from a computer that had pt_BR as LC_NUMERIC.

I've read Zhou's comment on indels and I thought I'd first run the callvariants script on a test exome from CEPH 1347-02 DNA.
Should I realign the reads (I'm reading it's advised for non-BBmap aligned reads?) i.e.

Code:

callvariants.sh in=IonXpress_014_rawlib.bam ref=./hg19/hg19.fasta out=CEPH_BBmap_variants_TMAP_BAM_realigned.vcf ploidy=2 realign threads=16 1> err.txt 2> out.txt &

**Brian Bushnell** · 02-09-2017, 02:32 PM

Yes, I do recommend "realign" for non-BBMap-aligned reads.

**Marmottontaine** · 02-13-2017, 05:01 AM

BBSplit / BBMap error

Hello,
I'm using BBSplit to remove mouse reads in human samples. My command is the following :
bbsplit.sh ambiguous=best ambiguous2=toss path=$8 maxindel=900000 refstats=stat_alignment_$samplename.txt threads=4 qtrim=f untrim=f in1=$8/trim_R1_$samplename.fq in2=$8/trim_R2_$samplename.fq ref=$2/$1.fa,$2/Grcm38.fa basename=decontamin_$samplename.%_#.fq outu1=unmapped1_$samplename.fq outu2=unmapped2_$samplename.fq

Which give me the following message where the parameters used are not what I specified :

java -Djava.library.path=/home/audrey/bbmap/jni/ -ea -Xmx48113m -cp /home/audrey/bbmap/current/ align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 ambiguous=best ambiguous2=toss path=. maxindel=900000 refstats=stat_alignment_L6.txt threads=4 qtrim=f untrim=f in1=./trim_R1_L6.fq in2=./trim_R2_L6.fq ref=/home/audrey/reference_genomes/GATK/hg38.fa,/home/audrey/reference_genomes/GATK/Grcm38.fa basename=decontamin_L6.%_#.fq outu1=unmapped1_L6.fq outu2=unmapped2_L6.fq
Executing align2.BBSplitter [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, path=., maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, ref=/home/audrey/reference_genomes/GATK/hg38.fa,/home/audrey/reference_genomes/GATK/Grcm38.fa, basename=decontamin_L6.%_#.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq]

Converted arguments to [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, path=., maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, basename=decontamin_L6.%_#.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq, ref_hg38=/home/audrey/reference_genomes/GATK/hg38.fa, ref_Grcm38=/home/audrey/reference_genomes/GATK/Grcm38.fa]
Creating merged reference file ref/genome/1/merged_ref_6706777893850.fa.gz
Ref merge time: 112.084 seconds.
Executing align2.BBMap [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, ambiguous=best, ambiguous2=toss, maxindel=900000, refstats=stat_alignment_L6.txt, threads=4, qtrim=f, untrim=f, in1=./trim_R1_L6.fq, in2=./trim_R2_L6.fq, outu1=unmapped1_L6.fq, outu2=unmapped2_L6.fq, ref=ref/genome/1/merged_ref_6706777893850.fa.gz, out_hg38=decontamin_L6.hg38_#.fq, out_Grcm38=decontamin_L6.Grcm38_#.fq]

BBMap version 36.92
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.560
Reference set statistics will be written to stat_alignment_L6.txt
Set threads to 4
Retaining first best site only for ambiguous mappings.
NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
Writing reference.
Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_6706777893850.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Writing chunk 2
Writing chunk 3
Writing chunk 4
Writing chunk 5
Writing chunk 6
Writing chunk 7
Writing chunk 8
Writing chunk 9
Writing chunk 10
Writing chunk 11
Writing chunk 12
Writing chunk 13
Set genome to 1

Loaded Reference: 0.005 seconds.
Loading index for chunk 1-13, build 1
No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr1-3_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
Indexing threads finished for block 4-7
No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr8-11_index_k13_c2_b1.block
Indexing threads started for block 8-11
Indexing threads finished for block 8-11
No index available; generating from reference genome: /media/audrey/26a8cc89-53d7-4f9b-984c-f8d2c0d519fd/Ievgenia/PDX/Exome/test/ref/index/1/chr12-13_index_k13_c2_b1.block
Indexing threads started for block 12-13
Indexing threads finished for block 12-13
Generated Index: 571.883 seconds.
Analyzed Index: 10.807 seconds.
Reads that map to multiple references will be considered unmapped.
Started output stream: 6.057 seconds.
Creating ref-set statistics table: 0.012 seconds.
Cleared Memory: 4.121 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 4 mapping threads.

Could you please explain me how to set and force the parameters?
Many thanks in advance.

**GenoMax** · 02-13-2017, 05:23 AM

Which exact parameters are you referring to?

**Marmottontaine** · 02-13-2017, 05:33 AM

Thank you GenoMax for your answer.
I let minratio at the default value. So it should be 0.65 as said in bbsplit help. here it's 0.56.
And for the parameters : maxindel, qtrim and untrim, they are both set as the default then as I asked in the same command line. Will it be problematic?
Many thanks.

**GenoMax** · 02-13-2017, 05:42 AM

I see. @Brian hopefully will be along later today with an official answer but my guess is that it probably has to do with ambigous2=toss option that you have selected.

**Brian Bushnell** · 02-13-2017, 05:47 AM

And for the parameters : maxindel, qtrim and untrim, they are both set as the default then as I asked in the same command line. Will it be problematic?

Later parameters override earlier parameters, so that's not a problem.

As for 0.65 versus 0.56, that looks like either a typo, or else I changed the default at some point but forgot to update the help info; I'll fix that. Thanks! In the meantime you can always manually add "minratio=0.65" to override it.

Edit: To clarify, 0.56 is the intended behavior, and the description is in error.

**Marmottontaine** · 02-13-2017, 05:59 AM

Thank you very much GenoMax and Brian for your helpful answers.

**shimingt** · 02-19-2017, 09:20 PM

Converting a coverage file into a csv format for mmgenome

Hello there,

Is there a way I could convert the mapping files that bbmap produced into csv coverage file that can be used for the mmgenome software (link: http://madsalbertsen.github.io/mmgen...ad_data.html)?

Thanks!

**Brian Bushnell** · 02-21-2017, 04:34 PM

BBMap can directly output coverage using the "covstats" or "basecov" flags, or it can be generated from sam files with pileup.sh. Can you post a few lines from one of these mmgenome csv files so I can see what the format looks like?

**shimingt** · 02-21-2017, 05:47 PM

Dear Brian,

Thanks for your reply.

This is from one of the mapping files (.csv) for mmgenome.

tanshiming@S620100019205:~/Downloads$ head C13.11.14.csv
"Name","Consensus length","Total read count","Single reads","Reads in pairs","Average coverage","Reference sequence","Reference length","Reference common name","Reference Latin name"
"1 mapping","5621","103","19","84","1.443","1","8264","",""
"2 mapping","370","6","2","4","0.625","2","1027","",""
"3 mapping","1665","198","52","146","13.536","3","1665","",""
"4 mapping","94","1","1","0","0.01","4","9056","",""
"5 mapping","3042","95","19","76","3.205","5","3343","",""
"6 mapping","901","8","2","6","9.663E-3","6","98207","",""
"7 mapping","5788","147","11","136","2.612","7","6480","",""
"8 mapping","14602","381","47","334","2.782","8","15790","",""
"9 mapping","1403","1056","282","774","85.135","9","1403","",""
tanshiming@S620100019205:~/Downloads$

**sebastian_1** · 03-15-2017, 12:57 PM

Hello,

I am new to this forum so I apologize if I posted in the wrong place.

Currently I am trying to assemble a single cell genome from some Illumina Hiseq 2x250 bp reads.
One of my first steps is to do normalization of the data. Previously I was using bbnorm.sh because it's fast and very good. However, with this data I cannot use bbnorm.sh on our server due to memory issues. (I have around 100 million PE reads).
So I tried to use bbnorm on a cluster where I allocated 16 cores and 700GB of RAM.

I run the software using this command line:
bbnorm.sh in1=st1c_R1.fastq.gz in2=st1c_R2.fastq.gz out1=st1c_R1_norm.fastq.gz out2=st1c_R2_norm.fastq.gz threads=12 target=60 mindepth=2

The software runs fine, but the process will be killed by the scheduler because of high CPU usage.

=>> PBS: job killed: ncpus 59.52 exceeded limit 16 (sum)

I tried to increase the number of threads requested to 30, lower the number of threads in the command line to 10, but still I have the same problem.

The cluster has installed BBMap version 36.92. It looks like for some reason the number of threads which I define is not taken into account, even if during the run it tells me that the number of threads was set to 12.

Settings:
threads: 12
k: 31
deterministic: true
toss error reads: false
passes: 1
bits per cell: 16
cells: 145.37B
hashes: 3
base min quality: 5
kmer min prob: 0.5

target depth: 240
min depth: 2
max depth: 300
min good kmers: 15
depth percentile: 64.8
ignore dupe kmers: true
fix spikes: false

Any suggestions? I really cannot ask for the entire node just for me.
I understand that due to other components like pigz, it can use more than 12 cores, but here it looks like uses 60 cores, or all available cores.

Thank you,
Sebastian

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 50 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News