Hello,
I've written a script for batch primer picking for gaps that I'd like to share. As many of you know, with bacterial finishing we have to close many more gaps (sometimes hundreds), and manually selecting and validating primers can be time-consuming. BOSS should take a lot of the pain out of that. Typically, it takes me 5 hours to manually select and validate primers for 100 gaps, but with BOSS it takes 5 minutes. I hope you will give it a try and let me know if you have any feedback. Please see the readme below for details.
download at:
###Batch Oligo Selection Script(BOSS) V2.2 README###
A) DESCRIPTION
Batch Oligo Selection Script V2.2 (BOSS 2.2) is a program that selects primers using a heuristic algorithm. Two template primers and
sequencing primers are selected and evaluated against user-specified quality settings. The algorithm continuously slides
a 500-base window away from the gap edge until the user's quality thresholds are met. The script outputs a number of files
to assist the user in the finishing process:
* A hits file indicates the uniqueness of each primer in a primer set.
* A stats file provides size and positional information for each primer in a primer set.
* A csv file is provided containing the actual primers selected for a primer set.
* An ace file is created identical to the input ace file but containing tags for all primers selected.
In addition, a set of five files is generated for each gap evaluated (use the -log option to prevent them from being removed).
These are a Primer3 input file, a Primer3 output file, a fasta file, a blast results file, and a score file that indicates how BOSS
evaluated the BLAST results in the blast file. Please see below for options and usage.
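The sliding-window search described above can be sketched roughly as follows. This is only an illustration in Python, not the code in boss.pl: the function name, the `passes` callback (standing in for the Primer3 and BLAST evaluation), the step size, and the 0-based half-open coordinates are all assumptions.

```python
from typing import Callable, Optional, Tuple

WINDOW = 500  # window size in bases, as stated in the description


def find_window(contig_len: int,
                gap_edge: int,
                direction: int,
                passes: Callable[[int, int], bool],
                step: int = 1) -> Optional[Tuple[int, int]]:
    """Slide a WINDOW-base window away from gap_edge until passes() accepts it.

    direction is -1 when the gap lies at the right end of the contig (the
    window moves toward the contig start) and +1 when it lies at the left end.
    Returns (start, end) in 0-based half-open coordinates, or None if the
    window runs off the contig before any position is accepted.
    The step size is hypothetical; the README does not state it.
    """
    if direction == -1:
        start, end = gap_edge - WINDOW, gap_edge
    else:
        start, end = gap_edge, gap_edge + WINDOW
    while start >= 0 and end <= contig_len:
        if passes(start, end):
            return start, end
        start += direction * step
        end += direction * step
    return None


# Toy quality check: pretend the last 100 bases before the gap are unusable.
window = find_window(2000, 2000, -1, lambda s, e: e <= 1900)
print(window)
```

In the real script the acceptance test would be the user's quality settings and the primer set ranking; here it is reduced to a simple callback so the control flow is visible.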
B) DEPENDENCIES
BOSS depends on Primer3 and Blast, and requires one output file from Consed. If you need to download and install either, please
visit the following links:
Primer3
Blast
Prior to using the program, you must save an info.txt file using
Consed for the ace file assembly for which you wish to select oligos. Do
this by opening the ace file in Consed and selecting Info>Show Maps of
Contigs In Scaffolds>Save to File>OK.
C) INSTALLATION INSTRUCTIONS
1) Extract the boss.tar.gz archive file to the desired location.
2) Open boss.pl in a text editor such as xemacs.
3) Edit line 43 between quotation marks to your path for the primer3core installation. Leave the last "/" off.
For example, if primer3 is installed in /production/tools/primer3, line 43 should read:
my $blast_path="/production/tools/primer3";
4) Edit line 44 between quotation marks to your path for the blastall installation. Leave the last "/" off.
For example, if blastall is installed in /production/tools/blast/blast-2.2.14/bin/blastall, line 44 should read:
my $blast_path="/production/tools/blast/blast-2.2.14/bin";
5) Save the file and exit.
6) If you know how many cores the linux machine that blast is running on has, change the "4" in line 45 to that number. If you do not
know, you can try changing this number to 1 if you have any problems.
D) BASIC HELP
DESCRIPTION
Batch Oligo Selection Script V2.2 (BOSS 2.2) is a program that selects primers using a heuristic algorithm. One or two template
primers and sequencing primers are selected and evaluated against user-specified quality settings. The algorithm continuously slides
a 500-base window away from the gap edge until the user's quality thresholds are met. The script outputs a number of files
to assist the user in the finishing process:
* A hits file indicates the uniqueness of each primer in a primer set.
* A stats file provides size and positional information for each primer in a primer set.
* A csv file is provided containing the actual primers selected for a primer set.
* An ace file is created identical to the input ace file but containing tags for all primers selected.
In addition, a set of five files is generated for each gap evaluated (use the -log option to prevent them from being removed).
These are a Primer3 input file, a Primer3 output file, a fasta file, a blast results file, and a score file that indicates how BOSS
evaluated the BLAST results in the blast file. Please see below for options and usage.
-----------
OPTIONS
The following options are available. Options with default settings will be set to internal defaults or config file settings
unless explicitly specified. Required options are marked as such, and the script will not run without them.
-a: Designate input ace file (required).
-g: Activate gap identifier column in csv file.
-o: Designate a custom ace output name (default is bpp_output.ace).
-i: Designate info.txt scaffold information file (required).
-log: Write processes to a log file named bpp.log.
-f: Designate numeral for first primer number for oligo tags (default=1).
-t: Activate additional order for each gap with template as sequencing primer.
-r: Designate minimum acceptable primer set ranking (default=2). See below for details.
-min: Designate minimum acceptable primer length (default=18).
-opt: Designate optimal acceptable primer length (default=25).
-max: Designate maximum acceptable primer length (default=28).
-tmin: Designate minimum acceptable melting temp (default=53).
-topt: Designate optimal acceptable melting temp (default=55).
-tmax: Designate maximum acceptable melting temp (default=60).
-gcmin: Designate minimum acceptable percent gc (default=20).
-gcmax: Designate maximum acceptable percent gc (default=80).
-gcend: Designate maximum allowable Gs and Cs in last 5 bases of 3' end (default=2).
-clamp: Designate GC clamp (default=0).
-ex: Designate minimum allowable contig size to allow for primer selection (default=2000).
-polyx: Designate maximum allowable length of mononucleotide run in primer (default=3).
-------
USAGE
boss.pl <PROJECT> -a <INPUT ACE FILE> -i <INPUT INFO.TXT FILE> -s <SUBCLONE> <OPTIONS: -r -min -max etc..>
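If you launch BOSS from another script, the usage line can be assembled programmatically. The sketch below is a Python illustration only: the helper name and the file names (assembly.ace, info.txt) are hypothetical, and it assumes that options described as "Activate" or "Write" (-g, -t, -log) take no argument.

```python
import shlex


def build_boss_command(project, ace, info, subclone, **options):
    """Build the boss.pl argument list following the USAGE line above.

    Keyword options map to flags of the same name (r=3 becomes "-r 3").
    A value of True emits the flag alone, which is assumed here to be how
    switches such as -g, -t, and -log are given.
    """
    cmd = ["boss.pl", project, "-a", ace, "-i", info, "-s", subclone]
    for name, value in options.items():
        cmd.append(f"-{name}")
        if value is not True:
            cmd.append(str(value))
    return cmd


# Hypothetical project, subclone, and file names.
cmd = build_boss_command("P12345", "assembly.ace", "info.txt", "S1234",
                         r=3, min=20, max=26, log=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once boss.pl is on your PATH.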
-----
PRIMER SET RANK DEFINITIONS
The primer set ranking option (-r) defines which primer sets are accepted and which are rejected. The rankings are as follows, from most
stringent to least:
4 (Excellent) Template and sequencing primers are ALL unique.
3 (Good) Template primers are unique, at least one sequencing primer is not.
2 (Fair) One Template primer is unique, sequencing primers may or may not be unique.
1 (Poor) Neither template primer is unique. Using this rating for the -r option is not recommended!
In addition, you will sometimes see in the bpp.hits file a ranking called "Forced Selection."
This indicates that the program could not find a primer set that meets at least "Fair" criteria and therefore defaults to the first
primer set selected for the gap.
See the bpp.hits file for examples of these ratings applied to primer sets.
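The rank definitions above can be expressed as a small function. This is a sketch in Python rather than the logic in boss.pl, and it assumes that "unique" means exactly one perfect blast hit for the primer; the function names are hypothetical.

```python
def rank_primer_set(t1_hits, t2_hits, s1_hits, s2_hits):
    """Return (rank, label) for a primer set from its four hit counts.

    Assumption: a primer is "unique" when it has exactly one perfect hit.
    """
    templates_unique = [t1_hits == 1, t2_hits == 1]
    seqs_unique = [s1_hits == 1, s2_hits == 1]
    if all(templates_unique):
        # Both template primers unique: Excellent only if both
        # sequencing primers are unique as well.
        return (4, "Excellent") if all(seqs_unique) else (3, "Good")
    if any(templates_unique):
        return (2, "Fair")
    return (1, "Poor")


def is_accepted(rank, minimum_rank=2):
    """Mirror the -r option: accept sets at or above the minimum rank."""
    return rank >= minimum_rank


print(rank_primer_set(4, 1, 1, 1))  # one template primer is not unique
print(rank_primer_set(1, 1, 1, 1))  # all four primers unique
```

"Forced Selection" is not modeled here, since it depends on the search failing for every window rather than on a single set of hit counts.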
E) UNDERSTANDING OUTPUT
CSV output
BOSS outputs 3 CSV files. The first part of each name is the project you designated when launching the script. This is
followed by _gaps.csv, _bubble.csv, or _combo.csv depending on which file you are looking at. The gaps.csv file contains one PCR
reaction per line with 2 template oligos and 2 sequencing oligos. An assembly with 3 gaps will have 3 reactions (or 6 if you use the -t
option) and will look like this:
project,subcloneName,oligoOneSequence,oligoTwoSequence,seqOligoOneSequence,seqOligoTwoSequence
P12345,S1234,GAATTTCTTTCAAACCGATGT,TGCATATCGTAAATAAACGTAAATA,GCCGAACACGCTGTCTTT,ATTTCATCATTCCCACAAGAT
P12345,S1234,CCGTTATGCTATGGGTTATC,GAAGACGAAAGACGTATTAATAGAA,CAAGCCTAATGCAGTCAATATAC,GAACTGAATAAATTAAGTTCTTTGG
P12345,S1234,TTTGTCTTTGATTTCTTTGTTTAC,ATTTGCATAGAAACACCACCTA,GTGATTCAGGTAATACTCGGTAAC,AAATTTCATCATTCCCACAA
Each line is a reaction for a project whose designation is P12345, on a subclone designated S1234, and consists of four
oligos. The first two are the left and right template oligos used to generate the PCR template, and the last two are the primers located
between the two template primers that will be used in the sequencing reaction.
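Because gaps.csv carries a header row, it can be read with any standard CSV parser. As a minimal sketch in Python, using the header and the first two sample rows shown above (the in-memory string stands in for the real file):

```python
import csv
import io

# Header and rows copied from the sample gaps.csv output above.
sample = """\
project,subcloneName,oligoOneSequence,oligoTwoSequence,seqOligoOneSequence,seqOligoTwoSequence
P12345,S1234,GAATTTCTTTCAAACCGATGT,TGCATATCGTAAATAAACGTAAATA,GCCGAACACGCTGTCTTT,ATTTCATCATTCCCACAAGAT
P12345,S1234,CCGTTATGCTATGGGTTATC,GAAGACGAAAGACGTATTAATAGAA,CAAGCCTAATGCAGTCAATATAC,GAACTGAATAAATTAAGTTCTTTGG
"""

reactions = list(csv.DictReader(io.StringIO(sample)))

for number, row in enumerate(reactions, start=1):
    # Template oligos generate the PCR product; sequencing oligos sit inside it.
    print(number, row["oligoOneSequence"], row["oligoTwoSequence"],
          row["seqOligoOneSequence"], row["seqOligoTwoSequence"])
```

To read a real file, replace io.StringIO(sample) with an open file handle for your <project>_gaps.csv.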
The bubble.csv file will contain a selection of one template and sequencing primer for each scaffold edge in the assembly. This is
useful for anchored or bubble PCR applications and will look like so:
pcrType,pcrSubtype,chemistry,project,subcloneName,oligoOne,seqOligoSequence
J38842,G1459,AAACCATAGCGGATTAACG,GGGAATCCAGGACGTAAA
J38842,G1459,GTACCGGCTCGGATTCTT,GGTTGTTGTGTCCTTGAAAC
J38842,G1459,CGTGAACCTTGCCAACAA,CGATACAAAGTCGATTGAAA
The combo.csv will contain scaffold edge to scaffold edge primer pairings in every possible combination. So if you have two
scaffolds (4 scaffold edges), BOSS generates a reaction for each possible scaffold-to-scaffold relationship. combo.csv output looks like
this:
project,subcloneName,oligoOneSequence,oligoTwoSequence,seqOligoOneSequence,seqOligoTwoSequence
J38842,G1459,GTCCAGTATATCTGGGATAGCTTAT,CGTGAACCTTGCCAACAA,GAAAGAAGTACAGAAAGAAGTACAGA,CGATACAAAGTCGATTGAAA
J38842,G1459,GTCCAGTATATCTGGGATAGCTTAT,GTCAGCGGCACATTCTTT,GAAAGAAGTACAGAAAGAAGTACAGA,AAGTTCGTATTGGAAATCATTC
J38842,G1459,GTCCAGTATATCTGGGATAGCTTAT,TACAACCTTTGCTCCATCA,GAAAGAAGTACAGAAAGAAGTACAGA,TCAGGTAGCCTTATCACGTT
You will notice that the left template and sequencing primers are the same in all three reactions, but the right ones are not. This is BOSS
pairing one particular scaffold edge with three other scaffold edges.
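The "every possible combination" pairing can be illustrated by enumerating unordered pairs of scaffold edges. This Python sketch uses hypothetical edge names and assumes each edge is paired with every other edge, which matches the example above (one edge paired with three others); the order and exact set of pairs BOSS writes may differ.

```python
from itertools import combinations

# Two scaffolds give four scaffold edges (names are hypothetical).
edges = ["scaffold1_left", "scaffold1_right",
         "scaffold2_left", "scaffold2_right"]

# Every unordered pair of distinct edges is one candidate reaction.
pairs = list(combinations(edges, 2))

for first, second in pairs:
    print(first, "<->", second)

# How many reactions involve the first edge?
print(sum(1 for pair in pairs if edges[0] in pair))
```

With n edges this yields n*(n-1)/2 pairs, so the number of reactions grows quickly as scaffolds are added.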
Analysis output
In addition to the csv files, BOSS outputs several files to assist in analysis or validation of the primers. These are bpp.hits,
bpp.hits.se, bpp.stats, and bpp.stats.se. The versions with the .se extension are the same as those without, except that they pertain
to the scaffold edge primers used in the bubble.csv and combo.csv files.
The bpp.hits file is the analysis of each four-primer set selected for each gap in the assembly. The file looks like this:
Gap Identifier           T1 Hits  T2 Hits  S1 Hits  S2 Hits  Set Strength
contig00013_contig00014  4        1        1        1        Fair
contig00014_contig00015  1        1        1        1        Excellent
contig00015_contig00016  1        1        1        1        Excellent
Gap Identifier indicates the gap between the contigs to the left and right of the underscore. T1 Hits indicates how many "perfect"
blast hits were found that matched the left template oligo, T2 Hits for the right template oligo, S1 Hits for the left sequencing
oligo, and S2 Hits for the right sequencing oligo. Set Strength is a qualitative rating of the primer set. See Basic Help in section D
for details. Thus, this file gives you an idea of how unique the primers selected are with respect to the entire assembly.
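A quick way to review a bpp.hits file is to list the primers with more than one perfect hit for each gap. The Python sketch below assumes the data rows are whitespace-separated and that the gap identifier contains no spaces; the function names are hypothetical.

```python
def parse_hits_line(line):
    """Split one bpp.hits data row into a record.

    Everything after the four hit counts is kept as the set strength,
    so a two-word rating such as "Forced Selection" survives intact.
    """
    fields = line.split()
    record = {"gap": fields[0], "strength": " ".join(fields[5:])}
    for name, value in zip(("T1", "T2", "S1", "S2"), fields[1:5]):
        record[name] = int(value)
    return record


def non_unique_primers(record):
    """Return the primers in the set that have more than one perfect hit."""
    return [name for name in ("T1", "T2", "S1", "S2") if record[name] > 1]


# Data rows copied from the sample bpp.hits output above.
rows = [
    "contig00013_contig00014 4 1 1 1 Fair",
    "contig00014_contig00015 1 1 1 1 Excellent",
]
for row in rows:
    rec = parse_hits_line(row)
    print(rec["gap"], rec["strength"], non_unique_primers(rec))
```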
The bpp.stats file gives coordinate information regarding the primers selected, and looks like this:
Gap Identifier           T1-S   T1-E   T2-S  T2-E  S1-S   S1-E   S1-DTE  S1-L  S2-S  S2-E  S2-L  Template Size
contig00015_contig00016  67703  67721  400   418   67825  67847  351     Y     207   230   Y     412
contig00016_contig00017  2444   2463   396   417   2573   2592   349     Y     200   218   Y     407
contig00017_contig00018  55441  55464  367   392   55503  55527  407     Y     228   247   Y     377
The first column identifies the gap as in the bpp.hits file. Columns T1-S, T2-S, S1-S, and S2-S give the starting coordinates of each
oligo in its respective contig. T1-E, T2-E, S1-E, and S2-E give the ending coordinates of each oligo in its respective contig. S1-DTE
indicates the distance to the gap edge for the left sequencing primer. S2-S provides the same information for the right sequencing
primer. S1-L and S2-L are confirmations that the sequencing primers are logically placed, that is, within the bounds of the template
primers. Template size indicates the distance between T1-S and T2-E, not including the missing sequence in the gap.
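The start and end columns can be used to sanity-check primer lengths against the -min and -max settings. The Python sketch below assumes the rows are whitespace-separated and that start and end coordinates are both inclusive (so length = end - start + 1); neither is stated in the README, and the names are hypothetical.

```python
STATS_COLUMNS = ["gap", "T1-S", "T1-E", "T2-S", "T2-E", "S1-S", "S1-E",
                 "S1-DTE", "S1-L", "S2-S", "S2-E", "S2-L", "Template Size"]


def primer_lengths(line):
    """Compute the length of each primer in one bpp.stats data row.

    Assumes inclusive coordinates, i.e. length = end - start + 1.
    """
    record = dict(zip(STATS_COLUMNS, line.split()))
    return {primer: int(record[f"{primer}-E"]) - int(record[f"{primer}-S"]) + 1
            for primer in ("T1", "T2", "S1", "S2")}


# First data row copied from the sample bpp.stats output above.
row = "contig00015_contig00016 67703 67721 400 418 67825 67847 351 Y 207 230 Y 412"
lengths = primer_lengths(row)
print(lengths)

# Compare against the default -min (18) and -max (28) primer lengths.
print(all(18 <= length <= 28 for length in lengths.values()))
```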
F) HELP!
If you need help with BOSS, please contact the author at [email protected]. Thank you for downloading BOSS V2.2.