I have just released a beta version of Join Finder, a Perl script for consed users that helps find joins between gap edges. This is the first time I have released anything on SourceForge, so please forgive any oversights on my part. The script requires blastall and formatdb to be installed on your system.
It can be downloaded here:
A) DEPENDENCIES
Join Finder is dependent on two NCBI tools, formatdb and
blastall. Blast can be downloaded here:
Once you have a working Blast installation, continue with the
installation instructions below.
Also note that this script is designed to work with consed ace files and
requires you to save a file from consed before running it. See section C,
BASIC HELP, for details.
B) INSTALLATION INSTRUCTIONS
1) Download join_finder.pl to a location on a Linux machine.
2) Open the program in a text editor such as xemacs.
3) Edit line 22 so that the text between quotation marks is the
explicit path to the local blastall installation. For example, if
blastall is installed in
/production/tools/blast/blast-2.2.14/bin/blastall, line 22 should
read:
my $blast_path="/production/tools/blast/blast-2.2.14/bin/blastall";
4) Edit line 23 in the same way, so that the text between quotation marks is the
explicit path to formatdb. For example:
my $formatdb_path="/production/tools/blast/blast-2.2.14/bin/formatdb";
5) If you know how many cores are on the Linux machine that runs blast,
change the "4" in line 26 to that number. If you do not know and you
run into problems, try changing this number to 1.
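Before editing lines 22, 23 and 26, it can help to confirm that the two paths really point at executables and to see how many cores the machine reports. The following is a minimal Python sketch for that check, not part of join_finder.pl; the helper names tool_is_usable and suggested_core_count are hypothetical, and the paths are the example paths from steps 3 and 4, so substitute your own.

```python
import os


def tool_is_usable(path):
    """Return True if `path` names an existing file that we may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def suggested_core_count():
    """Return the CPU count Python can detect, falling back to 1."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Example paths taken from the installation steps; replace with yours.
    for tool in (
        "/production/tools/blast/blast-2.2.14/bin/blastall",
        "/production/tools/blast/blast-2.2.14/bin/formatdb",
    ):
        status = "ok" if tool_is_usable(tool) else "missing or not executable"
        print(f"{tool}: {status}")
    print("cores reported:", suggested_core_count())
```

If either path is reported as missing, fix the Blast installation first; the reported core count is only a starting point for the value in line 26.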
C) BASIC HELP
Prior to using the program, you must save an info.txt file using
Consed for the ace file assembly you wish to analyze for joins. Do
this by opening the ace file in consed and selecting Info>Show Maps of
Contigs In Scaffolds>Save to File>OK.
DESCRIPTION
Join Finder helps locate joins by blasting contig end sequence probes and looking for link-supported joins.
Sequence probes are BLASTed against the 2 kb ends of contigs, strict matches are checked against link
information, and potential joins are written to jf.results.
-----------
OPTIONS
l: Setting to adjust how many low-quality bases (quality less than 25) are allowed in probe sequences. Default is 0.
p: Setting to adjust probe size in terms of bases. Default is 100.
o: Specify an alternate output file name. Default is jf.results.
h: Print this help information.
-------
USAGE
join_finder.pl <ACE FILE> <ONO FILE> -l <INTEGER> -p <INTEGER> -o <OUTPUT FILE NAME>
Ex(Default Usage):
~amr/bin/tools/join_finder.pl 454Contigs.ace.1 info.txt
Ex(Advanced Usage):
join_finder.pl 454Contigs.ace.1 info.txt -l 5 -p 30 -o myjoins.txt
-----
A few notes regarding the options above:
The option -l, for low-quality bases, instructs Join Finder to back away from the gap edge
when selecting probe sequence until the probe contains no more than this many low-quality bases. For
example, if -l is set to the default (0), this probe would be accepted:
TTCGGGTAACTTCCACTTCGTCATTCCCGCG
But the one below would be rejected, because lower case bases indicate
low quality, and there are 3 low quality bases. Hence join finder
would slide away from the gap and try again.
ttcGGGTAACTTCCACTTCGTCATTCCCGCG
If -l is set to 5, however, the probe would be accepted, since the
user elected to allow 5 low quality bases.
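The sliding behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the script's actual Perl code: the function names are hypothetical, the gap is assumed to sit at the start of the string (a left contig edge), and the window is assumed to move one base at a time, neither of which is documented here.

```python
def count_low_quality(probe):
    """Count low-quality bases, written in lower case as in the examples."""
    return sum(1 for base in probe if base.islower())


def select_left_edge_probe(contig_end, probe_size, max_low_quality=0):
    """Pick a probe near the left edge of a contig.

    Assumes the gap lies at index 0. The window starts flush against the
    gap and slides one base at a time away from it until it contains no
    more than max_low_quality lower-case bases.
    Returns (offset_from_gap, probe), or None if no window qualifies.
    """
    for start in range(0, len(contig_end) - probe_size + 1):
        probe = contig_end[start:start + probe_size]
        if count_low_quality(probe) <= max_low_quality:
            return start, probe
    return None


if __name__ == "__main__":
    edge = "ttcGGGTAACTTCCACTTCGTCATTCCCGCG"
    # With -l 0 the window must move past the three lower-case bases.
    print(select_left_edge_probe(edge, probe_size=28, max_low_quality=0))
    # With -l 5 the window flush against the gap is already acceptable.
    print(select_left_edge_probe(edge, probe_size=28, max_low_quality=5))
```

A right contig edge would be handled the same way with the window sliding in the opposite direction.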
The option -p instructs join finder on the size of probes to use when
looking for joins. The default is 100, but you can find more joins
with a smaller number. However, you will also find more false
positives, in which a join is proposed that is really just a
tandem repeat split by a gap. The script is a useful tool for finding
joins quickly, but you must still exercise your own judgement.
D) JOIN FINDER OUTPUT
Join finder outputs several files, but the most important is the file
"jf.results". Output will look like this:
Contig.RightEdge-Contig.LeftEdge    Probe Sequence                  %ID     RightStart  RightEnd  E-Val  BitScore
contig00013.right-contig00014.left  TCGGGAGCAACAGAAACCGCTCCGCCGTCA  100.00  235         264       8e-12  60.0
contig00015.right-contig00016.left  ATGTTGCCGTATTTGAGGATGATGTCCTGT  100.00  58          87        8e-12  60.0
contig00020.right-contig00021.left  TCAGCTATTTAGAATAAaTTTtGAAaCTct  100.00  209         238       8e-12  60.0
The first column indicates that the right edge of contig00013 has a
potential join with the left edge of contig00014. The Probe Sequence
column gives the sequence that matches on both sides of the gap,
and can be used in a string search in consed. %ID, E-Val, and BitScore
are taken from the blast file "jf_blast.out". RightStart and RightEnd
are the coordinates of the probe sequence on the right side of the
gap in which a join appears to exist.
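If you want to post-process the results, each data line can be split into its seven fields. The Python sketch below is an assumption-laden example rather than anything shipped with Join Finder: the Join record and parse_join_line are hypothetical names, fields are assumed to be whitespace-separated as in the sample above, and contig names are assumed to contain no hyphens.

```python
from typing import NamedTuple


class Join(NamedTuple):
    right_contig: str   # contig whose right edge carries the probe
    left_contig: str    # contig whose left edge matches the probe
    probe: str
    percent_id: float
    right_start: int
    right_end: int
    e_value: float
    bit_score: float


def parse_join_line(line):
    """Parse one data line (not the header) of jf.results."""
    edges, probe, pct, start, end, e_val, bits = line.split()
    # First column looks like "contig00013.right-contig00014.left".
    right_edge, left_edge = edges.split("-")
    return Join(
        right_contig=right_edge.rsplit(".", 1)[0],
        left_contig=left_edge.rsplit(".", 1)[0],
        probe=probe,
        percent_id=float(pct),
        right_start=int(start),
        right_end=int(end),
        e_value=float(e_val),
        bit_score=float(bits),
    )


if __name__ == "__main__":
    sample = ("contig00013.right-contig00014.left "
              "TCGGGAGCAACAGAAACCGCTCCGCCGTCA 100.00 235 264 8e-12 60.0")
    print(parse_join_line(sample))
```

The probe field of each record can then be pasted into a consed string search, as described above.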
E) Please send bug reports and feature requests to
[email protected]. While Join Finder is provided "as-is", I will
try to fix bugs or update features as time permits.