  • Broadie
    Member
    • Oct 2009
    • 15

    Join Finder Beta Release

    I have just released a beta version of Join Finder, a Perl script for consed users that helps find joins between gap edges. This is the first time I have released anything on SourceForge, so please forgive any oversights on my part. The script also requires that blastall and formatdb be installed on your system.

    It can be downloaded here:
    Download Join Finder for free. Join Finder helps locate joins by BLASTing contig end sequence probes and looking for link-supported joins. Probes are BLASTed against 2KB ends of contigs, strict matches are examined against link information, and potential joins are written to a file.



    A) DEPENDENCIES

    Join Finder is dependent on two NCBI tools, formatdb and
    blastall. BLAST can be downloaded here:



    Once you have a working Blast installation, continue with the
    installation instructions below.

    Also note that this script is designed to work with consed ace files and
    requires you to save a file from consed before running it. See section C,
    BASIC HELP, for details.

    B) INSTALLATION INSTRUCTIONS

    1) Download join_finder.pl to a location on a Linux machine.
    2) Open the program in a text editor such as xemacs.
    3) Edit line 22 so that the text between quotation marks is the
    explicit path to the local blastall installation. For example, if
    blastall is installed in
    /production/tools/blast/blast-2.2.14/bin/blastall, line 22 should
    read:

    my $blast_path="/production/tools/blast/blast-2.2.14/bin/blastall";

    4) Edit line 23 in the same way, so that the text between quotation marks is the
    explicit path to formatdb. It should look something like:

    my $formatdb_path="/production/tools/blast/blast-2.2.14/bin/formatdb";

    5) If you know how many cores the Linux machine running BLAST has,
    change the "4" in line 26 to that number. If you do not know, or if
    you run into problems, try changing this number to 1.
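If you are not sure where blastall and formatdb live or how many cores the machine has, a small helper can look them up before you edit lines 22, 23 and 26. The following is a minimal Python sketch, not part of Join Finder. It assumes both tools are on your PATH, and the function names perl_path_line and suggest_settings are made up for this example. Only the variable names $blast_path and $formatdb_path come from the instructions above.

```python
import os
import shutil


def perl_path_line(variable, path):
    """Format a path assignment the way lines 22 and 23 are written."""
    return 'my $%s="%s";' % (variable, path)


def suggest_settings():
    """Look up blastall and formatdb on PATH and count the CPU cores."""
    return {
        "blast_path": shutil.which("blastall"),     # None if not on PATH
        "formatdb_path": shutil.which("formatdb"),  # None if not on PATH
        "cores": os.cpu_count() or 1,               # fall back to 1 if unknown
    }


if __name__ == "__main__":
    settings = suggest_settings()
    for variable in ("blast_path", "formatdb_path"):
        path = settings[variable]
        if path is None:
            print("%s: not found on PATH, enter the path by hand" % variable)
        else:
            print(perl_path_line(variable, path))
    print("core count for line 26:", settings["cores"])
```

Paste the printed lines over lines 22 and 23 of join_finder.pl, and use the reported core count for the number in line 26.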

    C) BASIC HELP

    Prior to using the program, you must save an info.txt file using
    Consed for the ace file assembly you wish to analyze for joins. Do
    this by opening the ace file in consed and selecting Info>Show Maps of
    Contigs In Scaffolds>Save to File>OK.

    DESCRIPTION
    Join Finder helps locate joins by BLASTing contig end sequence probes and looking for link-supported joins.
    Sequence probes are BLASTed against 2KB ends of contigs, strict matches are examined against link
    information, and potential joins are written to jf.results.
    -----------
    OPTIONS
    l: Number of low-quality bases (quality less than 25) allowed in probe sequences. Default is 0.
    p: Probe size in bases. Default is 100.
    o: Specify an alternate output file name. Default is jf.results.
    h: Print this help information.
    -------
    USAGE
    join_finder.pl <ACE FILE> <ONO FILE> -l <INTEGER> -p <INTEGER> -o <OUTPUT FILE NAME>
    Ex(Default Usage):
    ~amr/bin/tools/join_finder.pl 454Contigs.ace.1 info.txt
    Ex(Advanced Usage):
    join_finder.pl 454Contigs.ace.1 info.txt -l 5 -p 30 -o myjoins.txt
    -----

    A few notes regarding the options above:
    The option -l, for low-quality bases, instructs join finder to back away from the gap edge
    when selecting probe sequence until it finds a probe with no more than this many
    low-quality bases. For example, if -l is left at the default (0), this probe would be accepted:

    TTCGGGTAACTTCCACTTCGTCATTCCCGCG

    But the one below would be rejected, because lower case bases indicate
    low quality, and there are 3 low quality bases. Hence join finder
    would slide away from the gap and try again.

    ttcGGGTAACTTCCACTTCGTCATTCCCGCG

    If -l is set to 5, however, the probe would be accepted, since the
    user elected to allow 5 low quality bases.
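The sliding behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in join_finder.pl. It assumes the gap edge sits at the right-hand end of the contig end sequence, that the window moves one base at a time, and that lower case marks low quality. The names count_low_quality and select_probe are hypothetical.

```python
def count_low_quality(probe):
    """Count lower case (low-quality) bases in a probe."""
    return sum(1 for base in probe if base.islower())


def select_probe(contig_end, probe_size, max_low_quality=0):
    """Slide a probe-sized window away from the gap edge.

    Assumes the gap edge is at the right end of contig_end. Returns
    (start_index, probe) for the first window with at most
    max_low_quality low-quality bases, or None if no window qualifies.
    """
    for end in range(len(contig_end), probe_size - 1, -1):
        probe = contig_end[end - probe_size:end]
        if count_low_quality(probe) <= max_low_quality:
            return end - probe_size, probe
    return None


if __name__ == "__main__":
    rejected = "ttcGGGTAACTTCCACTTCGTCATTCCCGCG"
    print(count_low_quality(rejected))                # 3 low-quality bases
    print(select_probe(rejected, len(rejected), 0))   # None: too many for -l 0
    print(select_probe(rejected, len(rejected), 5))   # accepted with -l 5
```

With -l 0 the window keeps moving until it contains no lower case bases. Raising the limit lets it stop closer to the gap edge.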

    The option -p instructs join finder on the size of probes to use when
    looking for joins. The default is 100, but you can find more joins
    with a smaller number. However, you will also find more false
    positives, in which a join is proposed that is really just a
    tandem repeat split by a gap. The script is a useful tool for finding
    joins quickly, but you must still exercise your own judgement.

    D) JOIN FINDER OUTPUT

    Join finder outputs several files, but the most important is the file
    "jf.results". Output will look like this:

    Contig.RightEdge-Contig.LeftEdge Probe Sequence %ID RightStart RightEnd E-Val BitScore
    contig00013.right-contig00014.left TCGGGAGCAACAGAAACCGCTCCGCCGTCA 100.00 235 264 8e-12 60.0
    contig00015.right-contig00016.left ATGTTGCCGTATTTGAGGATGATGTCCTGT 100.00 58 87 8e-12 60.0
    contig00020.right-contig00021.left TCAGCTATTTAGAATAAaTTTtGAAaCTct 100.00 209 238 8e-12 60.0

    The first column indicates that the right edge of contig00013 has a
    potential join with the left edge of contig00014. The Probe Sequence
    column indicates the sequence that matches on both sides of the gap,
    and can be used in a string search in consed. %ID, E-Val, and BitScore
    are taken from the blast file "jf_blast.out". Right Start and Right
    End are the coordinates of the probe sequence on the right side of the
    gap in which a join appears to exist.
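If you want to work through a long jf.results file programmatically, the columns described above can be read into records. The Python sketch below is an assumption-laden example rather than part of Join Finder: it assumes the columns are separated by whitespace (the real file may use tabs), that contig names contain no hyphen, and that the header is the only line with a different number of fields. The names Join and parse_results are hypothetical.

```python
from typing import NamedTuple


class Join(NamedTuple):
    right_contig: str   # contig whose right edge carries the probe
    left_contig: str    # contig whose left edge matches the probe
    probe: str
    pct_id: float
    right_start: int
    right_end: int
    e_value: float
    bit_score: float


def parse_results(lines):
    """Turn jf.results lines into Join records, skipping the header."""
    joins = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # blank line or the header ("Probe Sequence" adds a field)
        right_side, left_side = fields[0].split("-", 1)
        joins.append(Join(
            right_contig=right_side.rsplit(".", 1)[0],
            left_contig=left_side.rsplit(".", 1)[0],
            probe=fields[1],
            pct_id=float(fields[2]),
            right_start=int(fields[3]),
            right_end=int(fields[4]),
            e_value=float(fields[5]),
            bit_score=float(fields[6]),
        ))
    return joins


if __name__ == "__main__":
    with open("jf.results") as handle:
        for join in parse_results(handle):
            print(join.right_contig, join.left_contig, join.probe)
```

Each printed probe can then be pasted into the consed string search to bring up both contigs.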

    E) Please send bug reports and feature requests to
    [email protected]. While join finder is provided "as-is", I will
    try to fix bugs or update features as time permits.
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    What is the advantage of using 'joinfinder' over using 'cross_match' in consed's AssemblyView? 'cross_match' is flexible, fast and output is integrated in consed.

    But I probably missed something,
    cheers,
    Sven


    • Broadie
      Member
      • Oct 2009
      • 15

      #3
      Hi Sven,

      That's a good question. Personally, I haven't had much luck finding joins with cross_match. It may be some of the settings we are using, but I very often find joins between neighboring contigs that cross_match doesn't find. I also find the display to be cluttered and distracting when using cross_match.

      Aside from that, I find making joins this way to be more expedient. With cross match, you are continuously reloading the assembly view after each join. We have assemblies ranging from 20 gaps to 200 or more, so this gets tedious. With the join finder identified joins, you simply put the probe sequence in the string search window, bring up the two contigs, compare and then join them.

      If you achieve success with cross match, I would encourage you to stick with what works for you, but I find this method to be easier and faster.
