
  • Join Finder Beta Release

    I have just released a beta version of Join Finder, a Perl script for consed users that helps find joins between gap edges. This is the first time I have released anything on Sourceforge, so please forgive any oversights on my part. The script also requires that blastall and formatdb be installed on your system.

    It can be downloaded here:
    Download Join Finder for free. Join Finder helps locate joins by BLASTing contig end sequence probes and looking for link-supported joins. Probes are BLASTed against the 2 kb ends of contigs, strict matches are examined against link information, and potential joins are written to a file.


    Join Finder depends on two NCBI tools, formatdb and
    blastall. BLAST can be downloaded here:

    Once you have a working Blast installation, continue with the
    installation instructions below.

    Also note that this script is designed to work with consed ace files and
    requires you to output a file from consed before running it. See section C,
    BASIC HELP, for details.


    1) Download the script to a location on a Linux machine.
    2) Open the script in a text editor such as xemacs.
    3) Edit line 22 so that the text between quotation marks is the
    explicit path to the local blastall installation. For example, if
    blastall is installed in
    /production/tools/blast/blast-2.2.14/bin/blastall, line 22 should read:

    my $blast_path="/production/tools/blast/blast-2.2.14/bin/blastall";

    4) Edit line 23 as you edited line 22, so that the text between quotation marks is the
    explicit path to formatdb. So it would look something like:

    my $formatdb_path="/production/tools/blast/blast-2.2.14/bin/formatdb";

    5) If you know how many cores the Linux machine running BLAST has,
    change the "4" on line 26 to that number. If you do not know the core
    count and run into problems, try setting this number to 1.
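    If you are unsure what to put on lines 22, 23 and 26, a small helper can look up the values for you. The sketch below is written in Python for illustration; Join Finder itself is a Perl script. It assumes blastall and formatdb are on your PATH. The helper names are hypothetical. The name of the core-count variable on line 26 is not given in the instructions, so the sketch only reports the number.

```python
import os
import shutil


def perl_assignment(var, value):
    """Format a Perl scalar assignment like those on lines 22-23 of the script."""
    # In this f-string "$" is literal and "{var}" is interpolated.
    return f'my ${var}="{value}";'


def suggested_settings():
    """Return (blastall path, formatdb path, core count) for this machine.

    Paths are None when the tool is not found on PATH (an assumption:
    your BLAST install may live outside PATH).
    """
    blast = shutil.which("blastall")
    formatdb = shutil.which("formatdb")
    cores = os.cpu_count() or 1  # cpu_count() can return None; fall back to 1
    return blast, formatdb, cores


if __name__ == "__main__":
    blast, formatdb, cores = suggested_settings()
    if blast is None or formatdb is None:
        print("blastall and/or formatdb not found on PATH; install NCBI BLAST first")
    else:
        print("line 22:", perl_assignment("blast_path", blast))
        print("line 23:", perl_assignment("formatdb_path", formatdb))
    print("line 26: number of cores:", cores)
```

    The printed lines can be pasted into the script in place of the example paths above.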


    Prior to using the program, you must save an info.txt file using
    Consed for the ace file assembly you wish to analyze for joins. Do
    this by opening the ace file in consed and selecting Info>Show Maps of
    Contigs In Scaffolds>Save to File>OK.

    Join Finder helps locate joins by BLASTing probe sequences taken from contig ends and looking for link-supported joins.
    Probe sequences are BLASTed against the 2 kb ends of contigs, strict matches are checked against link
    information, and potential joins are written to jf.results.
    l: Number of low-quality bases (quality below 25) allowed in a probe sequence. Default is 0.
    p: Probe size in bases. Default is 100.
    o: Specify an alternate output file name. Default is jf.results.
    h: Print this help information.
    Ex(Default Usage):
    ~amr/bin/tools/ 454Contigs.ace.1 info.txt
    Ex(Advanced Usage): 454Contigs.ace.1 info.txt -l 5 -p 30 -o myjoins.txt
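    The filtering described in the help text above can be sketched roughly as follows. Strict matches are checked against link information, and only link-supported joins are kept. This is not Join Finder's actual code, and it is written in Python rather than Perl. It makes three hypothetical assumptions. A "strict" match is taken to mean a full-length hit with 100% identity. The scaffold maps saved in info.txt are taken to be available as ordered lists of contig names. A join counts as "link-supported" when the second contig directly follows the first in the same scaffold.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (query contig, subject contig, % identity, alignment length).
Hit = Tuple[str, str, float, int]


def is_strict(pct_identity: float, aln_len: int, probe_len: int) -> bool:
    # Assumed definition of "strict": a full-length hit with 100% identity.
    return pct_identity == 100.0 and aln_len == probe_len


def next_in_scaffold(scaffolds: List[List[str]]) -> Dict[str, str]:
    # Map each contig to the contig that follows it in the same scaffold.
    following: Dict[str, str] = {}
    for order in scaffolds:
        for left, right in zip(order, order[1:]):
            following[left] = right
    return following


def candidate_joins(hits: List[Hit], scaffolds: List[List[str]],
                    probe_len: int = 100) -> List[str]:
    # A probe from the right edge of `query` that hits the left end of `subject`
    # is kept only if the hit is strict AND `subject` directly follows `query`.
    following = next_in_scaffold(scaffolds)
    joins = []
    for query, subject, pct, aln_len in hits:
        if is_strict(pct, aln_len, probe_len) and following.get(query) == subject:
            joins.append(f"{query}.right-{subject}.left")
    return joins
```

    For example, with scaffolds [["contig00013", "contig00014", "contig00015"]] and 30-base probes, a perfect hit from contig00013 onto contig00014 is reported. The same hit onto contig00015 is not, because the two contigs are not neighbors.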

    A few notes regarding the options above:
    The option -l, for low-quality bases, instructs Join Finder to back away from the gap edge
    when selecting a probe sequence until the number of low-quality bases in the probe is within
    this threshold. For example, if -l is set to the default (0), this probe would be accepted:


    But the probe below would be rejected, because lower-case bases indicate
    low quality and it contains 3 low-quality bases. Join Finder would
    therefore slide away from the gap and try again.


    If -l is set to 5, however, the probe would be accepted, since the
    user elected to allow up to 5 low-quality bases.
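    The sliding behavior of -l, together with the probe size set by -p, can be illustrated with the sketch below. It is written in Python rather than the script's Perl and is not Join Finder's own code. The one-base step, the 2,000-base search window (taken from the "2 kb ends" mentioned above) and the function names are assumptions. The sketch handles only a contig's right edge; a left-edge probe would slide rightward from the start of the contig instead.

```python
from typing import Optional


def count_low_quality(seq: str) -> int:
    # Lower-case bases mark low-quality calls, as described above.
    return sum(1 for base in seq if base.islower())


def pick_right_edge_probe(contig: str, probe_size: int = 100,
                          max_low_quality: int = 0,
                          search_window: int = 2000) -> Optional[str]:
    """Pick a probe at the right (gap) edge of a contig, sliding away if needed.

    probe_size plays the role of -p and max_low_quality the role of -l.
    Returns None if no acceptable probe exists within the search window.
    """
    end = len(contig)
    lowest_start = max(0, len(contig) - search_window)
    while end - probe_size >= lowest_start:
        probe = contig[end - probe_size:end]
        if count_low_quality(probe) <= max_low_quality:
            return probe
        end -= 1  # assumed step: move one base away from the gap and retry
    return None
```

    With the toy contig "ACGTACGTACgta" and 10-base probes, the edge probe "TACGTACgta" contains 3 low-quality bases. It is rejected at the default -l of 0, so the sketch slides back to "ACGTACGTAC". With max_low_quality=5, the edge probe itself is accepted.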

    The option -p sets the probe size Join Finder uses when
    looking for joins. The default is 100, but you can find more joins
    with a smaller number. However, you will also find more false
    positives, in which the proposed join is really just a
    tandem repeat split by a gap. The script is a useful tool for finding
    joins quickly, but you must still exercise your own judgment.


    Join finder outputs several files, but the most important is the file
    "jf.results". Output will look like this:

    Contig.RightEdge-Contig.LeftEdge Probe Sequence %ID RightStart RightEnd E-Val BitScore
    contig00013.right-contig00014.left TCGGGAGCAACAGAAACCGCTCCGCCGTCA 100.00 235 264 8e-12 60.0
    contig00015.right-contig00016.left ATGTTGCCGTATTTGAGGATGATGTCCTGT 100.00 58 87 8e-12 60.0
    contig00020.right-contig00021.left TCAGCTATTTAGAATAAaTTTtGAAaCTct 100.00 209 238 8e-12 60.0

    The first column indicates that the right edge of contig00013 has a
    potential join with the left edge of contig00014. The Probe Sequence
    column gives the sequence that matches on both sides of the gap,
    and it can be used in a string search in consed. %ID, E-Val, and BitScore
    are taken from the BLAST output file "jf_blast.out". RightStart and
    RightEnd are the coordinates of the probe sequence on the right side of
    the gap where a join appears to exist.
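    When an assembly has many gaps, it can help to load jf.results into a script and sort or filter the candidates. The Python sketch below parses the format shown above. Its function and field names are hypothetical. It assumes that fields are separated by whitespace and that contig names contain no "-". Data rows then have seven fields, while the header row has eight because "Probe Sequence" contains a space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JoinHit:
    first_contig: str    # contig whose right edge borders the gap
    second_contig: str   # contig whose left edge borders the gap
    probe: str           # sequence to paste into consed's string search
    pct_identity: float
    right_start: int     # probe coordinates on the right side of the gap
    right_end: int
    evalue: float
    bit_score: float


def parse_jf_results(text: str) -> List[JoinHit]:
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:  # skips blank lines and the 8-token header row
            continue
        pair, probe, pct, start, end, evalue, bits = fields
        right_edge, left_edge = pair.split("-")  # e.g. "contig00013.right"
        hits.append(JoinHit(
            first_contig=right_edge.rsplit(".", 1)[0],
            second_contig=left_edge.rsplit(".", 1)[0],
            probe=probe,
            pct_identity=float(pct),
            right_start=int(start),
            right_end=int(end),
            evalue=float(evalue),
            bit_score=float(bits),
        ))
    return hits
```

    For example, sorted(parse_jf_results(open("jf.results").read()), key=lambda h: h.bit_score, reverse=True) lists the strongest candidates first. You can then check each one in consed.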

    E) Please send bug reports and feature requests to
    [email protected]. While Join Finder is provided "as-is", I will
    try to fix bugs or update features as time permits.

  • #2
    What is the advantage of using 'joinfinder' over using 'cross_match' in consed's AssemblyView? 'cross_match' is flexible, fast and output is integrated in consed.

    But I probably missed something.


    • #3
      Hi Sven,

      That's a good question. Personally, I haven't had much luck finding joins with cross_match. It may be some of the settings we are using, but I very often find joins between neighboring contigs that cross_match doesn't find. I also find the display cluttered and distracting when using cross_match.

      Aside from that, I find making joins this way more expedient. With cross_match, you are continuously reloading the assembly view after each join. We have assemblies ranging from 20 gaps to 200 or more, so this gets tedious. With joins identified by Join Finder, you simply put the probe sequence in the string search window, bring up the two contigs, compare them, and then join them.

      If you have success with cross_match, I would encourage you to stick with what works for you, but I find this method easier and faster.

