
  • Join Finder Beta Release

    I have just released a beta version of Join Finder, a Perl script for consed users that helps find joins between gap edges. This is the first time I have released anything on SourceForge, so please forgive any oversights on my part. The script also requires that blastall and formatdb be installed on your system.

    It can be downloaded here:
    Download Join Finder for free. Join Finder helps locate joins by BLASTing contig-end sequence probes and looking for link-supported joins. Probes are BLASTed against the 2 kb ends of contigs, strict matches are checked against link information, and potential joins are written to a file.



    A) DEPENDENCIES

    Join Finder is dependent on two NCBI tools, formatdb and
    blastall. Blast can be downloaded here:



    Once you have a working Blast installation, continue with the
    installation instructions below.

    Also note that this script is designed to work with consed ace files
    and requires you to save a file from consed before running it. See
    section C, BASIC HELP, for details.

    B) INSTALLATION INSTRUCTIONS

    1) Download join_finder.pl to a location on a Linux machine.
    2) Open the program in a text editor such as xemacs.
    3) Edit line 22 so that the text between quotation marks is the
    explicit path to the local blastall installation. For example, if
    blastall is installed in
    /production/tools/blast/blast-2.2.14/bin/blastall, line 22 should
    read:

    my $blast_path="/production/tools/blast/blast-2.2.14/bin/blastall";

    4) Edit line 23 as you edited line 22, so that the text between quotation marks is the
    explicit path to formatdb. So it would look something like:

    my $formatdb_path="/production/tools/blast/blast-2.2.14/bin/formatdb";

    5) If you know how many cores are on the Linux machine that will run
    blast, change the "4" in line 26 to that number. If you do not know,
    or if you have any problems, try changing this number to 1.
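If you are unsure of the paths for steps 3 and 4 or the core count for step 5, a small helper can report them. This is only a sketch in Python, not part of join_finder.pl: the function name describe_blast_setup is made up for illustration, and it assumes blastall and formatdb are on your PATH (if they are not, it reports None and you will need to locate them yourself).

```python
import os
import shutil


def describe_blast_setup(tools=("blastall", "formatdb")):
    """Return ({tool: full path or None}, core count) for this machine."""
    # shutil.which searches PATH the same way the shell would.
    paths = {tool: shutil.which(tool) for tool in tools}
    # os.cpu_count() can return None on unusual systems; fall back to 1.
    cores = os.cpu_count() or 1
    return paths, cores


if __name__ == "__main__":
    paths, cores = describe_blast_setup()
    for tool, path in paths.items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
    print(f"cores: {cores}")
```

The printed paths are what would go between the quotation marks on lines 22 and 23, and the core count is the value for line 26.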

    C) BASIC HELP

    Prior to using the program, you must save an info.txt file using
    Consed for the ace file assembly you wish to analyze for joins. Do
    this by opening the ace file in consed and selecting Info>Show Maps of
    Contigs In Scaffolds>Save to File>OK.

    DESCRIPTION
    Join Finder helps locate joins by BLASTing contig-end sequence probes and looking for link-supported joins.
    Sequence probes are BLASTed against the 2 kb ends of contigs, strict matches are checked against link
    information, and potential joins are written to jf.results.
    -----------
    OPTIONS
    l: Setting to adjust how many low-quality bases (less than 25) are allowed in probe sequences. Default is 0.
    p: Setting to adjust probe size in terms of bases. Default is 100.
    o: Specify alternate output file name. Default is jf.results.
    h: Print this help information.
    -------
    USAGE
    join_finder.pl <ACE FILE> <ONO FILE> -l <INTEGER> -p <INTEGER> -o <OUTPUT FILE NAME>
    Ex(Default Usage):
    ~amr/bin/tools/join_finder.pl 454Contigs.ace.1 info.txt
    Ex(Advanced Usage):
    join_finder.pl 454Contigs.ace.1 info.txt -l 5 -p 30 -o myjoins.txt
    -----

    A few notes regarding the options above:
    The option -l, for low-quality bases, instructs join finder to back
    away from the gap edge when selecting a probe sequence until this
    threshold is met. For example, if -l is set to the default (0), this
    probe would be accepted:

    TTCGGGTAACTTCCACTTCGTCATTCCCGCG

    But the one below would be rejected, because lower case bases indicate
    low quality, and there are 3 low quality bases. Hence join finder
    would slide away from the gap and try again.

    ttcGGGTAACTTCCACTTCGTCATTCCCGCG

    If -l is set to 5, however, the probe would be accepted, since the
    user elected to allow 5 low quality bases.
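The sliding behaviour described above can be sketched in Python as follows. This is an illustration of the idea, not the code in join_finder.pl: the function names are hypothetical, and it assumes the probe window moves one base at a time and that the first character of the string is the base nearest the gap.

```python
def count_low_quality(seq):
    """Count lower-case bases, which mark low quality in the examples above."""
    return sum(1 for base in seq if base.islower())


def select_probe(contig_end, probe_size=100, max_low=0):
    """Slide away from the gap until a window passes the -l threshold.

    contig_end[0] is assumed to be the base nearest the gap.
    Returns (offset, probe), or None if no window qualifies.
    """
    for offset in range(len(contig_end) - probe_size + 1):
        window = contig_end[offset:offset + probe_size]
        if count_low_quality(window) <= max_low:
            return offset, window
    return None


if __name__ == "__main__":
    # The rejected example probe from above, followed by a few more bases.
    end = "ttcGGGTAACTTCCACTTCGTCATTCCCGCG" + "AAAA"
    print(select_probe(end, probe_size=31, max_low=0))
    print(select_probe(end, probe_size=31, max_low=5))
```

With max_low=0 the window has to move three bases away from the gap before it is free of low-quality bases; with max_low=5 the first window is accepted as is.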

    The option -p instructs join finder on the size of probes to use when
    looking for joins. The default is 100, but you can find more joins
    with a smaller number. However, you will also find more false
    positives, in which a join is proposed that is really just a
    tandem repeat split by a gap. The script is a useful tool for finding
    joins quickly, but you still must exercise your own judgement.

    D) JOIN FINDER OUTPUT

    Join finder outputs several files, but the most important is the file
    "jf.results". Output will look like this:

    Contig.RightEdge-Contig.LeftEdge    Probe Sequence                  %ID     RightStart  RightEnd  E-Val  BitScore
    contig00013.right-contig00014.left  TCGGGAGCAACAGAAACCGCTCCGCCGTCA  100.00  235         264       8e-12  60.0
    contig00015.right-contig00016.left  ATGTTGCCGTATTTGAGGATGATGTCCTGT  100.00  58          87        8e-12  60.0
    contig00020.right-contig00021.left  TCAGCTATTTAGAATAAaTTTtGAAaCTct  100.00  209         238       8e-12  60.0

    The first column indicates that the right edge of contig00013 has a
    potential join with the left edge of contig00014. The Probe Sequence
    column gives the sequence that matches on both sides of the gap,
    and can be used in a string search in consed. %ID, E-Val, and BitScore
    are taken from the blast file "jf_blast.out". RightStart and RightEnd
    are the coordinates of the probe sequence on the right side of the
    gap in which a join appears to exist.
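If you want to sort or filter the proposed joins, the rows shown above are easy to read into a script. The following Python sketch is not part of Join Finder: the names Join and parse_results are hypothetical, and it assumes every data row has exactly seven whitespace-separated fields with a first column of the form name.right-name.left, as in the sample output.

```python
from typing import NamedTuple


class Join(NamedTuple):
    right_contig: str
    left_contig: str
    probe: str
    pct_id: float
    right_start: int
    right_end: int
    e_value: float
    bit_score: float


def parse_results(lines):
    """Parse jf.results-style rows; lines that do not fit are skipped."""
    joins = []
    for line in lines:
        fields = line.split()
        # The header has 8 tokens ("Probe Sequence" splits in two), so it is skipped.
        if len(fields) != 7 or ".right-" not in fields[0]:
            continue
        right, _, left = fields[0].partition(".right-")
        if left.endswith(".left"):
            left = left[: -len(".left")]
        joins.append(Join(right, left, fields[1], float(fields[2]),
                          int(fields[3]), int(fields[4]),
                          float(fields[5]), float(fields[6])))
    return joins


if __name__ == "__main__":
    sample = [
        "Contig.RightEdge-Contig.LeftEdge Probe Sequence %ID RightStart RightEnd E-Val BitScore",
        "contig00013.right-contig00014.left TCGGGAGCAACAGAAACCGCTCCGCCGTCA 100.00 235 264 8e-12 60.0",
    ]
    for join in parse_results(sample):
        print(join.right_contig, "->", join.left_contig, join.probe)
```

To read a real file, pass an open file handle, for example parse_results(open("jf.results")).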

    E) Please send bug reports and feature requests to
    [email protected]. While join finder is provided "as-is", I will
    try to fix bugs or update features as time permits.

  • #2
    What is the advantage of using 'joinfinder' over using 'cross_match' in consed's AssemblyView? 'cross_match' is flexible, fast and output is integrated in consed.

    But I probably missed something,
    cheers,
    Sven



    • #3
      Hi Sven,

      That's a good question. Personally, I haven't had much luck finding joins with cross_match. It may be some of the settings we are using, but I very often find joins between neighboring contigs that cross_match doesn't find. I also find the display to be cluttered and distracting when using cross_match.

      Aside from that, I find making joins this way to be more expedient. With cross match, you are continuously reloading the assembly view after each join. We have assemblies ranging from 20 gaps to 200 or more, so this gets tedious. With the join finder identified joins, you simply put the probe sequence in the string search window, bring up the two contigs, compare and then join them.

      If you achieve success with cross match, I would encourage you to stick with what works for you, but I find this method to be easier and faster.
