  • sudders
    Member
    • Dec 2011
    • 32

    Restriction Digests with ambiguity

    Hi All,

    I'm trying to identify restriction sites in the genome. Some of the enzymes I'm using have ambiguity in their recognition sites, e.g. ApoI, whose site is RAATTY.

    Initially I've been trying to use oligoMatch, which is part of the UCSC download. Although it claims to take ambiguity codes, and it processes my genome using the RAATTY site as input, it actually outputs every site containing AATT. That is odd, because UCSC say they use oligoMatch to produce the restriction tracks shown on the browser, and those are correct.

    Anyone know of any other tool that will do this, or how to make oligoMatch work for me?
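
For anyone who wants to sanity-check a tool's output, here is a minimal Python sketch of what ambiguity-aware matching should do. It is not oligoMatch or any UCSC code: each IUPAC code is simply expanded into a regex character class, so RAATTY becomes [AG]AATT[CT]. The names `IUPAC`, `site_to_regex` and `find_sites`, and the test sequence, are made up for illustration. Because RAATTY is its own reverse complement, scanning one strand finds the sites on both.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_to_regex(site):
    """Compile a recognition site into a case-insensitive regex."""
    body = "".join(IUPAC[base] for base in site.upper())
    # The lookahead lets overlapping sites be reported as well.
    return re.compile("(?=(" + body + "))", re.IGNORECASE)

def find_sites(seq, site):
    """Return the 0-based start position of every match of site in seq."""
    return [m.start() for m in site_to_regex(site).finditer(seq)]

# Hypothetical test sequence: GAATTC and AAATTT are ApoI sites, CAATTG is not.
seq = "ccGAATTCggAAATTTccCAATTGcc"
print(find_sites(seq, "RAATTY"))  # [2, 10]
print(find_sites(seq, "AATT"))    # [3, 11, 19]
```

A tool that reports three hits on this sequence for RAATTY is matching the AATT core only and ignoring the flanking R and Y.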
  • RDW
    Member
    • Oct 2008
    • 63

    #2
    Originally posted by sudders
    Anyone know of any other tool that will do this, or how to make oligoMatch work for me?
    EMBOSS restrict?:

    EMBOSS (European Molecular Biology Open Source Software Suite)

    • sudders
      Member
      • Dec 2011
      • 32

      #3
      Cheers, that worked great, thanks.

      Anyone reading this in the future should note, though, that our 128GB server didn't have enough memory to process the whole genome in one go, so it was necessary to split it into chromosomes.
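
A minimal Python sketch of the same idea, processing one chromosome at a time so that only a single record is held in memory. This illustrates the splitting step only and says nothing about how EMBOSS restrict itself uses memory. The function name `read_fasta` and the demo records are assumptions for illustration.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) one record at a time from a FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

# Hypothetical two-record input standing in for a genome FASTA file.
demo = io.StringIO(">chr1\nACGT\nACGT\n>chr2\nGGCC\n")
for name, seq in read_fasta(demo):
    print(name, len(seq))
```

With a real genome you would pass an open file handle instead of the `io.StringIO` demo and run the site search inside the loop, writing results out per chromosome.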


      • rhinoceros
        Senior Member
        • Apr 2013
        • 372

        #4
        Biopieces has digest_seq

        Biopiece: digest_seq

        Summary

        Split sequences in the stream at a given restriction enzyme's cleavage sites.

        Description

        digest_seq splits sequences in the stream at a given restriction
        enzyme's cleavage sites. For each digestion product a new record is created and
        output to the stream. digest_seq requires two arguments:

        1. The restriction enzyme recognition pattern
        2. Cut position relative to the above match

        E.g. the common restriction enzyme BamHI recognizes the pattern GGATCC and the
        cut position for this enzyme is 1 indicating that the cleavage site is after
        the first G:

        cut_pos 1 23456
        pattern G|GATCC
                 ^

        Patterns can contain IUPAC ambiguity codes.

        Records with digestion products have a REC_TYPE: DIGEST key/value pair added.
        Furthermore, the sequence coordinates are appended to the sequence name in
        brackets (0-based) and as S_BEG and S_END keys.

        To obtain the reverse-complement products use reverse_seq and
        complement_seq before digest_seq.

        Usage

        ... | digest_seq [options]

        Options

        [-? | --help] # Print full usage description.
        [-p <string> | --pattern=<string>] # Restriction enzyme recognition pattern.
        [-c <int> | --cut_pos=<int>] # Cut position relative to pattern match.
        [-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
        [-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
        [-v | --verbose] # Verbose output.

        Examples

        Consider the following sequence in the file test.fna:

        >test
        cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc

        We can digest this sequence with BamHI like this:

        read_fasta -i test.fna | digest_seq -p GGATCC -c 1

        SEQ_NAME: test[0-9]
        SEQ: cgatcgatcG
        SEQ_LEN: 10
        S_BEG: 0
        S_END: 9
        REC_TYPE: DIGEST
        ---
        SEQ_NAME: test[10-42]
        SEQ: GATCCgagagggtgtgtagtgGAATTCcgctgc
        SEQ_LEN: 33
        S_BEG: 10
        S_END: 42
        REC_TYPE: DIGEST
        ---
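
To make the coordinate convention concrete, here is a rough Python sketch that reproduces the BamHI example above. It is not the Biopieces implementation: the function name `digest` is made up, the pattern is assumed to contain plain A/C/G/T only, and matching is assumed to be case-insensitive.

```python
import re

def digest(seq, pattern, cut_pos):
    """Split seq at each match of pattern, cutting cut_pos bases into the match.

    Returns (begin, end, fragment) tuples with 0-based, inclusive coordinates.
    """
    cuts = [m.start() + cut_pos
            for m in re.finditer("(?=" + pattern + ")", seq, re.IGNORECASE)]
    # Drop cuts that fall on the sequence ends, then pair up the boundaries.
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [(b, e - 1, seq[b:e]) for b, e in zip(bounds, bounds[1:])]

seq = "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc"
for beg, end, frag in digest(seq, "GGATCC", 1):
    print(f"test[{beg}-{end}]", frag, len(frag))
```

The two fragments come out as test[0-9] (10 bases) and test[10-42] (33 bases), matching the S_BEG, S_END and SEQ_LEN values shown above.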

        To obtain the - strand digestion products we re-read the file and
        reverse-complement the sequence before digestion like this:

        read_fasta -i test.fna | reverse_seq | complement_seq | digest_seq -p GGATCC -c 1

        SEQ_NAME: test[0-28]
        SEQ: gcagcgGAATTCcactacacaccctctcG
        SEQ_LEN: 29
        S_BEG: 0
        S_END: 28
        REC_TYPE: DIGEST
        ---
        SEQ_NAME: test[29-42]
        SEQ: GATCCgatcgatcg
        SEQ_LEN: 14
        S_BEG: 29
        S_END: 42
        REC_TYPE: DIGEST
        ---
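
A short Python sketch of the reverse-complement step, for checking the minus-strand output by hand. The names `reverse_complement` and `to_plus_strand` are made up, and only unambiguous bases are handled. Since the fragments above start with gcagcg, the reported coordinates appear to be relative to the reverse-complemented sequence, so the second helper converts them back to plus-strand positions under that assumption.

```python
# Translation table for unambiguous bases, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def to_plus_strand(beg, end, length):
    """Map 0-based inclusive minus-strand coordinates to the plus strand."""
    return length - 1 - end, length - 1 - beg

seq = "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc"
rc = reverse_complement(seq)
print(rc)
print(to_plus_strand(0, 28, len(seq)))   # (14, 42)
print(to_plus_strand(29, 42, len(seq)))  # (0, 13)
```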

        It is also possible to do restriction enzyme digestion with multiple enzymes
        simply by piping the result through digest_seq multiple times. Here we
        first digest with BamHI (pattern: GGATCC, cut_pos: 1) and then with EcoRI
        (pattern: GAATTC, cut_pos: 1):

        read_fasta -i test.fna | digest_seq -p GGATCC -c 1 | digest_seq -p GAATTC -c 1

        SEQ_NAME: test[0-9][0-9]
        SEQ: cgatcgatcG
        SEQ_LEN: 10
        S_BEG: 0
        S_END: 9
        REC_TYPE: DIGEST
        ---
        SEQ_NAME: test[10-42][0-21]
        SEQ: GATCCgagagggtgtgtagtgG
        SEQ_LEN: 22
        S_BEG: 0
        S_END: 21
        REC_TYPE: DIGEST
        ---
        SEQ_NAME: test[10-42][22-32]
        SEQ: AATTCcgctgc
        SEQ_LEN: 11
        S_BEG: 22
        S_END: 32
        REC_TYPE: DIGEST
        ---
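
The nested names in this output show that the second digest reports coordinates relative to each parent fragment, not to the original sequence. A rough Python sketch of that chaining follows. It is not the Biopieces code: `digest` and `digest_all` are hypothetical helpers, and patterns are assumed to be plain A/C/G/T matched case-insensitively.

```python
import re

def digest(seq, pattern, cut_pos):
    """Return (begin, end, fragment) tuples, 0-based and inclusive."""
    cuts = [m.start() + cut_pos
            for m in re.finditer("(?=" + pattern + ")", seq, re.IGNORECASE)]
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [(b, e - 1, seq[b:e]) for b, e in zip(bounds, bounds[1:])]

def digest_all(fragments, pattern, cut_pos):
    """Digest every (name, seq) pair, appending [beg-end] to the name."""
    out = []
    for name, seq in fragments:
        for beg, end, sub in digest(seq, pattern, cut_pos):
            out.append((f"{name}[{beg}-{end}]", sub))
    return out

seq = "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc"
step1 = digest_all([("test", seq)], "GGATCC", 1)  # BamHI
step2 = digest_all(step1, "GAATTC", 1)            # then EcoRI
for name, frag in step2:
    print(name, frag)
```

To get positions on the original sequence, add the parent fragment's start to the nested coordinates, e.g. test[10-42][22-32] covers bases 32 to 42 of test.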

        See also

        rescan_seq

        read_fasta

        reverse_seq

        complement_seq

        Author

        Martin Asser Hansen - Copyright (C) - All rights reserved.

        [email protected]

        September 2010

        License

        GNU General Public License version 2



        Help

        digest_seq is part of the Biopieces framework.

        This is an option for RAM-limited devices.
        savetherhino.org
