  • rhinoceros
    replied
    Biopieces has digest_seq

    Biopiece: digest_seq

    Summary

    Split sequences in the stream at a given restriction enzyme's cleavage sites.

    Description

    digest_seq splits sequences in the stream at a given restriction
    enzyme's cleavage sites. For each digestion product a new record is created and
    output to the stream. digest_seq requires two arguments:

    1. The restriction enzyme recognition pattern
    2. Cut position relative to the above match

    E.g. the common restriction enzyme BamHI recognizes the pattern GGATCC and the
    cut position for this enzyme is 1 indicating that the cleavage site is after
    the first G:

    cut_pos 1 23456
    pattern G|GATCC
             ^

    Patterns can contain IUPAC ambiguity codes.
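To illustrate what matching with IUPAC ambiguity codes involves, here is a minimal Python sketch. It is not how digest_seq is implemented; the helper names `iupac_to_regex` and `find_sites` are hypothetical, and the approach (translating each code into a regular-expression character class) is just one common way to do it.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Translate an IUPAC recognition pattern into a compiled regex."""
    body = "".join(IUPAC[code] for code in pattern.upper())
    # Wrapping the body in a lookahead also reports overlapping sites.
    return re.compile("(?=(" + body + "))", re.IGNORECASE)

def find_sites(seq, pattern):
    """Return the 0-based start position of every match of pattern in seq."""
    return [m.start() for m in iupac_to_regex(pattern).finditer(seq)]

# BamHI site in the test sequence used in the examples.
print(find_sites("cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc", "GGATCC"))
# ApoI (RAATTY): CAATTG contains AATT but is not a valid site.
print(find_sites("ttAAATTCggGAATTTccCAATTGaa", "RAATTY"))
```

Adding the cut position to each reported start gives the cleavage coordinate.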

    Records with digestion products have a REC_TYPE: DIGEST key/value pair added.
    Furthermore, the sequence coordinates are appended to the sequence name in
    brackets (0-based) and as S_BEG and S_END keys.

    To obtain the reverse-complement products use reverse_seq and
    complement_seq before digest_seq.

    Usage

    ... | digest_seq [options]

    Options

    [-? | --help] # Print full usage description.
    [-p <string> | --pattern=<string>] # Restriction enzyme recognition pattern.
    [-c <int> | --cut_pos=<int>] # Cut position relative to pattern match.
    [-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
    [-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
    [-v | --verbose] # Verbose output.

    Examples

    Consider the following sequence in the file test.fna:

    >test
    cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc

    We can digest this sequence with BamHI like this:

    read_fasta -i test.fna | digest_seq -p GGATCC -c 1

    SEQ_NAME: test[0-9]
    SEQ: cgatcgatcG
    SEQ_LEN: 10
    S_BEG: 0
    S_END: 9
    REC_TYPE: DIGEST
    ---
    SEQ_NAME: test[10-42]
    SEQ: GATCCgagagggtgtgtagtgGAATTCcgctgc
    SEQ_LEN: 33
    S_BEG: 10
    S_END: 42
    REC_TYPE: DIGEST
    ---
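The coordinate bookkeeping in this output can be reproduced with a short Python sketch. This is only an illustration of the record layout shown above, not the Biopieces code; the function name `digest` is hypothetical, and the sketch handles plain-base patterns only.

```python
import re

def digest(name, seq, pattern, cut_pos):
    """Return digestion products as dicts with 0-based, inclusive coordinates."""
    # Plain bases only here; an ambiguity-code pattern would need expanding first.
    site = re.compile("(?=" + re.escape(pattern) + ")", re.IGNORECASE)
    cuts = [m.start() + cut_pos for m in site.finditer(seq)]
    # Ignore cuts that fall outside the sequence or at its very ends.
    cuts = sorted({c for c in cuts if 0 < c < len(seq)})
    products, beg = [], 0
    for end in cuts + [len(seq)]:
        products.append({
            "SEQ_NAME": f"{name}[{beg}-{end - 1}]",
            "SEQ": seq[beg:end],
            "SEQ_LEN": end - beg,
            "S_BEG": beg,
            "S_END": end - 1,
            "REC_TYPE": "DIGEST",
        })
        beg = end
    return products

for record in digest("test", "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc", "GGATCC", 1):
    for key, value in record.items():
        print(f"{key}: {value}")
    print("---")
```

How the real tool treats edge cases, such as a cut falling exactly at the end of a sequence, is not covered by the documentation quoted here, so the filtering step is an assumption.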

    To obtain the - strand digestion products we re-read the file and
    reverse-complement the sequence before digestion like this:

    read_fasta -i test.fna | reverse_seq | complement_seq | digest_seq -p GGATCC -c 1

    SEQ_NAME: test[0-28]
    SEQ: gcagcgGAATTCcactacacaccctctcG
    SEQ_LEN: 29
    S_BEG: 0
    S_END: 28
    REC_TYPE: DIGEST
    ---
    SEQ_NAME: test[29-42]
    SEQ: GATCCgatcgatcg
    SEQ_LEN: 14
    S_BEG: 29
    S_END: 42
    REC_TYPE: DIGEST
    ---
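For readers who want to check the - strand output by hand, the reverse_seq plus complement_seq step corresponds to a plain reverse complement. A minimal Python sketch (the function name `reverse_complement` is hypothetical, and only unambiguous bases are translated here):

```python
# Translation table for unambiguous bases; case is preserved.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

seq = "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc"
rc = reverse_complement(seq)
print(rc)

# GGATCC is its own reverse complement, so the site is present on this strand too.
cut = rc.find("GGATCC") + 1   # cut_pos 1: cleave after the first G
print(rc[:cut])
print(rc[cut:])
```

The two printed fragments correspond to the SEQ values of the two records above.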

    It is also possible to do restriction enzyme digestion with multiple enzymes
    simply by piping the result through digest_seq multiple times. Here we
    first digest with BamHI (pattern: GGATCC, cut_pos: 1) and then with EcoRI
    (pattern: GAATTC, cut_pos: 1):

    read_fasta -i test.fna | digest_seq -p GGATCC -c 1 | digest_seq -p GAATTC -c 1

    SEQ_NAME: test[0-9][0-9]
    SEQ: cgatcgatcG
    SEQ_LEN: 10
    S_BEG: 0
    S_END: 9
    REC_TYPE: DIGEST
    ---
    SEQ_NAME: test[10-42][0-21]
    SEQ: GATCCgagagggtgtgtagtgG
    SEQ_LEN: 22
    S_BEG: 0
    S_END: 21
    REC_TYPE: DIGEST
    ---
    SEQ_NAME: test[10-42][22-32]
    SEQ: AATTCcgctgc
    SEQ_LEN: 11
    S_BEG: 22
    S_END: 32
    REC_TYPE: DIGEST
    ---
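The piping pattern maps naturally onto chained generators. The Python sketch below only mimics the naming behaviour visible in the output above (each pass appends its own bracketed coordinates, relative to the fragment it received); `digest_stream` is a hypothetical name and this is not the Biopieces implementation.

```python
def digest_stream(records, pattern, cut_pos):
    """Digest every (name, seq) record and yield the products as new records."""
    site = pattern.upper()
    for name, seq in records:
        upper = seq.upper()
        cuts = []
        start = upper.find(site)
        while start != -1:
            cut = start + cut_pos
            if 0 < cut < len(seq):
                cuts.append(cut)
            start = upper.find(site, start + 1)
        beg = 0
        for end in cuts + [len(seq)]:
            # Coordinates are relative to the record being digested in this pass.
            yield (f"{name}[{beg}-{end - 1}]", seq[beg:end])
            beg = end

records = [("test", "cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc")]
bamhi = digest_stream(records, "GGATCC", 1)   # first pass
ecori = digest_stream(bamhi, "GAATTC", 1)     # second pass, fed by the first
products = list(ecori)
for name, seq in products:
    print(name, seq)
```

Because the second pass only sees fragments, its coordinates restart at 0 for each one, which is why the last two records report S_BEG 0 and 22 instead of genome positions.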

    See also

    rescan_seq

    read_fasta

    reverse_seq

    complement_seq

    Author

    Martin Asser Hansen - Copyright (C) - All rights reserved.

    September 2010

    License

    GNU General Public License version 2



    Help

    digest_seq is part of the Biopieces framework.

    Biopieces is a bioinformatic framework of tools easily used and easily created.
    This is an option for RAM-limited devices.


  • sudders
    replied
    Cheers that worked great thanks.

    Anyone reading this in the future should note, though, that our 128GB server didn't have enough memory to process the whole genome in one go; it was necessary to split it into chromosomes.
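If a custom script is an option, one way to keep memory use down is to read the genome one FASTA record at a time, so that only a single chromosome is held in memory. A minimal Python sketch of that idea follows; `read_fasta` here is a hypothetical helper (unrelated to the Biopieces tool of the same name), and whether this would have helped with the tool used above is an assumption.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) one record at a time from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

# Small in-memory stand-in for a multi-chromosome genome file.
genome = io.StringIO(">chr1\nACGTGAATTC\nGGCC\n>chr2\nTTAAATTTAA\n")
lengths = {}
for name, seq in read_fasta(genome):
    # Each chromosome can be scanned here and then discarded.
    lengths[name] = len(seq)
print(lengths)
```

With a real file, `open("genome.fa")` would replace the `io.StringIO` object; the file name is only a placeholder.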


  • RDW
    replied
    Originally posted by sudders
    Anyone know of any other tool that will do this, or how to make oligoMatch work for me?
    EMBOSS restrict?

    (EMBOSS is the European Molecular Biology Open Source Software Suite.)


  • sudders
    started a topic Restriction Digests with ambiguity

    Restriction Digests with ambiguity

    Hi All,

    I'm trying to identify restriction sites in the genome. Some of the enzymes I'm using have ambiguous recognition sites, e.g. ApoI, whose site is RAATTY.

    Initially I've been trying to use oligoMatch, which is part of the UCSC download. Although it claims to take ambiguity codes, and processes my genome using the RAATTY site as input, it actually outputs every site containing AATT. That is odd, because UCSC say they use oligoMatch to produce the restriction tracks shown on the browser, which are correct.

    Anyone know of any other tool that will do this, or how to make oligoMatch work for me?
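As an independent check on a tool's output, the difference between RAATTY and a bare AATT search can be shown with a small window-by-window comparison in Python. This is a sketch for verifying results on a short test sequence, not a description of how oligoMatch works; `matches` and `scan` are hypothetical names.

```python
# Bases allowed by each IUPAC code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(window, pattern):
    """True if every base in window is allowed by the code at that position."""
    return len(window) == len(pattern) and all(
        base in IUPAC_SETS[code]
        for base, code in zip(window.upper(), pattern.upper())
    )

def scan(seq, pattern):
    """Return 0-based starts of all windows of seq that match pattern."""
    k = len(pattern)
    return [i for i in range(len(seq) - k + 1) if matches(seq[i:i + k], pattern)]

seq = "ccCAATTGccGAATTCccAAATTTcc"
print(scan(seq, "AATT"))     # every AATT core, whatever the flanking bases
print(scan(seq, "RAATTY"))   # only cores flanked by R (A/G) and Y (C/T)
```

In this test sequence the CAATTG site contains AATT but fails the R check at its first base, so it appears in the first list and not in the second.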
