Seqanswers Leaderboard Ad

**RDW** · 10-30-2013, 05:39 AM

Originally posted by sudders View Post

Anyone know of any other tool that will do this, or how to make oligoMatch work for me?

EMBOSS restrict?:

http://emboss.sourceforge.net/

EMBOSS, European Molecular Biology Open Source Software Suite, bioinformatics, EBI, European Bioinformatics Institute, Molecular Biology, Computational Biology, Open Source, Software, Database

EMBOSS: restrict

http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/restrict.html

**sudders** · 10-31-2013, 05:30 AM

Cheers that worked great thanks.

For anyone reading this in the future though should note that our 128GB server didn't have enough memory to process the whole genome in one go and it was necessary to split it into chromosomes.

**rhinoceros** · 10-31-2013, 05:43 AM

Biopieces has digest_seq

Biopiece: digest_seq

Summary

Split sequences in the stream at a given restriction enzyme's cleavage sites.

Description

digest_seq split sequences in the stream at a given restriction
enzyme's cleavage sites. For each digestion product a new record is created and
output to the stream. digest_seq requires two arguments:

1. The restriction enzyme recognition pattern
2. Cut position relative to the above match

E.g. the common restriction enzyme BamHI recognizes the pattern GGATCC and the
cut position for this enzyme is 1 indicating that the cleavage site is after
the first G:

cut_pos 1 23456
pattern G|GATCC
^

Patterns can contain IUPAC ambiguity codes.

Records with digestion products have a REC_TYPE: DIGEST key/value pair added.
Furthermore, the sequence coordinates are appended to the sequence name in
brackets (0-based) and as S_BEG and S_END keys.

To obtain the reverse-complement products use reverse_seq and
complement_seq before digest_seq.

Usage

... | digest_seq [options]

Options

[-? | --help] # Print full usage description.
[-p <string> | --pattern=<string>] # Restriction enzyme recognition pattern.
[-c <int> | --cut_pos=<int>] # Cut position relative to pattern match.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.

Examples

Consider the following sequence in the file test.fna:

>test
cgatcgatcGGATCCgagagggtgtgtagtgGAATTCcgctgc

We can digest this sequence with BamHI like this:

read_fasta -i test.fna | digest_seq -p GGATCC -c 1

SEQ_NAME: test[0-9]
SEQ: cgatcgatcG
SEQ_LEN: 10
S_BEG: 0
S_END: 9
REC_TYPE: DIGEST
---
SEQ_NAME: test[10-42]
SEQ: GATCCgagagggtgtgtagtgGAATTCcgctgc
SEQ_LEN: 33
S_BEG: 10
S_END: 42
REC_TYPE: DIGEST
---

To obtain the - strand digestion products we re-read the file and
reverse-complment the sequence before digestion like this:

read_fasta -i test.fna | reverse_seq | complement_seq | digest_seq -p GGATCC -c 1

SEQ_NAME: test[0-28]
SEQ: gcagcgGAATTCcactacacaccctctcG
SEQ_LEN: 29
S_BEG: 0
S_END: 28
REC_TYPE: DIGEST
---
SEQ_NAME: test[29-42]
SEQ: GATCCgatcgatcg
SEQ_LEN: 14
S_BEG: 29
S_END: 42
REC_TYPE: DIGEST
---

It is also possible to do restriction enzyme digestion with multipe enzymes
simply by piping the result through digest_seq multiple times. Here we
first digest with BamHI (pattern: GGATCC, cut_pos: 1) and then with EcoRI
(pattern: GAATTC, cut_pos: 1):

read_fasta -i test.fna | digest_seq -p GGATCC -c 1 | digest_seq -p GAATTC -c 1

SEQ_NAME: test[0-9][0-9]
SEQ: cgatcgatcG
SEQ_LEN: 10
S_BEG: 0
S_END: 9
REC_TYPE: DIGEST
---
SEQ_NAME: test[10-42][0-21]
SEQ: GATCCgagagggtgtgtagtgG
SEQ_LEN: 22
S_BEG: 0
S_END: 21
REC_TYPE: DIGEST
---
SEQ_NAME: test[10-42][22-32]
SEQ: AATTCcgctgc
SEQ_LEN: 11
S_BEG: 22
S_END: 32
REC_TYPE: DIGEST
---

See also

rescan_seq

read_fasta

reverse_seq

complement_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

September 2010

License

GNU General Public License version 2

The GNU General Public License v3.0 - GNU Project - Free Software Foundation

http://www.gnu.org/copyleft/gpl.html

Help

digest_seq is part of the Biopieces framework.

www.biopieces.org

http://www.biopieces.org

Biopieces is a bioinformatic framework of tools easily used and easily created.

This is an option for RAM-limited devices.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Restriction Digests with ambiguity

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News