Is there an in silico enzyme digestion script?


  • Is there an in silico enzyme digestion script?

    I am looking for an in silico enzyme digestion program that takes the enzyme name, its cognate sequence, and the genome to be scanned as input. The output would be the size distribution of the resulting fragments, plus a BED format file with the fragment coordinates (chrom, start, end, etc.). Does anyone have such a program and would be willing to share it with me? Thanks a lot!

  • #2
    Hello,


    This can be done with Biopieces (www.biopieces.org) using digest_seq and BamHI as an example:


    Code:
    read_fasta -i genome.fna | digest_seq -p GGATCC -c 1 | plot_lendist -k SEQ_LEN -x
    To get a BED file:

    Code:
    read_fasta -i genome.fna | digest_seq -p GGATCC -c 1 | rename_keys -k SEQ_NAME,S_ID | write_bed -xo fragments.bed
    Or to do both in one go:

    Code:
    read_fasta -i genome.fna |
    digest_seq -p GGATCC -c 1 |
    plot_lendist -k SEQ_LEN -t post -o dist_plot.ps |
    rename_keys -k SEQ_NAME,S_ID |
    write_bed -xo fragments.bed

    Restriction enzyme patterns and cut positions are found at REBASE http://rebase.neb.com - or by typing "rescan_seq --help"


    Cheers,


    Martin
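
For readers who cannot install Biopieces, the same idea (cut at every occurrence of a pattern, then report fragment coordinates) can be sketched in plain awk. This is only an illustration and not what digest_seq does internally. The function name `digest_to_bed` and its two arguments are made up for this example, and the cut offset is assumed to count bases from the start of the site (1 for G^GATCC). It handles only plain A/C/G/T patterns that cannot overlap themselves, writes 0-based half-open BED intervals, and is not tuned for whole-genome speed.

```shell
# Hypothetical helper: digest_to_bed PATTERN CUT_OFFSET < genome.fna > fragments.bed
digest_to_bed() {
    awk -v pat="$1" -v cut="$2" '
    # Digest the sequence collected so far and print one BED line per fragment.
    function emit_record(    n, i, start, frag) {
        if (name == "") return
        # Mark every cut position with a newline, keeping all bases of the site.
        gsub(pat, substr(pat, 1, cut) "\n" substr(pat, cut + 1), seq)
        n = split(seq, frag, "\n")
        start = 0
        for (i = 1; i <= n; i++) {
            if (length(frag[i]) > 0)
                printf "%s\t%d\t%d\n", name, start, start + length(frag[i])
            start += length(frag[i])
        }
    }
    /^>/ { emit_record(); name = substr($1, 2); seq = ""; next }  # FASTA header
         { seq = seq toupper($0) }                                # sequence line
    END  { emit_record() }'
}
```

Something like `digest_to_bed GGATCC 1 < genome.fna > fragments.bed` would then mirror the BamHI example above. The fragment length is simply column 3 minus column 2.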


    • #3
      You definitely should take a look at the remap tool from the EMBOSS package.
      http://emboss.bioinformatics.nl/cgi-bin/emboss/remap
      Cheers,
      Adhemar


      • #4
        thank you!

        Thank you for helping me out! I will give it a try.


        • #5
          I am having difficulty running the command. I installed the package on my desktop and followed the instructions listed on the website. I am not sure whether the code is sourced: when I run the test command, nothing seems to have changed. Could you give more detailed information on how to install and test it? I am a bench scientist and not that familiar with command-line programs. Thanks

          [email protected]:~/Desktop/biopieces$ bp_test
          bp_test: command not found


          • #6
            Did you add the following section to your ~/.bashrc file:

            Code:
            # >>>>>>>>>>>>>>>>>>>>>>> Enabling Biopieces if installed <<<<<<<<<<<<<<<<<<<<<<<
            
            # Modify the below paths according to your settings.
            # If you have followed the installation step-by-step as described above,
            # the below should work just fine.
            
            export BP_DIR="$HOME/biopieces"  # Directory where biopieces are installed
            export BP_DATA="$HOME/BP_DATA"   # Contains genomic data etc.
            export BP_TMP="$HOME/tmp"        # Required temporary directory.
            export BP_LOG="$HOME/BP_LOG"     # Required log directory.
            
            if [ -f "$BP_DIR/bp_conf/bashrc" ]; then
                source "$BP_DIR/bp_conf/bashrc"
            fi  
            
            # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
            And then run the command:

            Code:
            source ~/.bashrc

            Martin
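
A quick way to see which of these steps did not take effect is a small check like the one below. It is a sketch, not part of Biopieces: the function name `check_biopieces` is invented for this example, and it only tests the three things the snippet above relies on (that `BP_DIR` is set, that `$BP_DIR/bp_conf/bashrc` exists, and that `bp_test` can be found on `PATH`). Note that the prompt in #5 shows `~/Desktop/biopieces`, while the snippet assumes `$HOME/biopieces`; if the package really lives under Desktop, `BP_DIR` would presumably need to point there.

```shell
# Hypothetical helper: report the first thing that looks wrong with the setup.
check_biopieces() {
    if [ -z "${BP_DIR:-}" ]; then
        echo "BP_DIR is not set: the ~/.bashrc section was not sourced"
        return 1
    fi
    if [ ! -f "$BP_DIR/bp_conf/bashrc" ]; then
        echo "missing $BP_DIR/bp_conf/bashrc: BP_DIR points at the wrong directory"
        return 1
    fi
    if ! command -v bp_test >/dev/null 2>&1; then
        echo "bp_test is not on PATH"
        return 1
    fi
    echo "Biopieces environment looks OK"
}
```

Paste the function into the terminal, then type `check_biopieces` after running `source ~/.bashrc`.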


            • #7
              Hi, I am facing the same problem: in silico digestion at CCGG.
              I have a file with hg19, one chromosome sequence per line, and I run:

              Code:
              cat hg19.txt | sed "s/CCGG/\n/g" | awk '{l=length($1); mem[l]++;}
              END{for(i=0;i<=1000;i++){print mem[i]}}'
              Is this a stupid approach? I don't understand why this team has such different results...

              Here are my results, for instance: I get 9975 cases of exactly one nucleotide between two CCGG sites.

              Any idea?
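
Two things in the one-liner above could shift the numbers, although whether they explain the discrepancy with the other results is only a guess. First, replacing CCGG with a newline deletes the four bases of the site, whereas an enzyme such as MspI or HpaII cuts C^CGG and leaves those bases on the fragments, so every internal fragment comes out 4 nt shorter than it would in a real digest. Second, the sed match is case-sensitive, so sites in lowercase (soft-masked) sequence are missed if the file contains any. The variant below is a sketch under those assumptions. The function name `digest_ccgg_lengths` is made up, the input is one sequence per line as in the post, and the output is "length count" pairs.

```shell
# Hypothetical helper: fragment length histogram for a C^CGG digest.
# Reads one sequence per line on stdin, prints "length count" sorted by length.
digest_ccgg_lengths() {
    awk '{
        seq = toupper($0)                 # also catch sites in soft-masked sequence
        gsub(/CCGG/, "C\nCGG", seq)       # mark the cut, keep all four site bases
        n = split(seq, frag, "\n")
        for (i = 1; i <= n; i++)
            if (length(frag[i]) > 0) count[length(frag[i])]++
    }
    END { for (len in count) print len, count[len] }' | sort -n
}
```

For the toy sequence AACCGGTCCGGAA this counts fragments of 3, 5 and 5 nt, while the sed version counts 2, 1 and 2. In other words, "one nucleotide between two CCGG sites" corresponds to a 5 nt fragment here.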
