  • Fad2012
    Member
    • Sep 2012
    • 62

    Extract several DNA ranges from reference sequence. (need help)

    Hello everybody

    I am trying to find a ready-made Perl script, or any equivalent solution, to help me with my data analysis task.

    I need code that takes a CSV file containing DNA ranges, extracts those ranges from two reference sequences, joins them, and writes the result to a FASTA file.

    For example:

    DNA ref. Sequence 1:
    AAAAAGGGGG

    DNA ref. Sequence 2:
    CCCCCTTTTTT

    The CSV file contains five columns: the first is the name of the range, the next two define the range in the first sequence (where extraction starts and where it ends), and the last two define the range in the second sequence. For example:

    seq1 1 5 6 10
    seq2 6 10 1 5
    seq3 1 6 4 11

    The Perl script's output would be:

    >seq1
    AAAAATTTTT

    >seq2
    GGGGGCCCCC

    >seq3
    AAAAAGCCTTTTTT

    The only similar tool I found is the DNA range extractor, part of the Sequence Manipulation Suite. However, it can extract only one range at a time per sequence, which makes it unsuitable for extracting hundreds of ranges.

    Many thanks
    Fadi
  • fanli
    Senior Member
    • Jul 2014
    • 197

    #2
    This seems like exactly the type of beginning Perl script that would be worth your time to figure out...


    • Fad2012
      Member
      • Sep 2012
      • 62

      #3
      Thank you fanli, I appreciate your suggestion.

      I am in the final stage of my research and wanted to see whether there is any ready-made code that could help me, so I can save some time for other tasks...

      With regard to the code's complexity, I am not sure we can assume that this is a beginner's task, especially for a biologist with no bioinformatics background.

      In any case, here is what I wrote to do the job. It extracts two sequences of 10 nts each (defined by their locations) from the reference sequences and joins them into one sequence.

      Code:
      #!/usr/bin/perl
      # subtraction.pl
      use strict; use warnings;
      
      my $seq1="AAAAATTTTTAAAAAAATTTATATAGGAGAGAGAGAGACCCAAAAATATAA";
      my $seq2="aaaaatttttaaaaaaatttatataggagagagagagacccaaaaatataa";
      
      # Each input line: seqName <tab> start <tab> end
      while (<>) {
      	chomp;	# drop the trailing newline before splitting
      	my @locations = split(/\t/, $_);
      	my $seqName = $locations[0];
      	# File positions are 1-based, substr offsets are 0-based
      	my $seq1Start = $locations[1]-10;	# 10 nts of seq1 ending at "start"
      	my $seq2Start = $locations[2]-1;	# 10 nts of seq2 beginning at "end"
      	
      	my $seq1_part = substr $seq1, $seq1Start, 10;
      	my $seq2_part = substr $seq2, $seq2Start, 10;
      	
      	# FASTA record: header line, then the joined sequence
      	print ">$seqName\n$seq1_part$seq2_part\n";
      }
      It takes the locations from a tab-separated values file (seqName, start, end):
      Code:
      1	10	20
      2	20	40
      3	30	50
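
For the five-column format described in the first post (name, start and end in the first sequence, start and end in the second), a minimal sketch might look like the following. It assumes 1-based, inclusive coordinates and whitespace- or comma-separated fields, neither of which is confirmed in the thread. The helper names `extract_range` and `row_to_fasta` are made up for illustration, and the three rows are chosen so that, under that coordinate assumption, they reproduce the three outputs shown in the first post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reference sequences from the example in the first post
# (hard-coded here, as in the script above).
my $ref1 = "AAAAAGGGGG";
my $ref2 = "CCCCCTTTTTT";

# Return the range [$start, $end] of $seq.
# Assumption: coordinates are 1-based and inclusive.
sub extract_range {
    my ($seq, $start, $end) = @_;
    die "bad range $start-$end\n"
        if $start < 1 || $end < $start || $end > length $seq;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Turn one row (name, start1, end1, start2, end2) into a FASTA record.
# Assumption: fields are separated by whitespace and/or commas.
sub row_to_fasta {
    my ($line, $seq_a, $seq_b) = @_;
    my ($name, $s1, $e1, $s2, $e2) = split /[\s,]+/, $line;
    return ">$name\n"
         . extract_range($seq_a, $s1, $e1)
         . extract_range($seq_b, $s2, $e2) . "\n";
}

# Illustrative rows; in practice these would come from the ranges file.
my @rows = ("seq1 1 5 6 10", "seq2 6 10 1 5", "seq3 1 6 4 11");
print row_to_fasta($_, $ref1, $ref2) for @rows;
```

To process a real file instead of the in-line rows, the last two lines could be swapped for a `while (my $line = <>) { ... }` loop that passes each non-empty line to `row_to_fasta`. The range check is there because `substr` silently returns a shorter string when a range runs past the end of the reference.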
