  • pony2001mx
    Member
    • Aug 2013
    • 32

    Perl script request: break contigs into overlapping sequences

    Dear All,
    I am a Perl beginner. I have a FASTA file with many contig sequences and need to break these contigs into 2 kb overlapping fragments (with an overlap length of 100 bp). Could anyone help write a Perl script for me when you have spare time? I would greatly appreciate your help. THANKS!
  • zhidkov.ilia
    Member
    • Dec 2010
    • 25

    #2
    Sounds like a question for the PerlMonks forum; you can ask there how to properly use the 'substr' function for your task.
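For readers unfamiliar with 'substr', here is a minimal sketch of how its arguments behave. The example sequence is made up for illustration; the key point is that the third argument is a length, not an end position.

```perl
use strict;
use warnings;

# Hypothetical example sequence; any string works.
my $seq = "ACGTACGTAC";

# substr(STRING, OFFSET, LENGTH): OFFSET is 0-based, LENGTH is a character count.
my $first  = substr($seq, 0, 4);   # characters 0..3
my $second = substr($seq, 3, 4);   # starts at offset 3, so it overlaps $first by 1
my $tail   = substr($seq, 8, 4);   # only 2 characters remain, so 2 are returned

print "$first\n$second\n$tail\n";
```

Asking for more characters than remain is not an error; 'substr' simply returns what is left, which is convenient for the last fragment of a contig.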

    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      You need to get an idea of a) how to parse multi-FASTA files and b) how to split each individual sequence found in your file.

      a) http://lmgtfy.com/?q=perl+parse+fasta+file
      b) http://lmgtfy.com/?q=perl+split+large+genome+sequence

      It's a good exercise for a beginner ..
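As a starting point for step a), here is a minimal sketch of a multi-FASTA reader in plain Perl, without BioPerl. The sub name `read_fasta` and the demo data are hypothetical, and the sketch assumes a well-formed file where each record starts with a '>' header line.

```perl
use strict;
use warnings;

# Read multi-FASTA records from a filehandle.
# Returns a list of [id, sequence] pairs in file order.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;            # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(\S*)/) {
            push @records, [$1, ''];   # new record; ID is the first word after '>'
        }
        elsif (@records) {
            $records[-1][1] .= $line;  # sequence lines may wrap, so append
        }
    }
    return @records;
}

# Demo on an in-memory "file" (hypothetical data).
my $fasta = ">contig1 example\nACGT\nACGT\n>contig2\nTTTT\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory file: $!";
my @records = read_fasta($fh);
close($fh);
printf "%s\t%d\n", $_->[0], length($_->[1]) for @records;
```

To read a real file, open it with `open(my $fh, '<', $path)` and pass the handle to the sub in the same way.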

      • pony2001mx
        Member
        • Aug 2013
        • 32

        #4
        Dear zhidkov.ilia and sklages,
        THANKS A LOT for your replies. I have thought it over again. I can use a simplified method, i.e., combine all contigs into one sequence (I can do this), then split the sequence into 2 kb fragments (I need a script for this). Would you or others please write this script for me? I GREATLY APPRECIATE YOUR HELP!!

          • bruce01
            Senior Member
            • Mar 2011
            • 160

            #6
            I would use something like the for loop below:

            Code:
            for (my $i=0;$i<length($seq);$i+=1900){
                 # third argument of substr is a length, not an end position
                 print OUT substr($seq,$i,2000),"\n";
            }
            But I don't think anyone is going to write your whole script for you!
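The loop idea above can be sketched as a small reusable sub: the step between fragment starts is the fragment size minus the overlap (2000 - 100 = 1900 in this thread). The sub name `fragments` and the toy input are hypothetical, and the early exit that avoids a trailing fragment lying entirely inside the previous one is an added assumption about the desired output.

```perl
use strict;
use warnings;

# Split $seq into fragments of at most $size characters, each sharing
# $overlap characters with the previous one. Returns [start, fragment]
# pairs, with 0-based start offsets.
sub fragments {
    my ($seq, $size, $overlap) = @_;
    die "overlap must be smaller than size\n" if $overlap >= $size;
    my $step = $size - $overlap;
    my @out;
    for (my $i = 0; $i < length($seq); $i += $step) {
        push @out, [$i, substr($seq, $i, $size)];
        last if $i + $size >= length($seq);   # this fragment already reaches the end
    }
    return @out;
}

# Toy example: size 4, overlap 1 on a 10-character string.
print "$_->[0]\t$_->[1]\n" for fragments("ABCDEFGHIJ", 4, 1);
```

For the original question the call would be `fragments($seq, 2000, 100)`, applied to each contig in turn.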

            • pony2001mx
              Member
              • Aug 2013
              • 32

              #7
              Thank you very much!

              • krobison
                Senior Member
                • Nov 2007
                • 734

                #8
                Originally posted by bruce01
                I would use something like the for loop below:

                Code:
                for (my $i=0;$i<length($seq);$i+=1900){
                     # third argument of substr is a length, not an end position
                     print OUT substr($seq,$i,2000),"\n";
                }
                But I don't think anyone is going to write your whole script for you!
                Sounds like a dare! It's really a trivial program & good template for writing other programs that transform sequence data. A good exercise is to use Getopt::Long to set the cutoff size and overlap size.

                Code:
                use strict;
                use warnings;
                use Bio::SeqIO;
                use Bio::Seq;
                my $cutSize=2000; my $overlapSize=100;
                # all fragments go to a single FASTA file
                my $writer=Bio::SeqIO->new(-file=>">splits.fa",-format=>"fasta");
                foreach my $arg(@ARGV)
                {
                   my $rdr=Bio::SeqIO->new(-file=>$arg,-format=>"fasta");
                   while (my $seqObj=$rdr->next_seq)
                   {
                      my $len=$seqObj->length;
                      # subseq() takes 1-based, inclusive start and end coordinates
                      for (my $i=1; $i<=$len; $i+=$cutSize-$overlapSize)
                      {
                          my $endPoint=$i+$cutSize-1;
                          $endPoint=$len if ($endPoint>$len);
                          my $subseq=$seqObj->subseq($i,$endPoint);
                          $writer->write_seq(Bio::Seq->new(-id=>$seqObj->id.".$endPoint",-seq=>$subseq));
                          last if ($endPoint==$len);   # end of contig reached
                      }
                   }
                }
                Further testing & debugging left as exercise for the student
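The Getopt::Long exercise mentioned above could be approached along these lines. This is only a sketch: the option names `--size` and `--overlap` and the sub name `parse_args` are hypothetical, and the defaults simply mirror the 2000/100 values used in this thread.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse fragment size and overlap from a list of command-line arguments.
# Returns (size, overlap, remaining arguments), the remainder being file names.
sub parse_args {
    my @args = @_;
    my ($cutSize, $overlapSize) = (2000, 100);   # defaults from the thread
    GetOptionsFromArray(
        \@args,
        'size=i'    => \$cutSize,
        'overlap=i' => \$overlapSize,
    ) or die "Usage: $0 [--size N] [--overlap N] file.fa ...\n";
    die "--size must be positive\n" if $cutSize < 1;
    die "--overlap must be at least 0 and smaller than --size\n"
        if $overlapSize < 0 || $overlapSize >= $cutSize;
    # Recognised options have been removed from @args; file names remain.
    return ($cutSize, $overlapSize, @args);
}

my ($cutSize, $overlapSize, @files) = parse_args(@ARGV);
print "size=$cutSize overlap=$overlapSize files=@files\n";
```

Checking that the overlap is smaller than the fragment size matters because the loop step is size minus overlap; a step of zero or less would never advance.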

                • bruce01
                  Senior Member
                  • Mar 2011
                  • 160

                  #9
                  Originally posted by krobison
                  Sounds like a dare!
                  Good on you krobison! Wasn't being mean, I would have given it a go but had a bit much in front of me. Debugging is the hardest bit when learning.

                  • sklages
                    Senior Member
                    • May 2008
                    • 628

                    #10
                    I do not have the impression that the OP wants to learn too much.
                    So he/she could use Google to find some ready-to-use solutions, in Perl or whatever language, e.g. http://cpansearch.perl.org/src/CJFIE...p_split_seq.pl

                    I still think it would be a great exercise for learning Perl (in "bioinformatics"), though I usually try to avoid BioPerl ;-)

                    • pony2001mx
                      Member
                      • Aug 2013
                      • 32

                      #11
                      Thank you all for your input. As a true beginner in Perl (I am mostly involved in bench work), I will persist in learning Perl. THANKS for your help!
