**zhidkov.ilia** · 10-22-2013, 01:59 AM

Sounds like a question for the PerlMonks forum; you can ask there how to properly use the 'substr' function for your task.
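For anyone landing here first, a minimal sketch of how 'substr' behaves. The 10-base sequence is made up for illustration. The point to remember is that the third argument is a length, not an end position.

```perl
use strict;
use warnings;

# substr(EXPR, OFFSET, LENGTH): OFFSET is 0-based, LENGTH is a character count.
my $seq = "ACGTACGTAC";            # hypothetical 10-base sequence

my $first = substr($seq, 0, 4);    # 4 bases starting at offset 0
my $mid   = substr($seq, 3, 4);    # 4 bases starting at offset 3
my $tail  = substr($seq, 8, 4);    # only 2 bases remain, so only 2 come back

print "$first\n$mid\n$tail\n";     # prints ACGT, TACG and AC on separate lines
```

Asking for more characters than remain is not an error here; Perl simply returns what is left.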

**sklages** · 10-22-2013, 02:32 AM

You need to get an idea of a) how to parse multi-FASTA files and b) how to split each individual sequence found in your file.

a) http://lmgtfy.com/?q=perl+parse+fasta+file
b) http://lmgtfy.com/?q=perl+split+large+genome+sequence

It's a good exercise for a beginner.
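As a rough starting point for part a), here is one way to read a multi-FASTA file in plain Perl. This is only a sketch: the subroutine name `read_fasta` and the two example contigs are made up, and it assumes well-formed input where every sequence follows a `>` header line.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle; returns a list of [id, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip the line ending
        next if $line eq '';                  # skip blank lines
        if ($line =~ /^>(\S*)/) {
            push @records, [$1, ''];          # header line: start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;         # sequence line: append to current record
        }
    }
    return @records;
}

# Hypothetical two-contig input, read from an in-memory filehandle for the demo.
my $fasta = ">contig1 example\nACGTAC\nGTAC\n>contig2\nTTGGCC\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory file: $!";
my @records = read_fasta($fh);
close($fh);

# Print each ID with its sequence length.
print "$_->[0]\t", length($_->[1]), "\n" for @records;
```

For a real file, open it with `open(my $fh, '<', $path)` instead of the in-memory string. Each record's sequence can then be split as in part b).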

**pony2001mx** · 10-22-2013, 06:51 AM

Dear zhidkov.ilia and sklages,
THANKS A LOT for your replies. I have thought about it again. I can use a simplified method, i.e., combine all contigs into one sequence (I can do this), then split that sequence into 2 kb fragments (I need a script for this). Would you or others please write this script for me? I GREATLY APPRECIATE YOUR HELP!!

**bruce01** · 10-22-2013, 07:13 AM

I would use something like the for loop below:

Code:

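# Step 1900 bases at a time so consecutive 2000-base fragments overlap by 100.
for (my $i = 0; $i < length($seq); $i += 1900) {
     # substr takes (string, offset, LENGTH), not an end position.
     print OUT substr($seq, $i, 2000), "\n";
}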

But I don't think anyone is going to write your whole script for you!
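To see the overlap on a small scale, the same kind of loop can be wrapped in a subroutine and run on a toy string. This is a sketch, not a full script: the name `fragments` and the toy values (4 and 1 standing in for 2000 and 100) are made up. The early `last` is an extra choice that skips a trailing piece already contained in the previous fragment.

```perl
use strict;
use warnings;

# Split $seq into windows of $size characters; each window starts
# ($size - $overlap) characters after the previous one.
sub fragments {
    my ($seq, $size, $overlap) = @_;
    die "overlap must be smaller than size\n" if $overlap >= $size;
    my @out;
    for (my $i = 0; $i < length($seq); $i += $size - $overlap) {
        push @out, substr($seq, $i, $size);   # third argument is a length
        last if $i + $size >= length($seq);   # stop once the end is covered
    }
    return @out;
}

# Toy values standing in for 2000 and 100.
my @frags = fragments("ABCDEFGHIJ", 4, 1);
print join(" ", @frags), "\n";   # prints: ABCD DEFG GHIJ
```

Each fragment shares its first character with the last character of the one before it, which is the overlap of 1.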

**pony2001mx** · 10-22-2013, 07:19 AM

Thank you very much!

**krobison** · 10-22-2013, 07:31 AM

Originally posted by bruce01:

But I don't think anyone is going to write your whole script for you!

Sounds like a dare! It's really a trivial program & good template for writing other programs that transform sequence data. A good exercise is to use Getopt::Long to set the cutoff size and overlap size.

Code:

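use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
# Fragment length and overlap between consecutive fragments, in bases.
my $cutSize=2000; my $overlapSize=100;
my $writer=Bio::SeqIO->new(-file=>">splits.fa",-format=>'fasta');
foreach my $arg(@ARGV)
{
   # Input files are assumed to be FASTA.
   my $rdr=Bio::SeqIO->new(-file=>$arg,-format=>'fasta');
   while (my $seqObj=$rdr->next_seq)
   {
      # BioPerl coordinates are 1-based and inclusive.
      for (my $i=1; $i<=$seqObj->length; $i+=$cutSize-$overlapSize)
      {
          my $endPoint=$i+$cutSize-1;
          # Clamp the final fragment to the end of the sequence.
          $endPoint=$seqObj->length if ($endPoint>$seqObj->length);
          my $subseq=$seqObj->subseq($i,$endPoint);
          # Fragment ID is the parent ID plus the fragment's end coordinate.
          $writer->write_seq(Bio::Seq->new(-id=>$seqObj->id.".$endPoint",-seq=>$subseq));
      }
   }
}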

Typo correction & debugging left as exercise for the student
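For the Getopt::Long exercise mentioned above, a possible starting point is sketched below. The option names (`--cut`, `--overlap`, `--out`) and the subroutine `parse_options` are invented for illustration; the defaults simply mirror the values hard-coded in the script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse fragment size and overlap from an argument list.
sub parse_options {
    my @args = @_;
    my %opt = (cut => 2000, overlap => 100, out => 'splits.fa');
    GetOptionsFromArray(\@args,
        'cut=i'     => \$opt{cut},       # --cut: fragment length in bases
        'overlap=i' => \$opt{overlap},   # --overlap: bases shared by neighbours
        'out=s'     => \$opt{out},       # --out: output FASTA file
    ) or die "usage: $0 [--cut N] [--overlap N] [--out FILE] input.fa ...\n";
    die "--overlap must be smaller than --cut\n" if $opt{overlap} >= $opt{cut};
    $opt{inputs} = [@args];              # anything left over is an input file
    return %opt;
}

my %opt = parse_options(@ARGV);
print "cut=$opt{cut} overlap=$opt{overlap} out=$opt{out} inputs=@{$opt{inputs}}\n";
```

The overlap check matters because the loop advances by cut size minus overlap size; if that step were zero or negative, the loop would never finish.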

**bruce01** · 10-22-2013, 08:06 AM

Originally posted by krobison:

Sounds like a dare!

Good on you krobison! Wasn't being mean, I would have given it a go but had a bit much in front of me. Debugging is the hardest bit when learning.

**sklages** · 10-22-2013, 10:48 PM

I do not have the impression that the OP wants to learn too much ..
So he/she could use google to find some ready-to-use solutions, in perl or whatever language, e.g. http://cpansearch.perl.org/src/CJFIE...p_split_seq.pl ..

I still think it would be a great exercise for learning perl (in "bioinformatics"). Though I usually try to avoid bioperl ;-)

**pony2001mx** · 10-22-2013, 11:34 PM

Thank you all for your input. As a true beginner in Perl (I am mostly involved in bench work), I will persist in learning it. THANKS for your help!

ask perl script: break contigs into overlapping sequences
