Hello,
I am very new to Perl and to scripting in general. I am trying to extract information from a multi-FASTA file into an output file. I have written a script, but it isn't giving me the output I want.
For example, this is the current FASTA file:
>gi|546265522| SOX6
acgtcaatccag
cgattagcaaat
gtcctgattttgg
>gi|678457845| CMYC
gttaccatgcgatg
caatttgggacacc
I want this (notice the ">" is removed):
gi|546265522| SOX6
Seq length: 36
gi|678457845| CMYC
Seq length: 28
I was able to put the sequence lines into one string, but calculating the length isn't working. I removed any spaces within the new string and tried $length = length($line); but that doesn't work either. The current method in my script gives the same kind of output:
>gi|546265522| SOX6
121212
>gi|678457845| CMYC
1414
How do I solve this problem?
Ultimately I want something like this, but I am taking it in steps so that I actually understand what I'm doing and why:
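The script prints a count for every sequence line, so the per-line counts come out back to back; that is where "1414" comes from (14 + 14 for the two CMYC lines). A minimal sketch of the intermediate step, assuming one record's sequence lines are already in an array: strip whitespace from each line, append it to a single string with .=, and only then call length. The helper name record_length is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: joins the sequence lines of ONE record and
# returns the length of the combined sequence.
sub record_length {
    my @lines = @_;
    my $seq = '';
    for my $line (@lines) {
        (my $clean = $line) =~ s/\s+//g;   # copy the line, then drop newlines/spaces
        $seq .= $clean;                    # append to the growing sequence
    }
    return length $seq;                    # measure once, after joining
}

# The two sequence lines of the CMYC record from the example above.
my @cmyc = ("gttaccatgcgatg\n", "caatttgggacacc\n");
print "Seq length: ", record_length(@cmyc), "\n";   # prints "Seq length: 28"
```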
gi|678457845| CMYC (tab) seq length (tab) AT/GC content
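For the AT/GC column, a small sketch that reuses the same tr counting trick as the script, assuming "AT/GC content" means the GC percentage of the A/C/G/T bases (N and other symbols ignored); if the ratio is wanted instead, $at / $gc could be returned. The sub name gc_percent is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percentage of G+C among the A/C/G/T bases of a sequence.
# tr/// with an empty replacement list only counts; it does not change $seq.
sub gc_percent {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/gcGC//);
    my $at = ($seq =~ tr/atAT//);
    my $total = $gc + $at;
    return $total ? 100 * $gc / $total : 0;   # avoid dividing by zero
}

# CMYC from the example, already joined into one string.
my $seq = 'gttaccatgcgatgcaatttgggacacc';
printf "%s\t%d\t%.2f\n", 'gi|678457845| CMYC', length($seq), gc_percent($seq);
# prints "gi|678457845| CMYC", "28" and "50.00" separated by tabs
```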
Here is my script
#!/usr/bin/perl -w
print "file: \n";
$in = <STDIN>;
chomp $in;
print "output file: \n";
$out = <STDIN>;
chomp $out;
unless ( open(IN, $in) ) {
    die("can't open input file $in\n");
}
unless ( open(OUT, ">$out") ) {
    die("can't open output file $out\n");
}
my $line = <IN>;
print OUT $line;
while ($line = <IN>) {
    chomp $line;
    if ($line =~ /^>(.+)/) {
        print OUT "\n", $line, "\n";
    }
    else {
        $line =~ s/^\s*(.*)\s*$/$1/;
        $a = ($line =~ tr/aA//);
        $c = ($line =~ tr/cC//);
        $g = ($line =~ tr/gG//);
        $t = ($line =~ tr/tT//);
        $n = ($line =~ tr/nN//);
        $x = ($line =~ tr/xX//);
        $length = $a + $c + $g + $t + $n + $x;
        print OUT $length;
    }
}
print OUT "\n";
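A fuller sketch that produces the "Seq length" format. It keeps the idea of the script above but delays printing: sequence lines are accumulated, and a record is written only when the next header or the end of the file is reached. As an assumption, the two file names are passed on the command line instead of typed at the prompts, so the parsing sub can be reused and tested; the prompts could be put back. The names summarize_fasta and fasta_lengths.pl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA records from $in_fh and writes, for each
# record, the header without ">" followed by "Seq length: N" to $out_fh.
sub summarize_fasta {
    my ($in_fh, $out_fh) = @_;
    my ($header, $seq);

    # Print the record collected so far (if there is one).
    my $flush = sub {
        return unless defined $header;
        print {$out_fh} "$header\n", "Seq length: ", length($seq), "\n";
    };

    while (my $line = <$in_fh>) {
        $line =~ s/\s+\z//;                # strip the line ending (also handles \r\n)
        if ($line =~ /^>(.*)/) {           # a header starts a new record...
            $flush->();                    # ...so finish the previous one first
            $header = $1;                  # header text without the ">"
            $seq    = '';
        }
        else {
            $line =~ s/\s+//g;             # remove any spaces inside the line
            $seq .= $line;                 # accumulate; do NOT print yet
        }
    }
    $flush->();                            # the last record has no header after it
    return;
}

# Usage (hypothetical file name): perl fasta_lengths.pl input.fasta output.txt
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open(my $in_fh,  '<', $in)  or die "can't open input file $in: $!\n";
    open(my $out_fh, '>', $out) or die "can't open output file $out: $!\n";
    summarize_fasta($in_fh, $out_fh);
    close($out_fh) or die "can't close output file $out: $!\n";
}
```

Because the header and the complete sequence are both available inside $flush, the tab-separated line from the final goal only needs a different print there, for example joining the header, the length and a GC value with "\t".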