**gringer** · 12-02-2013, 04:46 AM

Okay, I'll just standardise your code a little to make it a bit easier for me to understand:

Code:

#!/usr/bin/perl
use strict;
use warnings;

my %hash = (); # initialise hash
while (<>){ # read a line from standard input (or files specified on command-line)
  chomp; # delete end of line character (if any)
  my @seq = split (/ /, $_); # split line into array, delimiting by a single space
  for (my $i=0; $i<=$#seq; $i++){ # for each index of the array
    $hash {$i}{$seq[$i]} +=1; # increment the value in the hash array at position (i, <base>)
  }
}

foreach my $posi (keys %hash){ # for the keys in the first hash element (indexes)
  foreach my $nt ('A', 'T', 'G', 'C'){ # for the bases A/T/G/C
    print "$posi", "$nt", %hash{$posi}{$nt}/4, "\n"; # print the location, then the base, then the value at (<loc>,<base>) divided by 4
  }
}

Given that you've specified use warnings and use strict, Perl should tell you what you've done wrong syntax-wise. What errors are you getting when you run the code? Alternatively, why do you think it's not working?

I see three potential issues:

1. You're dividing by 4 rather than the number of lines read
2. The hash value does not necessarily exist when printing
3. You're using %hash{a}{b} in the print, rather than $hash{a}{b} [Perl should warn you about this and suggest the correct use]
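
To illustrate the second and third points, here is a minimal sketch. The counts, the line total and the helper name `base_frequency` are made up for illustration and are not part of the original script. A single hash element is read with the `$` sigil, and a count that was never set is treated as 0 before dividing:

```perl
use strict;
use warnings;

# Hypothetical counts for illustration: position 0 saw 'A' three times and 'G' once.
my %hash = (0 => { A => 3, G => 1 });
my $lineCount = 4; # assumed number of lines read

# Return the frequency of a base at a position, treating a missing count as 0.
sub base_frequency {
  my ($counts, $posi, $nt, $total) = @_;
  my $count = $counts->{$posi}{$nt}; # one element, so scalar ($) access
  $count = 0 unless defined $count;  # the base may never have been seen here
  return $count / $total;
}

foreach my $nt ('A', 'T', 'G', 'C') {
  print join("\t", 0, $nt, base_frequency(\%hash, 0, $nt, $lineCount)), "\n";
}
```

Whether a missing base should be printed as 0 or skipped is a design choice; the sketch prints it as 0.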

Here's my suggested modification based on theory (i.e. untested):

Code:

#!/usr/bin/perl
use strict;
use warnings;

my %hash = (); # initialise hash
my $lineCount = 0; # initialise line count
while (<>){ # read a line from standard input (or files specified on command-line)
  $lineCount++; # increment lines read
  chomp; # delete end of line character (if any)
  my @seq = split (/ /, $_); # split line into array, delimiting by a single space
  for (my $i=0; $i<=$#seq; $i++){ # for each index of the array
    $hash{$i}{$seq[$i]} +=1; # increment the value in the hash array at position (i, <base>)
  }
}

foreach my $posi (sort {$a <=> $b} (keys %hash)){ # sort the keys numerically
  foreach my $nt ('A', 'T', 'G', 'C'){ # for the bases A/T/G/C
    print("$posi", "$nt", ($hash{$posi}{$nt})/($lineCount), "\n") if(defined($hash{$posi}{$nt}));
  }
}

Yes, there are other optimisations that can be done, but your code was good enough in terms of readability and I don't think you need to worry over that for this script.

**pony2001mx** · 12-02-2013, 05:49 AM

Dear gringer,

Thank you very much for your detailed corrections. I am new to Perl programming, and especially to Perl syntax. My script is based on advice from others, but I had trouble debugging it.

Your script helps me understand Perl better. However, after adding a "(" before "if", I did not get any output. Could you check it, please? THANKS.

My input file is as follows:
ATGCACTGACTGTATGACTG
ATGGTGACTGTGACTGACTG
ATGGACCATGACTGCATGTG
ATCCACTGTGACGTGCAACA

**pony2001mx** · 12-02-2013, 06:05 AM

The last line of your script lacks a "(". After adding it, I got no error message, but I did not get any output either. Could you check it, please? THANKS.

**gringer** · 12-02-2013, 12:46 PM

I changed it to 'defined'. If there's still no output, it suggests that the hash isn't being referenced appropriately, in which case you can explicitly iterate through the second-level keys:

Code:

foreach my $posi (sort {$a <=> $b} (keys %hash)){ # sort the keys numerically
  foreach my $nt (keys %{$hash{$posi}}){ # for each base actually recorded at this position
    print("$posi", "$nt", ($hash{$posi}{$nt})/($lineCount), "\n");
  }
}

If still no output, the hash probably isn't being created at all, which is a little odd....
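
If it helps with debugging, one way to see what the hash actually holds is to dump it with Data::Dumper (a core module). This is a minimal sketch that uses a made-up two-line input in place of the real file; the loop body is the same as in the script above:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1; # print keys in a stable order

my %hash = ();
foreach my $line ("ATGC", "ATGG") { # hypothetical input lines, for illustration only
  my @seq = split (/ /, $line); # same split as in the script above
  for (my $i=0; $i<=$#seq; $i++){
    $hash{$i}{$seq[$i]} +=1;
  }
}

print Dumper(\%hash); # shows every first-level and second-level key with its count
```

Comparing the dumped keys with what you expect (one first-level key per alignment column, one second-level key per base) should show where the counting goes wrong.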

**pony2001mx** · 12-02-2013, 06:01 PM

Hi gringer, there are still problems. My input file is:

ATGCACTGACTGTATGACTG
ATGGTGACTGTGACTGACTG
ATGGACCATGACTGCATGTG
ATCCACTGTGACGTGCAACA

My output file is:
0.25CACTGACTGTATGACTG
0.25GACCATGACTGCATGTG
0.25CACTGTGACGTGCAACA
0.25GTGACTGTGACTGACTG

When you have time, please check again. In any case, thank you for your help. I really learned something from your script and believe it is very, very close. I need to stick with Perl and keep learning about hashes. THANKS!

**landry** · 12-02-2013, 07:13 PM

From your output, I'm pretty sure the problem is in this statement:

Code:

my @seq = split (//, $_); # do NOT put a space between the two slashes
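
A small sketch of the difference, using the first line of the sample input (the variable names `@by_space` and `@by_char` are only for illustration):

```perl
use strict;
use warnings;

my $line = "ATGCACTGACTGTATGACTG"; # first line of the sample input

my @by_space = split (/ /, $line); # the line contains no spaces, so it stays in one piece
my @by_char  = split (//, $line);  # an empty pattern splits between every character

print scalar(@by_space), "\n";      # prints 1
print scalar(@by_char), "\n";       # prints 20
print "$by_char[0] $by_char[3]\n";  # prints "A C"
```

With the space pattern the whole sequence ends up as a single "base" at position 0, so the per-position counts never get built.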

**gringer** · 12-02-2013, 08:28 PM

Yes, sorry, I forgot about the space separator. That demonstrates why example input/output and a "this is wrong because..." explanation make the whole problem-solving process much easier.
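
For reference, here is a sketch that puts the fixes from this thread together. It is not the exact script that was run. The sub name `count_bases`, the tab-separated output, printing 0 for bases that never occur at a position, and the stripping of a possible carriage return are all my own assumptions. The sample sequences are hard-coded so the sketch runs on its own; with a real file you would read the lines from `<>` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count bases per position; returns (hashref of counts, number of lines read).
sub count_bases {
  my @lines = @_;
  my %hash = ();
  my $lineCount = 0;
  foreach my $line (@lines) {
    $line =~ s/\r?\n\z//;        # strip the line ending (tolerating \r\n is an added precaution)
    $lineCount++;
    my @seq = split (//, $line); # split into single characters
    for (my $i=0; $i<=$#seq; $i++){
      $hash{$i}{$seq[$i]} +=1;   # count this base at position i
    }
  }
  return (\%hash, $lineCount);
}

# The four sample sequences posted in this thread.
my @lines = (
  "ATGCACTGACTGTATGACTG\n",
  "ATGGTGACTGTGACTGACTG\n",
  "ATGGACCATGACTGCATGTG\n",
  "ATCCACTGTGACGTGCAACA\n",
);

my ($counts, $lineCount) = count_bases(@lines);

foreach my $posi (sort {$a <=> $b} (keys %$counts)){ # positions in numeric order
  foreach my $nt ('A', 'T', 'G', 'C'){
    my $count = $counts->{$posi}{$nt};
    $count = 0 unless defined $count; # base not seen at this position
    print join("\t", $posi, $nt, $count/$lineCount), "\n";
  }
}
```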

**pony2001mx** · 12-02-2013, 09:32 PM

Thank you both. This script works well. Personally, I will continue learning Perl, which is powerful for genomic analysis. THANKS.

script to calculate A,T,G,C frequency for each position in an alignment