Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • script to calculate A,T,G,C frequency for each position in an alingment

    Dear all,
    I am new in programming. I wrote a script to calculate the A,T,G,C frequency for each position in an alignment. Take the following alignment for example, in the position 1 the A frequency is 33.3% and G freqeuncy is 66.6%.

    Code:
    ATGCATGC
    GGTCACGT
    GTAGCGTA
    This script is based on others' suggestions, using hash of hash. However, I am in trouble to debug it. Would you please help to indicate where are my errors? THANK YOU VERY MUCH.

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    open SEQ, $ARGV[0];
    my %hash = 0;
    while (<SEQ>){
    chomp;
    my @seq = split (/ /, $_);
    	for (my $i=0; $i<=$#seq; $i++){
    	$hash {$i}{$seq[$i]} +=1;
    	}
    	}
    close SEQ;
    	foreach my $posi (keys %hash){
    		foreach my $nt ('A', 'T', 'G', 'C'){
    	print "$posi", "$nt", %hash{$posi}{$nt}/4, "\n";
    		}
    	}

  • #2
    Okay, I'll just standardise your code a little to make it a bit easier for me to understand:
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my %hash = (); # initialise hash
    while (<>){ # read a line from standard input (or files specified on command-line)
      chomp; # delete end of line character (if any)
      my @seq = split (/ /, $_); # split line into array, delimiting by a single space
      for (my $i=0; $i<=$#seq; $i++){ # for each index of the array
        $hash {$i}{$seq[$i]} +=1; # increment the value in the hash array at position (i, <base>)
      }
    }
    
    foreach my $posi (keys %hash){ # for the keys in the first hash element (indexes)
      foreach my $nt ('A', 'T', 'G', 'C'){ # for the bases A/T/G/C
        print "$posi", "$nt", %hash{$posi}{$nt}/4, "\n"; # print the location, then the base, then the value at (<loc>,<base>) divided by 4
      }
    }
    Given that you've specified use warnings and use strict, Perl should tell you what you've done wrong syntax-wise. What errors are you getting when you run the code? Alternatively, why do you think it's not working?

    I see three potential issues:
    • You're dividing by 4 rather than the number of lines read
    • The hash value does not necessarily exist when printing
    • You're using %hash{a}{b} in the print, rather than $hash{a}{b} [Perl should warn you about this and suggest the correct use]


    Here's my suggested modification based on theory (i.e. untested):

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my %hash = (); # initialise hash
    my $lineCount = 0; # initialise line count
    while (<>){ # read a line from standard input (or files specified on command-line)
      $lineCount++; # increment lines read
      chomp; # delete end of line character (if any)
      my @seq = split (/ /, $_); # split line into array, delimiting by a single space
      for (my $i=0; $i<=$#seq; $i++){ # for each index of the array
        $hash{$i}{$seq[$i]} +=1; # increment the value in the hash array at position (i, <base>)
      }
    }
    
    foreach my $posi (sort {$a <=> $b} (keys %hash)){ # sort the keys numerically
      foreach my $nt ('A', 'T', 'G', 'C'){ # for the bases A/T/G/C
        print("$posi", "$nt", ($hash{$posi}{$nt})/($lineCount), "\n") if(defined($hash{$posi}{$nt}));
      }
    }
    Yes, there are other optimisations that can be done, but your code was good enough in terms of readability and I don't think you need to worry over that for this script.
    Last edited by gringer; 12-02-2013, 12:40 PM. Reason: corrected last if statement

    Comment


    • #3
      Dear gringer,

      thank you very much for your detailed corrections. I am new in perl programming, especially the perl grammer. My script is based on others' advices, but i had troubles in debugging.

      Your script make me better understand perl. However, when adding a "(" before "if", i have not got any output. Would you check it please? THANKS.

      my input file is as follows:
      ATGCACTGACTGTATGACTG
      ATGGTGACTGTGACTGACTG
      ATGGACCATGACTGCATGTG
      ATCCACTGTGACGTGCAACA

      Comment


      • #4
        The last line of your script lacks a "(". After adding it, i found no error message, but i did not get any output. Could you check it please? THANKS.

        Comment


        • #5
          I changed it to 'defined'. If there's still no output, it suggests that the hash isn't being referenced appropriately, in which case you can explicitly iterate through the second-level keys:

          Code:
          foreach my $posi (sort {$a <=> $b} (keys %hash)){ # sort the keys numerically
            foreach my $nt (keys %{$hash{$posi}}){ # for the bases A/T/G/C
              print("$posi", "$nt", ($hash{$posi}{$nt})/($lineCount), "\n");
            }
          }
          If still no output, the hash probably isn't being created at all, which is a little odd....

          Comment


          • #6
            Hi, gringer, there still exists problems. My input file is:

            ATGCACTGACTGTATGACTG
            ATGGTGACTGTGACTGACTG
            ATGGACCATGACTGCATGTG
            ATCCACTGTGACGTGCAACA

            my output file is:
            0.25CACTGACTGTATGACTG
            0.25GACCATGACTGCATGTG
            0.25CACTGTGACGTGCAACA
            0.25GTGACTGTGACTGACTG

            When you have time please check again. Anyway i thank you for your help. I really learned somthing from your script and believe it is very very close. I need to stick in perl and continue learning something about hash in perl. THANKS!

            Comment


            • #7
              Originally posted by pony2001mx View Post
              Hi, gringer, there still exists problems. My input file is:

              ATGCACTGACTGTATGACTG
              ATGGTGACTGTGACTGACTG
              ATGGACCATGACTGCATGTG
              ATCCACTGTGACGTGCAACA

              my output file is:
              0.25CACTGACTGTATGACTG
              0.25GACCATGACTGCATGTG
              0.25CACTGTGACGTGCAACA
              0.25GTGACTGTGACTGACTG

              When you have time please check again. Anyway i thank you for your help. I really learned somthing from your script and believe it is very very close. I need to stick in perl and continue learning something about hash in perl. THANKS!
              From your output, I'm pretty sure the problem is in this statement:
              PHP Code:
              my @seq split (//, $_); #DO NOT put any space in between "//" 

              Comment


              • #8
                Yes, sorry, I forgot about the space separator. That demonstrates why example input/output and a "this is wrong because..." explanation makes the whole problem solving process much easier.

                Comment


                • #9
                  Thank you both. This script works well. Personally I will continue learning perl, which is powerful for genomic analysis. THANKS.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Recent Advances in Sequencing Analysis Tools
                    by seqadmin


                    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                    05-06-2024, 07:48 AM
                  • seqadmin
                    Essential Discoveries and Tools in Epitranscriptomics
                    by seqadmin




                    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                    04-22-2024, 07:01 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 05-14-2024, 07:03 AM
                  0 responses
                  26 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 05-10-2024, 06:35 AM
                  0 responses
                  45 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 05-09-2024, 02:46 PM
                  0 responses
                  59 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 05-07-2024, 06:57 AM
                  0 responses
                  46 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X