  • Perl script: Make Statistics Of Mirna Abundances For Many Samples

    Dear All,

    I posted this question a few weeks ago on Biostars (https://www.biostars.org/p/97538/#97599). I got a very nice answer there, but it was a Perl one-liner. I would like to solve the problem with a Perl hash of hashes instead. Could anyone here give me an answer?

    I need to summarize miRNA abundances across many samples. Below is an example of the input.
    Code:
    SAMPLE    MIR    ABUNDANCE
    sample1   mir1   30
    sample1   mir3   100
    sample1   mir4   120
    sample2   mir1   40
    sample2   mir2   200
    sample3   mir1   190
    ......
    I want to convert it to the format below.
    Code:
              sample1    sample2    sample3
    mir1      30         40         190
    mir2      0          200        0
    mir3      100        0          0
    mir4      120        0          0
    ......
    I tried to write this with a Perl hash of hashes but got stuck (see below). Could a Perl expert show me how to do this? I greatly appreciate your help!
    Code:
    open FH, '<', $ARGV[0] or die "open failed:$!";
    my %h;
    while (<>){
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$sample}{$mir} = $abun; 
    }
    foreach my $sample (keys %h){
            foreach my $mir (keys %{h{$sample}})
                    print "   "      # i am stuck here. Need your help!
    }

  • #2
    see if this works:

    Code:
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$mir}{$sample} = $abun; 
    }
    foreach my $mir (sort keys %h){
            print "$mir\t";
            foreach my $sample (sort keys %{h{$mir}}){
                    print "$$h{$mir}{$sample}\t;"
            }
            print "\n";
    }
    Last edited by mastal; 05-02-2014, 12:21 PM.


    • #3
      Hi mastal, I still have a problem, but thanks a lot anyway.


      • #4
        What problem are you still having? I think mastal re-organized it correctly. You want to print a line that has a mir, and then the value for each sample. So you would definitely want to have the outer loop be mir, and the inner loop be sample. That way it prints the mir, then on the same line prints each of the sample values.

        mastal's code may have some typos in it (with Perl, it is difficult to tell the difference between a typo and brilliant code, so I am not sure), but I edited it and it works:

        Code:
        foreach my $mir (sort keys %h){
                print "$mir\t";
                foreach my $sample (sort keys %{$h{$mir}}){ # changed h{$mir} to $h{$mir}
                        print "$h{$mir}{$sample}\t"; # changed $$h to $h and \t;" to \t";
                }
                print "\n";
        }

        when I made a little tester it outputs this:
        m1 1.1 1.2 1.3
        m2 2.1 2.2 2.3
        which is correct.
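
        A little tester along these lines might look like the sketch below. The thread does not show the actual tester, so the data and the names m1, m2, s1 to s3 and rows are made up for illustration. It assumes every mir has a value for every sample.

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        # Hypothetical tester data (not from the thread): two mirs, three samples,
        # and every mir present in every sample.
        my %h = (
            m1 => { s1 => 1.1, s2 => 1.2, s3 => 1.3 },
            m2 => { s1 => 2.1, s2 => 2.2, s3 => 2.3 },
        );

        # Build one tab-separated line per mir:
        # outer loop over mirs, inner loop (the map) over that mir's samples.
        sub rows {
            my ($table) = @_;
            my @lines;
            foreach my $mir (sort keys %$table) {
                my @values = map { $table->{$mir}{$_} } sort keys %{ $table->{$mir} };
                push @lines, join("\t", $mir, @values);
            }
            return @lines;
        }

        print "$_\n" for rows(\%h);
        ```

        Because the inner loop only visits the samples stored under each mir, the columns line up only when no mir is missing from any sample.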
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


        • #5
          Hi SNPSaurus,
          Thanks a lot for your comments! Actually, it's not so easy. If the input data is as follows, the output is fine.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          However, if the input changes to the data below, the output is not what I expect. The problem is missing values: sample3 has no entry for mir1 or mir2, and mir4 appears only in sample3.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          sample3	mir4	3.1
          I am a beginner learning Perl. I tried my best to write the script below (with some comments added for easier understanding), but it still has a problem. Could you please check it? I appreciate your help!
          Code:
          #!/usr/bin/perl
          use strict;
          use warnings;
          
          # "or" rather than "||": with "||" the die binds to $ARGV[0], not to open
          open FH, '<', $ARGV[0] or die "open failed: $!";
          my %h;   # $h{mir}{sample} = abundance
          my %h2;  # every sample name seen in the input
          while (<FH>){
                  my ($sample, $mir, $abun) = /(\S+?)\t(\S+)\t(\S+)/;
                  $h{$mir}{$sample} = $abun;
                  $h2{$sample} += 1; # increment so every sample name ends up as a key of %h2
          }
          close FH;
          
          foreach my $sample_h2 (sort keys %h2){ # print sample header
                  print "\t$sample_h2";
          }
          print "\n";
          
          foreach my $mir (sort keys %h){
                  print "$mir\t"; # print mir name
                  foreach my $sample2 (sort keys %h2){ # go through samples in header order
                          foreach my $sample (sort keys %{$h{$mir}}){ # look for that sample name under this mir
                                  if ($sample eq $sample2) {
                                          print "$h{$mir}{$sample}\t"; # print when matched
                                          last;
                                  }
                          }
                  }
                  print "\n";
          }
          Last edited by pony2001mx; 05-04-2014, 05:36 AM.


          • #6
            I think I see what you are trying to do. Some mirs don't have data for all samples, so you build a hash of sample names separate from the hash of hashes. You then go through that hash of samples and, for each one, go through the samples stored under the mir in your hash of hashes, printing when they match. This is probably better done with an "exists" check, printing a blank if the value is not present:

            Code:
            foreach my $mir (sort keys %h){
                    print "$mir\t"; # print mir name
                    foreach my $sample2 (sort keys %h2){ # sort according to sample header
                            if (exists $h{$mir}{$sample2}) {
                                    print "$h{$mir}{$sample2}\t"; # if exists print
                            } else {
                                    print "\t"; # print a blank if that sample doesn't exist for that mir
                            }
                    }
                    print "\n";
            }
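
            Putting the pieces together, a minimal self-contained sketch of the whole approach could look like this. It is not code from the thread: the sub name pivot, the $missing parameter and the header-row check are assumptions for illustration. It fills missing combinations with 0, as in the desired output of the first post, rather than the blank printed above, and uses the example input from post #5.

            ```perl
            #!/usr/bin/perl
            use strict;
            use warnings;

            # Turn "sample<TAB>mir<TAB>abundance" lines into a mir-by-sample table.
            # $missing is the filler for absent combinations (assumed default: 0).
            sub pivot {
                my ($lines, $missing) = @_;
                $missing = 0 unless defined $missing;
                my (%h, %samples);
                foreach my $line (@$lines) {
                    chomp(my $copy = $line);
                    my ($sample, $mir, $abun) = split /\t/, $copy;
                    next unless defined $abun;       # skip blank or malformed lines
                    next if $sample eq 'SAMPLE';     # skip a header row, if present
                    $h{$mir}{$sample} = $abun;       # hash of hashes: mir, then sample
                    $samples{$sample} = 1;           # remember every sample seen
                }
                my @cols = sort keys %samples;
                my @out  = (join("\t", '', @cols));  # header row with an empty first cell
                foreach my $mir (sort keys %h) {
                    # same column order for every row; exists decides value or filler
                    push @out, join("\t", $mir,
                        map { exists $h{$mir}{$_} ? $h{$mir}{$_} : $missing } @cols);
                }
                return @out;
            }

            # Example input from post #5, including the sample3/mir4 line.
            my @input = (
                "sample1\tmir1\t1.1",
                "sample1\tmir2\t1.2",
                "sample2\tmir1\t2.1",
                "sample2\tmir2\t2.2",
                "sample3\tmir4\t3.1",
            );
            print "$_\n" for pivot(\@input);
            ```

            To run it on a real file, the last line could be changed to pivot([<>]), which reads the file named on the command line. Passing a second argument such as '' or 'NA' changes the filler.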
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


            • #7
              Hi SNPSaurus, Thank you very much! It's really good stuff for me to learn. Thanks.
