Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Perl script: Make Statistics Of Mirna Abundances For Many Samples

    Dear All,

    Actually I post the following question a few weeks ago on Biostars (https://www.biostars.org/p/97538/#97599). I got a very nice answer there, but it's perl one liner command. I am eager to sort out the problem with perl hash of hash. Could anyone here give me an answer?

    I need to make statistics of mirna abundances for many samples. Below is an example.
    Code:
    SAMPLE    MIR    ABUNDANCE
    sample1   mir1   30
    sample1   mir3   100
    sample1   mir4   120
    sample2   mir1   40
    sample2   mir2   200
    sample3   mir1   190
    ......
    I want to change the format to below.
    Code:
              sample1    sample2    sample3
    mir1      30           40         190
    mir2      0            200         0
    mir3      190          0           0
    mir4      120          0           0
    ......
    i tried to write perl hash of hash, but was stuck (see below). Could perl export teaches me with this? I greatly appreciate your help!!
    Code:
    open FH, '<', $ARGV[0] or die "open failed:$!";
    my %h;
    while (<>){
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$sample}{$mir} = $abun; 
    }
    foreach my $sample (keys %h){
            foreach my $mir (keys %{h{$sample}})
                    print "   "      # i am stuck here. Need your help!
    }

  • #2
    see if this works:

    Code:
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$mir}{$sample} = $abun; 
    }
    foreach my $mir (sort keys %h){
            print "$mir\t";
            foreach my $sample (sort keys %{h{$mir}}){
                    print "$$h{$mir}{$sample}\t;"
            }
            print "\n";
    }
    Last edited by mastal; 05-02-2014, 12:21 PM.

    Comment


    • #3
      Hi mastal, I still have problem, but thanks a lot anyway.

      Comment


      • #4
        What problem are you still having? I think mastal re-organized it correctly. You want to print a line that has a mir, and then the value for each sample. So you would definitely want to have the outer loop be mir, and the inner loop be sample. That way it prints the mir, then on the same line prints each of the sample values.

        mastal's code may have some typos in it (with Perl, it is difficult to tell the difference between a typo and brilliant code, so I am not sure), but I edited it and it works:

        Code:
        foreach my $mir (sort keys %h){
                print "$mir\t";
                foreach my $sample (sort keys %{$h{$mir}}){ # changed h{$mir} to $h{$mir}
                        print "$h{$mir}{$sample}\t"; # changed $$h to $h and \t;" to \t";
                }
                print "\n";
        }

        when I made a little tester it outputs this:
        m1 1.1 1.2 1.3
        m2 2.1 2.2 2.3
        which is correct.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

        Comment


        • #5
          Hi SNPSaurus,
          Thanks a lot for your comments! Actually it's not so easy. If the input data is as follows, then it's ok.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          However, if the input data changes to below, it won't be what i expect. The problem is MISSING VALUE.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          sample3	mir4	3.1
          i am a beginner and am learning perl. I tried best to write a script as follows (i add some comments for easier understanding), but still have problem. Could you please check please? I appreciate your helps!
          Code:
          #!/usr/bin/perl
          use strict;
          use warnings;
          
          open FH, '<', $ARGV[0] || die "open failed $!";
          my %h;
          my %h2;
          while (<FH>){
                  my ($sample, $mir, $abun) = /(\S+?)\t(\S+)\t(\S+)/;
                  $h{$mir}{$sample} = $abun; 
          		$h2{$sample} +=1; #increament to calculate total samples
          }
          
          foreach my $sample_h2 (sort keys %h2){ #print sample header 
          	print "\t$sample_h2";
          }
          print "\n";
          
          foreach my $mir (sort keys %h){
              print "$mir\t";  #print mir name
          	foreach my $sample2(sort keys %h2){ #sort according to sample header
          		foreach my $sample (sort keys %{$h{$mir}}){  #search sample name in %h2 from that in %h
          			if ($sample eq $sample2) {  
          				print "$h{$mir}{$sample}\t"; #when matched print 
          				last;
          			}
          		}
          	}
          	print "\n";
          }
          Last edited by pony2001mx; 05-04-2014, 05:36 AM.

          Comment


          • #6
            I think I see what you are trying to do. Some mir don't have data for all samples. So you construct a list of samples separate from the hash of hashes. You go through the hash of samples, and then go through the list of samples in your hash of hashes, and if they match you print. This is probably better done with an "exist" check, and a printing of a blank if not present:

            Code:
            foreach my $mir (sort keys %h){
                print "$mir\t";  #print mir name
            	foreach my $sample2(sort keys %h2){ #sort according to sample header
            		if (exists $h{$mir}{$sample2}) {
            				print "$h{$mir}{$sample2}\t"; #if exists print 
            		} else {
            			print "\t"; # print a blank if that sample doesn't exist for that mir
            		}
            	}
            	print "\n";
            }
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

            Comment


            • #7
              Hi SNPSaurus, Thank you very much! It's really good stuff for me to learn. Thanks.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Genetic Variation in Immunogenetics and Antibody Diversity
                by seqadmin



                The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                11-06-2024, 07:24 PM
              • seqadmin
                Choosing Between NGS and qPCR
                by seqadmin



                Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                10-18-2024, 07:11 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 11:09 AM
              0 responses
              22 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Today, 06:13 AM
              0 responses
              20 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 11-01-2024, 06:09 AM
              0 responses
              30 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 10-30-2024, 05:31 AM
              0 responses
              21 views
              0 likes
              Last Post seqadmin  
              Working...
              X