  • perl?

    Hi,
    I have a file like this:
    ID : 741158
    PARENT ID : 9605
    RANK : species
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Homo sp. Altai
    GENBANK COMMON NAME : Denisova hominin
    //
    ID : 756884
    PARENT ID : 9598
    RANK : subspecies
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Pan troglodytes ellioti
    //
    I need just the ID number if the rank is species, so for this example the output should be: 741158.
    My Perl script is like this:
    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open (FILE, 'm.txt') or die "Cannot open m.txt: $!";
    while (my $p = <FILE>){
        if ($p =~ /^\/\/\n/){
            last;                  # stops at the end of the first record
        }elsif ($p =~ /GC ID : 1/){
            next;
        }elsif ($p =~ /MGC ID : 2/){
            next;
        }elsif ($p =~ /SCIENTIFIC NAME :\D/){
            next;
        }elsif ($p =~ /\bspecies$/){
            print "ID number";     # ??? how do I print the ID from the "ID :" line here?
        }
    }

    Any suggestions? Thanks.

  • #2
    This should work if the file is formatted exactly as you've shown:

    Code:
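    open (FILE, 'm.txt');
    while(<FILE>) {
      # Remember the ID: split on " : " puts the number (with its newline) in $id[1]
      if ($_ =~ m/^ID :/ ) {
        @id = split(/ : /,$_);
      }
      # On a line ending in the word "species", print the remembered ID
      if ($_ =~ m/\bspecies$/) {
        print $id[1];
      }
    }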
    Note that it's pretty fragile -- it doesn't do any checking and assumes the input is just as you've shown. In particular: (1) Every record must have an ID; (2) the ID must always come before the rank; (3) the ID line must have a space, colon, space and then the ID number; (4) Any line that ends with the word "species" will cause the ID to be printed, so "subspecies" (etc) must always be one word and "species" shouldn't appear as the final word of any other line.
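The four assumptions listed above can be relaxed by reading whole records instead of single lines. Below is a minimal sketch (not the approach from this answer) that assumes every line has the form "KEY : value" and every record ends with a "//" line, as in the sample; species_ids is a hypothetical helper name. It trims stray whitespace, compares the RANK field exactly (so "subspecies" can never match), skips records without an ID, and does not care in which order the fields appear.

Code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Record-based sketch: collect each "//"-terminated record into a hash of
# KEY => value and report the ID of every record whose RANK is exactly
# "species". Assumes "KEY : value" lines and "//" separators as in the sample.
sub species_ids {
    my ($fh) = @_;
    my (@ids, %rec);

    # Save the current record's ID if its RANK is "species", then reset.
    my $flush = sub {
        push @ids, $rec{ID}
            if defined $rec{ID} && defined $rec{RANK} && $rec{RANK} eq 'species';
        %rec = ();
    };

    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim spaces, tabs, "\r" and "\n"
        next if $line eq '';
        if ($line eq '//') {
            $flush->();
            next;
        }
        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        $rec{$key} = $value if defined $value;
    }
    $flush->();    # handles a final record with no closing "//"
    return @ids;
}

# Demo on the records from the question, read from a string; for a real
# file use: open my $fh, '<', 'm.txt' or die "m.txt: $!";
my $sample = <<'END';
ID : 741158
PARENT ID : 9605
RANK : species
GC ID : 1
MGC ID : 2
SCIENTIFIC NAME : Homo sp. Altai
GENBANK COMMON NAME : Denisova hominin
//
ID : 756884
PARENT ID : 9598
RANK : subspecies
GC ID : 1
MGC ID : 2
SCIENTIFIC NAME : Pan troglodytes ellioti
//
END
open my $fh, '<', \$sample or die "Cannot read sample: $!";
print "$_\n" for species_ids($fh);    # prints 741158
```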



    • #3
      Hi thurisaz,
      Thanks for your answer. All your assumptions are true but your code is not working.



      • #4
        Hi,

        Just to be clear: I omitted the initial "#!/usr/bin/perl -w" line. If you want to copy & paste into a file, you will need to include that, like this:

        Code:
        #!/usr/bin/perl -w
        
        open (FILE, 'm.txt');
        while(<FILE>) {
          if ($_ =~ m/^ID :/ ) {
            @id = split(/ : /,$_);
          }
          if ($_ =~ m/\bspecies$/) {
            print $id[1];
          }
        }
        If that code isn't working for you, then please let me know what exactly is going wrong. I just copied & pasted to be sure and it seems to work fine.



        • #5
          I just copied your code, but I get this error:
          Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
          Because of that I declared the array and the ID variable with my, but nothing changed.
          Thanks.



          • #6
            Something in your input file (m.txt) is breaking the script: the warning points to line 3 of the file (<FILE> line 3; line 13 is the line of the print statement in your script, 110.pl). It seems to be a line that ends in "species" _before_ an ID has been recognized. Like I said, it's a fragile script that assumes everything is well-behaved. A few changes will make sure that an ID has been assigned before printing, and will also clear the ID at the end of each record:

            Code:
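            #!/usr/bin/perl -w

            open (FILE, 'm.txt');
            while(<FILE>) {
                # Remember the ID from the "ID : ..." line
                if ($_ =~ m/^ID :/ ) {
                  @id = split(/ : /,$_);
                }
                # Print only if an ID has been seen in the current record
                if ($_ =~ m/\bspecies$/ && $id[1]) {
                  print $id[1];
                }
                # Clear the ID at every "//"; m{...} rather than m?...?,
                # which would match only the first "//" in the file
                if ($_ =~ m{^//$}) {
                  $id[1]=0;
                }
            }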
            It sounds like you have something unexpected going on with your input file, though, so I strongly recommend having a good look at it, especially since the script makes so many assumptions.
            Last edited by thurisaz; 07-18-2011, 05:02 AM.
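One more pitfall with the end-of-record test: when a pattern is written with ? as its delimiter, as in m?^//$?, Perl treats it as the match-once operator (described in perlop), which succeeds only the first time until reset() is called. The small demonstration below, using made-up lines, counts matches for both forms; an ordinary delimiter such as m{^//$} matches at every separator.

Code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare the match-once form m?...? with an ordinary match m{...} on three
# "//" record separators. m?...? succeeds only once until reset() is called.
my @lines = ("ID : 1\n", "//\n", "ID : 2\n", "//\n", "ID : 3\n", "//\n");
my ($once, $every) = (0, 0);
for (@lines) {
    $once++  if m?^//$?;    # match-once: true for the first "//" only
    $every++ if m{^//$};    # ordinary match: true for every "//"
}
print "m?...? matched $once time(s); m{...} matched $every time(s)\n";
# prints: m?...? matched 1 time(s); m{...} matched 3 time(s)
```

So a record-clearing step written with m?...? would only ever clear the ID once per run.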



            • #7
              Thanks, but it is still not working. Each time, I used exactly the file posted on this page. It is really strange. But anyway, thanks so much for your help.
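The thread never pins down what is different about the file. One plausible explanation (an assumption, not confirmed here) is that text copied from the web page carries invisible characters such as leading spaces: those would stop /^ID :/ from matching while /\bspecies$/ still matches, which produces exactly the "uninitialized value" warning on <FILE> line 3. This hypothetical helper prints each line with such characters made visible:

Code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Show hidden characters in each input line: carriage returns, tabs,
# newlines, and leading spaces (which would stop a pattern like /^ID :/
# from matching).
sub visible {
    my ($line) = @_;
    $line =~ s/\r/\\r/g;
    $line =~ s/\t/\\t/g;
    $line =~ s/\n/\\n/g;
    $line =~ s/^( +)/'<' . length($1) . ' spaces>'/e;
    return $line;
}

# Usage (script name is hypothetical): perl showchars.pl m.txt
# Only reads when a file name is given, so it never waits on STDIN.
if (@ARGV) {
    while (my $line = <>) {
        printf "%3d: %s\n", $., visible($line);
    }
}
```

For example, a line copied with two leading spaces and a Windows line ending would be shown as <2 spaces>ID : 741158\r\n after its line number.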



              • #8
                Originally posted by semna
                I just copied your code, but I get this error:
                Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
                Because of that I declared the array and the ID variable with my, but nothing changed.
                Thanks.
                Make sure you define your id-array with
                Code:
                my @id;
                and not
                Code:
                my $id;
                If this is not it, just copy & paste your code here.
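Under use strict, "my $id" declares a scalar, which is a separate variable from the array @id, so the split result still needs "my @id". Here is a sketch of the line-by-line logic from this thread rewritten to compile under use strict; species_ids_linewise is a hypothetical name, and the sample is read from a string so the example is self-contained.

Code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Line-by-line logic from the answers above, rewritten to run cleanly under
# "use strict": the array is declared with "my @id" (a scalar "my $id" would
# be a different variable and would not satisfy strict for @id).
sub species_ids_linewise {
    my ($fh) = @_;
    my (@id, @found);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m/^ID :/) {
            @id = split(/ : /, $line);
        }
        if ($line =~ m/\bspecies$/ && defined $id[1]) {
            push @found, $id[1];
        }
        if ($line =~ m{^//$}) {
            @id = ();    # forget the ID at the end of each record
        }
    }
    return @found;
}

# Demo on records from the question (read from a string here;
# for a real file use: open my $fh, '<', 'm.txt' or die "m.txt: $!";).
my $sample = "ID : 741158\nPARENT ID : 9605\nRANK : species\n//\n"
           . "ID : 756884\nPARENT ID : 9598\nRANK : subspecies\n//\n";
open my $fh, '<', \$sample or die "Cannot read sample: $!";
print "$_\n" for species_ids_linewise($fh);    # prints 741158
```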



                • #9
                  Unix shell alternative

                  ID : 741158
                  PARENT ID : 9605
                  RANK : species
                  GC ID : 1
                  MGC ID : 2
                  SCIENTIFIC NAME : Homo sp. Altai
                  GENBANK COMMON NAME : Denisova hominin
                  //

                  At the command prompt you can run

                  grep '^ID :' filename.txt | awk '{print ":" $3}' | more

                  That will give you your list of IDs with the colon in front. Note that it lists every ID in the file, whatever its rank.
