
  • perl?

    Hi,
    I have a file like this:
    ID : 741158
    PARENT ID : 9605
    RANK : species
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Homo sp. Altai
    GENBANK COMMON NAME : Denisova hominin
    //
    ID : 756884
    PARENT ID : 9598
    RANK : subspecies
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Pan troglodytes ellioti
    //
    I need just the ID number if the rank is "species", so for this example the output should be: 741158.
    My Perl script looks like this:
    #!/usr/bin/perl -w
    use strict;
    use warnings;


    open (FILE, 'm.txt');
    while (my $p = <FILE>){
    if ($p =~ /^\/\/\n/){
    last;
    }elsif ($p =~ /GC ID : 1/){
    next;
    }elsif ($p =~ /MGC ID : 2/){
    next;
    }elsif ($p =~ /SCIENTIFIC NAME :\D/){
    next;
    }elsif ($p =~ /\bspecies$/){
    print "ID number";?????
    }
    }

    Any suggestions? Thanks.

  • #2
    This should work if the file is formatted exactly as you've shown:

    Code:
    open (FILE, 'm.txt');
    while(<FILE>) {
      if ($_ =~ m/^ID :/ ) {
        @id = split(/ : /,$_);
      }
      if ($_ =~ m/\bspecies$/) {
        print $id[1];
      }
    }
    Note that it's pretty fragile -- it doesn't do any checking and assumes the input is just as you've shown. In particular: (1) Every record must have an ID; (2) the ID must always come before the rank; (3) the ID line must have a space, colon, space and then the ID number; (4) Any line that ends with the word "species" will cause the ID to be printed, so "subspecies" (etc) must always be one word and "species" shouldn't appear as the final word of any other line.
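
One way to relax assumptions (2) to (4) is to read whole records and compare the RANK value exactly instead of matching the end of any line. The following is only a minimal sketch, assuming every record ends with a "//" line and every field looks like "KEY : VALUE"; the helper name species_ids is hypothetical and not part of the script above, and the sample data is copied from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the IDs of all records whose RANK field is exactly "species".
# $text holds the whole file; each record ends with a line containing "//".
sub species_ids {
    my ($text) = @_;
    my @ids;
    for my $record (split m{^//\s*$}m, $text) {
        my %field;
        for my $line (split /\r?\n/, $record) {
            # Split "KEY : VALUE" at the first colon, tolerating extra spaces.
            next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
            $field{$1} = $2;
        }
        # Field order inside the record no longer matters.
        push @ids, $field{'ID'}
            if defined $field{'ID'}
            && defined $field{'RANK'}
            && $field{'RANK'} eq 'species';
    }
    return @ids;
}

# Sample data copied from the question.
my $sample = <<'END';
ID : 741158
PARENT ID : 9605
RANK : species
GC ID : 1
MGC ID : 2
SCIENTIFIC NAME : Homo sp. Altai
GENBANK COMMON NAME : Denisova hominin
//
ID : 756884
PARENT ID : 9598
RANK : subspecies
GC ID : 1
MGC ID : 2
SCIENTIFIC NAME : Pan troglodytes ellioti
//
END

print "$_\n" for species_ids($sample);
```

For the sample this prints 741158. Because the rank is compared with eq, "subspecies" is never mistaken for "species", and a line such as a scientific name that happens to end in "species" is ignored.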



    • #3
      Hi thurisaz,
      Thanks for your answer. All your assumptions are true but your code is not working.



      • #4
        Hi,

        Just to be clear: I omitted the initial "#!/usr/bin/perl -w" line. If you want to copy & paste into a file, you will need to include that, like this:

        Code:
        #!/usr/bin/perl -w
        
        open (FILE, 'm.txt');
        while(<FILE>) {
          if ($_ =~ m/^ID :/ ) {
            @id = split(/ : /,$_);
          }
          if ($_ =~ m/\bspecies$/) {
            print $id[1];
          }
        }
        If that code isn't working for you, then please let me know what exactly is going wrong. I just copied & pasted to be sure and it seems to work fine.



        • #5
          I just copied your code, but I get this error:
          Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
          To fix that I declared the array and the variable ID with "my", but nothing changed.
          Thanks.



          • #6
            The warning points at line 13 of the script and line 3 of your input file (m.txt): it looks like a line ending in "species" was reached _before_ an ID had been stored. Like I said, it's a fragile script that assumes everything is well-behaved. A few changes make sure that an ID has been assigned before printing and also clear the ID at the end of each record:

            Code:
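            #!/usr/bin/perl -w
            
            open (FILE, 'm.txt') or die "Cannot open m.txt: $!";
            while(<FILE>) {
                # Remember the fields of the most recent ID line
                if ($_ =~ m/^ID :/ ) {
                  @id = split(/ : /,$_);
                }
                # Print the ID only if one has been stored for this record
                if ($_ =~ m/\bspecies$/ && $id[1]) {
                  print $id[1];
                }
                # End of record: forget the ID so it cannot leak into the next one
                if ($_ =~ m{^//$}) {
                  $id[1]=0;
                }
            }
            close (FILE);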
            It sounds like you have something unexpected going on with your input file, though, so I strongly recommend having a good look at it, especially since the script makes so many assumptions.
            Last edited by thurisaz; 07-18-2011, 05:02 AM.
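
One way to have that look at the input is to print each line with its invisible characters made visible. The patterns used above are sensitive to them: /^ID :/ does not match a line that starts with spaces or tabs, and /\bspecies$/ does not match a line that ends with a carriage return before the newline. The following is only a sketch; show_hidden is a hypothetical helper name, and the two sample lines are made up to illustrate the idea rather than taken from the poster's file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a copy of $line in which every character outside printable ASCII
# is replaced by a \xNN escape, wrapped in brackets so that leading and
# trailing spaces stand out.
sub show_hidden {
    my ($line) = @_;
    $line =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return "[$line]";
}

# Two illustrative lines: a clean one, and one with leading spaces
# and a Windows-style line ending.
for my $line ("ID : 741158\n", "  RANK : species\r\n") {
    print show_hidden($line), "\n";
}
```

To check the real file, the same helper could be called on every line read from m.txt. A carriage return shows up as \x0d, a tab as \x09, and a plain newline as \x0a.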



            • #7
              Thanks, but it is still not working. Each time I used exactly the file posted on this page. It is really strange. But anyway, thanks so much for your help.



              • #8
                Originally posted by semna
                I just copied your code, but I get this error:
                Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
                To fix that I declared the array and the variable ID with "my", but nothing changed.
                Thanks.
                Make sure you define your id-array with
                Code:
                my @id;
                and not
                Code:
                my $id;
                If this is not it, just copy & paste your code here.
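
The reason the declaration matters: in Perl, $id[1] is an element of the array @id, while the scalar $id is a separate variable. Declaring only "my $id" therefore leaves @id undeclared, and under "use strict" the script stops at compile time with a 'Global symbol "@id" requires explicit package name' error. A minimal sketch of the array version, using one sample line from the question in place of the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @id;                       # the array that $id[1] refers to
my $line = "ID : 741158\n";   # one sample line from the question

@id = split(/ : /, $line);    # gives ("ID", "741158\n")
chomp @id;                    # remove the trailing newline from each element
print "$id[1]\n";
```

Declaring @id once before the read loop, rather than inside it, keeps the stored ID available when a later line of the same record is processed.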



                • #9
                  unix script

                  ID : 741158
                  PARENT ID : 9605
                  RANK : species
                  GC ID : 1
                  MGC ID : 2
                  SCIENTIFIC NAME : Homo sp. Altai
                  GENBANK COMMON NAME : Denisova hominin
                  //

                  At the command prompt you can run:

                  egrep '^ID' filename.txt | awk '{print ":" $3}' | more
                  That will give you the list of IDs with a colon in front. Note that it prints the ID of every record, whatever its rank.
