
**steven** · 07-13-2011, 12:55 AM

awk '$1=="ID"{id=$0} $1=="RANK" && $3=="species"{print id}' file

This assumes an ID line is provided for each entry.
If you just need the ID number, use "id=$3" instead of "id=$0".
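
To try this out, here is a small sketch run against a made-up file. The "ID : <number>" / "RANK : <rank>" layout and the values 101 to 103 are assumptions inferred from the patterns in this thread, not the real data:

```shell
# Hypothetical sample input; the field layout is an assumption.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID : 101
RANK : genus
ID : 102
RANK : species
ID : 103
RANK : species
EOF

# Remember the ID number (third whitespace-separated field) on each ID line,
# then print it when a RANK line with the value "species" is seen.
ids=$(awk '$1=="ID"{id=$3} $1=="RANK" && $3=="species"{print id}' "$sample")
echo "$ids"

rm -f "$sample"
```

With this sample it should print 102 and 103, one per line; the genus entry is skipped because its ID is overwritten before any "species" rank is seen.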

**semna** · 07-13-2011, 01:30 AM

Thanks, Steven, but I am not sure that it works in perl.

**gringer** · 07-13-2011, 01:37 AM

So, you asked about perl. Here's a one-liner that roughly matches steven's awk script:

Code:

$ perl -ne 'if(/ID *: *(\d+)/){$id=$1};if(/RANK *: *species/){print "$id\n"}' file.txt
9605

If you're using it inside some other perl code, you'd probably do something like this:

Code:

my @new_list = ();
my $id = "";
# this assumes you want to store IDs from files or standard in
while(<>){
  if(/^ID *: *(\d+)/){$id=$1};
  if($id && /^RANK *: *species/){
    push(@new_list, $id);
  }
}

Or if you're reading from the array @list, which contains one element per line:

Code:

my @new_list = ();
my $id = "";
foreach(@list){
  if(/^ID *: *(\d+)/){$id=$1};
  if($id && /^RANK *: *species/){
    push(@new_list, $id);
  }
}

Or, assuming your example is exactly as your code looks (i.e. each element in @list contains a number of lines, but only one record per list element):

Code:

my @new_list = ();
foreach (@list){
  # /m lets ^ match at the start of every line inside a multi-line element
  if(/^RANK *: *species/m){
    if(/^ID *: *(\d+)/m){
      push(@new_list, $1);
    }
  }
}

[but if that were the case, your code would probably work, although it would spit out the entire record rather than just the ID]

I would advise you to stay away from using grep in situations like this where you're modifying things inside a loop. It would do weird things like not adding to your result array if $id were 0, and changing $_ would alter your original list. See here for more information:

grep - Perldoc Browser

http://perldoc.perl.org/functions/grep.html

**steven** · 07-13-2011, 01:39 AM

Yes, that is indeed awk, not perl. If you really need to include this in a larger perl script, the equivalent can be obtained by splitting the lines and adding a couple of "if" and "else" statements. And I bet the same can be obtained by just using linux command lines, directly from the shell.

**steven** · 07-13-2011, 01:42 AM

Nice job gringer, that is a complete answer

**gringer** · 07-13-2011, 01:51 AM

> I bet the same can be obtained by just using linux command lines, directly from the shell

I think an awk one-liner is close enough to 'linux commands, directly from the shell'. A command-line equivalent using multiple grep pipes would go something like this:

$ grep '^ID\|^RANK' file.txt | grep -B 1 '^RANK : species' | grep -v '^RANK'
ID : 741158

Although you'd run into problems with that if any IDs didn't have associated ranks following them.
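
For anyone wanting to experiment with the pipeline, here is a rough sketch against a made-up file. The record layout, the NAME lines and the ID values are all assumptions for illustration. It uses grep -E for the alternation, and it also drops the "--" lines that grep prints between non-adjacent context groups when -B is used and more than one record matches:

```shell
# Hypothetical sample input; layout and values are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID : 101
NAME : first
RANK : species
ID : 102
NAME : second
RANK : genus
ID : 103
NAME : third
RANK : species
EOF

# 1. keep only the ID and RANK lines
# 2. keep each "RANK : species" line plus the one line before it
# 3. drop the RANK lines and the "--" group separators added by -B
ids=$(grep -E '^(ID|RANK)' "$sample" \
  | grep -B 1 '^RANK : species' \
  | grep -v -e '^RANK' -e '^--$')
echo "$ids"

rm -f "$sample"
```

On this sample the result should be the two lines "ID : 101" and "ID : 103". The approach still depends on every RANK line being directly preceded (after filtering) by the ID it belongs to.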

**semna** · 07-13-2011, 02:31 AM

Thanks so much Gringer and Steven.

regular expression in perl?
