I have two sets, each containing a large number of proteins (on the order of 100K), and I want to find the proteins unique to each set.
Is there a tool that can do this quickly?
Thanks
cat file1 file2 | sort | uniq -u
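The `uniq -u` one-liner works when each protein sits on its own line with no headers. On FASTA files whose sequences wrap over several lines it compares fragments instead of whole proteins, and its output does not say which set a unique entry came from. Here is a minimal sketch under the assumption that proteins are matched on exact sequence only (headers are dropped). It joins each record onto one line with `awk`, sorts both sets in the C locale, and lets `comm` split the result. The function names `linearize` and `unique_per_set`, the output file names, and the demo files are hypothetical. The demo reuses the toy records from this thread, with one sequence wrapped to show the joining step.

```shell
#!/bin/sh
# Sketch: report sequences unique to each of two FASTA files.
# Assumption: proteins are compared by exact sequence only; headers are ignored.

# Print every sequence of a FASTA file on one line (joining wrapped lines),
# then sort and de-duplicate so comm can compare the two sets.
linearize() {
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print seq }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: $1, $2 = FASTA files; writes only_in_1.txt and only_in_2.txt.
unique_per_set() {
  linearize "$1" > set1.seqs
  linearize "$2" > set2.seqs
  # Both inputs were sorted in the C locale, so comm sees the same ordering.
  LC_ALL=C comm -23 set1.seqs set2.seqs > only_in_1.txt   # only in set 1
  LC_ALL=C comm -13 set1.seqs set2.seqs > only_in_2.txt   # only in set 2
}

# Demo with the toy records from this thread; record 3 wraps over two lines.
printf '>1\nPRTEINEIN\n>3\nPRTEIN\nTHREE\n' > demo1.fasta
printf '>1\nPRTEINEIN\n>2\nPRTEINNI\n' > demo2.fasta
unique_per_set demo1.fasta demo2.fasta
cat only_in_1.txt   # PRTEINTHREE
cat only_in_2.txt   # PRTEINNI
```

Unlike the Perl script in this thread, this version loses the sequence IDs. If you need them, keep the header next to each sequence, for example as a tab-separated second column, and match on the sequence column with `join`.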
#!/usr/bin/perl
use warnings;
use strict;

open(my $f1, "<", "file1.fasta") or die("Cannot open file1.fasta: $!");
open(my $f2, "<", "file2.fasta") or die("Cannot open file2.fasta: $!");

# Read set 1: map each sequence to its ID.
my %seenSequences = ();
my $sequence = "";
my $seqID = "";
while(<$f1>){
  chomp;
  if(/^>(.*)$/){
    $seenSequences{$sequence} = $seqID if $seqID ne "";
    $sequence = "";
    $seqID = $1;
  } else {
    $sequence .= $_;
  }
}
$seenSequences{$sequence} = $seqID if $seqID ne "";
close($f1);

# Read set 2: print sequences absent from set 1, and drop shared ones from the hash.
$sequence = "";
$seqID = "";
while(<$f2>){
  chomp;
  if(/^>(.*)$/){
    if(($seqID ne "") && !exists($seenSequences{$sequence})){
      printf(">%s [2]\n%s\n", $seqID, $sequence);
    } else {
      delete($seenSequences{$sequence});
    }
    $seqID = $1;
    $sequence = "";
  } else {
    $sequence .= $_;
  }
}
if(($seqID ne "") && !exists($seenSequences{$sequence})){
  printf(">%s [2]\n%s\n", $seqID, $sequence);
} else {
  delete($seenSequences{$sequence});
}
close($f2);

# Whatever is left in the hash occurs only in set 1.
while(my ($seq, $id) = each(%seenSequences)){
  printf(">%s [1]\n%s\n", $id, $seq);
}
./141964.pl > out.fasta && head *.fasta
==> file1.fasta <==
>1
PRTEINEIN
>3
PRTEINTHREE

==> file2.fasta <==
>1
PRTEINEIN
>2
PRTEINNI

==> out.fasta <==
>2 [2]
PRTEINNI
>3 [1]
PRTEINTHREE