I'm not so familiar with perl, is there script can fetch specific length of sequence from a fasta file? many thanks!
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
its name is bp_aacomp.pl, this script can count the aminoacids of a FASTA format file. This doesn't work good with multifasta file, because it takes count of all the aminoacids of the file. If you want to count the aminoacids of 1 sequence I would recomend you. the usage is really simple
perl bp_aacomp.pl yourfile.fasta
if you need to bp, you can try this LINK where the is another script.
#!perl
use strict;
use warnings;
use Carp;
use Bio::SeqIO;
use Getopt::Long;
use Bio::SeqUtils;
use Bio::Tools::IUPAC;
my $table = new Bio::SeqUtils;
my @BASES = $table->valid_aa(0);
my %all = $table->valid_aa(2);
my ($file,$format,$help) = ( undef, 'fasta');
GetOptions(
'i|in:s' => \$file,
'f|format:s' => \$format,
'h|help|?' => \$help,
);
my $USAGE = "usage: aacomp [-f format] filename\n\tdefault format is fasta\n";
$file = shift unless $file;
die $USAGE if $help;
my $seqin;
if( defined $file ) {
print "Could not open file [$file]\n$USAGE" and exit unless -e $file;
$seqin = new Bio::SeqIO(-format => $format,
-file => $file);
} else {
$seqin = new Bio::SeqIO(-format => $format,
-fh => \*STDIN);
}
my %composition;
my $total;
foreach my $base ( @BASES ) {
$composition{$base} = 0;
}
while ( my $seq = $seqin->next_seq ) {
if( $seq->alphabet ne 'protein' ) {
confess("Must only provide amino acid sequences to aacomp...skipping this seq");
next;
}
foreach my $base ( split(//,$seq->seq()) ) {
$composition{uc $base}++;
$total++;
}
}
printf("%d aa\n",$total);
printf("%5s %4s\n", 'aa', '#' );
my $ct = 0;
foreach my $base ( @BASES ) {
printf(" %s %s %3d\n", $base, $all{$base}, $composition{$base} );
$ct += $composition{$base};
}
printf( "%6s %s\n", '','-'x5);
printf( "%6s %3d\n", '',$ct);
__END__
=head1 NAME
bp_aacomp - amino acid composition of protein sequences
=head1 SYNOPSIS
bp_aacomp [-f/--format FORMAT] [-h/--help] filename
or
bp_aacomp [-f/--format FORMAT] < filename
or
bp_aacomp [-f/--format FORMAT] -i filename
=head1 DESCRIPTION
This scripts prints out the count of amino acids over all protein
sequences from the input file.
=head1 OPTIONS
The default sequence format is fasta.
The sequence input can be provided using any of the three methods:
=over 3
=item unnamed argument
bp_aacomp filename
=item named argument
bp_aacomp -i filename
=item standard input
bp_aacomp < filename
=back
=head1 FEEDBACK
=head2 Mailing Lists
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
[email protected] - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
=head2 Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:
=head1 AUTHOR - Jason Stajich
Email [email protected]
=head1 HISTORY
Based on aacomp.c from an old version of EMBOSS
=cut
Comment
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 05-02-2024, 08:06 AM
|
0 responses
16 views
0 likes
|
Last Post
by seqadmin
05-02-2024, 08:06 AM
|
||
Started by seqadmin, 04-30-2024, 12:17 PM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
04-30-2024, 12:17 PM
|
||
Started by seqadmin, 04-29-2024, 10:49 AM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
04-29-2024, 10:49 AM
|
||
Started by seqadmin, 04-25-2024, 11:49 AM
|
0 responses
28 views
0 likes
|
Last Post
by seqadmin
04-25-2024, 11:49 AM
|
Comment