**gringer** · 11-06-2013, 11:03 PM

I remembered a Rosalind problem about this:

My "quickie" code to solve that problem was written in Perl:

Code:

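#!/usr/bin/perl
use warnings;
use strict;

# Build all 256 tetramers in A < C < G < T order, with counts starting at
# zero so that 4-mers absent from the sequence are still reported.
my %merCount = ();
my @merOrder = ();
for(my $i = 0; $i < 256; $i++){
    # Treat $i as a 4-digit base-4 number, then map digits 0123 to ACGT.
    my $mer4 = sprintf("%s%s%s%s",
                       int($i / 64)%4, int($i / 16)%4,
                       int($i /  4)%4, int($i /  1)%4);
    $mer4 =~ tr/0123/ACGT/;
    $merCount{$mer4} = 0;
    $merOrder[$i] = $mer4;
}

# Read a single-record FASTA file: the first line is the header,
# the remaining lines are joined into one sequence string.
open(my $f, "<", "input.txt") or die("Cannot open input.txt: $!\n");
my $fastaLabel = <$f>;
my $seq = "";
while(<$f>){
    s/\s+//g; # strip all whitespace, including the trailing newline
    $seq .= $_;
}
close($f);

# Slide a window of width 4 along the sequence, one base at a time.
for(my $i = 0; $i < (length($seq)-3); $i++){
    $merCount{substr($seq, $i, 4)}++;
}

# Print one line per tetramer, then all 256 counts on a single line.
my @counts = ();
for(my $i = 0; $i < 256; $i++){
    printf("%s - %d\n",$merOrder[$i],$merCount{$merOrder[$i]});
    push(@counts, $merCount{$merOrder[$i]});
}
print(join(" ", @counts)."\n");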

If you don't care about the order and don't want zero counts, then this code simplifies quite a lot.
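As a rough illustration of that simplification, here is a minimal sketch, written in Python rather than Perl, that tallies only the 4-mers actually present and keeps no fixed order. The function name `count_kmers` and the toy sequence are made up for illustration, and it assumes the sequence is already a single string with the FASTA header and whitespace removed.

```python
from collections import Counter

def count_kmers(seq, k=4):
    """Tally every overlapping k-mer in seq; k-mers that never occur are simply absent."""
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical toy sequence, not real data.
counts = count_kmers("ACGTACGTAC")
for mer, n in counts.items():
    print(mer, n)
```

Because the counter only gains a key when a 4-mer is seen, there is no need to pre-build the 256-entry table or to track the output order.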

**LeightonP** · 11-07-2013, 04:24 AM

If you can get your sequences into R, then the oligonucleotideFrequency function (which underlies alphabetFrequency/dinucleotideFrequency etc.) may be helpful to you.

http://svitsrv25.epfl.ch/R-doc/library/Biostrings/html/alphabetFrequency.html

Code:

oligonucleotideFrequency(yeast1, 4)

where yeast1 is your DNAString/DNAStringSet.
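For comparison outside R, the following Python sketch produces a similar table: one row per FASTA record, with all 256 tetranucleotides in fixed A, C, G, T order and the counts converted to proportions. It is not the Biostrings implementation. The names `read_fasta` and `tetra_freqs` and the inline example records are assumptions made for illustration, and skipping windows that contain anything other than A, C, G, or T is a choice of this sketch.

```python
from itertools import product

# All 256 tetranucleotides in lexicographic A < C < G < T order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def read_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA-formatted lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif line and label is not None:
            chunks.append(line.upper())
    if label is not None:
        yield label, "".join(chunks)

def tetra_freqs(seq):
    """Return 256 proportions, one per tetramer, in TETRAMERS order."""
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        mer = seq[i:i + 4]
        if mer in counts:  # skip windows with N or other ambiguity codes
            counts[mer] += 1
    total = sum(counts.values())
    # Avoid dividing by zero for sequences with no countable window.
    return [counts[m] / total if total else 0.0 for m in TETRAMERS]

# Hypothetical two-record input; in practice pass an open file handle instead.
example = [">seq1", "ACGTACGT", "AC", ">seq2", "AAAAAN"]
for label, seq in read_fasta(example):
    freqs = tetra_freqs(seq)
    print(label, [(m, f) for m, f in zip(TETRAMERS, freqs) if f > 0])
```

In Biostrings itself, passing `as.prob=TRUE` to `oligonucleotideFrequency` returns proportions rather than raw counts.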

**yzzhang** · 11-07-2013, 05:30 AM

The script "calc.kmerfreq.pl" in the multi-metagenome assembly pipeline (https://github.com/MadsAlbertsen/multi-metagenome) can do this job.

**AndrewRGross** · 11-07-2013, 04:12 PM

Awesome: each of you has been a big help.

My current plan is to employ the R script, because the server I'm working on already has R and the associated command line tools installed.

That Albertsen metagenome guide looks very, very promising too, so I might incorporate whatever tools I can from that.

Again, thank you all.

How do I calculate tetranucleotide frequencies?
