
**atcghelix** · 08-05-2013, 03:04 PM

Not pretty, but works for me.

Code:

#!/usr/bin/perl

use strict;
use warnings;

open (my $codes, "<", "codes.txt");
open (my $seqs, "<", "test.fasta");
open (my $output, ">", "merged.txt");

my %codeHash;
while (my $code = <$codes>) {
    chomp $code;
    if ($code =~ /(\S+)\s(\S+)/) {
        print $1 . "  " . $2 . "\n";
        $codeHash{$1} = $2;
    }
}

while (my $seq = <$seqs>) {
    chomp $seq;
    if ($seq =~ />(\S+)/) {
        if (exists $codeHash{$1}) {
            print $output ">" . $codeHash{$1} . "_" . $1 . "\n";
        } else {
            print $output $seq . "\n";
        }
    } else {
        print $output $seq . "\n";
    }
}

**Andres_Gomez** · 08-05-2013, 03:14 PM

Thanks. Can you give me a hand with the usage?

**Andres_Gomez** · 08-05-2013, 03:49 PM

I tried running the code you provided as:
Desktop $ perl merge.pl -code Howler.final.groups -seq Howler.final.fasta -output >merged.txt

and got this error:

readline() on closed filehandle $codes at merge.pl line 11.
readline() on closed filehandle $seqs at merge.pl line 19.

Can you please give me a clue on what has happened?

**dnewkirk** · 08-05-2013, 04:25 PM

Originally posted by Andres_Gomez

I tried running the code you provided as:
Desktop $ perl merge.pl -code Howler.final.groups -seq Howler.final.fasta -output >merged.txt

and got this error:

readline() on closed filehandle $codes at merge.pl line 11.
readline() on closed filehandle $seqs at merge.pl line 19.

Can you please give me a clue on what has happened?

The file names are hardcoded, i.e., the script isn't using Getopt::Long to accept flags, so anything you pass on the command line is ignored. You need to change the names in the script itself.

**Andres_Gomez** · 08-05-2013, 07:05 PM

Any clues on where I should change the names? Is it lines 11 and 19?

**SES** · 08-06-2013, 06:10 AM

Originally posted by Andres_Gomez

Any clues on where I should change the names? Is it lines 11 and 19?

Take a look at the open() statements. That is, lines 6, 7, and 8 are where you need to change the file names. Also, it is best practice to test that each file was actually opened by putting an "or die ..." clause after open. That way, the program halts immediately and you can easily see what went wrong (and which specific file it was trying to open).
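A minimal sketch of that "or die" check. The helper name `open_or_die`, the demo file name, and the message wording are illustrative assumptions, not part of the posted script; `$!` is Perl's built-in variable holding the system's reason for the failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or stop with a message
# that names the file and the system error ($!).
sub open_or_die {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Cannot open $path for reading: $!\n";
    return $fh;
}

# Demonstrate the failure case on a file that should not exist.
# eval traps the die so this demo can print the message and continue.
my $fh = eval { open_or_die("no_such_file_for_demo.txt") };
print "Error caught: $@" unless $fh;
```

In the script itself the same idea would simply be `open(...) or die ...` on each of the three open() lines, so a wrong file name stops the run at once instead of producing "readline() on closed filehandle" warnings later.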

**mastal** · 08-06-2013, 06:30 AM

After changing the names of the files in the script, you want to call the script as follows:

Code:

Desktop $ perl merge.pl Howler.final.groups Howler.final.fasta merged.txt

**kmcarr** · 08-06-2013, 08:31 AM

Originally posted by mastal

After changing the names of the files in the script, you want to call the script as follows:

Code:

Desktop $ perl merge.pl Howler.final.groups Howler.final.fasta merged.txt

No, you wouldn't. If the names of the files are hardcoded in the script, there is no purpose in putting them on the command line. They will simply be ignored, since the script does not have any code to do anything with command line arguments. Assuming you have hardcoded the file names in the script (ignoring that that is BAD(tm)), you would simply call the script as:

Code:

Desktop $ perl merge.pl

The files Howler.final.groups and Howler.final.fasta must be present in the current working directory or the script will throw an error. The file merged.txt will be created by the script, OVERWRITING any previous version of the file present in the current working directory.

**mastal** · 08-06-2013, 08:37 AM

Yes, kmcarr is right.

**atcghelix** · 08-06-2013, 09:08 AM

Sorry about that--just saw this. Here's a better, safer version. Won't overwrite output file if it already exists, and uses command line flags for the files.

Usage:
perl merge.pl --code codefile.txt --seq seqfile.fasta --out mergedOutput.fasta

Code:

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my $codes;
my $seqs;
my $out;
my $usage = "Usage: perl merge.pl --code <file_with_codes.txt> --seq <sequences.fasta> --out <outfile.txt>\n";

# Stop with the usage message if an unknown or malformed option is given
GetOptions  ("code=s" => \$codes,
             "seq=s" => \$seqs,
             "out=s" => \$out) or die $usage;

if (!defined $codes) {
    print "Must supply codes file name.\n";
    die "$usage";
} elsif (!defined $seqs) {
    print "Must supply sequences file name.\n";
    die "$usage";
} elsif (!defined $out) {
    print "Must supply output file name.\n";
    die "$usage";
}

# Refuse to clobber an existing output file
if (-e "$out") {
    die "File $out already exists--stopping so you don't overwrite.\n";
}

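# Halt with the file name and system error ($!) if any file cannot be opened
open (my $codesFH, "<", "$codes") or die "Cannot open $codes: $!\n";
open (my $seqsFH, "<", "$seqs") or die "Cannot open $seqs: $!\n";
open (my $outputFH, ">", "$out") or die "Cannot open $out for writing: $!\n";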

my %codeHash;
while (my $code = <$codesFH>) {
    chomp $code;
    if ($code =~ /(\S+)\s(\S+)/) {
        $codeHash{$1} = $2;
    }
}

while (my $seq = <$seqsFH>) {
    chomp $seq;
    if ($seq =~ />(\S+)/) {
        if (exists $codeHash{$1}) {
            print $outputFH ">" . $codeHash{$1} . "_" . $1 . "\n";
        } else {
            print $outputFH $seq . "\n";
        }
    } else {
        print $outputFH $seq . "\n";
    }
}
print "Finished executing.\n";

**atcghelix** · 08-06-2013, 09:14 AM

Also, what do you want it to do if it doesn't find the corresponding name in the codes file? Currently, when there is no match, it just prints the unmatched name to the output file unchanged, without a C_XXX prefix.
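One possible answer, sketched here as an assumption rather than something the posted script does: keep the unmatched header unchanged, but warn on STDERR and count how many were missed. The helper name `relabel_header` and the example IDs and code below are made up for illustration. Unlike the script, the regex is anchored with `^` so only lines that start with `>` are treated as headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of FASTA headers that had no entry in the codes hash
my $unmatched = 0;

# Hypothetical helper (not part of merge.pl): return the relabelled header
# when a code exists; otherwise return the line unchanged. Unmatched
# headers are counted and reported on STDERR.
sub relabel_header {
    my ($line, $codes) = @_;
    if ($line =~ /^>(\S+)/) {
        my $id = $1;
        return ">" . $codes->{$id} . "_" . $id if exists $codes->{$id};
        $unmatched++;
        warn "No code found for sequence '$id'; header left unchanged.\n";
    }
    return $line;    # sequence lines and unmatched headers pass through
}

# Made-up example data
my %codeHash = (seq1 => "C_001");
print relabel_header(">seq1", \%codeHash), "\n";   # matched header
print relabel_header(">seq2", \%codeHash), "\n";   # unmatched header
print relabel_header("ACGT",  \%codeHash), "\n";   # sequence line
print STDERR "$unmatched header(s) had no matching code.\n";
```

Because the warnings go to STDERR, the output file stays a clean FASTA file while the terminal shows which sequences were missing from the codes file.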

**Andres_Gomez** · 08-06-2013, 10:50 AM

Worked wonders! Thanks a lot, atcghelix, mastal, kmcarr, SES, and everybody else who gave advice.

perl_merge fasta and group files (454)