Hi,
I have a GFF3 file that contains ID numbers in the Name attribute. I want to replace these ID numbers with transcript names. I have another file (inputheaderfile) with the ID numbers in column 1 and the corresponding transcript names in column 2, and I want to use it to guide the replacement.
I've tried using a Perl script that makes a system call to sed to replace the ID numbers in the file, but I've calculated that this will take a very long time, and I don't have the scripting skills to write something better. Could anyone help or suggest a better way of doing this? I've pasted what I've written below; I realize how newbie it is, but I hope to get some help.
Thanks in advance!
Code:
    # $usage = "$0 <inputheaderfile> <gfffile>";
    $headfile = $ARGV[0];
    $gfffile = $ARGV[1];
    open(GFF, $gfffile) or die "cannot open $gfffile\n";
    ## for every line the GFF file, scan through the entire input header file and replace the name attribute from ID number to transcript name, but only where the name attribute is at the end of line (type = gene)
    while ($line = <GFF>) {
        print $.,"\n";
        open(HEADER, $headfile) or die "cannot open $headfile\n";
        while ($line2 = <HEADER>) {
            chomp $line2;
            @cols = split("\t", $line2);
            # if line in gff file contains 'Name=IDnumber' at end of line, then replace this with 'Name=transcriptname' based on the inputheaderfile
            if ( $line =~ /Name=$cols[0]\n/ ) {
                `sed -i 's/Name=$cols[0]\$/Name=$cols[1]/g' $gfffile`;
            }
            else {
                next;
            }
        }
    }
    exit;
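The script above is slow because it re-reads the whole header file for every GFF line and then runs sed over the entire GFF file for every match. One possible alternative, as a rough sketch only: load the header file into a hash once, then make a single pass over the GFF file and print the result to standard output instead of editing in place. The sub names `load_names` and `rename_line` are made up for illustration. The sketch assumes the header file is tab-separated and that IDs contain no semicolons or whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the two-column, tab-separated header file into a hash:
# ID number => transcript name. (Assumes one pair per line.)
sub load_names {
    my ($headfile) = @_;
    my %name_for;
    open(my $fh, '<', $headfile) or die "cannot open $headfile: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $name) = split /\t/, $line;
        $name_for{$id} = $name if defined $id && defined $name;
    }
    close $fh;
    return \%name_for;
}

# Replace 'Name=<ID>' with 'Name=<transcript name>', but only when the
# Name attribute is at the end of the line and the ID is in the hash.
# Expects a line with the trailing newline already removed.
sub rename_line {
    my ($line, $name_for) = @_;
    if ($line =~ /Name=([^;\s]+)$/) {
        my $id = $1;
        if (exists $name_for->{$id}) {
            $line =~ s/Name=\Q$id\E$/Name=$name_for->{$id}/;
        }
    }
    return $line;
}

if (@ARGV == 2) {
    my ($headfile, $gfffile) = @ARGV;
    my $name_for = load_names($headfile);
    open(my $gff, '<', $gfffile) or die "cannot open $gfffile: $!\n";
    while (my $line = <$gff>) {
        chomp $line;
        print rename_line($line, $name_for), "\n";
    }
    close $gff;
}
elsif (@ARGV) {
    die "usage: $0 <inputheaderfile> <gfffile>\n";
}
```

It would be run as something like `perl rename_gff.pl inputheaderfile in.gff3 > renamed.gff3` (file names here are placeholders). Each GFF line then costs one hash lookup rather than a scan of the header file, and the original GFF file is left untouched so the output can be checked before replacing anything.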