**lottpaul** · 11-20-2014, 09:56 AM

Creating Annovar Index

The Annovar idx format is as follows:

1. File is tab separated.

2. First Line: #BIN <BIN Size> <File Size>

3. Remaining lines:
<Chromosome> <BIN Starting Position> <Starting position in File> <Ending position in File>
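As a rough illustration of the layout above, here is a minimal Perl sketch that reads such an index back into a hash. The sub name `read_idx`, the returned structure, and the example numbers are my own inventions for illustration; none of them come from ANNOVAR itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: parse idx text from an open filehandle.
# Returns (bin_size, file_size, \%index), where
# $index{chr}{bin_start} = [start_offset, end_offset].
sub read_idx {
    my ($fh) = @_;

    # First line: "#BIN", bin size, file size (tab separated).
    my $header = <$fh>;
    die "Empty index\n" unless defined $header;
    chomp $header;
    my ($tag, $bin_size, $file_size) = split /\t/, $header;
    die "Missing #BIN header\n" unless defined $tag && $tag eq '#BIN';

    # Remaining lines: chromosome, bin start, start offset, end offset.
    my %index;
    while (my $ln = <$fh>) {
        chomp $ln;
        next if $ln eq '';
        my ($chr, $bin_start, $from, $to) = split /\t/, $ln;
        $index{$chr}{$bin_start} = [$from, $to];
    }
    return ($bin_size, $file_size, \%index);
}

# Made-up example data, only to show the shape of the file.
my $example = "#BIN\t1000\t5000\n1\t10000\t0\t120\n1\t11000\t120\t300\n";
open(my $fh, '<', \$example) or die "Cannot open in-memory file: $!\n";
my ($bin_size, $file_size, $index) = read_idx($fh);
close $fh;

print "bin size $bin_size, file size $file_size\n";
print "chr 1, bin 11000: bytes $index->{1}{11000}[0] to $index->{1}{11000}[1]\n";
```

The nested hash mirrors the one built by the indexing routine, so the same structure can be used for both writing and reading.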

In Perl, the following routine would create the index of the file in a hash (dictionary/map), then you'd just need to print it out:

Code:

#!/usr/bin/env perl
use warnings;
use strict;

die "Usage: $0 <Annovar Database File> <BIN Size>\n" unless @ARGV == 2;
my $input_file = $ARGV[0];
my $bin_size = $ARGV[1];
 
if (!-e $input_file) {
	die "$input_file not found\n";
}

my $file_size = -s $input_file;

my %index;
open(my $in, "<", $input_file) or die "Couldn't open $input_file for indexing\n";

my $previous_file_position = tell $in;

while (my $ln = <$in>) {
	
	#Check input file. Some are (chr,start,stop) and others are (id,chr,start,stop).
	#If you have the latter you'll need to change the next line to account for the id column   
	my ($chr,$start,$stop) = split "\t", $ln;
	my $bin_start = int($start/$bin_size) * $bin_size;
	my $current_file_position = tell $in;

	if (!exists $index{$chr}->{$bin_start}) {
		$index{$chr}->{$bin_start} = [$previous_file_position, $current_file_position];
	}
	else{
		$index{$chr}->{$bin_start}->[1] = $current_file_position;
	}
	
	$previous_file_position = $current_file_position;
}

close $in;

print "#BIN\t$bin_size\t$file_size\n";
foreach my $chr ((1,10..19,2,20,21,22,3..9,"MT","X","Y")){ #Ordered array to match other Annovar idx files
	foreach my $index_region (sort keys %{$index{$chr}}){
		my $start	= $index{$chr}->{$index_region}->[0];
		my $stop	= $index{$chr}->{$index_region}->[1];
		print "$chr\t$index_region\t$start\t$stop\n";
	}
}

I've checked the output against a couple of idx files (clinvar20140702, AFR.sites.2012) provided by Annovar and get perfect agreement.
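The start and end columns in the script come from `tell`, so they are byte offsets into the database file. Below is a minimal sketch of how such a range could be read back with `seek`. The helper `read_bin` and the three-line database are invented for illustration; this is not ANNOVAR's own lookup code, and it assumes the file is read as plain bytes, the same way it was indexed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the lines stored between two byte offsets.
sub read_bin {
    my ($fh, $from, $to) = @_;
    seek($fh, $from, 0) or die "seek failed: $!\n";
    my @lines;
    while (tell($fh) < $to) {
        my $ln = <$fh>;
        last unless defined $ln;
        push @lines, $ln;
    }
    return @lines;
}

# Made-up three-line database (chr, start, stop, value); each line is 14 bytes.
my $db = "1\t1500\t1500\tA\n1\t1900\t1900\tB\n1\t2100\t2100\tC\n";
open(my $fh, '<', \$db) or die "Cannot open in-memory file: $!\n";

# With a bin size of 1000, the first two lines fall in bin 1000,
# so that bin's byte range runs from 0 to the start of the third line.
my $to = 2 * length("1\t1500\t1500\tA\n");
my @hits = read_bin($fh, 0, $to);
close $fh;

print @hits;   # the two lines for positions 1500 and 1900
```

Only the lines inside the requested bin are read, which is the point of having the index: the rest of a large database file never has to be scanned.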

**molgen2** · 05-28-2015, 01:37 AM

The script is not working for me. I get an error message ("$current_position" requires explicit package name). After changing $current_position to $current_file_position in line 27, I get error messages 'Argument "chr4" isn't numeric in division (/) at ./makeannovarindex.pl line 23, <$in> line 20493.'

When I then change line 22 from
my ($chr,$start,$stop) = split "\t", $ln;
to
my ($junk,$chr,$start,$stop) = split "\t", $ln;

the errors stop, but I get no output (except for line 1: "#BIN 1000 24679810").

Has anybody else experienced the same issues? Has anyone gotten this script working?
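One reading of the header-only output, assuming the script exactly as posted: once the id column is skipped, the chromosome field holds names like "chr4", but the print loop only visits the fixed keys 1 to 22, MT, X and Y, so no entry ever matches. The hard-coded list happens to be in plain string-sort order, so a sketch that sorts whatever chromosome names are actually present would keep that order and also cope with prefixed names. `format_index` and the sample data are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: format an index hash as idx lines, visiting the
# chromosome names found in the data instead of a fixed list.
sub format_index {
    my ($index) = @_;
    my @lines;
    foreach my $chr (sort keys %$index) {                 # string sort
        # Inner loop follows the posted script (string sort of bin starts).
        foreach my $bin (sort keys %{ $index->{$chr} }) {
            my ($from, $to) = @{ $index->{$chr}{$bin} };
            push @lines, "$chr\t$bin\t$from\t$to\n";
        }
    }
    return @lines;
}

# Made-up index with "chr"-prefixed names, as in the reported input file.
my %index = (
    chr4  => { 1000 => [0,   50] },
    chr10 => { 2000 => [50,  90] },
    chrX  => { 3000 => [90, 140] },
);
print format_index(\%index);
```

An alternative would be to strip the "chr" prefix while indexing, but whether the resulting names match what Annovar expects for a given database is something to check against an existing idx file.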

**canisirius** · 05-28-2015, 04:05 AM

Hi,

First off, thanks to lottpaul for providing the solution.

I modified a line or two, so I am attaching the script I ended up using.

The following is the command I used to run the script:

Code:

perl compileAnnnovarIndex.pl hg19_snp138NonFlagged.txt 1000 > hg19_snp138NonFlagged.txt.idx

I hope it works for you too.

Attached Files

compileAnnnovarIndex.pl (1.2 KB)
