**spenthil** · 01-16-2010, 06:55 PM

Are you familiar with Python or Perl? This can be done pretty easily with BioPython/BioPerl.

Hopefully someone else can chime in with an existing solution.

**lindseyjane** · 01-18-2010, 12:47 AM

Yes, I am learning perl. Does anyone have an existing solution?

**spenthil** · 01-18-2010, 12:50 AM

AFAIK there's nothing existing, but you could use BioPerl to accomplish it without too much work (my guess is a few hours tops).

**simonandrews** · 01-19-2010, 01:20 AM

I had a quick go since this seemed a fairly simple task. The documentation for mapview output isn't very clear so I may not be handling the sequence the right way when it's a reverse hit, but you can at least use this to get an idea for how this could be done.

Code:

#!/usr/bin/perl
use warnings;
use strict;

my @mapview_files = @ARGV;

foreach my $file (@mapview_files) {
  process_file ($file);
}

sub process_file {
  my ($file) = @_;
  # Three-argument open with a lexical filehandle
  open (my $in, '<', $file) or die "$file: $!";

  # Data Structure is a hash of chromosomes
  # containing arrays of positions where each
  # position is a hash of GATC mapping to a count
  my %chrs;

  while (<$in>) {
    chomp;
    # Ignore headers
    next if (/^#/);

    # Extract data
    my ($chr,$pos,$seq) = (split(/\t/))[1,2,14];

    # Split sequence into bases
    my @seq = split(//,$seq);

    # Add each base to the data structure
    for my $offset (0..$#seq) {
      ++$chrs{$chr}->[$pos+$offset]->{uc($seq[$offset])};
    }
  }

  # Print a header
  print join("\t",qw(File Chr Pos G A T C)),"\n";

  # Go through each chromosome
  foreach my $chr (sort keys %chrs) {

    # Go through the positions on that chromosome
    my @positions = @{$chrs{$chr}};
    for my $position (1..$#positions) {
      my @line = ($file,$chr,$position);

      # Get the counts for each base
      # The qw() list needs its own parentheses on Perl 5.18 and later
      foreach my $base (qw(G A T C)) {
        if (exists $positions[$position]->{$base}) {
          push @line, $positions[$position]->{$base};
        }
        else {
          push @line, 0;
        }
      }

      # Print the result
      print join("\t",@line),"\n";
    }
  }
}
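
The script above prints raw counts per base. If you want proportions, the remaining step is to divide each count by the total at that position. Here is a minimal sketch of that step in Python, not part of the Perl script. The function name `base_proportions` is just an illustration, and it assumes you only care about G, A, T and C, so anything else (e.g. N) is left out of the total.

```python
def base_proportions(counts, bases=("G", "A", "T", "C")):
    """Convert per-base read counts at one position into proportions.

    `counts` maps a base letter to the number of reads showing that base.
    Bases outside `bases` (such as N) are ignored, which is an assumption
    of this sketch rather than something the original script specifies.
    """
    total = sum(counts.get(b, 0) for b in bases)
    if total == 0:
        # No reads cover this position: report zeros instead of dividing by zero
        return {b: 0.0 for b in bases}
    return {b: counts.get(b, 0) / total for b in bases}


# Example: the counts from one row of the table (G=3, A=1, T=0, C=0)
row = {"G": 3, "A": 1, "T": 0, "C": 0}
props = base_proportions(row)
print("\t".join(f"{props[b]:.2f}" for b in "GATC"))  # prints 0.75, 0.25, 0.00, 0.00 (tab-separated)
```

The same division could be done inside the Perl print loop. Positions with no coverage need the zero check either way.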

**lindseyjane** · 01-19-2010, 03:00 AM

Thank you so very much for taking the time to produce this code. It is an excellent starting point for me and I appreciate the time you have spent.

I hope soon I will be able to produce perl code this quickly myself!

**krobison** · 01-19-2010, 07:49 AM

Hmm, perhaps the site should have a wiki for contributed code (I once posted a Perl program as well)

**lindseyjane** · 01-19-2010, 08:43 AM

> Originally posted by krobison:
>
> Hmm, perhaps the site should have a wiki for contributed code (I once posted a Perl program as well)

Excellent idea, yes please! That would be so useful.

How to calculate proportion of reads with each base at every reference position
