  • lh3
    Senior Member
    • Feb 2008
    • 686

    bowtie wrapper script

    I have been using the following script to launch bowtie, and I am posting it here in case it is useful to others. The wrapper calls bowtie internally but adds a few things that I find convenient:

    1) Take gzip'ed fastq files as input. (Most fastq files are compressed.)

    2) For paired-end reads, align singletons in bowtie's single-end mode. (This saves you from typing a second bowtie command line yourself.)

    3) Fix a minor problem in the SAM output. (The names of the two ends in a pair should be identical; otherwise duplicate removal may have trouble.)

    4) Use more accurate (albeit slower) command-line options: [--best --strata] -m 1. (Recommended by a user who uses bowtie a lot. In the script below, the bracketed options are passed only in the single-end and singleton runs.)

    Please reply to this thread if you think these command-line options are not appropriate, or if the script is buggy or could be improved. Thanks.
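
    As an illustration of point 3, here is a minimal Python sketch of the read-name fix, written separately from the Perl script. It assumes the aligner has appended /1 or /2 to the read name in the first SAM column; the function name strip_mate_suffix is hypothetical and is not part of the wrapper.

```python
import re

# Matches a read name ending in /1 or /2 in the first (tab-delimited) SAM column.
_MATE_SUFFIX = re.compile(r"^(\S+)/[12](?=\t)")


def strip_mate_suffix(sam_line: str) -> str:
    """Return the SAM line with a trailing /1 or /2 removed from the read name.

    Lines whose first column has no such suffix (including header lines)
    are returned unchanged.
    """
    return _MATE_SUFFIX.sub(r"\1", sam_line, count=1)
```

    With both mates named identically, tools that group records by read name can treat the two ends as one pair.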

    Code:
    #!/usr/bin/perl -w
    
    # Contact: lh3
    # Last modified: 05NOV2009
    
    use strict;
    use warnings;
    use Cwd qw/getcwd abs_path/;
    use File::Temp qw/tempfile/;
    
    die("Usage: run_bowtie.pl <bowtieIndex> <read1.fq> [read2.fq] [options] | gzip > aln.sam.gz\n") if (@ARGV < 2 || $ARGV[0] =~ /^-/ || $ARGV[1] =~ /^-/);
    
    my $idx = shift(@ARGV);
    my $rd1 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '-';
    my $rd2 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '';
    
    # set bowtie cmd options
    my $opt0 = join(" ", @ARGV);
    $opt0 =~ s/--maxins/-X/;
    $opt0 =~ s/--minins/-I/;
    $opt0 =~ s/--un \S+\b//;
    
    # locate bowtie
    my $bowtie = &gwhich('bowtie');
    die("[run_bowtie] fail to find the bowtie executable.\n") unless ($bowtie);
    
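    my ($fh1, $fh2);
    # open the read files, decompressing through a gzip pipe when the name ends in .gz
    open($fh1, ($rd1 =~ /\.gz$/)? "gzip -dc $rd1 |" : $rd1) || die("[run_bowtie] fail to open $rd1: $!\n");
    if ($rd2) {
      open($fh2, ($rd2 =~ /\.gz$/)? "gzip -dc $rd2 |" : $rd2) || die("[run_bowtie] fail to open $rd2: $!\n");
    }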
    
    if ($fh2) { # paired-end mode
      my ($fh, $fn_ump, @col);
      if ($opt0 !~ /-X/) {
    	warn("[run_bowtie] the maximum insert size is set as the default value (250).\n");
      }
      (undef, $fn_ump) = tempfile("./rb-$$-XXXXXX");
      open($fh, qq/| $bowtie $opt0 -S -m 1 $idx --12 -/
    	   . q/ | awk '{if($3!="*"||$12!="XM:i:0")print;else print $1"\t"$10"\t"$11>/ . qq/"$fn_ump"}'/ . q{ | perl -pe 's/^(\S+)\/[12]\t/$1\t/'}) || die;
      while (<$fh1>) {
    	$col[0] = $1 if (/^@(\S+)/);
    	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
    	$col[2] = <$fh1>; chomp($col[2]);
    	<$fh2>;
    	$col[3] = <$fh2>; chomp($col[3]); <$fh2>;
    	$col[4] = <$fh2>; chomp($col[4]);
    	print $fh join("\t", @col), "\n";
      }
      close($fh); close($fh1); close($fh2);
      # map singletons
      my $opt_se = $opt0;
      $opt_se =~ s/-X\s*\d+//;
      $opt_se =~ s/-I\s*\d+//;
      system(qq/$bowtie $opt_se -S -m 1 --best --strata --sam-nohead $idx --12 $fn_ump | awk '\$3!="*"||\$12!="XM:i:0"'/);
      unlink($fn_ump);
    } else { # single-end mode
      my ($fh, @col);
      open($fh, qq/| $bowtie $opt0 -S -m 1 --best --strata $idx --12 - | awk '\$3!="*"||\$12!="XM:i:0"'/) || die;
      while (<$fh1>) {
    	$col[0] = $1 if (/^@(\S+)/);
    	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
    	$col[2] = <$fh1>; chomp($col[2]);
    	print $fh join("\t", @col), "\n";
      }
      close($fh); close($fh1);
    }
    
    # routines to locate an executable
    
    sub dirname {
      my $prog = shift;
      my $cwd = getcwd;
      return $cwd if ($prog !~ /\//);
      $prog =~ s/\/[^\s\/]+$//g;
      return $prog;
    }
    
    sub which {
      my $file = shift;
      my $path = (@_)? shift : $ENV{PATH};
      return if (!defined($path));
      foreach my $x (split(":", $path)) {
    	$x =~ s/\/$//;
    	return "$x/$file" if (-x "$x/$file" && -f "$x/$file");
      }
      return;
    }
    
    sub gwhich {
      my $progname = shift;
      my $dirname = &dirname($0);
      my $tmp;
      chomp($dirname);
      if (-x $progname && -f $progname) {
    	return abs_path($progname);
      } elsif (defined($dirname) && (-x "$dirname/$progname" && -f "$dirname/$progname")) {
    	return abs_path("$dirname/$progname");
      } elsif (($tmp = &which($progname))) { # on the $PATH
    	return $tmp;
      } else {
    	warn("[gwhich] fail to find executable $progname.\n");
    	return;
      }
    }
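
    For readers who want to see the core conversion in isolation, the following is a rough Python sketch of what the read loops above do: turn FASTQ records into the tab-delimited lines that the script pipes to bowtie through --12 (name, sequence, qualities, and for pairs the second mate's sequence and qualities). It assumes well-formed four-line FASTQ records and mates in the same order in both files; the helper names are hypothetical.

```python
import gzip
from typing import Iterator, Optional, TextIO, Tuple


def open_maybe_gzip(path: str) -> TextIO:
    """Open a text file, decompressing it if the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():  # end of file (or a blank line) ends the stream
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        name = header[1:].split()[0]  # drop '@' and anything after whitespace
        yield name, seq, qual


def to_tab_delimited(fq1: TextIO, fq2: Optional[TextIO] = None) -> Iterator[str]:
    """Yield one tab-delimited line per read (3 columns) or per pair (5 columns)."""
    if fq2 is None:
        for record in fastq_records(fq1):
            yield "\t".join(record)
        return
    # The name is taken from the first mate only, as in the Perl loop.
    for (name, seq1, qual1), (_, seq2, qual2) in zip(fastq_records(fq1), fastq_records(fq2)):
        yield "\t".join((name, seq1, qual1, seq2, qual2))
```

    A caller would open both inputs with open_maybe_gzip and write each yielded line, followed by a newline, to the aligner's standard input.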
  • sridharacharya
    Member
    • May 2010
    • 24

    #2
    The wrapper is indeed a good way to use compressed input files.
    I tried to use it with SOLiD reads. Thanks for posting.
    However, I found that it is useful only if the input is in fastq format. If we have separate compressed (cs)fasta and qual files, there is no way to supply both on the command line.

    I hope I have not missed a way of using compressed (cs)fasta and qual files together with the wrapper script. Or are there other methods for dealing with this?
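
    One possible direction, sketched in Python rather than in the wrapper itself, is to read the two compressed files side by side and pair their records by name before handing them to an aligner. This is only a sketch under assumptions: records appear in the same order in both files, lines starting with '#' are comments, and the function names are hypothetical. It does not produce any particular bowtie input format.

```python
from typing import Iterator, List, TextIO, Tuple


def fasta_like_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) from a FASTA-style stream, skipping '#' comments."""
    name, parts = None, []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(parts)
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if name is not None:
        yield name, " ".join(parts)


def paired_records(csfasta: TextIO, qual: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, colour string, quality values) for records present in both files."""
    for (cs_name, colours), (qv_name, quals) in zip(
        fasta_like_records(csfasta), fasta_like_records(qual)
    ):
        if cs_name != qv_name:
            raise ValueError(f"record order differs: {cs_name} vs {qv_name}")
        yield cs_name, colours.replace(" ", ""), [int(q) for q in quals.split()]
```

    Both handles can come from gzip.open(path, "rt"), so neither file needs to be decompressed on disk first.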


    • Lee Sam
      Member
      • Oct 2008
      • 57

      #3
      Thanks for posting.


      • earonesty
        Member
        • Mar 2011
        • 52

        #4
        or a patch:


        • blackgore
          Member
          • Sep 2009
          • 20

          #5
          Originally posted by earonesty
          Hi earonesty,
          thanks for the link. I couldn't get the patch to apply (though admittedly I'm not experienced with doing so). Is the command much different to "patch bowtie bowtie-gzip.patch" ?


          • Erin Wagner
            Junior Member
            • Oct 2011
            • 1

            #6
            I also attempted to use the patch, but it failed to apply properly. Is it possible that the patch, which was last updated in January 2012, is incompatible with the newest version, 0.12.8 (released in May 2012)?

            If anyone has any ideas on how to apply the patch mentioned above to version 0.12.8, it would be greatly appreciated.

            Thanks.

            Erin

