  • sratoolkit v.2.3.1

    Has anyone tried the sratoolkit (v.2.3.1) that is currently available from NCBI SRA? The development kit user guide seems to refer to *two* Perl scripts (config-assistant.pl and reference-assistant.pl), but I can see only one (config-assistant.pl) in the download tarball for CentOS.

    I am trying to extract files from a recent SRA accession, and the process is failing because reference files are not available.

    The main announcement on the SRA page seems to indicate that downloading the necessary data/reference files should be automatic. Has anyone managed to get this to work?

  • #2
    Update:

    I noticed after posting the original message that this page seems to say one should run a Java jar file (on Linux and OS X) that will automatically set up the download environment. This jar file seems to be present only in the pre-compiled tarball (if one builds the toolkit from source, there is no corresponding jar file).

    Unfortunately, neither version (pre-compiled binary or built from source) seems to work with SRA files (including the test dataset SRR390728) at the moment.

    Code:
    fastq-dump.2.3.1 err: manager not found while constructing path within virtual file system module - failed SRR390728.sra
    Written 0 spots total

    Oddly, it still creates a FASTQ file (I can't tell whether it is complete).

    Time to contact "sra@ncbi". If they respond, I will update this thread.
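
One rough way to tell whether the FASTQ file left behind is complete is to count its records and compare the result with the spot count reported for the run. The following is only a sketch: `count_fastq_records` and `fastq_looks_whole` are hypothetical helper names, and both assume the plain four-lines-per-record FASTQ layout (no wrapped sequence lines).

```shell
# Hypothetical helper: count records in a FASTQ file.
# Assumes each record is exactly four lines (header, sequence, '+', qualities).
count_fastq_records() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Hypothetical helper: succeed only if the line count is a multiple of four.
# This is a necessary, not sufficient, sign that the file was not truncated.
fastq_looks_whole() {
    awk 'END { if (NR % 4 != 0) exit 1 }' "$1"
}

# Usage, assuming the dump wrote SRR390728.fastq in the current directory:
#   count_fastq_records SRR390728.fastq
#   fastq_looks_whole SRR390728.fastq && echo "line count is a multiple of 4"
```

An empty file passes the second check with a count of 0, which would be consistent with the "Written 0 spots total" message above rather than with a usable dump.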


    • #3
      Try building from source.

      There's always little nagging stuff with SRA.


      • #4
        Originally posted by Richard Finney
        Try building from source.

        There's always little nagging stuff with SRA.

        Already tried that. No go.


        • #5
          I have an old bzip tarball of sra_sdk-2.1.6, if you still need it.


          -bash-3.00$ cat sra_sdk-2.1.6/reference-assistant.pl
          Code:
          
          #!/usr/local/bin/perl -w
          ################################################################################
          use strict;
          
          use File::Basename;
          use File::Spec;
          
          sub println { print @_; print "\n"; }
          
          my $MSWIN;
          ++$MSWIN if ($^O =~ /mswin/i);
          
          print "Checking refseq configuration... ";
          my $VDB_CONFIG = find_bin("vdb-config");
          die "not found" unless ($VDB_CONFIG);
          println "OK";
          
          print "Checking align-info... ";
          my $ALIGN_INFO = find_bin("align-info");
          die "not found" unless ($ALIGN_INFO);
          println "found";
          
          my $WGET;
          print "Checking wget... ";
          my $out = `wget -h 2>&1`;
          if ($? == 0) {
            println "found";
            $WGET = "wget -O";
          } else {
            println "not found";
          }
          unless ($WGET) {
            print "Checking curl...";
            $out = `curl -h 2>&1`;
            if ($? == 0) {
              println "found";
              $WGET = "curl -o";
            } else {
              println "not found";
            }
          }
          unless ($WGET) {
            print "Checking ./wget... ";
            my $cmd = dirname($0) ."/wget";
            $out = `$cmd -h 2>&1`;
            if ($? == 0) {
              println "found";
              $WGET = "$cmd -O";
            } else {
              println "not found.\nCannot continue.";
              exit 1;
            }
          }
          
          my $refseq_dir = simple_refseq_path();
          
          if ($#ARGV > -1) {
              foreach (@ARGV) {
                  load($_);
              }
          } else {
              while (1) {
                  my $f = ask("Enter cSRA file name (Press Enter to exit)");
                  last unless ($f);
                  load($f);
              }
          }
          
          sub ask {
              my ($prompt) = @_;
              print "$prompt: ";
              my $in = <STDIN>;
              chomp $in;
              return $in;
          }
          
          sub load {
              my ($f) = @_;
              println "Determining $f external dependencies...";
              my $cmd = "$ALIGN_INFO $f";
              my @info = `$cmd`;
              my $refs = 0;
              if ($?) {
                  println "$f: failed";
              } else {
                  my $ok = 0;
                  my $ko = 0;
                  foreach (@info) {
                      chomp;
                      my @r = split /,/;
                      if ($#r >= 3) {
                          my ($seqId, $remote) = ($r[0], $r[3]);
                          ++$refs;
                          if ($remote eq 'remote') {
                              print "Downloading $seqId... ";
                              my $cmd = "$WGET \"$refseq_dir/$seqId\""
                                  . " http://ftp-trace.ncbi.nlm.nih.gov/sra/refseq/$seqId"
                                  . " 2>&1";
                              `$cmd`;
                              if ($?) {
                                  println "failed";
                                  ++$ko;
                              }
                              else {
                                  println "OK";
                                  ++$ok;
                              }
                          }
                      }
                  }
                  print "All " . $refs . " references were checked (";
                  print "$ko failed, " if ($ko);
                  println "$ok downloaded)";
              }
          }
          
          sub simple_refseq_path {
              my %refseq;
              $refseq{s} = refseq_config('servers');
              $refseq{v} = refseq_config('volumes');
              $refseq{p} = refseq_config('paths');
          
              if (   ($refseq{s} && !$refseq{v})
                  || ($refseq{v} && !$refseq{s}))
              {   die "Invalid configuration"; }
          
              if ($refseq{s} && $refseq{v}) {
                  if ((index($refseq{s}, ":") != -1) || (index($refseq{v}, ":") != -1)) {
                      die "Unexpected '$refseq{s}/$refseq{v}'";
                  } else {
                      return "$refseq{s}/$refseq{v}";
                  }
              } elsif ($refseq{p}) {
                  return PATH_VDB2WIN($refseq{p});
              } else {
                  print "Cannot find configuration. Please run 'config-assistant.pl'\n";
                  exit 1;
              }
          }
          
          sub refseq_config {
              my ($nm) = @_;
              my $v = `$VDB_CONFIG refseq/$nm 2>&1`;
              if ($?) {
                  if ($v =~ /path not found while opening node/) {
                      $v = '';
                  } else {
                      die $!;
                  }
              } else {
                  $v =~ /<$nm>(.*)<\/$nm>/;
                  die "Invalid 'refseq/$nm' configuration" unless ($1);
                  $v = $1;
              }
              return $v;
          }
          
          sub find_bin {
            my ($name) = @_;
          
            my $basedir = dirname($0);
          
            # built from sources
            if (-e File::Spec->catfile($basedir, "Makefile")) {
              my $f = File::Spec->catfile($basedir, "build");
              $f = File::Spec->catfile($f, "Makefile.env");
              if (-e $f) {
                my $try = `make -s bindir -C $basedir 2>&1`;
                if ($? == 0) {
                  chomp $try;
                  $try = File::Spec->catfile($try, $name);
                  my $tmp = `$try -h 2>&1`;
                  if ($? == 0) {
                    return $try;
                  }
                }
              }
            }
          
            # try the same directory as the script
            my $try = File::Spec->catfile($basedir, $name);
            my $tmp = `$try -h 2>&1`;
            if ($? == 0) {
              return $try;
            }
          
            # check from PATH
            $try = "$name";
            $tmp = `$try -h 2>&1`;
            if ($? == 0) {
              return $try;
            }
          
            return 0;
          }
          
          sub WIN_TRANSLATE {
            ($_) = @_;
            return $_ unless($MSWIN);
            tr|/|\\|;
            return $_;
          }
          
          sub PATH_VDB2WIN {
            ($_) = @_;
            return $_ unless($MSWIN);
            $_ = WIN_TRANSLATE($_);
            s/^\\([a-zA-Z])\\/$1:\\/;
            return $_;
          }
          
          ################################################################################
          # EOF #

          The SRA documentation appears to be out of sync with the code.
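
For anyone who only needs the list of references a run depends on, the core of the `load` routine above can be approximated in a few lines of shell. This is a sketch, not a replacement for the script: `list_remote_refs` is a hypothetical name, and the only things taken from the script are the comma-separated layout (first field is the sequence ID, fourth field is `remote` when the reference is not available locally) and the download base URL, which may no longer be valid.

```shell
# Base URL copied from the 2.1.6 script above; it may no longer be served.
REFSEQ_URL="http://ftp-trace.ncbi.nlm.nih.gov/sra/refseq"

# Hypothetical helper: read align-info style comma-separated lines on stdin
# and print a download URL for every reference whose fourth field is "remote".
# Lines with fewer than four fields are skipped, as in the Perl script.
list_remote_refs() {
    awk -F, -v base="$REFSEQ_URL" \
        'NF >= 4 && $4 == "remote" { print base "/" $1 }'
}

# Usage, assuming align-info is installed and on PATH:
#   align-info SRR390728.sra | list_remote_refs
```

Each printed URL could then be fetched with wget or curl into the configured refseq directory, which is what the script does with its `$WGET` command.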


          • #6
            After some communication with SRA support, here is the new protocol to follow.

            Each user needs to run the "configuration-assistant.perl" script that is present in the "bin" directory (if you compile from source), or use the Java jar found in the precompiled tarball (do one or the other).

            While the Perl script runs, it will ask "Would you like to test SRA files for remote reference dependencies? [y/N]". Answer "y" (the default is No); a step or two later it will prompt for an SRA accession number, so have a test accession handy if you are not working with a specific one. If everything is set up correctly, the script downloads the reference files it needs on the fly and stores them in the directory you designated when you first ran the script. If several people need to dump SRA data, a group-writable directory can be used for this purpose.
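
Preparing the shared reference directory before running the configuration script might look like the sketch below. `make_shared_refseq_dir` is a hypothetical helper name and the path in the usage line is only an example; which group should own the directory is site-specific and is not handled here.

```shell
# Hypothetical helper: create a directory for downloaded reference files
# and give its group read, write, and search permission.
make_shared_refseq_dir() {
    mkdir -p "$1" && chmod g+rwx "$1"
}

# Usage (example path, adjust to your site):
#   make_shared_refseq_dir /shared/sra/refseq
```

The directory created this way is the one to designate when the configuration script asks where reference files should be stored.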

            PS: The SRA# that I was originally working with turned out to have a corrupt .sra file at source. SRA is going to fix that problem.
            Last edited by GenoMax; 03-28-2013, 12:19 PM.


            • #7
              Dear all, try using the ABSOLUTE PATH of that SRA file. The error means the file could not be found.


              • #8
                Yeah, using an absolute path works.
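
A small sketch of the absolute-path workaround, for anyone scripting this. `abs_path` is a hypothetical helper name; it assumes the directory containing the .sra file exists, and it does not resolve symbolic links.

```shell
# Hypothetical helper: print the absolute path of a file.
# Returns non-zero if the file's directory cannot be entered.
abs_path() {
    # CDPATH is cleared so that cd prints nothing and resolves only real paths.
    dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename -- "$1")"
}

# Usage, assuming fastq-dump is on PATH and SRR390728.sra is in the
# current directory:
#   fastq-dump "$(abs_path SRR390728.sra)"
```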
