  • gmap_build error

    Hi guys,
    I am in the process of configuring GSNAP on my university's cluster, but I keep hitting an error at one step that I can't seem to solve. I have installed the software on the cluster and am building the mm9 genome. I have followed the documentation so far, and gmap_build works fine until it reaches the step where the console shows:

    Building suffix array
    SACA_K called with n = 2725765482, K = 5, level 0


    It is after this step that the process crashes and gives me an error message:

    /home/satyajit/GSNAP/bin/gmapindex -d mm9 -F /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -D /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -S failed with return code 131 at /home/satyajit/GSNAP/bin/gmap_build line 360.

    I have run this build several times, on different machines as well, and every time it crashes during this phase. The largest configuration I have used is a cluster node with 64 GB of RAM and 16 cores. Is this step the most memory-intensive one? Does it require even more memory than that? Or am I simply doing something fundamentally wrong? I am frankly at a loss as to how to tackle this issue, and any help would be greatly appreciated.
    I plan on using GSNAP for SNP-tolerant alignment of my datasets.
    The command I used for gmap_build is:

    gmap_build -d mm9 -g -k 15 chr1.fa.gz chr1_random.fa.gz chr2.fa.gz chr3_random.fa.gz chr3.fa.gz chr4_random.fa.gz chr4.fa.gz chr5_random.fa.gz chr5.fa.gz chr6.fa.gz chr7_random.fa.gz chr7.fa.gz chr8_random.fa.gz chr8.fa.gz chr9_random.fa.gz chr9.fa.gz chr10.fa.gz chr11.fa.gz chr12.fa.gz chr13_random.fa.gz chr13.fa.gz chr14.fa.gz chr15.fa.gz chr16_random.fa.gz chr16.fa.gz chr17_random.fa.gz chr17.fa.gz chr18.fa.gz chr19.fa.gz chrX_random.fa.gz chrX.fa.gz chrY_random.fa.gz chrY.fa.gz chrM.fa.gz chrUn_random.fa.gz
    Last edited by Satya; 07-15-2014, 11:42 AM.

  • #2
    It appears that the build step requires sequence files to be uncompressed (https://github.com/julian-gehring/GMAP-GSNAP, look for section 4c). Have you tried using uncompressed sequence files?


    • #3
      Isn't that the requirement for gmap_setup, though? I thought gmap_build would accept gzipped files with the -g option. I just tried it with uncompressed FASTA files to be sure, and it didn't work either.
      Last edited by Satya; 07-15-2014, 11:53 AM.


      • #4
        You are right, there is a "-g" option mentioned for gmap_build.

        Out of curiosity, can you try the build with a single uncompressed chromosome FASTA file to see if it goes through?
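
The single-chromosome test suggested here could be run roughly as follows. This is a minimal sketch: the helper name `decompress_fasta` and the database name `mm9_chr1` are hypothetical, and the `-d` and `-k 15` options are simply the ones used in the original command.

```shell
# Decompress one gzipped FASTA file next to the original, keeping the .gz,
# and print the path of the uncompressed file.
decompress_fasta() {
    gz=$1
    out=${gz%.gz}                      # chr1.fa.gz -> chr1.fa
    gunzip -c "$gz" > "$out" && printf '%s\n' "$out"
}

# Usage (not run here; mm9_chr1 is a hypothetical database name):
#   fa=$(decompress_fasta chr1.fa.gz)
#   gmap_build -d mm9_chr1 -k 15 "$fa"
```

Because the input is already uncompressed, the `-g` option is left out of the build command.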


        • #5
          Excellent suggestion! It worked when I used just a single uncompressed FASTA file. Does this mean that I simply need to allocate more memory for the entire process?


          • #6
            If you were passing that job along to a scheduler with a specific memory allocation then it would not hurt to increase that request.

            My hunch is that one of the chromosome files may be causing the original error (the *random* and *Un* files come to mind as culprits). You may have already tried this, but I would add a couple more chromosomes and see if that works; after that, building with everything except the random/Un files would be the next logical step.
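
One way to organise that incremental test is sketched below. It only prints the gmap_build commands for growing sets of files, with the random/Un files moved to the end, so each command can be run by hand. The function names and the `mm9_test_N` database names are hypothetical, and the sketch assumes uncompressed FASTA files whose names contain no spaces.

```shell
# Print the given files with *random* / *Un* files moved to the end.
order_files() {
    for f in "$@"; do
        case "$f" in *random*|*Un*) ;; *) printf '%s\n' "$f" ;; esac
    done
    for f in "$@"; do
        case "$f" in *random*|*Un*) printf '%s\n' "$f" ;; esac
    done
}

# Print one gmap_build command per growing prefix of the ordered list.
# $1 = how many files to add per step; remaining arguments = FASTA files.
incremental_builds() {
    step=$1; shift
    subset=""; count=0
    for f in $(order_files "$@"); do   # relies on names without whitespace
        subset="${subset:+$subset }$f"
        count=$((count + 1))
        # Emit a command every $step files, and once more for the full set.
        if [ $((count % step)) -eq 0 ] || [ "$count" -eq "$#" ]; then
            echo "gmap_build -d mm9_test_${count} -k 15 $subset"
        fi
    done
}

# Example: incremental_builds 2 chr1.fa chr1_random.fa chr2.fa
```

The first printed command that fails when run shows which batch of files to look at more closely.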


            • #7
              Dear all,

              I resolved this by running gmap_build on a larger machine. I also got this error and chased down many paths; in the end it was as simple as needing more memory.

              In my case, I was building hg19 to work with the Pacific Biosciences ToFU command-line pipeline (https://github.com/PacificBiosciences/cDNA_primer/wiki). I installed the latest gmap on an Ubuntu instance started with MIT's StarCluster software (http://star.mit.edu/cluster/about.html). I first had to resolve the Perl version: StarCluster AMI instances are notoriously out of date and the default Perl is too old, so I used the version bundled with smrtanalysis instead.

              So success involved first setting two environment variables:

              export PERL5LIB=/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5:/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5/5.8.8
              export PATH=/usr/local/bin:/usr/bin:/bin:$PATH



              Setting the path correctly got me to the point where I hit the same error reported above:

              Building suffix array
              SACA_K called with n = 3137161265, K = 5, level 0
              Killed
              /usr/local/bin/gmapindex -d hg19 -F "/mnt/hg19/hg19" -D "/mnt/hg19/hg19" -S failed with return code 35072 at /mnt/smrtanalysis/current/analysis/bin/gmap_build line 376.


              However, Genomax provided the hint I needed: rather than assuming I had something else wrong, it was clearly worth trying a bigger box. Success came from running the software on a larger Ubuntu instance, an r3.8xlarge (240GB) machine, which I instantiated and added to my configuration. I logged into the new node and executed the command:

              gmap_build -s none -k 15 -d hg19 -D /mnt/hg19 /mnt/hg19/hg19.fa

              The build completed successfully.
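
The return codes quoted in this thread can be unpacked with a little arithmetic. The sketch below assumes that gmap_build prints the raw wait status returned by Perl's `system()`, in which the low 7 bits hold a terminating signal, the next bit flags a core dump, and the high byte holds the exit code. The function name `decode_status` is hypothetical.

```shell
# Decode a raw wait status such as the one Perl's system() returns.
decode_status() {
    status=$1
    sig=$((status & 127))              # terminating signal, 0 if none
    core=$(( (status >> 7) & 1 ))      # core-dump flag
    code=$((status >> 8))              # exit code of the child
    if [ "$sig" -ne 0 ]; then
        echo "terminated by signal $sig (core dump flag: $core)"
    elif [ "$code" -gt 128 ]; then
        # Shells report "killed by signal N" as exit code 128 + N.
        echo "exit code $code (a shell reporting signal $((code - 128)))"
    else
        echo "exit code $code"
    fi
}

decode_status 35072   # exit code 137 (a shell reporting signal 9)
decode_status 131     # terminated by signal 3 (core dump flag: 1)
```

Under that assumption, 35072 corresponds to signal 9 (SIGKILL on Linux), which matches the "Killed" line in the log and is what one would typically see when the kernel's out-of-memory killer stops a process. Running `dmesg` on the affected node shortly after the failure may show a corresponding kernel message.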
              Last edited by adeslat; 02-15-2016, 06:07 AM.
