  • DNAjunk
    Member
    • Jun 2009
    • 62

    SHRiMP Memory Usage

    Hello!

    Using SHRiMP (version 1.2.1), I tried to map about 380 million SOLiD SAGE reads of length 35 bp in color-space format onto the reference sequence.
    However, RAM usage grew to more than 5 GB, with no indication that the growth would stop. So I terminated the program manually, without any result or output file.

    Has anybody had the same experience?

    Should I split the reads file into several smaller files and run them separately? And what is the optimal or maximum number of reads to give as input per run if the program should not use more than, let's say, 2 GB of RAM?

    Thanks for any help and suggestions!
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Originally posted by DNAjunk
    Using SHRiMP (version 1.2.1), I tried to map about 380 million SOLiD SAGE reads of length 35 bp in color-space format onto the reference sequence. However, RAM usage grew to more than 5 GB, with no indication that the growth would stop.
    You need to split your reads file into chunks of, say, 1,000,000 reads. Run SHRiMP separately on each chunk, then just concatenate the SHRiMP output files. The result is identical to what you would have got by feeding all the reads at once!

    The reason this works is because SHRiMP indexes the reads. Give it fewer reads, and it needs less memory. You will need to experiment with the chunk size to suit your computer's RAM.

    We use this method even on our server with 64GB RAM.
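
    The splitting step could be sketched as follows. This is a minimal sketch, not part of SHRiMP: the function names `read_records` and `split_reads` and the `chunk_NNNNN.csfasta` naming are hypothetical, and it assumes a FASTA-style reads file in which every record starts with a `>` header line and `#` lines are comments.

```python
import itertools
from pathlib import Path


def read_records(path):
    """Yield FASTA-style records (header line plus sequence lines) as strings."""
    record = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # assumption: '#' lines are comments and can be dropped
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_reads(path, chunk_size, out_dir):
    """Write consecutive groups of chunk_size records to numbered files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_records(path)
    chunk_paths = []
    for index in itertools.count():
        # Pull at most chunk_size records; only one chunk is held in memory.
        chunk = list(itertools.islice(records, chunk_size))
        if not chunk:
            break
        chunk_path = out_dir / f"chunk_{index:05d}.csfasta"
        chunk_path.write_text("".join(chunk))
        chunk_paths.append(chunk_path)
    return chunk_paths
```

    Each chunk file can then be mapped in its own run and the per-chunk outputs concatenated in chunk order. The chunk size is the knob to tune against available RAM.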

    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #3
      Originally posted by Torst
      The reason this works is because SHRiMP indexes the reads.
      If you split the reads for any aligner, this is still the case. A good question: is it theoretically better to index the reads or the reference, given that a lookup into the index is ~O(1)?

      • Torst
        Senior Member
        • Apr 2008
        • 275

        #4
        Originally posted by nilshomer
        If you split the reads for any aligner, this is still the case
        This may be true for BFAST (your software?) and SHRiMP, but some short-read aligners index only the reference. I think MAQ still does this? In those cases no memory is occupied by a read index; memory use is proportional only to the reference index.

        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          Originally posted by Torst
          This may be true for BFAST (your software?) and SHRiMP, but some short-read aligners index only the reference. I think MAQ still does this? In those cases no memory is occupied by a read index; memory use is proportional only to the reference index.
          If you index 6.4 billion reference positions, the index does take up a non-trivial amount of memory (e.g. BFAST). On the other hand, the memory for a read index, as you say, is proportional to the number of reads (see MAQ and SHRiMP). That is why BWA and Bowtie use a Burrows-Wheeler transform to compress the reference index, at the cost of speed. Nevertheless, you have to "sort" or index each read chunk, whereas a reference index is computed only once per reference. It follows that indexing the reference is better than indexing the reads, assuming the lookup is O(1), which can be achieved.
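
          The two strategies can be contrasted with a toy exact-match k-mer index. This is only a sketch under simplifying assumptions: none of the aligners named here work exactly like this (they use spaced seeds, hashing schemes, or a compressed index), all function names are hypothetical, and a Python dict stands in for the ~O(1) lookup.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) pairs for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def index_reference(reference, k):
    """Map each k-mer to its reference positions.

    Built once per reference; its size grows with the reference length.
    """
    index = defaultdict(list)
    for pos, kmer in kmers(reference, k):
        index[kmer].append(pos)
    return index


def index_reads(reads, k):
    """Map each k-mer to (read id, offset) pairs.

    Rebuilt for every chunk; its size grows with the number of reads.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset, kmer in kmers(read, k):
            index[kmer].append((read_id, offset))
    return index


def hits_via_reference_index(reference, reads, k):
    """Stream the reads past a reference index; return candidate starts."""
    ref_index = index_reference(reference, k)
    hits = set()
    for read_id, read in enumerate(reads):
        for offset, kmer in kmers(read, k):
            for pos in ref_index.get(kmer, ()):
                hits.add((read_id, pos - offset))
    return hits


def hits_via_read_index(reference, reads, k):
    """Stream the reference past a read index; return candidate starts."""
    read_index = index_reads(reads, k)
    hits = set()
    for pos, kmer in kmers(reference, k):
        for read_id, offset in read_index.get(kmer, ()):
            hits.add((read_id, pos - offset))
    return hits
```

          Both functions return the same set of (read id, candidate start) pairs; the difference is which side is held in memory and how often the index has to be rebuilt.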

          I still don't understand why
          "The result is identical to what you would have got by feeding all the reads at once."
          is explained by
          "The reason this works is because SHRiMP indexes the reads."
          Could you give me an example where splitting the reads into discrete chunks and then merging (or catting) them together would not give you the same answer as aligning all the reads together?

          • Torst
            Senior Member
            • Apr 2008
            • 275

            #6
            Could you give me an example where splitting the reads into discrete chunks and then merging (or catting) them together would not give you the same answer as aligning all the reads together?
            There is no such example. My explanation to the original poster was imprecise.
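
            A toy check of why this holds, assuming each read is aligned independently of every other read. The exact-match "aligner" below is a hypothetical stand-in, not how SHRiMP scores reads; the point is only that the per-read result never depends on which other reads share its chunk.

```python
def align_read(reference, read):
    """Toy aligner: first exact-match position of read, or -1 if absent."""
    return reference.find(read)


def align_all(reference, reads):
    """Align every read in one pass, keeping input order."""
    return [(read, align_read(reference, read)) for read in reads]


def align_in_chunks(reference, reads, chunk_size):
    """Align chunk by chunk and concatenate the per-chunk results."""
    results = []
    for start in range(0, len(reads), chunk_size):
        results.extend(align_all(reference, reads[start:start + chunk_size]))
    return results
```

            For any chunk size, `align_in_chunks` returns the same list as `align_all`, because concatenation preserves read order and each result depends only on its own read and the reference.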
