  • Approximate bowtie runtime

    This may be a general question. From reading the article at http://genomebiology.com/2009/10/3/R25, I get the feeling that the instance of bowtie I am running right now is behaving differently from what the article describes in terms of run time. I may be doing something wrong.

    I have SRA paired-end data downloaded from http://www.ncbi.nlm.nih.gov/sra/SRX026384?report=full that I am mapping against the human reference genome provided by bowtie. The file contains roughly 19 million reads and is about 6 GB in size. The article says a normal desktop computer should be able to carry out the task in a very short time; they mention a matter of minutes, not counting the indexing step (building the BWT). Later on they mention that a server computer can take up to 21 hours to build the index. My laptop runs 32-bit Ubuntu with 2 GB of RAM, 4 GB of swap, and a dual-core CPU, and I have been running a multi-threaded bowtie instance for the past 3 days. Does this sound normal? Roughly how long did it take for you, my colleagues, when you ran bowtie?

    Here is what my command looks like:

    $ bowtie hg19 -q /PATH/SRR065070.fastq -S align.map --offrate 20 -p 2



    The 'hg19' argument passed to bowtie is just the base name of the reference index, since I am invoking bowtie from within the directory where hg19.ebwt.zip was extracted. It generates an alignment file, align.map, but the file is growing very slowly: over the past 3 days and 10 hours only 205 MB have been written to it.
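
To put a number on how slowly the output is growing, the figures above (205 MB in 3 days and 10 hours) can be turned into an average rate. The sketch below is only illustrative: the function names are made up here and are not part of bowtie, and counting records assumes the output is SAM (as requested with -S), where header lines start with "@" and every other line is one record.

```shell
# Hypothetical helpers, not part of bowtie.

# Average output rate in MB per hour.
# $1 = MB written so far, $2 = elapsed days, $3 = additional elapsed hours
rate_mb_per_hour() {
    awk -v mb="$1" -v d="$2" -v h="$3" \
        'BEGIN { printf "%.2f\n", mb / (d * 24 + h) }'
}

# Number of records written so far to a SAM file.
# SAM header lines start with "@"; every other line is one record.
sam_records() {
    grep -vc '^@' "$1"
}
```

For the numbers in the post, `rate_mb_per_hour 205 3 10` prints 2.50, i.e. about 2.5 MB of output per hour. Running `sam_records align.map` from time to time shows how many records have been written compared with the roughly 19 million reads in the input.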

  • #2
    You say you only have 2GB of RAM. How much is specified as a minimum requirement in the manual and/or paper? Consider that hg19 is 3.2 billion bases.


    • #3
      According to the paper (http://genomebiology.com/2009/10/3/R25):

      'A Bowtie index for the human genome fits in 2.2 GB on disk and has a memory footprint of as little as 1.3 GB at alignment time, allowing it to be queried on a workstation with under 2 GB of RAM.'

      However, the current pre-built hg indices on their site are larger than 2.2 GB. Also, the memory footprint might be bigger with -p greater than 1; have you tried running this single-threaded? Use something like the System Monitor or 'top' to figure out whether your job fits in the machine's RAM. If your system is forced to use swap just to hold the index, I expect the run will be desperately slow.
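
One rough way to check the fit before starting a run is to compare the total size of the index files on disk with the machine's physical RAM. This is a sketch under assumptions: the helper names are made up, `stat -c %s` is the GNU coreutils form (as found on Ubuntu), and on-disk size is only a proxy for the memory footprint, which the quote from the paper suggests can be smaller.

```shell
# Hypothetical helpers, not part of bowtie.

# Total size of the given files in MB (GNU stat reports bytes).
files_mb() {
    stat -c %s -- "$@" | awk '{ s += $1 } END { printf "%.0f\n", s / 1048576 }'
}

# Physical RAM in MB, read from a meminfo-style file (default /proc/meminfo).
ram_mb() {
    awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Report whether the files fit in RAM. $1 = RAM in MB, rest = index files.
fits_in_ram() {
    ram="$1"; shift
    need=$(files_mb "$@")
    if [ "$need" -lt "$ram" ]; then
        echo "index ${need} MB < RAM ${ram} MB: may fit"
    else
        echo "index ${need} MB >= RAM ${ram} MB: expect swapping"
    fi
}
```

From the directory holding the extracted index, something like `fits_in_ram "$(ram_mb)" hg19*.ebwt` would give a first indication (the `.ebwt` file pattern is assumed from the archive name). Other processes also need memory, so a "may fit" result is not a guarantee.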


      • #4
        Speaking of parallel performance: the paper says that the memory image of the index is shared by the threads, which could increase performance on multiple cores, and that there will not be a 'substantial' increase in memory consumption when using multiple threads. So the threads synchronize their activities (fetching reads, outputting results, switching between indices, and marking jobs).

        On your cue, RDW, I checked whether swap was involved. Both processors are running at full blast and the bowtie job occupies 1.5 GB of RAM. I also see 1.3 GB of swap in use, but it is not clear to me whether this is coming from bowtie. I haven't tried running a single-threaded job; I chose to run two threads on the notion that parallelism would cut the run time...

        Nilshomer, in the paper they ran bowtie on both a server and a PC and benchmarked the performance. The PC had 2 GB of RAM, which is why I was optimistic...
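
Whether the swap in use belongs to bowtie can be read per process on Linux. A minimal sketch, assuming a kernel whose /proc/PID/status file exposes the VmRSS and VmSwap fields (older kernels may not report VmSwap); the function name is made up:

```shell
# Hypothetical helper, not part of bowtie.

# Print resident (VmRSS) and swapped-out (VmSwap) memory
# from a /proc/PID/status file given as $1.
proc_mem() {
    awk '/^(VmRSS|VmSwap):/ { print $1, $2, $3 }' "$1"
}
```

With a single bowtie process running, `proc_mem "/proc/$(pgrep -o -x bowtie)/status"` prints one line per field in kB. A large VmSwap value for that process would mean part of bowtie's own memory has been pushed out to swap.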


        • #5
          The human genome takes about 3.3 GB of memory, so the swapping is caused by bowtie. This is your major bottleneck.
          Multithreading does not increase the memory requirement.


          • #6
            When I run Bowtie on a 2.0 GHz Core 2 Duo using both cores (-p 2) with 3.3 GB of available RAM under 32-bit Ubuntu, it takes a few hours to align 19 million reads to hg19, depending on which options I'm using. I often run it overnight, so I don't know exactly how long, but definitely less than 8 hours.

            You should probably add more RAM (put in 4 GB to get the maximum of ~3.3 GB available on a 32-bit system).


            • #7
              Use -t to report the run time.
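
Combined with a small subset of reads, the timing output gives a quick estimate for the full run. This is a sketch that assumes bowtie's -t/--time and -u/--qupto options behave as described in the bowtie manual, and that run time scales roughly linearly with the number of reads (index loading does not, so the figure is only a rough guide). The helper name and the sample size are made up for illustration.

```shell
# Time only the first 100000 reads (command shown as a comment; adjust paths):
#   bowtie -t -u 100000 -p 2 hg19 -q /PATH/SRR065070.fastq -S sample.sam

# Hypothetical helper: extrapolate a sample timing to the full data set.
# $1 = reads in sample, $2 = seconds taken by sample, $3 = total reads
estimate_hours() {
    awk -v n="$1" -v s="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", s * total / n / 3600 }'
}
```

For example, if the 100000-read sample took 120 seconds, `estimate_hours 100000 120 19000000` prints 6.3, i.e. roughly six hours for 19 million reads.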
