Anyone using SNAP from UC Berkeley?


  • Anyone using SNAP from UC Berkeley?

    I might be late to the game, but this tool, http://snap.cs.berkeley.edu/, claims to be 10-100x faster than BWA etc., at the cost of needing 64GB of memory.

    Best,

    dong

  • #2
    I didn't know about this. Quite impressive. For my simulated 2x100bp data set, bwa takes 71 seconds. Snap is indeed much faster. Here is the "time" output:

    real 5m27.244s
    user 0m3.329s
    sys 0m15.825s

    Note that the CPU time taken by snap should be between 3.3 and 3.3+15.8 seconds. It is hard to get an accurate timing on my tiny data set (so don't take mine as a good evaluation). For snap, most wall-clock time goes to index loading. My machine cannot cache the entire snap index in memory.
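For anyone who wants to redo the arithmetic on their own runs, here is a minimal sketch that turns the "time" output above into the lower and upper CPU-time bounds (user alone, and user plus sys). The function names are made up for illustration, and it assumes the default bash "time" format shown above.

```python
import re
from typing import Tuple


def parse_time_field(value: str) -> float:
    """Convert a bash `time` field such as '5m27.244s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time field: {value!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def cpu_time_bounds(time_output: str) -> Tuple[float, float]:
    """Return (user, user + sys) in seconds from `time` output."""
    fields = {}
    for line in time_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = parse_time_field(parts[1])
    return fields["user"], fields["user"] + fields["sys"]


# The snap timing quoted in this post.
SNAP_TIME = """\
real 5m27.244s
user 0m3.329s
sys 0m15.825s
"""

low, high = cpu_time_bounds(SNAP_TIME)
print(f"snap CPU time: between {low:.1f}s and {high:.1f}s")
```

The gap between these bounds and the 5m27s wall-clock time is what the index loading accounts for.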

    On accuracy, bwa is able to align 93% of reads without a single mismapping (bowtie2 is similar; novoalign can do even better), while for the highest mapQ=60, 0.05% of snap mappings are wrong, which means snap does not have enough power to distinguish some good and bad hits. The manuscript chooses 0.05% as the cutoff because snap is unable to achieve higher accuracy while bwa can. That being said, how much 0.05% mismapping matters to variant calling is unknown to me (it certainly matters to SV discovery); accuracy on real data may also be different.
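To put the 0.05% figure in perspective, here is a small sketch using the standard Phred-scaled definition of mapping quality from the SAM format, mapQ = -10 * log10(P(mapping is wrong)). The helper names are hypothetical; only the formula and the two numbers from this post (mapQ=60, 0.05% wrong) are taken as given.

```python
import math


def mapq_to_error(mapq: float) -> float:
    """Error probability implied by a Phred-scaled mapping quality."""
    return 10 ** (-mapq / 10)


def error_to_mapq(error_rate: float) -> float:
    """Phred-scaled quality that matches an observed error rate."""
    return -10 * math.log10(error_rate)


claimed_error = mapq_to_error(60)   # what mapQ=60 nominally promises
observed_error = 0.05 / 100         # 0.05% of mappings wrong, as reported above

print(f"mapQ=60 implies an error rate of {claimed_error:.0e}")
print(f"an observed error rate of {observed_error:.2%} corresponds to mapQ ~{error_to_mapq(observed_error):.0f}")
```

In other words, under this definition the top-quality bin behaves more like mapQ in the low 30s than 60 on this simulated data.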

    The peak memory used by snap is about 37GB, not as bad as 64GB.

    In summary, snap trades memory for speed to achieve >10X speedup in comparison to bwa. For 100bp simulated PE reads, it is not as accurate as bwa and novoalign, but its accuracy is arguably sufficient for SNP/indel calling.



    • #3
      Hmm... "bwa fastmap" takes 11 seconds. Although "fastmap" does not give the final alignments, it only takes a few more seconds to generate them. The accuracy is about 0.05% as I remember. By that measure, snap is only marginally faster than fastmap, while taking 7X more memory.



      • #4
        The current version of SNAP, as I understand it, mainly supports the human genome. I had a lot of trouble getting it to run on other genomes and finally gave up. It may need some more time to turn into mature software.

        -Abhi



        • #5
          Hi, I tried to index this FASTA with the "-s 16" parameter but couldn't do it with 64GB of RAM. Can someone give it a try and tell me how much RAM I need to run this?

          http://chiulab.ucsf.edu/SURPI/databa...red.uniq.fa.gz

          Thanks a lot in advance

          The command line looks like:
          snap index Bacterial_Refseq_05172012.CLEAN.LenFiltered.uniq.fa snap_index_Bacterial_Refseq_05172012.CLEAN.LenFiltered.uniq_s16 -s 16 -O1000



          • #6
            SNAP is incredibly fast but very inaccurate in my testing, which was over a year ago; it may have improved. Also it has (or had, anyway) a hard limit of ~3gbp reference size. Human HG19 barely fit in some versions, and didn't on others.

            So - if that file is more than 3 gigabases or so, it won't work no matter how much RAM you have. BBMap is slower than SNAP, but has no upper bound on the number of scaffolds or total reference size; it works on both refseq microbial and nt. It does, however, require ~6 bytes per bp, or roughly 3 bytes per bp in low-memory mode.
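A back-of-the-envelope sketch of those numbers, for sizing a machine before trying. It uses only the figures quoted in this post (~6 bytes per bp, ~3 bytes per bp in low-memory mode, and a ~3 Gbp ceiling for SNAP that may no longer apply); the function names and the example reference size are hypothetical, and real usage will vary by version and settings.

```python
# Figures quoted in the post above; treat them as rough estimates.
BBMAP_BYTES_PER_BP = 6.0
BBMAP_LOWMEM_BYTES_PER_BP = 3.0
SNAP_REFERENCE_LIMIT_BP = 3_000_000_000  # "~3gbp", may be outdated


def bbmap_memory_gb(reference_bp: int, low_memory: bool = False) -> float:
    """Estimate BBMap memory (in GB, 1 GB = 1e9 bytes) for a reference."""
    per_bp = BBMAP_LOWMEM_BYTES_PER_BP if low_memory else BBMAP_BYTES_PER_BP
    return reference_bp * per_bp / 1e9


def exceeds_snap_limit(reference_bp: int) -> bool:
    """True if the reference is above the ~3 Gbp limit mentioned above."""
    return reference_bp > SNAP_REFERENCE_LIMIT_BP


# Hypothetical example: a 3.1 Gbp reference (roughly human-sized).
reference_bp = 3_100_000_000
print(f"BBMap normal mode:     ~{bbmap_memory_gb(reference_bp):.1f} GB")
print(f"BBMap low-memory mode: ~{bbmap_memory_gb(reference_bp, low_memory=True):.1f} GB")
print(f"Above the ~3 Gbp SNAP limit: {exceeds_snap_limit(reference_bp)}")
```

Counting the bases in the FASTA first (total sequence length, excluding header lines) gives the `reference_bp` value to plug in.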



            • #7
              Originally posted by Brian Bushnell:
              SNAP is incredibly fast but very inaccurate in my testing, which was over a year ago; it may have improved. Also it has (or had, anyway) a hard limit of ~3gbp reference size. Human HG19 barely fit in some versions, and didn't on others.

              So - if that file is more than 3 gigabases or so, it won't work no matter how much RAM you have. BBMap is slower than SNAP, but has no upper bound on the number of scaffolds or total reference size; it works on both refseq microbial and nt. It does, however, require ~6 bytes per bp, or roughly 3 bytes per bp in low-memory mode.

              Oh I see. Thanks for your quick reply.

              I am trying the SURPI pipeline developed by UCSF.

              I think I will try to understand its scripts and see if it is possible to substitute bwa for snap...



              • #8
                My last comment is out of date. Recent snap is good. It is very fast and fairly accurate. What I am not sure about is whether it is able to find somewhat longer indels.



                • #9
                  Originally posted by lh3:
                  My last comment was old. Recent snap is good. It is very fast and fairly accurate. What I am not sure is whether it is able to find a bit longer indels.

                  Are you talking about the 1.0beta or the old 0.15.4?

                  What about RAM usage?

                  Thanks



                  • #10
                    I found that a later step in the pipeline requires indexing the 70GB nt file.

                    I suppose this might require more than 1TB of RAM to run. I don't think I have the budget for such a machine. I might as well think about using bwa as a substitute...



                    • #11
                      I forget which version I was trying. It has been a while. SNAP does require a lot of memory, tens of GB for the human genome. I don't know if it works for genomes longer than 4GB.

                      I have talked to the SNAP developers once. They are extremely strong on the technical end.

                      Bwa works on nt. The index is here (max 20 connections):

                      ftp://hengli-data:[email protected]/nt/

                      You need ~110GB RAM for mapping.
