  • lh3
    Senior Member
    • Feb 2008
    • 686

    SAM: a generic alignment format

    For NGS data analysis, an aligner tends to be successful when it comes with utilities for comprehensive downstream analyses such as reference-based assembly, SNP/indel calling and alignment viewing. Eland/GAPipeline, Soap and Maq are such examples. Unfortunately, it is non-trivial to implement all these downstream analyses, and implementing them for each aligner would be a waste of time and human resources. In most cases we want to separate alignment from the downstream analyses that follow it. To achieve this, we need a generic alignment format that makes all aligners happy. NovoAlign and Bowtie can output the Maq alignment format to take advantage of Maq's downstream data processing. However, the Maq format does not really suit the goal: it supports neither longer reads nor alignments with more than one indel, and it is too specific to Maq. To solve this problem, the 1000 Genomes Project committee decided to develop a generic alignment format, and the first version of the specification and implementation has now come out.

    The new alignment format, SAM (Sequence Alignment/Map), is the collaborative result of several major genome centres. It eliminates the major defects of the Maq format while retaining its advantages. We also migrated and improved various downstream data processing tools implemented in Maq/Maqview, such as indexing, pileup, the viewer and the consensus caller. For more information, please check the website:



    I hope samtools will help aligner developers promote their own software: once a program can generate alignments in SAM format, Maq-like downstream analysis becomes available immediately.
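
    As an illustration of the point about alignments with more than one indel: SAM records describe each alignment with a CIGAR string, so several insertions and deletions fit in one record. The following is a minimal sketch, not part of samtools. The helper names (parse_cigar, count_indels, reference_span) are made up for this example, and the operation letters are those of the current SAM specification, which may differ from the first version announced here. It does not handle the "*" placeholder for an unavailable CIGAR.

```python
import re

# A CIGAR string is one or more <length><operation> pairs, e.g. "10M1I5M2D20M".
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def count_indels(cigar):
    """Count insertion (I) and deletion (D) operations in the alignment."""
    return sum(1 for _, op in parse_cigar(cigar) if op in "ID")


def reference_span(cigar):
    """Number of reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


if __name__ == "__main__":
    example = "10M1I5M2D20M"  # one insertion and one deletion in a single read
    print(parse_cigar(example))
    print(count_indels(example), reference_span(example))
```

    A read with the CIGAR above carries two indels in one record, which is the case the Maq format cannot represent.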
  • bioinfosm
    Senior Member
    • Jan 2008
    • 483

    #2
    Thanks Heng.
    It looks like this will be very useful and will make it easy to try various upcoming tools.

    Is it possible to have a workflow like MAQ's easyrun that walks a user through a use case for SAM/BAM?
    --
    bioinfosm

    • ECO
      --Site Admin--
      • Oct 2007
      • 1360

      #3
      Hey lh3,

      Thanks for posting this here. I'm going to sticky it in the Bioinformatics forum for a while to make sure everyone sees it!

      • lparsons
        Member
        • Nov 2008
        • 28

        #4
        The documentation notes that "Only MAQ->SAM converter is implemented." However, I could not find anywhere that referenced this conversion utility. Is there software to perform this conversion?

        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          To lparsons:

          After you compile samtools with "make", you will find "maq2sam-short" and "maq2sam-long" in the "misc/" directory. There is also a script "export2sam.pl" that converts Illumina's export to SAM. I have not thoroughly tested this script on all export files, though.

          • corthay
            Member
            • Oct 2008
            • 25

            #6
            I downloaded samtools-0.1.1 but could not find the "wgsim" or "wgsim_eval.pl" programs that are mentioned in the bwa-0.3.0 documentation.
            How can I get these programs?

            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              To corthay:

              You are quick. I am planning a new bwa release as I realized that I could improve it a little without much work (PS: the new version is released now). Wgsim, wgsim_eval.pl and converters for soap and bowtie are available from SVN only:

              svn co https://samtools.svn.sourceforge.net...s/dev/samtools samtools
              Last edited by lh3; 01-06-2009, 07:34 AM.

              • myrna
                Member
                • Feb 2008
                • 44

                #8
                indelpe vs samtools indels

                Hi Heng Li.
                Could you comment on how the indel detection works in SAM pileups vs MAQ indelpe? I am seeing many more indels in my SAM pileup generated from a MAQ alignment (as compared to the output from indelpe). Is there a good filtering strategy for these?

                Thanks,

                Ryan

                • lh3
                  Senior Member
                  • Feb 2008
                  • 686

                  #9
                  I am planning to release samtools-0.1.2, which fixes some bugs in the old version and adds new features. For now you can check out the source code from SVN. It should be quite close to 0.1.2.

                  The new version comes with a Bayesian indel caller, although it is just a prototype at present. The strength of the samtools caller is that it makes use of reads mapped without indels. Using this information helps to reduce false negatives. In addition, the new caller reports a genotype rather than just saying there is an indel. You cannot easily tell from maq's indelpe whether an indel is heterozygous or homozygous. With the new caller, the filters could be: a) the indel score; b) two indels should not be too close to each other.
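
                  The two filters suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the IndelCall record, the filter_indels name and both thresholds (min_score, min_distance) are hypothetical and do not come from samtools, and the sketch applies the distance filter only among calls that already passed the score filter.

```python
from collections import namedtuple

# Hypothetical record for one indel call; not a samtools data structure.
IndelCall = namedtuple("IndelCall", ["chrom", "pos", "score"])


def filter_indels(calls, min_score=50, min_distance=10):
    """Keep calls that pass (a) a score cutoff and (b) a spacing rule.

    Both default thresholds are placeholders chosen for illustration.
    """
    # (a) drop low-scoring calls, then order the rest by position
    kept = sorted(
        (c for c in calls if c.score >= min_score),
        key=lambda c: (c.chrom, c.pos),
    )
    # (b) drop every call that has a neighbour on the same chromosome
    #     closer than min_distance bases
    result = []
    for i, call in enumerate(kept):
        prev_call = kept[i - 1] if i > 0 else None
        next_call = kept[i + 1] if i + 1 < len(kept) else None
        near_prev = (
            prev_call is not None
            and prev_call.chrom == call.chrom
            and call.pos - prev_call.pos < min_distance
        )
        near_next = (
            next_call is not None
            and next_call.chrom == call.chrom
            and next_call.pos - call.pos < min_distance
        )
        if not (near_prev or near_next):
            result.append(call)
    return result


if __name__ == "__main__":
    calls = [
        IndelCall("chr1", 100, 80),
        IndelCall("chr1", 105, 90),   # too close to the call at 100
        IndelCall("chr1", 500, 20),   # low score
        IndelCall("chr1", 900, 70),
    ]
    print(filter_indels(calls))
```

                  Whether both members of a close pair should be dropped, or only the lower-scoring one, is a design choice; this sketch drops both.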

                  • kon104
                    Junior Member
                    • Dec 2008
                    • 2

                    #10
                    What's the difference between maq2sam-short and -long?

                    Also, short seems to segfault on 64-bit versions of Red Hat and Ubuntu... Am I missing something?

                    • lh3
                      Senior Member
                      • Feb 2008
                      • 686

                      #11
                      maq2sam-short is for the .map files generated by maq-0.6.x, while maq2sam-long is for files generated by maq-0.7.x. Sorry for the confusion; one of the aims of SAM is to avoid such confusion in future.

                      • webbrewer
                        Junior Member
                        • Aug 2008
                        • 8

                        #12
                        samtools index seg fault

                        I am using the most current version of samtools from svn.
                        I successfully ran the "samtools import" command on my .sam file from bwa.
                        When I then run "samtools index" on the .bam file, it seg faults.
                        Let me know if you need more information to determine what is causing this.
                        Last edited by webbrewer; 03-05-2009, 08:28 PM.

                        • myrna
                          Member
                          • Feb 2008
                          • 44

                          #13
                          samtools import

                          samtools import is for making a .bam file from a .sam file. Why are you attempting to run this command on a .bam file?

                          • webbrewer
                            Junior Member
                            • Aug 2008
                            • 8

                            #14
                            Originally posted by myrna:
                            samtools import is for making a .bam file from a .sam file. Why are you attempting to run this command on a .bam file?
                            Oops. I meant to say that "samtools index" seg faults.

                            • myrna
                              Member
                              • Feb 2008
                              • 44

                              #15
                              samtools index

                              Have you tried samtools view foo.bam?

                              If you get the sam alignments back, then all should be well. I believe you get a warning if the .bam file is unsorted, but perhaps you should try this if you haven't already:

                              samtools sort foo.bam bar.sort
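
                              For scripting the sort-then-index step, a small sketch in Python might look like this. It only assembles the two command lines in the syntax used in this thread; the function names are made up for the example, and it assumes that this samtools version writes the sorted output to the given prefix plus ".bam". Later samtools releases changed the sort syntax, so check the usage message of your installed version.

```python
import subprocess


def sort_and_index_commands(bam, sorted_prefix):
    """Return the argument lists for sorting a BAM file and indexing the result."""
    # Assumption: "samtools sort in.bam prefix" writes prefix.bam
    sorted_bam = sorted_prefix + ".bam"
    return [
        ["samtools", "sort", bam, sorted_prefix],
        ["samtools", "index", sorted_bam],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for argv in sort_and_index_commands("foo.bam", "bar.sort"):
        print(" ".join(argv))
```

                              Calling run_all on the returned list executes the commands and raises an error if either one fails, which makes a crash in the index step easier to spot.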
