  • qtrinh (Member, joined May 2008, 20 posts), #31
    Hi Heng,
    I did do "samtools sort" first. Here is what I did:

    samtools import homo_sapiens.fasta.fai S1.sam S1.bam

    samtools sort S1.bam S1_sorted

    samtools pileup -f homo_sapiens.fasta -c S1_sorted.bam

    Q
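The three steps above can be sketched in Python. This assumes the samtools 0.1.x syntax used in this thread: import takes the .fai index as its reference list, and sort takes an output prefix and writes <prefix>.bam, which is why pileup reads S1_sorted.bam. Later samtools releases changed these interfaces (for example, sort now takes an explicit -o output file, and pileup was superseded by mpileup). The helper names are hypothetical.

```python
import subprocess
from typing import List


def old_samtools_commands(ref_fasta: str, sam: str, prefix: str) -> List[List[str]]:
    """Build the import/sort/pileup commands from the post (samtools 0.1.x syntax)."""
    unsorted_bam = prefix + ".bam"
    sorted_prefix = prefix + "_sorted"
    return [
        # import takes the .fai index as the list of reference sequences
        ["samtools", "import", ref_fasta + ".fai", sam, unsorted_bam],
        # 0.1.x sort takes an output prefix and writes <prefix>.bam
        ["samtools", "sort", unsorted_bam, sorted_prefix],
        # so pileup reads the sorted file as <prefix>.bam
        ["samtools", "pileup", "-f", ref_fasta, "-c", sorted_prefix + ".bam"],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order; raise CalledProcessError at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, run_commands(old_samtools_commands("homo_sapiens.fasta", "S1.sam", "S1")) reproduces the three commands in the post, with the pileup output going to standard output.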


    • thondeboer (Member, joined Jan 2009, 24 posts), #32
      Is there a running list somewhere that shows which programs produce (or take as input) SAM/BAM format? It would help us make a final decision on the format for the mapping results we produce here at Complete Genomics.

      The negative gap structure will probably be handled with special tags (GS, GQ and GC), so software that wants to make optimal use of all the data should use those tags, but most software should work "out of the box" if it supports SAM/BAM...
      Thon
      __________________________________
      Thon de Boer, Ph.D.
      Director of Product Management, Software
      Strand Life Sciences
      548 Market Street, Suite 82804
      San Francisco, CA 94104, USA
      www.strandls.com
      Pioneers in Discovery Research Informatics
      _______________________________________
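SAM optional fields have the form TAG:TYPE:VALUE, which is what lets a reader carry through tags it does not interpret, such as the GS, GQ and GC tags mentioned above. A minimal Python sketch, assuming only that field layout (the function name is hypothetical; integer and float values are converted, everything else is kept as a raw string):

```python
from typing import Dict, List, Tuple, Union

TagValue = Union[str, int, float]


def parse_optional_fields(fields: List[str]) -> Dict[str, Tuple[str, TagValue]]:
    """Parse SAM optional fields of the form TAG:TYPE:VALUE.

    Integer (i) and float (f) values are converted; every other type is kept
    as the raw string, so unfamiliar tags are preserved rather than rejected.
    """
    tags: Dict[str, Tuple[str, TagValue]] = {}
    for field in fields:
        # maxsplit=2 keeps any ':' inside a string value intact
        tag, typ, value = field.split(":", 2)
        parsed: TagValue
        if typ == "i":
            parsed = int(value)
        elif typ == "f":
            parsed = float(value)
        else:
            parsed = value
        tags[tag] = (typ, parsed)
    return tags
```

For a SAM record line, the optional fields are the columns after the eleventh: parse_optional_fields(line.rstrip("\n").split("\t")[11:]).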


      • apfejes (Senior Member, joined Feb 2008, 236 posts), #33
        On the subject of tools that use SAMtools, I'm very interested in adding support to my project, which is based in Java. I'm aware that Java-based tools exist for reading and writing this format, but I'm unable to find any documentation for them. Has anyone come across any information on how to use the Java SAM tools code?


        • lh3 (Senior Member, joined Feb 2008, 686 posts), #34
          To thondeboer: currently BWA natively generates alignments in the SAM format. BFAST also generates SAM. We also provide converters for SOAP, Bowtie, Export, novoalign and even blast. However, most of these converters are incomplete: they sometimes cannot convert all the information, especially for short indels, because the source formats are poorly documented. As far as I know, every aligner generates its own format. SAM is probably the first effort to unify the alignment format, in particular for alignments of the new sequencing data.

          To apfejes: I am not able to comment much on the Java implementation. I know the I/O part is complete and actually does more nice things than the C version of samtools. You may send an email to the mailing list to ask for the documentation.


          • TylerBackman (Member, joined Oct 2008, 13 posts), #35
            This is an excellent, and very exciting idea. With a standard alignment format (SAM), and a standard raw read format (Phred/Sanger fastq) we can drastically reduce the time most of us spend writing our own file format parsers and converters, and eliminate a common source of error in data analysis (incorrect parsing).

            It's great to see the bioinformatics community coming together in this way.

            To anyone developing alignment tools: Please include support for this format in future versions of your software!


            • gcrdb (Junior Member, joined Jan 2009, 9 posts), #36
              Can SAMtools convert SAM back to MAQ?

              Hi lh3,
              I am glad that SAMtools can do maq2sam, but would it be easy to do sam2maq?
              The reason I ask is that I want to use MAQ to call SNPs and indels. BWA cannot do that (yet).

              thanks
              g


              • lh3 (Senior Member, joined Feb 2008, 686 posts), #37
                SAMtools has a SNP caller based on the same code as MAQ. See this page for more information: http://samtools.sourceforge.net/cns0.shtml. What is missing in SAMtools is a SNP filtration script like "maq.pl SNPfilter", but for now it is easy to write your own.

                SAMtools' indel caller uses a different algorithm. It outperforms MAQ.
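Writing your own filter might look like the Python sketch below, which reads samtools pileup -c lines. It assumes the tab-separated column layout visible later in this thread (chromosome, position, reference base, consensus base, consensus quality, SNP quality, RMS mapping quality, depth, read bases, base qualities) and that indel-call lines have "*" in the reference column. The thresholds are illustrative defaults, not the ones maq.pl SNPfilter uses, and the names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class SnpCall(NamedTuple):
    chrom: str
    pos: int
    ref: str
    cons: str
    cons_q: int
    snp_q: int
    rms_mapq: int
    depth: int


def filter_snps(lines: Iterable[str], min_snp_q: int = 20, min_mapq: int = 25,
                min_depth: int = 3, max_depth: int = 100) -> Iterator[SnpCall]:
    """Yield SNP calls from `samtools pileup -c` lines that pass simple thresholds."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[2] == "*":
            continue  # skip malformed lines and indel-call lines
        call = SnpCall(cols[0], int(cols[1]), cols[2].upper(), cols[3].upper(),
                       int(cols[4]), int(cols[5]), int(cols[6]), int(cols[7]))
        if call.cons in (call.ref, "N"):
            continue  # consensus agrees with the reference (or is unknown)
        if call.snp_q < min_snp_q or call.rms_mapq < min_mapq:
            continue  # low-confidence call or poorly mapped reads
        if not min_depth <= call.depth <= max_depth:
            continue  # too few reads, or suspiciously deep (possible repeat)
        yield call
```

A depth ceiling is included because unusually deep sites often come from collapsed repeats; the right values depend on your coverage.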


                • gcrdb (Junior Member, joined Jan 2009, 9 posts), #38
                  lh3,
                  Actually, I tried SAMtools before, but somehow "pileup" is not outputting anything, so I am thinking of going back to MAQ.
                  Did I run the program correctly? (I used the example files that come with the samtools package.)
                  examples> ../samtools pileup -f ex1.fa ex1.sam
                  --> returns nothing
                  examples> ../samtools pileup ex1.sam
                  --> returns nothing

                  thanks again!
                  g


                  • lh3 (Senior Member, joined Feb 2008, 686 posts), #39
                    It should be: ../samtools pileup -t ex1.fa.fai ex1.sam or ../samtools pileup ex1.bam. I have added a Makefile.

                    Note that there is a companion format called BAM, which is the binary representation of SAM. Most samtools commands work only on BAM. I know having two formats is a bit confusing, but this is necessary for faster parsing.
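Because most commands expect BAM, it can help to check which format a file is in before calling samtools. A minimal Python sketch, relying only on the standard BAM layout: BAM is compressed with BGZF, a series of gzip-compatible blocks, so Python's gzip module can read its start, and the decompressed stream begins with the magic bytes BAM\1. A plain-text SAM file is not gzip data at all. The function name is hypothetical.

```python
import gzip

BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path: str) -> bool:
    """Return True if `path` decompresses (as gzip/BGZF) to data starting with the BAM magic."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip data (e.g. a plain-text SAM file) or a truncated file
        return False
```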


                    • gcrdb (Junior Member, joined Jan 2009, 9 posts), #40
                      lh3,
                      Thanks for the quick response, pileup is working now!
                      Here are some questions about the pileup format from my first look at it:
                      (1) What does a "*" in the read bases mean? It is not documented at "http://samtools.sourceforge.net/pileup.shtml".
                      (2) Is it okay for the same position to appear twice? (chr1 1949878 occurs twice in my first pileup output.)
                      thanks,
                      Below is the piece of pileup output where I found the problem:
                      chr1 1949878 A A 142 0 60 55 C$....,,,...,,.C..,.,,..,...,,,.,.,.+1C,,,,..,.,........,^F,^], &5,2IIII5I<II+%5=I+II(II8@CII3*I0I+IIII,I$@IIAAIIDI@I*5
                      chr1 1949878 * */+C 38 38 * +C 13 4 30 8
                      chr1 1949879 A A 150 0 60 54 ....,G,...,,.-1G..-1G.,.,,..,...,,,.,.,.,,,,..,.,........,,, II*IIII;I.II&(&1I3II%9III:II7&I&@&IIII5I"6IIIIG.I33I$6
                      chr1 1949879 * -G/* 481 481 -G * 6 9 30 9
                      chr1 1949880 G G 25 0 60 55 .$A$..,A,..A,,*.*A,A,,A.,A..,,,.,A,.,,,,.+1A.,.,........A,,^], +(,.III)8-II8%D.I0II#,I@5III.$I+I,IIII$I2EIIIIIIIIIE&?/
                      chr1 1949880 * */+A 350 350 * +A 24 3 19 9
                      chr1 1949881 A A 162 0 60 53 ..,,,...,,....,.,,..,...,,,.,.,.,,,,..,.,........G,,, 6DIII3I$II81D)I%II+.III'III$I2I+IIII+II<II46IIAHI.2IB


                      • lh3 (Senior Member, joined Feb 2008, 686 posts), #41
                        You are invoking pileup with "-c" and you should also read this page:



                        A "*" in the read bases means a deletion. The second line at "chr1 1949878" is an indel call; strictly speaking, it is not part of the pileup.
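The read-base column can be split into one symbol per read with a short Python sketch. It assumes the usual pileup conventions: "." and "," match the reference on the forward and reverse strand, letters are mismatches, "*" is a deletion, "^" marks a read start and is followed by one mapping-quality character, "$" marks a read end, and "+N"/"-N" followed by N bases annotates an insertion or deletion after the current base. Indel-call lines such as the second chr1 1949878 line (reference column "*") should be skipped before calling it. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_read_bases(bases: str) -> Tuple[List[str], int, int, List[str]]:
    """Split a pileup read-base column into one symbol per read.

    Returns (symbols, n_starts, n_ends, indels): '*' symbols are deletions,
    and indels holds the +N/-N annotations such as '+1C'.
    """
    symbols: List[str] = []
    indels: List[str] = []
    starts = ends = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # read start; the next character is a mapping quality, not a base
            starts += 1
            i += 2
        elif c == "$":
            ends += 1  # the read whose base came just before ends here
            i += 1
        elif c in "+-":
            # indel annotation: sign, length digits, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            indels.append(bases[i:j + length])
            i = j + length
        else:
            symbols.append(c)
            i += 1
    return symbols, starts, ends, indels
```

With -c output the read bases are column 9, and len(symbols) should equal the depth in column 8.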


                        • nilshomer (Nils Homer, joined Nov 2008, 1283 posts), #42
                          I have a working patch for viewing ABI SOLiD color space in the samtools (http://samtools.sourceforge.net/) text viewer. Using output from BFAST (in SAM format), which includes the "CS" and "CQ" tags, some of the features are:

                          - option to display colors instead of nucleotides.
                          - option to color bases/colors by color (similar to coloring bases by the given base).
                          - option to color bases/colors based on color quality.
                          - the "." (dot) option when displaying color space will only show those colors that were corrected during alignment (i.e. the color errors).
                          - option to remove all insertions in the current display (in some regions, spurious insertions can cause a headache when viewing that region).

                          PM me and I can supply you with a source version.
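For readers unfamiliar with the CS tag: it stores the read in color space as a primer base followed by digits 0-3, and each color encodes a pair of adjacent bases. With the 2-bit codes A=0, C=1, G=2, T=3, a color is the XOR of the codes of the two bases, so a read can be decoded base by base. A minimal Python sketch (hypothetical function name; it rejects missing colors written as "." rather than handling them):

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def decode_color_space(cs: str) -> str:
    """Decode a color-space read (primer base + colors 0-3) to nucleotides.

    Each color is the XOR of the 2-bit codes of two adjacent bases, so every
    base follows from the previous base and one color.
    """
    prev = BASE_CODE[cs[0].upper()]  # primer base, not part of the read itself
    out = []
    for color in cs[1:]:
        if color not in "0123":
            raise ValueError(f"unsupported color {color!r}")
        prev ^= int(color)
        out.append(CODE_BASE[prev])
    return "".join(out)
```

Because each base depends on the one before it, a single miscalled color changes every later base in a naive decode, which is why seeing which colors were corrected (as with the "." option above) is informative.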


                          • emucaki (Member, joined Apr 2009, 12 posts), #43
                            Hi, I'm a novice geneticist who is interested in using the 1000 Genomes Project data available on NCBI, and I can't quite figure out how to obtain sequence information from the BAM file; the SAMtools website is of little help. I am wondering if anyone knows a good place to find information on this kind of work.

                            (Off-topic: does anyone know why the 1000 Genomes Project has a log-in but no register option?)


                            • lh3 (Senior Member, joined Feb 2008, 686 posts), #44
                              The first thing you may want to try is:

                              samtools view -h aln.bam | less -S
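To go one step further and pull out just the read names and sequences, the text from samtools view can be piped into a short script. This Python sketch assumes the standard SAM columns (QNAME in column 1, FLAG in column 2, SEQ in column 10). Since SEQ is stored on the forward strand of the reference, reads with flag 0x10 set are reverse-complemented back to the orientation in which they were sequenced. The function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_sequences(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence as sequenced) from SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines, present when using `samtools view -h`
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        name, flag, seq = cols[0], int(cols[1]), cols[9]
        if seq != "*" and flag & 0x10:
            # reverse strand: undo the reverse complement applied in SAM
            seq = seq.translate(COMPLEMENT)[::-1]
        yield name, seq
```

For example, a script that loops over read_sequences(sys.stdin) and prints each pair as FASTA can be used as: samtools view aln.bam | python script.py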


                              • mhc (Junior Member, joined Jun 2008, 2 posts), #45
                                Understanding samtools pileup output

                                Hi,

                                I'm having trouble parsing the samtools pileup output. In the example below, position 60 has 108 reads. As I understand it, 8 reads terminate there (since there are 8 '$'s), and there are 2 new reads (marked by '^') on the next line.
                                So the next line, position 61, should have 108-8+2=102 reads.
                                Instead, it has 99.
                                What am I missing here?
                                This is the 40th line of input with 40bp reads, and this is the first instance where '$' appears. Other lines seem to work out fine.


                                seq1 60 a 108 .$,$,$,$,$g$....ggt*.G,g,,.,,+2tt,+3agcG.,+4atgc,+4ttgcg.c,c.$.,,,..$,+4aggat+6ccgttt,..,tt,.,,..,.+7CTGCCTG,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,^].^],^], CBB=ABBA>BBCBB7BB<BBBBCBBBBBCBB@BB@BBCB:CBBAACC>ABCBBBBBBBBBCC9BBABB@BB<BBB7CBBBBABBBBBCBBBBB@BB;BBBC@CBCBCB
                                seq1 61 g 99 A$.$.$.$t$c$t$,..,,a,A,,$..,.a,,.,+4gcag,,.,,$,A.,,+7ctgtttg,t$A,a..,.,.,.,,.,,..,.,..,.,.,,.,,..,,.,.,.,.,.,,.,.,.,.,,.,,^].^]t BCAABB@B1BB<BBBBBBBBBBBBBACBB@BBABBBBBBBBBCBBCCBBBBBBBBCCBBB6BBBBBBBBBBBABCBBBBBBBBBBBBCAC?BCCBCBB@
                                Last edited by mhc; 05-01-2009, 10:30 AM.
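The bookkeeping in the question can be made explicit with a small Python sketch. One pitfall when counting by hand: the character after each '^' is a mapping quality and can itself be '$' or '^', so it must be consumed together with the '^'. In the excerpt above every mapping-quality character is ']', so this pitfall does not account for the gap between 102 and 99; the sketch only checks the arithmetic under the post's assumption. The function names are hypothetical.

```python
import re

# '^' marks a read start and is always followed by one mapping-quality
# character, which may itself be '$' or '^'; match the pair as a unit.
READ_START = re.compile(r"\^.", re.DOTALL)


def count_starts(bases: str) -> int:
    """Number of reads starting at this position."""
    return len(READ_START.findall(bases))


def count_ends(bases: str) -> int:
    """Number of reads ending at this position."""
    # Drop '^' + mapping-quality pairs first so a quality of '$' is not counted
    return READ_START.sub("", bases).count("$")


def expected_next_depth(depth: int, bases_here: str, bases_next: str) -> int:
    """Depth at the next position, assuming every read there either continued
    from this position or starts at the next one (the assumption in the post)."""
    return depth - count_ends(bases_here) + count_starts(bases_next)
```

For the lines above, expected_next_depth(108, bases_60, bases_61) gives 102, where bases_60 and bases_61 are the read-base columns, so the missing 3 reads must come from something this simple model leaves out.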
