Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • zee
    NGS specialist
    • Apr 2008
    • 249

    #46
    I was wondering if I could get some help on the SAM specification.

    I get the error:

    samtools: bam_lpileup.c:115: tview_func: Assertion `l == tv->n_pre' failed
    When I page the viewer to my reads containing indels. A sample of my data is

    suis_indels_30749_30917_a0b7e 147 Streptococcus_suis 30782 150 60M7D15M * 0 0 CAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCA YX[QRQZETFQLZT[FQMX]_``\SZ```_`^``_[``^XZ```^``__``X]T[```_^`_````````````` NM:i:7 MD:Z:60M7D15M
    suis_indels_30862_31024_a3384 131 Streptococcus_suis 30801 150 41M7D34M * 0 0 AGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTT `````H^_^^_```Z]_``````]``^]``_]```]]X````]_]^````]Z_URYT]]\\]_X``_^][U^``\ NM:i:7 MD:Z:41M7D34M
    suis_indels_30874_31073_a388e 131 Streptococcus_suis 30813 150 29M7D46M * 0 0 CGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAA ```^`\[`````````````_X][U]RIST\`XDDLIHGK]J\XMFVER^]`````[POHLGTU]^_Z``]]]KV NM:i:7 MD:Z:29M7D46M
    suis_indels_30890_31082_a9685 131 Streptococcus_suis 30829 150 13M7D62M * 0 0 TAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGAAAAATTGAATGAAGCTGGT `]__`ZOY^``ZUV``]TTY`_``````^LU```_`_``XV[````^\`\\^UU[`XHOPKITNEUNIRRNDWW` NM:i:7 MD:Z:13M7D62M
    suis_indels_16281_16438_b9354 131 Streptococcus_suis 16224 150 6M1D69M * 0 0 AAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAG`[``````_`_I_]O[__`__Y__VFMY\````^Z```^Z\U``_``````]S`\Y^[_]^```^]`_\Y^ZQXH NM:i:2 MD:Z:6M1D69M
    suis_indels_2007371_2007505_bf820 147 Streptococcus_suis 2007364 150 71M3D4M * 0 0 GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAAAATTATCTGTGGAATTTT MRT^\_]RZ__`]]YP[^`ZT`T]VH``^]^TS]]``]^^````````_W_`_\\```_````````````\^`^ NM:i:3 MD:Z:71M3D4M
    suis_indels_30783_30934_c09ed 147 Streptococcus_suis 30799 150 43M7D32M * 0 0 TAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCT U]PRKMJTRKIK___`Y`\]`\^```ZXVYMZX]`_```Z``_``WWQ]_````_]``]````````````Z_W` NM:i:7 MD:Z:43M7D32M
    suis_indels_16171_16358_c4122 147 Streptococcus_suis 16227 150 3M1D72M * 0 0 TGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTGAGAGGNFMU``_`_`NTPV]^__^]YYQLX^ZK[^[[[^^``]RUX``Z]Y\Z````````^``_`]U\`^_```]```` NM:i:1 MD:Z:3M1D72M
    suis_indels_16164_16353_ce374 147 Streptococcus_suis 16222 150 8M1D67M * 0 0 CCAAATGAGATATTTACGAATTATCTCACGCAATACAATCACCGGGATTGATAAACGATGAATTATTAATGTTG[]]^[]\X[^`\ZZX`^_^\_``````\`_JL[\R]X```_`````_``_Y[]`````[\````Z```````^_H NM:i:2 MD:Z:8M1D67M
    suis_indels_30842_31012_d14fb 131 Streptococcus_suis 30781 150 61M7D14M * 0 0 CCAGATTGTTGCGGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTC ```_`_`````]`````_```^_VUW^```_^_ZXZT^`````__]YNXTEKV`_YXNRW\\REHDDLSGMFURD NM:i:7 MD:Z:61M7D14M
    suis_indels_30721_30932_d9806 147 Streptococcus_suis 30797 150 45M7D30M * 0 0 TATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCAT PPJ[TFRJ``````TVU`__`^^`_]_`````GFPZTWY\VX`_````__``````^`````````````````` NM:i:7 MD:Z:45M7D30M
    suis_indels_30785_30945_e17b9 147 Streptococcus_suis 30810 150 32M7D43M * 0 0 CGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATTTCATCTTTGATTTTTGA TFWOOKDUNQZ`^RU^^`STTSVW^_``````^`_V``_``````````\[[`XR_```_W\]O_``__`_T^`` NM:i:7 MD:Z:32M7D43M
    suis_indels_30737_30928_e6336 147 Streptococcus_suis 30793 150 49M7D26M * 0 0 GGTTTATAAGGATCAGGCGACGGCTTTTAATGGTGGTAAGAAAGAACAGGCAAGGGCCGGCTCAATAATCTGATT LMRUPOTQPY[V]]^``]TS\^`\^`UXX^S^```_^Z``\\``_```^Y^Q]```````V\]^``^``K^```` NM:i:7 MD:Z:49M7D26M
    And I'm using novoalign format where I've modified the novo2sam.pl script to produce a 'correct' CIGAR format. Most samtools operation work except for the tview

    Comment

    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #47
      I have also had a lot of problems when viewing locations with many insertions and deletions. I have been trying to find the bug in the source code, but so far no luck.

      Nils

      Comment

      • lh3
        Senior Member
        • Feb 2008
        • 686

        #48
        Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.
        Last edited by lh3; 05-04-2009, 10:22 AM.

        Comment

        • gcrdb
          Junior Member
          • Jan 2009
          • 9

          #49
          Hi,
          The bwa aln is fast (few hours),
          but somehow bwa samse(or sampe) is slow.
          This program have been running for almost a week:
          bwa samse -n 255 mouse_genome.fa data.sai data.fq >data.sam

          My data.fq has about 80 million reads. Is that to much reads for samse(pe)? Should I split them and merge?

          Comment

          • lh3
            Senior Member
            • Feb 2008
            • 686

            #50
            Bwa and samtools are two separated packages. I will start a new thread about questions related to bwa.

            I answer your question here. I guess you are using reads shorter than 30bp. BWT based algorithms are generally slow for enumerating multiple hits and very short reads are more likely to have many hits. Therefore, BWA's 'sampe' and 'samse -n' are inefficient for these short reads. Probably non-BWT algorithms are better for such alignment. If you really want to align very short reads, you may consider use smaller -n with samse and smaller -o (say 200) with sampe. Using smaller -o will make pairing less effective, though.

            The memory of bwa is almost independent of the number of input reads and the speed is linear in that. You can separate input into smaller batches.
            Last edited by lh3; 05-04-2009, 11:03 AM.

            Comment

            • zee
              NGS specialist
              • Apr 2008
              • 249

              #51
              Originally posted by lh3 View Post
              Please check out the latest version (0.1.3-9 r256) from SVN. I fixed a kind of typo but I am not sure if this also fixes this assertion failure. It is not easy to see this. Please let me know if this works. Many thanks.
              I have checked out the 'latest' version from SVN:

              Program: samtools (Tools for alignments in the SAM format)
              Version: 0.1.3-16 (r267)


              This seems to be a later version, would it contain the fix?

              Comment

              • jkbonfield
                Senior Member
                • Jul 2008
                • 146

                #52
                Feature request

                I find it cumbersome that samtools import *requires* an external index file to tell it what reference sequences are present. This is even true if there is a header in the .sam file.

                For my own sanity I knocked up this simple, albeit slow, perl script:

                Code:
                #!/usr/bin/perl -w
                
                # Reads a sam file and generates an index suitable for "samtools import".
                
                use strict;
                
                #--- Build index
                my %ref; #key=name, value=length
                while (<>) {
                    my @F=split(" ", $_);
                    my $rname = $F[2];
                
                    next if $rname eq "*";
                    my $end = $F[3] + length($F[9]);
                
                    $ref{$rname} = $end if (!exists($ref{$rname}) || $ref{$rname} < $end);
                }
                
                #--- Output input to stdout
                foreach (sort keys %ref) {
                    print "$_\t$ref{$_}\n";
                }
                It generates a file suitable for passing as the 1st argument after "samtools import". Admittedly it doesn't know the true ending of the reference sequence, but for de-novo assemblies the reference is just the consensus anyway so it's faked.

                Note that the above is an estimate. It doesn't consider the fact that the sequences may have CIGAR string possibly altering the mapped coordinates of the reference, but it's a start at least.

                Comment

                • lh3
                  Senior Member
                  • Feb 2008
                  • 686

                  #53
                  The latest SVN parses the header @SQ lines. You need to use the "view" command to do this.

                  Comment

                  • lh3
                    Senior Member
                    • Feb 2008
                    • 686

                    #54
                    On behalf of the Java API developers: the Java APIs and command-line tools are formally released here (also linked from http://samtools.sourceforge.net):



                    The Java implementation is more flexible than SAMtools-C on sorting/viewing/merging/rmdup. Developers may also find Java APIs are easier to use than the C APIs.

                    By the way, Lincoln Stein has implemented initial Perl bindings for the SAM/BAM format, based on the C APIs. This Perl module will become part of BioPerl as well as GBrowse.

                    Comment

                    • zee
                      NGS specialist
                      • Apr 2008
                      • 249

                      #55
                      Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

                      I have some maq-based pipelines I would like to use on my BWA results.

                      Comment

                      • pparg
                        Member
                        • Aug 2008
                        • 19

                        #56
                        I read through the latest SAM Format Specification on optional fields. But I still don’t understand some cases in the .sam file I got. For instances:
                        XT:A:R NM:i:0 X0:i:30406 XM:i:0 XO:i:0 XG:i:0
                        XT:A:R NM:i:1 X0:i:2 X1:i:1 XM:i:1 XO:i:0 XG:i:0 MD:Z:3G32
                        I just can’t figure out what do those X? fileds mean? I know these are are reserved for local use and re not be formally defined. But what is the point here? What are the types A, i, Z etc for?
                        Any hint/ideas would be appreciated.

                        Comment

                        • lh3
                          Senior Member
                          • Feb 2008
                          • 686

                          #57
                          X? tags are for aligner specific information. You need to read the documentation of the program that generates these tags.

                          Comment

                          • jess
                            Junior Member
                            • May 2009
                            • 7

                            #58
                            Originally posted by zee View Post
                            Is there a way to convert a SAM consensus output (using -c option for pileup) to the old maq-style .cns consensus?

                            I have some maq-based pipelines I would like to use on my BWA results.
                            Yes, I am looking for the same. Heng li cud u help out plz?

                            Comment

                            • corthay
                              Member
                              • Oct 2008
                              • 25

                              #59
                              I would like to generate bam file from blast result of EST.

                              Firstly, I made sam file from blast result using blsat2sam.pl in samtools packages.
                              Secondly, I have tried to convert from sam file into bam file using "samtools import" but I've got following error.

                              Parse error at 1: CIGAR and sequence length are inconsistent.

                              seq1 16 chr20 18289430 255 23H426M760H * 0 0 * * AS:i:844 EV:Z:0.0

                              Could you give me any idea please ?

                              Comment

                              • lh3
                                Senior Member
                                • Feb 2008
                                • 686

                                #60
                                @corthay

                                You can convert with "blast2sam.pl -s" to save the sequence in SAM. Currently, samtools cannot parse SAM without sequence, although the specification allows this.

                                @jess and zee

                                Sorry. I am afraid that you cannot easily convert pileup output to maq-cns. You can call SNPs/indels directly from samtools and in SVN samtools.pl now filters SNPs/indels in the maq-like way. .cns has been replaced with glf, although they contain different information.

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  06-02-2026, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Today, 08:59 AM
                                0 responses
                                7 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 12:03 PM
                                0 responses
                                21 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 11:40 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                29 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...