
  • #31
    Originally posted by litali
    I also attach a few more reads from the first lines of the SAM file:
    Code:
    F01BJ5E01DBVMG  0       chrX    138807718       100  ...
    I see the flag in column 2 is always 0 or 16, though I do have the mapping positions.
    So, do you think it is a problem with the BAM file?
    Thank you a lot!
    The flags are fine. I suspect one issue is simply the starting coordinate. If the file has been sorted and your first reads align at base 138807718, then the programs you are using may simply be starting up showing a blank region of the reference.

    I've found in some cases that it can actually be quite hard to find places with alignments by scrolling around if all you have is some pulldown library. You either need a way to skip ahead to places with alignments or know (by reports or reading the SAM file) the appropriate locations.
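One way to learn the appropriate locations by reading the SAM file is sketched below. It is only an illustration: the function name `first_alignment_positions` is made up here, and it assumes plain-text, tab-separated SAM lines (for example the output of `samtools view -h`), not binary BAM.

```python
def first_alignment_positions(sam_lines):
    """Return {reference name: smallest 1-based POS} over the mapped reads."""
    first = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete alignment record
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or rname == "*" or pos == 0:
            continue  # read is unmapped, so it has no position to jump to
        if rname not in first or pos < first[rname]:
            first[rname] = pos
    return first
```

Calling it on an open file handle, e.g. `first_alignment_positions(open("reads.sam"))` with a hypothetical file name, gives one coordinate per reference to type into the viewer's "go to" box.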

    I'm more intrigued by the error from picard about "Read name F01BJ5E01DP1XH, No M or N operator between pair of I operators in CIGAR". Could you quote this read? It seems a strange alignment and I cannot see why it would occur that way. Then again there's nothing that explicitly states it as invalid, although it defies point 2 of the recommended practice in the SAM spec.

    Certainly the example you gave had a portion with neighbouring I and D operators: 15M1D1I118M. I know this shows up bugs in some programs (samtools, maybe more), although personally I think it's fine and sometimes the right alignment just is that strange looking. Perhaps not with pairwise alignments, but certainly these things crop up from multiple alignment.



    • #32
      Originally posted by jkbonfield
      I'm more intrigued by the error from picard about "Read name F01BJ5E01DP1XH, No M or N operator between pair of I operators in CIGAR". Could you quote this read? It seems a strange alignment and I cannot see why it would occur that way. Then again there's nothing that explicitly states it as invalid, although it defies point 2 of the recommended practice in the SAM spec.

      Certainly the example you gave had a portion with neighbouring I and D operators: 15M1D1I118M. I know this shows up bugs in some programs (samtools, maybe more), although personally I think it's fine and sometimes the right alignment just is that strange looking. Perhaps not with pairwise alignments, but certainly these things crop up from multiple alignment.
      I've seen this kind of thing from other Newbler BAM files - it stems from the way they currently use deletion/inserts rather than substitutions.

      As a specific example, here is one "failing" read from my data (in Picard's lenient mode it is just a warning):

      SAM validation error: ERROR: Read name GD0J0NL04IXFMD, No M or N operator between pair of I operators in CIGAR

      Code:
      GD0J0NL04IXFMD  0       contig00018     1       100     1H140M2I1M2I1D1I1D2M1I23M1D21M1I8M      *       0       0       ACGGACTTTCCCAGCAGTCAGCATGGATCAGTCGCAGGCACTCAAGGAGACCGACGAACACCGTCAAATGCGACGAATTGCTTTCGTTGCGGTCGTTGTTTCAACGGTCGCTGTGATTGCATCGGTTGTCACCCTGCCGANGTTTTTCCTACAATTATGTTCAATCCTTCCATCGCATTTGATGGTCGAGACTAGATTACTG      FFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIHHHHIIHHHF??==FF88?CCCCCCCFFFFFFFFFFFFFFFFFFFFFF?:::DDBBAAAAADD888FFFFFFFFFFDDDFFFFFFFFFFFFFFFFFFFFFFFFF@@ACDDDD???A6668DDDDBBAACDDCFCAAABBBAAAA6654<=99=:!3,----35
      Breaking up the CIGAR string, 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, it seems Picard is unhappy with the 2I 1D 1I 1D. Keeping in mind SAM/BAM is essentially a collection of pairwise alignments, I would have expected something like 3I 2D or perhaps 1I 2M (or 1I 2X with the new style).

      (But as you say, this is still legal in the current wording of the specification - it's another example of Picard being extra strict)
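The "3I 2D" form mentioned above can be derived mechanically. The sketch below is not what Newbler, Picard or samtools do; `parse_cigar` and `collapse_indel_runs` are names invented for this illustration. It merges every uninterrupted run of I and D operators into a single I followed by a single D (that order is an arbitrary choice). The M operators are untouched, so the same bases stay paired, but the interleaving of the indels is lost, which may matter for multiple-alignment output.

```python
import re

_CIGAR = re.compile(r"(\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string (spaces allowed) into (length, op) tuples."""
    text = cigar.replace(" ", "")
    if not _CIGAR.fullmatch(text):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(text)]


def collapse_indel_runs(cigar):
    """Merge each adjacent run of I/D operators into one I and one D."""
    out = []
    ins = dele = 0
    # The (0, "END") sentinel flushes a run that ends the CIGAR string.
    for length, op in parse_cigar(cigar) + [(0, "END")]:
        if op == "I":
            ins += length
        elif op == "D":
            dele += length
        else:
            if ins:
                out.append(f"{ins}I")
            if dele:
                out.append(f"{dele}D")
            ins = dele = 0
            if op != "END":
                out.append(f"{length}{op}")
    return "".join(out)
```

For the read above, `collapse_indel_runs("1H140M2I1M2I1D1I1D2M1I23M1D21M1I8M")` rewrites the 2I 1D 1I 1D stretch as 3I 2D and leaves the rest of the string alone.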
      Last edited by maubp; 09-22-2011, 01:08 AM. Reason: (adding note about spec)



      • #33
        Thank you all for your help!
        jkbonfield, I got this error, "No M or N operator between pair of I operators in CIGAR", in Picard for all the reads.



        • #34
          Originally posted by maubp
          Breaking up the CIGAR string, 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, it seems Picard is unhappy with the 2I 1D 1I 1D. Keeping in mind SAM/BAM is essentially a collection of pairwise alignments, I would have expected something like 3I 2D or perhaps 1I 2M (or 1I 2X with the new style).

          (But as you say, this is still legal in the current wording of the specification - it's another example of Picard being extra strict)
          So the Picard warning is incorrect - there aren't two I records next to each other, rather it's I and D being adjacent.

          Also while most aligners that output to SAM/BAM are indeed pairwise, the actual file itself represents multiple alignments at each point so it is natural to assume we may one day be outputting SAM from a genuine multiple alignment tool, or in this case from a tidied up assembly. In such cases we expect such oddities in CIGAR. Indeed they often represent a better alignment.

          Programs complaining about such things (or worse, processing them incorrectly and completely losing or changing bases) have been a bugbear of mine for some time. (Ultimately it's why I ditched samtools and wrote my own code for importing SAM/BAM into Gap5.)



          • #35
            Originally posted by jkbonfield
            So the Picard warning is incorrect - there aren't two I records next to each other, rather it's I and D being adjacent.
            No, I think it is correct (*) but not clearly worded.

            Looking at 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, the 2I 1D 1I 1D stretch sits between two M operators and includes two I operators (and two D operators). Thus "No M or N operator between pair of I operators in CIGAR" applies, as does the other message I've seen from Picard in this context, "No M or N operator between pair of D operators in CIGAR".

            (*) As said earlier, this rule is one of Picard's inventions not in the SAM spec.
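The rule as read from those two messages can be sketched in a few lines. This is not Picard's source code: the function name `indels_repeated_without_separator` is invented here, and the assumption that only M and N act as separators comes purely from the wording of the messages (how Picard treats `=`, `X`, `P`, `S` or `H` is not stated in this thread, so the separator set is a parameter).

```python
import re


def indels_repeated_without_separator(cigar, separators="MN"):
    """Return the set of ops ('I', 'D') that occur twice with no separator between."""
    flagged = set()
    seen = set()  # I/D operators met since the last separator
    for _, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar.replace(" ", "")):
        if op in separators:
            seen.clear()
        elif op in "ID":
            if op in seen:
                flagged.add(op)
            seen.add(op)
    return flagged
```

Under these assumptions the read above is flagged for both I and D, whereas 15M1D1I118M is not flagged at all, since it has only one I and one D.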



            • #36
              That also means it will disallow use of P for padding within an insertion. Eg

              Code:
              Ref: ATGC---GAG
              Seq: ATGC---GAG     CIGAR 7M
              Seq: ATGCACTGAG     CIGAR 4M 3I 3M
              Seq: ATGCA-TGAG     CIGAR 4M 1I 1P 1I 3M
              Here the correct multiple alignment is achieved using P. So far though use of this has been limited.
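A quick way to sanity-check CIGAR strings like these is to count how many query and reference bases each one consumes. The sketch below follows the consumption table in the SAM specification (M, I, S, = and X consume the query; M, D, N, = and X consume the reference; H and P consume neither); the function name `cigar_lengths` is made up for this example.

```python
import re

QUERY_OPS = set("MIS=X")  # operators that advance along the read
REF_OPS = set("MDN=X")    # operators that advance along the reference


def cigar_lengths(cigar):
    """Return (query_bases, reference_bases) consumed by a CIGAR string."""
    query = ref = 0
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar.replace(" ", "")):
        n = int(n)
        if op in QUERY_OPS:
            query += n
        if op in REF_OPS:
            ref += n
    return query, ref
```

For the padded row, `cigar_lengths("4M 1I 1P 1I 3M")` counts 9 query bases, matching ATGCA-TGAG with the pad removed, and 7 reference bases, matching the ungapped reference ATGCGAG.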

              Even without P though there are alignments which are best mixing I and D. Eg:

              Code:
              Ref: ACGTTT-AAA-CCCACGT
              Seq: ACGTTTTAAACCCCACGT   CIGAR 6M 1I 3M 1I 7M
              Seq: ACGTTTT---CCCCACGT   CIGAR 6M 1I 3D 1I 7M
              This topic has come up recently on the samtools-dev mailing list too, related to using SAM/BAM for de novo assemblies (and so it's very relevant to the 454 output case).



              • #37
                Originally posted by jkbonfield
                This topic has come up recently on the samtools-dev mailing list too, related to using SAM/BAM for de novo assemblies (and so it's very relevant to the 454 output case).
                Yes, we should probably continue this discussion there. Start of thread is here:


                Peter
