  • foxyg
    Member
    • May 2010
    • 54

    BWA generating incorrect CIGAR string?

    I aligned a single-end sample against the HG18 reference using the latest BWA. Then I tried to convert the SAM file to a BAM file using samtools and got the following error:
    Parse error at line 119: sequence and quality are inconsistent

    Line 119 looks like this:
    HWI-EAS266_0011:1:1:6:1607#0 16 12 2662146 37 1S35M * 0 0 GGGAACAAATGTGGGGAGGCAGAGGCAGGTCCCTGA $ $$""####$""$#$"###

    I searched around and saw people discussing this, but found no real solution.

    Does anyone have any idea?
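
    One way to find such records before running the conversion is to compare the lengths directly. The following is a minimal sketch, not samtools' own check: it assumes tab-separated SAM alignment lines, and the function names (cigar_query_length, check_record) and the example record are made up for illustration. It reports a problem when SEQ and QUAL differ in length, or when the CIGAR accounts for a different number of read bases than SEQ contains.

```python
import re

# One CIGAR operation: a count followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume bases of the read (query) sequence.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Return the number of read bases the CIGAR accounts for, or None for '*'."""
    if cigar == "*":
        return None
    ops = CIGAR_OP.findall(cigar)
    # If the matched pieces do not rebuild the string, it had stray characters.
    if "".join(count + op for count, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(count) for count, op in ops if op in QUERY_CONSUMING)


def check_record(line):
    """Return a list of length problems found in one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return [f"expected at least 11 fields, found {len(fields)}"]
    cigar, seq, qual = fields[5], fields[9], fields[10]
    problems = []
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append(f"SEQ has {len(seq)} bases but QUAL has {len(qual)} values")
    query_length = cigar_query_length(cigar)
    if seq != "*" and query_length is not None and query_length != len(seq):
        problems.append(f"CIGAR covers {query_length} bases but SEQ has {len(seq)}")
    return problems


if __name__ == "__main__":
    # Hypothetical record: 36 bases in SEQ, only 20 quality values.
    record = "\t".join(
        ["read1", "16", "12", "100", "37", "1S35M", "*", "0", "0", "A" * 36, "#" * 20]
    )
    for problem in check_record(record):
        print(problem)
```

    Running check_record over every non-header line of the SAM file would list the records that need a closer look. It does not explain why the aligner wrote them that way.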
  • flipwell
    Member
    • Feb 2011
    • 14

    #2
    I have had this error a couple of times as well, and found that if I reran sampe/samse and tried the conversion again, it was fine.


    • nntao
      Junior Member
      • Jan 2010
      • 4

      #3
      CIGAR field only contains *|\d+M

      Hi,

      I noticed that the CIGAR strings in my bwa mapping output (paired-end Illumina reads against a reference sequence file) contain either * or a "\d+M" pattern such as "35M" when I use -s (which disables Smith-Waterman for the unmapped mate) for better speed. I thought the option only affected the unmapped mate. Is it true that only "\d+M" is reported when the "-s" option is used with "bwa sampe"? With this option, does it report only matches that cover the whole read length and ignore partial matches?


      Thanks!

      Bob
      Last edited by nntao; 04-30-2011, 07:20 AM. Reason: More testing answered partially own question


      • xchen5
        Junior Member
        • Mar 2010
        • 3

        #4
        I have something to share:
        Look at the following records, generated by BWA and then samtools from paired-end reads. The five reads are identical, so why are they mapped to different locations, and why is the CIGAR "*"? (Ignore the "N"s; the reference sequence includes a region identical to the read's sequence.)



        HWI-ST565_0121:4:2207:1671:63901#ATCACG 181 segment1 19 0 * = 19 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB`bbcccccddb_`eeeeegbgggihiihghffiihgfhiiihhiihhfghhgcbhfhfiiiihhhg
        HWI-ST565_0121:4:1108:5261:43887#ATCACG 117 segment1 21 0 * = 21 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdccccccdddddeeeeeggggghdhiiiiiiiihiihiiihihiiiihiiihgfbihiiifgde^
        HWI-ST565_0121:4:2106:9301:25723#ATCACG 181 segment1 22 0 * = 22 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdbcccccdbbdbeeeeegggggiiihiiiihhghiiihhiiiiiiiiiiihhhihiiiiifggdX
        HWI-ST565_0121:4:1103:2424:11895#ATCACG 181 segment1 24 0 * = 24 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdccbb^bbbb__ebaaeggfeggeiiihhhhiiiggihfgcgihiihhehihfebhhiiihggb^
        HWI-ST565_0121:4:2106:3549:50867#ATCACG 117 segment1 25 0 * = 25 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_cb^ZZZbbb]_Za_a]bbgdd^__bcfdghhhffhhhhfccgfcbhfffg`fcaShgagdffbbP


        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          Originally posted by xchen5
          I have something to share:
          Look at the following records, generated by BWA and then samtools from paired-end reads. The five reads are identical, so why are they mapped to different locations, and why is the CIGAR "*"? (Ignore the "N"s; the reference sequence includes a region identical to the read's sequence.)
          All five reads have the 4 flag set (181 = 128+32+16+4+1, 117 = 64+32+16+4+1). They are really unmapped, no matter what the rest of the line looks like. The SAM spec calls for an unmapped read to be given the mapping position of its partner, so that the two reads sort together.
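
          The same decomposition can be done programmatically. This is a minimal sketch: the bit values are the ones defined in the SAM specification, while the label strings and the decode_flag helper are names chosen here for illustration only.

```python
# FLAG bits from the SAM specification; the labels are informal names.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    for flag in (181, 117):
        print(flag, decode_flag(flag))
    # 181 -> paired, unmapped, reverse_strand, mate_reverse_strand, second_in_pair
    # 117 -> paired, unmapped, reverse_strand, mate_reverse_strand, first_in_pair
```

          Both values include the 0x4 (unmapped) bit, which is why the CIGAR is "*" even though RNAME and POS are filled in.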


          • Brajbio
            Member
            • Jun 2010
            • 20

            #6
            Hi, I am using solid2fastq.pl from bwa-0.5.9. I have two files, SolF3.csfasta and SolF3_QV.qual, which I want to convert to FASTQ. After running the command:

            perl solid2fastq.pl Sol SolTest

            I get the file SolTest.single.fastq.gz, but it contains no reads after I unzip it, even though my input files contain a good, matching number of reads. Can you explain the reason, if you have any idea?

            Strangely, the same command works fine with another set of files.
            Last edited by Brajbio; 09-15-2011, 06:09 AM.


            • xchen5
              Junior Member
              • Mar 2010
              • 3

              #7
              Originally posted by swbarnes2
              All five reads have the 4 flag set (181 = 128+32+16+4+1, 117 = 64+32+16+4+1). They are really unmapped, no matter what the rest of the line looks like. The SAM spec calls for an unmapped read to be given the mapping position of its partner, so that the two reads sort together.
              Thanks, swbarnes2.

              My other question is this: those identical reads (with the "N"s removed) match an identical region in the reference, so why do they end up unmapped?

              Thanks in advance for any useful hints.
