
  • BWA generating incorrect CIGAR string?

    I aligned a single-end sample against the HG18 reference using the latest BWA, then tried to convert the SAM file to a BAM file using samtools.

    I got the following error:
    Parse error at line 119: sequence and quality are inconsistent

    Line 119 looks like this:
    HWI-EAS266_0011:1:1:6:1607#0 16 12 2662146 37 1S35M * 0 0 GGGAACAAATGTGGGGAGGCAGAGGCAGGTCCCTGA $ $$""####$""$#$"###

    I searched around and have seen people discussing this, but found no real solution.

    Does anyone have any idea?
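
To find which records trigger this kind of error before running samtools, a small check like the one below may help. It is only a sketch: the helper names (`cigar_query_length`, `check_record`) are made up for illustration, it assumes tab-delimited SAM lines, and the quality string in the usage example is a placeholder, not the real one from line 119.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query) sequence.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Return the number of read bases the CIGAR accounts for, or None for '*'."""
    if cigar == "*":
        return None
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that contain anything other than valid operations.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError("malformed CIGAR: %r" % cigar)
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def check_record(line):
    """Return a list of problems found in one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    cigar, seq, qual = fields[5], fields[9], fields[10]
    problems = []
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append("SEQ length %d != QUAL length %d" % (len(seq), len(qual)))
    qlen = cigar_query_length(cigar)
    if qlen is not None and seq != "*" and qlen != len(seq):
        problems.append("CIGAR covers %d bases, SEQ has %d" % (qlen, len(seq)))
    return problems


# Usage: a record shaped like line 119, with a placeholder 20-character quality.
seq = "GGGAACAAATGTGGGGAGGCAGAGGCAGGTCCCTGA"
record = "\t".join(["read1", "16", "12", "2662146", "37", "1S35M",
                    "*", "0", "0", seq, "#" * 20])
print(check_record(record))
```

In this example the CIGAR `1S35M` accounts for 36 read bases, which matches the 36-base sequence, so only the sequence/quality length mismatch is reported. That would point at the quality column rather than the CIGAR string.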

  • #2
    I have had this error a couple of times as well, and found that if I reran sampe/samse and tried the conversion again, it was fine.



    • #3
      CIGAR field only contains * or \d+M

      Hi,

      I noticed that the CIGAR strings in my bwa mapping output file (paired-end Illumina reads against a reference sequence file) contain either * or "\d+M" (for example "35M") when I use -s for better speed (-s disables Smith-Waterman for the unmapped mate). I thought the option only affected the unmapped mate. Is it true that only "\d+M" is reported when the "-s" option is used with "bwa sampe"? With this option, does it only report matches that cover the whole read length and ignore partial matches?


      Thanks!

      Bob
      Last edited by nntao; 04-30-2011, 07:20 AM. Reason: More testing answered partially own question
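
One way to test this on your own output is to tally the "shape" of every CIGAR string in the SAM file, once with -s and once without, and compare the counts. The sketch below is an illustration only: `cigar_shape` and `tally_cigar_shapes` are hypothetical helper names, and the input is assumed to be plain tab-delimited SAM text.

```python
import re
from collections import Counter


def cigar_shape(cigar):
    """Reduce a CIGAR string to its operation letters, e.g. '1S34M' -> 'SM'."""
    if cigar == "*":
        return "*"
    return "".join(re.findall(r"[MIDNSHP=X]", cigar))


def tally_cigar_shapes(sam_lines):
    """Count CIGAR shapes over an iterable of SAM lines, skipping headers."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment line
        counts[cigar_shape(fields[5])] += 1
    return counts


# Usage with a few made-up lines; for a real file, pass an open file handle.
example = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t0\tref\t1\t37\t35M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t0\tref\t5\t37\t1S34M\t*\t0\t0\t*\t*",
    "r4\t0\tref\t9\t37\t30M1I4M\t*\t0\t0\t*\t*",
]
print(tally_cigar_shapes(example))
```

If the run with -s shows only `M` and `*` while the run without it also shows shapes such as `SM` or `MIM`, that would support the idea that the option changes more than the unmapped mates.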



      • #4
        I have something to share:
        Look at the following records, generated by BWA and then samtools from paired-end reads. The five reads are identical, so why are they mapped to different locations, and why are the CIGAR strings "*"? (Ignore the "N"s; the reference sequence includes a region identical to the read's sequence.)



        HWI-ST565_0121:4:2207:1671:63901#ATCACG 181 segment1 19 0 * = 19 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB`bbcccccddb_`eeeeegbgggihiihghffiihgfhiiihhiihhfghhgcbhfhfiiiihhhg
        HWI-ST565_0121:4:1108:5261:43887#ATCACG 117 segment1 21 0 * = 21 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdccccccdddddeeeeeggggghdhiiiiiiiihiihiiihihiiiihiiihgfbihiiifgde^
        HWI-ST565_0121:4:2106:9301:25723#ATCACG 181 segment1 22 0 * = 22 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdbcccccdbbdbeeeeegggggiiihiiiihhghiiihhiiiiiiiiiiihhhihiiiiifggdX
        HWI-ST565_0121:4:1103:2424:11895#ATCACG 181 segment1 24 0 * = 24 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBcdccbb^bbbb__ebaaeggfeggeiiihhhhiiiggihfgcgihiihhehihfebhhiiihggb^
        HWI-ST565_0121:4:2106:3549:50867#ATCACG 117 segment1 25 0 * = 25 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACTGATAGCCAGACAGCCATCAAAAGGATTCGTTTGGAGGAATCAAAATAAAATCACTAAAAATGA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_cb^ZZZbbb]_Za_a]bbgdd^__bcfdghhhffhhhhfccgfcbhfffg`fcaShgagdffbbP



        • #5
          Originally posted by xchen5
          I have something to share:
          Look at the following records, generated by BWA and then samtools from paired-end reads. The five reads are identical, so why are they mapped to different locations, and why are the CIGAR strings "*"? (Ignore the "N"s; the reference sequence includes a region identical to the read's sequence.)
          All five reads have the 4 flag set (181 = 128+32+16+4+1, 117 = 64+32+16+4+1). They are really unmapped, no matter what the rest of the line looks like. The SAM spec calls for an unmapped read to be given the mapping position of its partner, so that the two reads sort together.
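
The flag arithmetic above can be reproduced with a short Python sketch. The bit values follow the SAM specification; the function name `decode_flag` and the short labels are my own choices for illustration.

```python
# SAM FLAG bits and short labels (bit values from the SAM specification).
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


print(decode_flag(181))
# → ['paired', 'unmapped', 'reverse', 'mate_reverse', 'second_in_pair']
print(decode_flag(117))
# → ['paired', 'unmapped', 'reverse', 'mate_reverse', 'first_in_pair']
```

Both values include `unmapped` (the 4 bit), which is why the CIGAR is "*" even though the records carry a reference name and position.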



          • #6
            Hi, I have the solid2fastq.pl script from bwa-0.5.9. I have two files, SolF3.csfasta and SolF3_QV.qual, which I want to convert to FASTQ. I ran the command:

            perl solid2fastq.pl Sol SolTest

            I get the file SolTest.single.fastq.gz, but after I unzip it there are no reads in it, even though my input files contain a good and matching number of reads. Can you explain the reason, if you have any idea?


            Strangely, the same command works fine with another set of files.
            Last edited by Brajbio; 09-15-2011, 06:09 AM.



            • #7
              Originally posted by swbarnes2
              All five reads have the 4 flag set (181 = 128+32+16+4+1, 117 = 64+32+16+4+1). They are really unmapped, no matter what the rest of the line looks like. The SAM spec calls for an unmapped read to be given the mapping position of its partner, so that the two reads sort together.
              Thanks, swbarnes2.

              My other question is this: those identical reads (with the "N"s removed) have an identical region in the reference, so why do they end up as unmapped reads?

              Thanks in advance for any useful hints.
