Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • radood
    Junior Member
    • Oct 2019
    • 5

    Wrong bwa alignment?

    Hi,
    Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

    03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

    Since the read is on the negative strand, below is the forward sequence:
    CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

    However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here
    Image Screen-Shot-2019-10-24-at-12-02-45-PM hosted on ImgBB

    This is not a good alignment at all. Is BWA wrong? Am I missing something?
    Thank you for your input in advance!
  • radood
    Junior Member
    • Oct 2019
    • 5

    #2
    Checking to see if folks have feedback on this. I read that "BWA actually follows the SAM spec and reports Phred scores as MAPQ values.* The calculation is based on the number of optimal (best) alignments found, as well as the number of sub-optimal alignments combined with the Phred scores of the bases which differ between the optimal and sub-optimal alignments."

    Does that mean that the mapq score of a clipped read is a combination of the quality of the 'matched' alignment AND the quality of the alignment of the clipped bases? Perhaps this explains the strange finding I saw above.

    If so, how does one extract the mapping quality of only the 'matched' portion of the read at this location (29446797)?

    Comment

    • r.rosati
      Member
      • Aug 2015
      • 95

      #3
      Hi! Yes you can BLAST it, but since it's a short sequence it's best if you use the `blastn` algorithm instead of the `megablast` one. You can also decrease the word size to 7 to increase sensitivity.

      Here's your search, it'll be available until october 31:


      According to the BLAST results, the sequence matches chr2:29446798-29446833 (hg19).

      ...PS anyways the CIGAR string says "36M" but the alignment to GRCh37.p13 shows one mismatch.

      ...PPS oh, I just noticed - I know where you got it wrong! Bit 16 in the SAM flag does not simply mean "the read is aligned to the reverse strand", it means "the read is represented as its reverse-complement because it aligned to the reverse strand". Sequences in SAM files are always represented on the forward strand. So if bit 16 is 1, as in this case, it means that "CCCC..." is ALREADY the reverse-complement of the actual read. So you don't have to reverse-complement it again. In fact, you can find your sequence in the same screenshot you posted.
      Last edited by r.rosati; 10-29-2019, 09:46 AM.

      Comment

      • radood
        Junior Member
        • Oct 2019
        • 5

        #4
        Wow I do see it now!!! Thank you so very much Rosati. Yes this is super helpful and it makes a lot of sense

        I didn't realize that reads in the SAM file always represent the forward strand. How about cases when I check read.get_forward_sequence in pysam, sometimes the output is the reverse complement of the sequence in the read itself, like this example:

        Read:
        0a023638:0B0B 83 0 9999 0 144M 0 10005 144 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCC array('B', [12, 12, 41, 37, 32, 32, 32, 32, 41, 37, 37, 37, 22, 32, 27, 12, 37, 32, 27, 32, 27, 27, 41, 41, 27, 27, 32, 32, 32, 32, 32, 27, 37, 37, 41, 37, 37, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 37, 41, 41, 27, 37, 41, 41, 37, 41, 37, 41, 41, 41, 41, 37, 32, 41, 32, 41, 37, 32, 22, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 37, 27, 41, 41, 27, 37, 37, 12, 41, 41, 41, 37, 41, 37, 41, 41, 41, 37, 37, 27, 41, 37, 22, 37, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 37, 27]) [('MD', '0A143'), ('RG', 'HMMGHBBXX.lane0.2P_FMIEx_321'), ('NM', 1), ('AS', 143), ('XS', 137)]

        But read.get_forward_sequence() is:
        GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG

        Thank you so much!

        Comment

        Latest Articles

        Collapse

        • SEQadmin2
          Nine Things a Sample Prep Scientist Thinks About Before Sequencing
          by SEQadmin2


          I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


          Here are nine questions we think about, in roughly the order they matter, before...
          Yesterday, 07:11 AM
        • SEQadmin2
          From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
          by SEQadmin2


          Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


          The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
          ...
          06-02-2026, 10:05 AM
        • SEQadmin2
          Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
          by SEQadmin2


          With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


          Introduction

          Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
          05-22-2026, 06:42 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by SEQadmin2, 06-17-2026, 06:09 AM
        0 responses
        20 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-09-2026, 11:58 AM
        0 responses
        38 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-05-2026, 10:09 AM
        0 responses
        44 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 06-04-2026, 08:59 AM
        0 responses
        49 views
        0 reactions
        Last Post SEQadmin2  
        Working...