Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Wrong bwa alignment?

    Hi,
    Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

    03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

    Since the read is on the negative strand, below is the forward sequence:
    CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

    However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here
    Image Screen-Shot-2019-10-24-at-12-02-45-PM hosted in ImgBB

    This is not a good alignment at all. Is BWA wrong? Am I missing something?
    Thank you for your input in advance!

  • #2
    Checking to see if folks have feedback on this. I read that "BWA actually follows the SAM spec and reports Phred scores as MAPQ values.* The calculation is based on the number of optimal (best) alignments found, as well as the number of sub-optimal alignments combined with the Phred scores of the bases which differ between the optimal and sub-optimal alignments."

    Does that mean that the mapq score of a clipped read is a combination of the quality of the 'matched' alignment AND the quality of the alignment of the clipped bases? Perhaps this explains the strange finding I saw above.

    If so, how does one extract the mapping quality of only the 'matched' portion of the read at this location (29446797)?

    Comment


    • #3
      Hi! Yes you can BLAST it, but since it's a short sequence it's best if you use the `blastn` algorithm instead of the `megablast` one. You can also decrease the word size to 7 to increase sensitivity.

      Here's your search, it'll be available until october 31:


      According to the BLAST results, the sequence matches chr2:29446798-29446833 (hg19).

      ...PS anyways the CIGAR string says "36M" but the alignment to GRCh37.p13 shows one mismatch.

      ...PPS oh, I just noticed - I know where you got it wrong! Bit 16 in the SAM flag does not simply mean "the read is aligned to the reverse strand", it means "the read is represented as its reverse-complement because it aligned to the reverse strand". Sequences in SAM files are always represented on the forward strand. So if bit 16 is 1, as in this case, it means that "CCCC..." is ALREADY the reverse-complement of the actual read. So you don't have to reverse-complement it again. In fact, you can find your sequence in the same screenshot you posted.
      Last edited by r.rosati; 10-29-2019, 09:46 AM.

      Comment


      • #4
        Wow I do see it now!!! Thank you so very much Rosati. Yes this is super helpful and it makes a lot of sense

        I didn't realize that reads in the SAM file always represent the forward strand. How about cases when I check read.get_forward_sequence in pysam, sometimes the output is the reverse complement of the sequence in the read itself, like this example:

        Read:
        0a023638:0B0B 83 0 9999 0 144M 0 10005 144 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCC array('B', [12, 12, 41, 37, 32, 32, 32, 32, 41, 37, 37, 37, 22, 32, 27, 12, 37, 32, 27, 32, 27, 27, 41, 41, 27, 27, 32, 32, 32, 32, 32, 27, 37, 37, 41, 37, 37, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 37, 41, 41, 37, 41, 41, 27, 37, 41, 41, 37, 41, 37, 41, 41, 41, 41, 37, 32, 41, 32, 41, 37, 32, 22, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 37, 27, 41, 41, 27, 37, 37, 12, 41, 41, 41, 37, 41, 37, 41, 41, 41, 37, 37, 27, 41, 37, 22, 37, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 37, 27]) [('MD', '0A143'), ('RG', 'HMMGHBBXX.lane0.2P_FMIEx_321'), ('NM', 1), ('AS', 143), ('XS', 137)]

        But read.get_forward_sequence() is:
        GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG

        Thank you so much!

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        8 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        8 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        49 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        66 views
        0 likes
        Last Post seqadmin  
        Working...
        X