  • Picard ValidateSamFile: Problem with NM tag

    Hello everybody,

    I used bwa and samtools to map reads to a reference genome, which gave me several BAM files, one per individual.
    I want to call variants later, so I am following the GATK Best Practices.
    I am at the Indel Realignment step.

    I checked my BAM file with the Picard command "ValidateSamFile":

    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/ValidateSamFile.jar INPUT=Pd115_S1_t2_M1_f_d_RG.bam OUTPUT=out.bam REFERENCE_SEQUENCE=/data3/users/grosbalm/IlluminaData/Ref/AlMssallem/Pdac_ref2013s.fasta/Pdac_ref2013s.fasta
    and got a missing read group error.

    So I added read groups using the Picard command "AddOrReplaceReadGroups":
    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/AddOrReplaceReadGroups.jar I=../MarkDupli/Pd115_S1_t2_M1_f_d.bam O=Pd115_S1_t2_M1_f_d_RG.bam LB=Pd115 PL=ILLUMINA PU=Seq1 SM=Pd115_S1_t2_M1
    When I ran the validation again, I got this new error for many reads:
    ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, NM tag (nucleotide differences) in file [2] does not match reality [3]
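
Since the error shows up for many reads, it may help to tally the stored-versus-recomputed NM pairs before looking at single records. Below is a minimal Python sketch, assuming the validation output was saved as text and that every message follows the wording of the one line quoted above (other Picard versions may phrase it differently). `summarize_nm_errors` is a hypothetical helper name, not part of Picard.

```python
import re
from collections import Counter

# Pattern copied from the single error line quoted in the post (an assumption:
# other Picard versions may word the message differently).
NM_ERROR = re.compile(
    r"ERROR: Record (\d+), Read name (\S+), "
    r"NM tag \(nucleotide differences\) in file \[(\d+)\] "
    r"does not match reality \[(\d+)\]"
)


def summarize_nm_errors(report: str) -> Counter:
    """Count how often each (NM in file, NM recomputed) pair is reported."""
    counts = Counter()
    for match in NM_ERROR.finditer(report):
        in_file = int(match.group(3))
        recomputed = int(match.group(4))
        counts[(in_file, recomputed)] += 1
    return counts


report = (
    "ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, "
    "NM tag (nucleotide differences) in file [2] does not match reality [3]\n"
)
print(summarize_nm_errors(report))
```

If nearly all pairs differ by the same amount, that would point to a systematic difference in how NM was computed and not to a few odd reads.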

    If I understand correctly, NM is the number of mismatches between the read and the reference. That would mean the number of mismatches stored in the NM tag is not the real one.

    How is this possible?
    At what step is the NM tag written? And the other tags?
    Are they necessary for calling variants with GATK?

    Thanks a lot

  • #2
    Hello!

    I have been looking but still haven't found a correct answer.
    However, I was wondering whether the mismatch arises because the NM tag generated during mapping counts the clipped part while ValidateSamFile doesn't?

    Muriel



    • #3
      You might want to look at that read. Just run
      Code:
      samtools view Pd115_S1_t2_M1_f_d_RG.bam | grep "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
      to see what the alignment looks like. I wonder if this is a chimeric alignment and if ValidateSamFile can't handle that.



      • #4
        Here it is:

        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 163 KE332545.1 645 60 7M1I93M = 812 268 ATATTTATTTTTTTTTATAAACTGTTATGTGACTTATTATTGGGAGCATGTTCATGACTTTGATTTGGAAATTCACGATGTGGAAAATTTATTTATTGATT @@@FDFADHHHHHJJJHIJJIJJJIG@FGDDHGIICGICHIHII;CH@DHGGHGGHCDGEHHEHHHFFFFFDEEEEDDDDDCC@AACDCDDDDDDEDDDED X0:i:1 X1:i:0 MD:Z:100 RG:Z:Pd115_MTP1_Seq1 XG:i:1 AM:i:23 NM:i:1 SM:i:37 XM:i:0 XO:i:1 XT:A:U
        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 83 KE332545.1 812 60 101M = 645 -268 ACTGAGGAACTGGTTCCGACACCGTGACCACCGGTGATAGAATAGTGGCGGCACAGGGGTGCGTTTTGCTCTGCGGAGCGGCTCAGTGGAGCGTGAGATTG CDDDCDDCADDDDBDBBA5<2BDCC>9B@9BDCDDDEEFEEDDDDBDDFDFEEHEJIJIIGGIJJJJIJJJJJJIJJJJIIGIIJIJJHHGHHFFFFFCCC X0:i:1 X1:i:1 XA:Z:KE332570.1,+475765,101M,3; MD:Z:98C0G1 RG:Z:Pd115_MTP1_Seq1 XG:i:0 AM:i:23 NM:i:2 SM:i:23 XM:i:2 XN:i:3 XO:i:0 XT:A:U


        What is a chimeric alignment?

        Thanks !



        • #5
          The NM and MD tags disagree, which is probably causing the problem. You might use samtools calmd.
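
To check whether the NM tag of a record is at least consistent with its own CIGAR and MD strings, one could recompute the edit distance from them. The sketch below is a minimal Python version, assuming the SAM specification's definition of NM as mismatches plus inserted and deleted bases. `nm_from_cigar_md` is a hypothetical helper, not a samtools or Picard function.

```python
import re


def nm_from_cigar_md(cigar: str, md: str) -> int:
    """Edit distance implied by CIGAR and MD: mismatches + inserted + deleted bases."""
    # Inserted bases are only visible in the CIGAR string (MD does not record them).
    inserted = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "I"
    )
    mismatches = 0
    deleted = 0
    # MD tokens: a run of matches (digits), a deletion (^ plus reference bases),
    # or a single mismatched reference base.
    for _run, deletion, base in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if deletion:
            deleted += len(deletion)
        elif base:
            mismatches += 1
    return mismatches + inserted + deleted


# The two records pasted in post #4:
print(nm_from_cigar_md("7M1I93M", "100"))    # stored NM:i:1
print(nm_from_cigar_md("101M", "98C0G1"))    # stored NM:i:2
```

A check like this only compares the tags with each other. It never looks at the reference FASTA, whereas samtools calmd recomputes MD and NM from the reference itself, so the two can give different answers.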



          • #6
            Indeed! I used

            samtools calmd -bAr input.bam reference.fasta > output.bam

            and it solved the problem!

            But I still want to understand what is going on. Why do the NM and MD tags disagree?



            • #7
              That I don't know. You might try to track this read through the various steps and determine at what point the disagreement first occurs. If it is already present in the original alignment, that suggests there might be an aligner bug (in which case, please do report it to whoever wrote the aligner you're using!).
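
Tracking the read through the steps could be done roughly as follows: dump each intermediate BAM to SAM text (for example with samtools view, as in post #3) and pull the NM and MD tags for that read name from each dump. This is a minimal Python sketch, assuming tab-separated SAM text. `tags_for_read` is a hypothetical helper name.

```python
def tags_for_read(sam_text: str, read_name: str, wanted=("NM", "MD")):
    """Return one dict (flag plus selected optional tags) per alignment of read_name."""
    results = []
    for line in sam_text.splitlines():
        # Skip empty lines and SAM header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if fields[0] != read_name:
            continue
        record = {"flag": int(fields[1])}
        # Optional TAG:TYPE:VALUE fields start at the 12th column.
        for field in fields[11:]:
            tag, _type, value = field.split(":", 2)
            if tag in wanted:
                record[tag] = value
        results.append(record)
    return results


# Made-up one-record example in tab-separated SAM layout (not real data).
example = "\t".join([
    "read1", "83", "chr1", "812", "60", "101M", "=", "645", "-268",
    "ACGT", "IIII", "MD:Z:98C0G1", "NM:i:2",
])
print(tags_for_read("@HD\tVN:1.0\n" + example + "\n", "read1"))
```

Running this on the SAM dump from each stage (mapping, duplicate marking, read group addition) and comparing the results should show the first step at which NM or MD changes.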
