  • Picard ValidateSamFile: Problem with NM tag

    Hello everybody,

    I used bwa and samtools to map reads to a reference genome and obtained several BAM files, one for each individual.
    I want to call variants next, so I am following the GATK Best Practices.
    I am at the Indel Realignment step.

    I checked my BAM file with Picard command "ValidateSamFile"

    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/ValidateSamFile.jar INPUT=Pd115_S1_t2_M1_f_d_RG.bam OUTPUT=out.bam REFERENCE_SEQUENCE=/data3/users/grosbalm/IlluminaData/Ref/AlMssallem/Pdac_ref2013s.fasta/Pdac_ref2013s.fasta
    and got a missing read group error.

    I therefore added read groups using the Picard command "AddOrReplaceReadGroups":
    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/AddOrReplaceReadGroups.jar I=../MarkDupli/Pd115_S1_t2_M1_f_d.bam O=Pd115_S1_t2_M1_f_d_RG.bam LB=Pd115 PL=ILLUMINA PU=Seq1 SM=Pd115_S1_t2_M1
    Now, when I run the validation again, I get this new error for many reads:
    ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, NM tag (nucleotide differences) in file [2] does not match reality [3]

    If I understand correctly, NM is the number of mismatches between the read and the reference. So this would mean that the mismatch count saved in the NM tag is not the real one.

    How is this possible?
    At what step is the NM tag written? And the other tags?
    Are they necessary for calling variants with GATK?

    Thanks a lot
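
For reference, the SAM specification defines NM as an edit distance: mismatches plus inserted and deleted bases, with only A, C, G and T counting as potential matches. A validator can recompute it by walking the CIGAR string along the reference. Below is a minimal Python sketch of that idea; the function name and the toy sequences are hypothetical, and it is not Picard's actual implementation.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def recompute_nm(read_seq, ref_seq, ref_start, cigar):
    """Count mismatches plus inserted and deleted bases for one alignment.

    ref_start is the 0-based offset of the first aligned reference base
    (the SAM POS field minus 1).
    """
    nm = 0
    read_pos = 0
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned block: compare base by base.
            for i in range(length):
                a = read_seq[read_pos + i].upper()
                b = ref_seq[ref_pos + i].upper()
                # Only A, C, G, T can match; anything else is a difference.
                if not (a in "ACGT" and a == b):
                    nm += 1
            read_pos += length
            ref_pos += length
        elif op == "I":
            nm += length          # inserted bases count toward NM
            read_pos += length
        elif op == "D":
            nm += length          # deleted bases count toward NM
            ref_pos += length
        elif op == "S":
            read_pos += length    # soft-clipped bases are skipped, not counted
        elif op == "N":
            ref_pos += length     # skipped reference region, not counted
        # H and P consume neither read nor reference bases
    return nm

# Toy example: 2 clipped bases, 1 inserted base, 1 mismatch.
print(recompute_nm("TTACGGTACC", "ACGTACGTAC", 0, "2S3M1I4M"))
```

In this sketch the soft-clipped bases never reach the comparison, so clipping alone would not change the count.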

  • #2
    Hello !

    I have been searching but still haven't found a correct answer.
    However, I was wondering whether the mismatch arises because the NM tag generated during mapping counts the clipped part of the read while ValidateSamFile doesn't?

    Muriel

    • #3
      You might want to look at that read. Just run
      Code:
      samtools view Pd115_S1_t2_M1_f_d_RG.bam | grep "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
      to see what the alignment looks like. I wonder if this is a chimeric alignment and if ValidateSamFile can't handle that.

      • #4
        Here it is :

        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 163 KE332545.1 645 60 7M1I93M = 812 268 ATATTTATTTTTTTTTATAAACTGTTATGTGACTTATTATTGGGAGCATGTTCATGACTTTGATTTGGAAATTCACGATGTGGAAAATTTATTTATTGATT @@@FDFADHHHHHJJJHIJJIJJJIG@FGDDHGIICGICHIHII;CH@DHGGHGGHCDGEHHEHHHFFFFFDEEEEDDDDDCC@AACDCDDDDDDEDDDED X0:i:1 X1:i:0 MD:Z:100 RG:Z:Pd115_MTP1_Seq1 XG:i:1 AM:i:23 NM:i:1 SM:i:37 XM:i:0 XO:i:1 XT:A:U
        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 83 KE332545.1 812 60 101M = 645 -268 ACTGAGGAACTGGTTCCGACACCGTGACCACCGGTGATAGAATAGTGGCGGCACAGGGGTGCGTTTTGCTCTGCGGAGCGGCTCAGTGGAGCGTGAGATTG CDDDCDDCADDDDBDBBA5<2BDCC>9B@9BDCDDDEEFEEDDDDBDDFDFEEHEJIJIIGGIJJJJIJJJJJJIJJJJIIGIIJIJJHHGHHFFFFFCCC X0:i:1 X1:i:1 XA:Z:KE332570.1,+475765,101M,3; MD:Z:98C0G1 RG:Z:Pd115_MTP1_Seq1 XG:i:0 AM:i:23 NM:i:2 SM:i:23 XM:i:2 XN:i:3 XO:i:0 XT:A:U


        What is a chimeric alignment?

        Thanks !
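
One quick consistency check on records like these is to derive the expected NM from the CIGAR and MD fields alone: mismatched reference bases appear as bare letters in MD, and inserted or deleted bases appear as I or D operations in the CIGAR. The Python sketch below applies that to the CIGAR, MD and NM values copied from the two records above; the function name is hypothetical and the check does not look at the reference sequence at all.

```python
import re

def nm_from_cigar_md(cigar, md):
    """Derive an expected NM value from the CIGAR and MD strings alone."""
    # Inserted and deleted bases come from the I and D operations in the CIGAR.
    indel_bases = sum(
        int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "ID"
    )
    # In MD, deleted reference bases follow '^'; drop them so that only
    # mismatch letters remain, then count those letters.
    md_without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    mismatches = len(re.findall(r"[A-Za-z]", md_without_deletions))
    return indel_bases + mismatches

# (CIGAR, MD, stored NM) copied from the two records quoted above.
records = [
    ("7M1I93M", "100", 1),
    ("101M", "98C0G1", 2),
]
for cigar, md, stored_nm in records:
    print(cigar, md, "stored NM:", stored_nm, "derived NM:", nm_from_cigar_md(cigar, md))
```

For these two records the derived values are 1 and 2, which can be compared with the stored NM:i values and with the count ValidateSamFile reports.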

        • #5
          The NM and MD tags disagree, which is probably causing the problem. You might use samtools calmd.

          • #6
            Indeed! I used

            Code:
            samtools calmd -bAr input.bam reference.fasta > output.bam

            and it solved the problem!

            But I would still like to understand what is going on. Why do the NM and MD tags disagree?

            • #7
              That I don't know. You might try to track this read through the various steps and determine at what point the disagreement first appears. If it is already present in the original alignment, that suggests there might be an aligner bug (in which case, please do report it to whoever wrote the aligner you're using!).
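
Tracking a read through the pipeline could be sketched along these lines in Python: feed the function the text output of `samtools view` for each intermediate BAM and compare the NM and MD values it reports from one step to the next. The function name is hypothetical, and the record in the example is a shortened toy line, not real data.

```python
def tags_for_read(sam_lines, read_name, wanted=("NM", "MD")):
    """Yield (flag, cigar, tags) for every record of read_name.

    sam_lines is any iterable of tab-separated SAM text lines, for example
    the output of `samtools view` read from a pipe or a file.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[0] != read_name:
            continue
        tags = {}
        for field in fields[11:]:
            # Optional fields have the form TAG:TYPE:VALUE.
            tag, _type, value = field.split(":", 2)
            if tag in wanted:
                tags[tag] = value
        yield int(fields[1]), fields[5], tags

# Toy record with shortened sequence and quality strings.
example = "\t".join(
    ["r1", "83", "chr", "812", "60", "101M", "=", "645", "-268",
     "ACGT", "IIII", "MD:Z:98C0G1", "NM:i:2"]
)
for flag, cigar, tags in tags_for_read([example], "r1"):
    print(flag, cigar, tags)
```

Running this on the output of each step (mapping, duplicate marking, read group addition) would show the first step at which the tags change.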
