  • Picard ValidateSamFile: Problem with NM tag

    Hello everybody,

    I used bwa and samtools to map reads to a reference genome and obtained several BAM files, one per individual.
    I want to call variants next, so I am following the GATK Best Practices.
    I am at the Indel Realignment step.

    I checked my BAM file with the Picard command "ValidateSamFile":

    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/ValidateSamFile.jar INPUT=Pd115_S1_t2_M1_f_d_RG.bam OUTPUT=out.bam REFERENCE_SEQUENCE=/data3/users/grosbalm/IlluminaData/Ref/AlMssallem/Pdac_ref2013s.fasta/Pdac_ref2013s.fasta
    and got a missing read group error.

    I therefore added read groups with the Picard command "AddOrReplaceReadGroups":
    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/AddOrReplaceReadGroups.jar I=../MarkDupli/Pd115_S1_t2_M1_f_d.bam O=Pd115_S1_t2_M1_f_d_RG.bam LB=Pd115 PL=ILLUMINA PU=Seq1 SM=Pd115_S1_t2_M1
    Now, when I run the validation again, I get this new error for many reads:
    ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, NM tag (nucleotide differences) in file [2] does not match reality [3]

    If I understand correctly, NM is the number of mismatches between the read and the reference. So the error would mean that the value saved in the NM tag is not the real number of mismatches.

    How is this possible?
    At what step is the NM tag written? And the other tags?
    Are they necessary for calling variants with GATK?

    Thanks a lot

  • #2
    Hello !

    I have been searching but still haven't found an answer.
    However, I was wondering whether the discrepancy arises because the NM tag generated during mapping counts the clipped part of the read while ValidateSamFile doesn't?

    Muriel



    • #3
      You might want to look at that read. Just
      Code:
      samtools view Pd115_S1_t2_M1_f_d_RG.bam | grep "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
      to see what the alignment looks like. I wonder if this is a chimeric alignment and if ValidateSamFile can't handle that.
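One way to check for a chimeric alignment is the FLAG value in the second column of each record. In the SAM specification, the extra segments of a chimeric alignment carry the supplementary bit (0x800), and secondary alignments carry 0x100. The sketch below decodes a FLAG into its set bits. `decode_flag` and the short bit names are made up for this illustration and are not part of samtools. The two example values, 163 and 83, are the flags of the records posted further down in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the names of the bits set in a SAM FLAG.
# Bit order follows the SAM specification (0x1, 0x2, 0x4, ... 0x800).
decode_flag() {
    local flag=$1
    local names=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                 read1 read2 secondary qc_fail duplicate supplementary)
    local out=() i
    for i in "${!names[@]}"; do
        # Test bit i of the flag
        if (( flag & (1 << i) )); then
            out+=("${names[i]}")
        fi
    done
    # Join the collected names with commas
    local IFS=,
    echo "${out[*]}"
}

decode_flag 163   # prints paired,proper_pair,mate_reverse,read2
decode_flag 83    # prints paired,proper_pair,reverse,read1
```

Neither value has the secondary or supplementary bit set, so by this test both records are ordinary alignments of a properly paired read.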



      • #4
        Here it is :

        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 163 KE332545.1 645 60 7M1I93M = 812 268 ATATTTATTTTTTTTTATAAACTGTTATGTGACTTATTATTGGGAGCATGTTCATGACTTTGATTTGGAAATTCACGATGTGGAAAATTTATTTATTGATT @@@FDFADHHHHHJJJHIJJIJJJIG@FGDDHGIICGICHIHII;CH@DHGGHGGHCDGEHHEHHHFFFFFDEEEEDDDDDCC@AACDCDDDDDDEDDDED X0:i:1 X1:i:0 MD:Z:100 RG:Z:Pd115_MTP1_Seq1 XG:i:1 AM:i:23 NM:i:1 SM:i:37 XM:i:0 XO:i:1 XT:A:U
        HWI-ST1206:14:C296WACXX:6:1301:7041:38865 83 KE332545.1 812 60 101M = 645 -268 ACTGAGGAACTGGTTCCGACACCGTGACCACCGGTGATAGAATAGTGGCGGCACAGGGGTGCGTTTTGCTCTGCGGAGCGGCTCAGTGGAGCGTGAGATTG CDDDCDDCADDDDBDBBA5<2BDCC>9B@9BDCDDDEEFEEDDDDBDDFDFEEHEJIJIIGGIJJJJIJJJJJJIJJJJIIGIIJIJJHHGHHFFFFFCCC X0:i:1 X1:i:1 XA:Z:KE332570.1,+475765,101M,3; MD:Z:98C0G1 RG:Z:Pd115_MTP1_Seq1 XG:i:0 AM:i:23 NM:i:2 SM:i:23 XM:i:2 XN:i:3 XO:i:0 XT:A:U


        What is a chimeric alignment?

        Thanks !



        • #5
          The NM and MD tags disagree, which is probably causing the problem. You could try samtools calmd, which recalculates both tags from the reference.
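A rough way to test whether NM and MD agree with each other is to recompute the edit distance from the CIGAR string and the MD tag. NM counts mismatches plus inserted and deleted bases, and every letter in MD is either a mismatched or a deleted reference base, so NM should equal the number of inserted bases in the CIGAR plus the number of letters in MD. The sketch below does that in plain bash. `nm_from_cigar_md` is a hypothetical helper name, and it assumes well-formed CIGAR and MD strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: edit distance implied by a CIGAR string and an MD tag.
# Usage: nm_from_cigar_md CIGAR MD   (MD without the "MD:Z:" prefix)
nm_from_cigar_md() {
    local cigar=$1 md=$2
    local inserted=0
    local re='^([0-9]+)([MIDNSHP=X])(.*)$'
    # Walk the CIGAR one operation at a time and sum the lengths of I operations
    while [[ $cigar =~ $re ]]; do
        if [[ ${BASH_REMATCH[2]} == I ]]; then
            inserted=$(( inserted + 10#${BASH_REMATCH[1]} ))
        fi
        cigar=${BASH_REMATCH[3]}
    done
    # Strip digits and the deletion marker (^) from MD; the letters left over
    # are the mismatched plus deleted reference bases
    local letters=${md//[0-9^]/}
    echo $(( inserted + ${#letters} ))
}

nm_from_cigar_md 7M1I93M 100      # first record in the thread, prints 1
nm_from_cigar_md 101M 98C0G1      # second record in the thread, prints 2
```

For the two records posted above, the recomputed values equal the stored NM:i:1 and NM:i:2. This check only compares the tags with each other. It never looks at the reference sequence, which is what samtools calmd uses when it regenerates MD and NM.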



          • #6
            Indeed! I used

            samtools calmd -bAr input.bam reference.fasta > output.bam

            and it solved the problem!

            Still, I would like to understand what is going on. Why do the NM and MD tags disagree?



            • #7
              That I don't know. You might try to track this read through the various steps and determine at what point the disagreement first appears. If it is already present in the original alignment, that suggests an aligner bug (in which case, please do report it to whoever wrote the aligner you're using!).
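To follow one read through the pipeline, it can help to print only the fields that matter (FLAG, CIGAR, NM, MD) from each intermediate file and compare them side by side. Below is a minimal bash sketch, assuming tab-separated SAM text on standard input as produced by samtools view. `show_tags` is a hypothetical helper, and the file names in the usage comments are the ones mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every SAM record whose read name matches,
# print FLAG, CIGAR and the NM and MD tags (NA when a tag is absent).
show_tags() {
    local name=$1 nm md field
    local -a f
    while IFS=$'\t' read -r -a f; do
        # Column 1 is the read name; header lines (@...) never match
        [[ ${f[0]} == "$name" ]] || continue
        nm=NA; md=NA
        # Optional tags start at column 12 (array index 11)
        for field in "${f[@]:11}"; do
            case $field in
                NM:i:*) nm=${field#NM:i:} ;;
                MD:Z:*) md=${field#MD:Z:} ;;
            esac
        done
        echo "${f[1]} ${f[5]} NM=$nm MD=$md"
    done
}

# Usage, assuming samtools is installed (run once per pipeline step):
#   samtools view ../MarkDupli/Pd115_S1_t2_M1_f_d.bam \
#       | show_tags "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
#   samtools view Pd115_S1_t2_M1_f_d_RG.bam \
#       | show_tags "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
```

If the printed values are identical at every step, the tags were most likely written that way by the aligner. If they change between two files, the tool that produced the second file is the place to look.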
