
  • dpryan
    replied
    That I don't know. You might try to track this read through the various steps and determine at what point the disagreement first appears. If it is present in the original alignment, that suggests there might be an aligner bug (in which case, please do report it to whoever wrote the aligner you're using!).


  • MurielGB
    replied
    Indeed! I used

    samtools calmd -bAr input.bam reference.fasta > output.bam

    and it solved the problem!

    But I would still like to understand what is going on. Why do the NM and MD tags disagree?


  • dpryan
    replied
    The NM and MD tags disagree, which is probably causing the problem. You might use samtools calmd.
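
One way to check whether a record's NM value agrees with its own CIGAR string and MD tag is to recompute the edit distance from those two fields. The sketch below is only an illustration and is not how Picard or samtools implement the check; the function name `nm_from_cigar_md` is hypothetical. It assumes NM equals the number of mismatched bases listed in MD plus the number of inserted and deleted bases in the CIGAR, with clipped bases not counted.

```python
import re


def nm_from_cigar_md(cigar, md):
    """Edit distance implied by a CIGAR string and an MD tag value.

    Hypothetical helper for illustration: mismatches are read from MD,
    inserted and deleted bases from the CIGAR. Soft clips (S) and hard
    clips (H) are ignored, so clipped bases do not contribute.
    """
    # Sum the lengths of insertion (I) and deletion (D) operations.
    indel_bases = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "ID"
    )
    # In MD, a deletion is written as ^ followed by the deleted reference
    # bases. Remove those so the remaining letters are mismatches only.
    md_without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    mismatches = len(re.findall(r"[A-Za-z]", md_without_deletions))
    return mismatches + indel_bases


# The two records posted in this thread (CIGAR and MD copied from them).
print(nm_from_cigar_md("7M1I93M", "100"))   # stored NM:i:1
print(nm_from_cigar_md("101M", "98C0G1"))   # stored NM:i:2
```

For the two records posted in this thread, this recomputation gives the same values as the stored NM tags, so a check of this kind cannot by itself reveal a difference that only shows up when the read is compared against the reference sequence, which is the comparison samtools calmd performs when it regenerates the tags.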


  • MurielGB
    replied
    Here it is:

    HWI-ST1206:14:C296WACXX:6:1301:7041:38865 163 KE332545.1 645 60 7M1I93M = 812 268 ATATTTATTTTTTTTTATAAACTGTTATGTGACTTATTATTGGGAGCATGTTCATGACTTTGATTTGGAAATTCACGATGTGGAAAATTTATTTATTGATT @@@FDFADHHHHHJJJHIJJIJJJIG@FGDDHGIICGICHIHII;CH@DHGGHGGHCDGEHHEHHHFFFFFDEEEEDDDDDCC@AACDCDDDDDDEDDDED X0:i:1 X1:i:0 MD:Z:100 RG:Z:Pd115_MTP1_Seq1 XG:i:1 AM:i:23 NM:i:1 SM:i:37 XM:i:0 XO:i:1 XT:A:U
    HWI-ST1206:14:C296WACXX:6:1301:7041:38865 83 KE332545.1 812 60 101M = 645 -268 ACTGAGGAACTGGTTCCGACACCGTGACCACCGGTGATAGAATAGTGGCGGCACAGGGGTGCGTTTTGCTCTGCGGAGCGGCTCAGTGGAGCGTGAGATTG CDDDCDDCADDDDBDBBA5<2BDCC>9B@9BDCDDDEEFEEDDDDBDDFDFEEHEJIJIIGGIJJJJIJJJJJJIJJJJIIGIIJIJJHHGHHFFFFFCCC X0:i:1 X1:i:1 XA:Z:KE332570.1,+475765,101M,3; MD:Z:98C0G1 RG:Z:Pd115_MTP1_Seq1 XG:i:0 AM:i:23 NM:i:2 SM:i:23 XM:i:2 XN:i:3 XO:i:0 XT:A:U


    What is a chimeric alignment?

    Thanks!


  • dpryan
    replied
    You might want to look at that read. Just run
    Code:
    samtools view Pd115_S1_t2_M1_f_d_RG.bam | grep "HWI-ST1206:14:C296WACXX:6:1301:7041:38865"
    to see what the alignment looks like. I wonder if this is a chimeric alignment and if ValidateSamFile can't handle that.
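
When looking at the output of that command, the second column (the FLAG field) indicates whether a line is part of a chimeric alignment: the SAM specification marks the extra lines of a chimeric alignment with the supplementary bit, 0x800. The sketch below decodes a FLAG value into named bits; the names in `SAM_FLAG_BITS` and the function `decode_flag` are illustrative choices, not part of samtools or Picard.

```python
# Bit values follow the SAM specification; the names are illustrative labels.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_supplementary(flag):
    """True if the supplementary (chimeric) bit 0x800 is set."""
    return bool(flag & 0x800)


# FLAG values 163 and 83 are the ones on the two lines posted for this read.
for flag in (163, 83):
    print(flag, decode_flag(flag), is_supplementary(flag))
```

Neither 163 nor 83 has the supplementary bit set; both decode to an ordinary properly paired read and its mate.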


  • MurielGB
    replied
    Hello!

    I have been looking but still haven't found a satisfactory answer.
    However, I was wondering whether there is a mismatch because the NM tag generated during mapping counts the clipped part of the read while ValidateSamFile doesn't?

    Muriel


  • MurielGB
    started a topic Picard ValidateSamFile: Problem with NM tag

    Hello everybody,

    I used bwa and samtools to map reads to a reference genome and obtained several BAM files, one for each individual.
    I want to call variants later and am therefore following the GATK Best Practices.
    I am at the Indel Realignment step.

    I checked my BAM file with the Picard command "ValidateSamFile":

    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/ValidateSamFile.jar INPUT=Pd115_S1_t2_M1_f_d_RG.bam OUTPUT=out.bam REFERENCE_SEQUENCE=/data3/users/grosbalm/IlluminaData/Ref/AlMssallem/Pdac_ref2013s.fasta/Pdac_ref2013s.fasta
    and obtained a missing read group error.

    I therefore added read groups using the Picard command "AddOrReplaceReadGroups":
    Code:
    java -Xmx20g -jar /home/grosbalm/Scripts/AddOrReplaceReadGroups.jar I=../MarkDupli/Pd115_S1_t2_M1_f_d.bam O=Pd115_S1_t2_M1_f_d_RG.bam LB=Pd115 PL=ILLUMINA PU=Seq1 SM=Pd115_S1_t2_M1
    Now when I run the validation again, I get this new error for many reads:
    ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, NM tag (nucleotide differences) in file [2] does not match reality [3]

    If I understand correctly, NM is the number of mismatches between the read and the reference. So this would mean that the number of mismatches stored in the NM tag is not the real one.

    How is this possible?
    At what step is the NM tag written? And the other tags?
    Are they necessary for calling variants with GATK?

    Thanks a lot
