Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Kwangwoo
    Junior Member
    • Oct 2013
    • 1

    DEXSeq error: 'SAM optional field tag NH not found'

    Hi all (and Simon & Alejandro, hopefully),

    In my Bam files, optional NH tag was given to only mapped reads (NH:i:n; n>=1). There's no NH:i:0. Here is a part of the sam (PE; sorted by read name):

    H13P7ADXX130822:1:1101:1122:27935 77 * 0 0 * * 0 0 CGGGNGGATGCCTTCTTTATCTTGGATCTTGGCNTTCACATTTTCGATGGTGTCACTGGGCTCCACCTCCAGAGTG CCCF#2ADHHHH
    HJJJJJJJJJJJJIJJJJJJJ#1CFHIIJJJJJJJJJJGHHIJJHGEGGGIGHIIIEHHGDB7? PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
    H13P7ADXX130822:1:1101:1122:27935 141 * 0 0 * * 0 0 NNNCNCCATCGNAAATGTGAAGGCCAAGATCCAGGATAAAGAAGGCATCCCTCCCGACCAGCAGAGGCTCATCTTT ###2#2:?@@@#
    4@@@@@@@@@@@@@@@@????8?????????????????1>?###################### PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
    H13P7ADXX130822:1:1101:1124:37458 1609 3 40500185 255 43M2670N33M = 40500185 0 AGTGNATGCAGCTCACTGATTTCATCCTCAAGTTTCCGCACAGTGCCCACCAGAAGTATGTCCGACAA
    GCCTGGCA <<<@#2@((=?>7=<<6::?=9@=>>@<9><98)9@@6???8?33;>?>9;7::>6.6>?6)8=?8?8??;=0=;7 PG:Z:MarkDuplicates.1E RG:Z:H13P7.1 NH:i:1 NM:i:1 UQ:i:2 XS:A:+


    I think DEXSeq-count.py doesn't like this format showing an error message below:


    Traceback (most recent call last):
    File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 225, in <module>
    rs = map_read_pair( af, ar )
    File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 134, in map_read_pair
    if af != None and af.optional_field("NH") > 1:
    File "_HTSeq.pyx", line 1399, in HTSeq._HTSeq.SAM_Alignment.optional_field (src/_HTSeq.c:26483)
    KeyError: 'SAM optional field tag NH not found'
    Actually, this bam file has worked well with HTSeq-count without an error. The manual of HTSeq says ......If the aligner does not set this field, multiply aligned reads will be counted multiple times. ... I think HTSeq can handle my format.

    Do you have any suggestion for avoiding this error message by dexseq-count? What if I grep the sam file by "NH" before running dexseq-count?? There was no error but I cannot figure out how does it affect to my result.

    Thanks in advance for your reply.


    Kwangwoo
    Last edited by Kwangwoo; 10-31-2013, 11:44 AM.
  • areyes
    Senior Member
    • Aug 2010
    • 165

    #2
    Hi Kwangwoo,

    Thanks for your post. This is a recent change introduced to the python script to make sure to only consider uniquely mapped reads.

    Previously, the scripts were checking for a good alignment quality (e.g. more than 10) in the sam file, however we noticed that some aligners were giving very good alignment qualities to multiple mapping reads. If a read mapping multiple times should have a good quality score is debatable.

    Therefore we decided to introduce this change and instead check for the NH flag, and a read will only be considered if its value is 1... but I did not know it was an optional flag? What aligner are you using?

    A quick fix for you would be just to prefilter your reads for those with a unique alignment, e.g. NH:i:1. But I would first make sure that all the uniquely aligned reads have the NH flag!

    Alejandro

    Comment

    • Jose Garcia
      Junior Member
      • Jul 2013
      • 4

      #3
      NH not found

      Same error for me. We are using Soapsplice and as Alejandro was saying, nothing happened with the old dexseq_count.py.
      Jose



      Originally posted by areyes View Post
      Hi Kwangwoo,

      Thanks for your post. This is a recent change introduced to the python script to make sure to only consider uniquely mapped reads.

      Previously, the scripts were checking for a good alignment quality (e.g. more than 10) in the sam file, however we noticed that some aligners were giving very good alignment qualities to multiple mapping reads. If a read mapping multiple times should have a good quality score is debatable.

      Therefore we decided to introduce this change and instead check for the NH flag, and a read will only be considered if its value is 1... but I did not know it was an optional flag? What aligner are you using?

      A quick fix for you would be just to prefilter your reads for those with a unique alignment, e.g. NH:i:1. But I would first make sure that all the uniquely aligned reads have the NH flag!

      Alejandro

      Comment

      • dejavu2010
        Member
        • Jan 2012
        • 21

        #4
        where can i find the old python script?

        Comment

        • Richard Finney
          Senior Member
          • Feb 2009
          • 701

          #5
          FYI: I run a custom program (getnh) to slap the NH tag on ...
          samtools view original.bam | ./getnh | python dexseq_count.py -f sam --paired=yes hg19.gff - outputcounts.txt 2> errfile

          This takes the bam file with missing nh tags and outputs as a sam file with new NH tags which gets piped into python dexseq_count.py

          /*
          how to compile : gcc -Wall -O2 -o getnh getnh.c
          */


          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          int main()
          {
          char *z;
          char s[5000];

          s[0] = (char)0;
          while (gets(s))
          {
          z = strstr(s,"NH:i");
          if (z)
          {
          printf("%s\n",s);
          s[0] = (char)0;
          continue;
          }
          printf("%s\tNH:i:1\n",s);
          s[0] = (char)0;
          }
          return 0;
          }

          Comment

          • areyes
            Senior Member
            • Aug 2010
            • 165

            #6
            where can i find the old python script?
            From old versions of DEXSeq, for instance:

            The package is focused on finding differential exon usage using RNA-seq exon counts between samples with different experimental designs. It provides functions that allows the user to make the necessary statistical tests based on a model that uses the negative binomial distribution to estimate the variance between biological replicates and generalized linear models for testing. The package also provides functions for the visualization and exploration of the results.


            But I would rather add the NH flag, rather than using old, not-mantained software!

            Comment

            • jrounds
              Junior Member
              • Nov 2014
              • 3

              #7
              I aligned with bowtie2, and according to what I read it does not set this flag.

              Comment

              Latest Articles

              Collapse

              • SEQadmin2
                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                by SEQadmin2


                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                Here are nine questions we think about, in roughly the order they matter, before...
                06-18-2026, 07:11 AM
              • SEQadmin2
                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                by SEQadmin2


                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                ...
                06-02-2026, 10:05 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by SEQadmin2, Today, 05:37 AM
              0 responses
              5 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-26-2026, 11:10 AM
              0 responses
              16 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-17-2026, 06:09 AM
              0 responses
              50 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-09-2026, 11:58 AM
              0 responses
              110 views
              0 reactions
              Last Post SEQadmin2  
              Working...