Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • dmsoanes
    Junior Member
    • Jul 2012
    • 2

    HTseq count - all reads are "alignment_not_unique"

    I have been having problems using HTseq-count.
    I’ll go through my protocol
    1: Take tophat output (accepted_hits.bam), sort them by read and convert to bam file.

    samtools sort -n mst12_t14_2_accepted_hits.bam mst12_t14_2_accepted_hits_read_sorted

    samtools view mst12_t14_2_accepted_hits_read_sorted.bam > mst12_t14_2_accepted_hits_read_sorted.sam


    2: Use HTseq-count to count reads for each gene using GFF file used for tophat alignment

    htseq-count -m union -s no -t exon -i gene_id mst12_t14_2_accepted_hits_read_sorted.sam magnaporthe_oryzae_70-15_8_transcripts.gff > test.txt

    There were no error messages but when I looked at the output, the count for every gene is zero and all the reads are in “alignment_not_unique” category. Reads in this category supposedly have NH > 1 in SAM file, but the reads I can see all have NH=1.
    Below is a sample of the SAM file and the GFF.

    D3P26HQ1:1290TA0ACXX:2:1101:1108:73648 73 supercont8.4 4401713 50 35M * 0 0
    CCTTGATGTGGACGCGCCCNGCGACCTTGCTGGCG ?=@BDDDDCF>D>EGG6FF#1:?DHDB@DH9D>;F AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:19G15 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1109:25458 89 supercont8.4 1740219 50 35M * 0 0
    AGCGAATGACGAGCGNCGCTATCATGATAACGTCA ?091D?)1:?<<333#@2EDE<DC<>C;?2=41?: AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:15A19 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1112:73199 99 supercont8.1 3567120 50 36M = 3567285 210
    TGCACAGAGAGATGGTTGCNTATATATATGCAAAAA ?<;ADDDD+<C;DE@2A@@#33A1CBEEC999CDDD AS:i:-1 XN:i:0 XM:i:1 XO:i:0
    XG:i:0 NM:i:1 MD:Z:19A16 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1112:73199 147 supercont8.1 3567285 50 45M = 3567120 -210 CACAGACAAAGCGGCTCGTCTTGGCGCGGGCACGCATTGGCTCGG @CBHE?C?EGA@A;-(.9DDED0BD@@;<)B2DA<3HD80+BB?@ AS:i:-5 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:15G6T22 YT:Z:UU NH:i:1
    D3P26HQ1:1290TA0ACXX:2:1101:1113:7390 89 supercont8.6 1611568 50 34M * 0 0 AGTACGTCTCGGCTNTCCAGAAGGCCAAGATCAC ?1CC?)1C:+33#3BC:EECA++A?BBB+??? AS:i:-3 XM:i:2 XO:i:0 XG:i:0 MD:Z:2G11T19 NM:i:2 NH:i:1

    supercont8.5 MG8_CALLGENES_FINAL_1 start_codon 4470284 4470286 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 stop_codon 4469688 4469690 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4470173 4470286 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4470173 4470286 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4470034 4470120 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4470034 4470120 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 exon 4469273 4469969 . - . gene_id "MGG_00005"; transcript_id "MGG_00005T0";
    supercont8.5 MG8_CALLGENES_FINAL_1 CDS 4469691 4469969 . - 0 gene_id "MGG_00005"; transcript_id "MGG_00005T0";


    Any ideas?

    Cheers, Darren
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    I'd guess you are using the version of two weeks ago, which contained a bug misreading the "NH" tag. Try the newest version.

    Comment

    • dmsoanes
      Junior Member
      • Jul 2012
      • 2

      #3
      Thanks for your swift reponse Simon, I have installed the latest version (HTSeq-0.5.3p9) and everything works fine.

      Comment

      Latest Articles

      Collapse

      • SEQadmin2
        Nine Things a Sample Prep Scientist Thinks About Before Sequencing
        by SEQadmin2


        I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

        Here are nine questions we think about, in roughly the order they matter, before...
        06-18-2026, 07:11 AM
      • SEQadmin2
        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
        by SEQadmin2


        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
        ...
        06-02-2026, 10:05 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, Today, 05:37 AM
      0 responses
      5 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-26-2026, 11:10 AM
      0 responses
      16 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-17-2026, 06:09 AM
      0 responses
      50 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-09-2026, 11:58 AM
      0 responses
      109 views
      0 reactions
      Last Post SEQadmin2  
      Working...