  • all_your_base
    Member
    • Mar 2012
    • 40

    HTSeq not working with Bowtie2 .SAM

    Hi all,

    I am having a weird problem using my Bowtie2 SAM output with HTSeq to count reads that map to genes in a .gff file.

    Usually, I can just feed my Bowtie1 SAM into HTSeq using the following command:

    htseq-count -m union -s no -t gene -i ID -o myOutput.sam myInput.sam organism.gff


    However, after switching to Bowtie2 and running the same command, I get gigabytes of this:


    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1780:2126/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1780:2126/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1671:2238/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:1671:2238/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:2011:2134/1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
    Warning: Read HWI-ST1234:350WK3ACXX:6:1101:2011:2134/2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)


    According to other forums, this usually happens when the SAM isn't sorted by read ID, so that htseq can't find the two halves of a paired-end read. However, I tried sorting my SAM in multiple ways, such as:

    sort -k1 myfile.sam > myfile_sorted.sam


    I still get the same error! Any help or suggestions are greatly appreciated.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    If any of those reads are multimapped, then the command-line sort will not do what you want. Use samtools sort -n, which sorts by read name.
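A minimal sketch of the name-sorting step suggested above, assuming samtools 1.3 or later (which can read SAM directly and write SAM back out, so no separate BAM conversion is needed). Unlike Unix sort, samtools treats header lines as a header and orders records by read name. The function name `name_sort_sam` and the file names are hypothetical.

```shell
# Name-sort a SAM file so that records sharing a read name end up adjacent.
# Assumes samtools 1.3 or later is installed and on PATH.
name_sort_sam() {
    in_sam=$1
    out_sam=$2
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found on PATH" >&2
        return 1
    fi
    # -n: sort by read name instead of coordinate
    # -O sam: write SAM rather than BAM
    samtools sort -n -O sam -o "$out_sam" "$in_sam"
}

# Example call with hypothetical file names:
# name_sort_sam myfile.sam myfile_sorted.sam
```

With older samtools releases the options differ, so check `samtools sort` usage for the installed version before relying on this.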

    • all_your_base
      Member
      • Mar 2012
      • 40

      #3
      @dpryan,

      Thanks for the reply, but can you please explain your answer? How does the samtools sort command differ from Unix sort?

      Also, since Bowtie2 produces a SAM file by default, to use SAMtools sort, do I have to first convert to BAM, then sort, then convert back to SAM?

      Thanks...

      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        The problem is the /1 and /2 in your read names. The SAM specification requires that the two reads of a pair have identical names; SAM identifies read 1 and read 2 by the FLAG bits instead. Remove the /1 and /2 from the names in your SAM files and repeat your analysis.
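One way to do the renaming described above is a small awk filter. This is a sketch, not the method used in the thread: the function name `strip_mate_suffix` and the file names are hypothetical, and it assumes the suffix is exactly `/1` or `/2` at the end of the first tab-separated field.

```shell
# Remove a trailing /1 or /2 from the read name (column 1) of each SAM record.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }              # header lines pass through unchanged
         { sub(/\/[12]$/, "", $1); print } # edit only the read-name field
    '
}

# Example call with hypothetical file names:
# strip_mate_suffix < myfile.sam > myfile_renamed.sam
```

Header lines are skipped because they start with `@`, which a SAM read name may not begin with.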

        • all_your_base
          Member
          • Mar 2012
          • 40

          #5
          @kmcarr

          Wonderful, I trimmed the /1 and /2 off my reads and made sure the mates were next to each other after sorting, and HTSeq runs fine without the previous error messages.

          Quick question...
          After processing a few thousand reads, HTSeq reports the following warnings:

          Warning: Malformed SAM line: MRNM != '*' although flag bit &0x0008 set
          Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set

          This is from raw Bowtie2 output; the only modifications were my /1 and /2 trimming and sorting.

          Anyone have an idea where these errors are coming from??

          Thanks!

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            RNAME and MRNM are the name of the chromosome (or scaffold or whatever) to which the current read and its mate (for MRNM) map. Since the flags indicate that the reads are unmapped, it's just complaining that there's stuff here instead of an *, meaning "Not available". I don't recall ever seeing that with bowtie2, only bwa. You can normally ignore such warnings.
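To see which bits are set in a given FLAG value, a small shell helper can decode the ones relevant here. This is an illustrative sketch: the function name `describe_flag` is hypothetical, and it reports only a handful of the bits defined in the SAM specification (0x1, 0x4, 0x8, 0x40, 0x80).

```shell
# Print a short description of selected SAM FLAG bits, one per line.
describe_flag() {
    flag=$1
    if [ $(( $flag & 0x1 )) -ne 0 ]; then echo "paired"; fi
    if [ $(( $flag & 0x4 )) -ne 0 ]; then echo "read unmapped"; fi
    if [ $(( $flag & 0x8 )) -ne 0 ]; then echo "mate unmapped"; fi
    if [ $(( $flag & 0x40 )) -ne 0 ]; then echo "first in pair"; fi
    if [ $(( $flag & 0x80 )) -ne 0 ]; then echo "second in pair"; fi
    return 0
}

# Example: a first-in-pair read where neither mate aligned (1 + 4 + 8 + 64)
describe_flag 77
```

A record that prints "read unmapped" but has something other than `*` in the RNAME column is the kind of line the second warning refers to; "mate unmapped" with a non-`*` mate reference corresponds to the first.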

            • all_your_base
              Member
              • Mar 2012
              • 40

              #7
              @dpryan,

              Thanks for all your help. My analysis is working well now
