Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #31
    Originally posted by tangx_2010 View Post
    1. How does HTSeq count pair-end reads by genes when both singleton and paired reads are presented? If the count is based on fragment, then paired two reads are counted by one, and singleton also counted by one, right?
    Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.

    2. How does HTSeq treats multiple mapped reads? Are they just counted several times by different genes?
    This is a bit an issue, as different aligners have different ways of reporting multiple hits, and the SAM specification is frustratingly unclear on this.

    In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)

    For now, HTSeq looks for the "NH" optional flag. If it indicates that more than one alignment is reported, the read is not counted. If you use the "--minaqual" option, you can also cause all reads with low alignment quality to be skipped, which is another way how some aligners tag multiple alignments. If neither of the two works for you, you should pre-filter the SAM file. It is easy to write such a filtering script with HTSeq.

    Comment

    • jameslz
      Member
      • Nov 2009
      • 20

      #32
      Originally posted by Simon Anders View Post
      Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.



      This is a bit an issue, as different aligners have different ways of reporting multiple hits, and the SAM specification is frustratingly unclear on this.

      In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)

      For now, HTSeq looks for the "NH" optional flag. If it indicates that more than one alignment is reported, the read is not counted. If you use the "--minaqual" option, you can also cause all reads with low alignment quality to be skipped, which is another way how some aligners tag multiple alignments. If neither of the two works for you, you should pre-filter the SAM file. It is easy to write such a filtering script with HTSeq.
      I have a question:
      The samtools flagstat result of the bam:

      8236963 + 0 in total (QC-passed reads + QC-failed reads)
      0 + 0 duplicates
      8236963 + 0 mapped (100.00%:-nan%)
      8236963 + 0 paired in sequencing
      4144407 + 0 read1
      4092556 + 0 read2
      6061528 + 0 properly paired (73.59%:-nan%)
      7724492 + 0 with itself and mate mapped
      512471 + 0 singletons (6.22%:-nan%)
      0 + 0 with mate mapped to a different chr
      0 + 0 with mate mapped to a different chr (mapQ>=5)

      the line : 7724492 + 0 with itself and mate mapped (with itself : that both the forward and reverse are mapped, http://i.seqanswers.com/questions/80...lagstat-output)

      dose HTSeq handle such paired reads?

      Comment

      • tangx_2010
        Junior Member
        • Mar 2011
        • 5

        #33
        pair-end strand specific RNA-seq

        Originally posted by Simon Anders View Post
        Exactly. A paired read is only counted if both ends map to the same gene, and then, it is counted once, not twice, to this gene.
        Thank you for your reply. You are such a good man with patience. And there is another problem. I am runing HTSeq with pair-end and strand-specific RNA-seq data. It's very strange that most of the reads (> 90%) are counted as ambiguous. However, when I take the data as non strand-specific, HTSeq works very well. Waiting for your reply.
        sincerely tangx

        Comment

        • edge
          Senior Member
          • Sep 2009
          • 199

          #34
          Hi quinlana,

          I try the following command:
          Code:
          time samtools -f 0x2 /tophat_out/accepted_hits.bam | coverageBed -abam  /tophat_out/accepted_hits.bam -b transcripts.gtf > transcripts.proper.cov &
          Unfortunately, it gives the following error message:
          Code:
          BgzfStream ERROR: read block failed - invalid block header
          BamHeader ERROR: could not read magic number
          Do you mind to explain a little bit more about the correct command?
          Kindly point out the error that I did in my example.
          Thanks.

          Comment

          • edge
            Senior Member
            • Sep 2009
            • 199

            #35
            Hi Auction,

            Is it this should be the correct command in order to identify the raw count of each transcript ?
            Code:
            samtools [B]view[/B] -f 0x2 /tophat_out/accepted_hits.bam | coverageBed -abam  /tophat_out/accepted_hits.bam -b transcripts.gtf > transcripts.proper.cov

            Comment

            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #36
              Originally posted by tangx_2010 View Post
              Thank you for your reply. You are such a good man with patience. And there is another problem. I am runing HTSeq with pair-end and strand-specific RNA-seq data. It's very strange that most of the reads (> 90%) are counted as ambiguous. However, when I take the data as non strand-specific, HTSeq works very well. Waiting for your reply.
              sincerely tangx
              This is weird. Are you sure, you mean 'ambiguous', and not 'no_feature'?

              Comment

              • oliviera
                Member
                • Apr 2010
                • 31

                #37
                multiple mapped miRNA with HTseq/DEseq

                In general, a multiply aligned read should be discarded. (Imagine, genes A and B have partial sequence identity. If A is differentially expressed and B is not, any read originating from A that matches to both A and B will let B appear as differentially expressed, too, if it is counted for both. Hence, the prudent strategy is to only count reads that map uniquely to a gene.)
                Dear Simon,
                I hope this thread will still find you well.
                I am mapping reads from small RNAseq and would like to use HTseq/ DEseq. As miRNA map very often to more than one genomic location I would like to know your suggestions to deal with these multiple mapped reads with HTSeq. For what I saw on your previous feedback, you suggest to remove the multiple mapped reads but in my case that would mean that 80% of the reads are discarded... Could you please advise some methodology to use HTseq with miRNA data?

                Olivier

                Comment

                • Simon Anders
                  Senior Member
                  • Feb 2010
                  • 995

                  #38
                  Sorry, I've never worked myself with miRNA-Seq data. From what I've heard, a common approach is to first compile a list of all miRNA sequences in the genome, then find for each read a miRNA in the list. In other words: skip the alignment against the genome completely, instead just compare reads with miRNAs. Of course, htseq-count is not suitable for this, but with some Python knowledge, you can write your own script with HTSeq.

                  Comment

                  • Baoqing
                    Member
                    • Jan 2013
                    • 24

                    #39
                    HT_SEQ installation

                    Hi, Simon

                    I was trying to install HTseq on my local ubuntu machine by following your instruction below:


                    However, there are some error messages there. could you help me to get around this?

                    Thanks for your help!

                    bq@bq-VirtualBox:~/Desktop/apps/HTSeq-0.5.4p3$ ls
                    build_it doc HTSeq.egg-info MANIFEST.in README setup.cfg src
                    clean HTSeq LICENSE PKG-INFO scripts setup.py VERSION
                    bq@bq-VirtualBox:~/Desktop/apps/HTSeq-0.5.4p3$ python setup.py install --user
                    Could not import 'setuptools', falling back to 'distutils'.

                    running install
                    running build
                    running build_py
                    creating build
                    creating build/lib.linux-x86_64-2.7
                    creating build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/__init__.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/_HTSeq_internal.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/StepVector.py -> build/lib.linux-x86_64-2.7/HTSeq
                    copying HTSeq/_version.py -> build/lib.linux-x86_64-2.7/HTSeq
                    creating build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/__init__.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/qa.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    copying HTSeq/scripts/count.py -> build/lib.linux-x86_64-2.7/HTSeq/scripts
                    running build_ext
                    building 'HTSeq._HTSeq' extension
                    creating build/temp.linux-x86_64-2.7
                    creating build/temp.linux-x86_64-2.7/src
                    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c src/_HTSeq.c -o build/temp.linux-x86_64-2.7/src/_HTSeq.o -w
                    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/src/_HTSeq.o -o build/lib.linux-x86_64-2.7/HTSeq/_HTSeq.so
                    building 'HTSeq._StepVector' extension
                    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.linux-x86_64-2.7/src/StepVector_wrap.o -w
                    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
                    g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/src/StepVector_wrap.o -o build/lib.linux-x86_64-2.7/HTSeq/_StepVector.so
                    running build_scripts
                    creating build/scripts-2.7
                    copying and adjusting scripts/htseq-qa -> build/scripts-2.7
                    copying and adjusting scripts/htseq-count -> build/scripts-2.7
                    changing mode of build/scripts-2.7/htseq-qa from 664 to 775
                    changing mode of build/scripts-2.7/htseq-count from 664 to 775
                    running install_lib
                    creating /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_StepVector.so -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_HTSeq.so -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/StepVector.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_version.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    copying build/lib.linux-x86_64-2.7/HTSeq/_HTSeq_internal.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    creating /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/qa.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/count.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/scripts/__init__.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts
                    copying build/lib.linux-x86_64-2.7/HTSeq/__init__.py -> /home/bq/.local/lib/python2.7/site-packages/HTSeq
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/StepVector.py to StepVector.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/_version.py to _version.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/_HTSeq_internal.py to _HTSeq_internal.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/qa.py to qa.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/count.py to count.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/scripts/__init__.py to __init__.pyc
                    byte-compiling /home/bq/.local/lib/python2.7/site-packages/HTSeq/__init__.py to __init__.pyc
                    running install_scripts
                    creating /home/bq/.local/bin
                    copying build/scripts-2.7/htseq-qa -> /home/bq/.local/bin
                    copying build/scripts-2.7/htseq-count -> /home/bq/.local/bin
                    changing mode of /home/bq/.local/bin/htseq-qa to 775
                    changing mode of /home/bq/.local/bin/htseq-count to 775
                    running install_egg_info
                    Writing /home/bq/.local/lib/python2.7/site-packages/HTSeq-0.5.4p3.egg-info
                    bq@bq-VirtualBox:~/Desktop/apps$ python
                    Python 2.7.3 (default, Aug 1 2012, 05:14:39)
                    [GCC 4.6.3] on linux2
                    Type "help", "copyright", "credits" or "license" for more information.
                    >>> import HTSeq
                    >>>
                    Last edited by Baoqing; 06-09-2013, 12:48 PM.

                    Comment

                    • Simon Anders
                      Senior Member
                      • Feb 2010
                      • 995

                      #40
                      I don't see any error messages.

                      "Could not import 'setuptools', falling back to 'distutils'." -- Well, that's what a fall-back is for: If you don't have setuptools, it uses distutils which works as well.

                      Comment

                      • Baoqing
                        Member
                        • Jan 2013
                        • 24

                        #41
                        sorry, just start to pick up the python, any reminder message freak me out.

                        Just set up the path right for htseq-count

                        then run it with my bam files:
                        samtools view accepted_hits.bam | htseq-count - gene.gtf > counts.txt

                        While the program was running, there was a warning "claimed to have an aligned mate which could be found. (Is the sam file properly sorted?)

                        Does this warning "complaining that my file is not pair mated, or do i have to add other arguments for this line to run?

                        Thanks a lot for your help!

                        Best,
                        Last edited by Baoqing; 06-09-2013, 02:56 PM.

                        Comment

                        • rboettcher
                          Member
                          • Oct 2010
                          • 71

                          #42
                          Your bam/sam file needs to be name sorted for HTSeq-count, see "samtools sort -n".

                          Regards

                          Comment

                          • Baoqing
                            Member
                            • Jan 2013
                            • 24

                            #43
                            Thank you, I did sort my bam files though. I have the program to finished running, and it turned out a dataset was produced, does that mean everything is good? Should I just ignore the warning message?

                            Best,

                            Comment

                            • dpryan
                              Devon Ryan
                              • Jul 2011
                              • 3478

                              #44
                              Originally posted by Baoqing View Post
                              Thank you, I did sort my bam files though. I have the program to finished running, and it turned out a dataset was produced, does that mean everything is good? Should I just ignore the warning message?

                              Best,
                              See this thread, that's likely what's going on.

                              Comment

                              • Baoqing
                                Member
                                • Jan 2013
                                • 24

                                #45
                                awesome. The name sorted bam file worked perfect! Really appreciate it!

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  06-02-2026, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Today, 08:59 AM
                                0 responses
                                9 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 12:03 PM
                                0 responses
                                21 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 11:40 AM
                                0 responses
                                17 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                30 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...