  • dglemay
    Member
    • Feb 2011
    • 16

    htseq-count performance

    Hello,

    While using the TopHat --> htseq-count --> DESeq pipeline, I'm finding that htseq-count slams my machine (8 GB RAM). Since it is using all of the memory, and even most of the swap, I'm guessing I could improve performance by breaking the task down somehow. I'm using version 0.5.3p3 with these options:

    htseq-count -m intersection-nonempty -s no -t CDS -i gene_id -o htseq_{$fprefix}.sam sorted_{$fprefix}.sam hg19_EnsGene.gff

    Would it help if I separated the input files by chromosome?

    Thanks,
    Danielle
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Something went wrong here. htseq-count never uses much memory, because it reads the data read by read. Only the content of the relevant lines of the GFF file is kept in memory, and that can be nowhere near several GB. Please double-check that it really was htseq-count that filled up your machine and that your files are sane. Maybe something is wrong with the GFF file, so that HTSeq chokes when trying to read it.
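To illustrate the memory model described above, here is a minimal Python sketch (not HTSeq's actual code): only the annotation is held in memory, and SAM records are processed one at a time, so memory use should not grow with the number of reads. It is deliberately simplified and rests on stated assumptions: GTF-style attributes (`gene_id "..."`), only each read's leftmost aligned position (SAM column 4) is tested instead of the full aligned interval, strand and pairing are ignored, and features are scanned linearly for clarity rather than speed. The names `load_features` and `count_reads` are hypothetical.

```python
from collections import Counter, defaultdict


def load_features(gff_lines, feature_type="CDS", id_attr="gene_id"):
    """Hold only the annotation in memory: chrom -> [(start, end, gene_id), ...]."""
    features = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue  # keep a single feature type, similar in spirit to htseq-count's -t
        gene = None
        # Assumption: GTF-style attributes such as  gene_id "ENSG..."; transcript_id "...";
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith(id_attr + " "):
                gene = attr.split(" ", 1)[1].strip('"')
        if gene is not None:
            # GFF/GTF coordinates are 1-based and inclusive
            features[fields[0]].append((int(fields[3]), int(fields[4]), gene))
    return features


def count_reads(sam_lines, features):
    """Stream SAM records one at a time; nothing read-related is kept between lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 4 or chrom == "*":
            counts["not_aligned"] += 1  # FLAG bit 0x4: read unmapped
            continue
        # Simplification: only the leftmost aligned base (SAM POS) is tested
        hits = {gene for start, end, gene in features.get(chrom, ()) if start <= pos <= end}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["ambiguous"] += 1
        else:
            counts["no_feature"] += 1
    return counts
```

Because `count_reads` consumes any iterable of lines, passing an open file handle streams the SAM file without loading it; the only structure that scales with input size is the feature dictionary built from the annotation.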


    • dglemay
      Member
      • Feb 2011
      • 16

      #3
      Hi Simon,
      Thanks so much for responding. I was hoping you would see this thread.
      I have rebooted my machine and run only my script. The process using all of the memory is python... and this is the only thing using python. Just to be sure, I've killed everything, rebooted, and started only the htseq-count call. As it runs, the memory used gradually climbs and climbs.
      The script is outputting warnings that look like this:
      Read HWI-ST623:0:2:1101:11808:178924:0:2:1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

      So...what may be happening is that htseq-count is storing more and more reads in memory as it tries to find the mate.
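The suspicion above can be illustrated with a small Python sketch (hypothetical, not HTSeq 0.5.3p3's implementation): a pairer that holds each alignment in a dictionary keyed by read name until its mate arrives. With name-sorted input the mates are adjacent and the buffer stays tiny; if mates are far apart, or a mate never appears, the buffer grows. The sketch assumes exactly one record per mate and ignores secondary or multiple alignments, which TopHat can report. `pair_by_name` is a hypothetical name.

```python
def pair_by_name(records):
    """Pair alignments by read name using a buffer of mates still waiting for a partner.

    records: iterable of (read_name, alignment) tuples in file order.
    Returns (pairs, unmatched, peak_buffer_size).
    """
    pending = {}  # read_name -> first mate seen, waiting for its partner
    pairs = []
    peak = 0
    for name, alignment in records:
        if name in pending:
            pairs.append((name, pending.pop(name), alignment))
        else:
            pending[name] = alignment
            peak = max(peak, len(pending))
    # Anything left here "claims to have a mate" that never showed up
    return pairs, pending, peak
```

For adjacent mates such as `[("a", "a/1"), ("a", "a/2"), ("b", "b/1"), ("b", "b/2")]` the peak buffer size is 1, while if all first mates come before all second mates the peak equals the number of pairs. That is the sense in which a badly ordered file, or many orphaned mates, could make a buffering pairer's memory climb.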

      I am sorting the sam file but I'm a newbie at this so it is possible I'm doing something wrong. The options I'm using are

      samtools sort -n bamfile.bam sorted_bamfile
      samtools view -h sorted_bamfile.bam > sorted_samfile.sam

      Here is the full script:
      foreach bamfile (./TopHat_output/*accepted_hits.bam)
          set fprefix = `echo $bamfile:t:r | sed 's/accepted_hits//'`
          samtools sort -n $bamfile sorted_{$fprefix}
          samtools view -h sorted_{$fprefix}.bam > ./HTSEQcount_output/sorted_{$fprefix}.sam
          /home/dglemay/work/tools/HTSeq-0.5.3p3/scripts/htseq-count -m intersection-nonempty -s no -t CDS -i gene_id -o ./HTSEQcount_output/htseq_{$fprefix}.sam ./HTSEQcount_output/sorted_{$fprefix}.sam ./hg19/hg19_EnsGene.gff >&! ./HTSEQcount_output/log_{$fprefix}.txt
          grep ENS ./HTSEQcount_output/log_{$fprefix}.txt > ./count_data/counts_{$fprefix}.txt
          # cleanup
          rm ./HTSEQcount_output/sorted_{$fprefix}.sam
          samtools view -bSl ./HTSEQcount_output/htseq_{$fprefix}.sam > ./HTSEQcount_output/htseq_{$fprefix}.bam
      end

      Thank you for reading,
      Danielle


      • emilyjia2000
        Member
        • May 2011
        • 59

        #4
        Hi Simon,

        Is it possible to count UTRs with htseq-count? I tried it and got nothing. If it works on UTRs, is there anything in particular I should pay attention to?

        Thanks
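htseq-count's `-t` option selects features by the feature-type column (column 3) of the GFF/GTF file, so if the annotation contains no rows with exactly that type, nothing is counted. Before counting UTRs, it may be worth checking which types the annotation actually contains; whether UTR rows exist at all, and what they are called, depends on the annotation source. A minimal Python sketch; `feature_types` is a hypothetical helper, not part of HTSeq.

```python
from collections import Counter


def feature_types(gff_lines):
    """Count how often each value occurs in column 3 (feature type) of a GFF/GTF file."""
    types = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.split("\t")
        if len(fields) >= 9:  # a well-formed GFF/GTF row has nine tab-separated columns
            types[fields[2]] += 1
    return types
```

For example, `feature_types(open("hg19_EnsGene.gff")).most_common()` lists the types present; the value passed to `-t` has to match one of them exactly, and those rows also need the attribute named by `-i`.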


        • dglemay
          Member
          • Feb 2011
          • 16

          #5
          Ah!

          Should be
          samtools sort
          not
          samtools sort -n

          @emilyjia2000: you probably need to start a new thread


          • labunit
            Member
            • Sep 2010
            • 10

            #6
            Does the memory usage climb the entire time or just at the beginning? What is the file size of your GFF?

            If it is alignment-related, you would see an initial increase in memory consumption when you start the tool, as the GFF is read. Then, if your suspicion is right, memory consumption would start to increase again as the SAM file is processed, depending on its contents. Can you observe this behavior?
            I am guessing you are using a *nix OS. Just open another console window and enter "top". You'll see a detailed list of processes and their resource consumption.
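As an alternative to watching `top` by hand, here is a minimal Python sketch that starts a command and samples its resident memory at fixed intervals, which shows whether usage levels off after the annotation is loaded or keeps climbing. It assumes Linux, where the current resident set size appears as the `VmRSS:` line of `/proc/<pid>/status` (in kB); on systems without `/proc` it simply returns no samples. `sample_rss` and `rss_kb` are hypothetical names.

```python
import subprocess
import time


def rss_kb(pid):
    """Return the current resident set size of a process in kB (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None  # line may be missing, e.g. for a process that has already exited


def sample_rss(cmd, interval=1.0):
    """Run cmd and record its RSS every `interval` seconds until it exits.

    Returns (exit_code, [rss_kb, ...]). Only the process started here is measured,
    not any children it spawns.
    """
    proc = subprocess.Popen(cmd)
    samples = []
    while proc.poll() is None:
        try:
            value = rss_kb(proc.pid)
        except FileNotFoundError:  # no /proc (non-Linux) or process already reaped
            break
        if value is not None:
            samples.append(value)
        time.sleep(interval)
    return proc.wait(), samples
```

Passing the htseq-count command line from the first post as a list of arguments, with an interval of a few seconds, gives a series of readings: a plateau after the first few would match Simon's description, while a steady rise until exit would support the mate-buffering explanation.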

            Please correct me if I am wrong.


            • emilyjia2000
              Member
              • May 2011
              • 59

              #7
              Try sorting with Picard; it worked for me.


              • Simon Anders
                Senior Member
                • Feb 2010
                • 995

                #8
                Originally posted by emilyjia2000:
                try picard sort, it works on me.
                Sorry, but as the author of HTSeq, I would like to say, just for the record: HTSeq works fine for nearly everybody, and it is designed to work with little memory. I have no idea what is wrong here, but I am quite sure that there must be something very strange about dglemay's input files.


                • xuguorong
                  Member
                  • Feb 2010
                  • 27

                  #9
                  Hi Simon,

                  I am quite confused about how paired-end data should be sorted before using HTSeq.
                  I have read many posts about this issue, but no thread explains it in a standard, complete way.
                  First, I sort my paired-end BAM file with the command
                  samtools sort -n my.bam my.sort

                  Then, I convert the BAM to SAM,
                  samtools view my.sort.bam > my.sort.sam

                  Finally, I run HTSeq to get the counts:
                  htseq-count --stranded=no --mode=intersection-nonempty -t exon -i gene_id my.sort.sam annotation.gtf > output.txt

                  But I still get a lot of error messages saying that HTSeq cannot find the other aligned mate ("Is the SAM file properly sorted?"). Someone said that we still need to sort the SAM file again. If I sort the SAM file again, how should I sort it: by name again, or by some other method?
                  Could you explain this in more detail?

                  Thanks a lot!
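Whichever tool does the sorting, pairing by name relies on all records that share a read name (QNAME, SAM column 1) sitting next to each other. Below is a minimal Python sketch of a check for that property. It deliberately does not assume a particular name order, since samtools' name-sort order may not be plain string order. It keeps a set of every name seen, so its memory grows with the number of reads; treat it as a one-off diagnostic rather than part of the counting step. `names_are_grouped` is a hypothetical helper.

```python
def names_are_grouped(sam_lines):
    """Return True if every QNAME occupies one contiguous block of SAM records."""
    seen = set()      # read names whose block has already started
    previous = None   # read name of the record just before this one
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        qname = line.split("\t", 1)[0]
        if qname != previous:
            if qname in seen:
                return False  # this name reappears after other names intervened
            seen.add(qname)
            previous = qname
    return True
```

For example, `names_are_grouped(open("my.sort.sam"))` returning `True` means mates are already adjacent. In that case, sorting the SAM file again would not change anything, and the warnings would have to come from records whose mate really is absent from the file.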
