Unconfigured Ad

**GenomicSeq** · 07-20-2023, 06:54 AM

Hello Perseo

It looks like there might be an issue with the compatibility of the files, as you suggested, but it's also possible that there might be something wrong with the actual GTF or SAM files.

HTSeq-count expects the GTF file to have entries for "exon" features which then should be grouped by "gene_id". In your GTF file snippet you've shared, the feature type used is 'gene' and 'CDS', but not 'exon'. This is a common cause for errors as HTSeq-count needs 'exon' features to correctly count reads.

Another potential issue could be related to the structure of the GTF file. HTSeq-count requires a very specific format, and it might be that your GTF file is not perfectly adhering to this. Make sure that your GTF file follows the correct structure.

Also, the SAM file should ideally have been sorted and indexed, even though it's not always necessary. It's a good practice to have this done before using it with HTSeq-count. Use samtools to sort and index your SAM file.

Additionally, I think that the '-i gene_id -t CDS' in your script may cause issues. The '-i' option specifies the attribute to be used as feature ID. 'gene_id' is fine if your GTF file uses that, but '-t' specifies the feature type, which should ideally be 'exon'.

Here's an example of how your command should look like:

htseq-count -f sam -i gene_id -r pos -t exon test_v2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count

I would also recommend you try using a different Python environment, with a version compatible with HTSeq (for example Python 3.7 or 3.8). Sometimes the error you're getting can occur due to a conflict in Python environments.

Let me know if any of these suggestions help, or if you need further assistance. Good luck!

**Perseo** · 07-20-2023, 08:22 AM

Hello

Thank you very much for your answer

"HTSeq-count expects the GTF file to have entries for "exon" features"

Well the file does contain exon features in the third column but you can find them in some rows further down:

AE015451.2 Genbank exon 171389 172910 . + . gene_id "PP_16SA"; transcript_id "unassigned_transcript_168"; locus_tag "PP_16SA"; note "Evidence 1b : Function experimentally demonstrated in the studied species"; product "16S ribosomal RNA"; transcript_biotype "rRNA"; exon_number "1";

AE015451.2 Genbank gene 173203 176104 . + . gene_id "PP_23SA"; transcript_id ""; gbkey "Gene"; gene_biotype "rRNA"; locus_tag "PP_23SA";

AE015451.2 Genbank transcript 173203 176104 . + . gene_id "PP_23SA"; transcript_id "unassigned_transcript_169"; gbkey "rRNA"; locus_tag "PP_23SA"; product "23S ribosomal RNA"; transcript_biotype "rRNA";

AE015451.2 Genbank exon 173203 176104 . + . gene_id "PP_23SA"; transcript_id "unassigned_transcript_169"; locus_tag "PP_23SA"; product "23S ribosomal RNA"; transcript_biotype "rRNA"; exon_number "1";

AE015451.2 Genbank gene 176237 176356 . + . gene_id "PP_5SA"; transcript_id ""; gbkey "Gene"; gene_biotype "rRNA"; locus_tag "PP_5SA";

AE015451.2 Genbank transcript 176237 176356 . + . gene_id "PP_5SA"; transcript_id "unassigned_transcript_170"; gbkey "rRNA"; locus_tag "PP_5SA"; product "5S ribosomal RNA"; transcript_biotype "rRNA";

AE015451.2 Genbank exon 176237 176356 . + . gene_id "PP_5SA"; transcript_id "unassigned_transcript_170"; locus_tag "PP_5SA"; product "5S

"Make sure that your GTF file follows the correct structure."

The file looks fine... at least it has all the entries from the example described in the web page: http://mblab.wustl.edu/GTF22.html

At least by looking seems fine. Is there any deeper way to check if the GFT file is in correct form, a command or script?

"the SAM file should ideally have been sorted and indexed"

I sorted my sam:

samtools sort -o sorted_testv2.sam test_v2.sam

I got:

@HD VN:1.0 SO:coordinate @SQ SN:AE015451.2 LN:6181873 @PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/var/bin/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 -x genome_pseudomonas_putida_v2 -S test_v2.sam --summary-file test_v2.sam.summary -1 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_1_val_1.fq -2 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_2_val_2.fq" A00604:565:HFWKVDSX7:4:1257:6686:25817 99 AE015451.2 1 60 24S126M = 41 213 GTTTGGCGTGCGTATTTCCATCACAACTGCTCCTCGGAAGTCGACCAACAAGTCAGCTATGACTTGGCATAATTTGTGCCGACAAAATGCGCGCAGAGTATAGGGGTGGATTAACCCCTATTCAACTCTTCGGTAGTGATTTCCGACTTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Doesn´t the indexing works on BAM files only?? Do I need to transform my SAM? should I try it and then running a BAM instead??

"but '-t' specifies the feature type, which should ideally be 'exon'."

I though that -t uses that argument (ie: CDS) and all features of other type are ignored.. Thus, I wanted to run only coding DNA sequences...

Anyway I have run the code as you suggested:

htseq-count -f sam -i gene_id -r pos -t exon sorted_testv2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count

I got the same fricking error:

<built-in function delete__Pair_long_obj> returned a result with an error set

[Exception type: SystemError, raised in _HTSeq_internal.py:43]

I checked my version of python:

python --version

Python 3.7.2

the htseq version is:

htseq-count --version

2.0.3

So the version should be compatible right??

Any other idea?

Thank you very much for your time and help

Kind regards

**GenomicSeq** · 07-26-2023, 05:13 AM

Sorry for the delay! I just noticed you messaged me back.

It's clear that you've done some extensive troubleshooting, but I'll suggest a few more ideas. I'm not sure if they'll work but I'll suggest them anyway.

First, I would try converting the SAM to BAM. Even though both SAM and BAM formats are compatible with htseq-count, BAM is more memory efficient. Since your error seems to be related to a memory issue in Python, I'm thinking it might help to convert your SAM file to a sorted and indexed BAM file.

Code:

samtools view -S -b test_v2.sam > test_v2.bam
samtools sort test_v2.bam -o sorted_test_v2.bam
samtools index sorted_test_v2.bam

Then you can use the sorted BAM file in your htseq-count command:

Code:

htseq-count -f bam -r pos -t exon -i gene_id sorted_test_v2.bam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count

The next thing to do in addition to a visual inspection, is verify the integrity of your GTF file.

One such tool is `gtfToGenePred` from UCSC, which you can use as follows:

Code:

gtfToGenePred ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf stdout | genePredCheck stdin

If the GTF file has any structural issues, this command should raise an error.

From here, I would check compatibility between Python and HTSeq. You are using Python 3.7.2 and HTSeq 2.0.3. This should technically be fine, but sometimes there can be incompatibility issues that aren't immediately obvious.

You could try creating a new Python environment with a different Python version (for example, 3.8 or 3.9) and install HTSeq in this new environment. For managing Python environments, you can use tools like `conda` or `virtualenv`.

Code:

conda create -n htseq_env python=3.8
conda activate htseq_env
pip install HTSeq

And then, use the htseq-count within this new environment.

The last thing I'll suggest if you are still experiencing issues with htseq-count, you may consider trying alternative tools such as featureCounts from the Subread package, which serves a similar purpose. It might work better for your case.

Hope at least one of these suggestions helps! Let me know if you need more assistance.

Topics	Statistics	Last Post
Study Captures the First Moments of DNA Replication by SEQadmin2 Started by SEQadmin2, 07-24-2026, 12:17 PM	0 responses 34 views 0 reactions	Last Post by SEQadmin2 07-24-2026, 12:17 PM
Chemotherapy Leaves Detectable DNA Signatures in Childhood Tumors by SEQadmin2 Started by SEQadmin2, 07-23-2026, 11:41 AM	0 responses 25 views 0 reactions	Last Post by SEQadmin2 07-23-2026, 11:41 AM
Single-Cell Atlases Skew Toward European Ancestry, Analysis Finds by SEQadmin2 Started by SEQadmin2, 07-20-2026, 11:10 AM	0 responses 217 views 0 reactions	Last Post by SEQadmin2 07-20-2026, 11:10 AM
UC San Diego Bioengineers Map Gene Function in Human Stem Cells by SEQadmin2 Started by SEQadmin2, 07-13-2026, 10:26 AM	0 responses 79 views 0 reactions	Last Post by SEQadmin2 07-13-2026, 10:26 AM

Unconfigured Ad

Help with htseq -count read counts

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News