Hello
I am doing a transcriptome analysis on Pseudomonas putida and I have been trying to do a read count using Htseq -count. The program always give an error. I have tried different genome references (fna) and annotation files (gtf ang gff) but it does not work.
The mapping works fine it seems but the read count doesnt.
I have run the script:
htseq-count -i gene_id -t CDS test_v2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count
I got the error:
built-in function delete__Pair_long_obj> returned a result with an error set
[Exception type: SystemError, raised in _HTSeq_internal.py:43]
If I include other options of htseq-count like "-s no" or "-m intersection-nonempty" Is the same error.
If I run only:
htseq-count test_v2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count
The same error.
So it must be the files compatibility? is there something wrong with the Pseudomonas putida gtf file?
This is an example of a couple of lines in the gtf file:
AE015451.2 Genbank gene 147 1019 . - . gene_id "PP_0001"; transcript_id ""; gbkey "Gene"; gene "parB"; gene_biotype "protein_coding"; locus_tag "PP_0001";
AE015451.2 Genbank CDS 150 1019 . - 0 gene_id "PP_0001"; transcript_id "unassigned_transcript_1"; gbkey "CDS"; gene "parB"; locus_tag "PP_0001"; note "Evidence 2a : Function of homologous gene experimentally demonstrated in an other organism; PubMedId : 16306995, 17462018, 17729285; Product type pf : putative factor; Biologicalprocesses : Replicate"; product "probable chromosome-partitioning protein"; protein_id "AAN65635.1"; transl_table "11"; exon_number "1";
This is how the fna file (I used for doing the mapping and creating the sam file) looks like:
>AE015451.2 Pseudomonas putida KT2440 complete genome
AACTGCTCCTCGGAAGTCGACCAACAAGTCAGCTATGACTTGGCATAATTTGTGCCGACAAAATGCGCGCAGAGTATAGG
GGTGGATTAACCCCTATTCAACTCTTCGGTAGTGATTTCCGACTTCACGCTACAACAGGAATTGTTTCAGCGGATGTGAG
CGAGCACGCCTTGCAACTCGTCAAGCGAGTTGTAGCGAATAACCAACTGGCCTTTGCCCTTGTTGCCATGACGGATCTGC
ACGGCCGAGCCCAGGCGCTCTGCGAGCCGCTGTTCAAGGCGTGCGATATCCGGATCAGGTTTGCTCGGTTCGACCGGATC
This is how the sam file looks like:
@HD VN:1.0 SO:unsorted
@SQ SN:AE015451.2 LN:6181873
@PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/var/bin/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 -x genome_pseudomonas_putida_v2 -S test_v2.sam --summary-file test_v2.sam.summary -1 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_1_val_1.fq -2 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_2_val_2.fq"
A00604:565:HFWKVDSX7:4:1101:4689:1000 83 AE015451.2 179677 1 150M1S = 174221 -5608 AGGAGGTTGGCTTAGAAGCAGCCACCCTTTAAAGAAAGCGTAATAGCTCACTAGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTCAAACCATGCACCGAAGCTACGGGTATCATCTTATGATGATGCGGTAGAGGAGCGTTCTGTAN FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF# AS:i:-1 ZS:i:-1 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YS:i:-1 YT:Z:CP NH:i:5
What I am missing in my code ? is the id of the genomic feature incorrect?? Are the sam and gtf incompatible?
Thank you very much for your help.
I am doing a transcriptome analysis on Pseudomonas putida and I have been trying to do a read count using Htseq -count. The program always give an error. I have tried different genome references (fna) and annotation files (gtf ang gff) but it does not work.
The mapping works fine it seems but the read count doesnt.
I have run the script:
htseq-count -i gene_id -t CDS test_v2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count
I got the error:
built-in function delete__Pair_long_obj> returned a result with an error set
[Exception type: SystemError, raised in _HTSeq_internal.py:43]
If I include other options of htseq-count like "-s no" or "-m intersection-nonempty" Is the same error.
If I run only:
htseq-count test_v2.sam ncbi_dataset_v2/ncbi_dataset/data/GCA_000007565.2/genomic.gtf > test_v2.count
The same error.
So it must be the files compatibility? is there something wrong with the Pseudomonas putida gtf file?
This is an example of a couple of lines in the gtf file:
AE015451.2 Genbank gene 147 1019 . - . gene_id "PP_0001"; transcript_id ""; gbkey "Gene"; gene "parB"; gene_biotype "protein_coding"; locus_tag "PP_0001";
AE015451.2 Genbank CDS 150 1019 . - 0 gene_id "PP_0001"; transcript_id "unassigned_transcript_1"; gbkey "CDS"; gene "parB"; locus_tag "PP_0001"; note "Evidence 2a : Function of homologous gene experimentally demonstrated in an other organism; PubMedId : 16306995, 17462018, 17729285; Product type pf : putative factor; Biologicalprocesses : Replicate"; product "probable chromosome-partitioning protein"; protein_id "AAN65635.1"; transl_table "11"; exon_number "1";
This is how the fna file (I used for doing the mapping and creating the sam file) looks like:
>AE015451.2 Pseudomonas putida KT2440 complete genome
AACTGCTCCTCGGAAGTCGACCAACAAGTCAGCTATGACTTGGCATAATTTGTGCCGACAAAATGCGCGCAGAGTATAGG
GGTGGATTAACCCCTATTCAACTCTTCGGTAGTGATTTCCGACTTCACGCTACAACAGGAATTGTTTCAGCGGATGTGAG
CGAGCACGCCTTGCAACTCGTCAAGCGAGTTGTAGCGAATAACCAACTGGCCTTTGCCCTTGTTGCCATGACGGATCTGC
ACGGCCGAGCCCAGGCGCTCTGCGAGCCGCTGTTCAAGGCGTGCGATATCCGGATCAGGTTTGCTCGGTTCGACCGGATC
This is how the sam file looks like:
@HD VN:1.0 SO:unsorted
@SQ SN:AE015451.2 LN:6181873
@PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/var/bin/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 -x genome_pseudomonas_putida_v2 -S test_v2.sam --summary-file test_v2.sam.summary -1 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_1_val_1.fq -2 ../NG-33893_Exp2_PP_25h_b_lib709417_10278_4_2_val_2.fq"
A00604:565:HFWKVDSX7:4:1101:4689:1000 83 AE015451.2 179677 1 150M1S = 174221 -5608 AGGAGGTTGGCTTAGAAGCAGCCACCCTTTAAAGAAAGCGTAATAGCTCACTAGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTCAAACCATGCACCGAAGCTACGGGTATCATCTTATGATGATGCGGTAGAGGAGCGTTCTGTAN FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF# AS:i:-1 ZS:i:-1 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150 YS:i:-1 YT:Z:CP NH:i:5
What I am missing in my code ? is the id of the genomic feature incorrect?? Are the sam and gtf incompatible?
Thank you very much for your help.

Comment