**nickloman** · 04-22-2010, 06:10 AM

Nice one, looks interesting! I like the look of the quality plots.

Do you have any plans to try and integrate with the Biopython project? There's some obvious overlap that I can see.

**Thomas Doktor** · 04-22-2010, 06:55 AM

I'm trying to run htseq-qa, but get the following error:

Traceback (most recent call last):
File "/usr/local/bin/htseq-qa", line 5, in <module>
pkg_resources.run_script('HTSeq==0.4.2-p1', 'htseq-qa')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
if dist.key not in keys2:
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script

File "/usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/EGG-INFO/scripts/htseq-qa", line 3, in <module>
import HTSeq.scripts.qa
File "/usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 8, in <module>
from _HTSeq import *
ImportError: /usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/HTSeq/_HTSeq.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

I've installed the dependencies and I'm running Python 2.6.4 - any idea what I could be doing wrong?

**dawe** · 04-22-2010, 07:00 AM

Originally posted by Simon Anders:

Hi

I would like to advertise the release of HTSeq, a Python framework to process and analyse high-throughput sequencing (HTS) data.

Nice one! Trying it ASAP!

d

**Thomas Doktor** · 04-22-2010, 07:16 AM

The error I encountered was resolved by compiling the package from source, thanks for the help and the package Simon.
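
An undefined `PyUnicodeUCS2_*` symbol usually means the binary extension was built against a Python compiled with 2-byte (UCS2) Unicode, while the interpreter loading it uses 4-byte (UCS4) Unicode, which would explain why rebuilding from source helps. A minimal sketch for checking which kind of interpreter you are running (the function name `unicode_build_width` is made up for this illustration):

```python
import sys

def unicode_build_width():
    """Report the Unicode width of the running interpreter.

    On Python 2, sys.maxunicode is 0xFFFF for a "narrow" (UCS2) build and
    0x10FFFF for a "wide" (UCS4) build. Python 3.3 and later always report
    0x10FFFF, so this check is only informative on older interpreters.
    """
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"

print(unicode_build_width())
```

If this reports UCS4 while the error mentions `PyUnicodeUCS2_...`, the prebuilt extension and the interpreter most likely disagree, and compiling the package locally should produce a matching build.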

**spenthil** · 04-22-2010, 09:31 AM

Perfect!

Side note: pip is an awesome easy_install replacement that works great with virtualenv. To install with it: `pip install HTSeq`

**lvaruzza** · 04-22-2010, 12:54 PM

Does this package support reads in SOLiD color space?

**Siva** · 04-22-2010, 10:38 PM

Hi Simon
In HTSeq count is it ok to use Bowtie SAM output against the corresponding Cufflinks generated GTF file? I get an error message regarding the Cufflinks GTF file saying CUFF1.0 doesn't contain gene_id attribute. I feel I must supply a standard GFF/GTF annotation file from the database. I am not sure a comprehensive GTF file exists for maize.

**Simon Anders** · 04-23-2010, 01:31 AM

Hi Siva

Originally posted by Siva:

In HTSeq count is it ok to use Bowtie SAM output against the corresponding Cufflinks generated GTF file? I get an error message regarding the Cufflinks GTF file saying CUFF1.0 doesn't contain gene_id attribute.

Thanks for the report. I've just investigated and corrected it. The real issue is that in the Cufflinks GTF file the genes have no strand information, and hence htseq-count has to be called with the `--stranded=no` option.

Sorry for the misleading error message. I've just uploaded a fix, which will now display the more helpful message "Feature CUFF.1 at chr1:[1047,1108)/. does not have strand information but you are running htseq-count in stranded mode. Use '--stranded=no'."

Cheers
Simon
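
To see in advance whether an annotation file will hit this, one could scan the strand column (the seventh tab-separated field of a GTF record) for values other than `+` or `-`. The following is only a rough sketch in plain Python, not part of HTSeq; the function name `unstranded_features` and the two example records are invented for illustration.

```python
def unstranded_features(gtf_lines):
    """Yield (line_number, seqname, feature_type) for GTF records
    whose strand column is neither '+' nor '-'."""
    for lineno, line in enumerate(gtf_lines, 1):
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has nine tab-separated columns; ignore malformed lines.
        if len(fields) < 9:
            continue
        if fields[6] not in ("+", "-"):
            yield lineno, fields[0], fields[2]

# Made-up example records: the first has '.' in the strand column.
example = [
    'chr1\tCufflinks\texon\t1048\t1108\t.\t.\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";\n',
    'chr1\tCufflinks\texon\t2000\t2100\t.\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";\n',
]

for hit in unstranded_features(example):
    print(hit)
```

With a real file, the same function can be given an open file object, for example `unstranded_features(open("transcripts.gtf"))`, where the file name is a placeholder. If any records are reported, running htseq-count with `--stranded=no` is the option described above.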

**Simon Anders** · 04-23-2010, 01:37 AM

Hi

Originally posted by lvaruzza:

Does this package support reads in SOLiD color space?

Yes and no. ;-)

There are no specific facilities for color space yet, but HTSeq should nevertheless be helpful for working with SOLiD data. For example, the FastaReader and FastqReader classes can also be used to read in color-space files, and the GFF_Reader class can deal with the GFF files output by the WTAP aligner.

I don't have much experience with SOLiD data, but I'd be interested to collaborate with a SOLiD-using bioinformatician to fill in the gaps in order to make HTSeq really useful for colour-space data. I think not much is missing for this.

Cheers
Simon

**Melissa** · 04-25-2010, 06:59 PM

Cool. Thanks. Will be back with questions.

**tcezard** · 04-26-2010, 03:39 AM

I believe I have found a bug, or else I have misunderstood the documentation.
Maybe there is a better place for bug reports, but I couldn't find a mailing list or bug tracker.
My understanding of the sequence trimming is that it looks for a match (allowing mismatches) between the left part of the read and the right part of the adapter. However, it seems to fail when the match is as long as the whole adapter.

Here are a few commands that reproduce the issue:

Code:

>>> from HTSeq import Sequence
>>> read=Sequence('ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG')
>>> adapter1=Sequence('ACACGT')
>>> adapter2=Sequence('AACACGT')
>>> print read.seq.startswith(adapter1.seq)
True
>>> print read.seq.startswith(adapter2.seq)
False
>>> print read.trim_left_end(adapter1)
ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG
>>> print read.trim_left_end(adapter2)
TCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG
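
The behaviour expected in the session above can be sketched in plain Python as follows. This is only an illustration of the idea (find the longest suffix of the adapter that is a prefix of the read, then cut it off), not HTSeq's actual implementation: the function name `trim_left_adapter` is made up, and unlike the real method it allows no mismatches.

```python
def trim_left_adapter(read, adapter):
    """Remove from the left end of `read` the longest suffix of `adapter`
    that matches a prefix of `read` exactly (no mismatches allowed)."""
    # Try the longest possible overlap first, down to a single base.
    for k in range(min(len(read), len(adapter)), 0, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]
    # No suffix of the adapter matches the start of the read.
    return read

read = 'ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG'
print(trim_left_adapter(read, 'ACACGT'))   # full-length adapter match
print(trim_left_adapter(read, 'AACACGT'))  # partial match of the adapter's right part
```

Under this reading, both calls should remove the leading `ACACGT`, which is what the second call in the session does and the first does not. Note that a sketch this simple will also trim on a one-base overlap, so a practical version would want a minimum overlap length.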

**Simon Anders** · 04-26-2010, 04:07 AM

Hi tcezard

Thanks for the bug report and the nice code example. I've found and fixed the bug and uploaded a new release patch; the version is now 0.4.2-p3.

If you find further bugs, just send me an e-mail.

Cheers
Simon (anders at embl dot de )

**fennan** · 04-28-2010, 01:33 AM

Nice work. I will start using it. I'll report my experience!

**Thomas Doktor** · 04-29-2010, 09:30 AM

I'm trying to use htseq-count version 0.4.2-p3 on a SAM file produced by TopHat and an hg19 Ensembl GTF file. I'm analysing the reads in non-stranded mode and looking for exons in the gene_id features. The script runs for a while and outputs several warnings about reads incorrectly flagged as proper pairs, but then exits with the following error:

Error: 'tuple' object has no attribute 'read'
[Exception type: AttributeError, raised in count.py:100]

Is this an error in my SAM file, and if so, how can I identify the read in question?

HTSeq: A Python framework to work with high-throughput sequencing data
