  • tcezard
    replied
    I believe I found a bug, or I did not understand the documentation.
    Maybe there is a better place for bug reports, but I couldn't find a mailing list or bug tracker.
    My understanding of Sequence trimming is that it looks for a match (allowing mismatches) between the left part of the read and the right part of the adapter. However, it seems to fail to find a match that is as long as the adapter itself.

    Here are a few commands that reproduce the issue:
    Code:
    >>> from HTSeq import Sequence
    >>> read=Sequence('ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG')
    >>> adapter1=Sequence('ACACGT')
    >>> adapter2=Sequence('AACACGT')
    >>> print read.seq.startswith(adapter1.seq)
    True
    >>> print read.seq.startswith(adapter2.seq)
    False
    >>> print read.trim_left_end(adapter1)
    ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG
    >>> print read.trim_left_end(adapter2)
    TCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG
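    For comparison, the behaviour the report expects can be written as a minimal pure-Python sketch. It is not HTSeq's implementation; the function below and its mismatch_prop parameter are hypothetical. It tries every overlap length from the full adapter length down to one base, so a read that starts with the complete adapter is trimmed as well.

```python
def trim_left_end(read, adapter, mismatch_prop=0.0):
    """Remove the longest prefix of `read` that matches a suffix of `adapter`.

    Hypothetical sketch, not HTSeq's implementation. Overlap lengths are
    tried from min(len(read), len(adapter)) down to 1, so an overlap as long
    as the whole adapter is included. An overlap of length k is accepted
    if it has at most mismatch_prop * k mismatches.
    """
    max_overlap = min(len(read), len(adapter))
    for k in range(max_overlap, 0, -1):
        read_part = read[:k]                       # left end of the read
        adapter_part = adapter[len(adapter) - k:]  # right end of the adapter
        mismatches = sum(1 for a, b in zip(read_part, adapter_part) if a != b)
        if mismatches <= mismatch_prop * k:
            return read[k:]
    return read  # no acceptable overlap: return the read unchanged


read = "ACACGTTCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG"
print(trim_left_end(read, "ACACGT"))   # whole adapter matches the read's start
print(trim_left_end(read, "AACACGT"))  # adapter's last 6 bases match
```

    With this search, both calls return TCGATATCCCGTATGCAACGGACCCGGCAGGAAACCGGCTGTGGG. Because there is no minimum overlap, a single base matching the adapter's last base is also removed; a practical trimmer would probably want a minimum overlap length.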
    Last edited by tcezard; 04-26-2010, 03:41 AM. Reason: add code output

  • Melissa
    replied
    Cool. Thanks. Will be back with questions.

  • Simon Anders
    replied
    Hi

    Originally posted by lvaruzza
    Does this package support reads in SOLiD color space?
    Yes and no. ;-)

    There are no specific facilities for color space yet, but HTSeq should nevertheless be helpful for working with SOLiD data. For example, the FastaReader and FastqReader classes can also be used to read in color-space files, and the GFF_Reader class can deal with the GFF files output by the WTAP aligner.
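    To give an idea of one such gap: colour-space reads have to be decoded before they can be compared with base-space sequence. Below is a minimal sketch of the standard two-base decoding; decode_colorspace is a hypothetical helper, not part of HTSeq. It assumes each read is written as a primer base followed by colours 0-3, as in csfasta files, and it does not handle missing calls ('.').

```python
BASES = "ACGT"  # 2-bit order used by the colour code: A=0, C=1, G=2, T=3


def decode_colorspace(cs_read):
    """Translate a SOLiD colour-space read such as 'T0123' into bases.

    Hypothetical helper, not part of HTSeq. The first character is the
    primer base; each colour encodes the transition to the next base,
    which in the 2-bit order above is previous_base XOR colour.
    """
    if not cs_read or cs_read[0] not in BASES:
        raise ValueError("colour-space read must start with a primer base")
    current = BASES.index(cs_read[0])
    decoded = []
    for colour in cs_read[1:]:
        if colour not in "0123":
            raise ValueError("unexpected colour call %r" % colour)
        current ^= int(colour)
        decoded.append(BASES[current])
    return "".join(decoded)  # the primer base itself is not part of the read


print(decode_colorspace("T0123"))  # TGAT
```

    Because each base depends on the one before it, a single miscalled colour corrupts every base after it, which is one reason colour-space aligners usually compare reads to the reference in colour space rather than decoding first.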

    I don't have much experience with SOLiD data, but I'd be interested to collaborate with a SOLiD-using bioinformatician to fill in the gaps in order to make HTSeq really useful for colour-space data. I think not much is missing for this.

    Cheers
    Simon

  • Simon Anders
    replied
    Hi Siva

    Originally posted by Siva
    In htseq-count, is it OK to use Bowtie SAM output against the corresponding Cufflinks-generated GTF file? I get an error message regarding the Cufflinks GTF file saying CUFF1.0 doesn't contain a gene_id attribute.
    Thanks for the report; I've just investigated it and corrected the problem. The real issue is that in the Cufflinks GTF file, the genes have no strand information, and hence htseq-count has to be called with the '--stranded=no' option.

    Sorry for the misleading error message. I've just uploaded a fix, which will now display the more helpful message "Feature CUFF.1 at chr1:[1047,1108)/. does not have strand information but you are running htseq-count in stranded mode. Use '--stranded=no'."
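    For readers curious what such a check amounts to, here is a minimal sketch. It is hypothetical, not htseq-count's source: it reads GTF lines and, in stranded mode, rejects any feature whose strand column is neither '+' nor '-', producing a message of the same form as the one quoted above. The conversion to a 0-based, half-open interval is an assumption that matches the [1047,1108) notation in that message.

```python
import re


def check_strands(gtf_lines, stranded=True):
    """Return the number of features in `gtf_lines`, checking strands on the way.

    Hypothetical sketch of a strand check like the one in the new error
    message, not htseq-count's code. In stranded mode, a feature whose
    strand column (7th field) is not '+' or '-' raises ValueError. The
    reported interval converts GTF's 1-based, inclusive coordinates to a
    0-based, half-open interval.
    """
    n_features = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        chrom, strand, attributes = fields[0], fields[6], fields[8]
        start, end = int(fields[3]), int(fields[4])
        match = re.search(r'gene_id "([^"]+)"', attributes)
        name = match.group(1) if match else "(no gene_id)"
        if stranded and strand not in ("+", "-"):
            raise ValueError(
                f"Feature {name} at {chrom}:[{start - 1},{end})/{strand} "
                "does not have strand information but you are running "
                "htseq-count in stranded mode. Use '--stranded=no'."
            )
        n_features += 1
    return n_features
```

    With stranded=False the strand column is simply not inspected, which mirrors running htseq-count with '--stranded=no'.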

    Cheers
    Simon

  • Siva
    replied
    Hi Simon
    In htseq-count, is it OK to use Bowtie SAM output against the corresponding Cufflinks-generated GTF file? I get an error message regarding the Cufflinks GTF file saying CUFF1.0 doesn't contain a gene_id attribute. I feel I must supply a standard GFF/GTF annotation file from a database, but I am not sure a comprehensive GTF file exists for maize.

  • lvaruzza
    replied
    Does this package support reads in SOLiD color space?

  • spenthil
    replied
    Perfect!

    Side note: pip is an awesome easy_install replacement that works great with virtualenv. To install HTSeq with it: `pip install HTSeq`

  • Thomas Doktor
    replied
    The error I encountered was resolved by compiling the package from source. Thanks for the help and the package, Simon.

  • dawe
    replied
    Originally posted by Simon Anders
    Hi

    I would like to advertise the release of HTSeq, a Python framework to process and analyse high-throughput sequencing (HTS) data.
    Nice one! Trying it ASAP!

    d

  • Thomas Doktor
    replied
    I'm trying to run htseq-qa, but get the following error:
    Traceback (most recent call last):
      File "/usr/local/bin/htseq-qa", line 5, in <module>
        pkg_resources.run_script('HTSeq==0.4.2-p1', 'htseq-qa')
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
        if dist.key not in keys2:
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script

      File "/usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/EGG-INFO/scripts/htseq-qa", line 3, in <module>
        import HTSeq.scripts.qa
      File "/usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 8, in <module>
        from _HTSeq import *
    ImportError: /usr/local/lib/python2.6/dist-packages/HTSeq-0.4.2_p1-py2.6-linux-x86_64.egg/HTSeq/_HTSeq.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
    I've installed the dependencies and I'm running Python 2.6.4 - any idea what I could be doing wrong?
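    An undefined symbol of the form PyUnicodeUCS2_* most likely points to a Unicode ABI mismatch: the prebuilt _HTSeq.so was compiled against a Python with 2-byte ("narrow", UCS2) Unicode, while the interpreter loading it uses 4-byte ("wide", UCS4) Unicode. Rebuilding the extension against the running interpreter, which is what resolved it according to Thomas Doktor's later reply, removes the mismatch. A small diagnostic sketch (the function name is hypothetical) that works on Python 2 and 3:

```python
import sys


def unicode_build():
    """Report whether this interpreter is a narrow (UCS2) or wide (UCS4) build.

    On Python 2, sys.maxunicode is 0xFFFF for narrow builds and 0x10FFFF
    for wide builds; Python 3.3 and later always report 0x10FFFF.
    """
    return "UCS2 (narrow)" if sys.maxunicode == 0xFFFF else "UCS4 (wide)"


print(unicode_build())
```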

  • nickloman
    replied
    Nice one, looks interesting! I like the look of the quality plots.

    Do you have any plans to try and integrate with the Biopython project? There's some obvious overlap that I can see.

  • HTSeq: A Python framework to work with high-throughput sequencing data

    Hi

    I would like to advertise the release of HTSeq, a Python framework to process and analyse high-throughput sequencing (HTS) data.

    With the many short-read aligners available now, HTS analysis seems simple. In practice, however, data often need to be converted, tweaked, filtered, or otherwise pre-processed before they can be given to the aligner, and the results require similar processing before one can do the statistical analysis one needs.

    HTSeq is meant to render such tasks easy and convenient, and so act as a "glue" between aligners and other existing tools.

    Some examples of typical use cases for HTSeq:
    • Quality assessment of reads: Check how the proportions of base calls and the quality scores depend on the position in the read, and stratify by alignment status.
    • Counting: How many reads fall onto each exon, or each gene? For such tasks, you may want to design and implement rules on how to deal with overlapping features or ambiguous assignments.
    • Calculating coverage: HTSeq helps you not only to produce a Wiggle file for visualization in a genome browser, but also to compute customized statistics on the coverage.
    • Multiple alignments: Many aligners can output multiple alignments for each read, but what should one do with them? HTSeq makes it easy to implement post-processing that chooses the right alignment according to your criteria.
    • Adapter trimming: In miRNA-Seq, you often sequence into the adapter at the other end and need to cut it off before aligning. In multiplexed sequencing, you may need to cut off the multiplex tag and sort the reads by it.
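    To make the first use case concrete, here is a minimal sketch, not HTSeq's API, that reads FASTQ records one at a time and tallies base-call proportions and mean quality per read position. It assumes unwrapped four-line records and Phred+33 quality encoding (older Illumina/Solexa files used other offsets, hence the phred_offset parameter); all names are hypothetical.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, seq, qual) for each four-line FASTQ record in `handle`."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield header[1:].rstrip("\n"), seq, qual


def per_position_stats(records, phred_offset=33):
    """Return (base_counts, mean_quals), indexed by 0-based read position.

    base_counts[i] is a Counter of base calls seen at position i, and
    mean_quals[i] is the mean Phred quality at that position.
    """
    base_counts, qual_sums = [], []
    for _name, seq, qual in records:
        for i, (base, q) in enumerate(zip(seq, qual)):
            if i == len(base_counts):  # first read long enough to reach position i
                base_counts.append(Counter())
                qual_sums.append(0)
            base_counts[i][base] += 1
            qual_sums[i] += ord(q) - phred_offset
    mean_quals = [s / sum(c.values()) for s, c in zip(qual_sums, base_counts)]
    return base_counts, mean_quals


fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAAGT\n+\nI#II\n")
counts, quals = per_position_stats(read_fastq(fastq))
for pos, (counter, mean_q) in enumerate(zip(counts, quals)):
    total = sum(counter.values())
    proportions = {b: n / total for b, n in sorted(counter.items())}
    print(pos, proportions, round(mean_q, 1))
```

    Stratifying by alignment status would mean running the same tally separately over aligned and unaligned reads.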


    Have a look and give it a try: http://www-huber.embl.de/users/anders/HTSeq/

    To use HTSeq you only need a basic understanding of Python, as can be obtained by reading the first few chapters of a Python book.
    For users without programming knowledge, stand-alone scripts for common tasks are provided: htseq-count to count the overlap of reads with features (such as exons), htseq-qa to get a quick overview of the quality of your sequencing run, and htseq-bedgraph (coming soon) to convert an alignment file into a Bedgraph Wiggle file for visualization with a genome browser.
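    The counting task that htseq-count automates comes down to choosing an overlap rule. The sketch below uses one simple, hypothetical rule and is not htseq-count's code (its own modes are described in its documentation): a read whose overlapping features all share one name is counted for that name, while reads hitting no feature or several names go to separate counters. Coordinates are assumed to be 0-based and half-open, and the "__no_feature"/"__ambiguous" labels are illustrative.

```python
from collections import Counter


def count_reads(features, reads):
    """Count reads per feature name using a simple, hypothetical overlap rule.

    features: iterable of (name, chrom, start, end), 0-based half-open
    reads:    iterable of (chrom, start, end) for aligned reads
    A read whose overlapping features all share one name is counted for that
    name; reads overlapping nothing, or several names, are tallied separately.
    """
    features = list(features)
    counts = Counter()
    for chrom, r_start, r_end in reads:
        hits = {
            name
            for name, f_chrom, f_start, f_end in features
            if f_chrom == chrom and f_start < r_end and r_start < f_end
        }
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


genes = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 150, 300)]
aligned = [("chr1", 110, 140), ("chr1", 160, 190), ("chr1", 400, 450)]
print(count_reads(genes, aligned))
```

    Features that share a name, such as the exons of one gene, count as a single hit. The linear scan over all features is only for clarity; for a whole genome one would index the features by position, which is the kind of job the GenomicArray class is meant for.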

    For programmers, HTSeq has been designed to keep things simple:
    • All classes have extensive reference documentation, and a tutorial demonstrates their use.
    • All supported file formats (Fasta, Fastq, SAM, SolexaPipeline files, GFF, GTF, etc.) can be read in a loop, providing an object describing one record at a time to the loop body. This object describes the data in a convenient and consistent way.
    • The 'GenomicArray' class is the Swiss army knife of HTSeq. It is a container that can efficiently store anything that has a position on the genome: integers to represent coverage, objects with feature data to represent exons, sets of objects to handle overlapping features, etc.
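    As a rough idea of how a container like this can hold coverage compactly, here is a hypothetical, much simpler stand-in (not HTSeq's GenomicArray). It records only the positions where coverage changes and can emit bedGraph lines for a genome browser. Unlike GenomicArray, it handles only additive numbers, not arbitrary objects or sets of features; the class and method names are illustrative.

```python
from collections import defaultdict


class CoverageArray:
    """Hypothetical, much-simplified stand-in for a GenomicArray of integers.

    Only the positions where coverage changes are stored (a difference
    array per chromosome), so memory grows with the number of intervals
    added rather than with chromosome length. Coordinates are 0-based,
    half-open, as in bedGraph.
    """

    def __init__(self):
        self._deltas = defaultdict(lambda: defaultdict(int))

    def add(self, chrom, start, end, value=1):
        """Add `value` to every position in [start, end)."""
        if start >= end:
            raise ValueError("interval must be non-empty")
        self._deltas[chrom][start] += value
        self._deltas[chrom][end] -= value

    def steps(self, chrom):
        """Yield (start, end, value) for each maximal run of constant non-zero coverage."""
        deltas = self._deltas.get(chrom, {})
        level, run_start = 0, None
        for pos in sorted(deltas):
            new_level = level + deltas[pos]
            if new_level == level:
                continue  # e.g. one read ends exactly where another starts
            if level != 0:
                yield run_start, pos, level
            level, run_start = new_level, pos

    def to_bedgraph(self, chrom):
        """Return bedGraph lines (chrom, start, end, value) for one chromosome."""
        return [f"{chrom}\t{s}\t{e}\t{v}" for s, e, v in self.steps(chrom)]


cov = CoverageArray()
for start, end in [(0, 10), (5, 15), (15, 20)]:  # three aligned reads on chr1
    cov.add("chr1", start, end)
print("\n".join(cov.to_bedgraph("chr1")))
```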


    Please let me know what you think of it.

    Cheers
    Simon
