HTSeq: A Python framework to work with high-throughput sequencing data

Simon Anders replied

07-16-2010, 03:24 AM
Originally posted by townway

There is no binary version available. Could you give a link to download one?

For which operating system?
townway replied

07-15-2010, 07:33 AM
I checked the HTSeq website:

HTSeq

http://pypi.python.org/pypi/HTSeq

A framework to process and analyze data from high-throughput sequencing (HTS) assays

There is no binary version available. Could you give a link to download one?

Thank you

Wei
Simon Anders replied

07-14-2010, 08:43 AM
Hi

Originally posted by rkusko

I've used htseq-count to extract raw counts from Cufflinks output, and it worked correctly for 40/42 of my samples. For some reason, two samples raised the error

Could you maybe send me (by e-mail to anders(at)embl(dot)de) some excerpts from the data that produce the error? Then I could investigate.

Cheers
Simon
rkusko replied

07-13-2010, 12:21 PM
Hi Simon,
I've used htseq-count to extract raw counts from Cufflinks output, and it worked correctly for 40/42 of my samples. For some reason, two samples raised the error:

Error: 'generator' object has no attribute 'get_line_number_string'
[Exception type: AttributeError, raised in count.py:126]

I haven't been able to find anything wrong with the data in those two samples. Any ideas? Thank you!
yh253 replied

06-05-2010, 06:11 AM
Hi Simon,

Thanks very much for your reply! Quoted from BWA: "Internally BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA will flag the read as unmapped, but you will see position, CIGAR and all the tags". This has happened to me: with BWA, some reads mapped across the end of chrY and the beginning of chrM. However, I didn't see these mappings with Bowtie. This might arise from the mapping strategy adopted by BWA, which converts "N" in the reference sequence into random bases. During mapping, BWA must have converted the "N"s at the end of chrY into random bases that happened to match the beginning bases of some reads.

Cheers,
Yuan

Last edited by yh253; 06-05-2010, 06:17 AM.
Simon Anders replied

06-05-2010, 05:53 AM
Hi Abhijit

Originally posted by gen2prot

I made a Drosophila Gene Index file and ran all the Solexa reads against it. I have a 5.4 GB SAM file. Can I use the htseq-qa program on this SAM file, or will it crash because the alignment targets are genes and not chromosomes?

Just try it out. But it should work. htseq-count does not care what you have aligned against.

Secondly, how can I get the read count in my case? The GTF file that I have has the start and end coordinates with respect to the genome, so it won't work with the SAM output. Any suggestions?

So, basically, wherever you usually have a chromosome name (i.e., in the third column of your SAM file) you now have a gene name, and you want to count how many reads fall onto each gene? Why don't you just cut out this third column and count how often each gene appears there?

Of course, your approach has other dangers. For example, how does your aligner handle multiple transcripts? If the same exon appears in several transcripts, the aligner might think it is a repeat.
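
The column-counting suggestion above can be sketched in a few lines of Python. This is only a minimal illustration, not part of HTSeq: the function name `count_reference_names` and the example records are made up, and it assumes a tab-separated SAM file whose third column holds the gene name.

```python
from collections import Counter


def count_reference_names(sam_lines):
    """Count how often each reference name (SAM column 3) appears.

    Hypothetical helper for illustration; it does not use HTSeq.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # too short to be a SAM record
        reference = fields[2]
        if reference == "*":
            continue  # read without a reference, i.e. not aligned
        counts[reference] += 1
    return counts


# Made-up records: two reads on geneA, one on geneB, one unaligned.
example_lines = [
    "@SQ\tSN:geneA\tLN:1000\n",
    "read1\t0\tgeneA\t10\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tgeneA\t50\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t0\tgeneB\t5\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
gene_counts = count_reference_names(example_lines)
```

For a real file, the same function can be fed an open file handle, for example `count_reference_names(open("accepted.sam"))`. Note that every alignment line is counted, so a read reported at several places is counted several times.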

Cheers
Simon
Simon Anders replied

06-05-2010, 05:49 AM
Hi Yuan

Originally posted by yh253

I used BWA to do the alignment. If a read maps to a chromosomal junction (the end of one chromosome and the beginning of another), BWA produces flag = 4, but you can still see the other fields ("chr", "CIGAR", "MAPQ", etc.) in the SAM file. This causes a problem for htseq-qa, which can't process an alignment with flag = 4 whose mapping position does not equal "*". One option is to pre-process my SAM file by excluding all such alignments before giving it to htseq-qa, but I am wondering whether it is possible to turn off this requirement in htseq-qa so that I don't have to change my SAM file. Thank you very much in advance for your advice!

I've changed the SAM parser such that it now only writes a warning rather than stopping with an error if this case is encountered. Please try again with version 0.4.4p2.

However, I don't quite understand how such SAM lines come about. What is a chromosomal junction? How could a read get mapped partly to one and partly to another chromosome?

Cheers
Simon
gen2prot replied

06-02-2010, 06:12 PM
Hi Simon,

I made a Drosophila Gene Index file and ran all the Solexa reads against it. I have a 5.4 GB SAM file. Can I use the htseq-qa program on this SAM file, or will it crash because the alignment targets are genes and not chromosomes? Secondly, how can I get the read count in my case? The GTF file that I have has the start and end coordinates with respect to the genome, so it won't work with the SAM output. Any suggestions?

thanks
Abhijit
yh253 replied

05-31-2010, 05:11 AM
Hi Simon,

I used BWA to do the alignment. If a read maps to a chromosomal junction (the end of one chromosome and the beginning of another), BWA produces flag = 4, but you can still see the other fields ("chr", "CIGAR", "MAPQ", etc.) in the SAM file. This causes a problem for htseq-qa, which can't process an alignment with flag = 4 whose mapping position does not equal "*". One option is to pre-process my SAM file by excluding all such alignments before giving it to htseq-qa, but I am wondering whether it is possible to turn off this requirement in htseq-qa so that I don't have to change my SAM file. Thank you very much in advance for your advice!
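
The pre-processing option mentioned above could look roughly like the following Python sketch. It is not part of HTSeq; the function names are hypothetical. It assumes that "such alignments" means records with the unmapped bit (0x4) set in the FLAG column that still carry a reference name or position.

```python
def is_inconsistent_unmapped(sam_line):
    """Return True for a record flagged unmapped that still has a position.

    Hypothetical helper: checks bit 0x4 of the FLAG column (column 2)
    against RNAME (column 3) and POS (column 4).
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return False  # not a complete record, leave it untouched
    flag = int(fields[1])
    reference, position = fields[2], fields[3]
    flagged_unmapped = bool(flag & 0x4)
    has_position = reference != "*" or position not in ("0", "*")
    return flagged_unmapped and has_position


def drop_inconsistent_unmapped(sam_lines):
    """Yield header lines and all records except the inconsistent ones."""
    for line in sam_lines:
        if line.startswith("@") or not is_inconsistent_unmapped(line):
            yield line


# Made-up records for illustration.
example_lines = [
    "@SQ\tSN:chrY\tLN:1000\n",
    "read1\t0\tchrY\t10\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\tchrY\t998\t0\t4M\t*\t0\t0\tACGT\tIIII\n",  # junction case
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",  # ordinary unmapped read
]
kept_lines = list(drop_inconsistent_unmapped(example_lines))
```

Ordinary unmapped reads (reference "*") are kept, so only the junction-style records are removed from the output.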

Yuan
gen2prot replied

05-27-2010, 02:25 PM
Got it to work... Coffee helps.
Simon Anders replied

05-27-2010, 01:54 PM
Hi Abhijit

Originally posted by gen2prot

I have a Mac with OS X 10.6 (Intel) and Python 2.6. Matplotlib is unavailable for 10.6 with Python 2.6, as far as I know.

Yours seems to be a pretty standard configuration, so it would be a pity if matplotlib was not available. If you have Xcode installed, building from source might work. Just try:

Code:

wget http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-0.99.1/matplotlib-0.99.1.2.tar.gz/download
tar -xzvf matplotlib-0.99.1.2.tar.gz
cd matplotlib-0.99.1.1/
python setup.py build
sudo python setup.py install

Simon
gen2prot replied

05-27-2010, 12:56 PM
Hi Simon,

I have a Mac with OS X 10.6 (Intel) and Python 2.6. Matplotlib is unavailable for 10.6 with Python 2.6, as far as I know. I give up. The matplotlib website asks me to tinker with system files, which I don't want to tamper with. Thanks anyway for your help.

Abhijit
gen2prot replied

05-25-2010, 07:16 AM
Hi Simon,

I am having difficulty running the htseq-qa script. I think I have installed HTSeq correctly, since I get no error message for the "import HTSeq" command. But when I give the "htseq-qa -t sam accepted.sam" command, I get a syntax error. I have given the following export command in Unix:

export PYTHONPATH=$PYTHONPATH:/Library/Python/2.6/

Is this wrong? On giving the command "whereis python", I get /usr/local/python. I am confused.
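
One way to narrow down a situation like this is to ask the interpreter itself which executable is running and where it finds the module. The sketch below is only a diagnostic aid under assumptions: the function name `describe_environment` is hypothetical, and it is written for a current Python 3 rather than the Python 2.6 used in this thread.

```python
import importlib.util
import sys


def describe_environment(module_name="HTSeq"):
    """Report which interpreter is running and where a module is found.

    Hypothetical diagnostic helper; module_location is None when the
    module cannot be found on this interpreter's search path.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "module_location": spec.origin if spec is not None else None,
    }


info = describe_environment()
for key in ("executable", "version", "module_location"):
    print("%s: %s" % (key, info[key]))
```

If the executable printed here differs from the interpreter that runs the script, the two may be using different Python versions or different module search paths.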

Thank you
Abhijit
yh253 replied

05-19-2010, 07:41 AM
Hi Simon,

I am afraid not. Actually, I was not aware of this issue until now. I will try to figure out how to convert the quality strings first, and give feedback on this later. Thanks again for your prompt reply!

Yuan
Simon Anders replied

05-19-2010, 07:31 AM
Hi Yuan

Originally posted by yh253

ValueError: Too large quality value encountered.

Usually, SAM files don't contain quality values exceeding 40. When you aligned your _sequence.txt file, did you convert the quality strings from Solexa to Sanger scale? If not, BWA probably did not find the optimal alignments, and htseq-qa gets confused, too.
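
As an illustration of the Solexa-to-Sanger conversion mentioned above, here is a minimal Python sketch. It assumes the input uses the old Solexa scale (ASCII offset 64, scores that can go down to -5) and produces Sanger-style Phred scores with ASCII offset 33. The function name `solexa_to_sanger` is hypothetical. Files that already hold Phred scores with offset 64 would only need the offset shifted, which this sketch does not handle.

```python
import math


def solexa_to_sanger(quality_string):
    """Convert a Solexa-scaled quality string to Sanger (Phred+33).

    Assumes Solexa encoding: character = chr(solexa_score + 64).
    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    """
    converted = []
    for character in quality_string:
        solexa_score = ord(character) - 64
        phred_score = 10 * math.log10(10 ** (solexa_score / 10.0) + 1)
        converted.append(chr(int(round(phred_score)) + 33))
    return "".join(converted)


# 'h' is Solexa score 40, '@' is 0, ';' is -5.
sanger_quality = solexa_to_sanger("h@;")
```

For high scores the two scales nearly coincide, so the conversion mainly matters for low-quality bases.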

If you did, and the large quality values are legitimate, I'd be interested to see your SAM file.

Simon
