HTSeq: A Python framework to work with high-throughput sequencing data
-
Plans, yes -- but I'm so overwhelmed with other things that it might take a while till I get to that. Sorry.
-
Hi Simon,
We have been encountering an error with htseq-count (v. 0.6.1p1) on alignment files that have SAM v.1.4 tags.
The specific error is:
Unknown CIGAR code 'X' encountered
Are there plans to add support for SAM v.1.4 tags to htseq-count? For now we have been working around this by generating SAM v.1.3 tags.
Thanks.
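For alignment files that already contain SAM v1.4 CIGAR strings, one possible workaround is to rewrite the CIGAR column before counting. The following is only a minimal sketch and not part of HTSeq: the function names `cigar_to_v13` and `convert_sam_line` are hypothetical, and it assumes that collapsing `=` (sequence match) and `X` (sequence mismatch) into `M` is acceptable for counting, since all three operations consume both read and reference bases.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM v1.4 set).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_v13(cigar):
    """Rewrite '=' and 'X' operations as 'M' and merge adjacent runs."""
    if cigar == "*":  # CIGAR not available; nothing to convert
        return cigar
    merged = []
    for length, op in _CIGAR_OP.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            # Same operation as the previous run: extend it.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join("%d%s" % (n, op) for n, op in merged)


def convert_sam_line(line):
    """Apply cigar_to_v13 to the CIGAR column (6th field) of one SAM line."""
    if line.startswith("@"):  # header lines pass through unchanged
        return line
    fields = line.rstrip("\n").split("\t")
    fields[5] = cigar_to_v13(fields[5])
    return "\t".join(fields) + "\n"
```

For example, `cigar_to_v13("3=1X3=")` gives `"7M"`. The mismatch information carried by `X` is lost in the process, so this only makes sense when the downstream tool ignores it.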
-
From the looks of it, the read locations are zero-based and half-open (open-ended on the right), so the "end" location is not included in the list of base positions. An end location of -2 would be a bit more concerning; otherwise this is just business as usual for how these coordinates are handled.
-
I don't know where to report bugs, so I am posting here.
I think the start_d and end_d attributes of GenomicInterval have a bug.
With the SAM file below saved as sample.sam:
read1 0 chr 1 40 7M * 0 0 ATGGCGT AAAAAAA
read2 16 chr 1 40 7M * 0 0 ATGGCGT AAAAAAA
and:
>>> import itertools
>>> import HTSeq
>>> read1, read2 = list(itertools.islice(HTSeq.SAM_Reader('sample.sam'), 2))
>>> read1
<SAM_Alignment object: Read 'read1' aligned to chr:[0,7)/+>
>>> read2
<SAM_Alignment object: Read 'read2' aligned to chr:[0,7)/->
>>> read1.iv.start, read1.iv.end, read1.iv.start_d, read1.iv.end_d
(0, 7, 0, 7)
>>> read2.iv.start, read2.iv.end, read2.iv.start_d, read2.iv.end_d
(0, 7, 6, -1)
The end_d of read2 is a negative coordinate! This behavior is mentioned in the documentation, but I think it is a bug rather than a feature.
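The numbers in the session above are what a zero-based, half-open interval produces when it is read in strand direction. The sketch below is a plain-Python reimplementation for illustration only, not HTSeq's code; the function names are hypothetical. It assumes the convention that, on the minus strand, the directional start is `end - 1` and the directional end is `start - 1`.

```python
def directional_coords(start, end, strand):
    """Return (start_d, end_d) for a zero-based, half-open interval [start, end)."""
    if strand == "-":
        # Reading right to left: first base is end - 1, and the exclusive
        # end lies one position before start.
        return end - 1, start - 1
    return start, end


def bases_in_read_direction(start, end, strand):
    """List the covered positions in the order the strand is read."""
    start_d, end_d = directional_coords(start, end, strand)
    step = -1 if strand == "-" else 1
    return list(range(start_d, end_d, step))
```

With `[0, 7)` on the minus strand this gives `(6, -1)`, and walking from 6 down to, but not including, -1 visits the same seven bases as the plus-strand walk from 0 to 7. Under this reading, a value of -1 is the exclusive bound of an interval that starts at position 0 rather than a real coordinate.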
-
I just got a result from HTSeq. It showed that my gene of interest has 7 counts in the alignment. However, viewing the alignment in IGV, I can easily identify many more than 7 reads on this gene. My alignment is from STAR, and I used more stringent parameters to control multiple alignments, which means there should not be any multiply aligned reads in the output. I am really confused about this.
Any suggestions?
-
Hi,
I'm having trouble installing HTSeq.
I pretty much followed the instructions, but when I try to run it, I get the following error:
.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/_HTSeq.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
Any help is appreciated!
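The symbol name in the error, PyUnicodeUCS2_DecodeUTF8, suggests that _HTSeq.so was compiled against a Python 2 build using 2-byte (UCS2) Unicode while the interpreter loading it uses 4-byte (UCS4) Unicode. A quick way to see which kind of interpreter is running is to look at `sys.maxunicode`. This is a small sketch; the helper name `unicode_build` is hypothetical.

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as a narrow (UCS2) or wide (UCS4) Unicode build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow Python 2 builds report 0xFFFF; wide builds report 0x10FFFF.
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    print(unicode_build())
```

If this prints UCS4 for the Python that raises the error, rebuilding HTSeq with that same interpreter (rather than reusing an egg built elsewhere) is usually the way to get a matching binary.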
-
Originally posted by superpyrin:
Multiprocessing can work only with objects that can be pickled. SAM_Alignment cannot be pickled. I suspect this may be the reason it does not work.
Objects must implement __getstate__ and __setstate__ functions in order to be pickled/unpickled. Would it be difficult to implement these functions?
All one needs to do is take all the slots defined for the class in _HTSeq.SAM_Alignment, pack them into a tuple for __getstate__ and write them back for __setstate__.
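The idea of packing the slots into a tuple can be sketched on a plain Python class. `AlignmentStub` below is a hypothetical stand-in, not the real `_HTSeq.SAM_Alignment`, and its slot names are made up for illustration; whether the same approach carries over unchanged to the Cython extension type is an assumption that has not been checked here.

```python
import pickle


class AlignmentStub(object):
    """Hypothetical stand-in for a slotted class such as SAM_Alignment."""

    __slots__ = ("read_name", "chrom", "start", "end", "strand")

    def __init__(self, read_name, chrom, start, end, strand):
        self.read_name = read_name
        self.chrom = chrom
        self.start = start
        self.end = end
        self.strand = strand

    def __getstate__(self):
        # Pack every slot into a tuple, in declaration order.
        return tuple(getattr(self, name) for name in self.__slots__)

    def __setstate__(self, state):
        # Write the values back into the slots in the same order.
        for name, value in zip(self.__slots__, state):
            setattr(self, name, value)


def roundtrip(obj):
    """Pickle and unpickle an object, as multiprocessing does between processes."""
    return pickle.loads(pickle.dumps(obj, protocol=2))
```

When the class lives in an importable module, `roundtrip(AlignmentStub("read1", "chr", 0, 7, "+"))` returns a new object with the same slot values. Any slot that itself holds an unpicklable object would need the same treatment.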
-
Originally posted by Simon Anders:
Yes, the C code is machine generated, but if you look at the pyx files which it is generated from, it should be clearer. Have a look at:
http://www-huber.embl.de/users/ander...c/contrib.html
Objects must implement __getstate__ and __setstate__ functions in order to be pickled/unpickled. Would it be difficult to implement these functions?
-
Yes, the C code is machine generated, but if you look at the pyx files which it is generated from, it should be clearer. Have a look at:
-
Originally posted by Simon Anders:
You asked me this before, but I didn't reply, right? Sorry about that, I was a bit overwhelmed with mails.
The bad news is: I have no clue why it does not work; I have never worked with the multiprocessing package. But I agree that it would be nice if this worked.
Maybe somebody else here has some idea?
-
You asked me this before, but I didn't reply, right? Sorry about that, I was a bit overwhelmed with mails.
The bad news is: I have no clue why it does not work; I have never worked with the multiprocessing package. But I agree that it would be nice if this worked.
Maybe somebody else here has some idea?
-
Running HTSeq in parallel
Hello,
I am trying to process mapped reads in parallel. However, when using a pool of workers (multiprocessing package), I get the following error:
Traceback (most recent call last):
  File "testmp.py", line 15, in <module>
    out = pool.map(repr, iter(sa), chunksize=1)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 228, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 531, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'name'
Running serially (using just the built-in map function) works fine.
Do you know what could be wrong here?
Thank you.
---
Here is a simple script to reproduce the error. I am using HTSeq version 0.6.1, Python 2.7.3, and 64-bit Ubuntu 12.04.
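import HTSeq
from multiprocessing import Pool

# Serial version: this works for me
sa = HTSeq.SAM_Reader('test.sam')
out = list(map(repr, sa))

# Parallel version: this does not work
pool = Pool(processes=1)
sa = HTSeq.SAM_Reader('test.sam')  # fresh reader for the second pass
out = pool.map(repr, sa, chunksize=1)
print(list(out))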
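If pickling of the alignment objects is indeed the cause, as suggested elsewhere in this thread, one workaround is to send only plain strings to the workers and do the parsing inside each worker. The sketch below avoids HTSeq entirely and reads a few SAM columns by hand; the function names are hypothetical, and it assumes tab-separated SAM lines describing mapped reads.

```python
from multiprocessing import Pool


def parse_sam_line(line):
    """Return (read name, reference, 0-based start, strand) from one SAM line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    strand = "-" if flag & 0x10 else "+"  # bit 0x10: read is reverse-complemented
    return fields[0], fields[2], int(fields[3]) - 1, strand


def alignment_lines(path):
    """Yield the non-header lines of a SAM file as plain (picklable) strings."""
    with open(path) as handle:
        for line in handle:
            if not line.startswith("@"):
                yield line


def parse_in_parallel(path, processes=2, chunksize=1000):
    """Parse a SAM file with a worker pool; only strings and tuples cross processes."""
    pool = Pool(processes=processes)
    try:
        return pool.map(parse_sam_line, alignment_lines(path), chunksize=chunksize)
    finally:
        pool.close()
        pool.join()
```

Calling `parse_in_parallel('test.sam')` from under an `if __name__ == "__main__":` guard would return one tuple per alignment. Note that `pool.map` collects the whole input before dispatching it, so for very large files a chunked reader may be needed.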
-
Yep, that output looks good.
samtools sort -n gives the same result as Unix sort -k 1,1 on the SAM file.
-
Originally posted by fanli:
This sorts your alignment by genomic position. You want to sort by read name:
Thanks for your reply! Meanwhile I ran samtools sort -n on my .bam file, used samtools view -h to create a .sam file, and ran HTSeq again.
I guess it worked, as the stderr file created was far smaller, and contains this:
100000 sam line pairs processed.
200000 sam line pairs processed.
300000 sam line pairs processed.
500000 sam line pairs processed.
600000 sam line pairs processed.
700000 sam line pairs processed.
800000 sam line pairs processed.
900000 sam line pairs processed.
1000000 sam line pairs processed.
1100000 sam line pairs processed.
1200000 sam line pairs processed.
1300000 sam line pairs processed.
1400000 sam line pairs processed.
1500000 sam line pairs processed.
1600000 sam line pairs processed.
1700000 sam line pairs processed.
1800000 sam line pairs processed.
1900000 sam line pairs processed.
2000000 sam line pairs processed.
2100000 sam line pairs processed.
2200000 sam line pairs processed.
2300000 sam line pairs processed.
2500000 sam line pairs processed.
[...]
31936941 sam line pairs processed.
So I guess this is correct, right?
My question is: does it make any practical difference whether you sort the .bam file and then convert it to .sam, or sort a .sam file with the command you suggested? I am too ignorant to appreciate the difference!
Thanks again!!!
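Either route should be fine as long as all records sharing a read name end up next to each other; the order of the names themselves may differ between tools. A quick check for that property can be sketched as follows. The function name `is_name_grouped` is hypothetical, and the check keeps every read name in memory, so it is meant for modest files or a sample of lines.

```python
def is_name_grouped(sam_lines):
    """Return True if every read name occupies one contiguous run of lines."""
    seen = set()
    previous = None
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        name = line.split("\t", 1)[0]
        if name != previous:
            if name in seen:
                # This name appeared earlier, separated by other reads.
                return False
            seen.add(name)
            previous = name
    return True
```

For example, `is_name_grouped(open('sorted.sam'))` should return True for a file sorted by read name, where `sorted.sam` is a placeholder path, and False for a position-sorted file in which mates are interleaved with other reads.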