Hi,
I'm having the same issue with one of my SAM files when using htseq-count.
For some reason, one of my SAM files raises the error:
Error occured in line 38586430 of file RNA7_sorted.sam.
Error: ("invalid literal for int() with base 10: ''", 'line 38586430 of file RNA7_sorted.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:1166]
I am using HTSeq-0.5.3p9
Can you help please, thanks, alig
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
new cigar operator
--Hi,
i think htseq-count doesn't understand the new CIGAR operators X and =
I have this error when reading a sam file:
Error occured when reading first line of sam file.
Error: ("Unknown CIGAR code 'X' encountered.", 'line 29 of file AB4_AR013.sam'
[Exception type: ValueError, raised in _HTSeq.pyx:1163]
Laurent --
Leave a comment:
-
Hi Simon,
I have a question regarding my data and numbers of counts resulting from htseq-count.
I have ~45m SE 80bp Illumina reads aligning to reference. I then remove duplicates (using Picard MarkDuplicates) ending up with ~25m reads. When I run htseq-count on the original aligned SAM and the duplicate-removed SAM I get ~23m, 8.8m counts.
The proportion of 'retained' reads (following duplicate removal) is ~0.53, whereas the proportion of 'retained' counts is ~0.39. I see this as quite a large discrepancy. Another way of looking at this is sum of counts over total reads in original and duplicate-removed. This gives the same proportions, ~0.5 in original SAM, ~0.37 in duplicate-removed.
I cannot see why the duplicate-removed SAM should give less counts than the original: if anything I would think it would give more because the duplicates would only give a single count (for highest quality if I understand htseq-count correctly). Is there an obvious answer or have you seen this before? I am using the 'union' mode with same reference as I aligned to, non-stranded data, all other parameters are default. I looked at the SAM flag and counted 0, 16 which give the same ratios for aligned vs. duplicate-removed reads. I would be interested to know your thoughts.
Thanks,
Bruce.
Leave a comment:
-
Originally posted by crazyhottommy View PostIt looks like the HTSeq_example_data.tgz link is broken...can you please fix it?
Thanks for reporting it.
Leave a comment:
-
Hi Simon,
It looks like the HTSeq_example_data.tgz link is broken...can you please fix it?
Thanks!
Tommy
Leave a comment:
-
tsspos
the lines of tsspos are as follows:
<GenomicPosition object '2':49762399, strand '-'>, <GenomicPosition object '2':26280379, strand '-'>, <GenomicPosition object '6':42236683, strand '+'>, <GenomicPosition object '6':47528008, strand '-'>, <GenomicPosition object '4':154836045, strand '+'>, <GenomicPosition object '2':136717343, strand '+'>, <GenomicPosition object '15':73649839, strand '-'>, <GenomicPosition object '7':148651406, strand '-'>, <GenomicPosition object '3':29912426, strand '-'>, <GenomicPosition object '15':91602775, strand '+'>, <GenomicPosition object '7':133709879, strand '-'>, <GenomicPosition object '4':101516627, strand '-'>, <GenomicPosition object '7':16456750, strand '-'>, <GenomicPosition object '3':90262860, strand '-'>,
however, i am getting plot of wincvg showing all values are not zero.Attached FilesLast edited by sumanex; 02-02-2013, 11:24 PM.
Leave a comment:
-
Originally posted by sumanex View Post>>> tsspos = set()
>>> for feature in gtffile:
... if feature.type == "exon" and feature.attr["exon_number"] == "1":
... tsspos.add( feature.iv.start_d_as_pos )
...
>>> p = HTSeq.GenomicPosition( "chr1", 133363801, "+" )
>>> p
<GenomicPosition object 'chr1':133363801, strand '+'>
>>> p in tsspos
False
>>> list( coverage[window] )
>>> import numpy
...
>>> for p in tsspos:
... window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
... wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
... if p.strand == "+":
... profile += wincvg
... else:
... profile += wincvg[::-1]
...
Traceback (most recent call last):
...
IndexError: start too small
Leave a comment:
-
HtSeq TSS plots
Hi,
I am following the TSS plots code in HTSeq in ubunut 12.04 but getting error:
1) in step p in tsspos answer is false. but getting a wincvg plot in the matplot window. the gtf file is for mouse release64 ensembl.
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import HTSeq
>>> bamfile = HTSeq.BAM_Reader( "galaxy58.bam" )
>>> bamfile
<HTSeq.BAM_Reader object at 0xb721afac>
>>> gtffile = HTSeq.GFF_Reader( "Mus_musculus.NCBIM37.64.gtf" )
>>> coverage = HTSeq.GenomicArray( "auto", stranded=False, typecode="i" )
>>> coverage
<HTSeq._HTSeq.GenomicArray object at 0x9065b0c>
>>> for almnt in bamfile:
... if almnt.aligned:
... coverage[ almnt.iv ] += 1
...
>>> import itertools
>>> for feature in itertools.islice( gtffile, 100):
... if feature.type == "exon" and feature.attr["exon_number"] == "1":
... print feature.attr["gene_id"], feature.attr["transcript_id"], feature.iv.start_d_as_pos
...
ENSMUSG00000000702 ENSMUST00000105216 NT_166433:11954/+
ENSMUSG00000078423 ENSMUST00000105217 NT_166433:28586/+
ENSMUSG00000078424 ENSMUST00000105218 NT_166433:47744/+
ENSMUSG00000078424 ENSMUST00000105219 NT_166433:47927/+
ENSMUSG00000071964 ENSMUST00000096694 NT_166402:12542/+
ENSMUSG00000091539 ENSMUST00000165255 18:3123464/-
ENSMUSG00000063889 ENSMUST00000151311 18:3327588/-
>>> tsspos = set()
>>> for feature in gtffile:
... if feature.type == "exon" and feature.attr["exon_number"] == "1":
... tsspos.add( feature.iv.start_d_as_pos )
...
>>>
>>> p = HTSeq.GenomicPosition( "chr1", 133363801, "+" )
>>> p
<GenomicPosition object 'chr1':133363801, strand '+'>
>>> p in tsspos
False
>>> halfwinwidth = 3000
>>> window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
>>> window
<GenomicInterval object 'chr1', [133360801,133366801), strand '.'>
>>> list( coverage[window] )
>>> import numpy
>>> wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
>>> wincvg
array([0, 0, 0, ..., 0, 0, 0])
>>> from matplotlib import pyplot
>>> pyplot.plot( wincvg )
[<matplotlib.lines.Line2D object at 0x10c8122c>]
>>> pyplot.show()
and 2) Index error start too small
>>> profile = numpy.zeros( 2*halfwinwidth, dtype='i' )
>>> profile
array([0, 0, 0, ..., 0, 0, 0])
^
>>> for p in tsspos:
... window = HTSeq.GenomicInterval( p.chrom, p.pos - halfwinwidth, p.pos + halfwinwidth, "." )
... wincvg = numpy.fromiter( coverage[window], dtype='i', count=2*halfwinwidth )
... if p.strand == "+":
... profile += wincvg
... else:
... profile += wincvg[::-1]
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "_HTSeq.pyx", line 520, in HTSeq._HTSeq.GenomicArray.__getitem__ (src/_HTSeq.c:10593)
File "_HTSeq.pyx", line 370, in HTSeq._HTSeq.ChromVector.__getitem__ (src/_HTSeq.c:7538)
IndexError: start too small
any help is appreciated
suman
Leave a comment:
-
matplot coverage
Hi,
I had to change the coordinates to chr1, 133363801,134605170
of cvg vector to get a decent plot.
thanks
suman
Leave a comment:
-
Originally posted by sumanexHowever, the following command is not showing any coverage only the coordinates and a blue horizotal line at 0.
>>> import matplotlib
>>> from matplotlib import pyplot
>>> pyplot.plot( list( cvg[ HTSeq.GenomicInterval( "chr2", 2000000, 3000000, "+" ) ] ) )
[<matplotlib.lines.Line2D object at 0x104751ac>]
>>> pyplot.show()
Do have to provide a gtf file also for showing the coverage.
http://www-huber.embl.de/users/ander...-full-coverage
Leave a comment:
-
End location of read in HTSeq-count
Hi Simon,
I am a user of the HTSeq-count.
It's a wonderful tool, and I really love it.
Nevertheless, I have a small question about it.
I am wondering how dose HTSeq-count define the end location (right site end) of a read from RNA-seq sam file?
I am not very familiar with Pathon, so I cannot well understand the source code.
For single-end strand-specific RNA-seq sam file, is the end location defined by the CIGAR?
That is, End= Start + ( No. of Ms) + (No. of Ns) + (No. of Ds) ?
For paired-end strand-specific RNA-seq sam file, how does HTSeq-count handle this?
Is the start location (left site) defined as the smaller one of the two start loci within a pair?
And the end location (right site end) defined as the larger one of the two end sites within a pair?
Many thanks and best regards,
Jerry
Leave a comment:
-
You are supposed to pass first the SAM file, then the GFF file.
Leave a comment:
-
Hello Dr. Simon,
I tried to use HTseq_count function to find no of reads.
Data of D. melanogaster; ref genome from iGenomes (Ensembl).
Command typed >>
htseq-count -m intersection-nonempty -s no -t exon -i gene_id -o samout -q /home/ajay/bin2/binp/dmel-all-r5.48.gff /home/ajay/bin2/binp/C2_R1_thout/outc2r1.sam
Error occured in line 1 of file /home/ajay/bin2/binp/C2_R1_thout/outc2r1.sam.
Error: need more than 3 values to unpack
[Exception type: ValueError, raised in __init__.py:214]
Please suggest me where I am going wrong.
Ajay
Leave a comment:
-
Clang errors when trying to build HTseq
Hi All
I have a mac (OSx10.8.2) and get the following error when trying to build HT seq (0.5.3p9). Looks like a gcc issue...Any suggestions please?
running build
running build_py
running build_ext
building 'HTSeq._StepVector' extension
clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.macosx-10.8-intel-2.7/src/StepVector_wrap.o -w
clang++ -bundle -undefined dynamic_lookup -Wl,-F. -arch i386 -arch x86_64 build/temp.macosx-10.8-intel-2.7/src/StepVector_wrap.o src/step_vector.h -o build/lib.macosx-10.8-intel-2.7/HTSeq/_StepVector.so
clang: warning: treating 'c-header' input as 'c++-header' when in C++ mode, this behavior is deprecated
clang: error: cannot use 'precompiled-header' output with multiple -arch options
clang: error: cannot specify -o when generating multiple output files
error: command 'clang++' failed with exit status 1
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.
Long-Read Sequencing
Long-read sequencing has seen remarkable advancements,...-
Channel: Articles
12-02-2024, 01:49 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Today, 07:45 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
Today, 07:45 AM
|
||
Started by seqadmin, Yesterday, 07:59 AM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
Yesterday, 07:59 AM
|
||
Newborn Genomic Screening Shows Promise in Reducing Infant Mortality and Hospitalization
by seqadmin
Started by seqadmin, 12-09-2024, 08:22 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
12-09-2024, 08:22 AM
|
||
Started by seqadmin, 12-02-2024, 09:29 AM
|
0 responses
175 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 09:29 AM
|
Leave a comment: