Unconfigured Ad

**ssharma** · 09-29-2011, 06:11 AM

@joro

Here are the 3 lines from gff file.
Middle one is 2169

NC_003911.11 RefSeq gene 176231 176306 . + . locus_tag=SPO_tRNA-Gln-1;db_xref=GeneID:3196480
NC_003911.11 RefSeq exon 176231 176306 . + . gbkey=tRNA;locus_tag=SPO_tRNA-Gln-1;product=tRNA-Gln;db_xref=GeneID:3196480;exon_number=1
NC_003911.11 RefSeq gene 176312 176389 . + . locus_tag=SPO_tRNA-Pro-2;db_xref=GeneID:3196280

**Simon Anders** · 09-29-2011, 06:20 AM

ssharma:

This line does not have a 'gene_id' attribute, but neither do the others. Please note that htseq-counts expects by default a GTF file. Yours is not one. (See this post for details.) See also the htseq-count documentation, especially then options '-t' and '-i'.

**ssharma** · 09-29-2011, 10:45 AM

Thanks Simon,
So if i want to use gff file i need to mention the feature type to be used (3rd column) with -t option.

So this is what i did: (i want to use 'gene')

htseq-count -t gene CL1_forBWA_left_DSS3.sam NC_003911.gff
but now i am getting following error:

Error occured in line 6 of file NC_003911.gff.
Error: Feature NC_003911.11:gidA does not contain a 'gene_id' attribute
[Exception type: SystemExit, raised in count.py:55]

These are few lines from my gff:

##gff-version 3
#!gff-spec-version 1.14
#!source-version NCBI C++ formatter 0.2
##Type DNA NC_003911.11
NC_003911.11 RefSeq source 1 4109442 . + . organism=Ruegeria pomeroyi DSS-3;mol_type=genomic DNA;strain=DSS-3;db_xref=taxon:246200
NC_003911.11 RefSeq gene 1 1869 . + . ID=NC_003911.11:gidA;locus_tag=SPO0001;db_xref=GeneID:3194636
NC_003911.11 RefSeq CDS 1 1866 . + 0 ID=NC_003911.11:gidA:unknown_transcript_1;Parent=NC_003911.11:gidA;locus_tag=SPO0001;note=GidA%3B glucose-inhibited cell division protein A%3B involved in the 5-carboxymethylaminomethyl modification %28mnm%285%29s%282%29U%29 of the wobble uridine base in some tRNAs;transl_table=11;product=tRNA uridine 5-carboxymethylaminomethyl modification enzyme GidA;protein_id=YP_165274.1;db_xref=GI:56694929;db_xref=GeneID:3194636;exon_number=1

**Simon Anders** · 09-29-2011, 10:55 PM

Well, there is no attribute called "gene_id" in your GFF file, or is there? The attributes are the entries in the last column, and their name is the part before the "=" sign. Try '-i ID'.

**ssharma** · 09-30-2011, 08:56 AM

No, there is no gene_id.
I even tried -i ID option and nothing is working.
Is there any way i can convert gff file to gtf or can directly get GTF file for my Genome?

Thanks
Shalabh

**songyj** · 10-09-2011, 01:06 AM

Prequisites
To use HTSeq, you need at least version 2.5 of Python (Python 3 does not work yet),

Python 3 does not work yet....ooh no!!

**gringer** · 10-25-2011, 04:18 AM

For those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam:

http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html

edit: on that note, is there any way I can get the Query name from a BAM file line using the BAM_Reader interface? I need this because it contains an experiment ID, and I would like to split my output based on that ID. My current way to get around it is to use pysam directly. Looking at the HTSeq source code showed me how to convert the output into something that could be used for coverage analysis:

Code:

bamFile = pysam.Samfile(bamFileName, "rb")
coverage = dict()
totalCount = 0
hitCount = 0
for read in bamFile:
    totalCount += 1
    if(not(read.is_unmapped)):
        experiment = read.qname[:read.qname.find(".")]
        hitCount += 1
        ## Taken from HTSeq source code
        chrom = bamFile.getrname(read.tid)
        strand = "-" if read.is_reverse else "+"
        iv = HTSeq.GenomicInterval( chrom, read.pos, read.aend, strand )
        if(not(experiment in coverage)):
            coverage[experiment] = (
                HTSeq.GenomicArray("auto", stranded = True, typecode = "i"))
        (coverage[experiment])[iv] += 1

For what it's worth, my BAM file contains query names like '<experiment>.<id>'. The reads were mapped to the yeast genome using tophat, and I'm now trying to extract out coverages for each gene / exon (including n bases up or downstream) on a per-experiment basis.

**gringer** · 02-28-2012, 08:05 AM

Originally posted by gringer View Post

For those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam

After discovering that pysam doesn't work so well using python 2.4 (which is installed on the Illumina server I use for analysis), I had another look at Simon's code for reading BAM files using SAMtools. The resulting method I came up with uses yield to process the samtools view output, which makes it just a little bit faster (seems to be about an order of magnitude faster than the previous samtools view hack, and possibly a teensy bit faster than pysam). There's no reading of the advanced SAM features (like split reads), and I took out the region-based search, but it might be good enough for a few other people:

Code:

def BAM_Reader( bamfile, samtools_exec="samtools" ):
    # derived from Simon Anders' code from
    # http://seqanswers.com/forums/showpost.php?p=22114&postcount=46

   """Get Reader object for part of a BAM file

   bamfile -- a string, either a file name or an URL to a BAM file
   region -- a GenomicInterval
   samtools_exec -- file name (with path, if necessary)
      of the 'samtools' executable

   Returns a SAM_Reader object.
   """

   cmd_line = ( samtools_exec , "view", bamfile )

   samtools_view = subprocess.Popen( cmd_line,
      stdout = subprocess.PIPE )

   for line in samtools_view.stdout:
      fields = line.split('\t')
      #columns from http://samtools.sourceforge.net/SAM1.pdf
      read = HTSeq.SequenceWithQualities(name = fields[1-1], seq = fields[10-1],
                                          qualstr = fields[11-1])
      mapStrand = '+'
      if(int(fields[2-1]) & 0x10 == 0x10):
         mapStrand = '-'
      if(int(fields[2-1]) & 0x4 == 0x4):
         iv = None
      else:
         iv = HTSeq.GenomicInterval(
            chrom = fields[3-1], start = int(fields[4-1]) - 1,
            end = int(fields[4-1]) - 1 + len(fields[10-1]),
            strand = mapStrand)
      yield HTSeq.Alignment(read, iv)

I've also attached a full script that uses this code for generating 3-column count files and coverage graphs with experiment labels.

Attached Files

sam2profile.py.txt (10.2 KB, 33 views)

**dvanic** · 08-28-2012, 07:04 PM

For those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam:

Question: Is there an easy way to use a bam file as an input fot htseq-count now???

**Delphine** · 08-30-2012, 12:42 AM

Hi,

I usely use a command line like this, if this can help:
I use samtools to read the bam file and with the pipe, I use htseq-count. The "-" call the output of the samtools view and pass it to htseq-count

samtools view file.bam | python2.6 htseq-count - file.gtf -q > htseq_output &

Delphine

**KHubbard** · 09-27-2012, 12:08 PM

Trouble with HTSeq install

Trying to install HTSeq onto a machine running Windows7 and I got the following error:

C:\Users\ArrayStar>cd C:\HTSeq-0.5.3p9

C:\HTSeq-0.5.3p9>PATH=%PATH%;C:\Python27;C:\MinGW\bin

C:\HTSeq-0.5.3p9>python setup.py build --compiler=mingw32
Could not import 'setuptools', falling back to 'distutils'.
running build
running build_py
running build_ext
building 'HTSeq._HTSeq' extension
C:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python27\lib\site-packages\
numpy\core\include -IC:\Python27\include -IC:\Python27\PC -c src/_HTSeq.c -o bui
ld\temp.win-amd64-2.7\Release\src\_htseq.o -w
cc1.exe: error: unrecognized command line option '-mno-cygwin'
error: command 'gcc' failed with exit status 1

C:\HTSeq-0.5.3p9>

Any help with what I'm doing wrong and what I can do to fix it? Thanks!

**mdk308** · 10-10-2012, 10:35 AM

KHubbard - not sure but it looks like you may not have cygwin installed?

**Siva** · 01-12-2013, 02:20 PM

Clang errors when trying to build HTseq

Hi All
I have a mac (OSx10.8.2) and get the following error when trying to build HT seq (0.5.3p9). Looks like a gcc issue...Any suggestions please?

running build
running build_py
running build_ext
building 'HTSeq._StepVector' extension
clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.macosx-10.8-intel-2.7/src/StepVector_wrap.o -w
clang++ -bundle -undefined dynamic_lookup -Wl,-F. -arch i386 -arch x86_64 build/temp.macosx-10.8-intel-2.7/src/StepVector_wrap.o src/step_vector.h -o build/lib.macosx-10.8-intel-2.7/HTSeq/_StepVector.so
clang: warning: treating 'c-header' input as 'c++-header' when in C++ mode, this behavior is deprecated
clang: error: cannot use 'precompiled-header' output with multiple -arch options
clang: error: cannot specify -o when generating multiple output files
error: command 'clang++' failed with exit status 1

**Siva** · 01-12-2013, 03:47 PM

Ok I got an answer from another thread; I have now installed it successfully
Check this in case you are grappling with a similar problem

installing htseq on a mac - SEQanswers

http://seqanswers.com/forums/showthread.php?t=23768

Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

**ajay87** · 01-21-2013, 02:54 AM

Hello Dr. Simon,
I tried to use HTseq_count function to find no of reads.
Data of D. melanogaster; ref genome from iGenomes (Ensembl).
Command typed >>
htseq-count -m intersection-nonempty -s no -t exon -i gene_id -o samout -q /home/ajay/bin2/binp/dmel-all-r5.48.gff /home/ajay/bin2/binp/C2_R1_thout/outc2r1.sam

Error occured in line 1 of file /home/ajay/bin2/binp/C2_R1_thout/outc2r1.sam.
Error: need more than 3 values to unpack
[Exception type: ValueError, raised in __init__.py:214]

Please suggest me where I am going wrong.
Ajay

Topics	Statistics	Last Post
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 30 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 37 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 42 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 64 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM

Unconfigured Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News