**Lagzxadr** · 01-16-2014, 12:26 AM

Dear Simon,
I ran into a problem when using htseq-count. How can I fix this error? Thanks a lot!
huoxj@ubuntu:/host/ubuntu$ htseq-count -s no -i ID Hxj3TAN_hits.bam Zv9.gff > Hxj4count.txt
Error occured when processing GFF file (line 2 of file Zv9.gff):
invalid literal for int() with base 10: '+'
[Exception type: ValueError, raised in __init__.py:223]

Originally posted by Simon Anders

Hi

I noticed this bug myself just yesterday and fixed it. Please try again with version 0.4.3-p4 and tell me whether this solves the issue.

Cheers
Simon

**Simon Anders** · 01-16-2014, 01:02 AM

Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.

**Lagzxadr** · 01-16-2014, 04:01 AM

#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
1 NM_131426 chr1 + 50321633 50410568 50322024
1 NM_001110522 chr1 - 58701200 58722813 58701200
9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11

**Simon Anders** · 01-16-2014, 04:04 AM

This does not at all look like a GFF file to me. No wonder that it does not work.

**dpryan** · 01-16-2014, 04:15 AM

Originally posted by Lagzxadr

#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
1 NM_131426 chr1 + 50321633 50410568 50322024
1 NM_001110522 chr1 - 58701200 58722813 58701200
9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11

That's genePred format from UCSC!

The conversion procedure is detailed here
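As a rough illustration of why htseq-count fails on this file: GFF/GTF lines have nine tab-separated columns, with integer start and end coordinates in columns 4 and 5, whereas this genePred dump has the strand in column 4. Trying to read '+' as an integer gives exactly the reported ValueError. The sketch below is a hypothetical sanity check, not part of HTSeq; the function name and the GTF example line (including its coordinates) are made up for illustration.

```python
def gff_problem(line):
    """Return a short reason why `line` is not GFF/GTF-like, or None if it passes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "expected 9 tab-separated columns, found %d" % len(fields)
    # Columns 4 and 5 (1-based) must be integer start/end coordinates.
    for idx in (3, 4):
        try:
            int(fields[idx])
        except ValueError:
            return "column %d should be an integer, found %r" % (idx + 1, fields[idx])
    # Column 7 holds the strand.
    if fields[6] not in ("+", "-", ".", "?"):
        return "column 7 should be a strand symbol, found %r" % fields[6]
    return None


# One line from the file posted above (assumed to be tab-separated on disk).
genepred_line = "\t".join(
    ["9", "NM_001143751", "chr1", "+", "6072450", "6331842", "6072675", "6331842", "11"]
)
# Hypothetical GTF line; the exon coordinates are invented for the example.
gtf_line = "\t".join(
    ["chr1", "refGene", "exon", "6072451", "6072700", ".", "+", ".",
     'gene_id "NM_001143751"; transcript_id "NM_001143751";']
)

print(gff_problem(genepred_line))
print(gff_problem(gtf_line))
```

Running a check like this on the first few non-comment lines of an annotation file before starting htseq-count is a cheap way to catch a wrong format early.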

**Lagzxadr** · 01-16-2014, 04:38 AM

Oh... I failed to generate a GFF file from UCSC; I can only download a GFF3 file from NCBI. I ran HTSeq on the GFF3 and my BAM file, but no features were counted. I think this is because the IDs from the NCBI GFF3 cannot be matched to the IDs in the BAM, which was mapped against the UCSC reference. Can you give me some suggestions? Should I use the GFF3 from NCBI, or where can I get a UCSC GFF?

**dpryan** · 01-16-2014, 04:47 AM

When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".

**Lagzxadr** · 01-16-2014, 05:07 AM

Yes, got it. Thanks a lot! Then how do I generate a GFF?

**dpryan** · 01-16-2014, 05:32 AM

Just use the GTF.

**Lagzxadr** · 01-16-2014, 06:56 AM

Finally done. Thank you very much!

**dlepe** · 07-21-2014, 08:16 AM

Hi Simon,
I'm analyzing SOLiD data using bowtie for mapping and htseq for quantification. When I use the --stranded parameter (I tried it just to familiarize myself with htseq), I get very similar numbers whether I set it to yes or no.

For example, for my 001_02.count file with --stranded=yes:
__no_feature 278195
__ambiguous 26690
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0

And for the same 001_02.count file with --stranded=no:
__no_feature 255213
__ambiguous 115445
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0

Since my protocol wasn't stranded, I should be losing about half the counts with --stranded=yes, but as you can see this was not the case. I tried the same with some Illumina data I have access to and got this, which I think is all right:

stranded=yes __no_feature 9381365
stranded=no __no_feature 492513

So after struggling with this for a while, the only thing I found was that the SAM files for the SOLiD data only have two different flags, 0 or 16, which I'm guessing is not enough information for htseq?

707_1366_1065 16 Chr1 1078 255 28M * 0 0 CCCCCCCCCCCCCACCCCCCAAATTGAG [\L!2_______UBL__ZU!"_______ XA:i:2 MD:Z:28 NM:i:0 CM:i:2
42_176_82 0 Chr1 4868 255 73M * 0 0 GGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGT ___________________________________________UU______ZY^________^Z[_^^__\KM XA:i:0 MD:Z:73 NM:i:0 CM:i:0

So my question is: are the results I'm getting for the SOLiD data with --stranded=no reliable?
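For reference, in the SAM format the strand of a single-end read is carried by bit 0x10 of the FLAG field, so flag 0 means a forward-strand alignment and flag 16 a reverse-strand one. A minimal sketch for tallying strands from SAM lines is below; the function name is hypothetical and the two records are abbreviated versions of the ones above (only the first four columns are kept).

```python
from collections import Counter


def strand_counts(sam_lines):
    """Tally forward/reverse/unmapped reads using the SAM FLAG field (column 2)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:            # bit 0x4: read is unmapped
            counts["unmapped"] += 1
        elif flag & 0x10:         # bit 0x10: read aligned to the reverse strand
            counts["reverse"] += 1
        else:
            counts["forward"] += 1
    return counts


# Abbreviated records from the post above (name, flag, chromosome, position).
records = [
    "707_1366_1065\t16\tChr1\t1078",
    "42_176_82\t0\tChr1\t4868",
]
print(strand_counts(records))
```

A tally like this only shows which strand each read was aligned to; it does not say anything about whether the library protocol itself was strand-specific.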

**apredeus** · 07-23-2014, 07:12 PM

I have a quick question about this picture:

Does it matter for "ambiguous" reads whether they land on the right strand? I.e., in the last two cases shown, if gene A and gene B are on opposite strands and the library is stranded, there is actually no ambiguity. Is that taken into consideration?

Thank you in advance!

**gringer** · 07-23-2014, 07:37 PM

Originally posted by apredeus

does it matter for "ambiguous" reads if they land on the right strand? I.e. for cases shown in the two last cases, if gene A and gene B are on opposite strands, and the library is stranded, there is no ambiguity actually. Is that taken into consideration?

It is taken into consideration if it's set in the command-line options (on by default). For stranded counting, a read will only be considered ambiguous if the exon on the corresponding strand is shared by multiple genes (this can happen). Otherwise, an ambiguous count includes genes on the opposite strand as well (much more likely).

**apredeus** · 07-23-2014, 08:29 PM

So if two genes' exons overlap but the genes are on opposite strands, a read landing in the overlap would NOT be considered ambiguous?

**gringer** · 07-23-2014, 09:04 PM

Correct, assuming that all the right command-line options have been set.
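The rule described above can be sketched as a toy model. This is not HTSeq's actual implementation; the function, the feature tuples, and the coordinates are all hypothetical, and intervals are treated as half-open. In stranded mode only features on the read's strand are considered, so two overlapping genes on opposite strands no longer make the read ambiguous.

```python
def assign(read_start, read_end, read_strand, features, stranded):
    """Assign a read to a gene, '__ambiguous', or '__no_feature' (toy model)."""
    genes = set()
    for gene, start, end, strand in features:
        overlaps = read_start < end and start < read_end
        # In stranded mode, ignore features on the opposite strand.
        if overlaps and (not stranded or strand == read_strand):
            genes.add(gene)
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return genes.pop()


# Two hypothetical genes whose exons overlap on opposite strands.
features = [
    ("geneA", 100, 200, "+"),
    ("geneB", 150, 250, "-"),
]

print(assign(160, 190, "+", features, stranded=True))
print(assign(160, 190, "+", features, stranded=False))
```

With these made-up features, the same read is assigned to geneA when strand is taken into account and is ambiguous when it is not, which matches the behaviour described in the replies above.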
