  • Lagzxadr
    Junior Member
    • Jan 2014
    • 5

    Dear Simon,
    I ran into a problem when using htseq-count. How can I fix this error? Thanks a lot!
    huoxj@ubuntu:/host/ubuntu$ htseq-count -s no -i ID Hxj3TAN_hits.bam Zv9.gff > Hxj4count.txt
    Error occured when processing GFF file (line 2 of file Zv9.gff):
    invalid literal for int() with base 10: '+'
    [Exception type: ValueError, raised in __init__.py:223]



    Originally posted by Simon Anders View Post
    Hi



    I noticed this bug myself just yesterday and fixed it. Please try again with version 0.4.3-p4 and tell me whether this solves the issue.

    Cheers
    Simon

    Comment

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.

      Comment

      • Lagzxadr
        Junior Member
        • Jan 2014
        • 5

        #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
        1 NM_131426 chr1 + 50321633 50410568 50322024
        1 NM_001110522 chr1 - 58701200 58722813 58701200
        9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11

        Comment

        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          This does not at all look like a GFF file to me. No wonder that it does not work.
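A minimal sketch of why the parser fails on such a file, assuming the posted lines are tab-separated in the original (the forum rendering does not show this). A GFF/GTF record has nine columns, with integer start and end in columns 4 and 5 and the strand in column 7. In the posted lines, column 4 holds the strand, so converting it to an integer fails with the same message as in the error report. The function name `gff_problems` and the GTF sample line are hypothetical, for illustration only; this is not HTSeq's own parser.

```python
def _is_int(text):
    """True if int(text) succeeds, mirroring the conversion a GFF reader performs."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def gff_problems(line):
    """Return a list of reasons why `line` does not look like a GFF/GTF record."""
    fields = line.rstrip("\n").split("\t")
    problems = []
    if len(fields) != 9:
        problems.append(f"expected 9 tab-separated columns, found {len(fields)}")
    # Columns 4 and 5 (indices 3 and 4) must be integer coordinates.
    if len(fields) > 4:
        for name, value in (("start", fields[3]), ("end", fields[4])):
            if not _is_int(value):
                problems.append(f"{name} column is not an integer: {value!r}")
    # Column 7 (index 6) must be the strand.
    if len(fields) > 6 and fields[6] not in {"+", "-", "."}:
        problems.append(f"strand column is not '+', '-' or '.': {fields[6]!r}")
    return problems


# First data line as posted in the thread (truncated there; tabs are an assumption).
genepred_line = "1\tNM_131426\tchr1\t+\t50321633\t50410568\t50322024"
# Hypothetical GTF-style line for comparison.
gtf_line = 'chr1\thypothetical_source\texon\t100\t200\t.\t+\t.\tgene_id "geneA";'

for label, line in (("posted line", genepred_line), ("GTF-style line", gtf_line)):
    print(label, "->", gff_problems(line) or "looks like GFF/GTF")
```

Running a check like this on the first few non-comment lines of an annotation file is a quick way to tell whether the file is in the GFF family at all before passing it to htseq-count.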

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            That's genePred format from UCSC! The conversion procedure is detailed here

            Comment

            • Lagzxadr
              Junior Member
              • Jan 2014
              • 5

              Oh... I failed to generate a GFF file from UCSC; I could only download a GFF3 file from NCBI. I ran HTSeq on the GFF3 and my BAM file, but no features were counted. I think this is because the IDs in the NCBI GFF3 cannot be matched to the IDs in the BAM file, which was mapped against the UCSC reference. Can you give me some suggestions? Should I use the GFF3 from NCBI, or where can I get a UCSC GFF?
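One way to test the suspicion above is to compare the sequence names in the alignment header with those in the first column of the annotation. The sketch below is illustrative only: the header and GFF3 lines are made-up samples (including the placeholder accession `NC_HYPOTHETICAL.1`), and the function names are hypothetical. For a real BAM file, the header text can be obtained with `samtools view -H`.

```python
def sam_reference_names(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def annotation_sequence_names(annotation_lines):
    """Collect the first column (sequence name) of a GFF/GTF file, skipping comments."""
    names = set()
    for line in annotation_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical header of a BAM mapped against UCSC-style chromosome names.
header = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
]
# Hypothetical GFF3 using accession-style sequence names.
gff3 = [
    "##gff-version 3",
    "NC_HYPOTHETICAL.1\tRefSeq\texon\t100\t200\t.\t+\t.\tID=exon1",
]

shared = sam_reference_names(header) & annotation_sequence_names(gff3)
print("shared sequence names:", sorted(shared))
```

If the set of shared names is empty, no read can overlap any feature, which would be consistent with every read ending up uncounted.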

              Comment

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".

                Comment

                • Lagzxadr
                  Junior Member
                  • Jan 2014
                  • 5

                  Yes, got it. Thanks a lot! Then how do I generate a GFF?

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    Just use the GTF.

                    Comment

                    • Lagzxadr
                      Junior Member
                      • Jan 2014
                      • 5

                      Finally done. Thank you very much!

                      Comment

                      • dlepe
                        Junior Member
                        • Jul 2014
                        • 8

                        Hi Simon,
                        I'm analyzing SOLiD data, using bowtie for mapping and HTSeq for quantification. When I tried the --stranded parameter (just to familiarize myself with HTSeq), I got very similar numbers whether I set it to yes or no.

                        For example, for my 001_02.count file with --stranded=yes:
                        __no_feature 278195
                        __ambiguous 26690
                        __too_low_aQual 0
                        __not_aligned 0
                        __alignment_not_unique 0

                        And for the same 001_02.count file with --stranded=no:
                        __no_feature 255213
                        __ambiguous 115445
                        __too_low_aQual 0
                        __not_aligned 0
                        __alignment_not_unique 0

                        Since my protocol wasn't stranded, I should be losing about half the counts with --stranded=yes, but as you can see this was not the case. I tried the same with some Illumina data I have access to and got the following, which I think is all right.

                        stranded=yes __no_feature 9381365
                        stranded=no __no_feature 492513

                        After struggling with this for a while, the only thing I found was that the SAM files for the SOLiD data contain only two different flags, 0 and 16, which I'm guessing is not enough information for HTSeq?

                        707_1366_1065 16 Chr1 1078 255 28M * 0 0 CCCCCCCCCCCCCACCCCCCAAATTGAG [\L!2_______UBL__ZU!"_______ XA:i:2 MD:Z:28 NM:i:0 CM:i:2
                        42_176_82 0 Chr1 4868 255 73M * 0 0 GGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGT ___________________________________________UU______ZY^________^Z[_^^__\KM XA:i:0 MD:Z:73 NM:i:0 CM:i:0

                        So my question is: are the results I'm getting for the SOLiD data with --stranded=no reliable?
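On the flag question: in the SAM format, bit 0x10 of the FLAG field marks a read that aligned to the reverse strand. For mapped single-end reads, 0 (forward) and 16 (reverse) are therefore the two values one would expect to see, and each does carry the strand. The sketch below decodes the flags of the two records posted above; the function name `read_strand` is hypothetical, and this only illustrates the flag encoding, not how HTSeq uses it.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def read_strand(flag):
    """Return '-' if the reverse-strand bit is set in a SAM FLAG, else '+'."""
    return "-" if flag & REVERSE_STRAND else "+"


# Read names and flags taken from the two SAM records in the post.
for name, flag in (("707_1366_1065", 16), ("42_176_82", 0)):
    print(name, flag, read_strand(flag))
```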

                        Comment

                        • apredeus
                          Senior Member
                          • Jul 2012
                          • 151

                          I have a quick question about this picture:

                          Does it matter for "ambiguous" reads whether they land on the right strand? That is, in the last two cases shown, if gene A and gene B are on opposite strands and the library is stranded, there is actually no ambiguity. Is that taken into consideration?

                          Thank you in advance!

                          Comment

                          • gringer
                            David Eccles (gringer)
                            • May 2011
                            • 845

                            It is taken into consideration if stranded counting is set in the command-line options (it is on by default). With stranded counting, a read is only considered ambiguous if the exon on the corresponding strand is shared by multiple genes (this can happen). With unstranded counting, genes on the opposite strand also count towards ambiguity (much more likely).
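The rule described above can be illustrated with a toy assignment function. This is a simplified sketch, not HTSeq's implementation: it treats any overlap between read and feature as a hit, uses made-up coordinates and gene names, and the function name `assign_read` is hypothetical. Only the labels `__ambiguous` and `__no_feature` are taken from the count output shown earlier in the thread.

```python
def assign_read(read, features, stranded):
    """Assign a read to a gene, or to '__ambiguous' / '__no_feature'.

    read:     (strand, start, end), 1-based inclusive coordinates
    features: list of (gene, strand, start, end) on the same chromosome
    stranded: if True, only features on the read's strand are considered
    """
    r_strand, r_start, r_end = read
    genes = set()
    for gene, f_strand, f_start, f_end in features:
        overlaps = r_start <= f_end and f_start <= r_end
        if overlaps and (not stranded or f_strand == r_strand):
            genes.add(gene)
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return next(iter(genes))


# Two hypothetical genes on opposite strands whose exons overlap at 150-200.
features = [("geneA", "+", 100, 200), ("geneB", "-", 150, 250)]
read = ("+", 160, 190)  # lies entirely inside the overlap

print(assign_read(read, features, stranded=True))   # geneA
print(assign_read(read, features, stranded=False))  # __ambiguous
```

With strand taken into account, only geneA is a candidate for the forward-strand read; without it, both genes match and the read is counted as ambiguous.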

                            Comment

                            • apredeus
                              Senior Member
                              • Jul 2012
                              • 151

                              So if the exons of two genes overlap but the genes are on opposite strands, a read landing in the overlap would NOT be considered ambiguous?

                              Comment

                              • gringer
                                David Eccles (gringer)
                                • May 2011
                                • 845

                                Correct, assuming the right command-line options have been set.

                                Comment
