  • Dear Simon,
    I ran into a problem when using htseq-count. How can I fix this error? Thanks a lot!
    huoxj@ubuntu:/host/ubuntu$ htseq-count -s no -i ID Hxj3TAN_hits.bam Zv9.gff > Hxj4count.txt
    Error occured when processing GFF file (line 2 of file Zv9.gff):
    invalid literal for int() with base 10: '+'
    [Exception type: ValueError, raised in __init__.py:223]



    Originally posted by Simon Anders View Post
    Hi



    I noticed this bug myself just yesterday and fixed it. Please try again with version 0.4.3-p4 and tell me whether this solves the issue.

    Cheers
    Simon

    Comment


    • Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.

      Comment


      • #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
        1 NM_131426 chr1 + 50321633 50410568 50322024
        1 NM_001110522 chr1 - 58701200 58722813 58701200
        9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11
        Originally posted by Simon Anders View Post
        Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.

        Comment


        • This does not at all look like a GFF file to me. No wonder that it does not work.
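For anyone hitting the same message: GFF/GTF expects the start and end coordinates in columns 4 and 5, whereas the posted file has the strand in column 4, which would explain why int() is handed a '+'. Below is a minimal sanity check, assuming tab-separated input. The function name `gff_problem` and the GTF line in the usage are made up for illustration; this is not HTSeq's own parser.

```python
def gff_problem(line):
    """Return None if the line passes a basic GFF/GTF shape check, else a reason."""
    # Blank lines and comment/header lines are not feature lines; skip them.
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t", 8)
    if len(fields) < 5:
        return "fewer than 5 tab-separated columns"
    # Columns 4 and 5 (1-based) must be integer coordinates in GFF/GTF.
    for label, value in (("4 (start)", fields[3]), ("5 (end)", fields[4])):
        if not value.isdigit():
            return f"column {label} is {value!r}, not an integer"
    if len(fields) < 9:
        return f"expected 9 columns, found {len(fields)}"
    return None


# Second line of the posted file, re-joined with tabs (genePred layout).
genepred_row = "\t".join("1 NM_131426 chr1 + 50321633 50410568 50322024".split())
print(gff_problem(genepred_row))  # column 4 (start) is '+', not an integer

# A made-up GTF line for comparison.
gtf_row = 'chr1\trefGene\texon\t100\t200\t.\t+\t.\tgene_id "NM_131426";'
print(gff_problem(gtf_row))  # None
```

Running this over the first few lines of an annotation file before starting htseq-count is a cheap way to catch a wrong format early.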

          Comment


          • Originally posted by Lagzxadr View Post
            #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
            1 NM_131426 chr1 + 50321633 50410568 50322024
            1 NM_001110522 chr1 - 58701200 58722813 58701200
            9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11
            That's genePred format from UCSC! The conversion procedure is detailed here

            Comment


            • Oh... I failed to generate a GFF file from UCSC; I could only download a GFF3 file from NCBI. I ran HTSeq on the GFF3 and my BAM file, but no features were counted. I think this is because the IDs in the NCBI GFF3 cannot be matched to the IDs in the BAM, which was mapped against the UCSC reference. Can you give me some suggestions? Should I use the GFF3 from NCBI, or where can I get a UCSC GFF?
              Originally posted by Simon Anders View Post
              This does not at all look like a GFF file to me. No wonder that it does not work.
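If the mismatch is in the sequence names (for example accession-style names in the annotation versus chr1-style names in the alignments), it can be confirmed by comparing column 1 of the annotation with the @SQ lines of the SAM/BAM header. A rough sketch follows; the function names and the example lines are illustrative assumptions, not real file contents.

```python
def annotation_seqnames(gff_lines):
    """Collect the sequence names (column 1) from GFF/GTF feature lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def shared_seqnames(gff_lines, header_lines):
    """Names present in both files; an empty set means nothing can be counted."""
    return annotation_seqnames(gff_lines) & sam_header_seqnames(header_lines)


# Illustrative lines only: accession-style name in the annotation, chr-style in the header.
gff = ["##gff-version 3\n", "NC_007112.5\tRefSeq\tgene\t100\t200\t.\t+\t.\tID=gene0\n"]
header = ["@HD\tVN:1.0\n", "@SQ\tSN:chr1\tLN:1000\n"]
print(shared_seqnames(gff, header))  # set()
```

The header lines can be obtained with `samtools view -H file.bam`. An empty intersection means the two files use different naming schemes and need to come from the same source.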

              Comment


              • When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".

                Comment


                • Yes, got it. Thanks a lot! Then how do I generate a GFF?
                  Originally posted by dpryan View Post
                  When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".

                  Comment


                  • Just use the GTF.

                    Comment


                    • Finally done. Thank you very much!
                      Originally posted by dpryan View Post
                      Just use the GTF.

                      Comment


                      • Hi Simon,
                        I'm analyzing SOLiD data, using bowtie for mapping and htseq for quantification. The thing is, when I used the --stranded parameter (I tried it just to familiarize myself with htseq), I got very similar numbers whether I set it to yes or no.

                        For example, for my 001_02.count file with --stranded=yes:
                        __no_feature 278195
                        __ambiguous 26690
                        __too_low_aQual 0
                        __not_aligned 0
                        __alignment_not_unique 0

                        And for the same 001_02.count file with --stranded=no:
                        __no_feature 255213
                        __ambiguous 115445
                        __too_low_aQual 0
                        __not_aligned 0
                        __alignment_not_unique 0

                        Since my protocol wasn't stranded, I should be losing about half the counts with --stranded=yes, but as you can see this was not the case. I tried the same with some Illumina data I have access to and got the following, which I think is all right.

                        stranded=yes __no_feature 9381365
                        stranded=no __no_feature 492513

                        After struggling with this for a while, the only thing I found was that the SAM files for the SOLiD data contain only two different flags, 0 and 16, which I'm guessing is not enough information for htseq?

                        707_1366_1065 16 Chr1 1078 255 28M * 0 0 CCCCCCCCCCCCCACCCCCCAAATTGAG [\L!2_______UBL__ZU!"_______ XA:i:2 MD:Z:28 NM:i:0 CM:i:2
                        42_176_82 0 Chr1 4868 255 73M * 0 0 GGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGT ___________________________________________UU______ZY^________^Z[_^^__\KM XA:i:0 MD:Z:73 NM:i:0 CM:i:0

                        So my question is: are the results I'm getting for the SOLiD data with --stranded=no reliable?
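On the flag question: in the SAM format, bit 0x10 of the FLAG field marks a read aligned to the reverse strand, and bit 0x4 marks an unmapped read. Flags 0 and 16 are therefore what mapped single-end reads normally carry, and they do encode the strand. A small sketch of the decoding (the helper name `read_strand` is made up here):

```python
FLAG_UNMAPPED = 0x4   # read has no alignment
FLAG_REVERSE = 0x10   # read aligned to the reverse strand


def read_strand(flag):
    """Return '+' or '-' for a mapped read, or None if the read is unmapped."""
    if flag & FLAG_UNMAPPED:
        return None
    return "-" if flag & FLAG_REVERSE else "+"


# The two flags seen in the SOLiD SAM file above.
print(read_strand(0))   # +
print(read_strand(16))  # -
```

So a file with only flags 0 and 16 is not missing strand information as far as the SAM format goes; the similar counts would have to come from something else.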

                        Comment


                        • I have a quick question about this picture:



                          Does it matter for "ambiguous" reads whether they land on the right strand? I.e., for the last two cases shown, if gene A and gene B are on opposite strands and the library is stranded, there is actually no ambiguity. Is that taken into consideration?

                          Thank you in advance!

                          Comment


                          • Originally posted by apredeus View Post
                            does it matter for "ambiguous" reads if they land on the right strand? I.e. for cases shown in the two last cases, if gene A and gene B are on opposite strands, and the library is stranded, there is no ambiguity actually. Is that taken into consideration?
                            It is taken into consideration if it's set in the command-line options (on by default). For stranded counting, a read will only be considered ambiguous if the exon on the corresponding strand is shared by multiple genes (this can happen). Otherwise, an ambiguous count includes genes on the opposite strand as well (much more likely).

                            Comment


                            • So if two genes' exons overlap but the genes are on opposite strands, a read landing in the overlap would NOT be considered ambiguous?

                              Comment


                              • Correct, assuming that all the right command-line options have been set.
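To make the strand rule above concrete, here is a toy model of the assignment step. It is a simplified illustration and not HTSeq's actual implementation: it treats any overlap as a hit, uses 0-based end-exclusive coordinates, and all names (`Exon`, `assign_read`, `geneA`, `geneB`) are invented for the example.

```python
from collections import namedtuple

# Coordinates are 0-based and end-exclusive in this toy model.
Exon = namedtuple("Exon", "gene chrom start end strand")


def assign_read(chrom, start, end, strand, exons, stranded):
    """Return a gene name, '__ambiguous', or '__no_feature' for one read."""
    genes = set()
    for exon in exons:
        if exon.chrom != chrom:
            continue
        # In stranded mode, exons on the opposite strand are ignored entirely.
        if stranded and exon.strand != strand:
            continue
        # Half-open interval overlap test.
        if exon.start < end and start < exon.end:
            genes.add(exon.gene)
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return next(iter(genes))


# Two overlapping exons on opposite strands, and a '+' read inside the overlap.
exons = [Exon("geneA", "chr1", 100, 200, "+"), Exon("geneB", "chr1", 150, 250, "-")]
print(assign_read("chr1", 160, 190, "+", exons, stranded=True))   # geneA
print(assign_read("chr1", 160, 190, "+", exons, stranded=False))  # __ambiguous
```

In this model, switching strandedness off can only add candidate genes for a read, which is consistent with the higher __ambiguous count in the --stranded=no run posted earlier in the thread.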

                                Comment
