  • PhD1990
    Junior Member
    • Jan 2014
    • 3

    problem with HTSeq

    Hi everyone,

    I'm starting to use Python/HTSeq to analyse RNA-seq data.
    I'm following the "tour through HTSeq" tutorial, but I'm having a weird problem.

    I can import HTSeq
    and read in a file with HTSeq.FastqReader.
    I can get the name of a read with read.name,
    but when I type read.qual, Python automatically restarts and I have to start over.

    Does anyone know why this happens and how I can solve it?

    Thank you
  • Wolfgang Huber
    Senior Member
    • Aug 2009
    • 109

    #2
    Dear PhD1990

    It's good that you report having a problem, but you probably need to be more specific for someone to be able to help you. Can you provide:

    - a reproducible example (i.e. a self-contained piece of code and, if needed, data for others to reproduce your problem)
    - a statement of the problem you experience (any error messages, warnings etc.)
    - an overview of your system (OS, Python version).
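For the system overview, a minimal sketch using only the Python standard library is shown here. The function names `system_report` and `htseq_version` are made up for illustration, and the HTSeq version lookup assumes the package exposes a `__version__` attribute, falling back to "unknown" if it does not.

```python
import platform
import sys


def htseq_version():
    """Return the installed HTSeq version, or a note if it is unavailable."""
    try:
        import HTSeq  # only needed for the version string
    except ImportError:
        return "not installed"
    # Assumption: the package exposes __version__; fall back if it does not.
    return getattr(HTSeq, "__version__", "unknown")


def system_report():
    """Collect OS and Python details worth pasting into a problem report."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "python_bits": platform.architecture()[0],
        "python_executable": sys.executable,
        "htseq_version": htseq_version(),
    }


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines together with the exact commands typed and any error text covers most of the three points above.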

    Kind regards
    Wolfgang
    Wolfgang Huber
    EMBL


    • sindrle
      Senior Member
      • Aug 2013
      • 266

      #3
      HTSeq: Very few counts recognised

      Hi!
      I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I've used TopHat 2 with the UCSC GTF file for hg19.

      This is my code:

      samtools view accepted_hits.bam | \
      htseq-count -m intersection-nonempty -s no -a 10 \
      - UCSC/hg19/genes.gtf \
      > Out.txt

      Here is a typical result; it is proportional to the library size:

      no_feature 7013689
      ambiguous 269370
      too_low_aQual 0
      not_aligned 0
      alignment_not_unique 6645341

      How come, on average, 25-50% of my reads end up as "no_feature",
      "ambiguous" or "alignment_not_unique"?

      This is RNA-seq data. If I need to inspect the alignments visually, how should I proceed?
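To put numbers like these in context, the share of each special counter can be computed from the htseq-count output itself. The following is a minimal sketch, not part of HTSeq: `summarize_counts` and `percent` are hypothetical helper names, the counter names are the five shown above (accepted with or without a leading `__`, which some htseq-count versions add), and the gene rows in the demo are invented purely for illustration.

```python
# Special counter names as they appear in the output quoted above.
SPECIAL_COUNTERS = (
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
)


def summarize_counts(lines):
    """Split htseq-count rows into counts assigned to features and special counters."""
    assigned = 0
    special = dict.fromkeys(SPECIAL_COUNTERS, 0)
    for line in lines:
        if not line.strip():
            continue
        # Each row is "<name><whitespace><count>"; split on the last whitespace run.
        name, count = line.rsplit(None, 1)
        key = name.lstrip("_")  # tolerate the "__no_feature" spelling
        if key in special:
            special[key] += int(count)
        else:
            assigned += int(count)
    total = assigned + sum(special.values())
    return {"assigned": assigned, "special": special, "total": total}


def percent(part, total):
    """Percentage of part in total, or 0.0 for an empty file."""
    return 100.0 * part / total if total else 0.0


if __name__ == "__main__":
    # geneA/geneB are invented rows; the special counters are from the post.
    demo = [
        "geneA\t1000000",
        "geneB\t2000000",
        "no_feature\t7013689",
        "ambiguous\t269370",
        "too_low_aQual\t0",
        "not_aligned\t0",
        "alignment_not_unique\t6645341",
    ]
    summary = summarize_counts(demo)
    for name, value in summary["special"].items():
        print(f"{name}: {percent(value, summary['total']):.1f}%")
```

Note that the total here is simply the sum of all rows in the file, which need not equal the number of reads in the library.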


      • PhD1990
        Junior Member
        • Jan 2014
        • 3

        #4
        thanks + second question

        Hi everyone,

        Thank you so much for helping me.
        I have found the problem, by the way: the tutorial says you should download the vcredist x86 2010 version, but I have now downloaded the 2012 version and it works perfectly.

        I have a second question, though.

        Now that the tutorial is working for me, I still have one really weird problem. To count reads, do you have to download exon information from the internet (Ensembl or something similar)? The tutorial provides a GTF file, and that works perfectly, but online I can only find GFF3 files for, for example, E. coli strains. How do you use these, given that their content differs from the GTF file?

        Is there a standard format, or a place where I can find exon information in GTF format?

        thanks
        grtz

        Sara


        • bruce01
          Senior Member
          • Mar 2011
          • 160

          #5
          Hi Sara,

          You can use GFF3 format in HTSeq; you just need to specify the feature type (3rd column) using the -t flag, as it may differ from the default, which is 'exon' ('gene_id' is the default of the -i option for the ID attribute, which may also need changing for a GFF3 file). For example, '-t gene'. Otherwise, you can use a conversion script to make a GTF from GFF3; there are a few around in various scripting languages, or I can PM you one I use if you want.

          Bruce.
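As an illustration of what such a conversion involves (this is not the script offered above), here is a minimal sketch. It assumes the GFF3 rows of interest have type `gene` and carry an `ID` attribute, and it relabels them as `exon` rows with a `gene_id` so that the htseq-count defaults apply; that relabelling only makes sense for annotations without introns. The function names and the demo line are made up.

```python
from urllib.parse import unquote


def gff3_attrs(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())  # GFF3 percent-encodes values
    return attrs


def gff3_line_to_gtf(line, feature_type="gene", id_key="ID"):
    """Return a GTF-style line for rows of the chosen type, else None."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != feature_type:
        return None
    attrs = gff3_attrs(cols[8])
    if id_key not in attrs:
        return None
    feature_id = attrs[id_key]
    cols[2] = "exon"  # assumption: treat each gene as a single exon
    cols[8] = f'gene_id "{feature_id}"; transcript_id "{feature_id}";'
    return "\t".join(cols)


if __name__ == "__main__":
    # Invented example row, not taken from a real annotation file.
    demo = "chr\tRefSeq\tgene\t190\t255\t.\t+\t.\tID=gene0;Name=thrL"
    print(gff3_line_to_gtf(demo))
```

Applying `gff3_line_to_gtf` to every line of a GFF3 file and keeping the non-None results would give a file that can be inspected by eye before it is handed to htseq-count.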


          • PhD1990
            Junior Member
            • Jan 2014
            • 3

            #6
            hi Bruce

            that would be really nice if you could send me such a script

            thank you so much

            Sara


            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #7
              Originally posted by sindrle
              Hi!
              I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I've used TopHat 2 with the UCSC GTF file for hg19.

              [...]

              How come, on average, 25-50% of my reads end up as "no_feature",
              "ambiguous" or "alignment_not_unique"?
              Is this a GTF file created with UCSC's Table Browser? If so: these do not work. There is a bug in the Table Browser server which causes the gene ID field to contain the transcript ID instead of the gene ID.

              Please use a GTF file from another source.

              Simon
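One way to check a GTF file for the symptom described above is to count how many rows have identical `gene_id` and `transcript_id` values. This is a minimal sketch with hypothetical function names and invented demo rows; if every row matches, the file is a likely candidate for the problem, though the check alone does not prove where the file came from.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gtf_attrs(column9):
    """Parse a GTF attribute column into a dict (last value wins on repeats)."""
    return dict(ATTR_RE.findall(column9))


def count_matching_ids(lines):
    """Return (rows_checked, rows_where_gene_id_equals_transcript_id)."""
    checked = 0
    same = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = gtf_attrs(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            checked += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return checked, same


if __name__ == "__main__":
    # Invented rows: the first repeats the transcript ID as gene ID, the second does not.
    demo = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "TX1"; transcript_id "TX1";',
        'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "GENE2"; transcript_id "TX2";',
    ]
    checked, same = count_matching_ids(demo)
    print(f"{same} of {checked} rows have gene_id == transcript_id")
```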
