  • berkeley_2014
    Junior Member
    • May 2014
    • 3

    HTSeq problem for STAR aligned files

    Hi everybody,

    I am trying to analyze RNA-seq samples. I aligned all samples with STAR using the following command:

    STAR --genomeDir STAR_genome --genomeLoad NoSharedMemory --runThreadN 4 --readFilesIn Input1.fastq Input2.fastq --outFileNamePrefix outpotfile --outReadsUnmapped Fastx --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outSAMstrandField intronMotif --sjdbGTFfile --sjdbOverhang 99

    I have paired-end data. I used the FASTA file from UCSC (hg19) and a GTF file from the UCSC Table Browser. I got a .sam alignment file and used samtools to process it for HTSeq.

    My HTSeq command is:
    htseq-count -r name -s reverse -t exon -i gene_name -m intersection-nonempty Input.sam hg19.ensemble.gtf > Output.counts

    I used the Ensembl GTF file here.

    After running HTSeq, I get:

    __no_feature 21082900
    __ambiguous 528743
    __too_low_aQual 0
    __not_aligned 0
    __alignment_not_unique 8353302

    What is the problem? I do not get any counts for features.
    Could the problem be that I used the UCSC genome and GTF for alignment but the Ensembl GTF for HTSeq?
    Your help is really appreciated. Thanks.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Have you tried HT-seq with the UCSC GTF file? Ensembl GTF files tend to have additional annotation that may be computationally generated (and not validated).

    Another option would be to try featureCounts from the Subread package.

    BTW: Are you using samples that have been QC'ed before alignment with STAR?

    • berkeley_2014
      Junior Member
      • May 2014
      • 3

      #3
      Hi GenoMax,

      I am going to run HTSeq with the UCSC GTF to see if that works. I haven't done any QC; I aligned the raw fastq files directly to hg19 using STAR.
      Thanks. I will try the other option you mentioned as well.

      • alexdobin
        Senior Member
        • Feb 2009
        • 161

        #4
        Hi @berkeley_2014, GenoMax

        The most likely cause is as follows.
        Ensembl files typically do not have 'chr' in chromosome names, while UCSC files do, so htseq-count cannot match any of the chromosomes in the alignments to the annotation. As GenoMax suggested, using the UCSC GTF with htseq-count should solve the problem.
        Alternatively, you can re-generate the STAR genome with the Ensembl FASTA and GTF.

        Cheers
        Alex
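
A quick way to check for the 'chr' naming mismatch described above is to compare the reference names in the SAM header with the sequence names in the first column of the GTF. The following is only a minimal sketch: the function names are made up for illustration, and the toy header and GTF lines stand in for the real Input.sam and hg19.ensemble.gtf.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of a GTF."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(sam_names, gtf_names):
    """Return names shared as-is, and GTF names that match once 'chr' is prepended."""
    shared = sam_names & gtf_names
    fixable = {n for n in gtf_names if "chr" + n in sam_names}
    return shared, fixable


# Toy data (illustrative only): UCSC-style SAM header, Ensembl-style GTF.
sam_header = [
    "@HD\tVN:1.0\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
    "@SQ\tSN:chr2\tLN:243199373\n",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
gtf = [
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "A";\n',
    '2\tensembl\texon\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "B";\n',
]

sam_names = sam_reference_names(sam_header)
gtf_names = gtf_seqnames(gtf)
shared, fixable = compare_names(sam_names, gtf_names)
print("shared names:", sorted(shared))
print("GTF names matching after adding 'chr':", sorted(fixable))
```

On real data the same functions can be fed open file handles, e.g. `sam_reference_names(open("Input.sam"))`, provided the SAM file still carries its header. An empty set of shared names would be consistent with every read ending up in `__no_feature`.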

        • shi
          Wei Shi
          • Feb 2010
          • 236

          #5
          featureCounts automatically matches up the chromosome names with "chr" included and without. It even allows you to supply an alias file including the mapping between chromosomal names used in the annotation and chromosomal names included in the mapping results.
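
For anyone who wants to build such an alias file by hand, here is a rough Python sketch. It assumes the format I understand the Subread documentation to describe for the `-A` option: a comma-delimited, two-column text file with no header, annotation name first and name used in the reads second. Please verify that against the featureCounts help for your version. The helper names and the example name sets are hypothetical, and the MT to chrM pairing is only a naming assumption that should be checked for your genome build.

```python
def build_alias_rows(annotation_names, read_names):
    """Pair annotation names with read-side names that differ only by a 'chr' prefix."""
    rows = []
    for name in sorted(annotation_names):
        if name in read_names:
            continue  # already identical, no alias needed
        # Assumption: Ensembl-style 'MT' corresponds to UCSC-style 'chrM'.
        candidate = "chrM" if name == "MT" else "chr" + name
        if candidate in read_names:
            rows.append((name, candidate))
    return rows


def format_alias_file(rows):
    """Render rows as comma-delimited text, one pair per line, no header."""
    return "".join(f"{annotation},{reads}\n" for annotation, reads in rows)


# Hypothetical name sets: Ensembl-style annotation, UCSC-style alignments.
annotation_names = {"1", "X", "MT", "GL000192.1"}
read_names = {"chr1", "chrX", "chrM"}

rows = build_alias_rows(annotation_names, read_names)
alias_text = format_alias_file(rows)
print(alias_text, end="")
```

Names with no obvious counterpart (the unplaced scaffold in this toy example) are simply left out, so it is worth reviewing the result before using it.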

          • berkeley_2014
            Junior Member
            • May 2014
            • 3

            #6
            Thanks everybody. I am going to redo the alignment with the Ensembl genome and GTF, and I will try featureCounts as well. I did run htseq-count with the UCSC GTF. I do get counts for features now, but __no_feature is still very high. I do not know if that is usual.
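
To put a number on "very high", it can help to express each htseq-count special counter as a share of everything in the output. A minimal Python sketch follows, assuming the usual two-column output with special counters prefixed by `__`. The function name is made up, and the example input is just the summary from the first post, which had no feature counts at all.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output into assigned counts and '__' special counters."""
    assigned = 0
    special = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    total = assigned + sum(special.values())
    return assigned, special, total


# Summary lines from the first post (no per-feature lines were reported).
example = """\
__no_feature 21082900
__ambiguous 528743
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 8353302
""".splitlines()

assigned, special, total = summarize_htseq_counts(example)
no_feature_fraction = special["__no_feature"] / total if total else 0.0

for name, count in sorted(special.items()):
    share = count / total if total else 0.0
    print(f"{name}\t{count}\t{share:.1%}")
print(f"assigned_to_features\t{assigned}\t{(assigned / total if total else 0.0):.1%}")
```

For a real run, pass the lines of Output.counts instead of the example. Comparing the shares between the Ensembl-GTF and UCSC-GTF runs shows how much the annotation change actually moved reads out of `__no_feature`.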
