  • HTSeq problem for STAR aligned files

    Hi everybody,

    I am trying to analyze RNA-seq samples. I aligned all samples with STAR using the following command:

    STAR --genomeDir STAR_genome --genomeLoad NoSharedMemory --runThreadN 4 --readFilesIn Input1.fastq Input2.fastq --outFileNamePrefix outpotfile --outReadsUnmapped Fastx --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outSAMstrandField intronMotif --sjdbGTFfile --sjdbOverhang 99

    I have paired-end data. I used the FASTA file from UCSC (hg19) and a GTF file from the UCSC Table Browser. I got a .sam alignment file and used samtools to process it for HTSeq.

    My HTSeq command is:
    htseq-count -r name -s reverse -t exon -i gene_name -m intersection-nonempty Input.sam hg19.ensemble.gtf > Output.counts

    I used the Ensembl GTF file.
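For paired-end data, `-r name` tells htseq-count that the alignments are in read-name order, so the records of each pair should sit next to each other (sorting with `samtools sort -n` guarantees this). The sketch below is a quick sanity check under the assumption that what matters is that all records sharing a read name are contiguous; `sam_is_name_grouped` is a hypothetical helper, not part of HTSeq or samtools, and it keeps every seen name in memory, so it is only meant for small test files.

```shell
# Hypothetical helper: exit status 0 if every read name (SAM column 1)
# occupies a single contiguous run of records, 1 otherwise.
# Header lines (starting with @) are skipped.
sam_is_name_grouped() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    /^@/ { next }                        # skip header lines
    $1 != prev {                         # a new run of records starts here
      if ($1 in done) { bad = 1; exit }  # this name already had an earlier run
      if (prev != "") done[prev] = 1     # close the previous run
      prev = $1
    }
    END { exit bad }
  ' "$1"
}
```

Usage, assuming a plain-text SAM: `sam_is_name_grouped Input.sam || samtools sort -n -O sam -o Input.byname.sam Input.sam`, then pass the sorted file to htseq-count.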

    After running HTSeq, I get:

    __no_feature 21082900
    __ambiguous 528743
    __too_low_aQual 0
    __not_aligned 0
    __alignment_not_unique 8353302

    What is the problem? I do not get any counts for features.
    Since I used the Ensembl GTF file, could that be the problem? I used the UCSC genome and GTF for alignment, but the Ensembl GTF for HTSeq.
    Your help is really appreciated. Thanks.
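One way to test the suspicion in this question is to compare the reference names in the SAM header (`@SQ` lines, `SN:` tag) with the sequence names in column 1 of the GTF. The helpers below are a hypothetical sketch using only awk, sort and comm; if every GTF name is reported as missing, the two files almost certainly use different naming schemes (for example `1` versus `chr1`).

```shell
# Hypothetical helper: reference names from the SAM header (@SQ ... SN:name).
sam_ref_names() {
  awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1" |
    LC_ALL=C sort -u
}

# Hypothetical helper: sequence names from column 1 of a GTF (comments skipped).
gtf_seq_names() {
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print GTF sequence names that never appear in the SAM header.
# Uses temporary files so it also runs under a plain POSIX sh.
gtf_names_missing_from_sam() {
  sam_list=$(mktemp) || return 1
  gtf_list=$(mktemp) || return 1
  sam_ref_names "$1" > "$sam_list"
  gtf_seq_names "$2" > "$gtf_list"
  LC_ALL=C comm -13 "$sam_list" "$gtf_list"   # lines only in the GTF list
  rm -f "$sam_list" "$gtf_list"
}
```

Usage: `gtf_names_missing_from_sam Input.sam hg19.ensemble.gtf | head`.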

  • #2
    Have you tried HTSeq with the UCSC GTF file? Ensembl GTF files tend to have additional annotation that may be computationally generated (and not validated).

    Another option would be to try featureCounts from the Subread package.

    BTW: Are you using samples that have been QC'ed before alignment with STAR?



    • #3
      Hi GenoMax,

      I am going to run it with the UCSC GTF to see if that works. I haven't done any QC; I aligned the FASTQ files directly to hg19 with STAR.
      Thanks, I will try out the option you mentioned.



      • #4
        Hi @berkeley_2014, GenoMax

        The most likely cause is as follows.
        ENSEMBL files typically do not have 'chr' in chromosome names, while UCSC files do, so htseq-count cannot match any of the chromosomes in your alignments to the annotation. As GenoMax suggested, using the UCSC GTF with htseq-count should solve the problem.
        Alternatively, you can regenerate the STAR genome with the ENSEMBL FASTA and GTF.

        Cheers
        Alex
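Another possible workaround, not suggested in the thread, is to rename the sequences in the Ensembl GTF so they follow the UCSC convention. The sketch below is a hypothetical helper that prefixes `chr` onto the primary chromosomes (1 to 22, X, Y) and maps `MT` to `chrM`. Two caveats: unplaced contigs use unrelated names in the two sources (for example `GL000192.1` versus `chr1_gl000192_random`) and are left unchanged here, and hg19's `chrM` is not the same mitochondrial sequence as GRCh37's `MT`, so mitochondrial coordinates may not line up.

```shell
# Hypothetical helper: rewrite column 1 of an Ensembl-style GTF to UCSC-style
# names. Comment lines are copied unchanged; other contigs are left as-is.
ensembl_gtf_to_ucsc_names() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                                 # keep header comments
    $1 == "MT" { $1 = "chrM" }                           # mitochondrial name
    $1 ~ /^([1-9]|1[0-9]|2[0-2]|X|Y)$/ { $1 = "chr" $1 } # primary chromosomes
    { print }' "$1"
}
```

Usage: `ensembl_gtf_to_ucsc_names hg19.ensemble.gtf > hg19.ensemble.chr.gtf`, then run htseq-count against the renamed file.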



        • #5
          featureCounts automatically matches chromosome names with and without the "chr" prefix. It also lets you supply an alias file that maps the chromosome names used in the annotation to those used in the mapping results.
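If an explicit alias file is wanted, one can be generated from the annotation itself. This sketch assumes the layout described in the featureCounts help for `-A`: two comma-separated columns, the annotation's name first and the name used in the alignments second, with no header line. `make_fc_alias_file` is a hypothetical helper and, like a simple rename, only covers the primary chromosomes and `MT` to `chrM`.

```shell
# Hypothetical helper: build a featureCounts alias file (for -A) mapping
# Ensembl-style GTF names to UCSC-style alignment names.
make_fc_alias_file() {
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | LC_ALL=C sort -u |
    awk '$1 == "MT" { print "MT,chrM"; next }                     # mitochondrial name
         /^([1-9]|1[0-9]|2[0-2]|X|Y)$/ { print $1 ",chr" $1 }'    # primary chromosomes
}
```

Usage, with `alias.csv` as a hypothetical file name: `make_fc_alias_file hg19.ensemble.gtf > alias.csv`, then `featureCounts -a hg19.ensemble.gtf -A alias.csv -s 2 -o Output.counts Input.sam`, where `-s 2` corresponds to htseq-count's `-s reverse`. Check the featureCounts help for the paired-end options in your Subread version.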



          • #6
            Thanks, everybody. I am going to redo the alignment with the Ensembl genome and GTF, and will try out featureCounts as well. I did run htseq-count with the UCSC GTF: I do get counts for features, but __no_feature is still very high. I do not know if that is usual.
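To judge how high `__no_feature` is, it helps to express each category as a share of everything htseq-count counted. The sketch below is a hypothetical helper that assumes the usual tab-separated htseq-count output, where lines starting with `__` are the special counters and every other line is a feature count. Applied to the counts posted in the first message (nothing assigned), `__no_feature` comes out at about 0.70 of all counted fragments and `__alignment_not_unique` at about 0.28.

```shell
# Hypothetical helper: print "category<TAB>count<TAB>fraction" for the total
# assigned to features and for each htseq-count special counter.
# The order of the special counters in the output is not guaranteed.
summarise_htseq_counts() {
  awk -F'\t' '
    /^__/ { special[$1] = $2; total += $2; next }  # special counters
    { assigned += $2; total += $2 }                # ordinary feature lines
    END {
      if (total == 0) { print "no records"; exit }
      printf "assigned\t%d\t%.4f\n", assigned, assigned / total
      for (k in special) printf "%s\t%d\t%.4f\n", k, special[k], special[k] / total
    }' "$1"
}
```

Usage: `summarise_htseq_counts Output.counts`.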
