  • HTSeq problem for STAR aligned files

    Hi everybody,

    I am trying to analyze RNA-seq samples. I aligned all samples with STAR using the following command:

    STAR --genomeDir STAR_genome --genomeLoad NoSharedMemory --runThreadN 4 --readFilesIn Input1.fastq Input2.fastq --outFileNamePrefix outpotfile --outReadsUnmapped Fastx --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outSAMstrandField intronMotif --sjdbGTFfile --sjdbOverhang 99

    I have paired-end data. I used the FASTA file from UCSC (hg19) and a GTF file from the UCSC Table Browser. I got a .sam alignment file and used samtools to process it for HTSeq.

    My HTSeq command is:
    htseq-count -r name -s reverse -t exon -i gene_name -m intersection-nonempty Input.sam hg19.ensemble.gtf > Output.counts

    I used an Ensembl GTF file.

    After running HTSeq, I get:

    __no_feature 21082900
    __ambiguous 528743
    __too_low_aQual 0
    __not_aligned 0
    __alignment_not_unique 8353302

    What is the problem? I do not get any counts for features.
    Since I used the UCSC genome and GTF for alignment but an Ensembl GTF for HTSeq, could that mismatch be the problem?
    Your help is really appreciated. Thanks.

  • #2
    Have you tried HTSeq with the UCSC GTF file? Ensembl GTF files tend to have additional annotation that may be computationally generated (and not validated).

    Another option would be to try featureCounts from the Subread package.

    BTW: Are you using samples that have been QC'ed before alignment with STAR?

    • #3
      Hi GenoMax,

      I am going to run it with the UCSC GTF to see if that works. I haven't done any QC; I aligned the FASTQ files directly to hg19 with STAR.
      Thanks. I will try out the option you mentioned.

      • #4
        Hi @berkeley_2014, GenoMax

        The most likely cause is as follows.
        ENSEMBL files typically do not have 'chr' in chromosome names, while UCSC files do, so HTSeq cannot match any of the chromosomes. As GenoMax suggested, using the UCSC GTF with HTSeq should solve the problem.
        Alternatively, you can regenerate the STAR genome with the ENSEMBL FASTA and GTF.

        Cheers
        Alex
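        One way to confirm this kind of mismatch before re-running anything is to compare the reference names in the SAM header (@SQ lines, SN: tag) with the chromosome names in column 1 of the GTF. The Python sketch below is a minimal illustration, not part of HTSeq; the script and helper names are hypothetical, and the Ensembl-to-UCSC renaming rule ('1' to 'chr1', 'MT' to 'chrM') is an assumption that only covers primary chromosomes, leaving unplaced scaffolds unmatched.

```python
import sys


def sam_reference_names(sam_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # SAM header lines all come before the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect chromosome names (column 1) from a GTF, skipping comment and blank lines."""
    return {line.split("\t", 1)[0]
            for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def ucsc_style(name):
    """Assumed Ensembl -> UCSC renaming for primary chromosomes only:
    numeric names plus 'X' and 'Y' get a 'chr' prefix, and 'MT' becomes 'chrM'.
    Any other name (e.g. unplaced scaffolds) is returned unchanged."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name


def diagnose(sam_names, gtf_names):
    """Return the names shared as-is and the names shared after renaming the GTF side."""
    direct = sam_names & gtf_names
    converted = sam_names & {ucsc_style(n) for n in gtf_names}
    return direct, converted


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_chroms.py Input.sam hg19.ensemble.gtf
    with open(sys.argv[1]) as sam, open(sys.argv[2]) as gtf:
        direct, converted = diagnose(sam_reference_names(sam), gtf_seqnames(gtf))
    print(f"chromosome names shared as-is: {len(direct)}")
    print(f"shared after Ensembl->UCSC renaming: {len(converted)}")
```

        If the first number is zero and the second is not, the 'chr' naming difference is the most likely reason every read lands in __no_feature.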

        • #5
          featureCounts automatically matches up chromosome names with and without "chr". It also lets you supply an alias file that maps the chromosome names used in the annotation to those used in the mapping results.
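          The alias file is passed to featureCounts with -A; the Subread documentation describes it as a header-less, two-column, comma-delimited text file with annotation chromosome names in column 1 and read-mapping chromosome names in column 2. The Python sketch below generates one for an Ensembl GTF used against UCSC hg19 alignments. The script name is hypothetical, and the renaming rule (numeric names, X and Y get a 'chr' prefix, MT becomes chrM) is an assumption covering only primary chromosomes.

```python
import sys


def to_ucsc(name):
    """Assumed Ensembl GRCh37 -> UCSC hg19 name mapping, primary chromosomes only.
    Names it does not recognise are returned unchanged."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name


def alias_lines(annotation_names):
    """Build featureCounts -A rows: column 1 = name in the annotation,
    column 2 = name in the reads. Names that need no renaming are skipped."""
    rows = []
    for name in sorted(annotation_names):
        target = to_ucsc(name)
        if target != name:
            rows.append(f"{name},{target}")
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python make_alias.py hg19.ensemble.gtf chr_alias.csv
    with open(sys.argv[1]) as gtf:
        names = {line.split("\t", 1)[0]
                 for line in gtf
                 if line.strip() and not line.startswith("#")}
    with open(sys.argv[2], "w") as out:
        out.write("\n".join(alias_lines(names)) + "\n")
```

          The resulting file could then be used with something like `featureCounts -s 2 -g gene_name -A chr_alias.csv -a hg19.ensemble.gtf -o Output.featureCounts.txt Input.sam`, where -s 2 corresponds to HTSeq's -s reverse; paired-end counting needs additional flags that differ between Subread versions.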

          • #6
            Thanks everybody. I am going to redo the alignment with the Ensembl genome and GTF, and will try featureCounts as well. I did run HTSeq with the UCSC GTF: I do get counts for features, but __no_feature is still very high. I do not know if that is usual.
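            To put a number on "very high", the htseq-count output can be summarized as fractions: lines whose name starts with "__" are HTSeq's special counters, and every other line is a per-feature count. The Python sketch below (script and function names are hypothetical) reports each as a share of all reads or read pairs that htseq-count processed. It accepts tab- or space-separated lines, since forum posts often lose the tabs.

```python
import sys


def summarize(lines):
    """Return {'assigned': fraction, '__no_feature': fraction, ...} for htseq-count output.
    'assigned' is the sum of all per-feature counts; '__' lines are kept separately."""
    assigned = 0
    special = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name, count = parts[0], int(parts[-1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    total = assigned + sum(special.values())
    fractions = {"assigned": assigned / total if total else 0.0}
    for name, count in special.items():
        fractions[name] = count / total if total else 0.0
    return fractions


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python summarize_counts.py Output.counts
    with open(sys.argv[1]) as fh:
        for name, frac in summarize(fh).items():
            print(f"{name}\t{frac:.1%}")
```

            Applied to the counts from the first post, this gives 0% assigned, about 70% __no_feature and about 28% __alignment_not_unique.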
