  • No counts from HTSeq

    So I've just used the pre-built GRCh38 bowtie2 index to map paired-end reads using TopHat. Approximately 70% of the reads mapped (~11,000,000).
    Code:
    tophat -o pilot_S10.5 ../bowtieIndex/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index pilot_S10_L005_R1_001.fastq pilot_S10_L005_R2_001.fastq
    These were then sorted by name (samtools sort -n). I then downloaded the Homo_sapiens.GRCh38.88.gtf file from Ensembl and ran the sorted reads through HTSeq:

    Code:
    htseq-count -f bam -t gene --stranded=yes accepted.sorted.bam /projects/comm00007/rnaSeqData/Homo_sapiens.GRCh38.88.gtf  > sample.count
    This has resulted in zero reads being assigned to genes. About a third of the reads register as no feature and the rest register as alignment not unique. I also get nothing if I ask to count exon rather than gene. When I load a sample of the reads into SeqMonk, though, I get numerous reads mapping to genes. Any thoughts as to why?
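A quick way to put numbers on that breakdown is to split the htseq-count output into gene rows and special counter rows. The following is only a minimal sketch: the function name `summarise_htseq_counts` and the demo values are hypothetical, and it assumes the tab-separated output of recent HTSeq versions, in which special counters are prefixed with a double underscore (for example `__no_feature` and `__alignment_not_unique`).

```python
def summarise_htseq_counts(lines):
    """Split htseq-count output into assigned reads and special counters.

    Assumes tab-separated rows with the feature ID in the first column and
    the count in the last, and special counters prefixed with "__".
    """
    assigned = 0
    special = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        name, count = fields[0], int(fields[-1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    total = assigned + sum(special.values())
    # Fraction of all counted reads that fell into each special category.
    fractions = {k: (v / total if total else 0.0) for k, v in special.items()}
    return assigned, special, fractions


# Made-up rows for illustration only; in practice pass open("sample.count").
demo = [
    "ENSG_EXAMPLE_1\t0",
    "ENSG_EXAMPLE_2\t0",
    "__no_feature\t25",
    "__ambiguous\t0",
    "__alignment_not_unique\t75",
]
assigned, special, fractions = summarise_htseq_counts(demo)
print("assigned to genes:", assigned)
for name, frac in sorted(fractions.items()):
    print(name, special[name], round(frac, 3))
```

If nearly everything lands in the no-feature and not-unique counters, as described above, the problem is more likely in how the reads relate to the annotation than in the counting step itself.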

    The only thing that I can think of is that the bowtie index file isn't the same assembly as the gtf file. I'm in the process of downloading the corresponding fa file (Homo_sapiens.GRCh38.dna.toplevel.fa.gz, I think) from Ensembl and will try aligning everything again. I'm not particularly sanguine, though, as I'm assuming the gtf that SeqMonk uses comes from Ensembl as well, whilst my alignment uses the pre-built bowtie2 index.
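Before re-aligning, one cheap check is whether the sequence names in the BAM header and in the GTF agree at all; if, for instance, one side uses names like `chr1` and the other uses `1`, no read can overlap a feature. The sketch below is an illustration under assumptions, not a definitive diagnosis: the helper names are hypothetical, the header text is assumed to come from `samtools view -H accepted.sorted.bam`, and the demo lines are made up.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names (column 1) used in a GTF file."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blanks
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & bam_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }


# Made-up example lines showing a naming mismatch.
demo_gtf = [
    "#!genome-build GRCh38\n",
    '1\texample\tgene\t1000\t2000\t.\t+\t.\tgene_id "ENSG_EXAMPLE";\n',
]
demo_header = ["@HD\tVN:1.0\tSO:queryname\n", "@SQ\tSN:chr1\tLN:248956422\n"]
report = compare_seqnames(gtf_seqnames(demo_gtf), sam_header_seqnames(demo_header))
print(report)
```

An empty `shared` list would point to a naming or assembly mismatch between index and annotation rather than to a problem with the reads.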

    Cheers
    Ben.

  • #2
    If you are just starting out then do not use TopHat. It is no longer the state of the art for RNA-seq data analysis.

    You could use any other splice-aware aligner or, if you want to stay in the "family", HISAT2/StringTie is the currently recommended software from the same folks who developed TopHat.


    • #3
      If your reference, indexes and annotations do not match exactly (in terms of gene names) then you are not going to get the counting to work. For counting, also consider using featureCounts. It is much faster, can produce a count matrix from multiple BAM files and can take unsorted BAMs.


      • #4
        I'm not just starting out. I used the exact same pipeline to process a time course about a year-ish ago, which is one of the reasons I was thinking that the gtf/genome files, as you say, might not be matching. In an ideal world, there'd be an associated gtf file alongside the pre-generated bowtie indexes.

        Thanks for pointing me towards HISAT2, will investigate/align. Though just because it's no longer cutting edge doesn't mean that TopHat is now useless. The alignment should, at least, be reasonable. Moving to HISAT2 will leave me with the same conundrum of not being sure that the pre-built indexes are the same as the gtf file I get from Ensembl. Unless I go and build one myself, that is.

        So now it just gets strange. I had found featureCounts and ran it yesterday. It's giving me a very small proportion of reads mapping to genes and a large proportion being multi-mapped. Which leaves me with two possibilities. Either a) there's something unexpected going on in my data or b) given that it's using the same gtf that htseq-count used to produce zero counts, there's something odd going on in the gtf/genome file combo. I'm guessing a), but am curious as to why htseq-count wasn't/isn't working.
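To see how much of the multi-mapping comes from the alignments themselves, the NH tag can be tallied. NH is the standard SAM optional field for the number of reported alignments of a read, and htseq-count recognises non-unique alignments from it. The sketch below is hedged accordingly: the function name `nh_distribution` is hypothetical, the input is assumed to be SAM text such as that produced by `samtools view accepted.sorted.bam`, and the demo records are made up.

```python
from collections import Counter


def nh_distribution(sam_lines):
    """Count alignment records by NH:i value (None if the tag is absent)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        nh = None
        # Optional tags start after the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        counts[nh] += 1
    return counts


# Made-up SAM records for illustration only.
demo_sam = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tAS:i:-5\tNH:i:2",
    "r2\t256\tchr1\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
]
dist = nh_distribution(demo_sam)
unique = dist[1]
multi = sum(n for nh, n in dist.items() if nh is not None and nh > 1)
print("records with NH=1:", unique, "records with NH>1:", multi)
```

A large share of records with NH greater than 1 would suggest that the multi-mapping is already present in the alignments, independent of which gtf is used for counting.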


        • #5
          Originally posted by tirohia
          I'm not just starting out. I used the exact same pipeline to process a time course about a year-ish ago, which is one of the reasons I was thinking that the gtf/genome files, as you say, might not be matching. In an ideal world, there'd be an associated gtf file alongside the pre-generated bowtie indexes.

          You can get those from the Illumina iGenomes site. The bundle contains matching sequence, annotations, indexes, the whole bit.

          Thanks for pointing me towards HISAT2, will investigate/align. Though, just because it's no longer cutting edge, doesn't mean that Tophat is now useless.
          Fair point. The authors of TopHat have this note on their site now:
          ---------------------------------------
          Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.
          ----------------------------------------
