Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • MariaF
    Junior Member
    • Mar 2018
    • 4

    Reads are 0 in htseq-count

    Hello,

    I have a trouble using htseq-count, I used TopHat to aling my data, but when I tried to count the number of reads all the genes present 0 read.

    I checked the chromosome names in ./gtf file: cut -f 1 -d " " ~/HG19/genes.gtf | cut -f 1 | sort | uniq So all of them are started with "chr". The same result was accepted by bowtie2-inspect --names ~/HG19/genome. (in .gtf file, I've seen the numbers like "chrHSCHR9_2_CTG3" and don't know what it is).

    1) programs/tophat2 -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst_fastq Result: accepted_hits.bam deletions.bed junctions.bed prep_reads.info align_summary.txt insertions.bed logs unmapped.bam

    samtools view -H ./accepted_hits.bam:
    @HD VN:1.0 SO:coordinate @SQ SN:chr1 LN:249250621 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr2 LN:243199373 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chrM LN:16571 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @PG ID:TopHat VN:2.1.1 CLrograms/tophat -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst.fastq
    less align_summary.txt:
    Reads: Input : 99 Mapped : 76 (76.8% of input) of these: 1 ( 1.3%) have multiple alignments (0 have >20) 76.8% overall read mapping rate.
    2) samtools view -h tst_output/accepted_hits.bam | htseq-count - ~/HG19/genes.gtf > tst_count.txt tail tst_count.txt:
    ENST00000610276 0 ENST00000610277 0 ENST00000610278 0 ENST00000610279 0 ENST00000610280 0 __no_feature 40 __ambiguous 27 __too_low_aQual 0 __not_aligned 0 __alignment_not_unique 15
    I have no idea what should I do, so could you please help me.

    PS: after samtools sort -n -o temp-sorted.bam accepted_hits.bam I have the same result ("SO:queryname" in temp-sorted.bam file).

    Thanks,
    Maria.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    It looks like you are mixing and matching genome builds and annotation. "chr" designations on your chromosomes indicates that the data is from UCSC but your annotation has ENST* identifiers which indicates Ensembl as the origin.

    I suggest that you find annotation that uses the chr designations for doing the counting.

    BTW: Please stop using TopHat. It is not even recommended by its authors. There are current options like STAR/bbmap/HISAT2 that should be used instead.

    Comment

    • MariaF
      Junior Member
      • Mar 2018
      • 4

      #3
      Originally posted by GenoMax View Post
      It looks like you are mixing and matching genome builds and annotation. "chr" designations on your chromosomes indicates that the data is from UCSC but your annotation has ENST* identifiers which indicates Ensembl as the origin.

      I suggest that you find annotation that uses the chr designations for doing the counting.

      BTW: Please stop using TopHat. It is not even recommended by its authors. There are current options like STAR/bbmap/HISAT2 that should be used instead.
      Thanks, GenoMax, you are absolutely right: my annotation was Ensembl and my .fa was UCSC.

      I downloaded the annotation from http://genome.ucsc.edu/cgi-bin/hgTables - is it correct?

      head my .gtf file:
      chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
      chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
      chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
      All reads are zero in htseq-count again.

      my .fastq file:
      @ERR599187.1 HWI-ST1034:1070DURACXX:4:1101:1451:1992/1
      ATCGCCTCCTGNTAGGCCAGCTCTACATCTTCTTGGGTGACCACTAGTTTAATCTTATTCCATTTCTCAGGGATATCCAGCCACTGGTGCACAGCCACTGC
      +
      @@@DDDDDHHH#2AEHG>EG?GAGA>FECHIIIICEG?GFBBCGHDFIFGIIIIGGHAFHIIIGG=DDAHBHH@DEDEECBCC@BCCCCCCCBBBCCCC
      @ERR599187.2 HWI-ST1034:1070DURACXX:4:1101:1687:1994/1
      AAAAGAAGTAANATAAGAAGGCAATGCTTGTGGAATGTACAGTGCATATTGGCGGCGCACGCCTCATTACGATTCGCCTGCTTTCTTCTCCTGTTCAATCG
      +
      @@@DDDDDHHH#3AFHGGGGII@EHIGIIIIIGIBGHHIFGI0BFBGIII@H<FE8=/398=89@@CD#################################
      @ERR599187.3 HWI-ST1034:1070DURACXX:4:1101:1636:1996/1
      CACCCACGGAANCGAGAAAGAGCTATCAATCTGTCAATCCTGTCCGTGTCCGGGCCGGGTGAGGGTTCCCGTGTTGGGTCAAATTAAGCCGCAGGCTCCAC
      +
      @@@?DD=DD<F#222CFHIGIC;FHEDDGH@H4?<9BGHE<FFHGGIGIIGHA4=>C@#########################################
      tail test_count.txt
      uc031tkv.1 0
      uc031tkw.1 0
      uc031tkx.1 0
      uc031tky.1 0
      uc031tkz.1 0
      __no_feature 4
      __ambiguous 0
      __too_low_aQual 0
      __not_aligned 0
      __alignment_not_unique 0
      BTW: I see what you mean, I know about STAR and HISAT-2. It's just my task.

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Can you post a few lines from your SAM/BAM file? Just part of the SAM header should be fine.

        Comment

        • MariaF
          Junior Member
          • Mar 2018
          • 4

          #5
          Thanks for your attention to my problem, GenoMax.

          samtools view -H temp.samheader.sam
          @HD VN:1.0 SO:unsorted
          @SQ SN:0_2.65543%_cov_993_len_552 LN:552
          @SQ SN:1_2.28932%_cov_1093_len_446 LN:446
          samtools view -H test_genome.bwt.samheader.sam
          @HD VN:1.0 SO:coordinate
          @SQ SN:0_2.65543%_cov_993_len_552 LN:552
          @SQ SN:10_0.802277%_cov_294_len_562 LN:562
          samtools view -H accepted_hits.bam
          @HD VN:1.0 SO:coordinate
          @SQ SN:chr1 LN:249250621
          @SQ SN:chr10 LN:135534747
          I guess I downloaded the wrong annotation because I see this in the sample-file sample_count.txt:
          A1BG 226
          A1BG-AS1 202
          A1CF 5
          However, my .gtf file doesn't contain such names of genes. I can't find the annotation hg19 from UCSC which include such names.

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Best solution is to get the sequence/annotation bundle from Illumina's iGenomes site (note: link is for big file download). In that download you will find a genes.gtf file in Annotation/Genes folder. Use that for your counting.

            I also recommend that you use featureCounts. Much faster and easier to use.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Pathogen Surveillance with Advanced Genomic Tools
              by seqadmin




              The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
              03-24-2025, 11:48 AM
            • seqadmin
              New Genomics Tools and Methods Shared at AGBT 2025
              by seqadmin


              This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

              The Headliner
              The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
              03-03-2025, 01:39 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 03-20-2025, 05:03 AM
            0 responses
            41 views
            0 reactions
            Last Post seqadmin  
            Started by seqadmin, 03-19-2025, 07:27 AM
            0 responses
            51 views
            0 reactions
            Last Post seqadmin  
            Started by seqadmin, 03-18-2025, 12:50 PM
            0 responses
            38 views
            0 reactions
            Last Post seqadmin  
            Started by seqadmin, 03-03-2025, 01:15 PM
            0 responses
            193 views
            0 reactions
            Last Post seqadmin  
            Working...