  • htseq-count low count problem

    Hi guys,

    I am analysing some RNA-seq data with DESeq2, and before that I need to get a count table for each gene.

    Here is what I have done:

    I mapped the stranded, paired-end reads to the transcriptome using TopHat2:
    Code:
    tophat -p 12 -r 60 -o $out --transcriptome-only --no-novel-juncs --no-coverage-search --library-type fr-firststrand --transcriptome-index=$known $hg19 lane1.1.repaa_val_1.fq lane1.2.repaa_val_2.fq
    I got about 6.5 million mapped pairs, which is about 50% of the input reads.

    Then I took the mapped reads and counted them against a GTF file:

    Code:
    htseq-count -f bam -r pos -t exon -i gene_id accepted_hits.bam hg19.gtf > accepted_hits.bam.counts
    Only about 0.4 million reads were counted into the table, and the number of no_feature reads is about 13 million.
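To compare runs, it can help to reduce a counts file to a few totals. The sketch below is one possible way to do that, assuming the usual two-column, tab-separated htseq-count output. `summarize_counts` is a hypothetical helper name, not part of HTSeq. It accepts the special counters with or without the `__` prefix, since that differs between HTSeq releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the reads assigned to genes and list the
# special counters that htseq-count appends at the end of its output.
summarize_counts() {
    awk -F'\t' '
        # Strip the "__" prefix that newer HTSeq releases put on special counters.
        { name = $1; sub(/^__/, "", name) }
        name ~ /^(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ {
            special[name] += $2; next
        }
        # Every other line is a gene with its read count.
        { assigned += $2 }
        END {
            printf "assigned\t%d\n", assigned
            n = split("no_feature ambiguous too_low_aQual not_aligned alignment_not_unique", order, " ")
            for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], special[order[i]]
        }
    ' "$@"
}
```

For example, `summarize_counts accepted_hits.bam.counts` would print the assigned total followed by one line per special counter.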

    I also tried sorting the BAM file by name with samtools and using the -r name option in the htseq-count command, but that didn't help either.

    This is what the GTF file looks like:
    Code:
    chr1	unknown	exon	11874	12227	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	12613	12721	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	13221	14409	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	14362	14829	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	14970	15038	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	15796	15947	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16607	16765	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16858	17055	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17233	17368	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17606	17742	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17915	18061	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	18268	18366	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	24738	24891	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	29321	29370	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	CDS	69091	70005	.	+	0	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	exon	69091	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	start_codon	69091	69093	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	stop_codon	70006	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    Any ideas? thanks

  • #2
    Have a look at things in IGV or use the -o option to track what's happening to reads that aren't getting counted but you think should be.
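As a rough sketch of the -o suggestion: htseq-count's -o option writes each alignment record to a SAM file with an added XF tag naming the gene or special counter it was assigned to. The helper below tallies those tags. `tally_xf` and the file name `annotated.sam` are made up for illustration, and because the tally is per record, both mates of a pair are counted separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many SAM records carry each XF:Z: value
# (a gene ID or a special category such as no_feature), most frequent first.
tally_xf() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines, if present
        {
            tag = "(no XF tag)"
            # Optional SAM fields start at column 12.
            for (i = 12; i <= NF; i++)
                if ($i ~ /^XF:Z:/) { tag = substr($i, 6); break }
            count[tag]++
        }
        END { for (t in count) printf "%d\t%s\n", count[t], t }
    ' "$@" | sort -k1,1nr
}
```

A possible use, assuming the original command with `-o annotated.sam` added: run htseq-count, then `tally_xf annotated.sam | head` to see which categories dominate before opening the same regions in IGV.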

    • #3
      I got most mapped reads counted when I specified -s no, but the library was made using a strand-specific protocol and mapped with TopHat using --library-type fr-firststrand. I don't know why.

      • #4
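        I'd have to recheck the strandedness settings, but you may just need "-s reverse" to match your library type. htseq-count defaults to "-s yes", which expects the opposite read orientation from an fr-firststrand library.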
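A minimal sketch of that correspondence, assuming the usual pairing between TopHat's --library-type values and htseq-count's -s values. `htseq_strand_for` is a hypothetical helper, not part of either tool, and the script only prints the resulting command rather than running it. The file names are the ones from the first post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a TopHat --library-type to the htseq-count -s value
# that expects the same read orientation.
htseq_strand_for() {
    case "$1" in
        fr-unstranded)   echo no ;;
        fr-firststrand)  echo reverse ;;   # e.g. dUTP-based protocols
        fr-secondstrand) echo yes ;;
        *) echo "unknown library type: $1" >&2; return 1 ;;
    esac
}

# Build the counting command for the library type used in this thread.
strand="$(htseq_strand_for fr-firststrand)"
echo "htseq-count -f bam -r pos -s $strand -t exon -i gene_id accepted_hits.bam hg19.gtf"
```

If the counts with -s reverse come out close to those obtained with -s no, that would be consistent with the library being fr-firststrand and the default -s yes being the cause of the low counts.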
