  • htseq-count low count problem

    Hi guys,

    I am analyzing some RNA-seq data with DESeq2, and before that I need to get a count table for each gene.

    Here is what I have done:

    I mapped the stranded, paired-end reads to the transcriptome using tophat2:
    Code:
    tophat -p 12 -r 60 -o $out --transcriptome-only --no-novel-juncs --no-coverage-search --library-type fr-firststrand --transcriptome-index=$known $hg19 lane1.1.repaa_val_1.fq lane1.2.repaa_val_2.fq
    I got about 6.5 million mapped pairs, which is about 50% of the input reads.

    Then I took the mapped reads and counted them against a GTF annotation:

    Code:
    htseq-count -f bam -r pos -t exon -i gene_id accepted_hits.bam hg19.gtf > accepted_hits.bam.counts
    I got 0.4 million reads counted into the table, and the number of no_feature reads is about 13 million.

    I also tried sorting the BAM file by name with samtools and using the -r name option in the htseq-count command, but that didn't help either.

    This is what the GTF file looks like:
    Code:
    chr1	unknown	exon	11874	12227	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	12613	12721	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	13221	14409	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	14362	14829	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	14970	15038	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	15796	15947	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16607	16765	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16858	17055	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17233	17368	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17606	17742	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17915	18061	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	18268	18366	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	24738	24891	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	29321	29370	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	CDS	69091	70005	.	+	0	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	exon	69091	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	start_codon	69091	69093	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	stop_codon	70006	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    Any ideas? Thanks.
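
To put numbers on the problem described above, it can help to split the htseq-count output into reads assigned to genes and the special counters at the end of the file. The following is a minimal sketch, not part of the original post: `summarize_counts` is a hypothetical helper name, and it assumes the special counters are called no_feature, ambiguous, too_low_aQual, not_aligned and alignment_not_unique, with or without a leading "__" (the prefix depends on the HTSeq version).

```shell
# Sketch: summarize an htseq-count output file (two tab-separated columns).
# Assumption: special counters may or may not carry a leading "__",
# depending on the HTSeq version that wrote the file.
summarize_counts() {
    awk -F '\t' '
        # Special counters: keep them in file order, do not add them to the total.
        $1 ~ /^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ {
            tail = tail $1 "\t" $2 "\n"
            next
        }
        # Every other line is a gene: add its count to the assigned total.
        { assigned += $2 }
        END { printf "assigned\t%d\n%s", assigned, tail }
    ' "$1"
}

# Usage (file name taken from the post above):
#   summarize_counts accepted_hits.bam.counts
```

If the assigned total is tiny compared with no_feature, as in the numbers above, the reads are probably aligned but not matching any exon under the current htseq-count settings.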

  • #2
    Have a look at the alignments in IGV, or use the -o option of htseq-count to track what's happening to reads that you think should be counted but aren't.
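
As a rough illustration of the -o suggestion: htseq-count writes each alignment to the file given with -o and appends an XF tag recording where the read was assigned. The sketch below tallies those tags. `tally_xf` and the file name `assigned.sam` are hypothetical, and it assumes your HTSeq version writes the -o output as plain SAM text.

```shell
# Sketch: count how many alignments received each XF assignment tag
# in a SAM file written by htseq-count -o.
# Output: "<count><TAB><tag>", most frequent first.
tally_xf() {
    grep -v '^@' "$1" \
        | grep -o 'XF:Z:[^[:space:]]*' \
        | sort | uniq -c \
        | awk '{ print $1 "\t" $2 }' \
        | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage (assigned.sam is a hypothetical file name):
#   htseq-count -f bam -r pos -t exon -i gene_id -o assigned.sam accepted_hits.bam hg19.gtf > accepted_hits.bam.counts
#   tally_xf assigned.sam
```

Picking a few reads tagged as no_feature and looking at the same coordinates in IGV should show whether they sit on annotated exons but on the opposite strand.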



    • #3
      Most mapped reads get counted if I specify -s no, but the library was made using a strand-specific protocol and mapped using tophat in --library-type fr-firststrand mode. I don't know why.



      • #4
        I'd have to recheck the strandedness settings, but you probably just need "-s reverse" to match your library type, since htseq-count defaults to "-s yes".
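
To make the strandedness point concrete: htseq-count defaults to -s yes, which expects reads on the same strand as the feature, whereas fr-firststrand libraries give reads on the opposite strand. The sketch below maps the TopHat --library-type value to the matching -s value. `htseq_strand_for` is a hypothetical helper name, and the mapping should still be checked against your own library prep.

```shell
# Sketch: map a TopHat --library-type value to the matching htseq-count -s value.
htseq_strand_for() {
    case "$1" in
        fr-unstranded)   echo no ;;
        fr-firststrand)  echo reverse ;;
        fr-secondstrand) echo yes ;;
        *) echo "unknown library type: $1" >&2; return 1 ;;
    esac
}

# Usage (not run here): the original command with the strand flag added.
#   htseq-count -f bam -r pos -s "$(htseq_strand_for fr-firststrand)" \
#       -t exon -i gene_id accepted_hits.bam hg19.gtf > accepted_hits.bam.counts
```

If the counts with -s reverse come out close to those with -s no, that would support the library really being fr-firststrand.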
