Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lanner
    Member
    • Apr 2014
    • 29

    Zero counts in htseq-count output

    I am pretty desperate, so if anyone has any advice, I will be forever grateful.

    Basically, I am using the following command:

    htseq-count -s no -a 0 FourA.sam hg19.gtf > FourA.count

    and getting zero counts in the output. I have tried -s as yes/no (I am almost sure it should be "yes" based off the site I got my files), and -a between 0 and 10. No matter what parameters, I get zero counts.

    This is for a quick project, so I am not going for accuracy, I just want any results (of any quality)!

    Here is the background of my data
    1) Downloaded .sra file.
    2) Used fastq-dump to convert .sra file to .fastq file
    3) Only kept 1/10 sequences from .fastq file (it was too large).
    4) On Galaxy: Ran "Fastq Groomer" tool to remove the leading quality score for the adaptor base (input 'Color Space Sanger', default for the rest)
    5) On Galaxy: Ran "Manipulation Reads" to convert to base-space "0123. -> ATCGN" and remove the adaptor itself. This generated fastqcssanger file.
    6) On Galaxy: Ran ""Map with BWA for SOLiD" tool on the fastqcssanger file and mrna.fa file to get the SAM file

    I downloaded the mrna.fa file from: (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)

    Any advice is greatly appreciated. I don't think I have time to rerun/remap. If there is any faster solution for where I went wrong in this pipeline, I would need to do that. I just need something to present, regardless of the quality, and will admit that to my class!

    (Also, I am getting an warning "Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set", although I read in other forums that is usually ignorable).
    Last edited by lanner; 05-04-2014, 07:16 PM. Reason: To add detail of the warning
  • lanner
    Member
    • Apr 2014
    • 29

    #2
    This is head output of some of my files:

    hg19.gtf:

    chr1 hg19_refGene start_codon 67000042 67000044 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67000042 67000051 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 66999825 67000051 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67091530 67091593 0.000000 + 2 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67091530 67091593 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67098753 67098777 0.000000 + 1 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67098753 67098777 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67101627 67101698 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67101627 67101698 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67105460 67105516 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";


    mrna.fasta:

    >AF001540 1
    ggcacgaggcaggtctgtctgttctgttggcaagtaaatgcagtactgtt
    ctgatcccgctgctattagaatgcattgtgaaacgactggagtatgatta
    aaagttgtgttccccaatgcttggagtagtgattgttgaaggaaaaaatc
    cagctgagtgataaggctgagtgttgaggaaatttctgcagttttaagca
    gtcgtatttgtgattgaagctgagtacatttgctggtgtatttttaggta
    aaatgcttttttgttcatttctgggtggtgggaggggactgaagccttta
    gtcttttccagatgcaaccttaaaatcagtgacaagaaacattccaaaca
    agcaacagtcttcaagaaattaaactggcaagtggaaatgtttaaacagt
    tcagtgatctttagtgcattgtttatgtgtgggtttctctctcccctccc


    FourA.sam:

    @SQ SN:AF001540 LN:1781
    @SQ SN:AF001541 LN:1138
    @SQ SN:AF001542 LN:2992
    @SQ SN:AF001543 LN:903
    @SQ SN:AF001544 LN:434
    @SQ SN:AF001545 LN:370
    @SQ SN:AF001546 LN:1142
    @SQ SN:AF001547 LN:1092
    @SQ SN:AF034176 LN:7232
    @SQ SN:AF038950 LN:2384

    FourA.count (head and tail)

    head:
    NM_000014 0
    NM_000015 0
    NM_000016 0
    NM_000017 0
    NM_000018 0
    NM_000019 0
    NM_000020 0
    NM_000021 0
    NM_000022 0
    NM_000023 0

    tail:
    NR_046431 0
    NR_046432 0
    NR_046433 0
    NR_046435 0
    NR_046436 0
    no_feature 257
    ambiguous 0
    too_low_aQual 0
    not_aligned 6366159
    alignment_not_unique 0

    Comment

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      Since you mapped against the transcriptome, htseq-count (or any tool for that matter) won't be able to give you counts. The reason for this is that these tools look at the coordinates in the SAM file (e.g., "AF001540:100-300") and then the GTF file and try to see if they overlap (and if the read overlaps multiple things). Since AF001540 (or any other transcript ID) doesn't exist as a chromosome or contig in your annotation file, this will never work.

      You have two possibilities for moving forward. (1) Map to the genome with STAR or tophat2 or a similar tool. The resulting alignments will work fine with htseq-counts (or featureCounts, which is faster). (2) Realize that the counts are equivalent to the number of reads mapping to each contig/transcript and that you can get this information with "samtools idxstats sorted.and.indexed.alignments.bam". Option (2) is the less ideal of the two, since you can't filter by MAPQ (well you can, but you need to just create a new BAM file to do so) and even if you could it would be misleading since you'd have reads that look like they map to multiple places but actually don't since it's likely you have many transcripts that overlap. I would strongly encourage you to just remap things.

      Comment

      Latest Articles

      Collapse

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, Today, 10:09 AM
      0 responses
      8 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, Yesterday, 08:59 AM
      0 responses
      14 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 12:03 PM
      0 responses
      23 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 11:40 AM
      0 responses
      20 views
      0 reactions
      Last Post SEQadmin2  
      Working...