Seqanswers Leaderboard Ad

**lanner** · 05-04-2014, 04:22 PM

This is head output of some of my files:

hg19.gtf:

chr1 hg19_refGene start_codon 67000042 67000044 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67000042 67000051 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 66999825 67000051 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67091530 67091593 0.000000 + 2 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67091530 67091593 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67098753 67098777 0.000000 + 1 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67098753 67098777 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67101627 67101698 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67101627 67101698 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67105460 67105516 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";

mrna.fasta:

>AF001540 1
ggcacgaggcaggtctgtctgttctgttggcaagtaaatgcagtactgtt
ctgatcccgctgctattagaatgcattgtgaaacgactggagtatgatta
aaagttgtgttccccaatgcttggagtagtgattgttgaaggaaaaaatc
cagctgagtgataaggctgagtgttgaggaaatttctgcagttttaagca
gtcgtatttgtgattgaagctgagtacatttgctggtgtatttttaggta
aaatgcttttttgttcatttctgggtggtgggaggggactgaagccttta
gtcttttccagatgcaaccttaaaatcagtgacaagaaacattccaaaca
agcaacagtcttcaagaaattaaactggcaagtggaaatgtttaaacagt
tcagtgatctttagtgcattgtttatgtgtgggtttctctctcccctccc

FourA.sam:

@SQ SN:AF001540 LN:1781
@SQ SN:AF001541 LN:1138
@SQ SN:AF001542 LN:2992
@SQ SN:AF001543 LN:903
@SQ SN:AF001544 LN:434
@SQ SN:AF001545 LN:370
@SQ SN:AF001546 LN:1142
@SQ SN:AF001547 LN:1092
@SQ SN:AF034176 LN:7232
@SQ SN:AF038950 LN:2384

FourA.count (head and tail)

head:
NM_000014 0
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 0
NM_000020 0
NM_000021 0
NM_000022 0
NM_000023 0

tail:
NR_046431 0
NR_046432 0
NR_046433 0
NR_046435 0
NR_046436 0
no_feature 257
ambiguous 0
too_low_aQual 0
not_aligned 6366159
alignment_not_unique 0

**dpryan** · 05-05-2014, 12:36 AM

Since you mapped against the transcriptome, htseq-count (or any tool for that matter) won't be able to give you counts. The reason for this is that these tools look at the coordinates in the SAM file (e.g., "AF001540:100-300") and then the GTF file and try to see if they overlap (and if the read overlaps multiple things). Since AF001540 (or any other transcript ID) doesn't exist as a chromosome or contig in your annotation file, this will never work.

You have two possibilities for moving forward. (1) Map to the genome with STAR or tophat2 or a similar tool. The resulting alignments will work fine with htseq-counts (or featureCounts, which is faster). (2) Realize that the counts are equivalent to the number of reads mapping to each contig/transcript and that you can get this information with "samtools idxstats sorted.and.indexed.alignments.bam". Option (2) is the less ideal of the two, since you can't filter by MAPQ (well you can, but you need to just create a new BAM file to do so) and even if you could it would be misleading since you'd have reads that look like they map to multiple places but actually don't since it's likely you have many transcripts that overlap. I would strongly encourage you to just remap things.

Topics	Statistics	Last Post
Study Reveals How Bacteria Defend Against Viral Attacks by seqadmin Started by seqadmin, 08-27-2024, 04:40 AM	0 responses 16 views 0 likes	Last Post by seqadmin 08-27-2024, 04:40 AM
New Single-Molecule Sequencing Platform Introduces Advanced Features for High-Throughput Genomics by seqadmin Started by seqadmin, 08-22-2024, 05:00 AM	0 responses 293 views 0 likes	Last Post by seqadmin 08-22-2024, 05:00 AM
New DNA Code Discovered Revealing Complex Gene Regulation Mechanisms by seqadmin Started by seqadmin, 08-21-2024, 10:49 AM	0 responses 135 views 0 likes	Last Post by seqadmin 08-21-2024, 10:49 AM
Epigenetic Clocks Derived from Retroelements Offer New Insights into Aging by seqadmin Started by seqadmin, 08-19-2024, 05:12 AM	0 responses 124 views 0 likes	Last Post by seqadmin 08-19-2024, 05:12 AM

Seqanswers Leaderboard Ad

Announcement

Zero counts in htseq-count output

Comment

Comment

Latest Articles

ad_right_rmr

News