Best solution is to get the sequence/annotation bundle from Illumina's iGenomes site (note: link is for big file download). In that download you will find a genes.gtf file in Annotation/Genes folder. Use that for your counting.
I also recommend that you use featureCounts. Much faster and easier to use.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Thanks for your attention to my problem, GenoMax.
samtools view -H temp.samheader.sam
@HD VN:1.0 SO:unsorted
@SQ SN:0_2.65543%_cov_993_len_552 LN:552
@SQ SN:1_2.28932%_cov_1093_len_446 LN:446
@HD VN:1.0 SO:coordinate
@SQ SN:0_2.65543%_cov_993_len_552 LN:552
@SQ SN:10_0.802277%_cov_294_len_562 LN:562
@HD VN:1.0 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
A1BG 226
A1BG-AS1 202
A1CF 5
Leave a comment:
-
Can you post a few lines from your SAM/BAM file? Just part of the SAM header should be fine.
Leave a comment:
-
Originally posted by GenoMax View PostIt looks like you are mixing and matching genome builds and annotation. "chr" designations on your chromosomes indicates that the data is from UCSC but your annotation has ENST* identifiers which indicates Ensembl as the origin.
I suggest that you find annotation that uses the chr designations for doing the counting.
BTW: Please stop using TopHat. It is not even recommended by its authors. There are current options like STAR/bbmap/HISAT2 that should be used instead.
I downloaded the annotation from http://genome.ucsc.edu/cgi-bin/hgTables - is it correct?
head my .gtf file:
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
my .fastq file:
@ERR599187.1 HWI-ST1034:1070DURACXX:4:1101:1451:1992/1
ATCGCCTCCTGNTAGGCCAGCTCTACATCTTCTTGGGTGACCACTAGTTTAATCTTATTCCATTTCTCAGGGATATCCAGCCACTGGTGCACAGCCACTGC
+
@@@DDDDDHHH#2AEHG>EG?GAGA>FECHIIIICEG?GFBBCGHDFIFGIIIIGGHAFHIIIGG=DDAHBHH@DEDEECBCC@BCCCCCCCBBBCCCC
@ERR599187.2 HWI-ST1034:1070DURACXX:4:1101:1687:1994/1
AAAAGAAGTAANATAAGAAGGCAATGCTTGTGGAATGTACAGTGCATATTGGCGGCGCACGCCTCATTACGATTCGCCTGCTTTCTTCTCCTGTTCAATCG
+
@@@DDDDDHHH#3AFHGGGGII@EHIGIIIIIGIBGHHIFGI0BFBGIII@H<FE8=/398=89@@CD#################################
@ERR599187.3 HWI-ST1034:1070DURACXX:4:1101:1636:1996/1
CACCCACGGAANCGAGAAAGAGCTATCAATCTGTCAATCCTGTCCGTGTCCGGGCCGGGTGAGGGTTCCCGTGTTGGGTCAAATTAAGCCGCAGGCTCCAC
+
@@@?DD=DD<F#222CFHIGIC;FHEDDGH@H4?<9BGHE<FFHGGIGIIGHA4=>C@#########################################
uc031tkv.1 0
uc031tkw.1 0
uc031tkx.1 0
uc031tky.1 0
uc031tkz.1 0
__no_feature 4
__ambiguous 0
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0
Leave a comment:
-
It looks like you are mixing and matching genome builds and annotation. "chr" designations on your chromosomes indicates that the data is from UCSC but your annotation has ENST* identifiers which indicates Ensembl as the origin.
I suggest that you find annotation that uses the chr designations for doing the counting.
BTW: Please stop using TopHat. It is not even recommended by its authors. There are current options like STAR/bbmap/HISAT2 that should be used instead.
Leave a comment:
-
Reads are 0 in htseq-count
Hello,
I have a trouble using htseq-count, I used TopHat to aling my data, but when I tried to count the number of reads all the genes present 0 read.
I checked the chromosome names in ./gtf file: cut -f 1 -d " " ~/HG19/genes.gtf | cut -f 1 | sort | uniq So all of them are started with "chr". The same result was accepted by bowtie2-inspect --names ~/HG19/genome. (in .gtf file, I've seen the numbers like "chrHSCHR9_2_CTG3" and don't know what it is).
1) programs/tophat2 -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst_fastq Result: accepted_hits.bam deletions.bed junctions.bed prep_reads.info align_summary.txt insertions.bed logs unmapped.bam
samtools view -H ./accepted_hits.bam:
@HD VN:1.0 SO:coordinate @SQ SN:chr1 LN:249250621 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr2 LN:243199373 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chrM LN:16571 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @PG ID:TopHat VN:2.1.1 CLrograms/tophat -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst.fastq
Reads: Input : 99 Mapped : 76 (76.8% of input) of these: 1 ( 1.3%) have multiple alignments (0 have >20) 76.8% overall read mapping rate.
ENST00000610276 0 ENST00000610277 0 ENST00000610278 0 ENST00000610279 0 ENST00000610280 0 __no_feature 40 __ambiguous 27 __too_low_aQual 0 __not_aligned 0 __alignment_not_unique 15
PS: after samtools sort -n -o temp-sorted.bam accepted_hits.bam I have the same result ("SO:queryname" in temp-sorted.bam file).
Thanks,
Maria.
Latest Articles
Collapse
-
by seqadmin
The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...-
Channel: Articles
11-06-2024, 07:24 PM -
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Today, 11:09 AM
|
0 responses
23 views
0 likes
|
Last Post
by seqadmin
Today, 11:09 AM
|
||
Started by seqadmin, Today, 06:13 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
Today, 06:13 AM
|
||
Started by seqadmin, 11-01-2024, 06:09 AM
|
0 responses
30 views
0 likes
|
Last Post
by seqadmin
11-01-2024, 06:09 AM
|
||
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, 10-30-2024, 05:31 AM
|
0 responses
21 views
0 likes
|
Last Post
by seqadmin
10-30-2024, 05:31 AM
|
Leave a comment: