Error while using htseq-count
Hi,
I am getting an error while using htseq-count.
I used the following command:
htseq-count CL1_forBWA_left_DSS3.sam NC_003911.gff
This is the error i am getting. I got all the files from ncbi.
Error occured in line 2169 of file NC_003911.gff.
Error: Feature tRNA does not contain a 'gene_id' attribute
[Exception type: SystemExit, raised in count.py:55]
I would really appreciate if anyone can help me out.
Thanks
Shalabh
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
For an explanation why htseq-count skips multiply matched reads, see post #4 of this thread: http://seqanswers.com/forums/showthread.php?t=9129
Now, if you have two mappings for a read and one has high quality and the other does not, it would indeed make sense to use the good one and discard the bad one rather than skipping both. Hence, you could sort you SAM file by read ID, so that multiple mappings are in adjacent lines and the write a script to filter the best one. Here is a quick attempt to do this with HTSeq:
Code:import sys, re import HTSeq insam = HTSeq.SAM_Reader( sys.stdin ) # Go through all reads, with their alignments bundled up: for bundle in HTSeq.bundle_multiple_alignments( insam ): bestAlmt = None # Go through all alignments of a given read, looking # for the one with the best alignment score for almt in bundle: if bestAlmt is None: bestAlmt = almt elif almt.aQual > bestAlmt.aQual: bestAlmt = almt elif almt.aQual == bestAlmt: # If there are more than one best alignment, # better skip the read bestAlmt = None if bestAlmt is not None: # Change the NH field to 1 and print the line print re.sub( "NH:i:\d+", "NH:i:1", bestAlmt.original_sam_line ),
Code:sort samfile.sam | python chooseBest.py > filtered.sam
Be aware hat I haven't really tested this thoroughly.
Leave a comment:
-
That is easily doable - but this would imply that reads are counted several time as part of different alignments. To my knowledege such a way of counting is not compatible with the statistical model used by DESeq and this is actually what I would like to use the counts for.
Leave a comment:
-
Hi Tobi, could you write a script to change the NH flag to NH:i:1 in the sam file so that htseq-count doesn't ignore them?
Leave a comment:
-
alignment_not_unique
Hi HTSeq users,
I realized that the htseq-count script is very conservative when counting reads matching to known gene models:
I generated an alignment of my RNA-seq reads to the Drosophila genome using tophat in combination with known Ensembl gene models (provided by gtf file). The output in sam format looks like this:
Code:[tobiasko@bs-dsvr09 tophat_out]$ head accepted_hits.sam -1350550920:7:9:328:542 0 chr2L 8880 3 35M * 0 0 CTGGACTCGCAAAGTGGACTTGTTCAGCAAGGACA 6IC@/IIIIIFABI2II1III3'E2>@I:,&'(31 NM:i:0 NH:i:2 CC:Z:chrX CP:i:19097273 XS:A:- -1350550920:7:77:1586:563 0 chr2L 8883 3 35M * 0 0 GACTCGCAAAGTGGACTTGTTCAGCAAGGACATAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 NH:i:2 CC:Z:chrX CP:i:19097270 XS:A:- -1350550920:7:24:1131:1918 0 chr2L 9327 3 35M * 0 0 TTTTACAAATCTTACGTAAACACTCCAAGCATGAA IIIIIIIIIIIIIIII?III=ICEI;EI:IIDII; NM:i:0 NH:i:2 CC:Z:chrX CP:i:19096819 XS:A:-
I wonder, if there is a way to either tune tophat to only report the best alignment per read like one can do when using bowtie with -k1 --best, or to modify the couting scheme of htseq-count to be less stringent (e.g. count every read once at the best alignment position).
Greetings,
Tobi
Leave a comment:
-
Install instructions for CentOS (RHEL)
Hi,
for what it's worth I detailed the additional steps necessary to install HTSeq on CentOS (should work on Red Head Enterprise as well) in RNA-Seq howto wiki. (Or rather in the first incomplete alpha version)
Best Wishes,
Björn
Leave a comment:
-
I am runing DEXseq and htseq-count and have some problems. Maybe others are interesting to this or have the answer, so I post to here again.
When I run dexseq_count.py in DEXseq on my sam file from bwa, it always has this warning:
~/python/lib/python2.7/site-packages/HTSeq-0.5.3p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py:543:
UserWarning: Malformed SAM line: RNAME != '*' although flag bit
&0x0004 set
algnt = SAM_Alignment.from_SAM_line( line )
~/python/lib/python2.7/site-packages/HTSeq-0.5.3p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py:592:
UserWarning: Read GA2-EAS337_0017_FC:1:8:11202:16260#0 claims to have
an aligned mate which could not be found. (Is the SAM file properly
sorted?)
"which could not be found. (Is the SAM file properly sorted?)" )
but I can found GA2-EAS337_0017_FC:1:8:11202:16260#0 mate in the data,
as show in the attachments a.sam. And the a.sam is sorted by the name
(default by bwa).
In htseq-count, it also have this problem.
htseq-count a.sam Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf
100000 GFF lines processed.
200000 GFF lines processed.
...
2000000 GFF lines processed.
2064769 GFF lines processed.
Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set
Warning: Read GA2-EAS337_0017_FC:1:8:11202:16260#0 claims to have an
aligned mate which could not be found. (Is the SAM file properly
sorted?)
4 sam line pairs processed.
Why here report 4 line pairs? In fact, there are 3 sam line pairs in a.sam.
BTW: Do this kind of count using BWA or tophat
mapping, which one is better? (Illumina pair-end human RNA-seq data)
Thank you.Attached Files
Leave a comment:
-
Oops, I didn't realise that I had the quiet option on! Yes I'm now getting warnings about the chromosomes not appearing in the GFF file. I guess these reads are not included in the output SAM file? Thanks for your help Simon.
Leave a comment:
-
Thanks for your reply. No I didn't get any warnings about reads being skipped. I'll email the files to you.
Leave a comment:
-
Ho Joro,
did you get any warnings about reads being skipped? Otherwise, could you send me some examples of SAM lines that are in the input but not the output file, togetehr with your GTF file? Thanks.
Simon
Leave a comment:
-
Hi,
I have some reads in the original SAM file that don't seem to be accounted for in the output of htseq-count. To look in more detail I created a new SAM file using --samout. The reads aren't in this SAM file. Presumably they should be in the SAM file but assigned something like "no_feature"? Any suggestions are welcome, thanks.
Leave a comment:
-
In principle, you could follow the 'Tour' in the HTSeq documentation to learn how to do this yourself.
The htseq-count script falls into two parts: the main part doing the actual counting, and surrounding it a thin wrapper to get the intervals out of the SAM file. This is all that needs to be changed, and once I separate these into two modular parts, it will be even easier. Might take a while until I find the time to so, though.
S
Leave a comment:
-
Will HTSeq be able to count reads from a bed file
Dear Simon,
My case is somehow unusual. I have a mapped sequence reads file in bed format. I wonder if you could point me out how I can use HTSeq for reads counts on gene basis. I understand that the bed file contains only interval data and doesn't have quality scores for each base. But in my case, I only need read counts. I have limited knowledge about python.
Your help will be greatly appreciated!
Best
Steve
Leave a comment:
-
The '-i' and '-t' options give you some flexibility what htseq-count is counted.
If this is insufficient, please remember that htseq-count was actually meant not so much as a tool in its own right but rather a demonstration of the possibilities of HTSeq, our Python framework for HTS analysis. As demonstrated in the documentation, it is fairly easy (if you know Python or are willing to learn it) to write your own counting script using HTSeq and then implement whatever custom logic you need for your use case.
Leave a comment:
-
Another quick question: is it possible with htseq-counts to exclude the reads from the 3' UTR?
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Spatial biology is an exciting field that encompasses a wide range of techniques and technologies aimed at mapping the organization and interactions of various biomolecules in their native environments. As this area of research progresses, new tools and methodologies are being introduced, accompanied by efforts to establish benchmarking standards and drive technological innovation.
3D Genomics
While spatial biology often involves studying proteins and RNAs in their...-
Channel: Articles
01-01-2025, 07:30 PM -
-
by seqadmin
Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...-
Channel: Articles
12-16-2024, 07:57 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 01-09-2025, 04:04 PM
|
0 responses
12 views
0 likes
|
Last Post
by seqadmin
01-09-2025, 04:04 PM
|
||
Started by seqadmin, 01-09-2025, 09:42 AM
|
0 responses
19 views
0 likes
|
Last Post
by seqadmin
01-09-2025, 09:42 AM
|
||
Started by seqadmin, 01-08-2025, 03:17 PM
|
0 responses
29 views
0 likes
|
Last Post
by seqadmin
01-08-2025, 03:17 PM
|
||
Started by seqadmin, 01-03-2025, 11:18 AM
|
1 response
47 views
1 like
|
Last Post
by Tonia
01-05-2025, 12:15 PM
|
Leave a comment: