HTSeq: A Python framework to work with high-throughput sequencing data

apredeus replied

07-23-2014, 07:12 PM
I have a quick question about this picture:

Does it matter for "ambiguous" reads whether they land on the right strand? That is, in the last two cases shown, if gene A and gene B are on opposite strands and the library is stranded, there is actually no ambiguity. Is that taken into consideration?

Thank you in advance!
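
To make the question concrete, here is a minimal Python sketch of union-style read assignment with and without strand awareness. It is not HTSeq's code: `Feature`, `assign_read`, and the two genes with their coordinates are hypothetical and made up for illustration. It assumes that in stranded mode a feature on the opposite strand from the read is simply ignored, which, as far as I understand it, is how htseq-count treats --stranded=yes.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    gene: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def assign_read(read_start, read_end, read_strand, features, stranded):
    """Return a gene name, "__no_feature", or "__ambiguous"."""
    hits = set()
    for f in features:
        # Half-open interval overlap test.
        if not (f.start < read_end and read_start < f.end):
            continue
        # Assumption: in stranded mode, opposite-strand features are ignored.
        if stranded and f.strand != read_strand:
            continue
        hits.add(f.gene)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))


# Two hypothetical genes that overlap on the genome but lie on opposite strands.
features = [Feature("gene_A", 100, 500, "+"), Feature("gene_B", 400, 800, "-")]

print(assign_read(420, 470, "+", features, stranded=True))   # gene_A
print(assign_read(420, 470, "+", features, stranded=False))  # __ambiguous
```

Under that assumption, a read in the overlap is only ambiguous when strand information is not used.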
dlepe replied

07-21-2014, 08:16 AM
Hi Simon,
I'm analyzing SOLiD data using bowtie for mapping and htseq for quantification. When I tried the --stranded parameter (just to familiarize myself with htseq), I got very similar numbers whether I set it to yes or no.

For example, for my 001_02.count file with --stranded=yes:
__no_feature 278195
__ambiguous 26690
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0

And for the same 001_02.count file with --stranded=no:
__no_feature 255213
__ambiguous 115445
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0

Since my protocol wasn't stranded, I should be losing about half the counts with --stranded=yes, but as you can see this was not the case. I tried the same with some Illumina data I have access to and got the following, which I think is all right:

stranded=yes __no_feature 9381365
stranded=no __no_feature 492513

After struggling with this for a while, the only thing I found was that the SAM files for the SOLiD data contain only two different flags, 0 and 16, which I'm guessing is not enough information for htseq?

707_1366_1065 16 Chr1 1078 255 28M * 0 0 CCCCCCCCCCCCCACCCCCCAAATTGAG [\L!2_______UBL__ZU!"_______ XA:i:2 MD:Z:28 NM:i:0 CM:i:2
42_176_82 0 Chr1 4868 255 73M * 0 0 GGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGT ___________________________________________UU______ZY^________^Z[_^^__\KM XA:i:0 MD:Z:73 NM:i:0 CM:i:0

So my question is: are the results I'm getting for the SOLiD data with --stranded=no reliable?
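
On the flags themselves: in the SAM format the FLAG field is a bit mask, and bit 0x10 marks a read aligned to the reverse strand. The sketch below decodes a few of those bits; `describe_flag` and `read_strand` are hypothetical helper names, not part of HTSeq. It only shows that flags 0 and 16 are what unpaired forward and reverse alignments look like, so strand is encoded in these records; it does not by itself explain why the two runs give similar counts.

```python
def describe_flag(flag: int) -> dict:
    """Decode a few SAM FLAG bits (bit values as defined in the SAM specification)."""
    return {
        "paired": bool(flag & 0x1),
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
        "secondary": bool(flag & 0x100),
    }


def read_strand(flag: int) -> str:
    """Return "+" or "-" for a mapped read, "." for an unmapped one."""
    if flag & 0x4:
        return "."
    return "-" if flag & 0x10 else "+"


for flag in (0, 16):
    print(flag, read_strand(flag), describe_flag(flag))
```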
Lagzxadr replied

01-16-2014, 06:56 AM
Finally done. Thank you very much!

Originally posted by dpryan

Just use the GTF.
dpryan replied

01-16-2014, 05:32 AM
Just use the GTF.
Lagzxadr replied

01-16-2014, 05:07 AM
Yes, got it. Thanks a lot! Then how do I generate a GFF?

Originally posted by dpryan

When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".
dpryan replied

01-16-2014, 04:47 AM
When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".
Lagzxadr replied

01-16-2014, 04:38 AM
Oh... I failed to generate a GFF file from UCSC; I could only download a GFF3 file from NCBI. I ran HTSeq on the GFF3 and my BAM file, but no features were counted. I think this is because the IDs in the NCBI GFF3 cannot be matched to the IDs in the BAM, which was mapped against a UCSC-based reference. Can you give some suggestions? Should I use the GFF3 from NCBI, or where can I get a UCSC GFF?

Originally posted by Simon Anders

This does not at all look like a GFF file to me. No wonder that it does not work.
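
On the suspicion that the NCBI identifiers do not match the UCSC-mapped BAM: a quick way to check is to compare the sequence names in column 1 of the annotation with the `@SQ` names in the alignment header. The following is a minimal sketch with made-up example data; `gff_seqnames` and `sam_header_seqnames` are hypothetical helpers, and the accession-style name and sequence length are placeholders rather than real values.

```python
def gff_seqnames(gff_lines):
    """Collect the sequence names (column 1) from GFF/GTF lines."""
    names = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


# Hypothetical data: accession-style names in the annotation, chr-style in the BAM.
gff = ["##gff-version 3\n",
       "NC_000000.0\tRefSeq\texon\t100\t200\t.\t+\t.\tID=exon1\n"]
header = ["@HD\tVN:1.0\tSO:coordinate\n",
          "@SQ\tSN:chr1\tLN:1000\n"]

shared = gff_seqnames(gff) & sam_header_seqnames(header)
print(shared or "no sequence names in common")
```

For a real BAM file, the header lines can be obtained with `samtools view -H`. If the two sets share no names, no read can overlap any feature, which would be consistent with nothing being counted.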
dpryan replied

01-16-2014, 04:15 AM
Originally posted by Lagzxadr

#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
1 NM_131426 chr1 + 50321633 50410568 50322024
1 NM_001110522 chr1 - 58701200 58722813 58701200
9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11

That's genePred format from UCSC! The conversion procedure is detailed here
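
To illustrate what such a conversion has to do, here is a rough Python sketch that turns one genePred-style table row into GTF exon lines. It is not the procedure referred to above, only the coordinate logic. Assumptions: the row has the leading `bin` column shown in the header, `exonStarts` and `exonEnds` follow `exonCount` as comma-separated lists, starts are 0-based with exclusive ends, and the transcript name is reused as `gene_id`. The function name and the example row are hypothetical.

```python
def genepred_row_to_gtf_exons(row: str, source: str = "genePred"):
    """Convert one tab-separated table row (with leading 'bin' column) to GTF exon lines."""
    f = row.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    # Assumed columns 10 and 11: exonStarts and exonEnds, comma-separated.
    starts = [int(x) for x in f[9].rstrip(",").split(",")]
    ends = [int(x) for x in f[10].rstrip(",").split(",")]
    lines = []
    for start, end in zip(starts, ends):
        # Assumption: no separate gene name is available, so reuse the transcript name.
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        # 0-based half-open start becomes 1-based inclusive; the end value is unchanged.
        lines.append("\t".join(
            [chrom, source, "exon", str(start + 1), str(end), ".", strand, ".", attrs]))
    return lines


# Hypothetical row with two exons (not taken from the table above).
row = "1\tNM_EXAMPLE\tchr1\t+\t1000\t2000\t1100\t1900\t2\t1000,1500,\t1200,2000,"
for line in genepred_row_to_gtf_exons(row):
    print(line)
```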
Simon Anders replied

01-16-2014, 04:04 AM
This does not at all look like a GFF file to me. No wonder that it does not work.
Lagzxadr replied

01-16-2014, 04:01 AM
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
1 NM_131426 chr1 + 50321633 50410568 50322024
1 NM_001110522 chr1 - 58701200 58722813 58701200
9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11

Originally posted by Simon Anders

Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.
Simon Anders replied

01-16-2014, 01:02 AM
Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.
Lagzxadr replied

01-16-2014, 12:26 AM
Dear Simon,
I ran into a problem when using htseq-count. How can I fix this error? Thanks a lot!
huoxj@ubuntu:/host/ubuntu$ htseq-count -s no -i ID Hxj3TAN_hits.bam Zv9.gff > Hxj4count.txt
Error occured when processing GFF file (line 2 of file Zv9.gff):
invalid literal for int() with base 10: '+'
[Exception type: ValueError, raised in __init__.py:223]
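
The message suggests that the parser tried to read a '+' where it expected a number. In GFF/GTF, columns 4 and 5 are the integer start and end and column 7 is the strand, so a file with the strand in column 4 would fail in just this way. Below is a small pre-flight check along those lines; `check_gff_line` is a hypothetical helper (not an HTSeq function) and the example lines are made up.

```python
def check_gff_line(line: str):
    """Return None if the line looks like a GFF/GTF record, else a short message."""
    if not line.strip() or line.startswith("#"):
        return None  # blank lines and comments are fine
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return f"expected 9 tab-separated fields, found {len(fields)}"
    for col, label in ((3, "start"), (4, "end")):
        try:
            int(fields[col])
        except ValueError:
            return f"{label} (column {col + 1}) is not an integer: {fields[col]!r}"
    if fields[6] not in ("+", "-", "."):
        return f"strand (column 7) is not one of + - .: {fields[6]!r}"
    return None


# Hypothetical lines: a well-formed GTF record and a table row with strand in column 4.
good = 'chr1\texample\texon\t1001\t1200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
table_row = "1\tNM_EXAMPLE\tchr1\t+\t1000\t2000\t1100\t1900\t2\n"

print(check_gff_line(good))       # None
print(check_gff_line(table_row))  # start (column 4) is not an integer: '+'
```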

Originally posted by Simon Anders

Hi

I noticed this bug myself just yesterday and fixed it. Please try again with version 0.4.3-p4 and tell me whether this solves the issue.

Cheers
Simon
dvanic replied

11-06-2013, 10:54 PM
Originally posted by ppatrickt

I'm thinking that the "stranded=reverse" is the way to go if I want to measure sense expression, since for the fr-firststrand protocol, the right most strand is sequenced first which is opposite to the coding strand. Is this correct?

Yes. I've posted on this here: http://seqanswers.com/forums/showpos...8&postcount=50
alig replied

10-16-2013, 02:54 PM
Hello,

I've used TopHat 2.0.9 and then HTSeq version 0.5.4p3, and with just 3 of my 28 SAM files I get this error:

Error occured in line 63841485 of file RNA8_sorted.sam.
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 63841485 of file RNA8_sorted.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:772]

Can anyone please help? This is holding up my analysis.

Thank you
alig
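
The error says that the SEQ and QUAL fields (columns 10 and 11 of a SAM record) differ in length on the reported line. A minimal sketch for locating such records is shown below; `find_length_mismatches` is a hypothetical helper and the example records are made up. One possible cause, offered only as a guess, is a truncated or corrupted file, so looking at the reported line and the end of the file may be worthwhile.

```python
def find_length_mismatches(sam_lines):
    """Yield (line_number, read_name, len_seq, len_qual) for inconsistent records."""
    for lineno, line in enumerate(sam_lines, start=1):
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Fewer than the 11 mandatory fields: report as a truncated record.
            yield lineno, fields[0], None, None
            continue
        seq, qual = fields[9], fields[10]
        if seq == "*" or qual == "*":
            continue  # "*" means the field is not stored
        if len(seq) != len(qual):
            yield lineno, fields[0], len(seq), len(qual)


# Made-up records: the second alignment has 4 bases but only 2 quality values.
example = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tII\n",
]
print(list(find_length_mismatches(example)))  # [(3, 'r2', 4, 2)]
```

For a real file, the same function can be given an open file handle, for example `with open("RNA8_sorted.sam") as fh: ...`, since it only iterates over lines.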
ppatrickt replied

09-18-2013, 02:21 PM
library type and stranded parameter

Hello,

I'm trying to figure out the right "stranded" parameter to use for my RNA-seq data which was aligned using TopHat with the "--library-type fr-firststrand" parameter. I'm using paired-end reads.

From what I can see, running with "stranded=no" gives results similar to "stranded=reverse": ~50% of the total fragments are assigned a feature, and most of the rest have no feature. But if I run with "stranded=yes", only ~2% of total fragments have a feature.

I'm thinking that "stranded=reverse" is the way to go if I want to measure sense expression, since for the fr-firststrand protocol the rightmost strand is sequenced first, which is opposite to the coding strand. Is this correct?

Thanks,
Patrick
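
The comparison described above can be written down as a small heuristic: compute the fraction of fragments assigned to features in the "yes" and "reverse" runs and see which one dominates. This is a sketch, not an htseq-count feature. It assumes the special counters are prefixed with "__" as in the count files posted elsewhere in this thread; `assigned_fraction`, `suggest_stranded`, the ratio threshold, and the example counts are all hypothetical.

```python
def assigned_fraction(count_lines):
    """Fraction of fragments assigned to genes in htseq-count style output."""
    assigned = 0
    special = 0
    for line in count_lines:
        if not line.strip():
            continue
        name, value = line.rsplit(None, 1)
        if name.startswith("__"):   # assumed prefix of special counters
            special += int(value)
        else:
            assigned += int(value)
    total = assigned + special
    return assigned / total if total else 0.0


def suggest_stranded(frac_yes, frac_reverse, ratio=5.0):
    """Guess the --stranded setting; 'ratio' is an arbitrary illustrative cutoff."""
    if frac_reverse > ratio * frac_yes:
        return "reverse"
    if frac_yes > ratio * frac_reverse:
        return "yes"
    return "no"


# Made-up counts mirroring the ~2% versus ~50% pattern described in the post.
yes_run = ["geneA\t2", "__no_feature\t98"]
reverse_run = ["geneA\t50", "__no_feature\t50"]

print(suggest_stranded(assigned_fraction(yes_run), assigned_fraction(reverse_run)))  # reverse
```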