  • apredeus
    replied
    I have a quick question about this picture:



    Does it matter for "ambiguous" reads whether they land on the right strand? I.e., in the last two cases shown, if gene A and gene B are on opposite strands and the library is stranded, there is actually no ambiguity. Is that taken into consideration?

    Thank you in advance!
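
One way to picture the question is a toy sketch of overlap-based assignment in which the only difference between the two calls is whether a feature's strand must match the read's strand. This is not htseq-count's implementation; the `Feature` type, the `assign` function, the gene names, and the coordinates are all made up for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    gene: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def assign(read_start, read_end, read_strand, features, stranded):
    """Return a gene name, "__no_feature", or "__ambiguous" for one read."""
    hits = set()
    for f in features:
        overlaps = f.start < read_end and read_start < f.end
        # In stranded mode, a feature on the other strand is ignored.
        strand_ok = (not stranded) or f.strand == read_strand
        if overlaps and strand_ok:
            hits.add(f.gene)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()


# Hypothetical genes that overlap each other but lie on opposite strands.
features = [
    Feature("gene_A", 100, 200, "+"),
    Feature("gene_B", 150, 250, "-"),
]

# A plus-strand read that falls inside the overlapping region.
unstranded_call = assign(160, 190, "+", features, stranded=False)
stranded_call = assign(160, 190, "+", features, stranded=True)
print(unstranded_call, stranded_call)
```

In this toy model the read is ambiguous only when strand is ignored; whether the real tool behaves the same way is exactly what the question asks.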


  • dlepe
    replied
    Hi Simon,
    I'm analyzing SOLiD data, using bowtie for mapping and htseq for quantification. When I tried the --stranded parameter (just to familiarize myself with htseq), I got very similar numbers whether I set it to yes or no.

    For example for my 001_02.count file when --stranded=yes
    __no_feature 278195
    __ambiguous 26690
    __too_low_aQual 0
    __not_aligned 0
    __alignment_not_unique 0

    For example for my 001_02.count file when --stranded=no
    __no_feature 255213
    __ambiguous 115445
    __too_low_aQual 0
    __not_aligned 0
    __alignment_not_unique 0

    Since my protocol wasn't stranded, I should be losing about half the counts with --stranded=yes, but as you can see this was not the case. I tried the same with some Illumina data I have access to and got the following, which I think is all right.

    stranded=yes __no_feature 9381365
    stranded=no __no_feature 492513

    So after struggling with this for a while, the only thing I found was that the SAM files for the SOLiD data contain only two different flags, 0 and 16, which I'm guessing is not enough information for htseq?

    707_1366_1065 16 Chr1 1078 255 28M * 0 0 CCCCCCCCCCCCCACCCCCCAAATTGAG [\L!2_______UBL__ZU!"_______ XA:i:2 MD:Z:28 NM:i:0 CM:i:2
    42_176_82 0 Chr1 4868 255 73M * 0 0 GGCGGTCAGTGGCTGAGTGACTATATCGACCTGCAACAGCAAGTTCCTTACTTGGCACCTTATGAAAATGAGT ___________________________________________UU______ZY^________^Z[_^^__\KM XA:i:0 MD:Z:73 NM:i:0 CM:i:0

    so my question is, are the results I'm getting for the SOLID data with --stranded=no reliable?
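
For reference, the two FLAG values mentioned above can be decoded with a few lines of Python. In the SAM format, bit 0x10 marks a read aligned to the reverse strand and bit 0x4 marks an unmapped read, so for single-end data 0 means forward and 16 means reverse. The function names are invented for this sketch, and the two records are shortened to the leading fields of the lines quoted above.

```python
from collections import Counter


def read_strand(flag: int) -> str:
    """Strand of an aligned single-end read: bit 0x10 set means reverse."""
    return "-" if flag & 0x10 else "+"


def strand_counts(sam_lines):
    """Tally forward, reverse, and unmapped records from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split()[1])  # FLAG is the second field
        if flag & 0x4:
            counts["unmapped"] += 1
        else:
            counts[read_strand(flag)] += 1
    return counts


# Leading fields of the two records shown in the post (rest omitted).
example = [
    "707_1366_1065 16 Chr1 1078 255 28M * 0 0",
    "42_176_82 0 Chr1 4868 255 73M * 0 0",
]
counts = strand_counts(example)
print(dict(counts))
```

A roughly even split between "+" and "-" over a whole file would only show that strand information is present in the alignments; it says nothing by itself about whether the library protocol was stranded.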


  • Lagzxadr
    replied
    Finally done. Thank you very much!
    Originally posted by dpryan View Post
    Just use the GTF.


  • dpryan
    replied
    Just use the GTF.


  • Lagzxadr
    replied
    Yes, got it. Thanks a lot! Then how do I generate a GFF?
    Originally posted by dpryan View Post
    When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".


  • dpryan
    replied
    When downloading that table from the UCSC table browser, just change the "output format" drop-down box to "GTF - gene transfer format".


  • Lagzxadr
    replied
    Oh... I failed to generate a GFF file from UCSC; I can only download a GFF3 file from NCBI. I ran HTSeq on the GFF3 and my BAM file, but no features were counted. I think this is because the IDs in the NCBI GFF3 cannot be matched to the IDs in the BAM file, which was mapped against the UCSC reference. Can you give me a suggestion? Should I use the GFF3 from NCBI, or where can I get a UCSC GFF?
    Originally posted by Simon Anders View Post
    This does not at all look like a GFF file to me. No wonder that it does not work.


  • dpryan
    replied
    Originally posted by Lagzxadr View Post
    #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
    1 NM_131426 chr1 + 50321633 50410568 50322024
    1 NM_001110522 chr1 - 58701200 58722813 58701200
    9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11
    That's genePred format from UCSC! The conversion procedure is detailed here
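
The error message reported further down in the thread follows from the column layout: GFF expects integer start and end coordinates in columns 4 and 5, whereas the fourth column of the table above holds the strand. A minimal sketch of that check is below. The function name is invented, and the GFF-style row uses made-up coordinates and attributes purely for contrast.

```python
def gff_coordinate_problem(line: str):
    """Return a message if columns 4 and 5 are not integers, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return f"expected at least 5 tab-separated columns, found {len(fields)}"
    for col in (3, 4):  # 0-based indexes of GFF columns 4 and 5
        try:
            int(fields[col])
        except ValueError:
            return f"column {col + 1} is {fields[col]!r}, not an integer"
    return None


# First data row of the table quoted above, with tabs restored.
genepred_row = "1\tNM_131426\tchr1\t+\t50321633\t50410568\t50322024"

# Hypothetical GFF-style row; values are illustrative only.
gff_row = 'chr1\tsource\texon\t50321634\t50321700\t.\t+\t.\tgene_id "g1"'

problem = gff_coordinate_problem(genepred_row)
ok = gff_coordinate_problem(gff_row)
print(problem)
```

Running a check like this over the first few non-comment lines of an annotation file is a quick way to tell whether it is laid out like a GFF/GTF at all before passing it to a counting tool.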


  • Simon Anders
    replied
    This does not at all look like a GFF file to me. No wonder that it does not work.


  • Lagzxadr
    replied
    #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCoun
    1 NM_131426 chr1 + 50321633 50410568 50322024
    1 NM_001110522 chr1 - 58701200 58722813 58701200
    9 NM_001143751 chr1 + 6072450 6331842 6072675 6331842 11
    Originally posted by Simon Anders View Post
    Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.


  • Simon Anders
    replied
    Please post the beginning of your GFF file, to see whether there really is a '+' in line 2.


  • Lagzxadr
    replied
    Dear Simon,
    I ran into a problem when using htseq-count. How can I fix this error? Thanks a lot!
    huoxj@ubuntu:/host/ubuntu$ htseq-count -s no -i ID Hxj3TAN_hits.bam Zv9.gff > Hxj4count.txt
    Error occured when processing GFF file (line 2 of file Zv9.gff):
    invalid literal for int() with base 10: '+'
    [Exception type: ValueError, raised in __init__.py:223]



    Originally posted by Simon Anders View Post
    Hi



    I noticed this bug myself just yesterday and fixed it. Please try again with version 0.4.3-p4 and tell me whether this solves the issue.

    Cheers
    Simon


  • dvanic
    replied
    Originally posted by ppatrickt View Post
    I'm thinking that the "stranded=reverse" is the way to go if I want to measure sense expression, since for the fr-firststrand protocol, the rightmost strand is sequenced first, which is opposite to the coding strand. Is this correct?
    Yes. I've posted on this here: http://seqanswers.com/forums/showpos...8&postcount=50


  • alig
    replied
    Hello,

    I've used TopHat 2.0.9 and then HTSeq version 0.5.4p3, and with just 3 of my 28 SAM files I get this error.

    Error occured in line 63841485 of file RNA8_sorted.sam.
    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 63841485 of file RNA8_sorted.sam')
    [Exception type: ValueError, raised in _HTSeq.pyx:772]

    Can anyone please help? It's holding up my analysis.

    Thank you
    alig
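
To locate records like the one the error points at, a short Python scan can compare the lengths of the SEQ and QUAL fields (columns 10 and 11 of a SAM record, where `*` means "not stored"). This is only a diagnostic sketch: the function name and the example records are invented, and it does not say why a record is malformed.

```python
def find_length_mismatches(sam_lines):
    """Yield (line_number, read_name, len_seq, len_qual) for bad records."""
    for number, line in enumerate(sam_lines, start=1):
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Too few mandatory fields, e.g. a truncated line.
            yield (number, fields[0], None, None)
            continue
        seq, qual = fields[9], fields[10]
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            yield (number, fields[0], len(seq), len(qual))


# Hypothetical records: r1 is consistent, r2 has a short quality string.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tChr1\t200\t255\t4M\t*\t0\t0\tACGT\tII",
]
mismatches = list(find_length_mismatches(example))
print(mismatches)
```

For a real file, the same function can be fed an open file handle, for example `with open("RNA8_sorted.sam") as handle: print(next(find_length_mismatches(handle), None))`, which stops at the first offending record.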


  • ppatrickt
    replied
    library type and stranded parameter

    Hello,

    I'm trying to figure out the right "stranded" parameter to use for my RNA-seq data which was aligned using TopHat with the "--library-type fr-firststrand" parameter. I'm using paired-end reads.

    From what I can see, the results of running "stranded=no" are similar to those of "stranded=reverse": each assigns about 50% of the total fragments to a feature, and most of the remainder have no feature. But if I run with "stranded=yes", only ~2% of the total fragments are assigned to a feature.

    I'm thinking that "stranded=reverse" is the way to go if I want to measure sense expression, since for the fr-firststrand protocol, the rightmost strand is sequenced first, which is opposite to the coding strand. Is this correct?

    Thanks,
    Patrick
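
The comparison described above can be made from the count files themselves. The sketch below assumes the output layout seen elsewhere in this thread: one name and one count per line, with special counters prefixed by `__` (for example `__no_feature`). The function names are invented and the numbers are made up to mirror the ~2% and ~50% figures; they are not real data.

```python
def summarize(count_lines):
    """Return (assigned_total, special_counters) from count-file lines."""
    assigned = 0
    special = {}
    for line in count_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        name, value = parts[0], int(parts[1])
        if name.startswith("__"):
            special[name] = value
        else:
            assigned += value
    return assigned, special


def assigned_fraction(count_lines):
    """Fraction of all counted fragments that were assigned to a feature."""
    assigned, special = summarize(count_lines)
    total = assigned + sum(special.values())
    return assigned / total if total else 0.0


# Hypothetical outputs for the same sample under two settings.
runs = {
    "yes": ["geneA 20", "__no_feature 980"],
    "reverse": ["geneA 500", "__no_feature 500"],
}
fractions = {mode: assigned_fraction(lines) for mode, lines in runs.items()}
best_mode = max(fractions, key=fractions.get)
print(fractions, best_mode)
```

A large gap between "yes" and "reverse" suggests the library is stranded in the higher-scoring orientation, while "no" scoring close to the better of the two is expected, since it accepts reads from either strand.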
