  • jrounds
    Junior Member
    • Nov 2014
    • 3

    DEXSeq ignore strand issues.

    DEXSeq runs a flattening Python script and then a counting Python script. To my knowledge the flattening script does not have an ignore-strand option, while the counting script does.

    It has occurred to me that it is impossible to do a correct "flatten" operation without knowing whether the bins will be used with or without ignoring strand. The idea behind flattening is to split overlapping exons into bins that have no overlap, which eases some statistical modelling issues and enables better detection of differential splicing. However, if strand is respected during flattening and then ignored during counting, sub-exon counting bins can overlap even though the model treats them as non-overlapping.
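To make the concern concrete, here is a minimal sketch of flattening under simplified assumptions: exons are hypothetical `(chrom, start, end, strand)` tuples with half-open coordinates, and every exon boundary becomes a cut point. This is not DEXSeq's actual flattening code (which works per gene and is not reproduced here); it only groups by chromosome and strand. Flattening per strand produces bins that are disjoint within a strand but can overlap once strand is ignored, while pooling both strands before cutting does not.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, strand), half-open [start, end); a hypothetical, simplified record
Interval = Tuple[str, int, int, str]


def flatten(exons: List[Interval], stranded: bool = True) -> List[Interval]:
    """Split overlapping exons into non-overlapping counting bins.

    Every exon boundary becomes a cut point, and each segment between two
    consecutive cut points that lies inside at least one exon becomes a bin.
    With stranded=True, exons on different strands are flattened separately;
    with stranded=False they are pooled and the bins get strand ".".
    """
    groups: Dict[Tuple[str, str], List[Tuple[int, int]]] = {}
    for chrom, start, end, strand in exons:
        key = (chrom, strand if stranded else ".")
        groups.setdefault(key, []).append((start, end))

    bins: List[Interval] = []
    for (chrom, strand), spans in sorted(groups.items()):
        cuts = sorted({pos for span in spans for pos in span})
        for left, right in zip(cuts, cuts[1:]):
            # Keep the segment only if some exon covers it (skips introns).
            if any(s <= left and right <= e for s, e in spans):
                bins.append((chrom, left, right, strand))
    return bins


def overlaps_ignoring_strand(bins: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every pair of bins that overlaps once strand is ignored."""
    pairs = []
    for i, a in enumerate(bins):
        for b in bins[i + 1:]:
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs


exons = [("chr1", 100, 200, "+"), ("chr1", 150, 300, "+"), ("chr1", 250, 400, "-")]
stranded_bins = flatten(exons, stranded=True)
pooled_bins = flatten(exons, stranded=False)
print(overlaps_ignoring_strand(stranded_bins))  # one pair: + bin 200-300 and - bin 250-400
print(overlaps_ignoring_strand(pooled_bins))    # []
```

The stranded run yields four bins, and the `+` bin 200-300 overlaps the `-` bin 250-400 once strand is dropped; the pooled run (the "split without respect to strand" option) yields five disjoint bins.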

    One way around this might be to always split exons without respect to strand, but that seems unlikely to be the default.

    Does anyone know how DEXSeq handles strand when splitting exons?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Well, the flattening should always be stranded, since the genes from which exonic bins arise are stranded. However the counting can still be unstranded, since that depends on the library type. The two need not be identical. If a genomic region has overlapping exonic bins on each strand then it'll just receive a 0 count if you have a non-directional dataset. This is the same as with gene counts.
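The counting side of this reply can be sketched as follows. The ambiguity rule here is an assumption made for illustration, modelled on the description above rather than taken from DEXSeq's counting script: a read whose overlapping bins come from both strands is discarded, so the shared region gets no count from it in unstranded mode. The records are hypothetical half-open `(chrom, start, end, strand)` tuples, and a stranded library is assumed to report the transcript's strand directly.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, strand), half-open; hypothetical simplified records
Interval = Tuple[str, int, int, str]


def count_bins(bins: List[Interval], reads: List[Interval], stranded: bool) -> Dict[Interval, int]:
    """Count reads per bin.

    Assumed ambiguity rule (illustration only): if a read's overlapping
    bins come from both strands, the read is discarded.
    """
    counts = {b: 0 for b in bins}
    for chrom, start, end, strand in reads:
        hits = [
            b for b in bins
            if b[0] == chrom and b[1] < end and start < b[2]
            and (not stranded or b[3] == strand)
        ]
        if len({b[3] for b in hits}) > 1:
            continue  # ambiguous between strands: not counted
        for b in hits:
            counts[b] += 1
    return counts


bins = [("chr1", 100, 150, "+"), ("chr1", 200, 300, "+"), ("chr1", 250, 400, "-")]
reads = [("chr1", 260, 290, "+"), ("chr1", 120, 140, "-")]
print(count_bins(bins, reads, stranded=False))
print(count_bins(bins, reads, stranded=True))
```

With unstranded counting, the read at 260-290 touches bins on both strands and is dropped, so both overlapping bins stay at 0; with stranded counting the same read is assigned to the `+` bin.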


    • jrounds
      Junior Member
      • Nov 2014
      • 3

      #3
      I actually focused on the complicated case (exon counting bins that overlap only because of strand), but given your reply there is now a simpler case to discuss.

      First, in UCSC KnownGene it wouldn't be entirely correct to say that "genes are stranded". Transcripts are stranded; genes consist of multiple transcripts, and transcripts describe how to use exon ranges to construct a functional polypeptide.


      The point that should be considered is this: a very common source of multiple transcripts for the same gene is literally the strand identifier being switched from positive to negative. Otherwise it is the same transcript with a new ID, and ostensibly it is probably the same RNA isoform, just observed as generated from the other strand. I don't have a figure for how frequently this occurs, but having done a fair bit of work in this area "by hand", it is fair to say it occurs a lot.
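A rough way to find such cases is to group transcripts by chromosome and exon coordinates and look for groups that contain both strands. The sketch below assumes transcripts are already parsed into a hypothetical `{id: (chrom, strand, exons)}` dictionary; it does not parse UCSC knownGene itself, and the IDs are made up. Pairs found this way could, for instance, be collapsed onto one strand before flattening as a hypothetical preprocessing step of one's own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical transcript record: (chrom, strand, exons), exons as half-open (start, end) pairs.
Transcript = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def strand_flipped_pairs(transcripts: Dict[str, Transcript]) -> List[Tuple[str, str]]:
    """Return (plus_id, minus_id) pairs whose exon coordinates are identical."""
    by_structure: Dict[tuple, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for tid, (chrom, strand, exons) in transcripts.items():
        # Sort exons so the comparison ignores the order they are listed in.
        key = (chrom, tuple(sorted(exons)))
        by_structure[key][strand].append(tid)

    pairs = []
    for strands in by_structure.values():
        for plus_id in strands.get("+", []):
            for minus_id in strands.get("-", []):
                pairs.append((plus_id, minus_id))
    return sorted(pairs)


transcripts = {
    "tx1": ("chr1", "+", ((100, 200), (300, 400))),
    "tx2": ("chr1", "-", ((300, 400), (100, 200))),  # same exons, opposite strand
    "tx3": ("chr1", "+", ((100, 200), (350, 400))),  # different exon structure
}
print(strand_flipped_pairs(transcripts))  # [('tx1', 'tx2')]
```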


      In the case of gene counting with the exon union model this does not matter at all, but for exon counting bins it presents an obvious first challenge. Keep in mind I do not mean to say DEXSeq is "wrong"; I think this is a topic with "defensible decisions", but one defensible decision does not rule out others.

      If the only difference between two exons is the strand, and the transcripts they are embedded in are otherwise identical (including order, starts, stops, usage, and resulting RNA isoform), it does not seem best to count them separately or to give them a zero count because they overlap. For all appearances they may be biologically identical except for strand, and with read data that carries no strand information you will never know which one is being used.


      Counting these identical exons as zero because their transcript has a counterpart on the reverse strand is just taking a mulligan where it appears you do not need to. I assumed DEXSeq did not do that, but now I am curious.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        UCSC's annotations are a complete mess and will generally screw DEXSeq up. I can't recommend using them for any reason at all.
