
  • DEXSeq ignore strand issues.

    DEXSeq runs a Python flattening script and then a Python counting script. To my knowledge, the flattening script does not have an ignore-strand option, while the counting script does.

    It has occurred to me that it is impossible to do a correct "flatten" operation without knowing whether the bins will be used with or without ignoring strand. The idea behind flattening is to split overlapping exons into bins that have no overlap, which eases some statistical modelling issues and enables better detection of differential splicing. However, if strand is respected during flattening and then ignored during counting, sub-exon counting bins can overlap even though the model treats them as non-overlapping.

    One way around this might be to always split exons without respect to strand, but that seems unlikely to be a default case.
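To make the concern concrete, here is a minimal sketch of the two flattening choices. It is a toy written for this thread, not the code of DEXSeq's own flattening script; the function names, the tuple layout `(chrom, start, end, strand)`, and the half-open coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def flatten(exons, stranded=True):
    """Split exons (chrom, start, end, strand) into disjoint bins.

    Toy illustration only. Coordinates are half-open [start, end).
    With stranded=True each strand is flattened on its own; with
    stranded=False all exons on a chromosome are flattened together
    and the resulting bins carry the strand ".".
    """
    groups = defaultdict(list)
    for chrom, start, end, strand in exons:
        groups[(chrom, strand if stranded else ".")].append((start, end))

    bins = []
    for (chrom, strand), intervals in sorted(groups.items()):
        # Every exon boundary is a potential bin boundary.
        cuts = sorted({pos for iv in intervals for pos in iv})
        for lo, hi in zip(cuts, cuts[1:]):
            # Keep only segments that are covered by at least one exon.
            if any(s <= lo and hi <= e for s, e in intervals):
                bins.append((chrom, lo, hi, strand))
    return bins


def overlapping_pairs(bins):
    """Return pairs of bins that share bases once strand is ignored."""
    pairs = []
    for i, a in enumerate(bins):
        for b in bins[i + 1:]:
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs


# Hypothetical exons: two overlapping on "+", one overlapping both on "-".
exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 150, 250, "+"),
    ("chr1", 180, 300, "-"),
]

stranded_bins = flatten(exons, stranded=True)
unstranded_bins = flatten(exons, stranded=False)

print(overlapping_pairs(stranded_bins))    # bins that collide if strand is ignored later
print(overlapping_pairs(unstranded_bins))  # empty: bins are disjoint regardless of strand
```

In this toy, the strand-aware bins are disjoint within each strand, but two of the "+" bins share bases with the "-" bin, which is exactly the situation that arises if counting later ignores strand.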

    Anyone know how DEXSeq handles the issue of strand when splitting exons?

  • #2
    Well, the flattening should always be stranded, since the genes from which exonic bins arise are stranded. However the counting can still be unstranded, since that depends on the library type. The two need not be identical. If a genomic region has overlapping exonic bins on each strand then it'll just receive a 0 count if you have a non-directional dataset. This is the same as with gene counts.
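A rough sketch of that behaviour, under an assumed counting rule: a read is assigned to a bin only when it overlaps exactly one candidate bin, and is set aside as ambiguous otherwise. This rule, the function name `count_reads`, and the example coordinates are illustrative assumptions, not the actual logic of DEXSeq's counting script.

```python
def count_reads(bins, reads, stranded):
    """Count reads per bin under a simple 'unique overlap' rule.

    Toy illustration only. bins and reads are (chrom, start, end, strand)
    tuples with half-open coordinates. When stranded is False, the strand
    of both the read and the bin is ignored when looking for overlaps.
    """
    counts = {b: 0 for b in bins}
    ambiguous = 0
    for chrom, start, end, strand in reads:
        hits = [
            b for b in bins
            if b[0] == chrom and b[1] < end and start < b[2]
            and (not stranded or b[3] == strand)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            ambiguous += 1  # overlaps several bins: not assigned to any
    return counts, ambiguous


# Hypothetical bins on opposite strands that share the region 150-200.
plus_bin = ("chr1", 100, 200, "+")
minus_bin = ("chr1", 150, 250, "-")
bins = [plus_bin, minus_bin]

reads = [
    ("chr1", 160, 190, "+"),  # inside the shared region
    ("chr1", 110, 140, "+"),  # only overlaps the "+" bin
    ("chr1", 210, 240, "-"),  # only overlaps the "-" bin
]

stranded_counts, stranded_ambiguous = count_reads(bins, reads, stranded=True)
unstranded_counts, unstranded_ambiguous = count_reads(bins, reads, stranded=False)
print(stranded_counts, stranded_ambiguous)
print(unstranded_counts, unstranded_ambiguous)
```

With strand respected, the read in the shared region goes to the "+" bin; with strand ignored it cannot be assigned, so the shared region effectively contributes nothing to either bin.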


    • #3
      I actually focused on the complicated case: exon counting bins overlapping only because of strand, but there is now a simpler case to discuss given your reply.

      First, in UCSC KnownGene it wouldn't be entirely correct to say "genes are stranded". Transcripts are stranded; genes consist of multiple transcripts, and transcripts describe how to use exon ranges to construct a functional polypeptide.


      The point to consider is this: a very common source of multiple transcripts for the same gene is literally the strand identifier being switched from positive to negative; otherwise it is the same transcript with a new ID, and ostensibly the same RNA isoform except observed to be generated from a different strand. I can't quantify how frequently this occurs, but having done a fair bit of work in this area by hand, it is fair to say it occurs a lot.


      In the case of gene counting with the exon union model, this matters not at all, but in the case of exon counting bins it presents a first obvious challenge. Keep in mind I do not mean to say DEXSeq is "wrong". I think this is a topic with "defensible decisions", but one "defensible decision" does not imply there are no others.

      If the only difference between two exons is the strand, but otherwise the transcript they are embedded in is identical (including order, starts, stops, usage, and resulting RNA isoform), it does not seem best to count them separately or to count them as zero because they overlap. To all appearances they may be biologically identical except for strand, and with read data that carries no strand information you will never know which one is being used.


      Counting these identical exons as zero because they are in a transcript with a counterpart on the reverse strand is just taking a mulligan where it appears you do not need to. I assumed DEXSeq did not do that, but now I am curious.
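One of the other "defensible decisions" could be sketched like this: before flattening, collapse exons that have identical coordinates on both strands into a single strand-agnostic exon, so that unstranded reads can still be counted against it. This is only an illustration of the idea raised above; the function name and the use of "." for a collapsed strand are assumptions, and nothing here describes what DEXSeq actually does.

```python
from collections import defaultdict


def collapse_strand_duplicates(exons):
    """Merge exons that differ only by strand into one unstranded exon.

    Toy illustration only. exons are (chrom, start, end, strand) tuples.
    An exon range annotated on both strands is emitted once with the
    strand "."; ranges seen on a single strand keep that strand.
    """
    strands_by_range = defaultdict(set)
    for chrom, start, end, strand in exons:
        strands_by_range[(chrom, start, end)].add(strand)

    collapsed = []
    for (chrom, start, end), strands in sorted(strands_by_range.items()):
        strand = next(iter(strands)) if len(strands) == 1 else "."
        collapsed.append((chrom, start, end, strand))
    return collapsed


# Hypothetical annotation: the first exon appears on both strands.
exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 100, 200, "-"),
    ("chr1", 300, 400, "+"),
]

collapsed = collapse_strand_duplicates(exons)
print(collapsed)
```

The trade-off is that the collapsed exon no longer says which strand was transcribed, which is acceptable only when the library itself carries no strand information.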


      • #4
        UCSC's annotations are a complete mess and will generally screw DEXSeq up. I can't recommend using them for any reason at all.
