Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 - understanding multi-factor DE analysis

    Hello again dear SeqAnswers,

    I have another question regarding RNA-seq data processing, this time with DESeq. I had count tables from my experiment generated successfully by DEXSeq counting script. Since I wanted to use DESeq2 this time, without having to re-count all the samples, I used geneCountTable function on my ExonCountSet in R to "compress" all exon bins into genes.

    1) I hope this is correct approach to obtaining gene count tables for DESeq ?

    Now the main question, my study is about gender differences in response to some knock-out. So I have 4 sample groups:

    Wild-type male
    Wild-type female
    Knock-out male
    Knock-out female

    I generated all the necessary tables following the DESeq2 vignette:

    sex strain
    WTM_1.counts male WT
    WTM_2.counts male WT
    WTM_3.counts male WT
    WTF_1.counts female WT
    WTF_2.counts female WT
    WTF_3.counts female WT
    KOM_1.counts male KO
    KOM_2.counts male KO
    KOM_3.counts male KO
    KOF_1.counts female KO
    KOF_2.counts female KO
    KOF_3.counts female KO


    2) What is then a proper approach to see differences between gene expression of KO samples that would take into consideration the natural variance between genders in wild-type (like sex-specific genes) ?

    I know I can manipulate the GLM with the formula, but to be honest I am unsure if I understand the entire meaning behind it.

    Currently, just for fun, I used formula (design = ~ sex + strain). Now what I got back are 3 tables with fold-changes - Intercept, male vs female, KO vs WT.

    3) What is intercept exactly ? Are values in tables male vs. female and KO vs. WT simply pair-wise comparisons, like ones I would get by using formulas: design = ~ sex and design = ~ strain separately ?

    I know this might take a longer answer so I will patiently wait for somebody's time. I will be extremely thankful for any insight, I am a beginner in RNA-seq , hence those questions might seem trivial for some.

    All the best!

  • #2
    Summing the counts from DEXSeq may lead to spurious results since you'll be double counting reads overlapping multiple exons. If you found any differential exon usage with DEXSeq, then you're even more likely to get spurious results here. You can get counts pretty quickly with featureCounts(), so give that a try.

    If you're interested in "gender differences in response to some knock-out" then I would think you'd be most interested in the sex:strain interaction (so "design = ~ sex*strain").

    The intercept is the base level of the model, which appears to be WT females (presumably you releveled() the strain factor in your design, since normally they're set alphabetically, meaning that KO females would be the base level), unless I have that backward. The "male vs female" result would be interpreted as "differences due to gender after accounting for differences due to strain". The "KO vs. WT" results would have a similar interpretation.

    Comment


    • #3
      That's very concise and informative. Thank you for the answer !


      However, one more thing troubles me, I have an Ensembl GRCm38 .gtf file that contains in 3rd column values such as exon, CDS, start_codon, stop_codon, but no "gene". Since I want to use the .gtf file for HTseq counting of reads based on their overall mapping to genes, I thought I should use the -t gene option. Otherwise the default -t exon would count the multi-exon reads twice into same gene, like the DEXseq counting script, right ? Or maybe the -t exon option is fine for preparing count tables for DESeq ?
      Last edited by kajot; 04-07-2014, 12:18 AM.

      Comment


      • #4
        htseq-count won't double count reads mapping to multiple exons, since it's designed with gene-level analyses in mind. The scripts that come with DEXSeq are different in this regard, since there it makes sense to double count in these instances (the DESeq/DEXSeq authors put enough thought into things that you won't normally shoot yourself in the foot doing this sort of thing). If you used the default "-t exon -i gene_id" then what you're telling the script to do is to look at reads mapping within exons and then count them according to how they overlap genes denoted by their gene_id. This is normally the appropriate way to go about things.

        Comment


        • #5
          I understand, I wasn't sure since in my .gtf file there are entries like this:

          1 protein_coding exon 4830268 4830315 . + . exon_id "ENSMUSE00001118471"; exon_number "4"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000025903"; gene_name "Lypla1"; p_id "P31648"; transcript_id "ENSMUST00000150971"; transcript_name "Lypla1-003"; tss_id "TSS49998";
          1 protein_coding exon 4830268 4830315 . + . exon_id "ENSMUSE00001118471"; exon_number "4"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000025903"; gene_name "Lypla1"; p_id "P31294"; transcript_id "ENSMUST00000119612"; transcript_name "Lypla1-009"; tss_id "TSS45053";


          As you can see, there is a double entry for two transcripts from the same gene, but the exon position and gene IDs are the same, therefore I was a bit concerned that the script might count a read mapped to such exon twice for the same ENSMUSG00000025903.

          Comment


          • #6
            Ah, yeah, that won't cause any double counting or other problems

            Comment


            • #7
              Ok, all is clear, I am deeply grateful for your help

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM
              • seqadmin
                Techniques and Challenges in Conservation Genomics
                by seqadmin



                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                Avian Conservation
                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                03-08-2024, 10:41 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 06:37 PM
              0 responses
              8 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 06:07 PM
              0 responses
              8 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-22-2024, 10:03 AM
              0 responses
              49 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-21-2024, 07:32 AM
              0 responses
              66 views
              0 likes
              Last Post seqadmin  
              Working...
              X