  • multi-factor vs. pair-wise design

    Hi everyone,

    I have a data set of nine different groups, each with three samples. These groups are being compared against each other to find differentially regulated genes. All in all I have 14 different comparisons.
    I have tested one design matrix for all samples vs. a pair-wise approach, where only the samples of the two compared groups were loaded.

    I was wondering which way makes more sense, since I'm getting different results when comparing the two.
    This is my design matrix for all the samples:
    Code:
                conditionT
    HP4_1              HP4
    HP4_2              HP4
    HP4_3              HP4
    HP24_1            HP24
    HP24_2            HP24
    HP24_3            HP24
    CR4w_1           CR4w4
    CR4w_2           CR4w4
    CR4w_3           CR4w4
    CR4w24_1        CR4w24
    CR4w24_2        CR4w24
    CR4w24_3        CR4w24
    CTRL4_1          CTRL4
    CTRL4_2          CTRL4
    CTRL4_3          CTRL4
    CTRL24_1        CTRL24
    CTRL24_2        CTRL24
    CTRL24_3        CTRL24
    basalCR4w_1  basalCR4w
    basalCR4w_2  basalCR4w
    basalCR4w_3  basalCR4w
    basalCTRL_1  basalCTRL
    basalCTRL_2  basalCTRL
    basalCTRL_3  basalCTRL
    basalHP_1      basalHP
    basalHP_2      basalHP
    basalHP_3      basalHP
    and accordingly the pair-wise design:
    Code:
                conditionT
    HP4_1              HP4
    HP4_2              HP4
    HP4_3              HP4
    CTRL4_1          CTRL4
    CTRL4_2          CTRL4
    CTRL4_3          CTRL4
    or
    Code:
                conditionT
    HP24_1            HP24
    HP24_2            HP24
    HP24_3            HP24
    basalHP_1      basalHP
    basalHP_2      basalHP
    basalHP_3      basalHP
    When comparing the results, the full design matrix gives me better (lower) adjusted p-values than the pair-wise approach. As I expected, I get similar (but not identical, probably due to the different size factors) log2 fold-changes.

    Here is a sample of one of the comparisons from the full matrix design
    Code:
    miRNA	log2FoldChange	padj
    mmu-miR-29a-3p	0.534368658	0.000259248
    mmu-miR-26a-5p	0.378956528	0.000310647
    mmu-miR-200a-3p	0.299780505	0.00060916
    mmu-miR-29c-3p	0.433273797	0.00060916
    mmu-miR-29b-3p	0.625200783	0.001034352
    mmu-miR-30d-5p	0.253729371	0.00715
    mmu-miR-30a-5p	0.289108972	0.00715
    mmu-miR-26b-5p	0.287258966	0.009435688
    mmu-miR-30a-3p	0.263099811	0.012596849
    mmu-miR-200c-3p	0.480164731	0.016093411
    mmu-miR-455-3p	0.734375756	0.016093411
    mmu-miR-101a-3p	0.231741597	0.019381496
    mmu-miR-101c	0.23216037	0.021264359
    mmu-miR-30e-3p	0.276381941	0.026896293
    mmu-miR-92b-3p	0.491022916	0.041665933
    mmu-miR-99a-5p	0.332214684	0.049609316
    mmu-miR-151-5p	0.259334395	0.08039887
    mmu-miR-181c-5p	-0.226533316	0.08039887
    mmu-miR-127-3p	-0.5404365	0.092739149
    mmu-miR-182-5p	0.460856503	0.095664474
    mmu-miR-30e-5p	0.212060214	0.095664474
    and the same samples from the pair-wise design:
    Code:
    miRNA	log2FoldChange	padj
    mmu-miR-29a-3p	0.488112296	0.054110034
    mmu-miR-29b-3p	0.531545957	0.080779499
    mmu-miR-29c-3p	0.398383972	0.080779499
    mmu-miR-451a	-0.515259487	0.080779499
    mmu-miR-26a-5p	0.35141262	0.086831362
    It is clear that there are far fewer DE miRNAs in the pair-wise comparison than in the full matrix design.
    On the other hand, probably also due to the differences in the size factors, I am getting non-zero log2FC values even for miRNAs that have no reads at all in the compared samples.

    Code:
    miRNA	baseMeanA	baseMeanB	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj	HP24_1	HP24_2	HP24_3	CTRL24_1	CTRL24_2	CTRL24_3
    mmu-miR-376b-5p	0	0	0.127918331	-0.000507796	0.046785551	-0.010853703	0.991340168	NA	0	0	0	0	0	0
    mmu-miR-1968-3p	0	0	0.127682606	-0.000515398	0.046192878	-0.011157523	NA	NA	0	0	0	0	0	0
    I can see that these miRNAs are not significant (the adjusted p-values are NA), but it still makes me wonder which of the two designs is better to continue with.

    The full design matrix gives better p-values, but possibly creates artefacts in the data set. The pair-wise design gives fewer significant results.

    So, which one of the matrices will show me more realistic results?

    thanks,
    Assa

  • #2
    In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.

    If you search the limma vignettes and/or edgeR vignettes and/or bioconductor mailing list you should be able to find similar cases.
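
The approach recommended above, one model for all samples plus a contrast per comparison, can be sketched in a few lines. This is a minimal illustration in plain Python, not DESeq2 itself: it assumes a simple cell-means coding (one 0/1 indicator column per group, no intercept) and ordinary least squares on made-up log-scale values, whereas DESeq2 fits a negative binomial GLM. The group names come from the design table in the first post; the function names and expression values are hypothetical.

```python
# Toy sketch: one design matrix for all samples, then a contrast for one comparison.
# Assumptions: cell-means coding (no intercept) and invented log2 expression values.

def design_matrix(conditions):
    """Return (levels, rows): one 0/1 indicator column per condition level."""
    levels = sorted(set(conditions))
    rows = [[1 if cond == level else 0 for level in levels] for cond in conditions]
    return levels, rows

def group_means(rows, values):
    """For a cell-means design the least-squares coefficients are the group means."""
    n_levels = len(rows[0])
    sums = [0.0] * n_levels
    counts = [0] * n_levels
    for row, value in zip(rows, values):
        for j, indicator in enumerate(row):
            sums[j] += indicator * value
            counts[j] += indicator
    return [s / c for s, c in zip(sums, counts)]

def contrast_vector(levels, numerator, denominator):
    """+1 for the numerator level, -1 for the denominator level, 0 elsewhere."""
    return [1 if l == numerator else -1 if l == denominator else 0 for l in levels]

# Three of the nine groups from the thread, three samples each (values are invented).
conditions = ["HP4"] * 3 + ["CTRL4"] * 3 + ["HP24"] * 3
log2_expr = [10.5, 10.7, 10.6, 10.0, 10.2, 10.1, 9.0, 9.1, 8.9]

levels, X = design_matrix(conditions)
coefs = group_means(X, log2_expr)
contrast = contrast_vector(levels, "HP4", "CTRL4")
log2_fold_change = sum(c * b for c, b in zip(contrast, coefs))

print(levels)    # column order of the design matrix
print(X[0])      # indicator row for the first sample (an HP4 sample)
print(contrast)  # contrast for HP4 vs. CTRL4
print(round(log2_fold_change, 3))
```

In this toy setting the samples outside the contrast do not change the estimated difference between HP4 and CTRL4; what they add is information for estimating the variability, which is where the two approaches start to differ.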


    • #3
      Originally posted by dariober
      In general it is preferable to fit the model to all the samples you have and then, using appropriate contrast matrices, get the comparisons of interest. This way you get better estimates of the coefficients. So your first option is recommended.
      Thanks for the reply. I am aware of that, which is why I did it that way in the first place.
      My question was more about finding out why the p-values differ so much between the two result tables.

      Is there a way to see exactly which design matrix is used in the actual DESeq2 analysis (I mean the one with the 0s and 1s)?


      • #4
        The difference in the results tables is typically due to dispersion estimates that increase or decrease when other samples are included.

        Take a look at the PCA plot. If the two groups you are comparing, say A and B, have higher within-group variance than the other groups, then the dispersion estimates may be lowered by including the other groups (because we estimate a single dispersion value per gene). See the DESeq2 paper for details on the dispersion estimation.
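
The effect described here can be illustrated with a much simpler analogue. The sketch below uses an ordinary pooled within-group variance in plain Python as a stand-in for a single per-gene dispersion. DESeq2's actual estimator is a negative binomial dispersion with shrinkage, so this is only an analogy, and the numbers and the function name are invented.

```python
# Analogy only: pooled within-group variance as a stand-in for one per-gene dispersion.
from statistics import mean

def pooled_variance(groups):
    """Return (pooled within-group variance, residual degrees of freedom)."""
    ss = 0.0
    df = 0
    for values in groups:
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    return ss / df, df

# Invented values: groups A and B are noisy, the seven other groups are tight.
group_a = [8.0, 10.0, 12.0]
group_b = [11.0, 13.0, 15.0]
other_groups = [[10.0, 10.5, 11.0] for _ in range(7)]

# Pair-wise analysis: only A and B contribute to the variance estimate.
var_pair, df_pair = pooled_variance([group_a, group_b])
# Full analysis: all nine groups contribute to one shared estimate.
var_all, df_all = pooled_variance([group_a, group_b] + other_groups)

print(var_pair, df_pair)
print(var_all, df_all)
```

With all nine groups, the shared variance estimate is lower than with A and B alone and rests on 18 rather than 4 residual degrees of freedom. Both changes push in the direction of smaller p-values for the A vs. B comparison.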

        See the vignette section "Access to all calculated values" for extracting parameters. The model matrix is attr(dds, "modelMatrix").

        Regarding the LFCs which are near zero but not equal to zero for a contrast of two groups with zeros within a larger analysis, this is expected, but I have also fixed this behavior in the next release (v1.8 released in one week), so that these will be zeroed out ( https://support.bioconductor.org/p/65213/#65254 )


        • #5
          Thanks for the information and the news about the new version. I will look at the PCA.
