  • miRNA differential expression (DESeq)

    Hi everyone!
    I'm currently analyzing Illumina small-RNA-seq data from 4 different samples, each with 2 biological replicates (8 libraries in total).
    We have found novel and conserved miRNA hairpins and defined the best miRNA/miRNA* duplex for each of them.
    Now I'm trying to use DESeq and edgeR for differential expression analysis.
    I have 2 different tissues, under stress and control conditions. My comparisons are: Tissue1 stress vs Tissue1 control; Tissue2 stress vs Tissue2 control; but also Tissue1 stress vs Tissue2 stress; Tissue1 control vs Tissue2 stress.

    I'm having a lot of trouble determining:
    1. Which tags to load into DESeq as raw data: should I consider all the different tags from the whole library (too many, I think), only the tags mapping on putative hairpins, or only the tags mapping on the predicted duplexes?

    2. Using tags from many putative hairpins, I tried running DESeq, but I got very bad SCV plots and, what is more, many tags are not significant according to their FDR value, even when the fold change is very high! Sometimes resVarA or resVarB are high too...

    I cannot tell whether the problem is in my replicates (being real biological replicates, they are not highly similar) or in the statistical model, which may not fit miRNA data...

    Sometimes the most expressed tag in a pre-miRNA, which seems to have a high fold change, is not significant, while another tag on the same hairpin shows a very low FDR and p-value...

    Can anybody help me?

    Thank you very much


  • #2
    1. Taking only the tags mapping on putative hairpins should be fine. For mRNA, you don't count reads mapping to intergenic regions either. I guess you see each of the other tags only two or three times anyway, at least if you have removed low-quality reads.

    2. What is the SCV value? (Read it off at the most common expression strength, i.e., where the black line peaks.) What are typical fold changes? If alpha is the SCV value, you can interpret 1+sqrt(alpha) roughly as the typical fold change between replicates for strongly expressed genes. If your fold changes are not a good deal larger than the fold change between replicates, you cannot do much.

    That is, unless your samples are paired. Are the "tissue1stressed" and the "tissue1" samples the same sample, used twice in different ways, or are they two different samples of the same tissue? In the former case, you gain statistical power by introducing a blocking factor.
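The 1+sqrt(alpha) rule of thumb from point 2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of DESeq; the function name `replicate_fold_change` and the example SCV values are assumptions made for the sketch.

```python
import math


def replicate_fold_change(scv: float) -> float:
    """Rough typical fold change between replicates for strongly
    expressed genes, using the rule of thumb 1 + sqrt(SCV)."""
    if scv < 0:
        raise ValueError("SCV cannot be negative")
    return 1.0 + math.sqrt(scv)


# Illustrative SCV values only (hypothetical, chosen to show the trend).
for scv in (0.01, 0.25, 1.0):
    fold = replicate_fold_change(scv)
    print(f"SCV = {scv:<5} -> replicate fold change of roughly {fold:.2f}")
```

An observed fold change between conditions is only informative if it is clearly larger than the value this returns for your SCV.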


    • #3
      Thanks for your reply!
      1: I've taken all tags mapping on hairpins, even though this means running DESeq with fewer than 6000 tags. I hope that won't create any problems...

      2: The peak of the black line is at around 10 on the X axis and a little less than 1.5 on the Y axis. Sorry for the stupid question, but which one is the SCV value, the X or the Y value? Anyway, the typical fold change is between 6 and 22, with some genes going up to 35 or 70! Some genes showing a fold change of around 10 have a high FDR. I attach the SCV plot here; it looks very strange to me...

      As for the tissues, maybe I was not clear enough: we collected 4 different samples: tissue1-control, tissue1-stress, tissue2-control and tissue2-stress.
      Tissue1-stress and tissue1-control come from two distinct groups of plants: one subjected to stress, one kept in control conditions. Tissue1 and tissue2 in the same condition (let's say control) come from the same plants, but are 2 different parts of the plants.
      Given this, I don't understand what you wrote about the blocking factor.

      One last question: when running DESeq, I load 8 different libraries in countsTable and define the vector of the 4 different conditions. Then, when performing nbinomTest, I compare 2 conditions at a time. Could this influence the analysis somehow? Would it be different if I ran DESeq with only 2 conditions and a countsTable of 4 libraries?
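On the blocking factor raised in #2 and asked about above: a minimal Python sketch of why pairing can add power, assuming two tissues sampled from the same two plants. The counts are made up for illustration and do not come from this thread, and this is not DESeq's statistical model. It only shows that plant-to-plant differences cancel out when each plant is compared with itself.

```python
import math
import statistics

# Hypothetical counts for a single miRNA (made-up numbers).
# Position i in both lists refers to the same plant, so the samples are paired.
tissue1 = [100, 400]
tissue2 = [200, 800]


def paired_log2_ratios(a, b):
    """log2(b/a) computed within each pair (here: within each plant)."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    return [math.log2(y / x) for x, y in zip(a, b)]


# Spread between plants within one tissue (what an unpaired analysis sees).
unpaired_spread = statistics.stdev(math.log2(c) for c in tissue1)

# Spread of the within-plant ratios (what a paired analysis sees).
paired = paired_log2_ratios(tissue1, tissue2)
paired_spread = statistics.stdev(paired)

print(f"spread between plants (log2 scale): {unpaired_spread:.2f}")
print(f"within-plant log2 ratios:           {paired}")
print(f"spread of within-plant ratios:      {paired_spread:.2f}")
```

In this toy case the plants differ 4-fold from each other, yet the tissue2/tissue1 ratio is the same in both plants, so the paired comparison sees no noise at all. A comparison between two distinct groups of plants has no such pairing to exploit.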

      • #4
        Could anybody answer my question?

