  • About negative binomial distribution fit

    Hi everybody!

    How do we test the adjustment of RNA-seq data (reads per gene) to the negative binomial distribution? We are currently using the following approach:

    a) We used the R function goodfit() to find the parameters of the negative binomial curve closest to our data.
    b) We used the R function ks.test() (the Kolmogorov-Smirnov test) to compare our data with the negative binomial curve estimated by goodfit().

    Do we need to use the original reads per gene count as input? D:

    We weren't able to prove the adjustment of the reads per gene (as an integer vector) to the negative binomial curve. Here, goodfit() was able to estimate the closest negative binomial parameters, but the p-value for the KS test was too low (the H0 that the distribution was negative binomial was rejected). However, when using categorized data (we imposed 20 bins, each one representing a reads-per-gene range), the KS test proved the adjustment of the data to the negative binomial distribution (p > 0.8).

    Are our adjustments for categorized data valid? And if they are,
    why are we unable to prove the adjustment of the original reads-per-gene vector to the negative binomial distribution?

    Thanks a lot
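
The two steps a) and b) above can be sketched in plain Python to make the procedure concrete. This is only an illustration under stated assumptions: the method-of-moments fit in `nb_moment_fit` stands in for whatever estimator goodfit() uses, the helper names and the example counts are hypothetical, and the sketch reports only the KS distance. It deliberately computes no p-value, because the standard KS p-value assumes a continuous distribution with parameters fixed in advance, not estimated from the same data.

```python
import math
import statistics

def nb_moment_fit(counts):
    """Method-of-moments NB fit (an assumption; not goodfit()'s estimator).

    Returns (size, mean), where variance = mean + mean**2 / size.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    if var <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    size = mean ** 2 / (var - mean)
    return size, mean

def nb_pmf(k, size, mean):
    """NB probability of count k, in the (size, mean) parameterisation."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def ks_distance(counts, size, mean):
    """Largest gap between the empirical CDF and the fitted NB CDF.

    Both CDFs are step functions that jump only at integers, so it is
    enough to compare them at every integer up to the largest count.
    """
    ordered = sorted(counts)
    n = len(ordered)
    model_cdf = 0.0
    seen = 0
    distance = 0.0
    for k in range(ordered[-1] + 1):
        model_cdf += nb_pmf(k, size, mean)
        while seen < n and ordered[seen] <= k:
            seen += 1
        distance = max(distance, abs(seen / n - model_cdf))
    return distance

# Hypothetical counts, for illustration only.
example_counts = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
fit_size, fit_mean = nb_moment_fit(example_counts)
print("size:", fit_size, "mean:", fit_mean)
print("KS distance:", ks_distance(example_counts, fit_size, fit_mean))
```

Binning the counts into a few wide ranges before comparing discards most of the detail the comparison could use, which is one plausible reason the binned and unbinned results disagree.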

  • #2
    First off, a statistical test doesn't prove anything. It gives you a p-value: the probability of seeing data at least as extreme as yours if the null hypothesis were true. If that probability is sufficiently low, you can reject the null hypothesis.

    Secondly, a p-value greater than 0.8 is not necessarily meaningful: failing to reject the null hypothesis does not show that the negative binomial fits. The negative binomial may not be a good fit for the data, depending on the application. Are you including zero-count genes? Are you looking at all genes? Or are you only looking at a subset or locally? For small numbers of different categories the negative binomial is probably a good assumption, but for large numbers it may not be sufficient. Additionally, there are other considerations, such as sequencing bias. I think most tools for differential expression will do the normalization and account for these factors.

    • #3
      I am quite puzzled about what you are trying to achieve. What do you mean by "adjustment"? What exactly do you want to fit and why?

      I hope you are not trying to take all the per-gene count values from a sample and fit an NB distribution to them. (Sorry if I make you sound overly naive, but some people have misunderstood the whole NB stuff to mean that these values were NB distributed. Of course they are not. The values for one gene, across samples, are postulated to be NB distributed*, but this is hard to check unless you have dozens of samples.)

      * but only out of convenience, not because we really believe they are; see here: http://seqanswers.com/forums/showpos...49&postcount=5
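
The distinction drawn above (one gene across samples versus all genes within one sample) can be illustrated with a small simulation. This is a sketch under assumptions that are not from the thread: counts are drawn as a gamma-Poisson mixture with a dispersion of 0.1, the per-gene means are log-uniform between 1 and 100, and the dispersion is estimated by the method of moments. All names and numbers are hypothetical.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_nb(mean, dispersion, rng):
    """One NB draw as a gamma-Poisson mixture: variance = mean + dispersion * mean**2."""
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return sample_poisson(lam, rng)

def moment_dispersion(counts):
    """Method-of-moments dispersion estimate: (variance - mean) / mean**2."""
    mean = statistics.fmean(counts)
    return (statistics.variance(counts) - mean) / mean ** 2

rng = random.Random(0)
TRUE_DISPERSION = 0.1  # assumed value, for illustration only

# Case 1: ONE gene (mean 50) observed across many samples.
one_gene = [sample_nb(50.0, TRUE_DISPERSION, rng) for _ in range(5000)]
alpha_one_gene = moment_dispersion(one_gene)

# Case 2: ONE sample, many genes, each gene with its own mean.
gene_means = [math.exp(rng.uniform(0.0, math.log(100.0))) for _ in range(5000)]
one_sample = [sample_nb(mu, TRUE_DISPERSION, rng) for mu in gene_means]
alpha_pooled = moment_dispersion(one_sample)

print("dispersion, one gene across samples:", alpha_one_gene)
print("apparent dispersion, all genes in one sample:", alpha_pooled)
```

In the first case the estimate should land near the dispersion used to generate the data. In the second it should come out far larger, because the spread of the pooled values mostly reflects how different the genes' expression levels are, not the count noise of any single gene.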

      • #4
        Thanks to both of you for the replies

        Simon:

        You were right, in fact we were trying to fit "all the per-gene count values from the same sample" to the NB distribution. Until we read your answer, everyone in our lab (and maybe in other groups) thought that this was the meaning of the statistical assumption made by DESeq.

        Considering your answer, everything is OK with our analysis (or at least the opposite can't be tested) :P

        Thank you,
