
  • About negative binomial distribution fit

    Hi everybody!

    How do we test the adjustment of RNA-seq data (reads per gene) to the negative binomial distribution? We are currently using the following approach:

    a) We used the R function goodfit() to find the parameters of the negative binomial curve closest to our data.
    b) We used the R function ks.test() (the Kolmogorov-Smirnov test) to compare our data with the negative binomial curve estimated by goodfit().
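For readers who want to see the two steps above spelled out, here is a minimal sketch in Python using only the standard library. It is not what goodfit() or ks.test() do internally: the fit below uses the method of moments as a simple stand-in for goodfit(), and only the Kolmogorov-Smirnov distance is computed, not a p-value. The function names and the example counts are hypothetical. Note also that the classical KS p-value assumes a continuous distribution whose parameters were not estimated from the same data, so p-values obtained this way on counts are approximate at best.

```python
import math
import statistics
from collections import Counter


def nb_moment_fit(counts):
    """Method-of-moments NB fit (a stand-in for goodfit, not its algorithm).

    Returns (size, prob) for the pmf
    P(K = k) = Gamma(k + size) / (k! * Gamma(size)) * prob**size * (1 - prob)**k.
    """
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    if v <= m:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return m * m / (v - m), m / v


def nb_pmf(k, size, prob):
    """Negative binomial probability of observing exactly k counts."""
    log_p = (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
             + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_p)


def ks_distance(counts, size, prob):
    """Largest gap between the empirical CDF and the fitted NB CDF."""
    n = len(counts)
    tally = Counter(counts)
    empirical = model = worst = 0.0
    # Both CDFs are step functions that jump only at integers,
    # so it is enough to compare them at 0, 1, ..., max(counts).
    for k in range(max(counts) + 1):
        empirical += tally.get(k, 0) / n
        model += nb_pmf(k, size, prob)
        worst = max(worst, abs(empirical - model))
    return worst


# Hypothetical counts, for illustration only.
example_counts = [0, 1, 2, 3, 10, 20]
size, prob = nb_moment_fit(example_counts)
print(size, prob, ks_distance(example_counts, size, prob))
```

The loop runs over every integer up to the largest count, which is fine for a sketch but slow for real RNA-seq data with very large counts.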

    Do we need to use the original reads per gene count as input? D:

    We weren't able to prove the adjustment of the reads per gene (as an integer vector) to the negative binomial curve. Here, goodfit() was able to estimate the closest negative binomial parameters, but the p-value of the KS test was too low (the null hypothesis H0 that the distribution was negative binomial was rejected). However, when using categorized data (we imposed 20 bins, each one representing a range of reads per gene), the KS test proved the adjustment of the data to the negative binomial distribution (p > 0.8).

    Is our adjustment for the categorized data valid? And if it is, why are we unable to prove the adjustment of the original reads per gene vector to the negative binomial distribution?

    Thanks a lot

  • #2
    First off, a statistical test doesn't prove anything. It quantifies how surprising your data would be if the null hypothesis were true. If that probability (the p-value) is sufficiently low, you can reject the null hypothesis.

    Secondly, a p-value greater than 0.8 is not necessarily meaningful. The negative binomial may not be a good fit for the data, depending on the application. Are you including zero count genes? Are you looking at all genes? Or are you only looking at a subset or locally? For small numbers of different categories the negative binomial is probably a good assumption, but for large numbers it may not be sufficient. Additionally there are other considerations, such as sequencing bias. I think most tools for differential expression will do the renormalization and account for these factors.



    • #3
      I am quite puzzled about what you are trying to achieve. What do you mean by "adjustment"? What exactly do you want to fit and why?

      I hope you are not trying to take all the per-gene count values from a sample and fit an NB distribution to them. (Sorry if I make you sound overly naive, but some people have misunderstood the whole NB stuff to mean that these values were NB distributed. Of course they are not. The values for one gene, across samples, are postulated to be NB distributed*, but this is hard to check unless you have dozens of samples.)

      * but only out of convenience, not because we really believe they are; see here: http://seqanswers.com/forums/showpos...49&postcount=5
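The distinction between the two views can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything taken from DESeq: counts are drawn from a gamma-Poisson mixture (one common way to write the NB), the dispersion of 0.2, the mean of 20, and the range of gene means from 1 to 50 are made-up values, and all function names are hypothetical. It compares a method-of-moments dispersion estimate for one gene across many samples with the same estimate computed over many genes within one sample.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def nb_draw(rng, mu, alpha):
    """One NB count with mean mu and variance mu + alpha * mu**2."""
    # Gamma rate with mean mu and variance alpha * mu**2, then a Poisson draw.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return poisson(rng, lam)


def moment_dispersion(counts):
    """Method-of-moments estimate of alpha in Var = mean + alpha * mean**2."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    return (v - m) / (m * m)


rng = random.Random(1)
alpha = 0.2  # assumed dispersion, shared by all genes for simplicity

# View 1: one gene with mean 20, observed across 2000 hypothetical samples.
one_gene = [nb_draw(rng, 20.0, alpha) for _ in range(2000)]

# View 2: 2000 genes in one sample, each with its own mean between 1 and 50.
gene_means = [math.exp(rng.uniform(0.0, math.log(50.0))) for _ in range(2000)]
one_sample = [nb_draw(rng, mu, alpha) for mu in gene_means]

print("one gene across samples:", moment_dispersion(one_gene))
print("all genes in one sample:", moment_dispersion(one_sample))
```

In the first view the estimate should land near the assumed dispersion. In the second it is inflated by the differences in mean expression between genes, so the pooled per-gene values of a single sample do not follow the NB with that dispersion.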



      • #4
        Thanks to both of you for the replies

        Simon:

        You were right, in fact we were trying to fit "all the per-gene count values from the same sample" to the NB distribution. Everyone in our lab (and maybe in other groups) thought, until we read your answer, that this was the meaning of the statistical assumption made by DESeq.

        Considering your answer, everything is OK with our analysis (or at least the opposite can't be tested) :P

        Thank you,
