  • About negative binomial distribution fit

    Hi everybody!

    How do we test the adjustment of RNA-seq data (reads per gene) to the negative binomial distribution? We are currently using the following approach:

    a) We used the R function goodfit() to find the parameters of the negative binomial curve closest to our data.
    b) We used the R function ks.test() (Kolmogorov-Smirnov test) to compare our data with the negative binomial curve estimated by goodfit().
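Steps a) and b) can be sketched in a few lines. This is a stdlib-only Python illustration of the idea, not the R goodfit()/ks.test() calls themselves, and the function names are hypothetical. It assumes a method-of-moments fit (goodfit() is not reproduced here) in the mean/size parametrization also used by R's dnbinom(), so the variance is mu + mu^2/size, and it computes the Kolmogorov-Smirnov distance D against the discrete NB CDF. Note that the standard KS p-value assumes a continuous CDF whose parameters were not estimated from the same data, so any p-value attached to this D is only approximate.

```python
import math
from collections import Counter


def nb_pmf(k, mu, size):
    """Negative binomial P(X = k) with mean mu and variance mu + mu**2 / size."""
    p = size / (size + mu)
    # working in log space avoids overflow of the binomial coefficient for large k
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * math.log(p) + k * math.log1p(-p))


def fit_nb_moments(counts):
    """Method-of-moments NB fit (an assumption; goodfit() is not reproduced here)."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    if var <= mu:
        raise ValueError("variance <= mean: no overdispersion, NB size undefined")
    return mu, mu ** 2 / (var - mu)


def ks_distance_nb(counts, mu, size):
    """Largest gap between the empirical CDF and the NB CDF.

    Both CDFs are step functions that only jump at integers, so checking
    k = 0 .. max(counts) is enough to find the supremum.
    """
    n = len(counts)
    freq = Counter(counts)
    ecdf = cdf = d = 0.0
    for k in range(max(counts) + 1):
        ecdf += freq[k] / n
        cdf += nb_pmf(k, mu, size)
        d = max(d, abs(ecdf - cdf))
    return d


reads_per_gene = [0, 0, 2, 4, 14]  # toy vector, not real data
mu, size = fit_nb_moments(reads_per_gene)
print(mu, size, ks_distance_nb(reads_per_gene, mu, size))
```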

    Do we need to use the original reads-per-gene counts as input? D:

    We weren't able to prove the adjustment of the reads per gene (as an integer vector) to the negative binomial curve. goodfit() was able to estimate the closest negative binomial parameters, but the p-value of the KS test was very low, so the null hypothesis (H0) that the distribution is negative binomial was rejected. However, when using categorized data (we imposed 20 bins, each representing a range of reads per gene), the KS test supported the fit of the data to the negative binomial distribution (p > 0.8).

    Are our adjustments for the categorized data valid? And if they are, why are we unable to prove the adjustment of the original reads-per-gene vector to the negative binomial distribution?

    Thanks a lot

  • #2
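    First off, a statistical test doesn't prove anything. The p-value is the probability of seeing data at least as extreme as yours if the null hypothesis were true; it is not the probability that the null hypothesis is true. If the p-value is sufficiently low, you can reject the null hypothesis.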

    Secondly, a p-value greater than 0.8 is not necessarily meaningful: failing to reject the null hypothesis does not show that it is true, and collapsing the counts into 20 bins discards information, which likely leaves the test with little power to detect a misfit. The negative binomial may not be a good fit for the data, depending on the application. Are you including zero-count genes? Are you looking at all genes, or only at a subset or a local region? For small numbers of different categories the negative binomial is probably a good assumption, but for large numbers it may not be sufficient. Additionally, there are other considerations, such as sequencing bias. I think most tools for differential expression will do the normalization and account for these factors.
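As an example of the normalization such tools perform, here is a minimal sketch of median-of-ratios size factors, the approach used by DESeq: each sample's factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. This is a stdlib-only Python illustration with a hypothetical function name and table layout, not DESeq's own code. Genes with a zero in any sample are skipped because their geometric mean is zero.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """counts[i][j] = reads for gene i in sample j (hypothetical layout).

    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue  # geometric mean would be 0, so the ratio is undefined
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    # statistics.median raises StatisticsError if every gene had a zero
    return [statistics.median(r) for r in ratios]


# in this toy table, sample 2 is sequenced exactly twice as deeply as sample 1
table = [[10, 20], [5, 10], [100, 200], [0, 3]]
print(median_of_ratios_size_factors(table))
```

Dividing each sample's raw counts by its factor puts the samples on a common scale; using the median makes the factor robust to a minority of genes that really are differentially expressed.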



    • #3
      I am quite puzzled about what you are trying to achieve. What do you mean by "adjustment"? What exactly do you want to fit and why?

      I hope you are not trying to take all the per-gene count values from a sample and fit an NB distribution to them. (Sorry if I make you sound overly naive, but some people have misunderstood the whole NB stuff to mean that these values were NB distributed. Of course they are not. The values for one gene, across samples, are postulated to be NB distributed*, but this is hard to check unless you have dozens of samples.)

      * but only out of convenience, not because we really believe they are; see here: http://seqanswers.com/forums/showpos...49&postcount=5
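To make the distinction concrete: the assumption concerns one gene's counts across samples, whose variance is modelled as mu + alpha*mu^2, with alpha the dispersion. The stdlib-only Python sketch below simulates such counts as a gamma-Poisson mixture (a standard way to generate negative binomial draws) and recovers alpha with a simple method-of-moments estimate. The function names, the mean of 50 and the dispersion of 0.1 are arbitrary illustrative choices, and this is not DESeq's dispersion estimator.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_gene(rng, mu, alpha, n_samples):
    """NB counts for ONE gene across n_samples, via a gamma-Poisson mixture.

    Gamma(shape=1/alpha, scale=mu*alpha) has mean mu and variance alpha*mu**2,
    so the Poisson counts have mean mu and variance mu + alpha*mu**2.
    """
    return [poisson_draw(rng, rng.gammavariate(1 / alpha, mu * alpha))
            for _ in range(n_samples)]


def moment_dispersion(counts):
    """Return (mean, alpha_hat) with alpha_hat = (var - mean) / mean**2, clipped at 0."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all-zero gene: dispersion undefined")
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, max((var - mean) / mean ** 2, 0.0)


rng = random.Random(1)
print(moment_dispersion(simulate_gene(rng, 50.0, 0.1, 5000)))  # many samples
print(moment_dispersion(simulate_gene(rng, 50.0, 0.1, 3)))     # three replicates
```

With thousands of simulated samples the estimate sits close to 0.1; with only three replicates it can land far from it or be clipped to 0, which is why checking the NB assumption gene by gene needs many samples.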



      • #4
        Thanks to both of you for the replies.

        Simon:

        You were right: in fact, we were trying to fit "all the per-gene count values from the same sample" to the NB distribution. Until we read your answer, everyone in our lab (and maybe in other groups) thought that this was the meaning of the statistical assumption made by DESeq.

        Given your answer, everything is OK with our analysis (or at least the opposite can't be tested) :P

        Thank you,
