Hi everybody!

How do we test whether RNA-seq data (reads per gene) fit a negative binomial distribution? We are currently using the following approach:

a) We used the R function goodfit() to estimate the parameters of the negative binomial distribution closest to our data.

b) We used the R function ks.test() (Kolmogorov-Smirnov test, KS) to compare our data with the negative binomial distribution estimated by goodfit().
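
A minimal sketch of steps a) and b), run on simulated counts rather than our real data. It assumes goodfit() is the one from the vcd package; the `rnbinom` parameters and the variable names are placeholders chosen only for illustration.

```r
# Sketch only: simulated counts stand in for the real reads-per-gene vector.
library(vcd)  # assumption: goodfit() is taken from the vcd package

set.seed(1)
counts <- rnbinom(1000, size = 2, mu = 20)  # placeholder data, integer vector

# a) maximum-likelihood fit of a negative binomial to the counts
fit <- goodfit(counts, type = "nbinomial", method = "ML")
size_hat <- fit$par$size
prob_hat <- fit$par$prob

# b) one-sample KS test against the fitted negative binomial CDF
#    (R warns about ties here because the counts are integers)
ks <- suppressWarnings(
  ks.test(counts, "pnbinom", size = size_hat, prob = prob_hat)
)
print(ks$p.value)
```

Note that the parameters passed to ks.test() are estimated from the same data being tested, which is part of what we are unsure about.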

Do we need to use the original reads-per-gene counts as input?

We weren't able to show that the reads per gene (as an integer vector) fit the negative binomial distribution. goodfit() was able to estimate the closest negative binomial parameters, but the p-value from the KS test was too low (the null hypothesis H0 that the distribution is negative binomial was rejected). However, when we used categorized data (we imposed 20 bins, each representing a range of reads per gene), the KS test did not reject the fit to the negative binomial distribution (p > 0.8).
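
For comparison, here is a hedged sketch of one way a binned check could be set up in base R. It is not the exact procedure described above: it swaps in a chi-square test on the bin counts (the KS test assumes a continuous distribution), uses method-of-moments estimates instead of goodfit(), and assumes roughly equal-count bins, since how our 20 ranges should be chosen is itself an open question. The data are simulated and all names are placeholders.

```r
# Sketch only: simulated counts, hypothetical names, equal-count bins assumed.
set.seed(1)
counts <- rnbinom(2000, size = 2, mu = 50)

# Method-of-moments estimates (a stand-in for goodfit(), needs var > mean)
m <- mean(counts)
v <- var(counts)
size_hat <- m^2 / (v - m)

# About 20 bins: edges at the 5%, 10%, ..., 95% sample quantiles
inner <- quantile(counts, probs = seq(0.05, 0.95, by = 0.05))
breaks <- unique(c(-1, as.numeric(inner), Inf))

# Observed counts per bin (a, b] and expected probabilities F(b) - F(a)
obs <- as.vector(table(cut(counts, breaks)))
p_exp <- diff(pnbinom(breaks, size = size_hat, mu = m))

# Chi-square statistic; two degrees of freedom are removed by hand
# because two parameters were estimated from the data
chi <- suppressWarnings(chisq.test(obs, p = p_exp))
df_adj <- length(obs) - 1 - 2
p_adj <- pchisq(unname(chi$statistic), df = df_adj, lower.tail = FALSE)
print(p_adj)
```

The result of a check like this depends on how the bins are drawn, so a high p-value with one binning does not by itself settle the question.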

Is our fit for the categorized data valid? And if it is ...

why are we unable to show that the original reads-per-gene vector fits the negative binomial distribution?

Thanks a lot

## Comment