One-Parameter Model

Binomial distribution

$$Y_i \mid \theta \sim \mathrm{Ber}(\theta) \\ \sum_{i=1}^n Y_i \,\Big|\, \theta \sim \mathrm{Binom}(n,\theta) \\ \theta \sim \mathrm{Beta}(a,b)$$

$$\theta \mid \text{data} \sim \mathrm{Beta}\!\left(a+\sum_{i=1}^n y_i,\; b+n-\sum_{i=1}^n y_i\right) \\ \tilde{y} \mid \theta \sim \mathrm{Ber}(\theta) \\ \tilde{y} \mid \text{data} \sim \mathrm{Ber}\!\left(\frac{a+\sum_{i=1}^n y_i}{a+b+n}\right)$$

The posterior predictive is Bernoulli because $P(\tilde{y}=1 \mid \text{data}) = E[\theta \mid \text{data}] = \frac{a+\sum y_i}{a+b+n}$, the posterior mean of $\theta$.

|           | Success        | Fail             | Number    |
|-----------|----------------|------------------|-----------|
| belief    | $a$            | $b$              | $a+b$     |
| data      | $\sum y_i$     | $n-\sum y_i$     | $n$       |
| posterior | $a+\sum y_i$   | $b+n-\sum y_i$   | $a+b+n$   |
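A minimal sketch of this update in code (the prior values `a, b` and the data `y` below are hypothetical, and `scipy` is assumed to be available):

```python
from scipy import stats

# Prior belief: theta ~ Beta(a, b); a = b = 1 is the uniform prior
a, b = 1, 1

# Hypothetical Bernoulli data: 1 = success, 0 = fail
y = [1, 0, 1, 1, 0, 1, 1]
n, s = len(y), sum(y)

# Posterior: theta | data ~ Beta(a + sum(y), b + n - sum(y))
posterior = stats.beta(a + s, b + n - s)

# Posterior predictive: P(y_tilde = 1 | data) equals the posterior mean
p_next = (a + s) / (a + b + n)
print(posterior.mean(), p_next)  # both equal (a + s) / (a + b + n)
```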

Poisson Distribution

📖 There is a book with $n$ pages.

$Y_i$: the number of typos on the $i$-th page; the total number of typos in the whole book is then $\sum Y_i$.

Let $\theta$ be the expected number of typos per page, so $E(Y_i)=\theta$; assuming $E(Y_i)=2$, for example, means an average of two typos per page.

$$\hat{\theta} = \frac{\sum Y_i}{n} \\ Y_i \mid \theta \sim \mathrm{Poi}(\theta) \\ \sum Y_i \,\Big|\, \theta \sim \mathrm{Poi}(n\theta)$$

The sum of the $Y_i$ is also Poisson because, under the i.i.d. assumption, the typo counts on different pages are independent. As a prior, we assume $\theta$ follows a Gamma distribution:

$$\theta \sim \mathrm{Gamma}(a,b) \\ \theta \mid \text{data} \sim \mathrm{Gamma}\!\left(a+\sum y_i,\; b+n\right)$$

That $\theta \mid \text{data}$ is Gamma follows from a short calculation: $p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\,p(\theta) \propto \theta^{\sum y_i}e^{-n\theta}\cdot\theta^{a-1}e^{-b\theta} = \theta^{a+\sum y_i-1}e^{-(b+n)\theta}$, which is the kernel of $\mathrm{Gamma}(a+\sum y_i,\, b+n)$.

With the prior and the posterior in hand, we have a complete Bayesian Poisson model.

|           | Typo           | Page    | Typo per page               |
|-----------|----------------|---------|-----------------------------|
| prior     | $a$            | $b$     | $a/b$                       |
| data      | $\sum y_i$     | $n$     | $\sum y_i / n$              |
| posterior | $a+\sum y_i$   | $b+n$   | $\frac{a+\sum y_i}{b+n}$    |

  • $a+\sum y_i$: the number of typos we have effectively already seen (prior pseudo-counts plus observed typos).

  • $b+n$: the number of pages we have effectively already seen (prior pseudo-pages plus observed pages).
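A minimal sketch of the Gamma-Poisson update (hypothetical prior values and typo counts; note that `scipy` parameterizes the Gamma by shape and scale = 1/rate):

```python
from scipy import stats

# Prior: theta ~ Gamma(a, b) with shape a and rate b -- illustrative values
a, b = 2, 1

# Hypothetical typo counts for n observed pages
y = [1, 3, 0, 2, 2]
n, total = len(y), sum(y)

# Posterior: theta | data ~ Gamma(a + sum(y), b + n)
posterior = stats.gamma(a + total, scale=1 / (b + n))
print(posterior.mean())  # (a + total) / (b + n)
print(posterior.var())   # (a + total) / (b + n) ** 2
```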

1) What is the expected number of typos per page? - Posterior Moments

$$\theta \mid \text{data} \sim \mathrm{Gamma}\!\left(a+\sum y_i,\; b+n\right) \\ \mu = \frac{a+\sum y_i}{b+n} = \frac{b}{b+n}\cdot\frac{a}{b} + \frac{n}{b+n}\cdot\frac{\sum y_i}{n} \\ \sigma^2 = \frac{a+\sum y_i}{(b+n)^2}$$

The posterior mean is thus a weighted average of the prior mean $a/b$ and the sample mean $\sum y_i/n$, with weights proportional to $b$ (prior pseudo-pages) and $n$ (observed pages).
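The weighted-average identity is easy to verify numerically (continuing the hypothetical numbers from the sketch above):

```python
a, b = 2, 1
y = [1, 3, 0, 2, 2]
n, total = len(y), sum(y)

prior_mean = a / b          # 2.0
sample_mean = total / n     # 1.6
w = b / (b + n)             # weight on the prior information
posterior_mean = w * prior_mean + (1 - w) * sample_mean

assert abs(posterior_mean - (a + total) / (b + n)) < 1e-12
print(posterior_mean)       # 1.666... = (2 + 8) / (1 + 5)
```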

2) What is the distribution of a new observation? - Posterior Prediction

$$\tilde{Y} \mid \text{data} \sim \mathrm{NB}\!\left(a+\sum y_i,\; p\right), \qquad p=\frac{1}{b+n+1}$$

This follows from the calculation below.

$$p(\tilde{y} \mid \text{data}) = \int p(\tilde{y},\theta \mid \text{data})\,d\theta = \int p(\tilde{y} \mid \theta,\text{data})\,p(\theta \mid \text{data})\,d\theta = \int p(\tilde{y} \mid \theta)\,p(\theta \mid \text{data})\,d\theta, \qquad \tilde{Y} \mid \theta \sim \mathrm{Poi}(\theta)$$
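Carrying out the integral (a step left implicit above) produces the negative binomial pmf:

$$\begin{aligned} p(\tilde{y}\mid\text{data}) &= \int_0^\infty \frac{\theta^{\tilde{y}}e^{-\theta}}{\tilde{y}!}\cdot\frac{(b+n)^{a+\sum y_i}}{\Gamma(a+\sum y_i)}\,\theta^{a+\sum y_i-1}e^{-(b+n)\theta}\,d\theta \\ &= \frac{(b+n)^{a+\sum y_i}}{\tilde{y}!\,\Gamma(a+\sum y_i)}\cdot\frac{\Gamma\!\left(a+\sum y_i+\tilde{y}\right)}{(b+n+1)^{a+\sum y_i+\tilde{y}}} \\ &= \binom{\tilde{y}+a+\sum y_i-1}{\tilde{y}}\left(\frac{1}{b+n+1}\right)^{\tilde{y}}\left(\frac{b+n}{b+n+1}\right)^{a+\sum y_i} \end{aligned}$$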

The likelihood of a new observation is the same as that of the observed data: given $\theta$, $\tilde{Y}$ is conditionally independent of the data.

NB (Negative Binomial) is the distribution of the number of successes before the $r$-th failure, where each trial succeeds with probability $p$. Here $\tilde{Y}$ is the number of successes before $a+\sum y_i$ failures, and a "success" means another typo occurring on the new page.

We can also derive the mean and variance from the posterior predictive, as in the sketch below. A drawback of Poisson modeling, however, is overdispersion: in real data the variance is often much larger than the mean, whereas the Poisson forces both to equal $\theta$. In such cases an NB model or a hierarchical Normal model can be used instead.
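A sketch of the posterior predictive in code (same hypothetical numbers; `scipy`'s `nbinom` counts failures before the `r`-th success with success probability `p_s`, so our $\mathrm{NB}(r,\,p=\frac{1}{b+n+1})$ maps to `p_s = (b+n)/(b+n+1)`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2, 1
y = [1, 3, 0, 2, 2]
n, total = len(y), sum(y)
r = a + total  # NB size parameter

# Exact posterior predictive as a negative binomial
pred = stats.nbinom(r, (b + n) / (b + n + 1))

# Monte Carlo check: theta ~ posterior Gamma, then y_tilde | theta ~ Poisson
theta = rng.gamma(shape=r, scale=1 / (b + n), size=200_000)
y_tilde = rng.poisson(theta)

print(pred.mean(), y_tilde.mean())  # both ~ (a + total) / (b + n)
print(pred.var(), y_tilde.var())    # variance exceeds the mean: overdispersion
```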

Exponential Families

$$f(x;\theta)=\begin{cases} \exp[\,p(\theta)K(x)+s(x)+q(\theta)\,], & x\in S \\ 0, & \text{otherwise} \end{cases}$$

Under the following conditions:

  1. $S$ does not depend on $\theta$

  2. $p(\theta)$ is a nontrivial continuous function of $\theta \in \Omega$

  3. If $X$ is continuous, $K'(x) \neq 0$ and $s(x)$ is a continuous function; if $X$ is discrete, $K(x)$ is a nontrivial function

Equivalently, in natural-parameter form with $\phi = p(\theta)$:

$$f(y \mid \phi)=\begin{cases} h(y)\,c(\phi)\exp[\,\phi K(y)\,], & y\in S \\ 0, & \text{otherwise} \end{cases}$$

Under the following conditions:

  1. $S$ does not depend on $\theta$

  2. $\phi$ is a nontrivial continuous function of $\theta \in \Omega$

  3. If $Y$ is continuous, $K'(y) \neq 0$ and $h(y)$ is a continuous function; if $Y$ is discrete, $K(y)$ is a nontrivial function

If a probability density/mass function can be written in the form above, the distribution belongs to an exponential family. Most well-known distributions are members of exponential families.
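As a concrete check (worked out here, not stated in the original), the Poisson pmf fits the second form with natural parameter $\phi=\log\theta$:

$$f(y\mid\theta)=\frac{\theta^{y}e^{-\theta}}{y!}=\underbrace{\frac{1}{y!}}_{h(y)} \cdot \underbrace{e^{-e^{\phi}}}_{c(\phi)} \cdot \exp[\,\phi\, \underbrace{y}_{K(y)}\,]$$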

Sufficient statistic for $\theta$

  • $Y_1=u_1(X_1,\dots,X_n)$ is a sufficient statistic for $\theta$.

  • $p(X_1,X_2,\dots,X_n \mid Y_1=y_1)$ does not depend on $\theta$ (the parameter).

  • Our statistic fully captures the information the data carry about the parameter.

  • If $X_1,\dots,X_n \overset{\text{iid}}{\sim} f(x;\theta)$ and $f$ belongs to an exponential family, then $K(\mathbf{x})=\sum K(x_i)$ is a sufficient statistic for $\theta$.

The first three statements are equivalent characterizations of sufficiency; the fourth specializes them to exponential families.
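For instance (an illustration added here), for i.i.d. Poisson data the factorization theorem exhibits $\sum x_i$ as sufficient, in agreement with the last bullet since $K(x)=x$ for the Poisson:

$$p(x_1,\dots,x_n;\theta)=\prod_{i=1}^n \frac{\theta^{x_i}e^{-\theta}}{x_i!} = \underbrace{\theta^{\sum x_i}e^{-n\theta}}_{g\left(\sum x_i,\;\theta\right)} \cdot \underbrace{\frac{1}{\prod_i x_i!}}_{h(x_1,\dots,x_n)}$$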

If the posterior distribution $p(\theta \mid x)$ is in the same probability distribution family as the prior distribution $p(\theta)$, the prior and posterior are called conjugate distributions (from Wikipedia).

$$f(y_1,\dots,y_n \mid \phi)=\prod_{i=1}^n h(y_i)\,c(\phi)\,e^{\phi K(y_i)} \propto c(\phi)^n\, e^{\phi \sum K(y_i)} \\ p(\phi)=k(n_0,t_0)\,c(\phi)^{n_0}e^{n_0 t_0 \phi} \propto c(\phi)^{n_0}e^{n_0 t_0 \phi} \\ p(\phi \mid y) \propto p(\phi)\,f(y \mid \phi) \propto c(\phi)^{n_0+n}\exp\!\left[\phi\left(n_0 t_0 + n\,\frac{\sum K(y_i)}{n}\right)\right]$$

This has the same form as the prior, with updated parameters $n_0 \to n_0+n$ and $n_0 t_0 \to n_0 t_0 + \sum K(y_i)$.
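To connect this back to the Poisson model (a check worked out here, not in the original): with $\phi=\log\theta$, $K(y)=y$, and $c(\phi)=e^{-e^{\phi}}$, the generic conjugate prior on $\phi$ is exactly a Gamma prior on $\theta$:

$$p(\phi)\propto c(\phi)^{n_0}e^{n_0 t_0 \phi}=\exp\!\left(-n_0 e^{\phi}+n_0 t_0 \phi\right), \qquad \theta=e^{\phi}: \quad p(\theta)=p(\phi)\left|\frac{d\phi}{d\theta}\right| \propto \theta^{n_0 t_0-1}e^{-n_0\theta}$$

That is, $\theta \sim \mathrm{Gamma}(n_0 t_0,\, n_0)$, so the earlier $\mathrm{Gamma}(a,b)$ prior corresponds to $a=n_0 t_0$ and $b=n_0$.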
