One Parameter Model
Binomial distribution
|                       | Success | Fail    | Total |
| --------------------- | ------- | ------- | ----- |
| Prior belief about θ  | a       | b       | a+b   |
| Data                  | Σy      | n−Σy    | n     |
| Posterior             | a+Σy    | b+n−Σy  | a+b+n |
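In formula form, the table summarizes the usual Beta-Binomial conjugate update (reading the a, b row as a Beta(a, b) prior on θ, which is what the pseudo-counts in the table imply):

$$
\theta \sim \mathrm{Beta}(a, b), \qquad
\Sigma y \mid \theta \sim \mathrm{Binomial}(n, \theta)
\;\;\Rightarrow\;\;
\theta \mid \text{data} \sim \mathrm{Beta}\big(a + \Sigma y,\; b + n - \Sigma y\big).
$$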
Poisson Distribution
📖 There is a book with N pages.
Yi: the number of typos on the i-th page, so the total number of typos in the whole book is ΣYi.
Let θ be the expected number of typos per page, i.e. E(Yi) = θ.
Under the i.i.d. assumption the typo counts on different pages are independent, so the total ΣYi also follows a Poisson distribution. For the prior, we assume θ follows a Gamma(a, b) distribution.
θ ∣ data then also follows a Gamma distribution, as the derivation below shows.
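The original derivation is not reproduced here; a sketch of the standard conjugacy argument, assuming the Gamma(a, b) prior in the shape-rate parameterization (which matches the prior mean a/b in the table below):

$$
p(\theta \mid y) \;\propto\; p(y \mid \theta)\, p(\theta)
\;\propto\; \underbrace{\theta^{\Sigma y_i} e^{-n\theta}}_{\text{Poisson likelihood}}
\;\cdot\; \underbrace{\theta^{a-1} e^{-b\theta}}_{\text{Gamma}(a,b)\ \text{prior}}
\;=\; \theta^{\,a + \Sigma y_i - 1}\, e^{-(b+n)\theta},
$$

so θ ∣ data ∼ Gamma(a + Σyi, b + n).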

With the prior and posterior distributions in hand, we can build a Bayesian Poisson (Gamma-Poisson) model.
|           | Typos  | Pages | Typos per page   |
| --------- | ------ | ----- | ---------------- |
| Prior     | a      | b     | a/b              |
| Data      | Σy     | n     | Σy/n             |
| Posterior | a+Σy   | b+n   | (a+Σy)/(b+n)     |
a+Σy: the total number of typos we effectively know about (prior pseudo-counts plus observed typos).
b+n: the total number of pages we effectively know about (prior pseudo-pages plus observed pages).
1) What is the expected number of typos per page? - Posterior Moments
The posterior mean of the Gamma distribution is a weighted average of the prior mean and the data mean, as shown below.
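Concretely, using the Gamma(a+Σyi, b+n) posterior from above:

$$
E[\theta \mid \text{data}] = \frac{a + \Sigma y_i}{b + n}
= \frac{b}{b+n}\cdot\frac{a}{b} \;+\; \frac{n}{b+n}\cdot\frac{\Sigma y_i}{n},
$$

i.e. the prior mean a/b and the sample mean Σyi/n are weighted by the prior pseudo-pages b and the observed pages n.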
2) What is the distribution of a new observation? - Posterior Prediction
It follows from the derivation sketched below.
The likelihood of a new observation has the same form as that of the observed data.
NB (Negative Binomial) counts the number of successes before the r-th failure, where each trial is a success with probability p. Here, with r = a+Σyi failures, Ỹ is the number of successes, and a "success" means a typo occurring on the new page.
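A sketch of the posterior predictive calculation, again assuming the Gamma(a+Σyi, b+n) posterior above:

$$
p(\tilde y \mid y)
= \int_0^\infty \mathrm{Poisson}(\tilde y \mid \theta)\,
  \mathrm{Gamma}\big(\theta \mid a+\Sigma y_i,\; b+n\big)\, d\theta
= \binom{\tilde y + a + \Sigma y_i - 1}{\tilde y}
  \left(\frac{b+n}{b+n+1}\right)^{a+\Sigma y_i}
  \left(\frac{1}{b+n+1}\right)^{\tilde y},
$$

which is a Negative Binomial distribution with r = a+Σyi and success probability 1/(b+n+1) per trial.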
We can also derive the mean and variance from the posterior predictive distribution. However, Poisson modeling has a drawback: data are often overdispersed, i.e. the variance of the data is much larger than the mean θ (the Poisson model forces mean and variance to be equal). In that case we can use an NB model or a hierarchical normal model instead.
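As a quick numerical illustration, here is a minimal sketch with hypothetical prior values a = 2, b = 1 and a made-up vector of per-page typo counts; scipy's gamma and nbinom parameterizations are mapped to the shape-rate and failure-count conventions used above:

```python
import numpy as np
from scipy import stats

# Hypothetical prior: Gamma(a, b) in shape-rate form (prior mean a/b typos per page).
a, b = 2.0, 1.0

# Hypothetical data: typo counts on n observed pages.
y = np.array([1, 0, 3, 2, 1, 0, 2])
n, sum_y = len(y), y.sum()

# Conjugate update: theta | data ~ Gamma(a + sum_y, b + n).
post_shape, post_rate = a + sum_y, b + n
posterior = stats.gamma(post_shape, scale=1.0 / post_rate)  # scipy uses scale = 1/rate
print("posterior mean:", posterior.mean())                  # (a + sum_y) / (b + n)

# Posterior mean as a weighted average of prior mean and sample mean.
weighted = (b / (b + n)) * (a / b) + (n / (b + n)) * (sum_y / n)
print("weighted average:", weighted)

# Posterior predictive for a new page's typo count is Negative Binomial:
# p(y_new) = C(y_new + r - 1, y_new) * q**r * (1 - q)**y_new
# with r = a + sum_y and q = (b + n) / (b + n + 1), i.e. scipy.stats.nbinom(n=r, p=q).
r = post_shape
q = (b + n) / (b + n + 1)
predictive = stats.nbinom(r, q)
print("predictive mean:", predictive.mean())  # equals the posterior mean
print("predictive var :", predictive.var())   # strictly larger than the mean
```

The predictive variance exceeds the predictive mean, which is how the Gamma-Poisson (equivalently NB) model accommodates extra dispersion relative to a plain Poisson.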
Exponential Families
Under the following conditions:
The support S does not depend on θ
p(θ) is a nontrivial continuous function of θ∈Ω
If X is continuous, K′(x) ≢ 0 and s(x) is a continuous function of x; if X is discrete, K(x) is a nontrivial function of x
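The density these conditions refer to is not reproduced here; assuming the usual regular-exponential-class form that matches the symbols p(θ), K(x), s(x), and support S, it would be

$$
f(x;\theta) = \exp\!\big[\,p(\theta)K(x) + s(x) + q(\theta)\,\big],
\qquad x \in S,\; \theta \in \Omega,
$$

where q(θ) stands for the normalizing term (the exact symbol in the original notes may differ).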
Similarly, under the following conditions:
The support S does not depend on θ
ϕ is a nontrivial continuous function of θ∈Ω
If Y is continuous, K′(y) ≢ 0 and h(y) is a continuous function of y; if Y is discrete, K(y) is a nontrivial function of y
If a probability density/mass function can be written in the form above, the distribution belongs to an exponential family. Most well-known distributions are members of exponential families.
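For instance, the Poisson pmf used above can be put in this form (a standard rewriting, not taken from the original notes):

$$
f(x;\theta) = \frac{\theta^{x} e^{-\theta}}{x!}
= \exp\!\big[\,x \log\theta \;-\; \log x! \;-\; \theta\,\big],
\qquad x = 0, 1, 2, \ldots,
$$

so p(θ) = log θ, K(x) = x, s(x) = −log x!, and q(θ) = −θ.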
Sufficient statistic for θ
Y1 = u1(X1, ⋯, Xn) is a sufficient statistic for θ.
p(X1, X2, ⋯, Xn ∣ Y1 = y1) does not depend on θ (the parameter).
In other words, the statistic captures all the information about the parameter that the data contain.
If X1, ⋯, Xn ∼ iid f(x;θ) and f belongs to an exponential family, then Y1 = ΣK(Xi) is a sufficient statistic for θ.
These four statements are equivalent.
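A sketch of why the exponential-family case gives sufficiency, via the factorization theorem and the form assumed above:

$$
\prod_{i=1}^{n} f(x_i;\theta)
= \exp\!\Big[\,p(\theta)\sum_{i=1}^{n}K(x_i) + \sum_{i=1}^{n}s(x_i) + n\,q(\theta)\,\Big]
= \underbrace{\exp\!\big[\,p(\theta)\,y_1 + n\,q(\theta)\,\big]}_{g(y_1;\,\theta)}
\;\cdot\;
\underbrace{\exp\!\Big[\sum_{i=1}^{n} s(x_i)\Big]}_{h(x_1,\ldots,x_n)},
$$

with y1 = ΣK(xi), so the joint density factors into a part depending on the data only through y1 and a part free of θ; hence Y1 = ΣK(Xi) is sufficient for θ.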
If the posterior distribution p(θ∣x) is in the same probability distribution family as the prior distribution p(θ), the prior and posterior are called conjugate distributions (Wikipedia).
That is, the posterior has the same form as the prior; the Beta-Binomial and Gamma-Poisson models above are both examples of conjugate pairs.