(1) p(y∣θ) is the sampling distribution: the distribution of the data y given the parameter θ. Like frequentists, Bayesians assume a specific form for this distribution.
(2) p(θ) is the prior distribution of θ. From the Bayesian point of view, the parameter is not a constant but a random variable.
(3) p(θ∣y) is the posterior distribution: it reflects how strongly we believe in each parameter value after observing the data.
Example
We want to investigate the infection rate (θ) in a small city. This rate affects public health policy. Suppose we sample only 20 people.
Parameter and sample space
θ ∈ Θ = [0, 1], y ∈ Y = {0, 1, ..., 20}
The parameter θ takes values in the parameter space, here [0, 1]. The data y is the number of infected people among the 20 sampled.
Sampling model
Y∣θ ∼ binomial(20, θ)
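As a quick sketch of this sampling model (assuming scipy is available, and picking a hypothetical value θ = 0.10 just for illustration):

```python
from scipy.stats import binom

# Sampling model: Y | theta ~ Binomial(20, theta).
n, theta = 20, 0.10  # theta = 0.10 is a hypothetical value for illustration

# Probability of each possible count y = 0, 1, ..., 20:
pmf = binom.pmf(range(n + 1), n, theta)

# e.g. the probability of observing exactly 3 infected people out of 20:
print(binom.pmf(3, n, theta))  # ≈ 0.190
```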
Prior distribution
The prior encodes what we know about the parameter from previous research. Suppose earlier studies suggest the infection rate lies in the range (0.05, 0.20), with a mean rate of 0.10. The prior distribution should then place most of its mass in (0.05, 0.20), with expectation close to 0.10.
Many distributions satisfy these conditions; we choose one that multiplies conveniently with the sampling distribution (a property called conjugacy).
θ∼beta(2,20)
E[θ] = 0.09, Pr(0.05 < θ < 0.20) = 0.66
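These two summaries can be checked directly (a sketch assuming scipy is available):

```python
from scipy.stats import beta

# Prior: theta ~ beta(2, 20)
a, b = 2, 20
prior = beta(a, b)

# Prior mean: a / (a + b) = 2/22
print(prior.mean())                        # ≈ 0.09

# Prior probability that theta lies in (0.05, 0.20):
print(prior.cdf(0.20) - prior.cdf(0.05))   # ≈ 0.66
```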
The sum a + b of the beta parameters measures how strongly we believe the prior: increasing a + b while keeping the mean a/(a + b) fixed concentrates the distribution around its mean, expressing a stronger prior. Accordingly, Pr(0.05 < θ < 0.20) grows as the sum gets bigger.
Posterior distribution
We update our information about the parameter by multiplying the prior distribution by the sampling density and normalizing (Bayes' rule): p(θ∣y) ∝ p(y∣θ) p(θ).
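Because the beta prior is conjugate to the binomial likelihood, this multiplication has a closed form: with a beta(a, b) prior and y infections observed out of n, the posterior is again beta, namely θ∣y ∼ beta(a + y, b + n − y). A sketch of the update (assuming scipy, with a hypothetical observation of y = 3 infected people):

```python
from scipy.stats import beta

a, b = 2, 20   # prior: theta ~ beta(2, 20)
n, y = 20, 3   # hypothetical data: 3 infected out of 20 sampled

# Conjugate update: theta | y ~ beta(a + y, b + n - y)
posterior = beta(a + y, b + n - y)

# Posterior mean (a + y) / (a + b + n) = 5/42, pulled from the
# prior mean 0.09 toward the sample proportion 3/20 = 0.15:
print(posterior.mean())  # ≈ 0.119
```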