# The Effective Number of Parameters


The more complex a model is, the more parameters it has. Many parameters help reduce training error, but **the model becomes vulnerable to overfitting.** In this case, we can lower model complexity by **imposing a penalty** on the model. With this penalty term, we keep only the betas (parameters) that are truly necessary. How is the number of necessary parameters decided? It depends on the properties of the matrix that multiplies the output vector.

The degrees of freedom are related to the eigenvalues of the smoother matrix.

> <https://stats.stackexchange.com/questions/114434/what-is-meant-by-effective-parameters-in-machine-learning>

$$
\hat{y}=Sy \\
df(S)=trace(S)
$$

> Smoother matrix: <https://math.stackexchange.com/questions/2784061/how-to-interpret-the-smoother-matrix>
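As an illustrative sketch (not from the text), the effective degrees of freedom can be computed directly as the trace of the smoother matrix. Here the penalty matrix $$\Omega = I$$ (i.e., ridge regression) and the synthetic data are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

# Ridge-style smoother: S = X (X^T X + lambda * Omega)^{-1} X^T, with Omega = I
def effective_df(X, lam):
    p = X.shape[1]
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(S)

# With lambda = 0 the smoother is a projection, so df equals p = 5;
# increasing lambda shrinks the effective number of parameters below p.
print(effective_df(X, 0.0))    # ~5.0
print(effective_df(X, 10.0))   # < 5
```

This makes the penalty's role concrete: the same design matrix yields fewer effective parameters as $$\lambda$$ grows.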

The degrees of freedom of a matrix equal the number of independent column vectors (equivalently, the number of non-zero eigenvalues). We can eigendecompose the smoother matrix because it is a symmetric matrix.

$$
S=X(X^TX+\lambda\Omega)^{-1}X^T =UDU^T \\
SS \preceq S \quad (\text{consequence of shrinkage nature})\\
trace(S)=trace(UDU^T)=trace(UU^TD)=trace(D)
$$

If the matrix is a projection matrix rather than a smoother matrix, $$trace(D)$$ becomes $$rank(S)$$, because a projection (idempotent) matrix has only 0 or 1 as eigenvalues. Since $$SS$$ is always smaller than $$S$$ (in the positive semidefinite sense), each eigenvalue lies between 0 and 1. I think this is the reason why $$df(S)=trace(S)$$.
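A quick numerical check of this claim, using a synthetic ridge smoother (the data and $$\Omega = I$$ are assumptions for illustration): the eigenvalues of $$S$$ lie in $$[0,1]$$, and their sum equals the trace:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
lam = 2.0

# Smoother matrix with identity penalty
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(4), X.T)

# S is symmetric, so eigendecompose it; shrinkage keeps eigenvalues in [0, 1]
eigvals = np.linalg.eigvalsh(S)
print(eigvals.min(), eigvals.max())   # all within [0, 1]
print(np.trace(S), eigvals.sum())     # trace(S) equals the eigenvalue sum
```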

If $$S$$ is an orthogonal-projection matrix with M parameters, then $$trace(S)=M=df(S)$$. This quantity replaces d in the Cp statistic.
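For instance, the OLS hat matrix is an orthogonal projection; a small sketch on synthetic data (my own illustration) confirms that it is idempotent and that its trace equals the number of parameters M:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
X = rng.normal(size=(20, M))

# OLS hat matrix: an orthogonal projection (idempotent, H @ H == H)
H = X @ np.linalg.solve(X.T @ X, X.T)

# Eigenvalues are (numerically) 0 or 1, so trace(H) = rank(H) = M
print(np.round(np.trace(H)))   # 3.0
```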

$$
Y=f(X)+\epsilon \quad (\text{additive model}) \\
Var(\epsilon)=\sigma^2_\epsilon
$$

Under the above assumption, the following expression holds.

$$
\Sigma^n_{i=1}Cov(\hat{y}_i,y_i)=trace(S)\sigma^2_\epsilon \\
df(\hat{y})=\dfrac{\Sigma^n_{i=1} Cov(\hat{y}_i,y_i)}{\sigma^2_\epsilon}
$$
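This covariance definition of degrees of freedom can be checked by Monte Carlo simulation. The sketch below (synthetic data and a plain linear fit with 3 parameters, both my assumptions) estimates $$\Sigma_i Cov(\hat{y}_i,y_i)/\sigma^2_\epsilon$$ and compares it with $$trace(S)$$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 40, 1.0
X = rng.normal(size=(n, 3))
f = X @ np.array([1.0, -2.0, 0.5])     # fixed true function values f(X)
H = X @ np.linalg.solve(X.T @ X, X.T)  # linear smoother, trace(H) = 3

# Simulate many noise draws and estimate sum_i Cov(yhat_i, y_i) / sigma^2
reps = 5000
ys = f + sigma * rng.normal(size=(reps, n))
yhats = ys @ H.T
cov_sum = ((ys - ys.mean(0)) * (yhats - yhats.mean(0))).sum(0).sum() / (reps - 1)
print(cov_sum / sigma**2)   # close to trace(H) = 3
```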

$$
BIC = -2\,loglik+(\log N)\,d
$$
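As a hedged sketch of how this formula is applied: for a Gaussian linear model, $$-2\,loglik$$ reduces to $$N\log(RSS/N)$$ up to a constant, so BIC can be computed from the residuals and the parameter count d (the helper name `bic` is my own):

```python
import numpy as np

# Sketch, assuming a Gaussian model: -2 loglik = N * log(RSS / N) + const
def bic(y, yhat, d):
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + np.log(n) * d

rng = np.random.default_rng(4)
y = rng.normal(size=100)
yhat = np.zeros(100)
# Each extra parameter costs log(N), BIC's per-parameter penalty
print(bic(y, yhat, 2) - bic(y, yhat, 1))   # log(100) ≈ 4.605
```

Note the penalty $$\log N$$ exceeds the constant 2 used by AIC/Cp for $$N > e^2$$, which is why BIC tends to pick sparser models.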

### VC dimension

The Vapnik-Chervonenkis theory provides such a general measure of complexity, and gives associated bounds on the optimism. It measures complexity by assessing how wiggly the members of a function class can be.

> The VC dimension of the class {f(x,alpha)} is defined to  be the largest number of points that can be shattered by members of {f(x,alpha)}

If members of the class can perfectly separate some configuration of three points under every possible labeling into two classes (all $$2^3$$ labelings), but no configuration of four points can be shattered, the VC dimension of the class is 3 (the number of points).

> <https://keepmind.net/%EA%B8%B0%EA%B3%84%ED%95%99%EC%8A%B5-vc-dimension/>
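The shattering claim can be verified by brute force. The sketch below (a simple perceptron, my own illustration rather than anything from the text) checks that affine classifiers in the plane realize all $$2^3$$ labelings of three non-collinear points:

```python
import itertools
import numpy as np

# Three non-collinear points in the plane, with a bias column appended
# so that w @ x represents an affine classifier w1*x1 + w2*x2 + b
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = np.hstack([pts, np.ones((3, 1))])

def separable(labels, epochs=1000):
    # Perceptron training: converges iff the labeling is linearly separable
    w = np.zeros(3)
    for _ in range(epochs):
        done = True
        for x, y in zip(X, labels):
            if y * (w @ x) <= 0:
                w += y * x
                done = False
        if done:
            return True
    return False

# All 2^3 = 8 labelings are separable: lines shatter these three points,
# so the VC dimension of affine classifiers in the plane is at least 3
print(all(separable(np.array(s)) for s in itertools.product([-1, 1], repeat=3)))
# True
```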

$$
Err_\mathcal{T}\leq \bar{err}+\frac{\epsilon}{2}\left(1+\sqrt{1+\frac{4\,\bar{err}}{\epsilon}}\right) \quad (\text{binary classification}) \\
Err_\mathcal{T} \leq \frac{\bar{err}}{(1-c\sqrt{\epsilon})_+} \quad (\text{regression})
$$

