# Association Rule

**Goal**: Find the joint values of $$X=(X\_1,...,X\_p)$$ that appear most frequently in the data. This can be viewed as finding a collection of prototype values $$v\_1,...,v\_L$$ of $$X$$ such that the probability density $$P(v\_l)$$ evaluated at each of those values is relatively large.

In most cases $$X\_j\in \lbrace 0,1 \rbrace$$, and the problem is referred to as "market basket" analysis. For observation $$i$$, each variable $$X\_j$$ is assigned one of two values: $$x\_{ij}=1$$ if the $$j$$th item is purchased, and $$x\_{ij}=0$$ otherwise. With the goal stated this way, the number of observations for which $$X=v\_l$$ will nearly always be too small for reliable estimation, so the goal needs to be modified as follows.
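A small sketch may help show why exact joint values are hard to estimate. The toy baskets below are hypothetical and made up for illustration; the point is only that with $$p$$ binary items there are $$2^p$$ possible joint values, so any single exact basket is rarely repeated even when individual items are common.

```python
from collections import Counter

# Hypothetical toy data: 6 baskets over 5 items (1 = item purchased).
baskets = [
    (1, 1, 0, 0, 1),
    (1, 0, 1, 0, 0),
    (0, 1, 1, 0, 1),
    (1, 1, 0, 1, 0),
    (1, 1, 1, 0, 0),
    (0, 1, 0, 0, 1),
]

p = len(baskets[0])
n_possible = 2 ** p  # number of distinct joint values of a binary p-vector

# How often does each exact joint value x = (x_1, ..., x_p) occur?
exact_counts = Counter(baskets)
max_exact = max(exact_counts.values())

# How often does each single item occur across the baskets?
item_counts = [sum(row[j] for row in baskets) for j in range(p)]

print("possible joint values:", n_possible)
print("largest count of any exact basket:", max_exact)
print("per-item counts:", item_counts)
```

In this toy data no exact basket occurs more than once, while several individual items occur in at least half of the baskets. This is the motivation for looking at regions of the space rather than single points.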

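**Modified Goal**: Instead of seeking values $$x$$ where $$P(x)$$ is large, we seek regions of the $$X$$-space with high probability content relative to their size or support. Let $$S\_j$$ be the set of all possible values of $$X\_j$$ and let $$s\_j \subseteq S\_j$$ be a subset of those values. The modified goal is to find subsets of variable values $$s\_1,...,s\_p$$ such that the probability of each variable simultaneously taking a value within its respective subset is relatively large: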

$$
P\Big\[\bigcap^p\_{j=1} (X\_j \in s\_j)\Big]
$$

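The intersection of subsets $$\bigcap^p\_{j=1} (X\_j \in s\_j)$$ is called a conjunctive rule. For quantitative $$X\_j$$, the subsets $$s\_j$$ are contiguous intervals. The problem is simplified further by requiring each $$s\_j$$ to be either a single value $$v\_{0j}$$ or the entire set of values of $$X\_j$$. The goal then becomes finding a subset of the variables $$J\subset \lbrace 1,...,p \rbrace$$ and corresponding values $$v\_{0j}$$, $$j \in J$$, such that the following probability is large: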

$$
P\Big\[\bigcap\_{j \in J} (X\_j=v\_{0j})\Big]
$$

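This can be turned into a problem involving only binary variables by using dummy variables: a variable $$Z\_k$$ is created for each value attainable by each original variable $$X\_j$$, with $$Z\_k=1$$ if the associated variable takes that value and $$Z\_k=0$$ otherwise. The number of dummy variables is $$P=\sum^p\_{j=1}|S\_j|$$ (a count, not a probability), where $$|S\_j|$$ is the number of distinct values attainable by $$X\_j$$. The goal is to find a subset $$K\subset \lbrace 1,...,P \rbrace$$, called an item set, such that the following probability is large: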

$$
P\Big\[\bigcap\_{k\in K} (Z\_k=1)\Big] =P\Big\[\prod\_{k \in K} Z\_k=1\Big]
$$

This probability is estimated from the $$N$$ observations by

$$
\widehat{P}\Big\[\prod\_{k \in K} (Z\_k=1) \Big]=\dfrac{1}{N} \sum^N\_{i=1} \prod\_{k \in K} z\_{ik}
$$
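The estimate is the fraction of observations in which every item of the item set $$K$$ equals 1. A minimal sketch of the dummy-variable encoding and this estimate follows; the data, the helper names `dummy_encode` and `support`, and the column ordering are assumptions made for illustration, not part of the original notes.

```python
from math import prod

# Hypothetical categorical data: each row is one observation (X_1, X_2).
observations = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
]

def dummy_encode(rows):
    """Create one binary variable Z_k per (variable index j, value) pair."""
    # Sorted so that the column order is deterministic.
    columns = sorted({(j, v) for row in rows for j, v in enumerate(row)})
    z = [[1 if row[j] == v else 0 for (j, v) in columns] for row in rows]
    return columns, z

def support(z, item_set):
    """Estimate P[prod_{k in K} Z_k = 1] as (1/N) * sum_i prod_{k in K} z_ik."""
    return sum(prod(row[k] for k in item_set) for row in z) / len(z)

columns, z = dummy_encode(observations)

# Item set K = {X_1 = "red", X_2 = "small"}, given as dummy column indices.
K = [columns.index((0, "red")), columns.index((1, "small"))]

print(columns)        # one column per attainable value, |S_1| + |S_2| in total
print(support(z, K))  # fraction of rows where both dummies equal 1
```

Here two of the four observations are both "red" and "small", so the estimated probability for this item set is 0.5. The product inside `support` is 1 only when all selected dummies are 1, which mirrors the product over $$k \in K$$ in the formula.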

### Market Basket Analysis

