Association Rule

Goal: Find the joint values of $X=(X_1,\dots,X_p)$ that appear most frequently in the data. Equivalently, find a collection of prototype values $v_1,\dots,v_L$ of $X$ such that the probability density $P(v_l)$ evaluated at each of those values is relatively large.

In the most common application, $X_j\in\{0,1\}$; this setting is referred to as "market basket" analysis. For observation $i$, each variable $X_j$ takes one of two values: $x_{ij}=1$ if the $j$th item is purchased as part of the transaction, and $x_{ij}=0$ otherwise. With many variables, however, the number of observations for which $X=v_l$ will nearly always be too small for reliable estimation, so the goal has to be modified.
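The sketch below shows why exact prototypes are hard to estimate. It stores a small, made-up basket matrix as a list of 0/1 rows (one row per transaction, one column per item) and estimates $P(X=v)$ as the fraction of rows exactly equal to $v$. The data and the helper name `exact_match_frequency` are hypothetical.

```python
from typing import Sequence

# Hypothetical basket matrix: rows are transactions i, columns are items j,
# and baskets[i][j] == 1 means item j was purchased in transaction i.
baskets = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
]

def exact_match_frequency(data: Sequence[Sequence[int]], v: Sequence[int]) -> float:
    """Empirical P(X = v): fraction of observations whose whole vector equals v."""
    matches = sum(1 for row in data if list(row) == list(v))
    return matches / len(data)

p = len(baskets[0])
print(2 ** p)  # 32 possible binary vectors for only 6 observations
print(exact_match_frequency(baskets, [1, 1, 0, 0, 1]))  # seen twice
print(exact_match_frequency(baskets, [1, 1, 1, 1, 1]))  # never seen
```

Even with $p=5$ items there are $2^5=32$ possible vectors and most are never observed; with thousands of items, almost every exact vector has zero or one match.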

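Modified Goal: Instead of seeking values $x$ where $P(x)$ is large, we seek regions of the $X$-space with high probability content relative to their size or support. Let $S_j$ be the set of all possible values of $X_j$ (its support) and let $s_j\subseteq S_j$ be a subset of those values. The modified goal is then to find subsets of variable values $s_1,\dots,s_p$ such that the probability of all variables simultaneously taking a value within their respective subsets is relatively large: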

$$P\Big[\bigcap_{j=1}^{p} (X_j \in s_j)\Big]$$

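The intersection $\bigcap_{j=1}^{p}(X_j\in s_j)$ is called a conjunctive rule. For quantitative $X_j$ the subsets $s_j$ are contiguous intervals; for categorical $X_j$ they are listed explicitly. If $s_j$ is the whole set $S_j$ of values attainable by $X_j$, the variable $X_j$ effectively does not appear in the rule.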
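A minimal sketch of estimating $P\big[\bigcap_{j}(X_j\in s_j)\big]$ for one conjunctive rule, assuming each observation is a tuple mixing quantitative and categorical variables. The rule representation is an illustrative assumption: a closed interval `(lo, hi)` for a quantitative $X_j$, a Python `set` for a categorical $X_j$, and `None` when $s_j=S_j$. The data and the names `in_subset` and `rule_probability` are hypothetical.

```python
from typing import Optional, Sequence, Tuple, Union

# Assumed encoding of s_j: an interval (lo, hi) for quantitative X_j,
# a set of allowed values for categorical X_j, or None for s_j = S_j.
Subset = Optional[Union[Tuple[float, float], set]]

def in_subset(value, s: Subset) -> bool:
    if s is None:  # s_j = S_j: any value qualifies
        return True
    if isinstance(s, tuple):  # contiguous interval lo <= x <= hi
        lo, hi = s
        return lo <= value <= hi
    return value in s  # explicitly listed categorical values

def rule_probability(data: Sequence[tuple], rule: Sequence[Subset]) -> float:
    """Fraction of observations with X_j in s_j for every j simultaneously."""
    hits = sum(1 for row in data if all(in_subset(x, s) for x, s in zip(row, rule)))
    return hits / len(data)

# Hypothetical observations: (age, marital_status, owns_home)
data = [
    (25, "single", "no"),
    (34, "married", "yes"),
    (41, "married", "yes"),
    (29, "single", "no"),
    (52, "married", "yes"),
    (38, "divorced", "no"),
]

# 30 <= age <= 45 AND status in {married, divorced}; owns_home unrestricted.
rule = [(30, 45), {"married", "divorced"}, None]
print(rule_probability(data, rule))  # 0.5
```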

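Market basket analysis simplifies this further by restricting each $s_j$ to either a single value $v_{0j}$ or the entire set of values of $X_j$. The problem then becomes finding a subset of integers $J\subset\{1,\dots,p\}$ and corresponding values $v_{0j}$, $j\in J$, such that the following probability is large:

$$P\Big[\bigcap_{j\in J}(X_j=v_{0j})\Big]$$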

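The problem can be simplified once more by recoding each variable into binary dummy variables: every value attainable by $X_j$ gets its own variable $Z_k$, which equals 1 if and only if $X_j$ takes that value. With $|S_j|$ the number of distinct values attainable by $X_j$, there are $P=\sum_{j=1}^{p}|S_j|$ dummy variables in total, and the goal becomes finding a subset of integers $K\subset\{1,\dots,P\}$ for which $P\big[\bigcap_{k\in K}(Z_k=1)\big]$ is large. $K$ is called an item set, and the number of variables $Z_k$ it contains is called its size.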
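Recoding each variable into binary dummy variables $Z_k$, one per attainable value, can be sketched as follows for categorical data given as tuples. Each distinct (variable, value) pair becomes one item, so the number of columns is $P=\sum_j|S_j|$. As an assumption, $S_j$ is approximated by the values actually observed in the data, and the helper name `to_dummies` is hypothetical.

```python
from typing import List, Sequence, Tuple

def to_dummies(data: Sequence[tuple]) -> Tuple[List[Tuple[int, object]], List[List[int]]]:
    """Recode each X_j into one binary Z_k per distinct value of X_j.

    Returns the item labels (j, value) and the N x P binary matrix z,
    where z[i][k] == 1 iff observation i has X_j equal to item k's value.
    """
    p = len(data[0])
    # S_j approximated by the distinct observed values (sorted for a stable column order).
    supports = [sorted({row[j] for row in data}, key=str) for j in range(p)]
    items = [(j, v) for j in range(p) for v in supports[j]]
    z = [[1 if row[j] == v else 0 for (j, v) in items] for row in data]
    return items, z

# Hypothetical observations: (marital_status, owns_home)
data = [
    ("single", "no"),
    ("married", "yes"),
    ("married", "yes"),
    ("divorced", "no"),
]

items, z = to_dummies(data)
print(len(items))  # P = |S_1| + |S_2| = 3 + 2 = 5
print(items)
print(z[1])        # ("married", "yes") -> [0, 1, 0, 0, 1]
```

Every row of `z` contains exactly $p$ ones, one per original variable, since each $X_j$ takes exactly one of its values.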

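Because each $Z_k$ is binary, the intersection is equivalent to the product of the $Z_k$ being 1, and the probability is estimated by the corresponding average over the $N$ observations:

$$P\Big[\bigcap_{k\in K} (Z_k=1)\Big] = P\Big[\prod_{k\in K} Z_k=1\Big]$$

$$\widehat{P}\Big[\prod_{k\in K} Z_k=1\Big]=\frac{1}{N}\sum_{i=1}^{N}\prod_{k\in K} z_{ik}$$

This estimate is the fraction of observations in which every item in $K$ takes the value one, and is called the support of the item set.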
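The support estimate $\frac{1}{N}\sum_{i}\prod_{k\in K}z_{ik}$ translates almost directly into code. The sketch assumes the binary matrix $z$ is a list of 0/1 rows and an item set $K$ is a collection of column indices; `support` is a hypothetical helper name, the matrix is made up, and `math.prod` requires Python 3.8 or later.

```python
from math import prod
from typing import Iterable, Sequence

def support(z: Sequence[Sequence[int]], K: Iterable[int]) -> float:
    """Estimate P[prod_{k in K} Z_k = 1] as (1/N) * sum_i prod_{k in K} z_ik."""
    K = list(K)
    return sum(prod(row[k] for k in K) for row in z) / len(z)

# Hypothetical N = 5 transactions over P = 4 items.
z = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]

print(support(z, {0, 1}))     # 0.6: items 0 and 1 together in 3 of 5 rows
print(support(z, {0, 1, 3}))  # 0.4: adding item 3 leaves 2 of 5 rows
```

Adding an item to $K$ can only keep or lower the support, because each extra factor $z_{ik}$ is 0 or 1; here it drops from 0.6 to 0.4.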

