Filter rules

To select the most interesting rules, you can filter them on four quality measures: support, confidence, lift and leverage. Filters can be applied before and/or after rule generation, and can be global (for all queries) or specific to each query.

In general, the most interesting rules are those with the highest values for the four measures.
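
As an illustration only, filtering after rule generation can be sketched in Python as follows. The function name filter_rules, the dictionary layout and the rule values are hypothetical and made up for this example; they are not the tool's own filter syntax.

```python
def filter_rules(rules, thresholds):
    """Keep the rules whose measures all reach the given minimum thresholds."""
    return [
        rule for rule in rules
        if all(rule[measure] >= minimum for measure, minimum in thresholds.items())
    ]

# Hypothetical rules with made-up measure values.
rules = [
    {"rule": "A => B", "support": 0.10, "confidence": 0.90, "lift": 2.0, "leverage": 0.05},
    {"rule": "C => D", "support": 0.01, "confidence": 0.95, "lift": 8.0, "leverage": 0.00875},
    {"rule": "E => F", "support": 0.20, "confidence": 0.40, "lift": 1.0, "leverage": 0.0},
]

# Keep only rules with support >= 5% and confidence >= 80%.
kept = filter_rules(rules, {"support": 0.05, "confidence": 0.80})
```

Here only "A => B" passes both thresholds: "C => D" is too rare and "E => F" is not confident enough.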

All four measures are computed relative to a total count N, which depends on the number of tuple variables (t1, t2, …) specified in the "SCOPE" clause of the query.
If only one tuple variable (t1) is defined, N is the number of tuples in dataset1 that satisfy the condition of the "WHERE" clause, if one is specified.
If two or more tuple variables are specified (t1, t2, …), N is the number of combinations of t1 in dataset1, t2 in dataset2, … that satisfy the condition, if one is specified.
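
The definition of N can be sketched in Python as follows. This is a minimal illustration, assuming each dataset is a plain list and the condition is an ordinary predicate; count_n and the sample data are hypothetical names and values, not part of the query language.

```python
from itertools import product

def count_n(datasets, where=None):
    """Count the combinations (t1, t2, ...) that satisfy the optional condition."""
    total = 0
    for combo in product(*datasets):
        if where is None or where(*combo):
            total += 1
    return total

dataset1 = [1, 2, 3, 4]
dataset2 = [10, 20, 30]

# One tuple variable: N is the number of tuples in dataset1.
n_single = count_n([dataset1])
# One tuple variable with a condition: only tuples satisfying it are counted.
n_single_where = count_n([dataset1], where=lambda t1: t1 > 2)
# Two tuple variables: N is the number of (t1, t2) combinations.
n_pair = count_n([dataset1, dataset2])
# Two tuple variables with a condition on the combination.
n_pair_where = count_n([dataset1, dataset2], where=lambda t1, t2: t1 * 10 < t2)
```

With these sample lists, N is 4 for one variable, 2 once the condition t1 > 2 is applied, 12 for two variables, and 3 when the pair must satisfy t1 * 10 < t2.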

    Support

The support of a rule X => Y is given by:

Support(X=>Y) = Support(Y=>X) = Proba (X and Y) = (count of X U Y) / N

A support of 10% means that X and Y occur together in 10% of the cases.
The larger the support, the more frequent the rule.
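
A minimal Python sketch of the support formula, assuming the counts have already been obtained; the function name and the numbers are illustrative only.

```python
def support(count_xy, n):
    """Support(X=>Y): fraction of the N cases in which X and Y occur together."""
    return count_xy / n

# X and Y occur together in 50 of 500 cases.
s = support(50, 500)
```

Here s is 0.1, that is, a support of 10%. Because the formula only uses the joint count, the value is the same for X => Y and Y => X.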

    Confidence

The confidence measures the strength of a rule. It is defined as:

Confidence (X=>Y) = Proba(Y knowing X) = (count of X U Y) / (count of X)

A confidence of 90% means that Y occurs in 90% of the cases where X occurs.
A confidence of 100% means that the rule is exact (i.e. Y occurs each time X occurs).
The larger the confidence, the more reliable the rule.
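
A minimal Python sketch of the confidence formula, again with illustrative names and numbers. Unlike the support, the confidence is not symmetric: swapping X and Y changes the denominator.

```python
def confidence(count_xy, count_x):
    """Confidence(X=>Y): fraction of the cases containing X that also contain Y."""
    return count_xy / count_x

# X occurs in 50 cases, Y in 90 cases, and both together in 45 cases.
conf_x_to_y = confidence(45, 50)   # Y given X
conf_y_to_x = confidence(45, 90)   # X given Y
```

In this example the rule X => Y has a confidence of 90%, while the reverse rule Y => X only reaches 50%.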

    Lift and Leverage

The lift and the leverage compare the observed support of the rule with the support that would be expected if X and Y were statistically independent. They are defined as:

Lift (X=>Y) = Lift (Y=>X) = Proba(X and Y) / Proba(X and Y with independence) = ((count of X U Y) * N) / ((count of X) * (count of Y))
Leverage (X=>Y) = Leverage (Y=>X) = Proba(X and Y) - Proba(X and Y with independence) = (count of X U Y) / N - ((count of X) / N) * ((count of Y) / N)

A lift of 1 and a leverage of 0 mean that X and Y are statistically independent.
The larger the lift and the leverage, the stronger the dependence between X and Y.
For instance, a lift of 2 means that the actual proportion of combinations where X and Y occur together is twice the proportion expected if X and Y were statistically independent.
Be careful: if the supports of X and Y are small and they happen to occur together a few times (or only once) by chance, the lift can be enormous.
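
The two formulas and the warning about rare items can be sketched in Python as follows. The function names and the counts are illustrative assumptions, not values from a real dataset.

```python
def lift(count_xy, count_x, count_y, n):
    """Ratio of the observed joint proportion to the one expected under independence."""
    return (count_xy * n) / (count_x * count_y)

def leverage(count_xy, count_x, count_y, n):
    """Difference between the observed joint proportion and the one expected under independence."""
    return count_xy / n - (count_x / n) * (count_y / n)

# N = 1000, X in 100 cases, Y in 200 cases: 20 joint cases are expected under independence.
lift_independent = lift(20, 100, 200, 1000)
leverage_independent = leverage(20, 100, 200, 1000)

# Same counts for X and Y, but 40 joint cases are observed.
lift_dependent = lift(40, 100, 200, 1000)
leverage_dependent = leverage(40, 100, 200, 1000)

# Rare items: X and Y each occur once in 10000 cases, and that one time together.
lift_rare = lift(1, 1, 1, 10000)
leverage_rare = leverage(1, 1, 1, 10000)
```

With 20 joint cases the lift is 1 and the leverage is 0 (up to floating-point rounding); with 40 joint cases the lift is 2 and the leverage is about 0.02. In the rare case the lift reaches 10000 from a single co-occurrence, while the leverage stays close to 0.0001, so combining the lift with a minimum support or leverage filter helps to discard such rules.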

 