Machine Learning

Machine Learning Quick Reference Part-2

Finding association rules

To find association rules, we first have to find all the rules whose support exceeds the support threshold. How do we find them? One option is brute force: list every possible association rule, calculate the support and confidence of each one, and then discard every rule that fails either the support or the confidence threshold.
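The brute-force search described above can be sketched in a few lines. This is a minimal illustration, not an efficient implementation: the helper names `count` and `brute_force_rules` and the threshold values are assumptions made for the example, and the toy data is the set of five transactions used in the FP-growth walkthrough later in this article.

```python
from itertools import combinations

# Toy data: the five example transactions from the FP-growth walkthrough in this article.
transactions = [
    {"A", "B", "C", "D"},
    {"B", "C", "D"},
    {"A", "D"},
    {"A", "B"},
    {"A", "B", "C"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def brute_force_rules(transactions, min_support, min_confidence):
    """Enumerate every rule X -> Y and keep those that pass both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    kept = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            n_xy = count(itemset, transactions)
            if n_xy == 0:
                continue  # the itemset never occurs, so no rule can be built from it
            # Split the itemset into a non-empty antecedent X and consequent Y.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    support = n_xy / n
                    confidence = n_xy / count(antecedent, transactions)
                    if support >= min_support and confidence >= min_confidence:
                        kept.append((antecedent, consequent, support, confidence))
    return kept

# Hypothetical thresholds chosen only for this illustration.
rules = brute_force_rules(transactions, min_support=0.6, min_confidence=0.75)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Because every itemset and every split is visited, the work grows exponentially with the number of items, which is why the brute-force approach is only practical for very small item sets.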

Given that there are n items in the set I, the total number of possible association rules is $3^n - 2^{n+1} + 1$.

If X is a frequent itemset with k elements, then $2^k - 2$ association rules can be generated from it.
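Both counts can be checked by direct enumeration. In a rule X -> Y, each of the n items is either in X, in Y, or in neither, which gives 3^n assignments; removing the 2^n assignments with an empty X and the 2^n with an empty Y, and adding back the one assignment counted twice (both empty), leaves 3^n - 2^(n+1) + 1. The sketch below confirms this for small n; the function names are made up for the illustration.

```python
from itertools import product

def count_rules(n):
    """Count rules X -> Y over n items, where X and Y are non-empty and disjoint."""
    # Each item is assigned to the antecedent (X), the consequent (Y), or neither (-).
    return sum(1 for roles in product("XY-", repeat=n) if "X" in roles and "Y" in roles)

def count_rules_from_itemset(k):
    """Count rules that split one k-item itemset into a non-empty X and a non-empty Y."""
    # Every item of the itemset must land in either X or Y.
    return sum(1 for roles in product("XY", repeat=k) if "X" in roles and "Y" in roles)

for n in range(1, 7):
    assert count_rules(n) == 3**n - 2**(n + 1) + 1
    assert count_rules_from_itemset(n) == 2**n - 2

print(count_rules(4), count_rules_from_itemset(3))  # prints: 50 6
```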

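Let's see how to mine association rules in Python:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Each row of the CSV file is one transaction; the file has no header row
data = pd.read_csv('association_mining.csv', header = None)

# Build a list of transactions, each one a list of item names as strings
# (7501 rows and 20 columns are hard-coded for this dataset)
transactions = []
for i in range(0, 7501):
    transactions.append([str(data.values[i,j]) for j in range(0, 20)])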

If we want an item to be bought at least three times a day over seven days, the minimum support is 3 x 7/7501 ≈ 0.003, where 7501 is the total number of transactions. We will set the minimum confidence to 20% to begin with:

from apyori import apriori

rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)

# apriori returns a generator; convert it to a list to inspect the rules
results = list(rules)
results

Evaluating results on the last line of the preceding code displays the list of rules that were found.
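The support threshold passed to apriori can be checked with a little arithmetic. The sketch below assumes, as the loop over 7501 rows suggests, that the dataset holds 7501 transactions collected over one week; the variable names are chosen for the illustration only.

```python
# Assumption: 7501 transactions in total, covering seven days.
n_transactions = 7501
purchases_per_day = 3
days = 7

# Support is the fraction of all transactions that contain the item.
min_support = purchases_per_day * days / n_transactions

print(round(min_support, 4))  # prints: 0.0028
print(round(min_support, 3))  # prints: 0.003
```

Rounding to three decimal places gives the value 0.003 used for min_support in the call above.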

Frequent pattern growth

Frequent pattern growth (FP-growth) is a frequent itemset generation technique, similar to Apriori. FP-growth builds a compact tree structure and uses the tree for mining frequent itemsets and generating rules. It is generally faster than Apriori and can handle large datasets.

Let's go through the steps of FP-Growth:

  1. Setting up the transactions: This step lays out the input so that the items can be counted by frequency. The transactions are arranged vertically, one per row, so that we can go from each transaction to the items it contains:

t_id    Items
1       (B, C, D, A)
2       (B, C, D)
3       (D, A)
4       (A, B)
5       (A, C, B)

  2. Finding the frequency: Now we have to find the frequency of each item individually:

Items   Frequency
A       4
B       4
C       3
D       3
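The frequency table can be reproduced with a simple count over the five transactions. This is a minimal sketch using `collections.Counter`; the variable names are chosen for the illustration.

```python
from collections import Counter

# The five example transactions, in the order given in the table of transactions.
transactions = [
    ("B", "C", "D", "A"),
    ("B", "C", "D"),
    ("D", "A"),
    ("A", "B"),
    ("A", "C", "B"),
]

# Count the number of transactions in which each item occurs.
# set(t) guards against an item being listed twice in one transaction.
frequency = Counter(item for t in transactions for item in set(t))

for item in sorted(frequency):
    print(item, frequency[item])
```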

Let's set the minimum support threshold to 50%:

  • Min Support = (5*50/100) = 2.5, where 5 is the number of transactions
  • Ceiling of minimum support = ⌈2.5⌉ = 3
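The same calculation can be written out in code, together with the filter it implies. This is a sketch with illustrative variable names; the frequencies are copied from the frequency table.

```python
import math

n_transactions = 5
min_support_pct = 50

# 5 * 50 / 100 = 2.5, rounded up because an item count must be a whole number.
min_support_count = math.ceil(n_transactions * min_support_pct / 100)

frequency = {"A": 4, "B": 4, "C": 3, "D": 3}

# Keep only the items that occur in at least min_support_count transactions.
frequent_items = {item: n for item, n in frequency.items() if n >= min_support_count}

print(min_support_count)  # prints: 3
print(frequent_items)
```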

 

  3. Prioritizing the items by frequency: Since every item has a frequency greater than or equal to the minimum support of 3, all the items are kept. Each item is then assigned a priority, or rank, based on its frequency:

Items   Frequency   Rank
A       4           1
B       4           2
C       3           3
D       3           4

The order of the items is A, B, C, D (by frequency in descending order; items with equal frequency appear in alphabetical order here).
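The ranking can be derived from the frequencies with a single sort. Note that the alphabetical tie-break between A and B, and between C and D, is an assumption that happens to match the table; the text does not state which tie-break rule is used.

```python
frequency = {"A": 4, "B": 4, "C": 3, "D": 3}

# Sort by descending frequency; ties are broken alphabetically (an assumption).
ordered_items = sorted(frequency, key=lambda item: (-frequency[item], item))

# Rank 1 is the most frequent item.
rank = {item: position for position, item in enumerate(ordered_items, start=1)}

print(ordered_items)  # prints: ['A', 'B', 'C', 'D']
print(rank)
```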

  4. Ordering the items by priority: The items within each transaction are now reordered according to the priority assigned to each item based on its frequency. The priority order is A, B, C, D:

t_id    Items           Order by priority
1       (B, C, D, A)    (A, B, C, D)
2       (B, C, D)       (B, C, D)
3       (D, A)          (A, D)
4       (A, B)          (A, B)
5       (A, C, B)       (A, B, C)
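Reordering each transaction by priority amounts to sorting its items by rank. A minimal sketch, with the ranks copied from the rank table and illustrative variable names:

```python
rank = {"A": 1, "B": 2, "C": 3, "D": 4}

transactions = {
    1: ("B", "C", "D", "A"),
    2: ("B", "C", "D"),
    3: ("D", "A"),
    4: ("A", "B"),
    5: ("A", "C", "B"),
}

# Drop any item without a rank (none here, since all items are frequent),
# then sort the remaining items of each transaction by their rank.
ordered_transactions = {
    t_id: tuple(sorted((item for item in items if item in rank), key=rank.get))
    for t_id, items in transactions.items()
}

for t_id, items in ordered_transactions.items():
    print(t_id, items)
```

Putting the items of every transaction in the same order means that transactions sharing their most frequent items also share a common prefix, which is what lets them be stored compactly in a tree.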


© copyright 2017 www.aimlmarketplace.com. All Rights Reserved.

A Product of HunterTech Ventures