Frequent pattern tree growth

We will study the different frequent pattern tree growth from the following rows:

**Row 1**: Every FP-Tree starts with a null node as a root node. Let's draw the first row of the tree order along with their frequency:

**Row 2**: It has got**{B,C,D}**.**A**is missing, so we cannot merge it with the earlier node. Hence, we will have to create another node, altogether as shown here:

**Row 3**: It has got**{A,D}**.**B**and**C**are missing, but we can tie it with the earlier node.**A**encounters a repetition, so frequency will change. It becomes**2**now:

**Row 4**: It has got**{A,B}**. We can tie it with the earlier node and will traverse on the previous node.**A**and**B**encounters a repetition, so frequency will change for it. It becomes 3 and 2 respectively:

**Row 5**: It has got**{A,B,C}**. Again, it can be tied with the earlier node and A, B, and C see a repetition, so the frequency will change for them. It becomes 4, 3, and 2 respectively:

Validation

Now, let's count the frequency of the final tree that we have got and compare the frequency of each item with the table to ensure that we have got the correct frequencies in the table:

**A:4****B:4****C:3****D:3**

Now, we will go from bottom to top. We will find out the branches where D appears:

We can see that there are three branches where D appears:

- BC: 1
- ABC: 1
- A: 1

These branches are termed as conditional pattern base for D. While we do this, there are points to be kept in mind:

- Even if we traverse from bottom to top, we write the branches in a top-to-bottom manner
- D is not part of it
- 1 represents the frequency of occurrence of D in each branch

Now, the conditional pattern for D results in the conditional frequencies for A, B, and C, which are 2, 2, and 2. All are less than the minimum support (3). Hence, there can't be any conditional FP- Tree for it.

Now, let's do it for C. C appears in the following branches:

The branches end up like this:

- B:1
- AB:2

It results in A:2 and B:3. So, B fit with the bill in accordance with the minimum support. Now, the conditional tree ends up like this:

Similarly, conditional pattern finding is done for different combinations. Thus, it sets up the frequent item dataset.

Let's see how it can be done in Python. We will be using a library called pyfpgrowth. Also, we shall create an itemset in the following section.

Importing the library

In order to perform validation, we will import the library and build the transactions as shown here:

import pyfpgrowth

We build our transactions as follows:

transaction = [["bread", "butter", "cereal"],

["butter", "milk"],

["bread", "milk"],

["butter", "cereal", "milk"],

["egg", "bread"],

["egg", "butter"],

["cereal", "milk"],

["bread", "butter", "cereal", "egg"],

["cereal", "bread", "butter"]]

Minimum support is defined now to find the pattern find_frequent_patterns()**,** where transactions are the list of items bought at each transaction, and 2 is the minimum threshold set for support count:

patterns = pyfpgrowth.find_frequent_patterns(transaction, 2)

Finally, we have to define the confidence to get the rules. Rules are generated based on the patterns and 0.5 is the minimum threshold set for confidence. Then, we store the rules in a dataframe named rules. rules initially consists of an antecedent, a consequent, and the confidence value:

rules = pyfpgrowth.generate_association_rules(patterns, 0.5)

print(rules)

We get the output as follows:

This is how we get the rules. FP-growth tends to have the edge over Apriori as it is faster and more efficient.

*If you found this article interesting, you can explore **Machine Learning Quick Reference** as a hands-on reference guide for developing, training, and optimizing your machine learning models. Machine learning makes it possible to learn about the unknowns and gain hidden insights into your datasets by mastering many tools and techniques. **Machine Learning Quick Reference** guides you to do just that in a very compact manner.*