This is the last post in a five-part series on types of unsupervised machine learning, and it covers frequent pattern growth (FP-Growth). Last week I talked about association rules and the Apriori algorithm. FP-Growth has improved performance over Apriori because it finds frequent itemsets using a divide-and-conquer strategy, compressing the transactions into a tree rather than repeatedly generating and testing candidate sets.
For example, suppose you have the following database with TID (Transaction ID) and Items (Items Purchased).
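The table itself is not reproduced here, so as a stand-in, a transaction database of this shape might look like the following in Python. The TIDs and items are hypothetical, chosen only so the item frequencies line up with the counts discussed later in the post (B:6, D:6, A:5, E:4, C:3 across 8 transactions):

```python
# Hypothetical stand-in for the TID/Items table (the original table
# is not reproduced in this text); TID -> items purchased.
database = {
    "T1": ["B", "D", "A"],
    "T2": ["B", "D", "A", "E"],
    "T3": ["B", "D", "C"],
    "T4": ["B", "D", "E"],
    "T5": ["B", "D", "A", "E", "C"],
    "T6": ["B"],
    "T7": ["D", "A", "E", "C"],
    "T8": ["A"],
}
```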
We start building a frequent pattern tree (FP-tree) data structure by calculating the minimum support count: the minimum number of transactions an item must appear in to count as frequent. In our example, we set the minimum support to 30% of the 8 transactions, or (30/100) × 8 = 2.4, which we round up to 3. The second step is to find the frequency of each item in the table above.
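The support calculation above is just a percentage of the transaction count, rounded up. A minimal sketch:

```python
import math

num_transactions = 8
min_support = 0.30  # 30%, as in the example above

# 30% of 8 transactions = 2.4; round up, so an item must appear
# in at least 3 transactions to be considered frequent.
min_support_count = math.ceil(min_support * num_transactions)
```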
Items “B” and “D” appear the most, 6 times each, then Item “A” appears 5 times, Item “E” appears 4 times, and Item “C” appears 3 times. Another way to write this is: B:6, D:6, A:5, E:4, C:3. The items are prioritized, or ordered, by how many times they appear. The final step is to rewrite each transaction with its items in that priority order, as shown in the table below, and then build the tree by inserting the rows one at a time, as described in more detail by Hareen Laks.
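The steps above — count item frequencies, order items by priority, and grow the tree row by row with shared prefixes — can be sketched as follows. The transactions are hypothetical stand-ins for the missing table, chosen to reproduce the counts B:6, D:6, A:5, E:4, C:3:

```python
from collections import Counter

# Hypothetical transactions (the post's table is not reproduced here),
# chosen so item frequencies match the text: B:6, D:6, A:5, E:4, C:3.
transactions = [
    ["B", "D", "A"],
    ["B", "D", "A", "E"],
    ["B", "D", "C"],
    ["B", "D", "E"],
    ["B", "D", "A", "E", "C"],
    ["B"],
    ["D", "A", "E", "C"],
    ["A"],
]
min_support_count = 3  # ceil(30% of 8 transactions)

# Step 1: count each item's frequency across all transactions.
counts = Counter(item for t in transactions for item in t)

# Step 2: keep frequent items and fix a priority order
# (descending frequency, with item name as a tie-breaker).
frequent = {i: c for i, c in counts.items() if c >= min_support_count}
priority = sorted(frequent, key=lambda i: (-frequent[i], i))

# Step 3: rewrite each transaction with its frequent items in priority order.
ordered = [
    sorted((i for i in t if i in frequent), key=priority.index)
    for t in transactions
]

# Step 4: insert each ordered transaction into the FP-tree;
# transactions sharing a prefix share the same path, and each
# node's count records how many transactions passed through it.
class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

root = Node(None)
for t in ordered:
    node = root
    for item in t:
        node = node.children.setdefault(item, Node(item))
        node.count += 1
```

With this data, the priority order comes out B, D, A, E, C, and the five transactions beginning “B, D” share a single B→D path in the tree.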
Other implementations include evandempsey's Python package and orange3-associate. One of the best articles is by Arthur Zimek and colleagues, and Melvin Serrano has one of the best YouTube video explanations. Other applications of the technique include cyber intrusion detection, bioinformatics/anomaly detection, large-scale transactional data, the Internet of Things, and finding co-occurring words in a Twitter feed (page 248+).