4.7 Apriori Algorithm

 

Apriori Algorithm:

·        The Apriori algorithm is an unsupervised machine learning algorithm used for association rule learning, a data mining technique that identifies frequent patterns, connections and dependencies among groups of items (called itemsets) in data.

·        It relies on the downward closure property, which is also known as the Apriori property.

·        Association rule learning is a strategy for detecting patterns in large data sets. It involves finding relationships between variables in the data and using those relationships to make predictions or decisions. The purpose of an association rule is to describe the association between distinct elements in a data set.

 

·        It is a classic association rule mining technique used to discover frequent itemsets (groups of items that often appear together) in transactional databases.

·        It is widely used in market basket analysis, recommendation systems, and pattern recognition.

·        It can also be used in the healthcare field to find drug reactions in patients.

·        It is designed to find relationships (or associations) between items in a dataset, such as which products are frequently purchased together.

·        It was proposed by R. Agrawal and R. Srikant in 1994.

 

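The algorithm uses the "Apriori property", which states that if an itemset is frequent, all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so they never need to be counted.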

 

Frequent Itemset

·        A frequent itemset is an itemset whose support is greater than or equal to a user-specified minimum support threshold. If the itemset {A, B} is frequent, then A and B must each be frequent on their own as well.

·        Suppose there are two transactions, A = {1, 2, 3, 4, 5} and B = {2, 3, 7}. Items 2 and 3 appear in both transactions, so with a minimum support count of 2 the frequent itemsets are {2}, {3} and {2, 3}.

·        The algorithm identifies these frequent itemsets, that is, sets of items that appear together at least as often as the specified minimum support threshold.
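The support count in the two-transaction example can be checked with a few lines of Python. This is a minimal sketch: the helper name `support_count` and the threshold of 2 are assumptions made for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# The two transactions from the example above.
transactions = [{1, 2, 3, 4, 5}, {2, 3, 7}]
min_support = 2  # assumed threshold: the itemset must appear in both transactions

for candidate in [{2}, {3}, {2, 3}, {1}, {7}]:
    count = support_count(candidate, transactions)
    status = "frequent" if count >= min_support else "infrequent"
    print(sorted(candidate), count, status)
```

Only {2}, {3} and {2, 3} reach a count of 2. Every other item appears in a single transaction.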

 

 

Steps for Apriori Algorithm:

 

Step-1: Select the minimum support and minimum confidence, and determine the support of the itemsets in the transactional database.

Step-2: Keep all the itemsets whose support is at least the minimum support.

Step-3: From these frequent itemsets, find all the rules whose confidence is at least the minimum confidence.

Step-4: Sort the rules in decreasing order of lift.
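The steps above rely on three measures. Their standard definitions for a rule X → Y are written out below, where X and Y are disjoint itemsets (the symbols X and Y are introduced here for illustration):

```latex
\[
\operatorname{support}(X) =
  \frac{\text{number of transactions containing } X}{\text{total number of transactions}}
\]
\[
\operatorname{confidence}(X \rightarrow Y) =
  \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
\[
\operatorname{lift}(X \rightarrow Y) =
  \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)\,\operatorname{support}(Y)}
\]
```

A lift greater than 1 means X and Y occur together more often than they would if they were independent. Confidence is a ratio, so it gives the same value whether support is expressed as a fraction or as a raw count.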

 

 

Example: 

Suppose we have the following dataset that has various transactions, and from this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm:



Step-1: Calculating C1 and L1:

·        In the first step, we create a table that contains the support count (the number of transactions in which the itemset appears) of each individual item in the dataset. This table is called the candidate set, or C1.


·        Now we keep all the itemsets whose support count is greater than or equal to the minimum support (2). This gives us the table of frequent itemsets, L1.
All the itemsets except E meet the minimum support, so E is removed.


Step-2: Candidate Generation C2, and L2:

·        In this step, we generate C2 from L1. C2 contains every pair of items that can be formed from L1.

·        After creating the pairs, we again find their support counts from the main transaction table, i.e., how many times each pair occurs together in the dataset. This gives us the table C2.

·        Again, we compare the C2 support counts with the minimum support count and eliminate the itemsets that fall below it. This gives us the table L2.
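Building C2 and L2 can be sketched in Python as follows. The transactions here are hypothetical and assumed only for illustration, as is the item D. The minimum support count of 2 and the removal of E come from the walkthrough.

```python
from itertools import combinations

# Hypothetical transactions, assumed for illustration only.
transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
min_support = 2  # minimum support count used in the walkthrough

# L1 after the infrequent item E has been removed.
l1 = ["A", "B", "C", "D"]

# C2: every pair of frequent items, with its support count.
c2 = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(l1, 2)
}

# L2: keep only the pairs that meet the minimum support count.
l2 = {pair: count for pair, count in c2.items() if count >= min_support}

print(c2)
print(l2)
```

With this assumed data, the pairs (A, D) and (C, D) fall below the threshold and are dropped. L2 is left with (A, B), (A, C), (B, C) and (B, D).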


Step-3: Candidate generation C3, and L3:

·        For C3, we repeat the same two steps, but now we form the C3 table from combinations of three items and calculate their support counts from the dataset.


·        Now we create the L3 table. As the C3 table shows, only one itemset has a support count equal to the minimum support count, so L3 contains only one combination: {A, B, C}.
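The join and prune steps that turn L2 into C3 can be sketched as below. The function name `generate_candidates` and the contents of L2 are assumptions for illustration. The pruning rule is the Apriori property: a candidate is kept only if all of its smaller subsets are already frequent.

```python
from itertools import combinations

def generate_candidates(prev_frequent, k):
    """Join (k-1)-itemsets into k-itemsets, then prune by the Apriori property."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of the candidate must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Hypothetical L2, assumed for illustration only.
l2 = [frozenset(p) for p in (("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"))]

c3 = generate_candidates(l2, 3)
print(c3)
```

Under these assumptions, {A, B, D} is pruned because {A, D} is not in L2, and {B, C, D} is pruned because {C, D} is not in L2. Only {A, B, C} needs to be counted against the database.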

 

Step-4: Finding the association rules for the subsets:

·        To generate the association rules, we first create a new table with the possible rules from the frequent combination {A, B, C}.

·        For each rule X → Y, we calculate the confidence using the formula sup(X ^ Y) / sup(X), where sup(X ^ Y) is the support count of the transactions that contain both X and Y.

·        After calculating the confidence of all the rules, we exclude the rules whose confidence is below the minimum threshold (50%).




As the minimum confidence is 50%, the first three rules, A ^ B → C, B ^ C → A, and A ^ C → B, can be considered strong association rules for the given problem.
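The rule-generation step can be sketched in Python. The support counts below are hypothetical values assumed for illustration, and `rules_from` is an illustrative helper name. Only the 50% threshold and the itemset {A, B, C} come from the example.

```python
from itertools import combinations

# Hypothetical support counts, assumed for illustration only.
support = {
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 6,
    frozenset("AB"): 4, frozenset("AC"): 4, frozenset("BC"): 4,
    frozenset("ABC"): 2,
}
min_confidence = 0.5  # the 50% threshold from the example

def rules_from(itemset, support, min_confidence):
    """Return (antecedent, consequent, confidence) for rules that pass the threshold."""
    itemset = frozenset(itemset)
    strong = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            # confidence(X -> Y) = sup(X ^ Y) / sup(X)
            confidence = support[itemset] / support[antecedent]
            if confidence >= min_confidence:
                strong.append(
                    (sorted(antecedent), sorted(itemset - antecedent), confidence)
                )
    return strong

strong_rules = rules_from("ABC", support, min_confidence)
for antecedent, consequent, confidence in strong_rules:
    print(antecedent, "->", consequent, f"{confidence:.2f}")
```

With these assumed counts, the three rules with a two-item antecedent reach exactly 50% confidence. The rules with a single-item antecedent fall below the threshold and are discarded.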

 

Advantages of Apriori Algorithm

·        The algorithm is easy to understand.

·        The join and prune steps of the algorithm are easy to implement on large datasets.

 

Disadvantages of Apriori Algorithm

·        The Apriori algorithm is slow compared to other frequent itemset mining algorithms.

·        Overall performance suffers because it scans the database multiple times.

·        The time and space complexity of the Apriori algorithm is O(2^D), which is very high. Here D represents the horizontal width of the database, that is, the number of distinct items.
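To see why O(2^D) is a concern, the short sketch below counts how many non-empty itemsets can be formed from D distinct items. The function name is an assumption for illustration. This is a worst-case bound, since pruning usually keeps the number of candidates actually counted far lower.

```python
def max_candidate_itemsets(num_items):
    """Number of non-empty itemsets that can be formed from `num_items` distinct items."""
    return 2 ** num_items - 1

for d in (5, 10, 20, 30):
    print(d, max_candidate_itemsets(d))
```

Five items give at most 31 itemsets, while thirty items give more than a billion.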

 

Applications:

·        Market Basket Analysis: Identifying which products are frequently bought together in supermarkets. 

·        Recommendation Systems: Suggesting products or services based on past purchases or browsing history. 

·        Healthcare: Identifying patterns in disease outbreaks or patient behavior. 

·        Bioinformatics: Analysing DNA and protein sequences. 
































