NORMA eResearch @NCI Library

Pre-processing Techniques for Optimizing Association Rule Mining Algorithms

Yeramwar, Saurabh Shantkumar (2022) Pre-processing Techniques for Optimizing Association Rule Mining Algorithms. Masters thesis, Dublin, National College of Ireland.



Association Rule Mining (ARM) algorithms are machine learning techniques for discovering relationships between variables in datasets. Market basket analysis is one application of association rule mining. It generates rules describing how frequently customers pick an item given the products already in the cart. The Apriori algorithm is perhaps the best-known association rule mining algorithm, but it is computationally inefficient. Several improvements to the original Apriori take less time and less computational power to generate rules. ARM algorithms focus on frequent rules at the expense of less frequent itemsets. Infrequent itemsets are nevertheless important, as they can give clues about rare events, such as in anomaly detection, fraud, and other interesting customer behaviour. ARM algorithms are inefficient at handling rarely occurring rules in the data, because they generate all possible rules and only then filter out the rules related to the specific item of interest. Generating every rule makes this process slow. To avoid this delay, the data can be pre-processed with clustering techniques. With this approach, the data is clustered first, and the ARM algorithms are then fed data from the small clusters. The proposed pre-processing step is applied before running the Apriori and FP-Growth algorithms. We show that the pre-processing step reduces execution time by 31.6% and 35.8%, respectively. It also drastically reduces the memory consumption, time, and computational power needed to generate infrequent rules.
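
The abstract does not name the clustering technique or the dataset, so the following is only a minimal sketch of the cluster-then-mine idea, not the thesis implementation. It assumes a greedy Jaccard "leader" clustering as a stand-in for the clustering step, a plain level-wise (Apriori-style) itemset search, and a toy basket dataset; the names `leader_cluster` and `frequent_itemsets`, the similarity threshold, the support threshold, and the items are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def leader_cluster(transactions, threshold=0.3):
    """Greedy 'leader' clustering (an assumed stand-in technique).

    Each transaction joins the first cluster whose leader is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader, members) pairs
    for t in transactions:
        for leader, members in clusters:
            if jaccard(leader, t) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((t, [t]))
    return [members for _, members in clusters]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns {itemset: support}, where support is the fraction of
    transactions that contain the itemset.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(kept)
        k += 1
        # Join: combine surviving itemsets into candidates of size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets survived.
        current = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        ]
    return result


# Toy baskets (hypothetical): six common ones and two rare ones.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"caviar", "champagne"},
    {"caviar", "champagne", "truffle"},
]

clusters = leader_cluster(transactions, threshold=0.3)
smallest = min(clusters, key=len)

# Mining the whole dataset vs. mining only the smallest cluster.
everything = frequent_itemsets(transactions, min_support=0.5)
rare = frequent_itemsets(smallest, min_support=0.5)
```

In this toy setup the rare pair `{"caviar", "champagne"}` falls below the support threshold on the full dataset but is found when only the smallest cluster is mined, and the search over that cluster touches far fewer transactions and candidate itemsets. Whether this matches the savings reported in the thesis depends on the clustering method and data actually used.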

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HF Commerce > Electronic Commerce
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 14 Mar 2023 15:50
Last Modified: 14 Mar 2023 15:50
