Systematic Review of Machine Learning in Recommendation Systems

July 2024

Dr Phoey Lee Teh presented a paper titled “Systematic Review of Machine Learning in Recommendation Systems Over the Last Decade” at the 2024 Intelligent Computing conference.

If you partake in online shopping or content streaming, you are likely to have encountered product or content recommendations that are tailored to your taste or needs. Dr Teh’s paper delves into the data science behind recommendation systems employed over the last ten years.

How do recommendation systems work?

Put simply, recommendation systems use algorithms to make product or content recommendations to consumers based on their past buying or streaming behaviours and preferences. There is keen interest in the tech world in developing the most sophisticated and accurate recommendation systems possible, and data scientists have been using machine learning to optimise these capabilities.

Machine learning analyses large datasets to construct recommendations using various methods including:

Collaborative Filtering

Identifies preferences and behaviours of users with similar tastes and makes recommendations based on these similarities.

Content-Based Filtering

Identifies characteristics and features of products to make recommendations of similar products or services based on shared features, themes or content.

Hybrid Systems

As the name of this approach suggests, a combination of collaborative and content-based filtering is used to heighten recommendation capabilities. The paper details that a hybrid system may, for example, use collaborative filtering to identify groups of users with similar interests and then use content-based filtering to recommend specific products to those groups.

The specific goals and requirements of the system determine which approach is the most suitable to use. Figure 1 provides an overview of how often each approach is used, as identified from the literature review.
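To make the hybrid example above more concrete, here is a minimal sketch in Python. It is not taken from the paper: the users, items, ratings and genre features are all hypothetical, and cosine similarity is assumed as the similarity measure. The collaborative step finds the users most similar to the target user, and the content-based step then ranks unseen items against the combined tastes of that group.

```python
from math import sqrt

# Hypothetical toy data: one rating per item, 0 means "not rated".
ITEMS = ["drama_a", "drama_b", "comedy_a", "comedy_b"]
RATINGS = {
    "ann": [5, 4, 0, 0],
    "bob": [5, 0, 1, 0],
    "cat": [0, 1, 5, 4],
}
# Hypothetical item features: [is_drama, is_comedy].
FEATURES = {
    "drama_a": [1, 0],
    "drama_b": [1, 0],
    "comedy_a": [0, 1],
    "comedy_b": [0, 1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_users(user, k=1):
    """Collaborative step: the k users whose rating vectors are closest."""
    scored = [
        (cosine(RATINGS[user], ratings), name)
        for name, ratings in RATINGS.items()
        if name != user
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def group_profile(users):
    """Content step, part 1: rating-weighted sum of item features for a group."""
    profile = [0.0, 0.0]
    for name in users:
        for item, rating in zip(ITEMS, RATINGS[name]):
            for i, feature in enumerate(FEATURES[item]):
                profile[i] += rating * feature
    return profile


def hybrid_recommend(user, k=1):
    """Rank the user's unseen items against the taste profile of their group."""
    group = [user] + similar_users(user, k)
    profile = group_profile(group)
    unseen = [item for item, r in zip(ITEMS, RATINGS[user]) if r == 0]
    return sorted(
        unseen, key=lambda item: cosine(profile, FEATURES[item]), reverse=True
    )


print(similar_users("bob"))     # bob's ratings are closest to ann's
print(hybrid_recommend("bob"))  # the unseen drama ranks above the unseen comedy
```

A production system would work with far larger, sparser data and would typically weight or tune each step, but the two-stage structure is the same.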

Machine Learning – Further Detail

There are several ways in which machine learning systems learn from and interpret data:

Supervised Learning

Uses prepared labelled datasets to train algorithms to make recommendations.

Unsupervised Learning

The computer, without human intervention, identifies insights and patterns to make recommendations.

Semi-Supervised Learning

A hybrid approach that combines supervised and unsupervised learning, training on a mix of labelled and unlabelled data.

Reinforcement Learning

Trains software to make a sequence of decisions, each of which takes the outcome of the previous one into consideration, akin to human decision making.

Figure 2 shows the prevalence of unsupervised learning in recommendation systems in the papers examined.

Results

A variety of techniques are used in data analysis, depending on the objective at hand. The ‘K-means’ algorithm is one of the most commonly used. It divides user datasets into smaller, more manageable groups, which can speed up algorithms and make it easier to compare the groups.

The paper explains that the ‘K-means’ algorithm first selects as many vector points as the desired number of clusters, to act as the initial cluster centres. It then repeatedly measures the distance from each data point to each centre, assigns every point to its nearest centre and re-evaluates the centres. The process stops when the cluster assignments no longer change, at which point the clusters are established. Figure 3 shows that there are various methods available to measure the distances and similarities between vectors.
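The loop described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the paper's implementation: it assumes squared Euclidean distance, picks the initial centres at random from the data, and uses made-up two-dimensional points. The function name `k_means` and its parameters are chosen here for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centres)."""
    rng = random.Random(seed)
    # Select as many starting points as the desired number of clusters.
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Measure distances and assign each point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Stop once the cluster assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-evaluate each centre as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centres


# Hypothetical user vectors forming two well-separated groups.
users = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centres = k_means(users, k=2)
print(labels)
print(centres)
```

The numbering of the clusters depends on which starting points happen to be chosen, so only the grouping is meaningful, not the label values themselves.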

Various algorithms have been developed to assess similarities between users, including ‘cosine’ and ‘Pearson’ correlation similarities. The data is converted into numerical form, producing a series of vectors, which are in turn used to determine the level of similarity between users. The algorithms are customisable and can be adapted to “provide benefits or overcome drawbacks, depending on the situation”. Machine learning also helps address ‘cold start’ problems, whereby new objects or new users lack supporting data, which restricts recommendation capabilities. A combination of filtering processes can then be deployed to yield the best recommendations.
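As a rough illustration of the two similarity measures named above, the sketch below computes both for hypothetical rating vectors. It uses the standard textbook definitions rather than any customised variant from the paper. Pearson correlation is implemented as cosine similarity applied after subtracting each user's mean rating, which is why it is less affected by users who rate consistently higher or lower than others.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )


# Hypothetical ratings of the same three items by three users.
alice = [5, 4, 1]
bob = [4, 3, 0]    # same ordering as alice, but every rating is one lower
carol = [1, 2, 5]  # the opposite ordering to alice

print(cosine_similarity(alice, bob), pearson_similarity(alice, bob))
print(cosine_similarity(alice, carol), pearson_similarity(alice, carol))
```

On this toy data, Pearson treats alice and bob as perfectly correlated and alice and carol as perfectly opposed, while cosine similarity stays positive in both cases because all raw ratings are non-negative.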

The paper concludes that ‘K-means’ is the most frequently employed technique. It has been applied in recommendation systems across diverse industries, including the analysis of TV programs. The paper also identifies opportunities for further research in this area. Read the full paper.