Machine Learning has many techniques for product recommendation like Matrix Factorization, User-User similarity, Item-Item similarity, Content based filtering, etc. All have some pros and cons, so in real world companies generally combine most of them to get a better result.
Here I have created a Magento 2 Product Recommendation module which will recommend based Item-Item similarity. That means it will recommend similar products based on past purchases.
First I have loaded the product, customer and order collection to create the user-item matrix. Here, I have created a binary matrix where 1 means the customer has purchased the product and 0 means otherwise.
For example check the user-item interaction matrix below.
We can get the vector representation of the items from the matrix as shown below,
Magento 2 Company ? Find out More
Then I have calculated the product similarities between each product with cosine similarity technique. I have used the cosine similarity formula because the item vector will be of high dimension. And for high dimensional case cosine similarity perform better as compared to euclidean distance. I have used sciphp/numphp library for this calculation.
For example you can check how we can calculate the cosine similarity between the items using the item vector.
As you can see in the above calculation the cosine-similarity between item 1 and item 2 is higher than the cosine-similarity between item 3 and item 2. That means the item 1 is more similar to item 2 than item 3.
So to the user 5, who has purchased only item 2, we will recommend the item 1. (see the user-item matrix)
Similarly for each customer, I have extracted top k similar products based on past orders and the similarity score. And I have mailed the product recommendations to the customers like shown below,
This is a simple implementation of the recommendation system.
We could combine results from User-User similarity to get more diverse results. Also instead of considering all the orders, we could only use last x days data to get new results. This strategy will be more helpful in case of User-User similarity because the user behaviour changes over time.
There will be issues when we have new items. Because there are going to be very few or zero orders for the new products. So for the new items most or all of the values will be zero in the item vector. And the cosine similarities will be close to zero. This is called cold start problem.
To tackle the cold start problems for new users and new items, we could do content based filtering. Where we use product information such as name, price, category to find similar products.
Thanks for reading the blog.