Hybrid recommendation approach with utility factor in MovieLens.
Large-scale technology trends are transforming the information access model into a big data model. Data now flows in many directions, and users themselves generate much of it, which is what turns it into big data. The web contains relational data, text data, semi-structured data, graph data and streaming data. Such tremendous volume, velocity and variety call for inference-based models to avoid information overload (Kaisler et al., 2013). Time-evolving data, distributed mining and hidden data are some of the challenges that current technology is still struggling with (Wu et al., 2014).
Recommendation is an active research area, and recommenders are widely deployed in E-Commerce web applications that expose users to huge collections of customized items. The amount of information is growing so fast that online shoppers cannot explore and compare every possible product (Li et al., 2005). Recommendation systems alleviate this problem. Also known as information filtering technology, they are designed to determine the items most likely to match a customer's taste. E-business offers extensive customization through ubiquitous technology, yet users are overloaded by the number of websites they have to search. Amazon.com is a well-known recommender that customizes its products to each user with the Collaborative Filtering (CF) technique. In addition, it applies data mining techniques such as frequent pattern mining, dimensionality reduction and feedback analysis to enhance its functionality.
Recommender systems help users choose their favorites from an overwhelming number of items in an easy way, but their internal functionality is complex in real-time scenarios. They perform a heuristic search that uses user and item information for prediction, and they autonomously identify recommendations for individual users from past purchase behavior. Existing systems face several challenges, such as lack of personalization and lack of explicit feedback, and they process data offline rather than online. An efficient recommendation system needs a detailed user profile and must analyze the user's previous behavior as it evolves over time.
Initially, recommenders were evaluated and ranked by their predictive ability, and it is now widely accepted that accurate predictions are crucial. Many recommendation systems aim to anticipate user taste, expose users to diverse items, preserve user privacy, and respond quickly through easy interaction. They periodically update user needs as new ratings, items and users are added, and they combine customer profiles with item profiles to make good product suggestions. An algorithm compares different user profiles with each other and uses them to estimate and predict the items that are closest to the user's taste.
The search context is the set of properties that influence the success of a recommender system. A recommender system brings customers into contact with movies they have never seen. The content-based approach recommends items to the user based on item details, user details and user needs. The CF technique works on the principle that similar users have similar tastes: it identifies similar users from their rating values and recommends products to a new active user accordingly. CF techniques are also used to address scalability. LikeMinds is one of the well-known applications of this approach; it infers the rating of a particular item from information collected from the nearest neighbors, and two users are considered similar on statistical grounds. Current CF nonetheless suffers from poor accuracy, limited scalability, sparse data and large prediction errors (Bobadilla et al., 2010).
The first challenge for a recommender system is to identify the nearest neighbors correctly; the next is to identify them from a large data set within a short time. In the first phase the system finds similar users, and in the second phase it predicts ratings for the items. Items whose predicted ratings exceed a threshold are recommended.
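A minimal Python sketch of this two-phase procedure follows. It assumes ratings are held as a dictionary mapping each user to their {item: rating} pairs; the function name `recommend`, the pluggable `sim` callable, and the defaults `k=2` and `threshold=3.5` are illustrative choices, not values taken from this paper. Predictions here use a plain similarity-weighted average of the neighbours' ratings.

```python
from typing import Callable, Dict, List, Tuple

Ratings = Dict[int, Dict[int, float]]  # user -> {item: rating}


def recommend(active: int, ratings: Ratings, sim: Callable[[int, int], float],
              k: int = 2, threshold: float = 3.5) -> List[Tuple[int, float]]:
    # Phase 1: rank every other user by similarity and keep the k nearest.
    others = [v for v in ratings if v != active]
    neighbours = sorted(others, key=lambda v: sim(active, v), reverse=True)[:k]

    # Phase 2: predict ratings for items the neighbours rated but the active user did not.
    candidates = {i for v in neighbours for i in ratings[v]} - set(ratings[active])
    recs = []
    for i in candidates:
        raters = [v for v in neighbours if i in ratings[v]]
        weights = [sim(active, v) for v in raters]
        norm = sum(abs(w) for w in weights)
        if norm == 0:
            continue  # no usable evidence for this item
        pred = sum(w * ratings[v][i] for w, v in zip(weights, raters)) / norm
        if pred > threshold:  # recommend only items whose prediction exceeds the threshold
            recs.append((i, pred))
    return sorted(recs, key=lambda p: p[1], reverse=True)
```

Passing `sim` as a callable keeps the neighbour search independent of the similarity measure, so a Pearson or utility-weighted similarity can be plugged in without changing the two phases.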
In real-time big data scenarios, huge volumes of transactional data pose many challenges for recommender systems. The progressive growth of big data calls for recommendation systems with more scalable and efficient CF techniques. To this end, the major contribution of this paper is to construct subsets of the sparse matrix for different attributes.
The following sections discuss related work and related techniques, and then present the proposed system architecture. Finally, the results on the MovieLens dataset are discussed.
1. Related Work:
The scalability and sparsity problems of CF are discussed by Venmathi and Kannan (2015), who propose a personalized recommendation approach that combines user clustering and item clustering. The similarity between each cluster center and the active user is computed, and the nearest neighbors of the target user are found for analysis. Such an approach is computationally expensive, and its cost grows linearly as the number of users and items increases.
An overview of CF methods is given by Wu et al. (2015), who highlight the limitations of user-based CF. That work proposes item-based CF with conditional probability and weight-adjusting factors. Items whose similarity exceeds a preset threshold are chosen as the set of supporting items, and the co-occurrence of two items is used as an integral parameter to adjust similarity weights.
The proposed system addresses the size of the sparse matrix by taking subsets of it for several attribute groups, thereby reducing the size of the rank matrix.
Advantages of the proposed system:
* The SVD approach improves prediction accuracy over the K Nearest Neighbor approach.
* The proposed approach reduces the number of wrong predictions, so users find relevant results easily and have a better search experience.
Recommendation systems actively collect heterogeneous data such as item details, user details and transaction details. The system interprets each user activity in such a way as to build its recommendations, gathering scalable knowledge from ratings and item evaluations.
A. Recommendation Problem:
The recommendation problem considers a set of users U and a set of items I. Let $u^{*}$ be the utility function that measures the usefulness of an item $i \in I$ to a user $u \in U$; the utility $u^{*}$ is measured from the users' ratings of items. In practice the $U \times I$ matrix is sparse, since users tend to rate only a small subset of the available items. $I_u$ denotes the subset of items rated by user u, and $U_i$ denotes the subset of users who rated item i.
Given the ratings of an active user a on the items $I_a$, the recommendation system predicts the ratings of user a for the items in $I \setminus I_a$, that is, the items a has not yet rated.
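The sparse utility matrix does not have to be stored densely. The sketch below is a minimal illustration, assuming the input arrives as (user, item, rating) triples; it keeps only the observed cells and exposes $I_u$ and $U_i$. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def index_ratings(triples: Iterable[Tuple[int, int, float]]):
    """Store only the observed cells of the sparse U x I matrix, indexed both ways."""
    by_user: Dict[int, Dict[int, float]] = defaultdict(dict)  # u -> {i: r_ui}
    by_item: Dict[int, Dict[int, float]] = defaultdict(dict)  # i -> {u: r_ui}
    for u, i, r in triples:
        by_user[u][i] = r
        by_item[i][u] = r
    return by_user, by_item


def items_rated_by(by_user, u: int) -> Set[int]:
    """I_u: the items rated by user u (empty for an unknown user)."""
    return set(by_user.get(u, {}))


def users_who_rated(by_item, i: int) -> Set[int]:
    """U_i: the users who rated item i."""
    return set(by_item.get(i, {}))


def density(by_user, n_users: int, n_items: int) -> float:
    """Fraction of the U x I matrix that is actually filled."""
    observed = sum(len(items) for items in by_user.values())
    return observed / (n_users * n_items)
```

With the MovieLens 100K figures used by the proposed system (943 users, 1682 movies, 100000 ratings), the density is 100000 / (943 × 1682) ≈ 0.063, so roughly 94% of the matrix is empty. The candidate set for an active user a is then all items minus `items_rated_by(by_user, a)`.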
The CF technique predicts the utility of an item for a user u from the ratings given to that item by users similar to u. This metric estimates the probability that the user selects item i from the recommended list.
C. Similarity functions:
Similarity functions are commonly used in recommender systems to measure the correlation between two users or two items. The weighted arithmetic mean is an averaging function that computes the predicted rating from a set of input ratings. In the CF technique, the prediction function combines user similarities with this weighted arithmetic mean.
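As a worked illustration with hypothetical numbers, suppose the prediction for user u on item i uses two neighbours $v_1$ and $v_2$ with similarity weights $w_{uv_1} = 0.9$ and $w_{uv_2} = 0.5$, who rated the item 4 and 2 respectively. The weighted arithmetic mean gives

$$\hat{r}_{ui} = \frac{\sum_{v} w_{uv}\, r_{vi}}{\sum_{v} |w_{uv}|} = \frac{0.9 \cdot 4 + 0.5 \cdot 2}{0.9 + 0.5} = \frac{4.6}{1.4} \approx 3.29$$

so the prediction lies closer to the rating of the more similar neighbour.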
3. Existing Approach:
Data collection, ingestion, cleaning, integration, discovery, analysis and delivery are the series of steps followed in recommender systems. The following are existing recommendation algorithms (Ricci et al., 2011).
* User based CF
* Item based CF
* Slope one Recommender algorithm
* Singular Value Decomposition (SVD)
The major step of CF is finding the nearest neighbors of the active user and using their rating patterns to make predictions for that user. Not every user rates every item. Item-based CF instead finds similarities between different items in the dataset, and these similarities are used to predict ratings for user-item pairs not present in the dataset. The Pearson correlation coefficient is a useful formula for measuring the relationship between two variables (Tang and Zhou, 2013).
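A minimal item-based CF sketch follows, under the same assumed {user: {item: rating}} layout. For brevity it uses cosine similarity between item rating vectors over co-rating users, a swap from the Pearson coefficient mentioned above; the function names are hypothetical.

```python
import math
from typing import Dict

Ratings = Dict[int, Dict[int, float]]  # user -> {item: rating}


def item_cosine(ratings: Ratings, i: int, j: int) -> float:
    """Cosine similarity of items i and j over the users who rated both."""
    common = [u for u, items in ratings.items() if i in items and j in items]
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(ratings: Ratings, u: int, i: int) -> float:
    """Similarity-weighted average of u's own ratings on the other items."""
    num = den = 0.0
    for j, r_uj in ratings[u].items():
        if j == i:
            continue
        s = item_cosine(ratings, i, j)
        num += s * r_uj
        den += abs(s)
    return num / den if den else float("nan")  # NaN: no similar item rated by u
```

In practice item-item similarities change slowly, so they are usually precomputed rather than recomputed per prediction as in this sketch.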
4. Proposed Approach:
The proposed system follows a hybrid recommendation approach with a data reduction technique. It uses a database of 1682 movies, rated by 943 users with 100000 ratings in total. Singular Value Decomposition (SVD) is applied as the dimensionality reduction technique. SVD transforms correlated items into a set of uncorrelated factors that expose the relationships among the items: it maps high-dimensional data to a lower-dimensional space that reveals the substructure of the original data, keeping the dimensions along which the data vary most.
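A minimal NumPy sketch of SVD-based dimensionality reduction on a user × item rating matrix follows. Unrated cells are marked `NaN` and, as an assumption of this sketch (not a detail given in the paper), filled with the item's mean rating before the decomposition; `svd_low_rank` is a hypothetical name.

```python
import numpy as np


def svd_low_rank(R: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of a user x item rating matrix (NaN = unrated).

    Unrated cells are first filled with the item's mean rating; this
    imputation step is an assumption of the sketch, not part of plain SVD.
    """
    filled = np.array(R, dtype=float)
    item_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = item_means[cols]
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Keep the k largest singular values: the directions of greatest variation.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

The reconstructed value in an originally unrated cell serves as the predicted rating. The rank k would be tuned, for example against RMSE on held-out ratings; an item with no ratings at all would leave a NaN column and needs separate handling.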
User-based and item-based CF recommender systems construct a similarity matrix to make predictions for the user, using the Pearson or Spearman correlation coefficient. This matrix is sparse, and to reduce the complexity of constructing it, the proposed system applies a dimensionality reduction technique.
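Spearman's coefficient is the Pearson correlation of rank vectors. Because star ratings contain many ties, the sketch below gives tied values their average rank. It assumes the inputs are two users' ratings on their co-rated items, passed as equal-length sequences; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values (common with star ratings) share their mean rank."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return _pearson(average_ranks(a), average_ranks(b))
```

Because it only compares orderings, Spearman treats a user who rates 1, 2, 3 and one who rates 1, 4, 5 on the same items as perfectly correlated, which Pearson would not.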
5. System Architecture:
The proposed personalized recommendation system follows a hybrid approach with utility. Hybrid approaches in general alleviate the sparsity problem because they draw on more information than either the content-based or the CF approach alone. The proposed system adaptively weights different features as utility depending on the attribute group. Three sources from MovieLens are used in the system: user profiles, item profiles and historical ratings. The system searches for neighboring nodes efficiently using the utility value, which is more significant than the co-rating relation because it involves several attributes from the user profile: age, gender and personality. Age and gender are available in the dataset itself, and the personality feature is taken to be the category of the user. The proposed system observes that the attributes of u and i in their profiles have a strong impact on choosing recommendations, so the utility function gives more importance to items when users share similar attributes. For example, similar users in the same age group get higher priority than other users. Extending the utility with greater weights offers a new way to find good recommendations in dynamic scenarios.
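The paper does not publish its utility formula, so the following is only a hypothetical sketch of attribute-based utility weighting. It assumes ten-year age groups, illustrative attribute weights, and a `category` field standing in for the personality feature (for example, the occupation field of MovieLens 100K); `alpha`, which blends the utility with a rating-based similarity, is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical attribute weights: the paper does not publish its values.
WEIGHTS: Dict[str, float] = {"age": 0.5, "gender": 0.2, "category": 0.3}


@dataclass(frozen=True)
class Profile:
    age: int
    gender: str
    category: str  # stand-in for the "personality" feature (assumed, e.g. occupation)


def same_age_group(a: int, b: int, width: int = 10) -> bool:
    # Assumed bucketing: ages fall into groups of `width` years.
    return a // width == b // width


def utility(p: Profile, q: Profile, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted share (0..1) of profile attributes on which two users agree."""
    score = 0.0
    if same_age_group(p.age, q.age):
        score += weights["age"]
    if p.gender == q.gender:
        score += weights["gender"]
    if p.category == q.category:
        score += weights["category"]
    return score / sum(weights.values())


def utility_weighted_sim(rating_sim: float, p: Profile, q: Profile,
                         alpha: float = 0.5) -> float:
    """Blend rating-based similarity with profile utility; alpha is an assumption."""
    return (1 - alpha) * rating_sim + alpha * utility(p, q)
```

Giving age the largest weight mirrors the example in the text, where users in the same age group receive higher priority; the blended score can replace a plain rating similarity in the neighbour search.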
System design of the proposed approach is given in Fig. 1.
A general movie recommendation dataset consists of user data, item data, rating data and prediction data. The predictive models and analyses applicable to recommender systems are listed in Fig. 2.
CF in MovieLens uses user rating data to construct a sparse matrix, from which predictions and recommendations are made according to similarity values. Its simplicity and scalability promise appreciable revenue and benefits for targeted promotions. However, sparse data makes predictions unreliable in several circumstances, which limits prediction performance. The proposed system therefore first classifies user data by attributes and then finds similar users and items. This approach encourages dynamic customization in real-time analysis.
The proposed system is developed with the MovieLens 100K dataset, kept separate from the normal MovieLens prediction dataset. The ratings it predicts are compared with those predicted by the K Nearest Neighbor algorithm, and the effectiveness of the approach is measured by the Root Mean Square Error (RMSE).
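RMSE is the square root of the mean squared difference between predicted and actual ratings, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum (\hat{r}_{ui} - r_{ui})^2}$, where lower values are better. A minimal sketch, assuming the two rating lists are aligned by (user, item) pair:

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and actual ratings (lower is better)."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Squaring the errors penalizes a few large mistakes more heavily than many small ones, which is why RMSE is sensitive to badly wrong predictions.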
MovieLens provides datasets of several sizes, including one with 100K time-stamped user ratings of movies, one with 1M ratings and one with 10M ratings. Ratings in the 100K and 1M sets are whole stars from 1 to 5, while the 10M set uses half-star steps from 0.5 to 5. The MovieLens 1M dataset contains approximately 1 million ratings of about 3900 movies by around 6000 users.
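In the 100K release, the ratings file `u.data` is a tab-separated list of user id, item id, rating and timestamp. A minimal loader sketch follows; `load_u_data` is a hypothetical name, and ratings are parsed as floats so that later arithmetic stays in floating point.

```python
import csv
from typing import Iterable, List, Tuple


def load_u_data(lines: Iterable[str]) -> List[Tuple[int, int, float, int]]:
    """Parse MovieLens 100K u.data rows: user id, item id, rating, timestamp (tab-separated)."""
    rows = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        user, item, rating, timestamp = row
        rows.append((int(user), int(item), float(rating), int(timestamp)))
    return rows
```

Usage, assuming the archive is unpacked into a local `ml-100k` directory: `with open("ml-100k/u.data", newline="") as fh: rows = load_u_data(fh)`.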
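The system is evaluated under two conditions: prediction without the utility factor and prediction with the utility factor. Without the utility factor, the predicted rating is obtained from the traditional user-item CF technique, where the Pearson correlation coefficient measures the similarity between two users u and v. Here $r_{ui}$ and $r_{vi}$ are the ratings of users u and v for item i:

$$\mathrm{sim}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}}$$

where $I_{uv} = I_u \cap I_v$ is the set of items rated by both users, and $\bar{r}_u$, $\bar{r}_v$ are the mean ratings of u and v.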
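The Pearson similarity between users u and v over their co-rated items can be sketched as follows, assuming the {user: {item: rating}} layout and taking each user's mean over all items they rated (some implementations use the mean over co-rated items only). `pearson_sim` is a hypothetical name.

```python
import math
from typing import Dict

Ratings = Dict[int, Dict[int, float]]  # user -> {item: rating}


def pearson_sim(ratings: Ratings, u: int, v: int) -> float:
    """Pearson correlation of users u and v over their co-rated items I_uv."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    # Mean rating of each user over all the items they rated.
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    mean_v = sum(ratings[v].values()) / len(ratings[v])
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v) if den_u and den_v else 0.0
```

Users with no co-rated items, or with constant ratings on them, get similarity 0 in this sketch, so they never act as neighbours.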
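The prediction for user-based CF is computed from this correlation coefficient, with $N_i(u)$ the neighbors of u who have rated item i and $\bar{r}$ denoting a user's mean rating:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in N_i(u)} \mathrm{sim}(u,v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in N_i(u)} |\mathrm{sim}(u,v)|}$$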
where v is a user similar to u and $r_{vi}$ is the rating of item i by user v. This predicted value is compared with the movie ratings generated by the proposed system, which recommends movies for each user ID using the SVD approach. The actual ratings of the recommended movies are available in the MovieLens dataset, and the predictions are compared against them using the RMSE.
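The mean-centred, similarity-weighted prediction used for the baseline can be sketched as follows. It assumes the neighbourhood of u is passed in as a precomputed {neighbour: similarity} dictionary (for instance Pearson values), and falls back to u's mean rating when no neighbour rated the item; that fallback and the function names are assumptions of this sketch.

```python
from typing import Dict

Ratings = Dict[int, Dict[int, float]]  # user -> {item: rating}


def mean_rating(ratings: Ratings, u: int) -> float:
    return sum(ratings[u].values()) / len(ratings[u])


def predict_user_based(ratings: Ratings, neighbour_sims: Dict[int, float],
                       u: int, i: int) -> float:
    """Mean-centred, similarity-weighted prediction of r_ui.

    neighbour_sims maps each neighbour v of u to sim(u, v), e.g. Pearson values.
    """
    num = den = 0.0
    for v, s in neighbour_sims.items():
        if i in ratings[v]:  # only neighbours who rated item i contribute
            num += s * (ratings[v][i] - mean_rating(ratings, v))
            den += abs(s)
    base = mean_rating(ratings, u)
    return base + num / den if den else base  # fall back to u's mean
```

Subtracting each neighbour's mean removes differences in rating scale, so a generous rater's 4 and a strict rater's 3 can carry the same signal.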
Both approaches are then compared for quality and accuracy. For users with a large number of ratings (above 350), the proposed and existing algorithms give similar results, and the RMSE does not decrease appreciably. When the number of ratings is smaller, the proposed approach shows an appreciable variation in RMSE. This variation is shown in Fig. 3.
The number of movies recommended to MovieLens users, by user ID, is shown in Fig. 4.
This work investigates hybrid recommendation from a different perspective and presents a novel prediction model. Its distinctive feature is that it predicts movies with a utility factor in addition to user similarity and item similarity. The evaluation shows that this method outperforms previous recommendation methods in recommendation quality. Features from the user profile capture users' multi-phase preferences with regard to computation, accuracy and flexibility. It remains infeasible to find all items and their ranks for a user, or all ratings of all users.
Recommendation algorithms apply a wide variety of approaches, but they must generalize the processing of user and item information simultaneously. Distributed recommender systems built on open-source platforms, as well as diversity-aware and privacy-preserving recommendation systems, are interesting directions for further research.
Article history: Received on Feb 27th 2015
Accepted on March 20th 2015
Available online 15th June 2015
Bobadilla, Jesus, Francisco Serradilla and Jesus Bernal, 2010. A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6): 520-528.
Kaisler, Stephen, Frank Armour, J. Alberto Espinosa and William Money, 2013. Big data: Issues and challenges moving forward. In Proceedings of IEEE 46th Hawaii International Conference on System Sciences (HICSS), pp: 995-1004.
Li, Yu, Liu Lu and Li Xuefeng, 2005. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce. Expert Systems with Applications, 28(1): 67-77.
Ricci, Francesco, Lior Rokach and Bracha Shapira, 2011. Introduction to recommender systems handbook. Springer US.
Tang, Xiangyu and Jie Zhou, 2013. Dynamic personalized recommendation on sparse data. IEEE Transactions on Knowledge and Data Engineering, 25(12): 2895-2899.
Venmathi, A. and R. Kannan, 2015. A collaborative filtering recommendation algorithm based on user clustering and items clustering. International Journal of Innovative Science and Applied Engineering Research, 13(40): 53-59.
Wu, Haitao, Wen-Kuang Chou, Ningbo Hao, Duan Wang and Jingfu Li, 2015. Collaborative filtering recommendation based on conditional probability and weight adjusting. International Journal of Computational Science and Engineering, 10(1): 164-170.
Wu, Xindong, Xingquan Zhu, Gong-Qing Wu and Wei Ding, 2014. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1): 97-107.
(1) N. Buvaneswari, (2) S. Bose, (3) A. Kiruthiga
(1) Research Scholar, Department of Computer Science and Engineering, Anna University, Chennai 600 025, India.
(2) Associate Professor, Department of Computer Science and Engineering, Anna University, Chennai 600 025, India.
(3) Research Scholar, Department of Computer Science and Engineering, Anna University, Chennai 600 025, India.
Corresponding Author: N. Buvaneswari, Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai 600 025, India.
Author: Buvaneswari, N.; Bose, S.; Kiruthiga, A.
Publication: Advances in Natural and Applied Sciences
Date: Jul 15, 2015