Contributions:
- Development of the RainFM hybrid model, integrating matrix factorization, neural networks, and Bayesian inference.
- Implementation of a stratified data grouping strategy to enhance predictive accuracy.
- Extensive evaluation of baseline and hybrid models, including SVD, ALS, GMF, MLP, and BFM.
Authors: Rainer Feichtinger, Rongxing Liu, Justin Lo, Ruben Schenk
Institution: ETH Zurich, Computational Intelligence Lab
Overview
The RainFM project addresses the challenge of improving predictive accuracy in recommender systems by combining multiple collaborative filtering (CF) strategies. RainFM stratifies the dataset according to statistical properties and applies distinct models to each subset. This lets it predict ratings more accurately than conventional approaches.
Motivation
Traditional collaborative filtering techniques such as SVD and ALS provide a solid foundation for matrix completion. However, they model each rating as an inner product of user and item latent factors, so they can miss non-linear interactions in the data. Our objective was to combine linear techniques (SVD, ALS) and neural techniques (GMF, MLP) with Bayesian Factorization Machines to obtain a more robust predictive model. Stratified data grouping also allowed more targeted model training, which reduced overfitting and improved prediction accuracy.
Methodology
Data Stratification:
- The dataset of 10,000 users and 1,000 items was partitioned according to the number of ratings per item and each user's average rating.
- Each group was treated as a distinct dataset, allowing for model customization and specific hyperparameter tuning.
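The stratification step can be sketched as follows. This is a minimal illustration under several assumptions:

- Ratings are (user, item, rating) triples.
- Items are split into "dense" and "sparse" by a rating-count threshold, and users into "high" and "low" by their mean rating.
- The function name `stratify` and both threshold values are hypothetical.

The actual cut points used in RainFM are not given here.

```python
from collections import defaultdict
from statistics import mean


def stratify(ratings, item_count_threshold, user_mean_threshold):
    """Partition (user, item, rating) triples into groups keyed by
    (item has many ratings?, user rates above threshold on average?).

    Both thresholds are hypothetical tuning parameters.
    """
    item_counts = defaultdict(int)
    user_ratings = defaultdict(list)
    for user, item, rating in ratings:
        item_counts[item] += 1
        user_ratings[user].append(rating)
    user_means = {user: mean(rs) for user, rs in user_ratings.items()}

    groups = defaultdict(list)
    for user, item, rating in ratings:
        dense_item = item_counts[item] >= item_count_threshold
        high_user = user_means[user] >= user_mean_threshold
        groups[(dense_item, high_user)].append((user, item, rating))
    return dict(groups)


# Toy data (not from the project): 3 users, 3 items.
example = [(0, 0, 5), (0, 1, 4), (1, 0, 2), (1, 2, 1), (2, 0, 3), (2, 1, 5)]
groups = stratify(example, item_count_threshold=2, user_mean_threshold=3.5)
for key, rows in sorted(groups.items()):
    print(key, rows)
```

Each resulting group would then be handled as its own dataset, with its own models and hyperparameters. In practice, the thresholds (or quantile-based cut points) would presumably be tuned against validation RMSE.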
Baseline Models:
- SVD and ALS formed the core baseline models, providing a standard collaborative filtering approach.
- Bayesian Factorization Machines (BFM) were employed to model pairwise interactions between input features (such as user and item IDs), with the model parameters estimated by Bayesian inference.
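BFM builds on the factorization machine (FM) model equation, ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below evaluates that equation in two ways: naively, and with Rendle's O(kn) reformulation of the pairwise term. It also shows that with one-hot user and item features, an FM reduces to biased matrix factorization.

This is an illustration under assumptions. The function names, sizes, and random parameters are hypothetical. The Bayesian part of BFM is not shown: that part places priors on w0, w, and V and samples them, for example by Gibbs sampling as in libFM.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """FM prediction with an explicit O(k n^2) sum over feature pairs i < j."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


def fm_predict(x, w0, w, V):
    """Same prediction in O(k n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


# Hypothetical sizes and randomly initialised (untrained) parameters.
random.seed(0)
n_users, n_items, k = 3, 4, 2
n = n_users + n_items
w0 = 3.5
w = [random.gauss(0.0, 0.1) for _ in range(n)]
V = [[random.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]


def one_hot(user, item):
    """Encode a (user, item) pair as a one-hot feature vector of length n."""
    x = [0.0] * n
    x[user] = 1.0
    x[n_users + item] = 1.0
    return x


x = one_hot(1, 2)
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

In the one-hot case, only the pair (user, item) survives in the pairwise term, so the prediction is w0 + w_u + w_i + ⟨v_u, v_i⟩. This is why an FM can stand in for a matrix factorization baseline while still accepting additional side features.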
Neural Collaborative Filtering (NCF):
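- Implemented GMF and MLP. GMF models user-item interactions through an element-wise product of embeddings, generalizing matrix factorization. MLP learns non-linear interactions through stacked fully connected layers.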
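- NeuFM, a hybrid of GMF and MLP, combines both strategies. When pretrained, it achieved the lowest MAE among the neural models (0.8092), although its RMSE (1.0041) was slightly higher than that of MLP alone (1.0029).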
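The forward pass of GMF, MLP, and their NeuFM combination can be sketched in plain Python. This follows the general neural collaborative filtering design:

- GMF takes an element-wise product of embeddings.
- MLP operates on concatenated embeddings.
- A final layer is applied to both outputs, concatenated.

The embedding sizes, layer widths, random weights, and the linear output for rating regression are assumptions. Training and pretraining are omitted.

```python
import random


def dense(W, b, v):
    """Fully connected layer: each row of W produces one output unit."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def gmf_vector(p_u, q_i):
    """GMF layer: element-wise product of user and item embeddings."""
    return [a * b for a, b in zip(p_u, q_i)]


def gmf_score(p_u, q_i, h):
    """GMF output: weighted sum of the GMF vector (h = all ones gives plain MF)."""
    return sum(h_f * g_f for h_f, g_f in zip(h, gmf_vector(p_u, q_i)))


def mlp_vector(p_u, q_i, layers):
    """MLP tower: concatenate the embeddings, then apply ReLU layers."""
    z = p_u + q_i  # list concatenation
    for W, b in layers:
        z = relu(dense(W, b, z))
    return z


def neufm_score(gmf_p, gmf_q, mlp_p, mlp_q, layers, h, b_out):
    """NeuFM: concatenate the GMF vector and the last MLP layer, then a linear
    output (assumed linear here because the task is rating regression)."""
    z = gmf_vector(gmf_p, gmf_q) + mlp_vector(mlp_p, mlp_q, layers)
    return sum(h_j * z_j for h_j, z_j in zip(h, z)) + b_out


def rand_vec(size):
    return [random.gauss(0.0, 0.5) for _ in range(size)]


def rand_layer(n_in, n_out):
    return [rand_vec(n_in) for _ in range(n_out)], [0.0] * n_out


# Hypothetical sizes: 4-dim GMF embeddings, 4-dim MLP embeddings, MLP 8 -> 4 -> 2.
random.seed(1)
gmf_p, gmf_q = rand_vec(4), rand_vec(4)
mlp_p, mlp_q = rand_vec(4), rand_vec(4)
layers = [rand_layer(8, 4), rand_layer(4, 2)]
h = rand_vec(4 + 2)
print(neufm_score(gmf_p, gmf_q, mlp_p, mlp_q, layers, h, b_out=3.5))
```

In the usual NCF recipe, "pretrained" means the GMF and MLP towers are first trained separately and their weights then initialise the combined model. The document does not state whether RainFM's NeuFM followed exactly this procedure.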
RainFM Model:
- Combines the best-performing models from each data stratum through a forward selection strategy.
- Model predictions are combined by weighted averaging, with weights determined by the RMSE reduction each model yields on the validation set.
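The exact selection and weighting rule is not spelled out above, so the following sketches one common scheme that fits the description: greedy forward ensemble selection with replacement (Caruana et al., 2004). Each step adds whichever model most lowers the blend's validation RMSE. A model's weight is the fraction of steps that selected it.

The function names, the `max_steps` cap, and the toy predictions are hypothetical. In RainFM, such a procedure would be run separately on each stratum's validation data.

```python
import math


def rmse(preds, targets):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


def blend(model_preds, counts):
    """Weighted average of model predictions, weighting each model by its count."""
    total = sum(counts.values())
    n = len(next(iter(model_preds.values())))
    return [sum(c * model_preds[m][j] for m, c in counts.items()) / total for j in range(n)]


def forward_select(model_preds, targets, max_steps=20):
    """Greedy forward selection with replacement; returns (name -> weight, best RMSE)."""
    counts = {}
    best = math.inf
    for _ in range(max_steps):
        candidate, candidate_score = None, best
        for name in model_preds:
            trial = dict(counts)
            trial[name] = trial.get(name, 0) + 1
            score = rmse(blend(model_preds, trial), targets)
            if score < candidate_score:
                candidate, candidate_score = name, score
        if candidate is None:  # no model lowers validation RMSE any further
            break
        counts[candidate] = counts.get(candidate, 0) + 1
        best = candidate_score
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}, best


# Toy validation data (not project results): A over-predicts, B under-predicts.
targets = [4, 3, 5, 2, 4]
preds = {
    "A": [4.5, 3.5, 5.5, 2.5, 4.5],
    "B": [3.5, 2.5, 4.5, 1.5, 3.5],
    "C": [3, 4, 4, 3, 3],
}
weights, best = forward_select(preds, targets)
print(weights, best)
```

Here A is picked first and B second, because their opposite biases cancel. C is never selected since it cannot lower the blend's RMSE. A known advantage of selecting with replacement is that weights emerge from the greedy RMSE reductions rather than being fitted directly, which tends to limit overfitting to the validation set.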
Results
Quantitative Analysis:
- RainFM achieved an RMSE of 0.9695, outperforming all baseline models, including BFM and NeuFM.
- The stratified approach reduced overfitting and maintained predictive accuracy across all data partitions.
- The KNN-augmented BFM model also demonstrated strong performance, albeit at the cost of increased training time.
Model Comparison Table:
Method | RMSE | MAE |
---|---|---|
Item Average | 1.0309 | 0.8398 |
SVD & ALS | 0.9921 | 0.7896 |
Generalized MF | 1.0822 | 0.8795 |
Multi-Layer Perceptron | 1.0029 | 0.8105 |
NeuFM (Pretrained) | 1.0041 | 0.8092 |
BFM Baseline | 0.9777 | 0.7809 |
RainFM | 0.9695 | 0.7714 |
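For reference, the two metrics in the table are the standard definitions below. They are written assuming \mathcal{T} is the held-out set of (user, item) pairs, with true ratings r_{ui} and predictions \hat{r}_{ui}. The second pair of expressions works out RainFM's relative RMSE reduction from the table values.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|}\sum_{(u,i)\in\mathcal{T}} \left(r_{ui} - \hat{r}_{ui}\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{|\mathcal{T}|}\sum_{(u,i)\in\mathcal{T}} \left|r_{ui} - \hat{r}_{ui}\right|

\frac{0.9777 - 0.9695}{0.9777} \approx 0.84\%\ \text{(vs. BFM Baseline)},
\qquad
\frac{0.9921 - 0.9695}{0.9921} \approx 2.28\%\ \text{(vs. SVD \& ALS)}
```

RMSE squares the errors, so it penalises large misses more heavily than MAE does. This is why the two columns can rank models differently, as with MLP and NeuFM (Pretrained).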
Conclusion
RainFM represents a novel approach to recommender system modeling that integrates stratified data grouping with hybrid model blending. This combination improves predictive accuracy over all evaluated baselines (RMSE 0.9695 versus 0.9777 for the BFM baseline) while remaining computationally feasible. Future work will focus on optimizing the data partitioning strategy and exploring further ensemble learning techniques for additional accuracy gains.