Basically, the authors tossed out the old approach of using matrix factorization (essentially linear regression with feature interactions) and used neural networks to model all of this instead. The paper describes the architecture as two parts: Generalized Matrix Factorization (GMF) and a Multi-Layer Perceptron (MLP). But from my point of view, at least reading from a modern context, these descriptions are a little overcomplicated: GMF is just learned embedding matrices, and the MLP part is just feed-forward layers.

To simplify, I think you could think of Generalized Matrix Factorization (GMF) as an embedding layer. In the context of the Neural Collaborative Filtering (NCF) framework, GMF learns separate embeddings for users and items, and computes the element-wise product of these embeddings to represent user-item interactions. That said, embedding layers are usually fed into later layers of a model; here, the product is fed directly into the final sigmoid output neuron. Which, I'm guessing, probably isn't ideal when you have sufficient data.
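To make the "GMF is just an embedding layer" point concrete, here's a minimal NumPy sketch. All the names and sizes (`n_users`, `n_items`, `dim`, the random initialization) are my own illustrative assumptions, not from the paper; in a real model the embedding matrices would be trained, not random.

```python
import numpy as np

# Hypothetical sizes: 100 users, 50 items, 8-dimensional embeddings.
n_users, n_items, dim = 100, 50, 8
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(n_users, dim))  # learned user embedding matrix
item_emb = rng.normal(size=(n_items, dim))  # learned item embedding matrix

def gmf_interaction(u, i):
    # GMF represents the user-item interaction as the element-wise
    # product of the two looked-up embedding vectors.
    return user_emb[u] * item_emb[i]

vec = gmf_interaction(3, 7)
print(vec.shape)  # (8,)
```

The "embedding layer" framing is just these two matrices plus the row lookups; everything else in GMF is what happens to `vec` afterwards.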

The paper "Neural Collaborative Filtering" by He et al. presents a deep learning approach to collaborative filtering, a popular method used in recommender systems. Collaborative filtering often relies on matrix factorization techniques, which can be limiting in their ability to capture complex user-item interactions. This paper introduces a framework called Neural Collaborative Filtering (NCF), which leverages the power of neural networks to model these interactions more effectively.

The key idea behind NCF is to replace the inner product used in traditional matrix factorization with a multi-layer perceptron (MLP), which can learn an arbitrary function from data. The MLP is used to learn the user-item interaction function.

The NCF framework consists of two models: Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP). GMF is a generalization of matrix factorization that replaces the inner product with an element-wise product followed by a linear layer. MLP uses multiple layers of non-linear functions to learn the user-item interaction function.
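The MLP path can be sketched in the same NumPy style. Unlike GMF, it concatenates the user and item embeddings and stacks non-linear layers on top. The layer widths here (16 then 8, a halving "tower") and the choice of ReLU follow the pattern the paper uses, but the specific sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
# Hypothetical tower of two layers with halving widths: 16 -> 8.
W1 = rng.normal(size=(2 * dim, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 8));       b2 = np.zeros(8)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_interaction(p_u, q_i):
    # The MLP path concatenates the embeddings instead of multiplying them,
    # then applies non-linear layers to learn the interaction function.
    x = np.concatenate([p_u, q_i])
    x = relu(x @ W1 + b1)
    return relu(x @ W2 + b2)

out = mlp_interaction(rng.normal(size=dim), rng.normal(size=dim))
print(out.shape)  # (8,)
```

The concatenation is the key structural difference from GMF: nothing forces dimension `k` of the user vector to interact only with dimension `k` of the item vector.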

The proposed architecture is as follows:

[
\hat{y}_{ui} = \sigma(a_{out}^T(h_u \circ h_i) + b_{out}),
]

where:

- (\hat{y}_{ui}) is the predicted interaction score of user (u) for item (i).
- (\sigma) is the sigmoid function, which maps the output to the range (0, 1).
- (a_{out}) is the output layer weight vector.
- (h_u) and (h_i) are the latent vectors of user (u) and item (i), respectively.
- (\circ) denotes the element-wise product of the latent vectors.
- (b_{out}) is the output layer bias.
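The equation above translates almost line-for-line into code. This is a sketch of a single prediction with randomly initialized parameters (the vectors and bias are illustrative stand-ins for learned values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 8
rng = np.random.default_rng(2)
h_u = rng.normal(size=dim)    # user latent vector
h_i = rng.normal(size=dim)    # item latent vector
a_out = rng.normal(size=dim)  # output layer weight vector
b_out = 0.1                   # output layer bias

# y_hat = sigma(a_out^T (h_u o h_i) + b_out):
# sigmoid of a weighted sum of the element-wise product, plus a bias.
y_hat = sigmoid(a_out @ (h_u * h_i) + b_out)
print(0.0 < y_hat < 1.0)  # True
```

Note that if `a_out` were fixed to all ones and the sigmoid dropped, this would reduce to the plain inner product of classic matrix factorization, which is exactly the sense in which GMF "generalizes" it.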

In addition, the authors propose a fused model called NeuMF, which combines GMF and MLP to better capture the linearity of GMF and non-linearity of MLP. The output layer of the NeuMF model is a weighted sum of the outputs from the GMF and MLP layers.
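The fusion step in NeuMF can be sketched as concatenating the last-layer outputs of the two paths and feeding them through one sigmoid output neuron, so the final weight vector effectively assigns learned weights to the GMF and MLP contributions. The vectors below are random placeholders for what the two trained paths would produce.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 8
rng = np.random.default_rng(3)
phi_gmf = rng.normal(size=dim)  # output of the GMF path (element-wise product)
phi_mlp = rng.normal(size=dim)  # output of the last MLP layer
h = rng.normal(size=2 * dim)    # final-layer weights over both paths

# NeuMF concatenates the two paths; the single output neuron then
# computes a weighted sum of GMF and MLP contributions.
y_hat = sigmoid(h @ np.concatenate([phi_gmf, phi_mlp]))
print(0.0 < y_hat < 1.0)  # True
```

Because the two paths keep separate embeddings, each can be pre-trained on its own before the fused model is fine-tuned.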

The authors evaluate the proposed NCF framework on several benchmark datasets and show that it outperforms a number of state-of-the-art methods, demonstrating the effectiveness of applying neural networks to collaborative filtering.

The implications of this work are significant for the field of recommender systems. The NCF framework provides a flexible and powerful tool for modeling user-item interactions, potentially leading to more accurate and personalized recommendations. It also opens up new opportunities for incorporating other types of information, such as textual or visual content, into the recommendation process.