Basically, learned user and item embeddings are learned, based on a user-item interaction graph is represented as an adjacency matrix (A), where the element (A_{ij}) is 1 if user (i) interacted with item (j), and 0 otherwise. You end up with an embedding matrix: $(N_{users} + N_{items}) X N_{embedding dimensions}$ {could be thought of as two separate ones, for clarity}. Then the dot-product of your given user and item is taken during prediction to predict interactions.
Basically, because your output embedding dimension is much smaller than the number of items or the number of users {generally}, the model can't afford to just memorize interactions. As a result, during training users and items are implicitly clustered, so that - for example, baby products all produce similar embedding scores, because a user who buys baby products is likely to have interacted with other baby products. Similarly, such a user would get an embedding that represents 'person who tends to buy baby products', among other things.
Personally, I'm not really sure I understand why this is called a 'Graph Neural Network'. I mean, it sounds fancy, and that's nice. But what's the difference between it and just a neural network with a learned embedding for input users and items? I literally can't tell, if I'm missing something obvious, tweet me at @hillarymsanders.
The architecture of the proposed NGCF model is formed by aggregating embeddings propagated in the user-item interaction graph. The authors use a graph neural network {GNN} to capture the collaborative filtering {CF} signals from the graph structure of the user-item interactions.
The user-item interaction graph is represented as an adjacency matrix (A), where the element (A_{ij}) is 1 if user (i) interacted with item (j), and 0 otherwise. The graph is undirected and self-connected, meaning it has self-loop edges.
Given the adjacency matrix (A), the initial embedding of user (i) or item (i) is represented as (e_i^0), and the model iteratively updates the embedding based on the adjacency of nodes in the graph. The updating rule for each iteration (l) can be described as:
[ e_i^{(l+1)} = LeakyReLU \left( \sum_{j \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i||\mathcal{N}_j|}} (W_1 e_j^{(l)} + W_2 (e_j^{(l)} \odot e_i^{(l)})) \right) ]
where (e_i^{(l+1)}) is the updated embedding of node (i) at the (l+1) iteration, (\mathcal{N}_i) denotes the first-order neighbors of node (i), (W_1) and (W_2) are learnable weight matrices, and (\odot) denotes element-wise multiplication. The normalized factor (\frac{1}{\sqrt{|\mathcal{N}_i||\mathcal{N}_j|}}) is used to avoid the scale of updates growing with the number of neighbors.
The model is trained by optimizing a pairwise BPR {Bayesian Personalized Ranking} loss, which is a widely used loss function for implicit feedback recommendation. Given a triplet ((u, i, j)) where user (u) interacted with item (i) but not with item (j), the BPR loss is defined as:
[ -\sum_{(u,i,j) \in \mathcal{D}} \log \sigma(e_u^{(K)} \cdot e_i^{(K)} - e_u^{(K)} \cdot e_j^{(K)}) ]
where (\mathcal{D}) denotes the training set, (e_u^{(K)}), (e_i^{(K)}), and (e_j^{(K)}) are the final embeddings of user (u) and items (i) and (j) after (K) iterations, and (\sigma) is the sigmoid function.
The NGCF model has several significant implications:
-item interactions, leading to better recommendation performance.
Learning on Graph Structure: By leveraging GNNs, the model learns embeddings of users and items based directly on the graph structure of the user-item interaction graph. This is a departure from conventional matrix factorization-based CF methods, which learn from the user-item interaction matrix. This approach allows the model to better preserve the topological properties of the interaction graph and can lead to more accurate recommendations.
Applicability to Implicit Feedback: The model is designed for and tested on datasets with implicit feedback, such as click data. Implicit feedback is more abundant and easier to collect than explicit feedback {like ratings}, but it is also noisier and more challenging to model. The success of NGCF on such datasets indicates its effectiveness in handling implicit feedback.
Scalability: NGCF is a scalable model, as it only needs to compute embeddings for the nodes in the graph, which correspond to the users and items. It doesn't need to compute a separate embedding for each user-item pair, which makes it more scalable for large-scale recommendation tasks.
In summary, "Neural Graph Collaborative Filtering" {NGCF} by Xiang Wang et al. represents a significant advance in the field of recommendation systems. It proposes a novel approach that leverages the strengths of Graph Neural Networks {GNNs} to capture high-order connectivities in the user-item interaction graph and provide more accurate recommendations, especially in the context of implicit feedback. The research opens up new avenues for the integration of GNNs in building more effective and scalable recommendation systems.