Deep & Cross Network {DCN}


Basically just an architecture change from a basic feed-forward neural network in the context of recommendation engines. The architecture runs a small stack of ordinary deep layers in parallel with a 'cross network', whose layers multiply the original input features by the dot product of the current layer's output with a learned weight vector. That dot product is a single number that acts as a sort of importance weight on the input. {Somewhat similar idea to a skip connection, since each cross layer also adds its own input back, but different due to the multiplying}.

Seems like mostly just a fairly simple architectural change that worked well for the problem at hand. But if I had to bet money {well, if I had to bet money without looking up the actual comparisons, if there are any}, I'd bet that transformer-based approaches perform better than this {and handle sequence-based data much better}. This approach did better on the tested datasets than some basic approaches {FFN, logistic regression, gradient-boosted decision trees, convolutional neural networks, wide & deep, and product-based neural networks}, but that's not saying much.



The Deep & Cross Network {DCN} was introduced by Ruoxi Wang et al. in 2017 for ad click prediction. The core idea is to model feature interactions efficiently in the high-dimensional, sparse data typical of online advertising.

The authors argue that traditional deep learning approaches, such as feedforward neural networks {FNN}, can learn feature interactions, but only implicitly, and not necessarily efficiently for every type of cross feature. Their solution, the DCN, is designed to learn certain bounded-degree feature interactions explicitly and automatically, without manual feature engineering and with little extra complexity on top of the DNN.

The DCN runs a 'deep network' {DNN} component and a 'cross network' component in parallel to capture both low- and high-order feature interactions. The cross network learns explicit feature crosses whose polynomial degree is bounded by its depth, while the deep network learns implicit, highly nonlinear interactions.

The architecture of DCN can be visualized as follows:

                 +--> Cross Network --+
                 |                    |
Input x_0 -------+                    +--> Stacking {concatenation} --> Sigmoid output
                 |                    |
                 +--> Deep Network ---+

Cross Network:

The cross network applies explicit feature crossing. It takes the input features and applies multiple layers of feature crossing, which can be mathematically represented as:

\[ x_{l+1} = x_0 \, x_l^{\top} w_l + b_l + x_l \]

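where \(x_l \in \mathbb{R}^d\) is the output of the \(l^{th}\) cross layer, \(x_0 \in \mathbb{R}^d\) is the original input vector, and \(w_l, b_l \in \mathbb{R}^d\) are the layer's weight and bias vectors. Because \(x_l^{\top} w_l\) is a dot product, it is a single scalar: each layer rescales \(x_0\) by this learned, input-dependent factor, adds the bias, and adds \(x_l\) back as a residual. That scalar can loosely be read as a feature-importance signal. Each additional layer raises the highest polynomial degree of the feature crosses by one, so \(l\) layers reach degree \(l + 1\).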
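
A minimal sketch of the cross-layer formula in plain Python, assuming every vector is a list of length d. The function names (`dot`, `cross_layer`, `cross_network`) are hypothetical illustrations, not taken from the paper's code.

```python
# Sketch of the DCN cross layer: x_{l+1} = x_0 * (x_l . w_l) + b_l + x_l.
# Assumptions: vectors are plain Python lists of length d; names are hypothetical.


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_layer(x0, xl, w, b):
    """One cross layer: rescale x0 by the scalar xl . w, then add the bias and xl."""
    if not (len(x0) == len(xl) == len(w) == len(b)):
        raise ValueError("x0, xl, w and b must all have the same length d")
    scale = dot(xl, w)  # a single scalar, not a vector
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; layer l uses weights[l] and biases[l] and always re-reads x0."""
    xl = x0
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl
```

For a one-dimensional input with \(w_l = 1\) and \(b_l = 0\), two layers give \(x_0^3 + 2x_0^2 + x_0\), so `cross_network([2.0], [[1.0], [1.0]], [[0.0], [0.0]])` returns `[18.0]`: a degree-3 polynomial from two layers, matching the degree \(l + 1\) bound.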

Deep Network:

The deep network is a standard feedforward neural network {FNN}: fully-connected layers with ReLU activations that learn feature interactions implicitly. Its depth and layer widths can be chosen to suit the application.
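
A minimal sketch of the deep component as fully-connected layers, each followed by ReLU. The list-based weight layout (one row per output unit), the helper names, and the layer sizes are all hypothetical choices for illustration.

```python
# Sketch of DCN's deep component: fully-connected layers, each followed by ReLU.
# Assumptions: each layer is a (weight, bias) pair of plain lists, with one weight
# row per output unit; all sizes and values are hypothetical.


def dense(x, weight, bias):
    """Fully-connected layer: out[j] = sum_i weight[j][i] * x[i] + bias[j]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(weight, bias)]


def relu(v):
    """Element-wise max(0, v_i)."""
    return [max(0.0, v_i) for v_i in v]


def deep_network(x0, layers):
    """Apply each (weight, bias) layer followed by ReLU, starting from the input x0."""
    h = x0
    for weight, bias in layers:
        h = relu(dense(h, weight, bias))
    return h


layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # 2 -> 2 units
    ([[2.0, 5.0]], [1.0]),                     # 2 -> 1 unit
]
print(deep_network([1.0, 2.0], layers))  # [7.0]
```

Unlike the cross layer, each dense layer mixes all inputs through a full weight matrix, which is where both its flexibility and its parameter cost come from.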


The outputs of the cross network and the deep network are stacked together {concatenated} and fed to a final logistic layer, i.e. a single unit with a sigmoid activation, which outputs the click probability for binary classification {click or no click}.
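
A minimal sketch of the combination step: concatenate the two branch outputs, take a dot product with a logits weight vector, and apply a sigmoid. The `bias` argument, the helper names, and the example values are assumptions of this sketch; `log_loss` shows the binary log loss typically used to train a click/no-click output.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combine_and_predict(x_cross, h_deep, w_logits, bias=0.0):
    """Concatenate the cross and deep outputs, then apply one logistic unit."""
    stacked = list(x_cross) + list(h_deep)
    if len(stacked) != len(w_logits):
        raise ValueError("w_logits must match the concatenated length")
    logit = sum(s * w for s, w in zip(stacked, w_logits)) + bias
    return sigmoid(logit)


def log_loss(p, y):
    """Binary cross-entropy for one example with label y in {0, 1}; needs 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


p = combine_and_predict([0.5, -0.5], [1.0], [1.0, 1.0, 0.0])
print(p)  # 0.5
```

The two-branch sigmoid is written in the stable form so that very negative logits underflow to 0.0 instead of raising an overflow error in `math.exp`.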

The authors report that DCN outperforms traditional models such as LR {logistic regression}, GBDT {gradient-boosted decision trees}, and FNN {feedforward neural network} on the Criteo Display Ads click dataset. They also argue that DCN is more efficient than a plain DNN in both computation and memory usage, since the cross network adds only a small number of parameters.
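
A back-of-the-envelope comparison of why the cross network is cheap: under the cross-layer formula, each layer owns only \(w_l\) and \(b_l\), so its cost grows linearly in d, while a fully-connected layer costs d times its width. The input width and hidden width below are hypothetical; this shows the scaling only and does not reproduce the paper's memory measurements.

```python
# Rough parameter counts for a cross network vs. one fully-connected layer.
# Assumption: d (input width) and the hidden width are hypothetical example values.


def cross_network_params(d, num_layers):
    """Each cross layer holds a weight vector w_l and a bias vector b_l, both of length d."""
    return 2 * d * num_layers


def dense_layer_params(fan_in, fan_out):
    """A fully-connected layer holds a fan_in x fan_out weight matrix plus fan_out biases."""
    return fan_in * fan_out + fan_out


d = 1000  # hypothetical width of the stacked input x_0
print(cross_network_params(d, num_layers=6))  # 12000
print(dense_layer_params(d, 1024))            # 1025024
```

Six cross layers here cost about 1% of a single 1024-unit dense layer on the same input.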

In terms of implications, the paper offers a practical way to handle high-dimensional sparse data, which is common in many online applications beyond ad click prediction. DCN captures both low- and high-order feature interactions automatically, without manual feature engineering, which can simplify building models for such data and make deep learning easier to apply in these domains.

The approach is general enough that it could be applied in many fields beyond advertising, such as recommendation systems, search engines, social networking, and any other area where high-dimensional sparse data is common.

Tags: 2017