Basically just an architecture change to a basic feedforward neural network in the context of recommendation engines. The architecture consists of some deep, small layers, plus these 'cross networks', which multiply an earlier layer's output (or the original features) by the dot product of the current layer's output with a learned weight vector, the latter acting as a sort of feature-importance signal. (Somewhat similar idea to a skip connection, but different due to the multiplying.)
Seems like mostly just a fairly simple architectural change that worked well for the problem at hand. But if I had to bet money (well, if I had to bet money without looking up the actual comparisons, if there are any), I'd bet that transformer-based approaches perform better than this (and handle sequence-based data much better). This approach did better on the tested datasets compared to some basic approaches (FFN, logistic regression, gradient-boosted decision trees, convolutional neural networks, wide & deep, and product-based neural networks), but that's not saying much.
The Deep & Cross Network (DCN) was introduced by Ruoxi Wang et al. in 2017 for ad click predictions. The core idea was to effectively model feature interactions in high-dimensional sparse data, which is often seen in online advertising.
The authors argue that traditional deep learning approaches, such as feedforward neural networks (FNNs), can model feature interactions but fail to do so explicitly and efficiently: the interactions FNNs learn are implicit, limited in polynomial degree, and come at high computational cost. Their solution, the DCN, is designed to explicitly and efficiently capture bounded-degree feature interactions in an automatic fashion.
The DCN combines a 'deep network' (DNN) component and a 'cross network' component to leverage both low- and high-order feature interactions. The cross network is responsible for explicit, high-order feature crossing, and the deep network is responsible for implicit, low-order feature crossing.
The architecture of DCN can be visualized as follows:
          Input
            |
        Embedding
         /      \
Cross Network    Deep Network
         \      /
         Stacking
            |
          Output
Cross Network:
The cross network applies explicit feature crossing. It takes the input features and applies multiple layers of feature crossing, which can be mathematically represented as:
\[ x_{l+1} = x_0 \, (x_l^\top w_l) + b_l + x_l \]
where \(x_l\) is the \(l\)-th layer's output, \(x_0\) is the original input vector, and \(w_l\) and \(b_l\) are the layer's weight and bias vectors. The dot product \(x_l^\top w_l\) is a scalar, which can be seen as a feature-importance weight that scales the original input \(x_0\).
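A single cross layer can be sketched in NumPy. This is a minimal illustration, not the authors' implementation; the dimension and initialization are made up:

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    x0, xl, w, b are all vectors of dimension d. The dot product
    xl . w is a scalar that rescales the original input x0, acting
    as a feature-importance signal; the trailing + xl is the
    residual (skip-like) term.
    """
    return x0 * np.dot(xl, w) + b + xl

d = 4                               # arbitrary feature dimension
rng = np.random.default_rng(0)
x0 = rng.normal(size=d)             # original (embedded) input
w = rng.normal(size=d)
b = np.zeros(d)

x1 = cross_layer(x0, x0, w, b)      # first cross layer (x_l = x_0)
x2 = cross_layer(x0, x1, w, b)      # deeper layers keep referencing x_0
print(x1.shape)                     # (4,)
```

Note that every layer multiplies against the original \(x_0\), which is what makes the interaction degree grow by exactly one per layer (bounded-degree crossing).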
Deep Network:
The deep network is a standard feedforward neural network (FNN), which implicitly learns feature interactions. The architecture and number of layers can be customized based on the application scenario.
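For completeness, the deep component is just a plain MLP. A minimal NumPy sketch, with arbitrary layer sizes and initialization:

```python
import numpy as np

def deep_network(x, weights, biases):
    """Standard feedforward stack with ReLU activations."""
    h = x
    for W, b in zip(weights, biases):
        h = np.maximum(W @ h + b, 0.0)   # affine transform + ReLU
    return h

rng = np.random.default_rng(1)
sizes = [4, 8, 8]                        # input dim 4, two hidden layers of 8
weights = [rng.normal(scale=0.1, size=(m, n))
           for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

h = deep_network(rng.normal(size=4), weights, biases)
print(h.shape)                           # (8,)
```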
Stacking:
The outputs of the cross network and the deep network are stacked together (concatenated), and the final output layer applies a sigmoid for binary classification (click or no click).
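Putting the pieces together, the stacking step is just a concatenation followed by a logistic output. A minimal sketch with made-up dimensions, using random stand-ins for the two components' outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x_cross = rng.normal(size=4)     # stand-in for cross-network output
x_deep = rng.normal(size=8)      # stand-in for deep-network output

stacked = np.concatenate([x_cross, x_deep])   # stacking = concatenation
w_out = rng.normal(size=stacked.shape[0])     # logit weight vector
p_click = sigmoid(stacked @ w_out)            # click probability in (0, 1)
print(p_click.shape == ())                    # scalar output: True
```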
The authors empirically demonstrate that the DCN model outperforms traditional models such as LR (Logistic Regression), GBDT (Gradient-Boosted Decision Trees), and FNN (Feedforward Neural Network) on a large-scale ad click dataset. The DCN model is also argued to be more efficient in terms of computational complexity and memory usage.
In terms of implications, this paper provides a significant step forward in handling high-dimensional sparse data, which is common in many online applications beyond ad click prediction. The proposed DCN model can efficiently and effectively capture both low- and high-order feature interactions in an automatic fashion, without the need for manual feature engineering. This can greatly simplify the process of building models for such data, making it easier to apply deep learning techniques in these domains.
The approach is quite versatile and flexible, which means it could be applied in many other fields beyond advertising, such as recommendation systems, search engines, social networking, and any other area where high-dimensional sparse data is common.