Deep Learning-based Recommendation System (DLRS): From Understanding Scaling Law to Exploring Embedding Compression
Wangjia Zhan
Department of Computer Science
The University of Hong Kong

Introduction
Background of this project, briefly introducing the research gaps, motivation, and objectives.
Motivation
Scaling has become a critical driver of improvements in machine learning, especially in language models. However, unlike in the NLP domain:
- Existing DLRS models do not exhibit a clear parameter scaling law.
- On the data scaling side, although previous work has demonstrated the effectiveness of increasing training data volume, the resulting massive embedding tables consume significant GPU memory, posing a bottleneck to scaling.


Objective
This project addresses these research gaps by leveraging Wukong (an advanced recommendation model) to:
- Parameter Scaling: Empirically investigate the parameter scaling laws of the Wukong model and identify optimal hyperparameter settings for scalable performance.
- Embedding Compression: Analyze and implement embedding compression techniques to reduce memory usage while preserving accuracy, enabling efficient data scaling.
- Efficient Framework: Develop a memory-efficient framework that integrates parameter and data scaling, demonstrating the potential of DLRS scaling under resource constraints.
Methodologies
The methodology used in this project is straightforward.
Software/Hardware
We use PyTorch and TorchRec for model implementation and distributed training. Experiments are conducted on 1-4 NVIDIA A100/A40 GPUs provided by NCSA at UIUC, depending on the experiment scale.
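As a quick illustration of this software stack, the sketch below builds a single TorchRec embedding table and performs a pooled lookup on a toy batch. The table name, sizes, and feature name are placeholders for illustration, not the configuration used in our experiments.

```python
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig, KeyedJaggedTensor

# One illustrative sparse-feature table (hypothetical name and sizes).
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t_user",               # hypothetical table name
            embedding_dim=16,
            num_embeddings=100_000,
            feature_names=["user_id"],   # hypothetical feature name
        )
    ],
    device=torch.device("cpu"),
)

# Toy batch of two examples: example 0 has ids [3, 7], example 1 has id [42].
kjt = KeyedJaggedTensor(
    keys=["user_id"],
    values=torch.tensor([3, 7, 42]),
    lengths=torch.tensor([2, 1]),
)

pooled = ebc(kjt).to_dict()              # {"user_id": pooled embeddings}
print(pooled["user_id"].shape)           # torch.Size([2, 16])
```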
Dataset Setup
This project focuses on the Click-Through Rate (CTR) prediction task. We validate scalability on CTR benchmarks, using Criteo Kaggle (4 GB), Avazu, and MovieLens as small datasets and Criteo Terabyte (1.3 TB) as the industrial-scale dataset.
Evaluation Strategy
We examine scaling efficiency along two axes:
- Data Scaling: train on datasets spanning three orders of magnitude.
- Parameter Scaling: scale embedding tables, MLPs, over-arch layers, and hidden embedding dimensions.
Performance is measured by Area Under the Curve (AUC), LogLoss, and Normalized Cross-Entropy (NE); a metric sketch is given below.
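To make the metrics concrete, the following is a minimal sketch of how they can be computed, assuming scikit-learn for AUC and LogLoss; NE here is the model's average log loss normalized by the log loss of a constant predictor that always outputs the empirical CTR.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

def ctr_metrics(y_true, y_pred, eps=1e-12):
    """AUC, LogLoss, and Normalized Cross-Entropy (NE) for CTR prediction.

    NE divides the model's average log loss by that of a trivial predictor
    outputting the empirical CTR, so lower is better and NE < 1 means the
    model beats the background click rate.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)

    auc = roc_auc_score(y_true, y_pred)
    ll = log_loss(y_true, y_pred)          # average binary cross-entropy

    p = y_true.mean()                      # empirical CTR (assumed in (0, 1))
    background_ll = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    ne = ll / background_ll
    return auc, ll, ne

# Toy example: 4 impressions, 1 click.
print(ctr_metrics([0, 0, 1, 0], [0.1, 0.2, 0.7, 0.05]))
```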


Algorithm Design
We test various embedding compression methods within Wukong while keeping the dense component fixed. The methods include Hashing, Quotient-Remainder (QR), Multi-Embedding Compression (MEmCom), and Binary Code-based Hashing (BCH).
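As a concrete reference point, below is a minimal sketch of the quotient-remainder (QR) trick in PyTorch, assuming elementwise fusion of the two sub-embeddings; the table sizes and fusion choice are illustrative rather than our exact configuration.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Quotient-Remainder compressed embedding.

    A vocabulary of `num_ids` rows is replaced by two tables of
    `num_buckets` and ceil(num_ids / num_buckets) rows; each id indexes one
    row in each table and the two vectors are fused elementwise.
    """
    def __init__(self, num_ids: int, embedding_dim: int, num_buckets: int):
        super().__init__()
        self.num_buckets = num_buckets
        self.remainder = nn.Embedding(num_buckets, embedding_dim)
        self.quotient = nn.Embedding((num_ids + num_buckets - 1) // num_buckets, embedding_dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        r = self.remainder(ids % self.num_buckets)
        q = self.quotient(ids // self.num_buckets)
        return r * q   # multiplicative fusion; sum or concat are alternatives

# Compress a 1M-row table roughly 500x: two tables of ~1k rows each.
emb = QREmbedding(num_ids=1_000_000, embedding_dim=16, num_buckets=1_000)
print(emb(torch.tensor([3, 123_456, 999_999])).shape)  # torch.Size([3, 16])
```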
Results
The following summarizes the experiments and results of this project.
Parameter Scaling
The Wukong model demonstrates clear parameter scaling behavior on both small and industrial-scale datasets, outperforming prior DLRS models. We explore multiple scaling schemes, including:
- nL: number of embeddings generated by the LCB;
- nF: number of embeddings generated by the FMB;
- K: rank of the compressed embeddings in the optimized FMB;
- Layer: number of Wukong layers in the Interaction Stack.
Performance improves clearly and consistently as model size increases, a trend more pronounced than in prior models such as DLRM, and the combined scheme outperforms single-factor scaling; an illustrative sweep over these knobs is sketched below.
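For illustration, the sketch below enumerates a joint ("combined") sweep alongside a single-factor one; the knob names mirror the list above, but the base values and scale factors are placeholders, not the grids used in our experiments.

```python
# Placeholder base configuration; values are illustrative only.
base_config = {
    "layers": 2,   # Wukong layers in the Interaction Stack
    "n_l": 8,      # embeddings generated by the LCB
    "n_f": 8,      # embeddings generated by the FMB
    "k": 16,       # rank of compressed embeddings in the optimized FMB
}

def scale_combined(cfg: dict, factor: int) -> dict:
    """Combined scheme: grow all four knobs together by `factor`."""
    return {name: value * factor for name, value in cfg.items()}

def scale_single(cfg: dict, knob: str, factor: int) -> dict:
    """Single-factor scheme: grow one knob, keep the others fixed."""
    return {**cfg, knob: cfg[knob] * factor}

for f in (1, 2, 4, 8):
    print("combined x%d:" % f, scale_combined(base_config, f))
print("n_f only x4:", scale_single(base_config, "n_f", 4))
```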




Embedding Compression
To address memory bottlenecks from large embedding tables, we evaluate four compression methods: QR, MEmCom, BCH, and Hashing. Figure 6 reports test AUC across compression ratios from 4× to 1024×:
- QR achieves the best performance across all ratios.
- MEmCom and BCH follow closely, maintaining strong performance under compression.
- Naive hashing shows the most degradation, especially at high compression ratios.
We further explore scaling under compression by applying QR. Results (Figure 7) show that performance continues to improve with model size—even with compressed embeddings—indicating that scaling laws still hold under compression.
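For comparison with QR, the naive hashing baseline can be sketched as below: raw ids are folded into a table that is `compression_ratio` times smaller via a modulo hash, so colliding ids share one row. The sizes shown are illustrative, not our experimental settings.

```python
import torch
import torch.nn as nn

class HashedEmbedding(nn.Module):
    """Naive hashing-trick baseline: fold `num_ids` raw ids into a table
    `compression_ratio` times smaller via a modulo hash."""
    def __init__(self, num_ids: int, embedding_dim: int, compression_ratio: int):
        super().__init__()
        self.table_size = max(1, num_ids // compression_ratio)
        self.table = nn.Embedding(self.table_size, embedding_dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.table(ids % self.table_size)

# 1024x compression of a 10M-row table leaves ~9.7k rows.
emb = HashedEmbedding(num_ids=10_000_000, embedding_dim=16, compression_ratio=1024)
print(emb.table_size)  # 9765
```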





Efficient Tricks
To further enhance memory savings and performance under extreme compression, we apply the following design strategies (a minimal sketch follows the list):
- Fusion: use multiplicative fusion when combining embeddings; it outperforms summation and fully connected layers.
- Threshold: compress only large embedding tables to avoid unnecessary overhead on small ones.
- Hierarchical: apply coarse-to-fine encoding for very large tables to improve compression efficiency.
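Below is a minimal sketch of the first two tricks, assuming a QR-style two-table lookup; the fusion modes follow the list above, while the row-count threshold is a hypothetical placeholder rather than a tuned value.

```python
import torch

# Hypothetical row-count threshold: tables smaller than this stay uncompressed.
COMPRESS_THRESHOLD = 100_000

def fuse(q: torch.Tensor, r: torch.Tensor, mode: str = "mult") -> torch.Tensor:
    """Combine the two sub-embeddings of a compressed lookup.
    Multiplicative fusion outperformed summation and an extra FC layer."""
    if mode == "mult":
        return q * r
    if mode == "sum":
        return q + r
    raise ValueError(f"unknown fusion mode: {mode}")

def should_compress(num_rows: int) -> bool:
    """Threshold rule: compress only large embedding tables."""
    return num_rows >= COMPRESS_THRESHOLD

# Toy check: fuse two 4-dimensional sub-embeddings multiplicatively.
q, r = torch.randn(2, 4), torch.randn(2, 4)
print(fuse(q, r).shape, should_compress(50_000), should_compress(5_000_000))
```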