Introduction:
In the world of digital advertising, click prediction plays a pivotal role in determining the success of ad campaigns. Advertisers aim to serve their ads to users who are most likely to engage with them, which makes building an efficient ad click prediction machine learning system crucial. In this blog post, we start with the problem statement and the metrics used to design an effective click prediction system, then move on to the model and the system architecture.
Problem Statement:
The primary goal is to develop a machine learning model capable of predicting whether an ad will be clicked. While the ad click prediction process can involve complex cascades of classifiers in the AdTech industry, we will focus on a simplified scenario for the sake of clarity.
Background:
Before moving forward, it's essential to understand the background of ad serving. Ad requests follow a waterfall model: publishers first attempt to sell their ad inventory directly to advertisers at a high Cost Per Mille (CPM), i.e., the price per thousand impressions. If they are unable to do so, they pass the impression to other ad networks, one after another, until it is eventually sold.
Metrics Design and Requirements:
Metrics are critical for evaluating the performance of our ad click prediction model. We will consider both offline metrics used during the training phase and online metrics used after deployment.
Metrics:
Offline Metrics:
During model training, we focus on machine learning metrics rather than revenue or Click-Through Rate (CTR) metrics. Here are the primary offline metrics:
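- Normalized Cross-Entropy (NCE): NCE is the average log loss of the model's predictions divided by the cross-entropy of the background CTR, i.e., the log loss of a constant predictor that always outputs the average empirical CTR p. With N examples, labels y_i in {0, 1}, and predicted click probabilities q_i: NCE = [-(1/N) Σ_i (y_i log q_i + (1 - y_i) log(1 - q_i))] / [-(p log p + (1 - p) log(1 - p))]. The normalization makes the metric less sensitive to the background CTR. Lower is better, and a value of 1 means the model does no better than always predicting the background CTR.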
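The NCE formula translates directly into code. The sketch below is a minimal version under one assumption: when no background CTR is passed in, it uses the empirical CTR of the evaluated labels (you could pass the training-set CTR instead). The function name and the `eps` clipping constant are illustrative choices.

```python
import math

def normalized_cross_entropy(labels, preds, background_ctr=None, eps=1e-15):
    """Average log loss of `preds` divided by the log loss of a constant
    predictor that always outputs the background CTR."""
    n = len(labels)
    if background_ctr is None:
        # Assumption: use the empirical CTR of the evaluated labels.
        background_ctr = sum(labels) / n
    p = background_ctr
    if not 0 < p < 1:
        raise ValueError("background CTR must be strictly between 0 and 1")
    # Clip predictions so math.log never receives 0.
    clipped = [min(max(q, eps), 1 - eps) for q in preds]
    logloss = -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                   for y, q in zip(labels, clipped)) / n
    baseline = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return logloss / baseline

labels = [1, 0, 0, 0]
print(normalized_cross_entropy(labels, [0.25] * 4))            # ~1.0: no better than the background CTR
print(normalized_cross_entropy(labels, [0.9, 0.1, 0.1, 0.1]))  # ~0.19: better than the baseline
```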
Online Metrics:
After deployment, we shift our focus to online metrics that assess real-world impact. The key online metric is:
- Revenue Lift: This metric measures the percentage change in revenue produced by the new model relative to the current production model over a defined period. To deploy a new model, it is introduced to a small percentage of ad traffic during an A/B testing phase. Balancing the percentage of traffic and the duration of the A/B testing phase is crucial to determine the model's effectiveness.
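Here is one way to compute revenue lift for an A/B test. It is a minimal sketch that assumes revenue is compared per ad request, so a small treatment bucket and a much larger control bucket can be compared fairly; raw totals would mostly reflect the traffic split. The function name and the traffic figures are illustrative, not real data.

```python
def revenue_lift(treatment_revenue, treatment_requests,
                 control_revenue, control_requests):
    """Percentage change in revenue per ad request of the new model (treatment)
    relative to the current model (control) over the same test period."""
    treatment_rpr = treatment_revenue / treatment_requests
    control_rpr = control_revenue / control_requests
    return 100.0 * (treatment_rpr - control_rpr) / control_rpr

# Illustrative numbers only: 5% of traffic goes to the new model.
lift = revenue_lift(treatment_revenue=10_500, treatment_requests=5_000_000,
                    control_revenue=190_000, control_requests=95_000_000)
print(f"revenue lift: {lift:.1f}%")  # revenue lift: 5.0%
```

The smaller the treatment share, the longer the test has to run before the per-request estimate is stable. That is the traffic/duration trade-off described above.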
Requirements:
Training:
To build an effective ad click prediction system, we must address specific training requirements:
- Imbalanced Data: In practice, Click-Through Rate (CTR) is typically low, around 1%-2%. Handling imbalanced data is essential for supervised training.
- Retraining Frequency: The model needs to be retrained multiple times within a day to capture changes in data distribution in the production environment.
- Train/Validation Data Split: To simulate a production system accurately, training and validation data must be partitioned by time, reflecting the sequential nature of ad serving.
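To split training and validation data by time, you can use a sliding window. The sketch below assumes each logged event is a dict with a hypothetical `ts` timestamp field. It uses a 7-day training window followed by a 1-day validation window, which mirrors the "one week for training, next day for validation" option discussed later.

```python
from datetime import datetime, timedelta

def time_window_split(events, valid_start, train_days=7, valid_days=1):
    """Train on the `train_days` before `valid_start` and validate on the
    `valid_days` after it, so validation data is always newer than training data."""
    train_start = valid_start - timedelta(days=train_days)
    valid_end = valid_start + timedelta(days=valid_days)
    train = [e for e in events if train_start <= e["ts"] < valid_start]
    valid = [e for e in events if valid_start <= e["ts"] < valid_end]
    return train, valid

# Hypothetical log: one event per day from 2024-01-01 to 2024-01-10.
base = datetime(2024, 1, 1)
events = [{"id": i, "ts": base + timedelta(days=i)} for i in range(10)]
train, valid = time_window_split(events, valid_start=datetime(2024, 1, 8))
print(len(train), len(valid))  # 7 1
```

A random split would leak future behavior into training. A time-based split avoids this and matches how the model is actually used: it is trained on the past and serves the future.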
Inference:
The inference phase also comes with distinct requirements:
- Serving Latency: Ad predictions must be made with low latency, typically ranging from 50ms to 100ms. Users expect quick responses when viewing web content with embedded ads.
- Latency Considerations: Because an unsold impression can be passed through several ad networks in the waterfall, each hop consumes part of the overall time budget, so the machine learning model's prediction step must be fast.
- Overspending Prevention: To avoid over-spending campaign budgets and protect the interests of advertisers and publishers, the ad serving model should have mechanisms in place to control or prevent overspending.
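The text does not specify how overspending is prevented. One common mechanism is budget pacing, sketched below under stated assumptions. The class name and the linear pacing rule are hypothetical, and the sketch keeps state in a single process. A real ad server would need shared, atomically updated spend counters across all serving instances.

```python
class PacedBudget:
    """Hypothetical per-campaign budget guard with linear pacing: by the time a
    fraction f of the day has elapsed, at most f * daily_budget may be spent."""

    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent = 0.0

    def try_charge(self, cost, elapsed_fraction):
        """Record the charge and return True only if it stays within the paced limit."""
        allowed = self.daily_budget * min(max(elapsed_fraction, 0.0), 1.0)
        if self.spent + cost > allowed:
            return False  # skip this campaign for the current request
        self.spent += cost
        return True

budget = PacedBudget(daily_budget=100.0)
print(budget.try_charge(40.0, elapsed_fraction=0.5))  # True  (40 <= 50)
print(budget.try_charge(20.0, elapsed_fraction=0.5))  # False (60 > 50)
print(budget.try_charge(20.0, elapsed_fraction=1.0))  # True  (60 <= 100)
```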
In summary, building an ad click prediction machine learning system involves addressing various challenges and requirements. Key goals include achieving a low normalized cross-entropy offline and a positive revenue lift online, handling imbalanced data, ensuring enough throughput to retrain the model several times a day, achieving low-latency ad predictions, and implementing measures to control or avoid overspending campaign budgets. By meeting these objectives, the system can effectively predict ad clicks, optimize ad campaigns, and enhance the overall digital advertising experience.
Ad Click Prediction Model
Model, Feature Engineering, and Training:
To build an effective ad click prediction machine learning system, we now turn to its core: feature engineering, training data collection, model selection, and evaluation.
A. Feature Engineering:
Feature engineering is a critical step in preparing the data for training and ensuring the model's performance. Here are some key features and considerations:
- AdvertiserID: Due to the vast number of advertisers, it's essential to handle this feature efficiently. Techniques such as feature hashing or embedding can be applied to manage the diversity of AdvertiserIDs effectively.
- User's Historical Behavior: This feature represents the user's historical interactions with ads, such as the number of clicks over a specific period. To keep this feature on a scale comparable to the other inputs, feature scaling techniques like normalization can be applied to bring the values within a consistent range.
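- Temporal Features: Time-related features such as time_of_day and day_of_week can provide valuable insights into when users are most likely to click on ads. These features are often one-hot encoded, which treats each hour or day as an independent category. To capture their cyclic nature (hour 23 is adjacent to hour 0), they can instead be encoded as the sine and cosine of their position in the cycle.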
- Cross Features: Creating cross-features by combining multiple features can help capture complex relationships between variables. For example, combining user behavior with advertiser information can reveal patterns that may influence ad clicks.
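The four feature types above can be combined as shown below. This is a minimal sketch, not a production feature pipeline. All feature names, the `user_clicks_30d` input, the bucket count, and the click-count cap are hypothetical. md5 is used only as a stable hash for feature hashing, because Python's built-in `hash()` of strings changes between processes.

```python
import hashlib
import math

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space

def hash_bucket(value, num_buckets=NUM_BUCKETS):
    """Feature hashing: map a string ID to a fixed-size index space.
    md5 is used because Python's built-in hash() of str varies across processes."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def min_max_scale(x, lo, hi):
    """Clip x to [lo, hi] and rescale it to [0, 1]."""
    if hi == lo:
        return 0.0
    return (min(max(x, lo), hi) - lo) / (hi - lo)

def cyclic_encode(value, period):
    """Sine/cosine encoding, so that the end of a cycle sits next to its start."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def cross_feature(*parts, num_buckets=NUM_BUCKETS):
    """Cross feature: hash the combination of several categorical values."""
    return hash_bucket("|".join(parts), num_buckets)

def build_features(advertiser_id, user_clicks_30d, hour_of_day, day_of_week,
                   max_clicks=100):
    """Assemble one example's features (all names are illustrative)."""
    hour_sin, hour_cos = cyclic_encode(hour_of_day, 24)
    dow_sin, dow_cos = cyclic_encode(day_of_week, 7)
    # Bucket the click count so it can be crossed with the advertiser ID.
    activity = "clicks_" + str(min(user_clicks_30d, 10))
    return {
        "advertiser_bucket": hash_bucket(advertiser_id),
        "user_clicks_scaled": min_max_scale(user_clicks_30d, 0, max_clicks),
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "advertiser_x_activity": cross_feature(advertiser_id, activity),
    }

print(build_features("adv_12345", user_clicks_30d=7, hour_of_day=23, day_of_week=6))
```

Hashed buckets are typically fed into an embedding table. Choosing a larger `NUM_BUCKETS` reduces collisions between advertisers at the cost of memory.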
B. Training Data:
Before diving into model selection, we need to gather and prepare training data. This data is the foundation upon which our machine learning model will be built. Here are some considerations for collecting and preparing training data:
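- Data Collection Period: The first step is to select a period of historical data to use for training. This can range from the last month to the last six months or more. Longer windows provide more examples, but they increase training cost and include older, possibly stale, behavior. The window length should therefore be chosen by measuring its effect on model accuracy.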
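- Handling Imbalanced Data: As mentioned earlier, the Click-Through Rate (CTR) is typically low (around 1%-2%). To address this imbalance, we downsample the negative examples, which reduces training cost and gives the model a more balanced class mix. Downsampling shifts the training distribution away from the real-world one, so the predicted probabilities must be recalibrated before they are used for serving.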
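The sketch below shows negative downsampling together with the probability correction it requires. Keeping negatives with probability `rate` multiplies the odds of a click in the training data by 1/`rate`. Multiplying the predicted odds by `rate` undoes this, which gives q = p / (p + (1 - p) / rate). The example dict layout with a `label` key is a hypothetical choice.

```python
import random

def downsample_negatives(examples, rate, seed=0):
    """Keep every positive (clicked) example and roughly a `rate` fraction of negatives."""
    rng = random.Random(seed)
    return [e for e in examples if e["label"] == 1 or rng.random() < rate]

def recalibrate(p, rate):
    """Correct a probability predicted by a model trained on data whose
    negatives were kept with probability `rate`: q = p / (p + (1 - p) / rate)."""
    return p / (p + (1 - p) / rate)

# Hypothetical data with a 1% CTR.
data = [{"label": 1} for _ in range(100)] + [{"label": 0} for _ in range(9_900)]
sample = downsample_negatives(data, rate=0.1)
print(len(sample))                 # about 100 positives + ~990 negatives
print(recalibrate(0.5, rate=0.1))  # 0.0909... (= 1/11)
```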
C. Model Selection:
Model selection is a pivotal decision in building the ad click prediction system. Here are some considerations for model selection:
- Deep Learning: In a distributed setting, deep learning models can be effective for click prediction. Starting with fully connected layers and applying the Sigmoid activation function to the final layer is a common approach.
- Addressing Data Imbalance: Given the imbalanced nature of the data, it's essential to resample the training dataset to mitigate the effects of data imbalance. However, it's crucial to leave the validation and test datasets intact to obtain accurate performance estimations.
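The sketch below shows how fully connected layers end in a single sigmoid unit that outputs a click probability. It is a forward pass only, written in plain Python for readability. The ReLU hidden activations, He-style initialization, and layer sizes are assumptions not stated in the text, and the training loop is omitted. In practice you would use a deep learning framework with distributed training.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(x, weights, bias):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # assumption: He-style init for ReLU layers
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_mlp(layer_sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

def predict_click_probability(params, x):
    """ReLU hidden layers followed by a single sigmoid output unit."""
    h = x
    for weights, bias in params[:-1]:
        h = [max(0.0, v) for v in dense(h, weights, bias)]
    weights, bias = params[-1]
    return sigmoid(dense(h, weights, bias)[0])

params = init_mlp([4, 8, 8, 1])  # 4 input features, two hidden layers of 8
print(predict_click_probability(params, [0.1, 0.2, 0.3, 0.4]))
```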
D. Evaluation:
The evaluation phase is where we assess the model's performance and make crucial decisions regarding hyperparameters and model training. Here are some evaluation strategies:
- Splitting Data: One approach is to split the data by time into training and validation sets, so that the model is always evaluated on data that comes after its training data. This helps tune hyperparameters and make necessary adjustments.
- Replay Evaluation: To reduce bias in offline evaluation, another approach is replay evaluation. Take logged test data from a specific time period and re-rank each request's ad candidates with the new model. Keep only the requests where the model's top-ranked ad matches the ad that was actually shown. The logged clicks on these matched requests estimate the model's click performance.
- Hyperparameter Tuning: During evaluation, it's essential to explore various hyperparameters, including the size of the training dataset, the frequency of model retraining, and other relevant settings. Fine-tuning these parameters can significantly impact the model's effectiveness.
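Replay evaluation can be sketched in a few lines. The logged-event layout (`context`, `candidates`, `shown_ad`, `clicked`) and the `score_fn` signature are hypothetical. The estimate is only unbiased if the logged ads were chosen uniformly at random among the candidates, which is one reason to keep a slice of randomized traffic.

```python
def replay_ctr(logged_events, score_fn):
    """Re-rank each logged request's candidates with the new model and keep only
    requests where the model's top ad equals the ad actually shown; the logged
    clicks on those matches give the estimated CTR."""
    matches = clicks = 0
    for event in logged_events:
        best = max(event["candidates"], key=lambda ad: score_fn(event["context"], ad))
        if best == event["shown_ad"]:
            matches += 1
            clicks += event["clicked"]
    ctr = clicks / matches if matches else None
    return ctr, matches

# Hypothetical logs and a toy model that always prefers ad "a".
logs = [
    {"context": {}, "candidates": ["a", "b"], "shown_ad": "a", "clicked": 1},
    {"context": {}, "candidates": ["a", "b"], "shown_ad": "b", "clicked": 1},
    {"context": {}, "candidates": ["a", "b"], "shown_ad": "a", "clicked": 0},
]
print(replay_ctr(logs, lambda ctx, ad: 1.0 if ad == "a" else 0.0))  # (0.5, 2)
```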
Ads Recommendation System Design
1. Calculation and Estimation: Understanding the Basics
Assumptions and Data Size:
Before diving into the technicalities, let’s establish our baseline. We assume 40,000 ad requests per second, which translates to roughly 100 billion requests per month (40,000 × 86,400 seconds × 30 days ≈ 1.04 × 10^11). Each ad request record, with hundreds of features, takes about 500 bytes of storage. With an estimated 1% Click-Through Rate (CTR), that yields around 1 billion clicks per month. As a starting point, we use one month of data for training and validation, which adds up to about 50 Terabytes (TB) (100 billion × 500 bytes). To manage this volume, strategies such as downsampling (retaining only 1%-10% of the data) or using a week’s data for training and the next day's for validation are crucial.
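The back-of-envelope numbers above can be checked directly. The script uses only the stated assumptions: 40,000 requests/s, 500 bytes per record, 1% CTR, a 30-day month, and a 1%-10% downsampling rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60
requests_per_second = 40_000
bytes_per_record = 500
ctr = 0.01
days = 30

requests_per_month = requests_per_second * SECONDS_PER_DAY * days
clicks_per_month = requests_per_month * ctr
monthly_bytes = requests_per_month * bytes_per_record

print(f"requests/month: {requests_per_month:.3e}")        # 1.037e+11, i.e. ~100 billion
print(f"clicks/month:   {clicks_per_month:.3e}")          # 1.037e+09, i.e. ~1 billion
print(f"raw data/month: {monthly_bytes / 1e12:.1f} TB")   # 51.8 TB
for rate in (0.01, 0.10):
    print(f"downsampled to {rate:.0%}: {monthly_bytes * rate / 1e12:.1f} TB")  # 0.5 TB, 5.2 TB
```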
Scaling Considerations:
The system is designed to support up to 100 million users, a scale that demands robust infrastructure and efficient data handling.
2. High-Level Design: Crafting the Architecture
Data Lake and Batch Data Prep:
Our system's foundation lies in a data lake, where data from varied sources, like log data or event-driven data (via Kafka), is stored. The batch data preparation involves a series of ETL (Extract, Transform, Load) jobs, channeling data into the Training Data Store.
Batch Training Jobs and Model Store:
Organized as scheduled or on-demand jobs, batch training jobs train new models on the data in the Training Data Store. The resulting models are stored in a distributed object store such as Amazon S3.
Ad Candidates and Stream Data Prep Pipeline:
Ad candidates are sourced from upstream services, and a stream data prep pipeline processes online features, storing them in key-value storage for low latency downstream processing.
Model Serving:
This standalone service is responsible for loading various models and providing Ad Click probabilities, a crucial step in the recommendation process.
3. System Workflow:
Client Interaction:
The process initiates when a user sends an ad request to the Application Server, which, in turn, contacts the Candidate Generation Service. This service generates a list of Ad Candidates, forwarding it to the Aggregator Service.
Ad Ranking and Selection:
The Aggregator Service plays a pivotal role, distributing the candidate list to Ad Ranking workers for scoring. These workers fetch the latest model, retrieve the relevant features, score the ads, and return the scored list to the Aggregator Service, which then selects the top ads for display.
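The Aggregator's scatter-gather step can be sketched as follows. Local threads stand in for remote Ad Ranking workers, and the function names, shard count, and `top_k` are hypothetical. In a real deployment, each shard would be an RPC call with a timeout, so that one slow worker cannot blow the latency budget.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def score_shard(shard, score_fn):
    """Work done by one Ad Ranking worker: score every candidate in its shard."""
    return [(score_fn(ad), ad) for ad in shard]

def aggregate_top_ads(candidates, score_fn, num_workers=4, top_k=3):
    """Hypothetical Aggregator Service logic: split candidates into shards,
    score the shards in parallel, then merge and keep the top_k ads."""
    shards = [candidates[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(score_shard, shards, [score_fn] * num_workers)
        scored = [pair for shard_result in results for pair in shard_result]
    return [ad for _, ad in heapq.nlargest(top_k, scored, key=lambda pair: pair[0])]

# Toy scorer: the numeric suffix of the ad ID acts as its click probability.
candidates = [f"ad{i}" for i in range(20)]
print(aggregate_top_ads(candidates, lambda ad: int(ad[2:])))  # ['ad19', 'ad18', 'ad17']
```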
4. Scaling the Design:
Given the stringent latency requirements (50ms-100ms) for processing a large volume of Ad Candidates (50k-100k), it becomes essential to scale out Model Serving and to use an Aggregator Service that distributes the load across workers. This approach helps the system meet its Service Level Agreement (SLA).
5. Follow-up Questions:
Adapting to User Behavior:
To keep up with changing user behaviors, the system must retrain the model frequently, potentially every few hours, using newly collected data.
Handling Ads Under-explored by the Ranking Model:
Introducing a degree of randomization in the Ranking Service can address this. For instance, serving random candidates to 2% of requests collects click feedback on ads that the model would rarely rank highly, which helps the system explore ad effectiveness more fully.
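The 2% randomization described above is essentially an epsilon-greedy policy. The sketch below is a minimal version of it; the function name and return shape are hypothetical. It also returns an "explored" flag, so randomized requests can be logged separately and used for unbiased offline analysis such as replay evaluation.

```python
import random

EXPLORATION_RATE = 0.02  # 2% of requests, as in the text

def select_ads(ranked_candidates, top_k, rng, exploration_rate=EXPLORATION_RATE):
    """With probability exploration_rate, serve a uniformly random sample of
    candidates instead of the model's top_k; also return whether we explored."""
    if rng.random() < exploration_rate:
        k = min(top_k, len(ranked_candidates))
        return rng.sample(ranked_candidates, k), True
    return ranked_candidates[:top_k], False

rng = random.Random(7)
ads, explored = select_ads(["ad_a", "ad_b", "ad_c", "ad_d"], top_k=2, rng=rng)
print(ads, explored)
```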
6. Summary:
Designing an Ads Recommendation System is a complex yet rewarding endeavor. It requires careful consideration of data scale, system architecture, and continuous adaptation to user behavior. By understanding these components, businesses can significantly enhance the relevance and effectiveness of their advertising strategies.