Batch Inference with SageMaker Endpoints

Objective

This article demonstrates how to perform batch inference using SageMaker endpoints, focusing on handling large datasets efficiently by dividing them into manageable chunks and processing predictions in batches.


Batch Processing vs. Real-Time Inference

Real-Time Inference

  • Designed for single or small sets of inputs.
  • Provides low-latency predictions ideal for real-time applications (e.g., user-facing APIs).

Batch Processing

  • Handles large datasets by splitting them into smaller chunks.
  • Optimized for scenarios where predictions are not time-sensitive, such as offline data analysis or scheduled processing jobs.

Batch inference reduces the overhead of invoking the endpoint repeatedly for individual inputs, making it more efficient for large-scale data.
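For comparison, here is a minimal sketch of the real-time pattern, invoking the endpoint once per row. The endpoint name and feature values below are placeholders; the point is that every row pays the full cost of an HTTP round trip, which is exactly the overhead batching amortizes.

import boto3

runtime_client = boto3.client('sagemaker-runtime')

# Hypothetical endpoint name and CSV-formatted feature vector
endpoint_name = 'example-endpoint'
single_row = '42.0,0,1,128.5,3'

# One HTTP round trip per input row; this per-request overhead is
# what batch inference amortizes
response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=single_row,
    ContentType='text/csv'
)
prediction = float(response['Body'].read().decode('utf-8').strip())
print(prediction)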


Preparing Test Datasets for Inference

We prepare the test dataset by loading it into memory and formatting it as required by the SageMaker endpoint. In this case, the endpoint accepts CSV-formatted inputs.

Code Example:

import pandas as pd

# Load test dataset (headerless CSV of feature columns only,
# matching the format the endpoint expects)
test_features = pd.read_csv('test_features.csv', header=None)

# Preview test data
print(test_features.head())

Invoking the SageMaker Endpoint in Batches

To perform batch inference, the dataset is divided into chunks (batches), and each batch is sent to the endpoint for predictions. The results are collected and post-processed.

Code for Dividing Datasets into Chunks

The following code processes the test dataset in chunks of 100 rows, which keeps each request comfortably under the 6 MB payload limit for real-time endpoint invocations:

import boto3

# Define batch size
batch_size = 100
predictions = []

# SageMaker runtime client
runtime_client = boto3.client('sagemaker-runtime')

# Process data in batches
for i in range(0, len(test_features), batch_size):
    batch = test_features.iloc[i:i + batch_size]
    # Serialize the batch as headerless CSV, one row per line
    batch_data = '\n'.join([','.join(map(str, row)) for row in batch.values])

    # Invoke endpoint (predictor is the Predictor object returned when
    # the model was deployed; endpoint_name identifies the live endpoint)
    response = runtime_client.invoke_endpoint(
        EndpointName=predictor.endpoint_name,
        Body=batch_data,
        ContentType='text/csv'
    )

    # Decode the newline-separated probabilities returned by the endpoint
    raw_result = response['Body'].read().decode('utf-8').strip()
    batch_predictions = list(map(float, raw_result.split('\n')))
    predictions.extend(batch_predictions)

print("Batch inference completed.")

Output Example

The predictions are returned as probabilities:

[0.12, 0.87, 0.45, 0.78, 0.23, ...]

These can be thresholded (e.g., > 0.5) to classify inputs into binary categories.


Handling Predictions and Post-Processing

After obtaining the predictions, additional steps may include:

  • Thresholding probabilities to generate binary classifications.
  • Merging predictions with input data for reporting (see the sketch after the code example below).
  • Saving the results to a file or database for further analysis.

Code Example:

import numpy as np

# Convert probabilities to binary predictions
binary_predictions = (np.array(predictions) > 0.5).astype(int)

# Save predictions to CSV
output = pd.DataFrame({
    'Prediction': predictions,
    'Binary Classification': binary_predictions
})
output.to_csv('batch_predictions.csv', index=False)
print("Predictions saved to batch_predictions.csv")

Visual Representation of the Batch Inference Pipeline

Below is a simplified pipeline for batch inference; a consolidated code sketch follows the list:

  1. Input Data Preparation:
    • Load and preprocess the test dataset.
  2. Batch Processing:
    • Split the dataset into smaller chunks.
    • Send each chunk to the SageMaker endpoint.
    • Collect predictions for each batch.
  3. Post-Processing:
    • Combine predictions.
    • Threshold probabilities to generate classifications.
    • Save the final results for downstream tasks.
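
The three stages above can be wrapped into a single reusable helper. This is a minimal sketch under the same assumptions as the earlier examples (a CSV-in, newline-separated-probabilities-out endpoint; the function name and defaults are illustrative):

import boto3
import numpy as np
import pandas as pd

def run_batch_inference(features: pd.DataFrame, endpoint_name: str,
                        batch_size: int = 100, threshold: float = 0.5):
    """Send features to a CSV-in/CSV-out SageMaker endpoint in batches."""
    runtime_client = boto3.client('sagemaker-runtime')
    probabilities = []

    # Stage 2: split into chunks, invoke the endpoint, collect predictions
    for i in range(0, len(features), batch_size):
        batch = features.iloc[i:i + batch_size]
        payload = '\n'.join([','.join(map(str, row)) for row in batch.values])
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            Body=payload,
            ContentType='text/csv'
        )
        raw = response['Body'].read().decode('utf-8').strip()
        probabilities.extend(map(float, raw.split('\n')))

    # Stage 3: threshold probabilities into binary labels
    labels = (np.array(probabilities) > threshold).astype(int)
    return probabilities, labels

Called as run_batch_inference(test_features, predictor.endpoint_name), it returns both the raw probabilities and the thresholded labels, ready to be saved as shown earlier.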

Batch inference with SageMaker endpoints provides an efficient way to handle large datasets. By dividing the data into chunks and processing it in batches, we amortize per-request overhead and stay within endpoint payload limits, while keeping the workflow simple enough to schedule or automate.
