Automated Azure Machine Learning Environment Setup Using Terraform
todd-bernson-leadership

This machine learning project required scalable infrastructure to support training and deployment. Azure Machine Learning (Azure ML) provides a rich platform for managing these workloads, and defining its environment as Infrastructure as Code (IaC) keeps the setup repeatable. This article shows how we automated the deployment of an Azure Machine Learning environment using Terraform, enabling consistency, speed, and reliability.


Why Automate with Terraform?

Terraform is an Infrastructure as Code (IaC) tool that allows you to define your cloud resources declaratively. It provides:

  • Reproducibility: Deploy the same setup across environments with minimal effort.
  • Version Control: Track and manage changes to your infrastructure.
  • Scalability: Easily modify configurations to meet growing project demands.

Step-by-Step Setup Guide

Step 1: Define Terraform Configuration

To begin, create a provider.tf file to configure the Azure provider.

provider "azurerm" {
  features {}

  subscription_id = "your-subscription-id"
}
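
The provider block on its own does not pin which provider versions Terraform downloads. A minimal sketch of a companion terraform block follows. The version constraints are assumptions rather than the versions used in this project, and the random provider is listed because later resources reference random_string.unique.

```hcl
# versions.tf (sketch; version constraints are assumptions)
terraform {
  required_version = ">= 1.3"

  required_providers {
    # Azure Resource Manager provider used by every azurerm_* resource.
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    # Needed for the random_string suffix used in resource names.
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}
```

Pinning a major version keeps terraform init from silently picking up a release with breaking changes.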

Step 2: Set Up the Resource Group

The resource group serves as a logical container for all Azure resources. Add the following configuration to create a new resource group:

resource "azurerm_resource_group" "this" {
  name     = var.environment
  location = var.location

  tags = var.tags
}
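
The resource group reads its name, location, and tags from input variables, and later resources reference local.environment. The article does not show those declarations, so the following variables.tf is only a sketch: the default values and the way local.environment is derived are assumptions.

```hcl
# variables.tf (sketch; defaults are illustrative assumptions)
variable "environment" {
  description = "Base name used for the resource group and derived resource names."
  type        = string
  default     = "ml_demo"
}

variable "location" {
  description = "Azure region for all resources."
  type        = string
  default     = "eastus"
}

variable "tags" {
  description = "Tags applied to every resource."
  type        = map(string)
  default = {
    "environment" = "dev"
    "cost-center" = "ml-research"
  }
}

locals {
  # Assumption: a hyphenated form of the environment name, because
  # Key Vault names accept only letters, digits, and hyphens.
  environment = replace(var.environment, "_", "-")
}
```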

Step 3: Provision Supporting Services

An Azure ML workspace depends on several supporting services: a Key Vault, Application Insights, a storage account, and a container registry. Here’s how you can define them:

Key Vault

resource "azurerm_key_vault" "this" {
  name = local.environment

  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  tenant_id = data.azurerm_client_config.current.tenant_id
  sku_name  = lower(var.pricing_sku)

  purge_protection_enabled = false

  tags = var.tags
}
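
The Key Vault reads tenant_id from data.azurerm_client_config.current, so that data source has to be declared somewhere in the configuration. Assuming it is not already defined elsewhere in the project, the declaration is a single empty data block:

```hcl
# Exposes the tenant, subscription, and object IDs of the identity
# Terraform is currently authenticated as.
data "azurerm_client_config" "current" {}
```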

Application Insights

resource "azurerm_application_insights" "ml_insights" {
  name = "${local.environment}-insights"

  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  application_type = "web"

  tags = var.tags
}

Storage Account

resource "azurerm_storage_account" "this" {
  name = "${replace(var.environment, "_", "")}${random_string.unique.result}"

  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  account_replication_type = var.storage_replication
  account_tier             = var.pricing_sku

  tags = var.tags
}

Container Registry

resource "azurerm_container_registry" "ml_acr" {
  name = "${replace(var.environment, "_", "")}${random_string.unique.result}"

  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  sku           = var.pricing_sku
  admin_enabled = true

  tags = var.tags
}
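
Both the storage account and the container registry append random_string.unique.result to their names, and several resources read their tier from var.pricing_sku. These declarations are not shown in the article. The sketch below is one way they might look; the suffix length, defaults, and validation rule are assumptions.

```hcl
# Random suffix for globally unique names. Storage account names allow
# only lowercase letters and digits (3-24 characters), so uppercase and
# special characters are disabled.
resource "random_string" "unique" {
  length  = 6
  upper   = false
  special = false
}

variable "pricing_sku" {
  description = "Shared tier for the Key Vault, storage account, and container registry."
  type        = string
  default     = "Standard"

  validation {
    # "Standard" and "Premium" are accepted by all three services.
    condition     = contains(["Standard", "Premium"], var.pricing_sku)
    error_message = "The pricing_sku value must be \"Standard\" or \"Premium\"."
  }
}

variable "storage_replication" {
  description = "Replication type for the storage account, such as LRS or GRS."
  type        = string
  default     = "LRS"
}
```

The replace call in the resource names strips only underscores, so var.environment should otherwise contain just lowercase letters and digits for the storage account name to be valid.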

Step 4: Deploy the Azure Machine Learning Workspace

The Azure ML Workspace is the core resource for managing models, experiments, and compute resources. Add the following configuration to your Terraform file:

resource "azurerm_machine_learning_workspace" "ml_workspace" {
  name = "${local.environment}-workspace"

  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  application_insights_id       = azurerm_application_insights.ml_insights.id
  container_registry_id         = azurerm_container_registry.ml_acr.id
  key_vault_id                  = azurerm_key_vault.this.id
  storage_account_id            = azurerm_storage_account.this.id
  public_network_access_enabled = true

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}
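
Once the workspace exists, other tooling often needs its ID or name. A small sketch of an outputs file; the output names are hypothetical and not part of the original project:

```hcl
# outputs.tf (sketch; output names are illustrative)
output "ml_workspace_id" {
  description = "Resource ID of the Azure ML workspace."
  value       = azurerm_machine_learning_workspace.ml_workspace.id
}

output "ml_workspace_name" {
  description = "Name of the Azure ML workspace."
  value       = azurerm_machine_learning_workspace.ml_workspace.name
}
```

After an apply, terraform output ml_workspace_id prints the value for use in scripts or pipelines.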

Step 5: Execute Terraform Commands

  1. Initialize Terraform:
    Run the following command to initialize Terraform and set up the backend:

    terraform init
    
  2. Validate the Configuration:
    Ensure your configuration is correct by running:

    terraform validate
    
  3. Plan the Deployment:
    Generate an execution plan to see the resources that will be created:

    terraform plan -out=plan.out
    
  4. Apply the Deployment:
    Deploy the resources to Azure by running:

    terraform apply plan.out
    

Common Issues and Solutions

  • Permission Denied Errors:
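    Ensure the identity running Terraform has sufficient permissions to create the required resources, for example the Contributor or Owner role on the subscription or target resource group.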

  • Name Conflicts:
    Some Azure resource names, including those of storage accounts, key vaults, and container registries, must be globally unique. Use dynamic naming (e.g., append random strings) to avoid conflicts.

  • Terraform State Issues:
    Use a remote backend (e.g., S3 or Azure Blob Storage) to manage Terraform state securely across teams.
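
For an Azure project, the azurerm backend stores state in a Blob Storage container. The following is a sketch that assumes the storage account and container were created ahead of time; every name below is a placeholder, not a value from this project.

```hcl
# backend.tf (sketch; all names are placeholders)
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"                 # group holding the state account
    storage_account_name = "tfstateexample"             # must already exist
    container_name       = "tfstate"                    # blob container for state files
    key                  = "azure-ml.terraform.tfstate" # blob name for this project
  }
}
```

After adding or changing a backend block, run terraform init again so Terraform can configure the backend and, if needed, migrate existing state.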


Cost Optimization Tips

  1. Scale Down Idle Resources:
    Set min_node_count = 0 in the scale_settings block of each compute cluster so it scales down to zero nodes when idle.

  2. Use Reserved Instances:
    For long-term projects, reserved instances can save up to 72% on compute costs.

  3. Monitor Spending:
    Leverage Azure Cost Management to track and control expenses.

  4. Tag Resources:
    Apply consistent tags (e.g., environment, cost-center) to track resource usage and allocate costs effectively.
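
The scale-down and tagging tips can be sketched with a compute cluster attached to the workspace defined earlier. The cluster name, VM size, priority, node counts, and idle timeout are assumptions chosen for illustration.

```hcl
# Sketch of an autoscaling compute cluster; sizes and names are illustrative.
resource "azurerm_machine_learning_compute_cluster" "cpu" {
  name                          = "cpu-cluster"
  location                      = azurerm_resource_group.this.location
  machine_learning_workspace_id = azurerm_machine_learning_workspace.ml_workspace.id

  vm_size     = "Standard_DS2_v2"
  vm_priority = "LowPriority" # cheaper, but nodes can be preempted

  scale_settings {
    min_node_count = 0 # no nodes are kept running while the cluster is idle
    max_node_count = 2
    # ISO 8601 duration: release idle nodes after 15 minutes.
    scale_down_nodes_after_idle_duration = "PT15M"
  }

  identity {
    type = "SystemAssigned"
  }

  # Reuse the shared tags so the cluster appears in cost reports.
  tags = var.tags
}
```

With min_node_count set to 0, the first job submitted after an idle period waits for nodes to start, which trades some latency for lower cost.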


Deploying Azure Machine Learning environments with Terraform simplifies the setup process, reduces errors, and ensures consistency. By automating the deployment of supporting services like key vaults, storage accounts, and container registries, you can focus more on developing and deploying machine learning models. This approach not only saves time but also improves scalability and reliability.
