
AWS Lake Formation, Part 4: Fine-Grained Access Control with Lake Formation and IAM


Todd Bernson

2024-09-28

This series installment on AWS Lake Formation dives deep into fine-grained access control mechanisms, focusing on how Lake Formation integrates with AWS IAM to enforce detailed security policies.

Let's explore how to implement and manage these controls to ensure precise and secure data access within your data lake environment.

Clone the project repo here.

Understanding Fine-Grained Access Control in Lake Formation

AWS Lake Formation builds on the security features IAM provides, letting you define precise access controls on your data lake resources at the database, table, column, row, and cell levels. This granularity ensures that users and services can access only the data their role requires, which strengthens security and simplifies compliance.

Key Concepts:

Data Permissions: Control which principals can read or modify data in tables and columns, using permissions such as SELECT, INSERT, and DELETE.
Table and Database Permissions: Manage access at the database or table level, specifying who can create, alter, or drop resources.
Row-Level Security: Apply data filters with row filter expressions so that queries return only the rows a user is authorized to view.
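Here is a rough sketch of the first two concepts in Terraform. It grants a data engineer role database-level rights to describe and alter the database and create tables in it. It also grants the analyst role SELECT on every column of sales_data except one. The `data_engineer` role and the `card_number` column are hypothetical. The database, table, and analyst role match the names used in the Terraform example below.

```hcl
# Database-level permissions: the hypothetical data engineer role can
# describe and alter the database and create new tables in it.
resource "aws_lakeformation_permissions" "engineer_database" {
  principal   = aws_iam_role.data_engineer.arn # hypothetical role
  permissions = ["DESCRIBE", "CREATE_TABLE", "ALTER"]

  database {
    name = aws_glue_catalog_database.financial_data.name
  }
}

# Column-level data permissions: SELECT on every column except the
# hypothetical card_number column. With excluded_column_names, the
# provider expects wildcard = true.
resource "aws_lakeformation_permissions" "analyst_columns" {
  principal   = aws_iam_role.analyst.arn
  permissions = ["SELECT"]

  table_with_columns {
    database_name         = aws_glue_catalog_database.financial_data.name
    name                  = aws_glue_catalog_table.sales_data.name
    wildcard              = true
    excluded_column_names = ["card_number"]
  }
}
```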

Implementing Granular Security Policies with Lake Formation

Configuring granular security policies involves setting up permissions that align with your organization's data governance policies. Lake Formation lets you grant permissions on databases, tables, columns, and data filters to IAM principals. In Terraform, each grant is an aws_lakeformation_permissions resource.

Terraform Configuration for Lake Formation Permissions:

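resource "aws_lakeformation_permissions" "data_access" {
  principal   = aws_iam_role.analyst.arn
  permissions = ["SELECT"]

  # Table-level grant: SELECT on every column and row of sales_data.
  table {
    database_name = aws_glue_catalog_database.financial_data.name
    name          = aws_glue_catalog_table.sales_data.name
  }
}

# Column and row restrictions are defined as a data cells filter...
resource "aws_lakeformation_data_cells_filter" "eu_sales" {
  table_data {
    database_name    = aws_glue_catalog_database.financial_data.name
    table_name       = aws_glue_catalog_table.sales_data.name
    table_catalog_id = data.aws_caller_identity.current.account_id
    name             = "eu_sales"
    column_names     = ["customer_id", "transaction_value"]

    row_filter {
      filter_expression = "customer_region = 'EU'"
    }
  }
}

# ...and then granted to the principal.
resource "aws_lakeformation_permissions" "row_level_security" {
  principal   = aws_iam_role.analyst.arn
  permissions = ["SELECT"]

  data_cells_filter {
    database_name    = aws_glue_catalog_database.financial_data.name
    table_name       = aws_glue_catalog_table.sales_data.name
    table_catalog_id = data.aws_caller_identity.current.account_id
    name             = aws_lakeformation_data_cells_filter.eu_sales.table_data[0].name
  }
}

In this example, the first grant gives an analyst role SELECT on the entire sales_data table. Column and row restrictions are expressed as a data cells filter. The eu_sales filter exposes only the customer_id and transaction_value columns, and only for rows where customer_region = 'EU'. The row_level_security grant then gives the analyst role SELECT through that filter. Lake Formation grants are additive, so you would normally give a principal either the table-wide grant or the filtered one. If the analyst held both, the first grant would still let it read every row and column.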
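Lake Formation implements row- and cell-level security with data cells filters. The Terraform AWS provider exposes these as aws_lakeformation_data_cells_filter, and a grant targets a filter through the data_cells_filter block of aws_lakeformation_permissions. The sketch below uses a column wildcard to hide one column instead of listing the allowed ones, and combines it with an EU row filter. The filter name, the `customer_email` column, and the `eu_support` role are hypothetical. `data.aws_caller_identity.current` is assumed to be declared, as it is in the IAM configuration later in this post.

```hcl
# Hypothetical filter: all columns except customer_email, EU rows only.
resource "aws_lakeformation_data_cells_filter" "eu_sales_no_email" {
  table_data {
    database_name    = aws_glue_catalog_database.financial_data.name
    table_name       = aws_glue_catalog_table.sales_data.name
    table_catalog_id = data.aws_caller_identity.current.account_id
    name             = "eu_sales_no_email" # hypothetical filter name

    # Every column except the hypothetical customer_email column.
    column_wildcard {
      excluded_column_names = ["customer_email"]
    }

    # Only rows for the EU region are visible through this filter.
    row_filter {
      filter_expression = "customer_region = 'EU'"
    }
  }
}

# Grant SELECT through the filter to a hypothetical support role.
resource "aws_lakeformation_permissions" "eu_support_filtered" {
  principal   = aws_iam_role.eu_support.arn # hypothetical role
  permissions = ["SELECT"]

  data_cells_filter {
    database_name    = aws_glue_catalog_database.financial_data.name
    table_name       = aws_glue_catalog_table.sales_data.name
    table_catalog_id = data.aws_caller_identity.current.account_id
    name             = aws_lakeformation_data_cells_filter.eu_sales_no_email.table_data[0].name
  }
}
```

The two styles have different defaults. An exclusion list should expose columns added to the table later. An explicit column_names list keeps new columns hidden until you add them. Choose whichever default fits how sensitive your schema changes tend to be.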

Integrating IAM with Lake Formation

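Lake Formation controls fine-grained access to the data inside the data lake, but IAM still manages identities and overall permissions on AWS APIs. A principal that queries governed data needs both. It needs IAM permissions to call the relevant Glue, Lake Formation, and query-engine APIs, and Lake Formation grants on the specific databases, tables, columns, or rows it reads.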

Best Practices for IAM Integration:

Role-Based Access Control (RBAC): Represent job functions as IAM roles and grant Lake Formation permissions to those roles rather than to individual users.
Least Privilege Principle: Assign the minimum necessary permissions to roles and individuals to reduce the risk of unauthorized data access.
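Here is a minimal sketch of what the analyst role referenced in the grants above might look like. The role's IAM policy only lets it read catalog metadata for one database and call lakeformation:GetDataAccess, the action Lake Formation uses to vend temporary credentials for governed data. Which tables, columns, and rows the role can actually read is still decided by the Lake Formation grants. The role and policy names and the same-account trust relationship are assumptions. The sketch reuses var.region and data.aws_caller_identity.current from the configuration below. A query engine such as Athena would need its own IAM permissions on top of these.

```hcl
# Hypothetical trust policy: principals in this account that IAM allows
# to call sts:AssumeRole on this role may assume it.
data "aws_iam_policy_document" "analyst_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }
}

resource "aws_iam_role" "analyst" {
  name               = "analyst" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.analyst_assume.json
}

data "aws_iam_policy_document" "analyst_data_access" {
  # Lets Lake Formation vend temporary credentials for governed data;
  # granted on all resources, as in AWS's example persona policies.
  statement {
    sid       = "LakeFormationDataAccess"
    effect    = "Allow"
    actions   = ["lakeformation:GetDataAccess"]
    resources = ["*"]
  }

  # Catalog metadata reads, limited to the financial_data database.
  statement {
    sid    = "ReadFinancialCatalog"
    effect = "Allow"
    actions = [
      "glue:GetDatabase",
      "glue:GetTable",
      "glue:GetTables",
      "glue:GetPartitions",
    ]
    resources = [
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:catalog",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:database/${aws_glue_catalog_database.financial_data.name}",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:table/${aws_glue_catalog_database.financial_data.name}/*",
    ]
  }
}

resource "aws_iam_role_policy" "analyst_data_access" {
  name   = "analyst-data-access" # hypothetical name
  role   = aws_iam_role.analyst.id
  policy = data.aws_iam_policy_document.analyst_data_access.json
}
```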

Terraform Configuration for IAM and Lake Formation Integration:

data "aws_iam_policy_document" "lakeformation_policy" {
  statement {
    actions = [
      "glue:CreateDatabase",
      "glue:GetDatabase",
      "glue:UpdateDatabase",
      "glue:DeleteDatabase",
      "glue:CreateTable",
      "glue:GetTable",
      "glue:UpdateTable",
      "glue:DeleteTable",
      "glue:BatchGetJobs",
      "glue:GetJob",
      "glue:StartJobRun",
      "glue:BatchStopJobRun",
      "glue:CreateCrawler",
      "glue:GetCrawler",
      "glue:UpdateCrawler",
      "glue:StartCrawler",
      "glue:StopCrawler"
    ]
    resources = [
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:catalog",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:crawler/${local.environment}",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:database/${local.environment}",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:job/${local.environment}",
      "arn:aws:glue:${var.region}:${data.aws_caller_identity.current.account_id}:table/${local.environment}/*",
    ]
    effect = "Allow"
  }
  statement {
    effect = "Allow"
    actions = [
      "s3:DeleteObject",
      "s3:GetObject",
      "s3:PutObject",
    ]
    resources = [
      "${data.aws_s3_bucket.bucket.arn}/*"
    ]
  }
  statement {
    effect = "Allow"
    actions = [
      "s3:ListBucket"
    ]
    resources = [data.aws_s3_bucket.bucket.arn]
  }
}

data "aws_iam_policy_document" "lakeformation_role" {
  statement {
    actions = ["sts:AssumeRole"]
    effect  = "Allow"
    principals {
      identifiers = ["lakeformation.amazonaws.com"]
      type        = "Service"
    }
  }
}

locals {
  environment = "${var.environment}_${random_string.this.result}"
}

resource "aws_iam_policy" "lakeformation_service_policy" {
  name        = "${local.environment}_policy"
  description = "Policy that allows sufficient permissions for the crawler"

  policy = data.aws_iam_policy_document.lakeformation_policy.json
}

resource "aws_iam_role" "lakeformation_service_role" {
  name = "${local.environment}_role"

  assume_role_policy = data.aws_iam_policy_document.lakeformation_role.json

  tags = var.tags
}

resource "aws_iam_role_policy_attachment" "lakeformation_service_policy_attachment" {
  role       = aws_iam_role.lakeformation_service_role.name
  policy_arn = aws_iam_policy.lakeformation_service_policy.arn
}

resource "aws_lakeformation_data_lake_settings" "this" {
  admins = [data.aws_iam_session_context.current.issuer_arn]
}

resource "aws_lakeformation_permissions" "caller_catalog_database_permissions" {
  principal   = data.aws_iam_role.terraform.arn
  permissions = ["ALL"]

  database {
    name = aws_glue_catalog_database.this.name
  }
}

resource "aws_lakeformation_permissions" "caller_catalog_table_permissions" {
  principal   = data.aws_iam_role.terraform.arn
  permissions = ["ALL"]

  table {
    database_name = aws_glue_catalog_database.this.name
    wildcard      = true
  }
}

resource "aws_lakeformation_permissions" "glue_catalog_database_permissions" {
  principal = aws_iam_role.glue_service_role.arn
  permissions = [
    "ALTER",
    "CREATE_TABLE",
    "DROP",
  ]

  database {
    name = aws_glue_catalog_database.this.name
  }
}

resource "aws_lakeformation_permissions" "glue_catalog_table_permissions" {
  principal   = aws_iam_role.glue_service_role.arn
  permissions = ["ALL"]

  table {
    database_name = aws_glue_catalog_database.this.name
    wildcard      = true
  }
}

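resource "aws_lakeformation_permissions" "s3_data_location_permissions" {
  principal   = aws_iam_role.glue_service_role.arn
  permissions = ["DATA_LOCATION_ACCESS"]

  data_location {
    arn = data.aws_s3_bucket.bucket.arn
  }

  # Ensure the location is registered with Lake Formation before granting access to it.
  depends_on = [aws_lakeformation_resource.this]
}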

resource "aws_lakeformation_resource" "this" {
  arn = data.aws_s3_bucket.bucket.arn

  role_arn = aws_iam_role.lakeformation_service_role.arn
}

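These are the exact permissions I used for this project, and they are too broad for a production environment. In particular, scope down the ALL grants and the Glue job and crawler actions attached to the Lake Formation service role to what each principal actually needs.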
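Here is one way to narrow the role that Lake Formation assumes for the registered location. This sketch assumes that role only needs S3 access to the data (plus KMS access if the bucket uses a customer managed key), so Glue job and crawler actions would live on the Glue service role instead. It also assumes the data sits under a prefix supplied through a hypothetical `data_prefix` variable.

```hcl
# Hypothetical variable: the prefix under which the data lake's data lives.
variable "data_prefix" {
  type        = string
  description = "S3 prefix (no leading or trailing slash) that holds the data lake data"
}

data "aws_iam_policy_document" "lakeformation_location_access" {
  # Object-level access only under the data prefix.
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["${data.aws_s3_bucket.bucket.arn}/${var.data_prefix}/*"]
  }

  # ListBucket is a bucket-level action; restrict it to keys under the prefix.
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [data.aws_s3_bucket.bucket.arn]

    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = [var.data_prefix, "${var.data_prefix}/*"]
    }
  }
}
```

With a prefix-scoped policy like this, you would also point aws_lakeformation_resource and the data_location grant at the prefix rather than the whole bucket.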

Combining Lake Formation's fine-grained permissions with IAM's identity management gives you layered control over your data lake. IAM decides which principals can call the relevant APIs, and Lake Formation decides which databases, tables, columns, and rows each of them can see. Together they help keep your data compliant with internal and regulatory standards. Managing these configurations as code with Terraform makes every permission change reviewable and repeatable, so your data governance can keep pace with organizational changes.

Visit my website here.

Todd Bernson

CTO
