BSC Analytics

Architecting a Scalable Voice Cloning Platform on AWS: A Case Study

By Todd Bernson
Chief Technical Officer, BSC Analytics

06 Jun 2025

If you've ever found yourself staring at a whiteboard trying to connect the dots between AI workloads, secure infrastructure, and scalability, welcome to my world. This is the story of how I built a fully self-hosted, scalable, and cost-optimized voice cloning platform on AWS using only a few tools: Terraform, containers, and a little grit learned from the Marine Corps and a lifetime under a barbell.

Let me walk you through the choices I made (yes, all of them), the architecture that emerged, and the hilariously non-obvious problems you only find after you're deep into deploying open-source ML models that occasionally throw tantrums like a toddler hyped up on Red Bull.

The Problem: Voice Cloning for Humans, Not Robots

Text-to-speech platforms are everywhere. Some sound like HAL 9000 on decaf. Others are good, but the second you want to use a proprietary voice (like, say, your own), you're either stuck paying by the syllable or signing your data rights away faster than you can say "GDPR."
So I built my own. A fully self-hosted solution using open-source models (shoutout to Tortoise-TTS and its uncanny ability to clone your voice right down to your awkward pauses). But cloning is only part of the fun — delivering that experience at scale, securely, and reliably is where AWS steps into the spotlight.

High-Level Architecture

The stack breaks down like this:

Frontend: Static web app hosted on Amazon S3, served through CloudFront.
Backend API: Deployed on ECS Fargate or Lambda (depending on the workload), behind API Gateway.
Voice Model Serving: Containerized ML model for inference.
Storage: S3 for audio and model artifacts.
Security & Identity: IAM roles, policies, and execution contexts.
Monitoring: CloudWatch for logs and metrics.
Infra: Terraform. Always Terraform.

Everything is defined in code, because if it’s not repeatable and testable, it’s a hobby project — not production-ready.

Frontend: Static Doesn’t Mean Boring

Let’s be honest: most frontends are glorified HTML wrapped in JavaScript sprinkles. Mine isn’t much different, but it’s clean, fast, and lives on S3 with CloudFront doing the content delivery heavy lifting. It’s versioned and managed in my Terraform code, and every deploy invalidates the CloudFront cache so I don’t get support tickets saying “it’s not loading” from someone’s uncle using IE11.
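
For a sense of what a deploy-time invalidation step can look like, here is a minimal Python sketch. It is not the actual pipeline script: the function names, the paths, and the distribution ID you would pass in are all made up for illustration. The only real API it leans on is boto3's CloudFront `create_invalidation` call, which takes a `DistributionId` and an `InvalidationBatch`.

```python
import time
from typing import Optional


def build_invalidation_batch(paths: list[str], caller_reference: Optional[str] = None) -> dict:
    """Build the InvalidationBatch payload for CloudFront's create_invalidation."""
    if not paths:
        raise ValueError("at least one path is required")
    # CloudFront paths start with a slash; normalize and de-duplicate them.
    normalized = sorted({p if p.startswith("/") else "/" + p for p in paths})
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        # CallerReference must be unique per request; a timestamp is a common choice.
        "CallerReference": caller_reference or f"deploy-{int(time.time())}",
    }


def invalidate(distribution_id: str, paths: list[str]) -> str:
    """Submit the invalidation. Needs boto3 and AWS credentials (hypothetical usage)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("cloudfront")
    response = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return response["Invalidation"]["Id"]
```

In a CI/CD job, something like `invalidate(distribution_id, ["/index.html", "/assets/*"])` would run right after the S3 sync. Invalidating only the paths that changed keeps the request small.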

API Layer: Gateway Drug to Lambda or ECS

API Gateway uses a VPC Link to forward requests to an internal load balancer, which passes them on to the EKS deployment.
API Gateway fronts all routes and directs requests based on API parameters. Terraform templates make it trivial to switch execution paths, a small but powerful way to fine-tune cost vs. performance tradeoffs.
And yes, everything is rate-limited, throttled, and logged. Because one day some internal engineer will forget that uploading 200 audio files at once isn't polite.
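
API Gateway's throttling follows a token bucket model with a steady-state rate and a burst limit. The sketch below shows that idea in plain Python so the behavior is easy to reason about. It is an illustration of the algorithm only, not how the platform enforces limits, and the class and parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # a real gateway would answer with HTTP 429 here
```

With a small burst size, the engineer who fires 200 uploads at once gets the first few through and a wall of rejections after that, while a well-behaved client sending at or below `rate` never notices the limiter.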

Voice Model: Running Tortoise, Fast

Tortoise-TTS doesn’t exactly scream efficiency. It’s a brilliant model, and like all brilliant things, it comes with eccentricities. It’s Dockerized, stored in ECR, and run as an EKS deployment triggered by events or API calls.
Each pod has access to a GPU (if needed). To bypass a lot of the S3 presigned URL complexity, the S3 bucket is simply mounted into the Kubernetes deployment, which uses a service account (SA) scoped for least privilege. Yes, I do least privilege here. It’s not just a talking point in my security audit; it’s a way of life.
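
To make "least privilege" concrete, here is a rough sketch of the kind of IAM policy document a role behind that service account might carry. The bucket name, prefix, and function name are hypothetical, and how the role gets bound to the service account (IAM Roles for Service Accounts is one common option) is an assumption, since it isn't spelled out above.

```python
import json


def scoped_s3_policy(bucket: str, prefix: str, writable: bool = False) -> dict:
    """Return an IAM policy document limited to one prefix of one bucket."""
    prefix = prefix.strip("/")
    object_actions = ["s3:GetObject"]
    if writable:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, but only for keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads (and optionally writes) are limited to the same prefix.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, printed as the JSON you would attach to a role.
    print(json.dumps(scoped_s3_policy("voice-artifacts", "models"), indent=2))
```

The point of splitting the two statements is that `s3:ListBucket` applies to the bucket ARN while object actions apply to object ARNs, so each can be narrowed independently.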

Terraform: The One True Religion

From the IAM role assumptions to VPC peering, subnet creation, and service discovery — everything is codified in Terraform.

Key resource types:

aws_s3_bucket
aws_lambda_function
aws_eks_cluster
aws_apigatewayv2_api (HTTP API)
aws_cloudwatch_log_group

You can burn it all down and stand it back up in just a few minutes. We work smarter, not harder, unlike the Marines, who sometimes flipped that around.

IAM: Gatekeeper of Sanity

I treat IAM like a loaded weapon. Every function, container, and service has its own scoped role. S3 buckets enforce object-level permissions. API Gateway uses usage plans and API keys with throttling. There’s no blanket admin access here — even if it makes debugging a little more annoying. It’s worth the tradeoff.

Also: never, ever let a Lambda function assume a role with wildcard permissions. That way lies madness.
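
One cheap way to hold that line is a check that fails the build when a policy document contains a wildcard. The following is a minimal sketch, assuming policies are available as parsed JSON; the function name and the exact definition of "wildcard" (a bare `*`, or a service-wide action like `s3:*`) are choices made for illustration.

```python
def find_wildcards(policy: dict) -> list[str]:
    """List Allow statements whose Action or Resource is a bare or service-wide wildcard."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            values = stmt.get(key, [])
            if isinstance(values, str):  # IAM allows a string or a list here
                values = [values]
            for value in values:
                if value == "*" or (key == "Action" and value.endswith(":*")):
                    findings.append(f"statement {index}: {key} is {value!r}")
    return findings
```

A CI step could run this over every policy JSON in the repo and exit non-zero if the returned list is not empty. It will not catch every overly broad grant (a resource like `arn:aws:s3:::*` slips through), so treat it as a tripwire, not an audit.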

Observability: Logs, Metrics, and Catching Fires Early

CloudWatch captures everything:

Lambda logs
EKS logs
Custom metrics for audio generation durations
Alerts for anomalies (latency spikes, task failures, etc.)

You can’t fix what you can’t see. I’ve got dashboards that would make a SOC analyst tear up. And not from joy — from envy.
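
As an illustration of the custom duration metric, here is a small sketch of a timing helper that produces entries in the shape boto3's CloudWatch `put_metric_data` expects. The metric name, namespace, and dimension are hypothetical, and this is one possible way to collect the numbers, not necessarily how the platform does it.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_metric(name: str, sink: list, **dimensions: str):
    """Time the wrapped block and append a CloudWatch-style metric datum to `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed generations are still timed.
        sink.append({
            "MetricName": name,
            "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
            "Value": time.perf_counter() - start,
            "Unit": "Seconds",
        })


def publish(namespace: str, data: list) -> None:
    """Send collected datums to CloudWatch. Needs boto3 and AWS credentials."""
    import boto3  # lazy import so the timing helper has no AWS dependency

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)
```

Wrapping the inference call in `with timed_metric("AudioGenerationDuration", metrics, Model="tortoise"):` and calling `publish` on a batch afterwards is enough to get a graph, and an alarm on that metric covers the latency-spike case.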

Real-World Challenges

Running large AI models on AWS is like lifting heavy — it looks cool when it works, but if your form is off, something’s gonna break.

Problems I ran into:

EKS warm-up time was too long for short-lived audio jobs
CloudFront caching had to be fine-tuned to avoid stale UI/UX bugs

Solutions:

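Container image layering (so unchanged layers stay cached) made deployments much faster.
A readiness probe holds traffic back until a pod is actually ready, which keeps 5xx errors at bay.
CloudFront cache invalidation scripts run in CI/CD so the UI doesn’t go stale.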
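
The readiness probe only helps if the container reports "not ready" while the model is still loading. A minimal sketch of such an endpoint, using only the Python standard library, might look like this. The `/ready` path, the port you would bind, and the `load_model` placeholder are all assumptions for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the model weights are loaded; until then the pod reports "not ready".
MODEL_READY = threading.Event()


def readiness_response(ready: bool) -> tuple[int, bytes]:
    """Map readiness to an HTTP status: 200 when ready, 503 while warming up."""
    return (200, b"ready") if ready else (503, b"loading model")


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":  # hypothetical probe path
            self.send_error(404)
            return
        status, body = readiness_response(MODEL_READY.is_set())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def load_model() -> None:
    """Placeholder for the slow model load; flips the readiness flag when done."""
    # ... load weights here ...
    MODEL_READY.set()


def serve(port: int) -> None:
    """Load the model in the background and answer probes in the foreground."""
    threading.Thread(target=load_model, daemon=True).start()
    HTTPServer(("0.0.0.0", port), ReadinessHandler).serve_forever()
```

Kubernetes only routes traffic to a pod once its readiness probe succeeds, so returning 503 during the model load keeps requests away from a container that would otherwise fail them.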

Closing Thoughts

Building this platform was part science, part art, and part gym therapy. AWS gave me the tools, Terraform gave me the control, and coffee gave me the persistence.

Would I do it again? Absolutely. But I’d like to remind the next brave soul: just because AWS offers 200+ services doesn’t mean you need all of them. Pick the ones that fit your use case. Glue them together smartly. Monitor everything. Lock it all down.

And if all else fails — lift something heavy, then get back to debugging.

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Certified Deadlifter of Ridiculous Cloud Problems
