Call Center Analytics: Part 2 - Implementing Amazon Transcribe for Call Transcription

Todd Bernson

2024-10-03

In call center analysis, transcribing audio into text plays a central role. It provides a written record of customer interactions and serves as the foundation for further analysis such as sentiment assessment and summarization.

Amazon Transcribe is an AWS automatic speech recognition (ASR) service that converts speech to text with high accuracy. This article walks through how we used Amazon Transcribe to transcribe call center recordings automatically.

Check out the code repo here.

Getting Started with Amazon Transcribe

Integrating Amazon Transcribe with other AWS services such as S3 and Lambda lets transcription start automatically whenever a call recording is uploaded. The integration is trigger-based: the upload event itself kicks off the job, with no manual step or schedule involved.
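
As a rough illustration of the trigger side, the sketch below builds an S3 notification configuration that invokes a Lambda function when a new recording lands in the bucket. The helper names, the ".mp3" suffix filter, and the idea of wiring this up with boto3 are assumptions made for the example; the original project may provision the trigger differently (for instance with infrastructure-as-code).

```python
def build_s3_trigger_config(lambda_arn: str, suffix: str = ".mp3") -> dict:
    """Build an S3 notification configuration that invokes a Lambda on upload.

    The suffix filter is an assumption for this sketch: it limits the trigger
    to objects whose key ends with the given extension.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


def apply_s3_trigger(bucket: str, lambda_arn: str) -> None:
    """Attach the configuration to a bucket (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_s3_trigger_config(lambda_arn),
    )
```

Note that S3 also needs permission to invoke the function (a resource-based policy on the Lambda) before this configuration is accepted.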

Automating Transcription with Lambda

When a new call recording is dropped into an S3 bucket, a Lambda function is triggered to start the transcription job. The function calls the StartTranscriptionJob API of Amazon Transcribe.

Lambda Function Snippet

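import json
import time
from datetime import datetime
from urllib.parse import unquote_plus

import boto3

s3_client = boto3.client('s3')
transcribe_client = boto3.client('transcribe')
# TRANSCRIBE_S3_BUCKET and invoke_bedrock_model are not shown in this excerpt;
# process_transcript appears later in this article.

# Inside the Lambda handler:
for record in event['Records']:
    # Object keys in S3 events are URL-encoded, so decode before building the URI.
    source_bucket_name = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'])
    file_uri = f's3://{source_bucket_name}/{key}'

    # Start the job with speaker labels enabled for a two-party call.
    transcribe_job_name = f"Transcription-{datetime.now().strftime('%Y%m%dT%H%M%S')}"
    transcribe_client.start_transcription_job(TranscriptionJobName=transcribe_job_name,
        Media={'MediaFileUri': file_uri}, MediaFormat='mp3', LanguageCode='en-US',
        OutputBucketName=TRANSCRIBE_S3_BUCKET, Settings={'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2})

    # Poll until the job finishes, pausing between calls so the loop does not
    # hammer the API. The Lambda timeout must be long enough to cover the wait.
    while True:
        status = transcribe_client.get_transcription_job(TranscriptionJobName=transcribe_job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
        # With only OutputBucketName set, the transcript is written as <job name>.json.
        transcript_key = f"{transcribe_job_name}.json"
        transcript_response = s3_client.get_object(Bucket=TRANSCRIBE_S3_BUCKET, Key=transcript_key)
        transcript = json.loads(transcript_response['Body'].read().decode('utf-8'))
        full_text = process_transcript(transcript['results'], transcript['results']['speaker_labels'])

        summary_response = invoke_bedrock_model(full_text)
        summary = summary_response.get('completion', '')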
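
The snippet above hard-codes MediaFormat='mp3' and names jobs by timestamp alone, so two uploads handled in the same second by concurrent invocations would ask for the same job name. One possible variation, sketched below, derives the format from the object key and appends a short random suffix. The helper build_transcription_job_args is hypothetical and not part of the original code; the remaining arguments mirror the snippet above.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Media formats accepted by the MediaFormat parameter of StartTranscriptionJob.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def build_transcription_job_args(record: dict, output_bucket: str) -> dict:
    """Build keyword arguments for start_transcription_job from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])

    # Infer the media format from the file extension instead of assuming mp3.
    media_format = PurePosixPath(key).suffix.lstrip(".").lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported media format for key: {key}")

    # A random suffix keeps job names unique across concurrent invocations.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    job_name = f"Transcription-{timestamp}-{uuid.uuid4().hex[:8]}"

    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format,
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    }
```

The returned dictionary can be passed straight through as transcribe_client.start_transcription_job(**args), and args["TranscriptionJobName"] then identifies the job when polling for its status.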

Handling Different Languages

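Amazon Transcribe can identify the dominant language in a recording automatically and transcribe it in that language. The snippet above pins LanguageCode to 'en-US'; to rely on automatic identification, the job is started with language identification enabled instead of a fixed language code.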
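
As a minimal sketch of what that looks like in a request, the helper below takes a dictionary of start_transcription_job arguments, drops the fixed LanguageCode, and sets IdentifyLanguage, optionally narrowing the candidates with LanguageOptions. The function name and the example language codes are assumptions for illustration only.

```python
def with_language_identification(job_args: dict, language_options=None) -> dict:
    """Return a copy of the job arguments that uses automatic language identification.

    LanguageCode and IdentifyLanguage are alternatives, so the fixed
    LanguageCode is removed before IdentifyLanguage is switched on.
    """
    args = {k: v for k, v in job_args.items() if k != "LanguageCode"}
    args["IdentifyLanguage"] = True
    if language_options:
        # Optional hint: restrict detection to the languages you expect.
        args["LanguageOptions"] = list(language_options)
    return args


# Example: a call center that expects English or Spanish callers (assumed).
base_args = {
    "TranscriptionJobName": "Transcription-example",
    "Media": {"MediaFileUri": "s3://example-bucket/call.mp3"},
    "MediaFormat": "mp3",
    "LanguageCode": "en-US",
}
multilingual_args = with_language_identification(base_args, ["en-US", "es-US"])
```

Supplying a short list of expected languages is generally worthwhile, since it gives the service fewer candidates to choose between.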

Custom Vocabulary

You can create a custom vocabulary to guide the transcription process toward industry-specific terms or colloquialisms common in your call recordings.
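
A rough sketch of the two steps involved: a vocabulary is created once with the Transcribe create_vocabulary call, then referenced by name in the job's Settings. The helper names, the vocabulary name, and the example phrases below are hypothetical and only illustrate the shape of the requests.

```python
def build_vocabulary_request(name: str, phrases, language_code: str = "en-US") -> dict:
    """Build keyword arguments for transcribe_client.create_vocabulary."""
    return {
        "VocabularyName": name,
        "LanguageCode": language_code,
        "Phrases": list(phrases),
    }


def with_custom_vocabulary(job_args: dict, vocabulary_name: str) -> dict:
    """Return a copy of the job arguments whose Settings reference the vocabulary."""
    settings = dict(job_args.get("Settings", {}))
    settings["VocabularyName"] = vocabulary_name
    return {**job_args, "Settings": settings}


# Hypothetical terms a support line might care about; multi-word phrases are
# written with hyphens in the Phrases list.
vocab_request = build_vocabulary_request(
    "call-center-terms", ["chargeback", "auto-renewal", "escalation"]
)
job_args = with_custom_vocabulary(
    {"Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2}},
    "call-center-terms",
)
```

With a boto3 client this would be used as transcribe_client.create_vocabulary(**vocab_request). The vocabulary has to finish processing and reach the READY state before a transcription job can reference it.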

Post-Processing of Transcripts

After transcription, the text often requires cleaning and formatting. Implementing a post-processing Lambda function enables us to refine transcripts before they are processed for sentiment analysis or stored.

We split the transcript by speaker, segment it by time so that sentiment can be tracked across the call, and generate a summary.

Lambda Post-Processing Example

def process_transcript(transcript, speaker_labels):
    """Turn Transcribe results into a "Speaker: text" dialogue string."""
    # Assumption: spk_0 is the customer and spk_1 is the agent.
    # Unknown labels fall back to the raw label instead of raising KeyError.
    speaker_names = {"spk_0": "Customer", "spk_1": "Agent"}

    # Index spoken words by start_time once, rather than rescanning the full
    # item list for every word. Punctuation items have no start_time and are skipped.
    words_by_start = {}
    for word in transcript['items']:
        if 'start_time' in word:
            words_by_start.setdefault(word['start_time'], word)

    dialogue_entries = []
    last_speaker = None

    for segment in speaker_labels['segments']:
        speaker_label = segment['speaker_label']

        # Start a new "Speaker:" line whenever the speaker changes.
        if last_speaker != speaker_label:
            if last_speaker is not None:
                dialogue_entries.append("\n")
            dialogue_entries.append(f"{speaker_names.get(speaker_label, speaker_label)}:")
            last_speaker = speaker_label

        # Collect the best alternative for each word in this segment.
        words = []
        for item in segment['items']:
            word_info = words_by_start.get(item['start_time'])
            if word_info and word_info.get('alternatives'):
                words.append(word_info['alternatives'][0]['content'])

        if words:
            dialogue_entries.append(" " + " ".join(words))

    return "".join(dialogue_entries)
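
The function above returns one dialogue string, which suits summarization. For the time-based sentiment analysis mentioned earlier, the words also need to be grouped into time windows. The sketch below is one way to do that under stated assumptions: the function name, the 30-second default window, and the output shape are all hypothetical, and it expects the same 'results' structure (items plus speaker_labels) that the function above reads.

```python
from collections import defaultdict


def bucket_dialogue_by_time(results: dict, window_seconds: float = 30.0) -> list:
    """Group spoken words into fixed time windows, separately for each speaker."""
    # Map each spoken word's start_time to its top alternative.
    # Punctuation items carry no start_time and are left out.
    words_by_start = {
        item["start_time"]: item["alternatives"][0]["content"]
        for item in results["items"]
        if "start_time" in item and item.get("alternatives")
    }

    buckets = defaultdict(list)
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            word = words_by_start.get(item["start_time"])
            if word is None:
                continue
            # Transcribe reports times as strings of seconds, e.g. "31.2".
            window = int(float(item["start_time"]) // window_seconds)
            buckets[(window, segment["speaker_label"])].append(word)

    return [
        {
            "window_start": window * window_seconds,
            "speaker": speaker,
            "text": " ".join(words),
        }
        for (window, speaker), words in sorted(buckets.items())
    ]
```

Each entry's text can then be scored on its own, which makes it possible to see how a caller's sentiment shifts between the start and the end of a call.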

Amazon Transcribe has reshaped how call centers approach the transcription of their audio records. Call centers can enhance their operational efficiency by utilizing AWS Lambda for automation, creating custom vocabularies for accuracy, and employing post-processing functions for refinement. The transcription process is not just about converting speech to text; it's the first step toward a comprehensive understanding of customer interactions.

The successful implementation of Amazon Transcribe within a call center's workflow promises not just a record of what was said but a gateway to deeper insights into the voice of the customer.

Check out my website here.

