Call Center Analytics: Part 2 - Implementing Amazon Transcribe for Call Transcription

In call center analytics, transcribing audio into text is a foundational step. It provides a written record of customer interactions and serves as the basis for further analysis such as sentiment assessment and summarization. Amazon Transcribe is an AWS automatic speech recognition (ASR) service that converts speech to text with high accuracy. This article walks through how we used Amazon Transcribe to transcribe call center recordings.

Check out the code repo here.

Getting Started with Amazon Transcribe

Integrating Amazon Transcribe with S3 and Lambda lets transcription start automatically whenever a call recording is uploaded. The integration is trigger-based.

Automating Transcription with Lambda

When a new call recording is dropped into an S3 bucket, a Lambda function is triggered to start the transcription job. The function calls the StartTranscriptionJob API of Amazon Transcribe.
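As a minimal sketch of the trigger side, the helper below pulls the bucket and object key out of an S3 event notification and builds the media URI that Transcribe expects. The function name `extract_media_uris`, the bucket name, and the sample event are illustrative assumptions, not taken from the repo.

```python
from urllib.parse import unquote_plus


def extract_media_uris(event):
    """Return an s3:// URI for every object referenced in an S3 event notification."""
    uris = []
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded (spaces become '+'), so decode them first.
        key = unquote_plus(record['s3']['object']['key'])
        uris.append(f's3://{bucket}/{key}')
    return uris


# Trimmed-down example of the payload S3 sends to Lambda (hypothetical names).
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'call-recordings'},
                'object': {'key': 'incoming/call+001%28a%29.mp3'}}}
    ]
}

print(extract_media_uris(sample_event))
```

Decoding the key matters because a recording named with spaces or parentheses would otherwise produce a URI that points at a non-existent object.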

Lambda Function Snippet

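import json
import time
import uuid
from datetime import datetime
from urllib.parse import unquote_plus

import boto3

# Clients are created once per execution environment and reused across invocations.
transcribe_client = boto3.client('transcribe')
s3_client = boto3.client('s3')

# TRANSCRIBE_S3_BUCKET, process_transcript and invoke_bedrock_model are defined elsewhere in the module.

def lambda_handler(event, context):
    for record in event['Records']:
        # S3 event keys are URL-encoded, so decode before building the URI.
        source_bucket_name = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        file_uri = f's3://{source_bucket_name}/{key}'

        # Job names must be unique; the random suffix avoids collisions when
        # two recordings arrive within the same second.
        timestamp = datetime.now().strftime('%Y%m%dT%H%M%S')
        transcribe_job_name = f"Transcription-{timestamp}-{uuid.uuid4().hex[:8]}"
        transcribe_client.start_transcription_job(
            TranscriptionJobName=transcribe_job_name,
            Media={'MediaFileUri': file_uri},
            MediaFormat='mp3',
            LanguageCode='en-US',
            OutputBucketName=TRANSCRIBE_S3_BUCKET,
            Settings={'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2},
        )

        # Poll until the job finishes, pausing between calls to avoid API throttling.
        while True:
            status = transcribe_client.get_transcription_job(TranscriptionJobName=transcribe_job_name)
            job_status = status['TranscriptionJob']['TranscriptionJobStatus']
            if job_status in ['COMPLETED', 'FAILED']:
                break
            time.sleep(5)

        if job_status == 'COMPLETED':
            # With OutputBucketName set and no OutputKey, the transcript is written as <job name>.json.
            transcript_key = f"{transcribe_job_name}.json"
            transcript_response = s3_client.get_object(Bucket=TRANSCRIBE_S3_BUCKET, Key=transcript_key)
            transcript = json.loads(transcript_response['Body'].read().decode('utf-8'))
            full_text = process_transcript(transcript['results'], transcript['results']['speaker_labels'])

            # Summarize the formatted dialogue with a Bedrock model.
            summary_response = invoke_bedrock_model(full_text)
            summary = summary_response.get('completion', '')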
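The polling loop in the snippet above waits for as long as the job takes, while a Lambda invocation can run for at most 15 minutes. One way to bound the wait is sketched below. The helper `wait_for_job`, its defaults, and the injected `get_status` callable are assumptions for illustration and are not part of the original function.

```python
import time


def wait_for_job(get_status, timeout_seconds=600, poll_seconds=5, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout elapses.

    get_status is any zero-argument callable that returns the job status string,
    for example a small wrapper around get_transcription_job.
    """
    waited = 0
    while True:
        status = get_status()
        if status in ('COMPLETED', 'FAILED'):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f'job still {status} after {waited} seconds')
        sleep(poll_seconds)
        waited += poll_seconds


# Possible use inside the handler, with the names from the snippet above:
# job_status = wait_for_job(
#     lambda: transcribe_client.get_transcription_job(
#         TranscriptionJobName=transcribe_job_name
#     )['TranscriptionJob']['TranscriptionJobStatus']
# )
```

Passing the status lookup in as a callable keeps the helper testable without AWS access. For long recordings, an event-driven design that reacts to the job's completion event instead of polling may be a better fit.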

Handling Different Languages

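Amazon Transcribe can also identify the spoken language automatically: passing IdentifyLanguage=True in place of LanguageCode lets the service detect the dominant language of a recording before transcribing it. The snippet above pins the language to en-US.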
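A minimal sketch of how a request could switch between a pinned language and automatic identification is shown below. `IdentifyLanguage` and `LanguageOptions` are parameters of `start_transcription_job`; the helper `build_transcription_request`, the bucket names, and the candidate languages are hypothetical.

```python
def build_transcription_request(job_name, media_uri, output_bucket,
                                language_code=None, language_options=None):
    """Build keyword arguments for start_transcription_job.

    Pass language_code to pin a language, or leave it as None to let
    Transcribe identify the dominant language of the recording.
    """
    request = {
        'TranscriptionJobName': job_name,
        'Media': {'MediaFileUri': media_uri},
        'OutputBucketName': output_bucket,
    }
    if language_code:
        request['LanguageCode'] = language_code
    else:
        # LanguageCode and IdentifyLanguage are alternatives; send only one.
        request['IdentifyLanguage'] = True
        if language_options:
            # Restricting the candidate languages can help identification.
            request['LanguageOptions'] = list(language_options)
    return request


request = build_transcription_request(
    'Transcription-example',
    's3://call-recordings/incoming/call-001.mp3',
    'call-transcripts',
    language_options=['en-US', 'es-US'],
)
print(request)
# transcribe_client.start_transcription_job(**request)
```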

Custom Vocabulary

You can create a custom vocabulary to guide the transcription process toward industry-specific terms or colloquialisms common in your call recordings.
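As a rough sketch, a list-based vocabulary could be created with the `create_vocabulary` API as shown below. The helper names, the vocabulary name, and the sample phrases are assumptions. The hyphenation step reflects my understanding that multi-word entries in a phrase list are written with hyphens instead of spaces, so check the current Transcribe documentation before relying on it.

```python
def normalize_phrases(phrases):
    """Prepare phrases for a list-based custom vocabulary.

    Strips whitespace, drops empty entries and case-insensitive duplicates,
    and joins the words of multi-word phrases with hyphens.
    """
    seen = set()
    cleaned = []
    for phrase in phrases:
        normalized = '-'.join(phrase.split())
        if normalized and normalized.lower() not in seen:
            seen.add(normalized.lower())
            cleaned.append(normalized)
    return cleaned


def create_call_center_vocabulary(vocabulary_name, phrases, language_code='en-US'):
    """Create the vocabulary; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so normalize_phrases works without AWS access

    client = boto3.client('transcribe')
    return client.create_vocabulary(
        VocabularyName=vocabulary_name,
        LanguageCode=language_code,
        Phrases=normalize_phrases(phrases),
    )


print(normalize_phrases(['chargeback', 'auto pay', ' Auto  Pay ', '']))
```

Once the vocabulary has finished processing, it can be referenced from a transcription job by adding `'VocabularyName': vocabulary_name` to the `Settings` argument of `start_transcription_job`.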

Post-Processing of Transcripts

After transcription, the text often needs cleaning and formatting. A post-processing Lambda function lets us refine transcripts before they are stored or passed on for sentiment analysis.

We split the transcript by speaker, segment it by time for sentiment analysis, and summarize it.
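The article does not show how the time-based split is done, so the following is only one possible sketch: it groups Transcribe speaker segments into fixed windows that could then be scored for sentiment one window at a time. The function name `bucket_segments_by_time` and the 30-second window are assumptions.

```python
from collections import defaultdict


def bucket_segments_by_time(segments, window_seconds=30.0):
    """Group speaker segments into fixed-size time windows.

    segments follows the shape of results['speaker_labels']['segments'] in a
    Transcribe transcript: each entry has 'start_time' (seconds, as a string)
    and 'speaker_label'. Returns {window_start_seconds: [segment, ...]}.
    """
    buckets = defaultdict(list)
    for segment in segments:
        start = float(segment['start_time'])
        window_start = int(start // window_seconds) * window_seconds
        buckets[window_start].append(segment)
    return dict(sorted(buckets.items()))


# Hypothetical segments, trimmed to the fields used here.
sample_segments = [
    {'start_time': '1.5', 'speaker_label': 'spk_0'},
    {'start_time': '29.9', 'speaker_label': 'spk_1'},
    {'start_time': '31.0', 'speaker_label': 'spk_0'},
    {'start_time': '65.2', 'speaker_label': 'spk_1'},
]

windows = bucket_segments_by_time(sample_segments)
for window_start, group in windows.items():
    print(window_start, [segment['speaker_label'] for segment in group])
```

Fixed windows are simple to reason about; splitting on speaker turns instead would keep each utterance intact at the cost of uneven window lengths.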

Lambda Post-Processing Example

def process_transcript(transcript, speaker_labels):
    """Turn Transcribe results into a "Speaker: text" script, one line per speaker turn."""
    # Maps Transcribe speaker labels to roles; this assumes spk_0 is the customer.
    speaker_names = {"spk_0": "Customer", "spk_1": "Agent"}

    # Index words by start time once so each lookup is O(1) instead of a scan
    # over every item. Punctuation items have no start_time and are skipped.
    words_by_start = {}
    for word in transcript['items']:
        if 'start_time' in word:
            words_by_start.setdefault(word['start_time'], word)

    dialogue_entries = []
    last_speaker = None

    for segment in speaker_labels['segments']:
        speaker_label = segment['speaker_label']

        # Start a new line whenever the speaker changes.
        if last_speaker != speaker_label:
            if last_speaker is not None:
                dialogue_entries.append("\n")
            # Fall back to the raw label for any speaker not in the mapping.
            dialogue_entries.append(f"{speaker_names.get(speaker_label, speaker_label)}:")
            last_speaker = speaker_label

        # Collect the top alternative for each word in this segment.
        words = []
        for item in segment['items']:
            word_info = words_by_start.get(item['start_time'])
            if word_info and word_info.get('alternatives'):
                words.append(word_info['alternatives'][0]['content'])

        if words:
            dialogue_entries.append(" " + " ".join(words))

    return "".join(dialogue_entries)

Amazon Transcribe simplifies how call centers transcribe their audio recordings. Call centers can improve operational efficiency by using AWS Lambda for automation, custom vocabularies for accuracy, and post-processing functions for refinement. Transcription is not just about converting speech to text; it is the first step toward a comprehensive understanding of customer interactions.

The successful implementation of Amazon Transcribe within a call center's workflow promises not just a record of what was said but a gateway to deeper insights into the voice of the customer.

Check out my website here.
