In call center analysis, transcribing audio to text plays a central role. It provides a written record of customer interactions and serves as the foundation for further analysis such as sentiment assessment and summarization. Amazon Transcribe is an AWS automatic speech recognition (ASR) service that converts speech to text with high accuracy. This article walks through how we used Amazon Transcribe to automate transcription in call center operations.
Check out the code repo here.
Getting Started with Amazon Transcribe
Integrating Amazon Transcribe with AWS services such as S3 and Lambda lets transcription start automatically whenever a call recording is uploaded. The integration follows a trigger-based approach.
Automating Transcription with Lambda
When a new call recording is dropped into an S3 bucket, a Lambda function is triggered to start the transcription job. The function calls the StartTranscriptionJob API of Amazon Transcribe.
Lambda Function Snippet
import json
import time
from datetime import datetime
from urllib.parse import unquote_plus

# transcribe_client, s3_client, TRANSCRIBE_S3_BUCKET, and invoke_bedrock_model
# are assumed to be defined elsewhere in the Lambda module.
for record in event['Records']:
    source_bucket_name = record['s3']['bucket']['name']
    # Object keys in S3 events are URL-encoded, so decode before building the URI
    key = unquote_plus(record['s3']['object']['key'])
    file_uri = f's3://{source_bucket_name}/{key}'
    transcribe_job_name = f"Transcription-{datetime.now().strftime('%Y%m%dT%H%M%S')}"

    transcribe_client.start_transcription_job(
        TranscriptionJobName=transcribe_job_name,
        Media={'MediaFileUri': file_uri},
        MediaFormat='mp3',
        LanguageCode='en-US',
        OutputBucketName=TRANSCRIBE_S3_BUCKET,
        Settings={'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2},
    )

    # Poll until the job finishes, pausing between calls instead of busy-waiting
    while True:
        status = transcribe_client.get_transcription_job(TranscriptionJobName=transcribe_job_name)
        job_status = status['TranscriptionJob']['TranscriptionJobStatus']
        if job_status in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    if job_status == 'COMPLETED':
        # The transcript is written to <job name>.json in the output bucket
        transcript_key = f"{transcribe_job_name}.json"
        transcript_response = s3_client.get_object(Bucket=TRANSCRIBE_S3_BUCKET, Key=transcript_key)
        transcript = json.loads(transcript_response['Body'].read().decode('utf-8'))
        full_text = process_transcript(transcript['results'], transcript['results']['speaker_labels'])
        summary_response = invoke_bedrock_model(full_text)
        summary = summary_response.get('completion', '')
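The snippet above hardcodes MediaFormat='mp3' and names jobs with a one-second timestamp. If recordings can arrive in other formats, or several files can land in the same second, one option is to derive both values per file. The following is a minimal sketch under those assumptions; the helper names media_format_for_key and build_job_request are hypothetical and do not come from the original repo.

```python
import os
import uuid

# Values accepted by the MediaFormat parameter of StartTranscriptionJob.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def media_format_for_key(key):
    """Return the media format implied by the object key's file extension."""
    extension = os.path.splitext(key)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported media format for key {key!r}")
    return extension


def build_job_request(bucket, key, output_bucket):
    """Build keyword arguments for transcribe_client.start_transcription_job."""
    return {
        # A random suffix avoids collisions when two files arrive in the same second.
        "TranscriptionJobName": f"Transcription-{uuid.uuid4().hex}",
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format_for_key(key),
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    }
```

Inside the handler, the result could be passed along as transcribe_client.start_transcription_job(**build_job_request(source_bucket_name, key, TRANSCRIBE_S3_BUCKET)).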
Handling Different Languages
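Amazon Transcribe can identify the dominant language of a recording automatically and transcribe it in that language. To enable this, set IdentifyLanguage to True in the StartTranscriptionJob request instead of passing a fixed LanguageCode.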
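As a rough illustration, the language-related arguments of the request could be built as shown below. IdentifyLanguage and LanguageOptions are parameters of StartTranscriptionJob; the helper name build_language_settings is hypothetical, and the candidate languages are only examples.

```python
def build_language_settings(language_code=None, candidate_languages=None):
    """Return the language-related keyword arguments for start_transcription_job.

    With a known language, pass LanguageCode. Otherwise ask Transcribe to
    identify the language, optionally narrowing the search with LanguageOptions.
    """
    if language_code:
        return {"LanguageCode": language_code}
    settings = {"IdentifyLanguage": True}
    if candidate_languages:
        settings["LanguageOptions"] = list(candidate_languages)
    return settings
```

For example, build_language_settings(candidate_languages=["en-US", "es-US"]) could replace the LanguageCode='en-US' argument by unpacking the returned dictionary into the start_transcription_job call.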
Custom Vocabulary
You can create a custom vocabulary to guide the transcription process toward industry-specific terms or colloquialisms common in your call recordings.
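A minimal sketch of how this might look with boto3 follows. create_vocabulary and the VocabularyName setting are part of the Transcribe API; the helper names, the vocabulary name, and the example phrases are hypothetical. The client is passed in as an argument, so the sketch assumes a boto3 Transcribe client created elsewhere.

```python
def create_call_center_vocabulary(transcribe_client, name, phrases, language_code="en-US"):
    """Create a custom vocabulary from a list of domain-specific terms."""
    return transcribe_client.create_vocabulary(
        VocabularyName=name,
        LanguageCode=language_code,
        Phrases=list(phrases),
    )


def settings_with_vocabulary(name):
    """Return job Settings that apply the named custom vocabulary."""
    return {
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": 2,
        "VocabularyName": name,
    }
```

A call such as create_call_center_vocabulary(transcribe_client, "call-center-terms", ["chargeback", "deductible"]) would be made once, ahead of time. The vocabulary has to finish processing before a job can use it, and its language must match the language of the transcription job.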
Post-Processing of Transcripts
After transcription, the text often requires cleaning and formatting. Implementing a post-processing Lambda function enables us to refine transcripts before they are processed for sentiment analysis or stored.
We split the transcript by speaker, segment it by time so that sentiment can be tracked over the course of the call, and summarize it.
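The time-based split could be sketched as follows. This is an illustrative helper rather than the code from the repo: the name split_by_time and the 30-second default window are assumptions, and the input is expected to follow the shape of results['items'] in a Transcribe transcript.

```python
from collections import defaultdict


def split_by_time(items, window_seconds=30.0):
    """Group transcribed words into consecutive fixed-length time windows.

    Punctuation items carry no start_time and are skipped.
    Returns a list of (window_start_seconds, text) tuples in time order.
    """
    windows = defaultdict(list)
    for item in items:
        if "start_time" not in item or not item.get("alternatives"):
            continue
        # Transcribe reports times as strings of seconds, e.g. "12.34"
        index = int(float(item["start_time"]) // window_seconds)
        windows[index].append(item["alternatives"][0]["content"])
    return [
        (index * window_seconds, " ".join(words))
        for index, words in sorted(windows.items())
    ]
```

Each window's text can then be scored separately, which shows how sentiment shifts between the start and the end of a call.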
Lambda Post-Processing Example
def process_transcript(transcript, speaker_labels):
    # Map speaker labels to roles (this mapping assumes spk_0 is the customer)
    speaker_names = {"spk_0": "Customer", "spk_1": "Agent"}

    # Index words by start_time once instead of rescanning the list for every item
    words_by_start = {}
    for word in transcript['items']:
        if 'start_time' in word:
            words_by_start.setdefault(word['start_time'], word)

    dialogue_entries = []
    last_speaker = None
    for segment in speaker_labels['segments']:
        speaker_label = segment['speaker_label']
        # Fall back to the raw label if an unexpected speaker appears
        speaker_name = speaker_names.get(speaker_label, speaker_label)

        # Start a new line whenever the speaker changes
        if last_speaker != speaker_label:
            if last_speaker is not None:
                dialogue_entries.append("\n")
            dialogue_entries.append(f"{speaker_name}:")
            last_speaker = speaker_label

        # Collect the top alternative for each word in the segment
        segment_words = []
        for item in segment['items']:
            word_info = words_by_start.get(item['start_time'])
            if word_info and word_info.get('alternatives'):
                segment_words.append(word_info['alternatives'][0]['content'])
        if segment_words:
            dialogue_entries.append(" " + " ".join(segment_words))

    return "".join(dialogue_entries)
Amazon Transcribe has reshaped how call centers approach the transcription of their audio records. Call centers can enhance their operational efficiency by utilizing AWS Lambda for automation, creating custom vocabularies for accuracy, and employing post-processing functions for refinement. The transcription process is not just about converting speech to text; it's the first step toward a comprehensive understanding of customer interactions.
The successful implementation of Amazon Transcribe within a call center's workflow promises not just a record of what was said but a gateway to deeper insights into the voice of the customer.