Product managers and developers at telephony companies are consistently leveraging automatic speech recognition (ASR) to power core features in their products and platforms. For example, telephony platforms like Convirza, CallRail, TalkRoute, and WhatConverts offer their customers industry-leading solutions using Speech-to-Text.

The accuracy of a Speech-to-Text system is critical for these telephony platforms to build high-quality features that users and customers love.

In this report, we look at 5 different earnings calls from various companies (shown in more detail below), and review how accurately AssemblyAI, AWS Transcribe, and Google Speech-to-Text are able to automatically transcribe these recordings. In addition to reviewing Speech Recognition accuracy, our research team reviewed the results of AssemblyAI's unique Personally Identifiable Information (PII) Redaction, Topic Recognition, Keyword Detection, and Content Safety features on these call recordings.

This report is meant to serve as a point of reference to compare the best Automated Speech Recognition and Conversational Intelligence solutions in the market for Telephony Platforms.

We included earnings call recordings for 5 major companies: Twilio, Facebook, Apple, Microsoft, and MongoDB. These were chosen at random; our intention is to provide you with a healthy sample size of audio transcription performance. Here is more about our dataset below:

How We Calculate Accuracy

First, we transcribe the files in our dataset automatically through APIs (AssemblyAI, Google, and AWS).
Second, we have human transcriptionists transcribe the files in our dataset to approximately 100% accuracy.
Finally, we compare each API's transcription with our human transcription to calculate the Word Error Rate (WER), explained further below.

Below, we outline the accuracy score that each transcription API achieved on each audio file. Each result is hyperlinked to a diff of the human transcript versus each API's automatically generated transcript. This helps to highlight the key differences between the human transcripts and the automated transcripts.

Word Error Rate (WER) is the industry standard for calculating the accuracy of an Automatic Speech Recognition system. For each file in our dataset, the WER compares the automatically generated transcription to the human transcription, counting the number of insertions, deletions, and substitutions made by the automatic system (Google, AWS, etc.).

Before calculating the WER for a particular file, both the truth (human transcriptions) and the automated transcriptions (predictions) must be normalized into the same format. To perform the most accurate comparison, all punctuation and casing are removed, and numbers are converted to the same format.

For example:
truth -> Hi my name is Bob I am 72 years old.
Normalized truth -> hi my name is bob i am seventy two years old

Personally Identifiable Information (PII) Redaction

Phone call recordings and transcripts often contain sensitive customer information like credit card numbers, addresses, and phone numbers. Other sensitive information like birth dates and medical info can also be stored in recordings and transcripts.

AssemblyAI offers PII Detection and Redaction for both transcripts and audio files run through our API. You can learn more about how this works in the AssemblyAI Documentation.

The full list of information our PII Detection and Redaction feature can detect can be found below:

With PII Detection and Redaction enabled, we ran the above earnings calls through our API to generate a redacted transcript for each recording. Below is an excerpt from one of the recordings, and below are links to the full transcriptions.

At this time, I would like to welcome everyone to the [...] and full year earnings conference call.
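
As a rough illustration of the WER calculation described earlier in this report, here is a minimal Python sketch. It is not the pipeline used to produce the results in this report. The function names (`number_to_words`, `normalize`, `word_error_rate`) are hypothetical, the number spelling only covers 0 to 99, and the score is assumed to follow the usual definition: the minimum number of word substitutions, deletions, and insertions divided by the number of words in the normalized human transcript.

```python
from __future__ import annotations

import string

# Assumption: only integers 0-99 are spelled out; a real normalizer would
# need to handle larger numbers, ordinals, currency, dates, and so on.
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-99 (illustrative helper)."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and spell out small numbers."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words: list[str] = []
    for token in cleaned.split():
        if token.isascii() and token.isdigit() and int(token) < 100:
            words.extend(number_to_words(int(token)).split())
        else:
            words.append(token)
    return words


def word_error_rate(truth: str, prediction: str) -> float:
    """Word-level edit distance divided by the number of truth words."""
    ref, hyp = normalize(truth), normalize(prediction)
    if not ref:
        raise ValueError("truth transcript is empty after normalization")
    # Row-by-row Levenshtein distance over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "Hi my name is Bob I am 72 years old."
    print(" ".join(normalize(truth)))
    print(word_error_rate(truth, "hi my name is rob i am seventy two years"))
```

In the example at the bottom, the made-up prediction substitutes one word and drops one word against an 11-word normalized truth, so the sketch reports 2 errors out of 11 words.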