8x8 CX Platform AI Transcription Accuracy vs. Dialpad & RingCentral

Sponsor: 8x8

Abstract

8x8 commissioned Tolly to benchmark the effectiveness of the AI transcription feature integrated into its CX Platform, a business communications environment that supports meetings, customer interactions, and other spoken-word workflows where caption accuracy matters. The report compares 8x8 directly against Dialpad and RingCentral, focusing on English-language transcription accuracy across a diverse set of real conversational scenarios rather than synthetic keyword tests. Tolly used fifteen audio samples, each run four times, with speakers covering a broad range of accents including British, American, Scottish, Welsh, Australian, Indian, Filipino, Irish, Canadian, New Zealand, Nigerian, and African.


Tolly’s headline result is that 8x8 delivered the best overall transcription accuracy among the three platforms. Using the lowest word error rate achieved per sample as the summary metric, 8x8 produced an average best-case WER of 3.43%, compared with 8.03% for Dialpad and 8.08% for RingCentral. Tolly notes that lower numbers are better, with 0% representing perfect transcription. In practical terms, this meant 8x8’s error rate was less than half that of either competitor in the best-score comparison.  


The report also analyzes average results across all four runs per sample, which helps account for variation from run to run. On that basis, 8x8 again led the group with an average WER of 4.54%, versus 8.53% for Dialpad and 9.20% for RingCentral. Results varied by accent and scenario, and Tolly observed that Scottish and Welsh accents were among the most difficult for all three solutions. Even so, 8x8 generally maintained a lower error rate across the 15 test scenarios, including technical support, account management, outage updates, pricing discussions, and appointment scheduling.  
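The two summary metrics described above (the average of each sample's best run, and the average across all runs) can be illustrated with a minimal Python sketch. The sample names and WER values below are hypothetical placeholders, not figures from the report, and the function names are chosen only for this illustration.

```python
from statistics import mean

# Hypothetical WER percentages per sample, four runs each.
# These values are illustrative only and do not come from the report.
runs_by_sample = {
    "sample_a": [3.0, 4.0, 5.0, 4.0],
    "sample_b": [2.0, 2.5, 3.0, 2.5],
    "sample_c": [6.0, 7.0, 8.0, 7.0],
}


def best_case_average(runs):
    """Average of the lowest WER achieved on each sample."""
    return mean(min(sample_runs) for sample_runs in runs.values())


def all_runs_average(runs):
    """Average of each sample's mean WER over all of its runs."""
    return mean(mean(sample_runs) for sample_runs in runs.values())


if __name__ == "__main__":
    print(f"best-case average WER: {best_case_average(runs_by_sample):.2f}%")
    print(f"all-runs average WER:  {all_runs_average(runs_by_sample):.2f}%")
```

Because the best-case figure keeps only the minimum of each sample's runs, it can never exceed the all-runs figure, which is consistent with each platform's all-runs average being higher than its best-case average.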


Methodologically, the test used text-to-speech generated audio files three to seven minutes long; most samples had two speakers, and one had three. Audio was injected into each platform using Rogue Amoeba’s Loopback utility, and the resulting transcripts were scored with a Python program built around the open-source jiwer module, which calculates word error rate based on substitutions, deletions, and insertions. Tolly notes one operational distinction: 8x8’s transcription became available about 50 seconds after the conversation ended, while the competing platforms produced transcription roughly one second after words were spoken. Overall, the report positions 8x8 as the most accurate of the three tested platforms for English-language AI transcription across varied accents and common business conversation types.
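For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is not Tolly's scoring program and does not call jiwer; it is a self-contained stand-in that computes the same standard definition with a word-level edit distance. The lowercasing and whitespace tokenization are assumptions made for this example, since the abstract does not describe how transcripts were normalized, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption for this sketch: text is lowercased and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please restart the router now"
    hypothesis = "please start the router"
    # One substitution (restart -> start) and one deletion (now): 2 / 5 words.
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

A perfect transcript scores 0%, matching the report's note that lower is better. With jiwer installed, its `jiwer.wer(reference, hypothesis)` function should give a comparable figure for plain lowercase input like this, though differences in text normalization can change the result.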