
A Guide:

Evaluating
conversation
analytics
Cutting through the jargon to help you make
sure your speech analytics solution of choice
delivers a return on investment.

May 2017
A Guide: Evaluating Conversation Analytics

Finding the right solution


Whilst speech analytics technologies have been available for
many years, it is only recently that products have evolved to
the point where they can demonstrate significant return on
investment (ROI).

This new era of products encompasses a breadth of maturing technologies that seek
to automate and assist the human review of telephony and face-to-face recordings of
customer interactions, whilst providing deep insight into what occurred during the
conversations.

With the array of features and functions available, often described in product specific
jargon, it can be challenging to compare solutions and find the right one for your
business.

The most effective analytics products are those that are highly customised to your
requirements. If you are looking for a plug-in-and-go solution that does anything
beyond capturing and storing audio, you are unlikely to find one that meets your
needs or delivers ROI.

This guide has been produced to help financial services firms navigate the variety of
options available across analytics vendors to find a solution that provides you with
the essential business benefits you seek.

Sparked your interest?

Find out how Recordsure can help you.


Call +44 (0)203 772 7272 or email info@recordsure.com


Features or benefits?
Choosing the right conversation analytics product should
be driven by the needs of your company, considering how
the product performs in meeting your business objectives.
You should remain focused on the product’s effect on your
business process and the return on investment that can be
achieved, rather than the features available within different
products.

When comparing, and ultimately choosing, conversation analytics products you should:

1. Look beyond the traditional market: Don’t limit your search for an appropriate
vendor to just those within the traditional speech analytics market. There may be
newer vendors out there whose different approach and solution better fit your
business objectives.

2. Open your mind: Avoid entering conversations with vendors with a pre-defined
checklist of functionality. Allow each vendor to showcase the breadth of
their product’s capabilities. You may discover functionality or approaches you
didn’t know existed.

3. Ignore the jargon: Evaluate against the business outcomes that you require,
rather than being influenced too much by any one vendor’s jargon or specific
features.

4. Prove its worth: Ensure there are early test-points in the process that pit your
shortlisted vendors head-to-head against each other. Make them prove what
they can achieve against your business processes and requirements, using your
audio records, before you commit to significant spend or to vendor selection.

5. Seek continuous improvement: Make sure you take into account changing
technology and your evolving needs. Building continuous improvement into the
vendor contract will ensure the product maintains its viability and enables
sustained efficiencies throughout the contract term.

Now we’ve established how you should go about evaluating the market, in the
following pages we take a deep-dive into three key areas that play an essential role in
enabling you to determine which product will be most effective and appropriate for
your business:

–– Step 1: How important is transcription to you?

–– Step 2: Can you easily navigate to the audio you want?

–– Step 3: Does it meet your business objectives effectively?


Step 1:
How important is
transcription to you?
It’s easy to skip over the automated transcript element of a
product, as it’s often presumed to be generic across vendors.
In reality it is fundamental to how the product will perform.

There are two key transcription options available. Machine generated, technically
known as Automatic Speech Recognition (ASR), is quick, however the accuracy often
leaves much to be desired. Human generated, on the other hand, provides a far more
accurate account of what was discussed during the conversation, although naturally it
is much more expensive.

A transcript is the written version of the words spoken by each party in the audio
record.

A machine generated transcript will never be as good as a human transcript, but it’s
important to consider how important accuracy is to the outcome you desire. For
example, if you simply require a record of the conversation for audit-trail purposes,
ASR may be adequate. However, if you wish to use it to spot keywords in the transcript
(or spot what’s been missed), detection will only be as good as the words correctly
identified by the ASR. Therefore the key to ascertaining which method is most
appropriate for you is focusing on what you/the product will use the transcript for.

A keyword search across records within an ASR transcript will only be as good as the
words that were correctly identified. Automated business rules that trigger or rank
records based on spotting keywords in the ASR transcript, or upon spotting that they
are missing, will likewise only be as good as the ASR output. And if you wish to use
the transcription in the review process to help your quality and compliance teams
rapidly navigate to the sections of the audio they are interested in, then the human
readability of the generated transcript is all-important.
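As a minimal sketch of why ASR accuracy bounds any keyword-based business rule, consider a hypothetical rule that flags records where a mandatory disclosure phrase was not found. The transcripts, phrase and function name below are invented for illustration, not taken from any real product:

```python
# Sketch: a business rule that flags records whose transcript
# lacks a mandatory disclosure phrase. All data is hypothetical.

MANDATORY_PHRASE = "your capital is at risk"

def flag_missing_disclosure(transcripts):
    """Return IDs of records whose transcript lacks the phrase."""
    return [
        record_id
        for record_id, text in transcripts.items()
        if MANDATORY_PHRASE not in text.lower()
    ]

# The adviser said the phrase in both calls, but the ASR mis-heard
# it in the second ("capital is" -> "cap at all is"), so the rule
# raises a false alert on call-002.
transcripts = {
    "call-001": "remember that your capital is at risk with this fund",
    "call-002": "remember that your cap at all is at risk with this fund",
}

print(flag_missing_disclosure(transcripts))  # ['call-002']
```

The rule itself is correct; the false alert is caused entirely by the transcription error, which is why transcription quality should be tested on your own audio rather than taken on trust.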

Recognising that human transcription is expensive, and often more rigorous than most
firms require, ASR is usually the only viable option. However, in order to ensure
high quality ASR, vendors will initially create and use human transcription during
the install phase to ‘train’ their systems to each firm’s needs. The quality of this
training, and the methods by which the trained ASR is then applied to each business
requirement, is key to realising the required benefits and ROI.

What should you do?


We recommend that you understand what transcription quality means both to your
business and also to the performance of other capabilities in each vendor’s product.
Then, test for yourself the transcription output of each potential solution, trained
against your audio, early in the evaluation process.


Step 2:
Can you easily navigate
to the audio you want?
When it comes time to listen to an audio record, you want to
do it quickly and efficiently. Again, which of the functions in
vendors’ products will be most advantageous depends on your
particular requirements.

Time savings, and therefore cost efficiencies, are introduced when a vendor’s product
enables you to efficiently navigate to just those sections of the audio that contain the
information you are looking for.

You will need to put enough trust in the quality of the output to be satisfied that
nothing has been missed out, so you aren’t hunting around in the audio yourself.

Understanding which topics or aspects of the audio your teams would typically need
to review at each stage of the sales cycle for each type of review will enable you to
determine whether the products can fulfil your requirements. For example:

–– Compliance review personnel will want to navigate to audio sections where key
product and other mandatory disclosures were given, where risk appetite was
assessed and where each product’s individual benefits, charges and risks are
explained.

–– Customer insight teams will want to navigate to discussion around previous
products purchased, and the customer’s current needs and objectives.

–– Sales training teams will want to listen not just to the formal T&C areas of
discussion, but also to areas of more social interaction and rapport building.

Given your breadth of products, channels and review types this is an awful lot of
‘knowledge’ that the vendor’s product needs to have in order to reliably navigate
each user to the audio section they require, so it is key that you understand how the
vendor’s product is ‘taught’ this knowledge, how it applies it and how reliably it can
use this learning to make your users more efficient.

What should you do?


As part of your evaluation of each vendor you should consider how each vendor
teaches their system, what is important to your varied review types and how this is
applied to each type of user review.

Consider whether each vendor can look for different topics/classifications without
relying on telephony metadata to identify the call type or product line being
discussed. Test for yourself the ability of each vendor to find the topics relevant to
your business, in your audio, early in the evaluation process.


Step 3:
Does it meet your business
objectives effectively?
Many conversation analytics projects do not deliver on their
expected ROI due to the complex nature of building and
maintaining intricate business rules. It is therefore fundamental
that you evaluate the performance of each product in meeting
your business objectives and gain clarity on the roles and
responsibilities for product configuration or training.

You likely want your analytics product to prioritise the audio records that might
require review. Every vendor has a different approach to defining and configuring
business rules, to flagging or prioritising audio for human review, and to producing
the resulting management information.
Different vendors use different language for the relevant sections in each audio
record and what is discussed in each, so look out for ‘Themes’, ‘Topics’ and
‘Classifications’.

There are four key questions to consider in this area when evaluating which approach,
and therefore product, best suits your needs:

1. How accurately can each product find the things you need to review (the ‘True
Positives’) versus how much time can you afford to spend reviewing audio that the
product has incorrectly flagged for review (the ‘False Positives’)?

2. Can the products authoritatively triage out audio records that are insignificant or
irrelevant for your review?

3. How does each product learn the myriad of business rules that will need to be
automated in the product and what upkeep will be required throughout the
contract duration to adapt to product/process changes and evolving regulation?

4. Whose job should it be to maintain the complex business rules set in the product?
Do you want to learn the complexities of algorithms, configuration and machine
learning behaviours, or would you rather your vendor learn about your business,
taking higher-level instruction as to the rules that need to be achieved?

What should you do?


Understand each vendor’s approach to identifying and teaching the business rules
required for each of your departments’ review needs, and firmly establish the
complexity of, and roles in, building and maintaining these rules. Evaluate each
product to ensure that there is high accuracy around the True Positives and that your
review teams are willing to accept the level of False Positives projected in the
product outputs.


What should you do next?


Once you’ve explored the wider market and created your
shortlist of products to evaluate, you should be asking vendors
for a proof of concept.

Conversation analytics and related technologies live or die by the quality of their
output. No amount of words or canned demonstration will tell you how well any
product is actually going to perform for your business, so it is essential you test this
before agreeing commercials and selecting a vendor.
Such a Proof-of-Concept (PoC) evaluation can be difficult for financial services
organisations to arrange, given both the work on the part of the business and the
information security considerations around working with your own data. However,
the risk of not pitting vendors head-to-head from a qualitative perspective can
unfortunately lead to downstream failures in achieving the required ROI.

Different organisations have different names for the PoC, with some buyers asking
to establish a ‘model office’ for each potential vendor, and others preferring the
‘hot-house’ concept that makes the competition between vendors more visible.

Follow the three key points below to ensure you get the most out of a PoC evaluation:

1. Provide some sample data from your organisation for each vendor to teach their
systems about your business and audio records (the training records – perhaps 100
audio files).

2. State the high level goals for outputs from the PoC (e.g. “We want to see your
transcription/topic matching/rules accuracy on our data”), and allow each vendor
some time, perhaps a week, to teach their systems using their own unique methods,
using the training record set you have provided.

3. Provide ‘evaluation records’ (perhaps another 100 audio records) that each vendor
must pass through their system without further product training or any human
manipulation, in order that you can examine the outputs, and see for yourself what
each vendor is actually capable of in your real world.

Given this is your real data, and given there may still be some information security
evaluations to complete, it is common at this stage to provide redacted audio records
for the PoC, with all organisation and client names, as well as other PII removed.

Key take-aways
If you’ve followed the guidance outlined in this document you’ll have a much better
chance of finding a deep conversation analytics solution that continuously delivers
the desired outcomes. This guide is only brief, however, and no doubt there will be
additional requirements that are crucial to the success of the product you choose.

The key thing to take away from this guide is to focus on the performance of the
product in meeting your business objectives rather than the functionality of the tool.
Only by testing it with your audio files and against your business rules will you prove
that this significant investment you are about to make will do what the vendors claim.


Why we created this guide


Every day we speak to organisations that have regrets about
their choice of speech or conversation analytics product.
Commonly, dissatisfaction stems from their perception that
although the product seemed fantastic on paper, it didn’t quite
live up to expectations.

In our experience, the root cause of this disconnect tends to lie in a firm’s inclination
to compare individual product features against each other and focus on how they can
use the range of product features. Although this may be an appropriate approach for
other technology products, for conversation analytics firms need to focus on how the
products can help them achieve specific business objectives.

Firms that focus on the benefits specific to their business find that the product better
meets their needs when implemented and, most importantly, remains effective for the
duration of its usage. This is why, at Recordsure, we always seek to understand first of
all the needs of the company and its individual departments that will utilise our deep
conversation analytics solution.

Similarly, we find that running both a proof-of-concept and pilot of Recordsure using
client audio provides firms with assurance that our solution works for them.

Tendering can be expensive and arduous, so we hope this guide will help to streamline
the process and ultimately result in a better outcome for both the firm and their
vendor of choice.

About Recordsure

We turn business conversations into powerful insights.

Recordsure provides an unparalleled ability to record, review and analyse the
conversations that matter.

Our conversation recording and analysis solutions give your organisation uniquely
powerful ways to capture, classify, and assure the quality of all customer interactions.
Recordsure enables you to monitor business conversations across all channels more
efficiently and with more control, ultimately resulting in lower operational costs and
commercially actionable insights.

Sparked your interest?

Find out how Recordsure can help you.


Call +44 (0)203 772 7272 or email info@recordsure.com


Cutting through the jargon
Our glossary sets out some key industry terms and guidance
on how to interpret them from a business perspective.

Accuracy: There are many measures that try to provide some indication of ‘accuracy’,
and typically vendors are asked to provide some sort of percentage score. What really
matters is: “Does the analytics solution find the majority of the things I need, without
lots of incorrect findings?”. To remove vendor-specific jargon and potentially suspect
‘accuracy’ measures, describe your business objectives, then relate them to the four
key measures of artificial intelligence / analytics solutions: “True Positives” and “True
Negatives” (correct findings) versus “False Positives” and “False Negatives” (incorrect
findings).
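The trade-off behind any percentage ‘accuracy’ score can be made concrete with the standard precision and recall calculations over those four counts. The counts below are invented purely for illustration:

```python
def precision(tp, fp):
    """Of everything the system flagged, what fraction was correct?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything that should have been flagged, what fraction was found?"""
    return tp / (tp + fn)

# Hypothetical evaluation: 80 True Positives, 20 False Positives,
# 10 False Negatives across a set of reviewed records.
tp, fp, fn = 80, 20, 10

print(f"precision = {precision(tp, fp):.2f}")  # 0.80
print(f"recall    = {recall(tp, fn):.2f}")     # 0.89
```

A vendor quoting a single ‘accuracy’ figure may be quoting either number; asking for both tells you how much wasted review time (False Positives) you trade for how much missed material (False Negatives).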

Acoustic Model: A key element of any speech analytics solution, this is the element
of the system that identifies language sounds in the audio stream. Vendors will need
to tune their acoustic models to work with your audio. For telephony, it will depend
on your call recorder, the audio compression used for both the calls and storage,
the breadth of accents, background noise, and even the quality of the headsets or
telephone devices that you use. For face-to-face recording, the acoustic model will
need to work with your microphones and the acoustics of the recording environments.

Artificial Intelligence: A fashionable term that means ‘Machine Learning’, see below.

Automatic Speech Recognition (ASR): ASR is the computerised process of creating
a transcript from spoken audio and is the most widely used application of Speech
Analytics. Some vendors will use the more technical abbreviation LVCSR (Large
Vocabulary Continuous Speech Recognition), but both terms mean the same thing.
High quality ASR is fundamental to the performance of any solution, however quality
comes more from the method and skills of the people training the machine learning
system than from the underlying software.

Diarisation: Also known as ‘speaker segmentation’ or (incorrectly) ‘speaker separation’,
this is the ability of the analytics solution to use acoustic analysis techniques to
identify different speakers in the audio.

False Negatives: Things the system identified as being not present, when they were.
Understand the business impact when the analytics system doesn’t find the things you
need to find in the audio, and perhaps raises alerts that required dialogue is missing,
when actually the dialogue is present.

False Positives: Things the system identified as being present, when they weren’t.
Understand the impact on your business when the analytics system informs you that
key conversation is identified, or rules are broken, when actually they were not.

Language Model: Along with the Acoustic Model, the Language Model (or more
correctly, the Statistical Language Model) is the part of a speech analytics solution
that understands what words most commonly follow others. It is essential, for high

quality ASR, that your vendor trains a language model against real conversations that
your advisers hold with your customers, otherwise the output will be inaccurate.

Lattice: Internally, most ASR systems will produce hypotheses of all the things that
might have been said throughout the conversation, called a lattice of word candidates.
It then produces a ‘best path’ of the most likely words through this Lattice, based on
its learned Language Model, in order to produce a transcript. Some analytics vendors
use the Lattice for other purposes, such as evaluating your business rules against what
might have been said, just in case the chosen most-likely candidate words in the ASR
output are incorrect.
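The ‘best path’ idea can be sketched in a few lines: combine each candidate word’s acoustic score with a bigram language model score and keep the most likely sequence (a simple Viterbi search). All probabilities and words below are toy values invented for the sketch, not real ASR output:

```python
import math

# Toy lattice: at each position, candidate words with acoustic
# probabilities. All values are invented for illustration.
lattice = [
    {"eye": 0.6, "i": 0.4},
    {"want": 0.7, "wont": 0.3},
    {"to": 0.8, "two": 0.2},
    {"invest": 0.9, "infest": 0.1},
]

# Toy bigram language model: P(word | previous word); unlisted
# pairs fall back to a small smoothing constant.
bigram = {
    ("<s>", "i"): 0.5, ("<s>", "eye"): 0.01,
    ("i", "want"): 0.4, ("want", "to"): 0.5, ("to", "invest"): 0.3,
}

def best_path(lattice, bigram, fallback=1e-3):
    """Return the word sequence maximising acoustic * language score."""
    paths = {("<s>",): 0.0}  # partial paths with log-probabilities
    for candidates in lattice:
        scored = {}
        for path, score in paths.items():
            for word, acoustic_p in candidates.items():
                lm_p = bigram.get((path[-1], word), fallback)
                scored[path + (word,)] = (
                    score + math.log(acoustic_p) + math.log(lm_p)
                )
        # Viterbi pruning: keep only the best path ending in each word.
        best = {}
        for path, score in scored.items():
            if path[-1] not in best or score > best[path[-1]][1]:
                best[path[-1]] = (path, score)
        paths = dict(best.values())
    path, _ = max(paths.items(), key=lambda kv: kv[1])
    return list(path[1:])

print(best_path(lattice, bigram))  # ['i', 'want', 'to', 'invest']
```

Note that ‘eye’ has the higher acoustic score at position one, yet the language model steers the best path to ‘i’, which is exactly why the Language Model must be trained on conversations like yours.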

LVCSR: See ASR.

Machine Learning: The process of teaching a machine to replicate a human task. The
important example here is, of course, ASR. ASR systems are provided with human
created transcripts, along with the accompanying audio as their training data, in order
to try and transcribe future audio recordings for themselves. This is called ‘Training the
ASR’. It’s not fashionable to say so, but ASR is one of the most important examples of
AI.

Natural Language Processing (NLP): An area of language science intended to extract
meaning from text. This text might be a conversation transcript, however more
commonly NLP (not to be confused with Neuro-Linguistic Programming) is focused
on extracting the meaning from news articles or social media. With conversation
transcripts there’s additional complexity, as the meaning that is ‘built up’ across
speaker-turns in spoken conversation also needs extracting.

Role Labelling: More advanced than Diarisation, Role Labelling is the capability of
an analytics solution to automatically attach a role to each spoken utterance, such
as ‘Customer’ or ‘Adviser’. This makes it much easier to understand who said what
when reading a transcript, and allows the analytics solution to match your business
rules against each utterance. It’s important to know whether it was the Customer or
the Adviser who said “I want to invest….”, “I’m not happy with that…” or “I don’t think
interest rates will change…”!
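A role-labelled transcript can be pictured as a list of (role, utterance) pairs, which lets a rule target one speaker only. The transcript and rule below are invented examples, not output from any particular product:

```python
# Hypothetical role-labelled transcript: (role, utterance) pairs.
transcript = [
    ("Adviser", "this fund carries a moderate level of risk"),
    ("Customer", "i want to invest for at least ten years"),
    ("Adviser", "your capital is at risk with this product"),
    ("Customer", "i'm not happy with that charge"),
]

def utterances_by_role(transcript, role):
    """Return only the utterances spoken by the given role."""
    return [text for speaker, text in transcript if speaker == role]

# A rule that should only ever match what the Customer said:
customer_turns = utterances_by_role(transcript, "Customer")
wants_to_invest = any("i want to invest" in turn for turn in customer_turns)

print(wants_to_invest)  # True
```

Without the role labels, the same keyword rule could not distinguish the customer stating an intention from the adviser merely describing one.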

Speech Analytics: A catch-all term for analysing and interpreting speech. In this
guide we primarily use it to mean ASR, however other disciplines include diarisation,
biometric identification or verification, phonetic search, grammar-based recognition
(voice control or capture), emotion detection and even health analysis based on the
acoustics of your voice.

Training: In the context of machine learning, training has a specialised meaning. It’s
the process of teaching a computer to replicate human activities on a ‘watch and learn’
basis. Training a system takes both specialised technology and skills, and depending on
the machine learning / analytics domain, those skills are most typically found amongst
experienced speech scientists, language scientists and data scientists.

True Negatives: Things the analytics system correctly identified as being missing,
perhaps leading to a failed business rule. For things like compliance checking, the
absence of something important is just as important as the presence of something
important, so confident analytics output here is key.


True Positives: Things the analytics system found were present, when they truly were.
You want a lot of these!

Word Confusion Network: Sometimes abbreviated to just ‘Word Network’ or
‘Confusion Network’, a Word Confusion Network is an alternate way of representing
a Lattice which is a little easier for humans to interpret, and is also slightly more
efficient when it comes to storage. As with the Lattice, file this under the vendor-
jargon category, unless the vendor is using the Lattice/WCN for anything more than
creating a transcription, such as the automated matching of business rules against
what might have been said.

Sparked your interest?

Find out how Recordsure can help you.


Call +44 (0)203 772 7272 or email info@recordsure.com

Recordsure Ltd

6th Floor
10 Lower Thames Street
London EC3R 6EN

T  +44 (0)203 772 7272


E info@recordsure.com
W www.recordsure.com
