Neural Voice

Neural voice is human-like speech generated by machines using Artificial
Intelligence (AI). It sounds very natural and closely resembles a human voice. It is
built by training our machine learning (ML) software on pre-recorded
audio utterances of human speakers. Neural voice, in conjunction with a set of audio
processing tools, generates human-like speech from any given input text
instantaneously.
Benefits of Neural Voice
Neural voice solves the technical challenge of monotonous, robotic-sounding voices
and opens up new audio markets. Some of its benefits are listed below.
● Reduces the effort of creating audio content.
● Preserves the voice at its best quality for future use and removes
health-related concerns from audio projects.
● Increases the popularity of the voice.
● Opens up new audio markets.
● Brings in more revenue from different channels.
Neural Voice Build Process
We have streamlined the neural voice build process into the following simple
steps.
1. Log in to the Larynx.ai voice recording platform.
2. Read the presented sentences in a quiet place (a studio environment is
recommended).
3. A Larynx.ai audio engineer analyzes the audio content and requests
re-recording of certain sections if required.
Larynx.ai 1
4. The analyzed utterances are fed to our AI learning engine to build neural
voice models.
5. The generated voice is stored and deployed in the Larynx.ai runtime engine
to generate audio for any given text.
6. Users access the voice model through secure interfaces exposed by
Larynx.ai.
Recording Effort
AI engines learn to mimic a voice from human utterances. Our experiments
showed better quality with slow learning. We can build a voice model with as few
as 500 sentences, but a professional-quality English voice requires 10K utterances.
The amount of training data varies by language. The AI engine learns
every aspect of the voice from the training data, so each context, such as news
reading or storytelling, requires utterances that correspond to that context.
Professional projects need the best-quality voice, which requires 10K+ utterances
for training our AI engine. On average, 250 sentences can be read in an hour, so
the whole recording takes about 40 hours.
In addition to the audio, we need very short video footage of the celebrity for marketing.
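The recording estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `recording_hours` is hypothetical, and it assumes one utterance corresponds to one sentence, as the text uses the two terms interchangeably.

```python
def recording_hours(utterances: int, sentences_per_hour: int = 250) -> float:
    """Estimate reading time, assuming one utterance equals one sentence."""
    if sentences_per_hour <= 0:
        raise ValueError("sentences_per_hour must be positive")
    return utterances / sentences_per_hour


# Figures taken from the text: 10K utterances for a professional voice,
# 500 sentences for a minimal voice model, 250 sentences read per hour.
print(recording_hours(10_000))  # 40.0
print(recording_hours(500))     # 2.0
```

The same function gives a rough estimate for other languages once their required utterance count is known.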
Sample Recording Content
Based on our research, we created reading content that covers different
aspects of the language. Some sample text is shown below.
Then came the night of the first falling-star. It was seen early in the morning rushing over
Winchester eastward, a line of flame, high in the atmosphere. Hundreds must have seen it, and
taken it for an ordinary falling-star. Albin described it as leaving a greenish streak behind it that
glowed for some seconds.

Now when he reached the foot of the hill, he turned again towards the sea, and he saw his ship
approaching the harbour, and upon her prow the mariners, the men of his own land.

The move comes as the world’s top smartphone maker seeks to maintain its lead in the
foldable phone and 5G phone markets, with rivals plotting a catch-up in the nascent, but
growing segments.
We can also train our model with previously recorded audio. This does not
guarantee good quality, but it reduces the recording effort. Our
recommendation is to record the sentences provided by Larynx.ai.
Build and Setup Cost
In addition to audio recording, effort is required to train our engine, store the
generated neural voice securely, and make it available for authorized access.
Recording Effort:
Studio Cost:
AI Training Effort:
● GPU cost: $4,800
● Engineering cost: $1,000
Secure Storage Effort:
● Highly available, secure storage infrastructure: $25/month
Text Generation Cost:
● GPU server: $10/hour
● Other AWS resources: $250/month (a shared cost that decreases per model as
the model count increases)
Audio Generation Audit:
● Manual audit effort: $1,000/month (shared across 10 models)
Legal/Insurance Effort:
● Legal consultation/protection: $100/month
● Insurance protection: $10/month
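As a rough illustration, the figures above can be combined into a one-time and a monthly estimate per voice model. This is a sketch under stated assumptions, not a Larynx.ai pricing formula: the function names and the `gpu_hours` and `model_count` parameters are hypothetical, the shared AWS cost is assumed to be split evenly across models, the legal figure is assumed to be in US dollars like the others, and recording and studio costs are left out because the list gives no amounts for them.

```python
# Figures copied from the cost list above (USD).
GPU_TRAINING_COST = 4800
ENGINEERING_COST = 1000
STORAGE_PER_MONTH = 25
GPU_SERVER_PER_HOUR = 10
SHARED_AWS_PER_MONTH = 250
AUDIT_PER_MONTH = 1000
AUDIT_SHARED_MODELS = 10   # the audit cost is shared across 10 models
LEGAL_PER_MONTH = 100      # assumption: USD, like the other figures
INSURANCE_PER_MONTH = 10


def one_time_cost() -> int:
    """AI training cost; recording and studio costs are not stated."""
    return GPU_TRAINING_COST + ENGINEERING_COST


def monthly_cost(gpu_hours: float, model_count: int) -> float:
    """Monthly running cost for one model.

    Assumption: the shared AWS cost is split evenly across model_count.
    """
    if model_count < 1:
        raise ValueError("model_count must be at least 1")
    return (
        STORAGE_PER_MONTH
        + GPU_SERVER_PER_HOUR * gpu_hours
        + SHARED_AWS_PER_MONTH / model_count
        + AUDIT_PER_MONTH / AUDIT_SHARED_MODELS
        + LEGAL_PER_MONTH
        + INSURANCE_PER_MONTH
    )


print(one_time_cost())                            # 5800
print(monthly_cost(gpu_hours=20, model_count=10)) # 460.0
```

The example call uses 20 GPU hours and 10 models purely as sample inputs; neither number comes from the cost list.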
Usage:
Celebrity voice can be used in different audio projects that include, but are not limited
to, the following use cases.
● Audiobook production
● Newspaper reading
● Hotel kiosks/self check-in
● Corporate or product brand voice
● Cartoon films
● Video games
● Smart homes
● Product explainer videos
● Car brand audio
● Etc.
Neural voice sounds very natural and very close to a human voice, which creates
an additional risk of misuse in some audio projects. In such cases, we
take extra precautions against misuse in every aspect of audio generation. We
protect the audio as follows.
● We allow limited listening time during the build process. All preview audio
is watermarked with distinguishable background music.
● We audit every user action and submit the text content for manual audit
before audio generation.
● Text content can be audited either by an authorized person assigned by
the voice owner or through a paid manual audit service offered by Larynx, which
allows content generation based on the voice owner’s preferences and restrictions.
● Approved content is used for audio generation. It is stored
encrypted and is audited.
● Generated audio is quality checked by the Larynx team before
delivery.
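The audit-before-generation rule described above can be sketched as a simple approval gate. This is an illustrative sketch only, not the Larynx.ai implementation: every class and function name here is hypothetical, and the audio output is a placeholder string rather than a call to a real runtime engine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class GenerationRequest:
    user: str
    text: str
    status: ReviewStatus = ReviewStatus.PENDING


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str) -> None:
        # Every user action is logged with a UTC timestamp.
        self.entries.append((datetime.now(timezone.utc), user, action))


def review(request: GenerationRequest, reviewer: str, approve: bool,
           log: AuditLog) -> None:
    """Manual audit step performed by an authorized reviewer."""
    request.status = ReviewStatus.APPROVED if approve else ReviewStatus.REJECTED
    log.record(reviewer, f"review:{request.status.value}")


def generate_audio(request: GenerationRequest, log: AuditLog) -> str:
    """Refuse to generate audio for text that has not been approved."""
    log.record(request.user, "generate_requested")
    if request.status is not ReviewStatus.APPROVED:
        raise PermissionError("text has not passed manual audit")
    # Placeholder for the real text-to-speech call.
    return f"<audio for {len(request.text)} characters>"


log = AuditLog()
req = GenerationRequest(user="customer-1", text="Welcome to our hotel.")
review(req, reviewer="owner-delegate", approve=True, log=log)
print(generate_audio(req, log))
```

Keeping the log write ahead of the approval check means rejected or premature attempts are recorded too, which matches the statement that every user action is audited.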
Revenue.
TODO: Discuss with John and update this section
Royalty
Neural voice is at a very early stage of adoption and has numerous potential uses.
Our services are classified into the following four categories based on different
payment plans.
● Yearly corporate contracts (newspapers, hotels, etc.)
● Monthly/daily subscriptions (mobile apps, IoT devices, etc.)
● Audio content duration (product explainers, books, etc.)
● License per device (cars, robots, etc.)
The voice artist/owner can set a different base price for each category of voice.
Larynx holds full rights to decide the price of audio for its customers. The voice
artist can select one of the following two options.
1. 80% of the revenue.
2. 40% of the revenue.
80-20 Program
● Larynx pays 80% of the revenue generated from the neural voice to the artist.
● Larynx holds exclusive rights to the neural voice for 7 years.
● The artist’s personal projects are free.
● The voice artist has more flexibility when negotiating corporate
contracts.
40-60 Program
● Larynx.ai pays 40% of the revenue generated from the neural voice to the
artist.
● No exclusivity contract with Larynx.ai.
● The artist pays Larynx.ai for personal use as well; 40% of the payment
is returned.
● Larynx charges the production cost if the artist removes the
voice from the platform within the first 12 months of its creation.
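The two programs above can be compared with a short calculation. This is a minimal sketch of the stated percentages only; the `Program` class and the function names are hypothetical, and the production-cost charge for early removal is omitted because no amount is given for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    artist_share: float      # fraction of revenue paid to the artist
    exclusivity_years: int   # 0 means no exclusivity contract
    personal_use_free: bool


PROGRAM_80_20 = Program("80-20", 0.80, 7, True)
PROGRAM_40_60 = Program("40-60", 0.40, 0, False)


def artist_payout(program: Program, revenue: float) -> float:
    """Amount paid to the artist out of the generated revenue."""
    return round(program.artist_share * revenue, 2)


def personal_use_net_cost(program: Program, price: float) -> float:
    """Net amount the artist pays for personal use of their own voice.

    In the 40-60 program the artist pays the price and 40% is returned.
    """
    if program.personal_use_free:
        return 0.0
    return round(price * (1 - program.artist_share), 2)


for program in (PROGRAM_80_20, PROGRAM_40_60):
    print(program.name,
          artist_payout(program, 1000),
          personal_use_net_cost(program, 100))
```

The sample revenue of 1000 and personal-use price of 100 are arbitrary inputs chosen to make the percentages easy to read.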
Summary
Neural voice sounds very natural and similar to a human voice. It removes the time
limitations on artists so that their voice can be used in different audio projects, and
it brings additional revenue opportunities. We have flexible and ambitious plans
to work with artists, customers, and corporations to reap the benefits of audio and be
successful together.
Let neural voice speak for the future.