
Solution


Last updated 3 years ago

Listen is a New Read

Audio book production using artificial intelligence is a revolutionary technology that can help e-book publishers lower production costs and shorten time to market by more than 90%. We created a text-to-speech (TTS) AI model for natural Thai language. In general, the quality of the AI voice is very close to that of a human reading voice.

Our methodology is to train an AI model on millions of text and speech pairs in order to teach it how to synthesize speech in Thai. The AI is trained to mimic a genuine Thai voice as closely as possible.
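As a minimal sketch of what "pairs of text and speech" could look like on disk, the snippet below parses a manifest in which each line holds an audio path and its transcript. The `audio_path|transcript` layout, the `Utterance` type, and the file names are assumptions for illustration; the source does not describe the actual dataset format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    audio_path: str  # path to one recorded clip (hypothetical layout)
    transcript: str  # the Thai text spoken in that clip


def parse_manifest(lines):
    """Parse 'audio_path|transcript' lines, skipping blank or malformed rows."""
    pairs = []
    for line in lines:
        line = line.strip()
        if "|" not in line:
            continue  # blank line or a row without a separator
        path, text = (part.strip() for part in line.split("|", 1))
        if path and text:
            pairs.append(Utterance(path, text))
    return pairs


if __name__ == "__main__":
    rows = [
        "clips/0001.wav|สวัสดีครับ",
        "",
        "broken row",
        "clips/0002.wav|ขอบคุณ",
    ]
    for utt in parse_manifest(rows):
        print(utt.audio_path, utt.transcript)
```

Skipping malformed rows instead of raising keeps a single bad line from stopping a very large manifest, although a real pipeline would probably log what it drops.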

Once the TTS model is trained, we have it read the script of our e-book, which is in a machine-readable format (ePub, text). Then, using a unique technology, we capture the AI's voice in the form of an audio book. This reduces the production time of one audio book from weeks to minutes.

The majority of the production cost is spent on training the AI; once it reads well enough, we no longer require human voice actors for audio book creation.
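The read-and-capture step can be sketched roughly as follows: split the script into sentence-sized chunks, synthesize each chunk, and write the concatenated samples to a WAV container. This is only an illustration. `placeholder_tts` is a hypothetical stand-in that emits a tone instead of speech, and the sample rate and splitting rule are assumptions, not details from the source.

```python
import io
import math
import re
import struct
import wave

SAMPLE_RATE = 22050                   # assumed output rate, not stated in the source
SAMPLES_PER_CHAR = SAMPLE_RATE // 20  # placeholder: about 50 ms of audio per character


def split_sentences(text: str) -> list[str]:
    """Split a script on sentence punctuation followed by whitespace, or on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def placeholder_tts(sentence: str) -> list[int]:
    """Hypothetical stand-in for the trained TTS model.

    Returns 16-bit PCM samples of a 220 Hz tone whose length grows with the text.
    """
    n = SAMPLES_PER_CHAR * len(sentence)
    return [int(8000 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)) for i in range(n)]


def render_audiobook(text: str, tts=placeholder_tts) -> bytes:
    """Synthesize every sentence and return the audio as mono 16-bit WAV bytes."""
    samples: list[int] = []
    for sentence in split_sentences(text):
        samples.extend(tts(sentence))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


if __name__ == "__main__":
    audio = render_audiobook("Chapter one. It was a quiet morning.\nThe end.")
    print(len(audio), "bytes of WAV data")
```

Passing the synthesizer in as the `tts` argument keeps the chunking and file-writing code independent of whichever model is eventually used. Thai text often separates clauses with spaces rather than full stops, so a production splitter would need Thai-specific segmentation.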

Technical Scope of Works

Despite the advent of Vaja [1], text-to-speech (TTS) for Thai is not yet suitable for broad applications. Although Vaja has demonstrated remarkable quality for general speech, it does not address the specific needs of individual domains. In addition, its infrequent updates have not kept pace with the active field of deep learning, where the state-of-the-art title has changed hands from hidden Markov models [2], LSTM [3], and WaveNet [4] to Tacotron 2 [5] (as of 2019) in just over two years. This motivates us to look at the Thai TTS problem again, even though many have claimed that it has been solved.

Perhaps what has held Thai TTS back is the inherent difference between Thai and English speech. WaveNet, while highly remarkable in English, depends mostly on acoustic features that require well-trained linguists to extract. This complication in data preparation makes it hard to supply enough data to feed these data-hungry deep learning models, especially for Thai.

Even in English, the issue persists. Things would be simpler if the model could be conditioned on the text directly, and this was the idea behind the next generation of models. For example, Deep Voice 3 [6], Char2Wav [7], and Tacotron [8] proposed models that automatically extract acoustic features from text, without human intervention. Standard acoustic algorithms then convert these features back into a waveform. This follows good deep learning practice: "state what you want and let the algorithm derive the features for you." These algorithms were able to produce high-quality sound; the only problem was that they were not as good as WaveNet.
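To illustrate what conditioning a model on the text directly means at the input side, the sketch below maps raw characters to integer IDs, which is the kind of input character-level models such as Tacotron consume. The vocabulary layout and the `<pad>`/`<unk>` symbols are assumptions for illustration, not the authors' preprocessing.

```python
PAD, UNK = "<pad>", "<unk>"


def build_vocab(corpus):
    """Assign an integer ID to every character (Unicode code point) in the corpus."""
    vocab = {PAD: 0, UNK: 1}
    for ch in sorted({ch for line in corpus for ch in line}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into a list of character IDs; unseen characters map to <unk>."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


if __name__ == "__main__":
    vocab = build_vocab(["สวัสดี", "ขอบคุณ"])
    print(encode("สวัสดี", vocab))
```

Python iterates over code points, so Thai vowel and tone marks become separate symbols. No hand-written linguistic annotation is involved, which is the point of the text-conditioned approach described above.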

Market Size

The Rise of Audio Books

The average consumer has less time to consume information. As data transfer over the Internet became faster, listening became a popular alternative to reading. According to statistics, audio book sales have increased dramatically in recent years.

Thailand Audio Book

The number of audio book publishers in Thailand is fairly low. According to 2021 data, we only have roughly 1,000 audio books in our online store, which is less than 1% of all books sold in the market.

Big Opportunity Ahead

According to Statista, 79,000 audio books were released in 2016 in the United States, which has the most audio book listeners in the world. That figure is 34% more than the number of general e-books published.

Business Model

Sale of Audio Book

Our audio book business model is fairly simple: we connect with book publishers in Thailand and produce audio books using our TTS model. We devised the following revenue sharing scheme:

  • 20% to Vulcan

  • 80% to book publishers
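The 20/80 split above can be worked through with a small calculation. The sketch assumes prices are held as integer satang (1 baht = 100 satang) and that any rounding remainder goes to the publisher; neither rule comes from the source, and the price used is hypothetical.

```python
VULCAN_SHARE_PERCENT = 20  # from the scheme above; the publisher receives the rest


def split_revenue(sale_satang: int) -> tuple[int, int]:
    """Return (vulcan_share, publisher_share) for one sale, in satang."""
    if sale_satang < 0:
        raise ValueError("sale amount must be non-negative")
    vulcan = sale_satang * VULCAN_SHARE_PERCENT // 100  # rounds down
    publisher = sale_satang - vulcan                    # remainder goes to the publisher
    return vulcan, publisher


if __name__ == "__main__":
    vulcan, publisher = split_revenue(19900)  # a hypothetical 199.00 baht audio book
    print(f"Vulcan: {vulcan / 100:.2f} THB, publisher: {publisher / 100:.2f} THB")
```

Working in integer satang avoids floating-point rounding, and the two shares always add up to the sale amount.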

Subscription Fee (Future)

We also designed a subscription-based revenue model, which gives monthly subscribers unlimited access to our audio books. However, we must first study the legal constraints on revenue sharing among the different parties.

Competitive Advantages

  • Low production cost

  • Fast time to market

  • Social innovation

  • Advanced audio book platform

References

[1] Vaja. http://www.vajatts.com/overview

[2] Hidden Markov model. https://en.wikipedia.org/wiki/Hidden_Markov_model

[3] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

[4] Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

[5] Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., ... & Saurous, R. A. (2018, April). Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4779-4783). IEEE.

[6] Ping, W., Peng, K., Gibiansky, A., Arik, S. O., Kannan, A., Narang, S., ... & Miller, J. (2017). Deep voice 3: Scaling text-to-speech with convolutional sequence learning. arXiv preprint arXiv:1710.07654.

[7] Sotelo, J., Mehri, S., Kumar, K., Santos, J. F., Kastner, K., Courville, A., & Bengio, Y. (2017). Char2wav: End-to-end speech synthesis.

[8] Wang, Y., Skerry-Ryan, R. J., Stanton, D., Wu, Y., Weiss, R. J., Jaitly, N., ... & Le, Q. (2017). Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135.

Figure: Publisher's Total Consumer Audio Book Download Sales Income Chart