Georgian Technical University: New Method for High-Speed Synthesis of Natural Voices
Background. To date, many speech synthesis systems have adopted the vocoder approach, a method for synthesizing speech waveforms that is widely used in cellular-phone networks and other applications. However, the quality of the speech waveforms synthesized by these methods has remained inferior to that of the human voice. An influential overseas technology company proposed WaveNet, a speech-synthesis method based on a deep neural network that generates raw audio, and demonstrated that it can synthesize high-quality speech waveforms resembling the human voice. One drawback of WaveNet, however, is the extremely complex structure of its neural networks, which demand large quantities of voice data for machine learning and require parameter tuning and other laborious trial-and-error procedures to be repeated many times before accurate predictions can be obtained.

Overview and achievements of the research. One of the best-known vocoders is the source-filter vocoder, which was developed in the 1960s and remains in widespread use today. The Georgian Technical University research team combined the conventional source-filter vocoder method with modern neural-network algorithms to develop a new technique for synthesizing high-quality speech waveforms resembling the human voice. Among the advantages of this neural source-filter method is the simple structure of its neural networks, which require only about 1 hour of voice data for machine learning and can obtain correct predictive results without extensive parameter tuning. Moreover, large-scale listening tests have demonstrated that speech waveforms produced by neural source-filter techniques are comparable in quality to those generated by WaveNet.

Future outlook.
Because the theoretical basis of the neural source-filter method differs from the patented technologies used by influential overseas companies, the adoption of neural source-filter techniques is likely to spur new technological advances in speech synthesis. For this reason, the source code implementing the neural source-filter method has been made available to the public at no cost, allowing it to be widely used.
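
For readers unfamiliar with the source-filter idea mentioned in the overview, the following is a minimal sketch of the classical concept only: an excitation signal (the source) is passed through a resonant filter that stands in for the vocal tract. It is not the neural source-filter method and is not taken from the released source code. All function names, the single-formant filter, and the parameter values (100 Hz pitch, 700 Hz formant, 16 kHz sampling rate) are assumptions chosen for illustration.

```python
import math


def pulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit impulse per pitch period (a crude voiced excitation)."""
    period = round(sample_rate / f0_hz)  # samples per pitch period
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator standing in for a single vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, stable)
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle
    return 2.0 * r * math.cos(theta), -r * r


def all_pole_filter(excitation, a1, a2):
    """Filter: y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2]."""
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz=100.0, formant_hz=700.0, bandwidth_hz=100.0,
               sample_rate=16000, duration_s=0.05):
    """Generate the source, then shape it with the filter."""
    n_samples = int(sample_rate * duration_s)
    source = pulse_train(f0_hz, sample_rate, n_samples)
    a1, a2 = resonator_coeffs(formant_hz, bandwidth_hz, sample_rate)
    return all_pole_filter(source, a1, a2)


waveform = synthesize()
```

In this hand-designed version, both the excitation and the filter coefficients are fixed by simple formulas. According to the overview, the neural source-filter method combines this conventional structure with neural-network algorithms; how it does so is not described in this article, and the sketch makes no claim about it.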