IDEA #43QWFX Extracting speech from degraded signals by predicting the inputs to a speech vocoder
KEY WORDS: speech enhancement, noise robustness, speech synthesis

The problem of removing noise from speech has been studied for many years. That work has focused on modifying the noisy speech to make it less noisy. Imperfections in this process cause some speech to be removed by accident and some noise to be left in by accident, and both outcomes are undesirable. Even if the process worked perfectly, removing the noise would also remove some speech: in particular, any speech that perfectly overlaps with the noise in both time and frequency.

Instead, the current invention proposes to use the noisy speech as a template that drives a speech synthesizer. Current synthesizers produce high-quality speech, and that speech will by design contain no noise, because no noise is ever added to it. Using such a system for speech enhancement will therefore produce speech of higher quality than traditional approaches can achieve, along with effectively infinite noise suppression, since no component of the input noise is passed through to the output.

This approach also makes speech synthesis easier, because it avoids the most difficult step in traditional text-to-speech (TTS) systems: planning the timing, loudness, and pitch of every instant of the utterance. Our system obtains all of this information from the noisy speech signal, so it does not need to create it from scratch.

The approach can be used in other speech enhancement applications as well. Beyond removing noise, it can recover speech degraded by aggressive compression algorithms, packet loss, reverberation, and filtering (such as filtering by the telephone system).
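The core idea can be shown with a deliberately tiny toy. The degraded signal is analyzed, the parameters a vocoder needs are predicted from it, and fresh audio is then synthesized from those parameters alone. The following is a minimal sketch under strong assumptions and is not the disclosed method. The "speech" is a single sinusoid whose pitch and loudness change once per frame. The "vocoder" is a hypothetical sinusoidal renderer. Classic autocorrelation pitch tracking stands in for whatever predictor the invention would actually use. All names (`make_toy_voice`, `estimate_frame_params`, `analyze`, `synthesize`) and constants (`SAMPLE_RATE`, `FRAME`, `F0_MIN`, `F0_MAX`) are illustrative.

```python
import math
import random

# Illustrative toy settings (assumptions, not taken from the disclosure).
SAMPLE_RATE = 8000            # Hz
FRAME = 400                   # samples per analysis frame (50 ms at 8 kHz)
F0_MIN, F0_MAX = 90.0, 250.0  # assumed pitch search range in Hz


def make_toy_voice(f0_track, amp_track, noise_std, seed=0):
    """Hypothetical stand-in for voiced speech: a phase-continuous sinusoid whose
    pitch and amplitude are constant within each frame, plus white Gaussian noise."""
    rng = random.Random(seed)
    clean, noisy, phase = [], [], 0.0
    for f0, amp in zip(f0_track, amp_track):
        for _ in range(FRAME):
            phase += 2.0 * math.pi * f0 / SAMPLE_RATE
            s = amp * math.sin(phase)
            clean.append(s)
            noisy.append(s + rng.gauss(0.0, noise_std))
    return clean, noisy


def estimate_frame_params(frame):
    """Predict (f0, amplitude) for one frame using only the *noisy* samples.

    Uses the mean lagged product (normalized autocorrelation). White noise is
    uncorrelated with itself at nonzero lags, so the peak inside the pitch range
    mostly reflects the periodic, speech-like part of the frame."""
    n = len(frame)
    min_lag = int(SAMPLE_RATE / F0_MAX)
    max_lag = int(SAMPLE_RATE / F0_MIN)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / (n - lag)
        if r > best_r:
            best_lag, best_r = lag, r
    f0 = SAMPLE_RATE / best_lag
    # For a sinusoid of amplitude A, the mean lagged product at one period is A^2 / 2.
    amp = math.sqrt(2.0 * max(best_r, 0.0))
    return f0, amp


def analyze(noisy):
    """Split the noisy signal into frames and predict the vocoder inputs for each."""
    return [estimate_frame_params(noisy[i:i + FRAME])
            for i in range(0, len(noisy) - FRAME + 1, FRAME)]


def synthesize(params):
    """Toy sinusoidal 'vocoder': renders audio from (f0, amplitude) pairs only.
    No samples of the noisy waveform are copied through, so its noise is not passed on."""
    out, phase = [], 0.0
    for f0, amp in params:
        for _ in range(FRAME):
            phase += 2.0 * math.pi * f0 / SAMPLE_RATE
            out.append(amp * math.sin(phase))
    return out


if __name__ == "__main__":
    f0_true = [100.0, 125.0, 160.0, 125.0]
    amp_true = [0.8, 1.0, 0.6, 0.9]
    clean, noisy = make_toy_voice(f0_true, amp_true, noise_std=0.1)
    params = analyze(noisy)
    enhanced = synthesize(params)
    for (f0, amp), ft, at in zip(params, f0_true, amp_true):
        print(f"true f0={ft:6.1f} Hz amp={at:.2f} | predicted f0={f0:6.1f} Hz amp={amp:.2f}")
    print(f"synthesized {len(enhanced)} noise-free samples")
```

The output depends only on the predicted per-frame parameters, which is the property the disclosure relies on. The waveform phase is not recovered, so the result should be judged by its parameters and perceived quality, not by sample-by-sample comparison with the clean signal. A practical system would presumably replace both stand-ins, for example with a trained predictor of a real vocoder's full parameter set (including spectral content) and that vocoder itself. The disclosure does not specify either component.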