This week I continued my research on text-to-speech (TTS), speech-to-text, and predictive dialing.
Text-to-Speech
Text-to-speech (TTS) is a type of speech synthesis application that creates a spoken version of the text in a computer document, such as a help file or a web page. TTS can read on-screen information aloud for visually impaired users, or it can simply be used to supplement the reading of a text message. Current TTS applications include voice-enabled e-mail and spoken prompts in voice response systems.
Speech Recognition / Speech-to-Text
Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text. The term "voice recognition" is sometimes used for recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech. Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic (home automation) appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or e-mail), and aircraft control (usually termed Direct Voice Input).
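To make the voice dialing and call routing examples concrete, here is a minimal sketch of what could happen after the recognizer has already produced text. Everything in it is an assumption for illustration: the contact list, the phone numbers, the function name handle_transcript, and the "collect-call-operator" route are hypothetical, and a real system would use a proper grammar rather than simple string matching.

```python
import re

# Hypothetical contact list; the names and numbers are made up for illustration.
CONTACTS = {"home": "555-0100", "office": "555-0101"}


def handle_transcript(transcript, contacts=CONTACTS):
    """Map recognized text to an (action, target) pair, or None if nothing matches."""
    # Normalize case and strip surrounding whitespace and trailing punctuation.
    text = transcript.strip().lower().rstrip(".!?")

    # Voice dialing: "call <contact name>".
    match = re.fullmatch(r"call (.+)", text)
    if match and match.group(1) in contacts:
        return ("dial", contacts[match.group(1)])

    # Call routing: any request that mentions a collect call.
    if "collect call" in text:
        return ("route", "collect-call-operator")

    return None


if __name__ == "__main__":
    print(handle_transcript("Call home"))
    print(handle_transcript("I would like to make a collect call"))
```

The recognizer itself is deliberately left out; the sketch only shows how the recognized text might be turned into an action.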
How It Works
Speech recognition enables the operating system to convert spoken words to written text. An internal driver, called a speech recognition engine, recognizes words and converts them to text. The engine may be installed with the operating system or added later with other software. During installation, speech-enabled packages such as word processors and web browsers may install their own engines or use existing ones. Additional engines are also available from third-party manufacturers. These engines often use a particular jargon or vocabulary; for example, an engine may specialize in medical or legal terminology. They can also use different voices to allow for regional accents such as British English, or use a different language altogether, such as German, French, or Russian.
A microphone or some other sound input device is needed to capture the speech. In general, the microphone should be a high-quality device with built-in noise filters. The recognition rate depends directly on the quality of the input: with a poor microphone it is significantly lower and may be unacceptable. The Microsoft Speech Recognition Training Wizard (Voice Training Wizard) guides you through the setup process, recommends the best position for the microphone, and lets you test it for optimal results.
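The point about specialized vocabularies can be illustrated with a toy example. The sketch below assumes the engine has already produced several candidate transcriptions and simply prefers the one with the largest share of words found in the active vocabulary. This is not how any particular engine works (real engines combine acoustic and language models); the function name pick_hypothesis, the word lists, and the sample sentences are all made up for illustration.

```python
def pick_hypothesis(candidates, vocabulary):
    """Return the candidate whose words are best covered by the vocabulary."""

    def coverage(text):
        words = text.lower().split()
        if not words:
            return 0.0
        # Fraction of the words that appear in the vocabulary.
        return sum(word in vocabulary for word in words) / len(words)

    # On a tie, max() keeps the first candidate in the list.
    return max(candidates, key=coverage)


# Two candidate transcriptions that sound almost identical.
candidates = [
    "the patient has a cute angina",
    "the patient has acute angina",
]

# Hypothetical word lists standing in for a general and a medical vocabulary.
general_vocabulary = {"the", "patient", "has", "a", "cute", "angina"}
medical_vocabulary = {"the", "patient", "has", "acute", "angina"}

if __name__ == "__main__":
    print(pick_hypothesis(candidates, general_vocabulary))
    print(pick_hypothesis(candidates, medical_vocabulary))
```

With the general word list the first candidate wins, because every one of its words is known; with the medical word list the second candidate wins, because "acute" is in the vocabulary while "a cute" is not fully covered.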
Predictive Dialing
Predictive dialing uses a computer-based system that automatically dials groups of telephone numbers and then passes connected calls to available operators or agents in a call center. It is most commonly used in call centers that make large numbers of calls, such as those run by telemarketing companies. Predictive dialing was introduced to increase efficiency within call centers. Before its development, most call centers used devices known as auto dialers, which were merely computers equipped with telephony boards that could dial a number without a caller having to enter it manually on a keypad.
Predictive dialing is far more advanced than an auto dialer because it monitors each call to see how it is answered. If the call goes unanswered, is met with a busy signal or an answering machine, or reaches a fax machine, the predictive dialer immediately ends the call. Only calls answered by a live person are put through to an operator. Productivity increases because callers do not have to listen to unanswered calls or wait for someone to pick up.
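The filtering behaviour described above can be sketched in a few lines. This is a simplified simulation under stated assumptions, not a real dialer: detect_outcome is a hypothetical callback standing in for the call-progress detection done by telephony hardware, the phone numbers and agent names are made up, and agents are assumed to stay busy once connected. Live answers that arrive when no agent is free are recorded separately, since the text does not say how they are handled.

```python
from collections import deque
from enum import Enum


class Outcome(Enum):
    LIVE_PERSON = "live person"
    NO_ANSWER = "no answer"
    BUSY = "busy"
    ANSWERING_MACHINE = "answering machine"
    FAX = "fax"


def run_dialer(numbers, detect_outcome, agents):
    """Dial each number; connect live answers to free agents and end other calls."""
    free_agents = deque(agents)
    connected = []  # (agent, number) pairs handed to an operator
    dropped = []    # (number, outcome) pairs the dialer ended by itself
    unserved = []   # live answers that found no free agent

    for number in numbers:
        outcome = detect_outcome(number)
        if outcome is not Outcome.LIVE_PERSON:
            # Unanswered, busy, answering machine, or fax: end the call.
            dropped.append((number, outcome))
        elif free_agents:
            connected.append((free_agents.popleft(), number))
        else:
            unserved.append(number)

    return connected, dropped, unserved


# Made-up call results used in place of real call-progress detection.
SAMPLE_OUTCOMES = {
    "555-0100": Outcome.BUSY,
    "555-0101": Outcome.LIVE_PERSON,
    "555-0102": Outcome.ANSWERING_MACHINE,
    "555-0103": Outcome.LIVE_PERSON,
    "555-0104": Outcome.FAX,
}

if __name__ == "__main__":
    result = run_dialer(list(SAMPLE_OUTCOMES), SAMPLE_OUTCOMES.get, ["agent-1"])
    print(result)
```

A production dialer would also free agents when their calls end and pace its dialing to match agent availability; both are left out here to keep the sketch focused on call filtering.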