“Creating a Computer Voice That People Like,” by John Markoff, New York Times (Feb. 14, 2016):
It is not yet possible to create a computerized voice that is indistinguishable from a human one for anything longer than short phrases that might be used for weather forecasts or communicating driving directions….
Beyond correct pronunciation, there is the even larger challenge of correctly placing human qualities like inflection and emotion into speech. Linguists call this “prosody,” the ability to add correct stress, intonation or sentiment to spoken language….
For those like the developers at ToyTalk who design entertainment characters, errors may not be fatal, since the goal is to entertain or even to make their audience laugh. However, for programs that are intended to collaborate with humans in commercial situations or to become companions, the challenges are more subtle.
These designers often say they do not want to try to fool the humans that the machines are communicating with, but they still want to create a humanlike relationship between the user and the machine….
The researchers looked for a machine voice that was slow, steady and most importantly “pleasant.” And in the end, they, acting more as artists than engineers, fine-tuned the program. The voice they arrived at is clearly a computer, but it sounds optimistic, even a bit peppy….
Imperson, a software firm based in Israel that develops conversational characters for entertainment, is now considering going into politics. Imperson’s idea is that during a campaign, a politician would be able to deploy an avatar on a social media platform that could engage voters. A plausible-sounding Ted Cruz or Donald Trump could articulate the candidate’s positions on any possible subject.
“The audience wants to have an interactive conversation with a candidate,” said Eyal Pfeifel, co-founder and chief technology officer of Imperson. “People will understand, and there will be no uncanny-valley problem.”