
Speech synthesis, voice recognition and humanoid robots

Speech synthesis, the artificial production of human speech, had been around long before the Daleks on Doctor Who. Apparently, the first speech-generating device was prototyped in the UK in 1960, in the shape of a sip-and-puff typewriter controller, the POSSUM. Wolfgang von Kempelen preceded all of this in the late 18th century with a speaking machine built of leather and wood that had great significance in the early study of phonetics. Today, text-to-speech computers and synthesisers are widely used by those with speech impediments to facilitate communication.

Speech-to-text systems became more prominent thanks to Tangora, IBM's voice-activated typewriter, which had a remarkable 20,000-word vocabulary by the mid-1980s. Nowadays speech-to-text has advanced phenomenally, with the Dragon Dictation iOS software being a highly favoured choice. Our world is increasingly dominated by voice automation, from customer service options over the phone to personal assistants like Siri. Banks, too, have used voice and speech recognition for identification purposes since 2014.

I’m curious how these systems work, how they are programmed, what corpus is used and which accents are taken into consideration. Why? Because robots fascinate me, and I wonder if it will be possible to “humanize” digital voices to such an extent that humanoid robots will appear more human than ever because of their voice production and recognition capabilities. It seems like a far cry from the days of Speak & Spell, the kids’ speech synthesizer of the 80s, but it is looking increasingly probable as AI advances.

Developments have gone as far as Hiroshi Ishiguro’s Geminoid HI-1 android prototype humanoid robot. Ishiguro is a roboticist at Osaka University, Japan, who created a Geminoid robot in 2010 that is a life-size replica of himself. He used silicone rubber, pneumatic actuators, powerful electronics, and hair from his own scalp.

Geminoid is basically a doppelganger droid controlled by a motion-capture interface. It can imitate Ishiguro’s body and facial movements, and it can reproduce his voice in sync with his motion and posture. Ishiguro hopes to develop the robot’s human-like presence to such a degree that he could use it to teach classes remotely, lecturing from home while the Geminoid interacts with his classes at Osaka University.

You can see a demonstration of Geminoid here.