
Enthusiasm to learn is emotionally driven

Enthusiasm can be displayed in different ways, and it can also be present in a learner who shows no visible signs of it: they are simply enjoying the learning and keen to learn.

My current research investigation focuses on how learners demonstrate their enthusiasm when interacting with a speech recognition interface, looking at both linguistic and non-linguistic features. The dataset I am using clearly demonstrates that the psychological state of learners impacts their enthusiasm, and therefore their language output and capacity to engage in learning, more than any other factor. While this came as a surprise, it aligns with theories of motivation and learning, which hold that positive emotional (and hence psychological) states favour learning, while negative emotional states (anxiety, stress, depression) can adversely affect it.

I’ve spent a lot of time with humanoid robots, speech recognition interfaces, and autonomous agents, and despite their degree of humanness, there is something decidedly safe for me about interacting with a non-conscious being. Maybe that is why Weizenbaum’s research was so successful! The non-judgmental attributes of a machine make users feel comfortable interacting, and therefore they get more out of the learning experience. This is something I am still investigating, but Buddy, the robot in the image above, aims to understand the mood of the user and then respond accordingly. So empathy is now going beyond the human…

Speech synthesis, voice recognition and humanoid robots

Speech synthesis, or the artificial production of human speech, was around long before the Daleks on Doctor Who. Apparently, the first speech-generating device was prototyped in the UK in 1960, in the shape of a sip-and-puff typewriter controller, the POSSUM. Wolfgang von Kempelen preceded all of this with a speaking machine built in leather and wood that had great significance in the early study of phonetics. Today, text-to-speech computers and synthesisers are widely used by those with speech impediments to facilitate communication.

Speech-to-text systems became more prominent thanks to the IBM Tangora, a voice-activated typewriter which held a remarkable 20,000-word vocabulary by the mid-1980s. Nowadays speech-to-text has advanced phenomenally, with the Dragon Dictation iOS software being a highly favoured choice. Our world is increasingly dominated by voice automation, from customer service menus by phone to personal assistants like Siri. Voice and speech recognition has also been used for identification purposes by banks since 2014.

I’m curious how these systems work, how they are programmed, what corpus is used and which accents are taken into consideration. Why? Because robots fascinate me, and I wonder if it will be possible to “humanize” digital voices to such an extent that humanoid robots will appear more human than ever because of their voice production and recognition capabilities. It seems like a far cry from the days of Speak & Spell, the kids’ speech synthesizer of the 80s, but it is looking increasingly probable as advances in AI develop.

Developments have gone as far as Hiroshi Ishiguro’s Geminoid HI-1 android prototype humanoid robot. Ishiguro is a roboticist at Osaka University, Japan, who created the Geminoid in 2010 as a life-size replica of himself. He used silicone rubber, pneumatic actuators, powerful electronics, and hair from his own scalp.

The Geminoid is basically a doppelganger droid controlled by a motion-capture interface. It can imitate Ishiguro’s body and facial movements, and it can reproduce his voice in sync with his motion and posture. Ishiguro hopes to develop the robot’s human-like presence to such a degree that he could use it to teach classes remotely, lecturing from home while the Geminoid interacts with his classes at Osaka University.

You can see a demonstration of the Geminoid here.