PR in Media

“How can communication be improved?” Taken out of context, this question might relate to any number of PR disciplines (and subjects of past and future articles published in the IAPR Newsletter!): improving communication between advertisers and consumers (see “Pattern Recognition in the Media: Data Mining”, April 2010), improving communication between individuals (see Larry O’Gorman’s article “Has the time for telepresence finally come?”, April 2010), or improving communication between a prosthetic and the brain (brain-computer interface is a planned topic for an article in this series).

These examples illustrate an increasingly narrow focus on the question of communication. This article presents three examples of systems designed to improve communication. The first builds speech from text using recorded phonemes, the second creates speech from facial muscle movement, and the last turns speech into text.

Text-to-speech

Text-to-speech is a great feature of navigation systems, and I am amazed (and sometimes amused) by the quality of the synthesized speech. My navigation system has a bit of trouble with the letter “r”, but other than that does quite well. Not nearly as well, however, as what I heard last year in an interview with Matthew Aylett, Chief Technical Officer and co-founder of CereProc, a Scottish company started in 2005. The company has created voices that, according to its home page, sound real and have character. Its focus is on speech synthesis that is vibrant and contains emotion.

Scott Siegel of National Public Radio (NPR) interviewed Aylett the day before film critic Roger Ebert was scheduled to appear on the Oprah Winfrey Show to unveil the voice created for him by CereProc. According to Aylett, working with such a prominent client on its voice creation services has helped the company improve its technology. CereProc will use a combination of two techniques to help the synthesized Ebert sound like the real Ebert once did: unit selection (breaking recordings into individual “phones” and then putting them back together based upon the text entered) and the Hidden Markov Model Speech Synthesis System (which “creates a statistical model of captured sounds over time and then inverts it to produce speech” [1]).
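To make the idea of unit selection concrete, here is a minimal sketch in Python. It is not CereProc's method: the `Unit` record, the pitch-based `join_cost`, and the phone labels are assumptions made up for illustration. Each recorded snippet is reduced to a phone label plus the pitch at its start and end, and the search picks one snippet per phone so that neighboring snippets join as smoothly as possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded snippet (hypothetical record, for illustration only)."""
    phone: str          # phone label, e.g. "HH"
    start_pitch: float  # pitch in Hz at the start of the snippet
    end_pitch: float    # pitch in Hz at the end of the snippet


def join_cost(prev: Unit, nxt: Unit) -> float:
    """Assumed cost: a pitch jump at the seam is audible, so penalize it."""
    return abs(prev.end_pitch - nxt.start_pitch)


def select_units(phones, inventory):
    """Pick one unit per phone, minimizing the summed join cost.

    Uses dynamic programming over the candidate units for each phone.
    Returns (chosen units in order, total join cost).
    """
    if not phones:
        return [], 0.0
    candidates = [[u for u in inventory if u.phone == p] for p in phones]
    if any(not c for c in candidates):
        raise ValueError("no recorded unit for at least one phone")

    # costs[i][j]: cheapest total cost of a path ending in candidates[i][j]
    costs = [[0.0] * len(candidates[0])]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            options = [
                costs[i - 1][k] + join_cost(prev, unit)
                for k, prev in enumerate(candidates[i - 1])
            ]
            best_k = min(range(len(options)), key=options.__getitem__)
            row_cost.append(options[best_k])
            row_back.append(best_k)
        costs.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit.
    j = min(range(len(costs[-1])), key=costs[-1].__getitem__)
    total = costs[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy inventory: two recordings each of the phones in "hi" (labels assumed).
inventory = [
    Unit("HH", 100.0, 110.0),
    Unit("HH", 100.0, 140.0),
    Unit("AY", 112.0, 120.0),
    Unit("AY", 150.0, 155.0),
]
chosen, total = select_units(["HH", "AY"], inventory)
```

A real system would also score how well each snippet matches the desired prosody and would compare spectral features at the seams, not just pitch; the sketch keeps only the join cost so that the selection step stays visible.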

Of course, the ability to create a natural-sounding voice depends upon the availability of high-quality recordings. For its standard Voice Creation/Voice Selection services, CereProc uses scripted studio recordings. For “Voice Creation from Your Speech”, the result depends on the quality of the recordings that already exist. In the case of Roger Ebert, the challenge has been the inconsistent quality of the available recordings, which were made in different studio environments and with different microphones.

Speech from muscle movements

I heard about this next technology while listening to a morning news program. Through the life experiences of a man who lost his voice to cancer 16 years ago, listeners were taken through the technological developments in the area of soundless speech: from whiteboard and pen to the electrolarynx (a small metal device that creates a robotic voice when held to the throat while speaking) to a voice prosthesis (a surgically implanted valve in the throat that creates a raspy, breathy voice).

Listeners were then introduced to a new technology that had been presented in February 2011 at the American Association for the Advancement of Science (AAAS) fair in Washington, D.C. Michael Wand of the Cognitive Systems Laboratory at the Karlsruhe Institute of Technology in Germany discussed and demonstrated the Silent Speech Recognition Program, which is based on his Ph.D. thesis research.

“The technology is based on Electromyography, i.e. the capturing and recording of electrical potentials that arise from muscle activity.” The electrical potentials are currently captured through electrodes applied to the skin. While the system currently recognizes about 2,000 words with 90% accuracy, it can be confused by non-speech facial movements such as laughing and frowning. In the future, the electrodes and software will be incorporated into a mobile phone held to the side of the face.
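The article does not say which signal features the Silent Speech system uses, so the following is only a sketch of one common first step in working with electromyography data: cutting the recorded potentials into short frames and computing the root-mean-square (RMS) amplitude of each frame as a rough measure of muscle activity. The function name `frame_rms` and the frame sizes are assumptions for illustration.

```python
import math


def frame_rms(samples, frame_len, hop):
    """Return the RMS amplitude of each frame of an EMG-like signal.

    samples   : sequence of voltage readings from one electrode
    frame_len : number of samples per frame
    hop       : number of samples between the starts of successive frames
    Trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        features.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return features


# Toy signal: four samples of rest followed by four samples of activity.
signal = [0.0, 0.0, 0.0, 0.0, 3.0, -3.0, 3.0, -3.0]
activity = frame_rms(signal, frame_len=4, hop=4)
```

A recognizer would feed sequences of such per-frame features, from several electrodes, into a statistical model of words or phones. Laughing or frowning also raises the muscle activity, which gives some intuition for why non-speech facial movements can confuse the system.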

Text-from-speech

While investigating the topics of this article further, I also came upon a blog entry in PCWorld that discussed BMW’s announcement of a new dictation option to be available in about three years. Currently, some car manufacturers, Ford and BMW among them, have text-to-speech systems (both use Nuance technology) that allow drivers to “stay connected” while driving: emails, text messages, status updates, and news feeds are read aloud through the car’s audio system. That’s half of the communication: incoming messages. Earlier this year, BMW announced that it would complete the communication by offering a voice-to-text option that lets those in the car dictate outgoing email messages.

While many cars and other devices have voice-activation options, these use limited, special-purpose vocabularies. The new technology announced by BMW will use speech-recognition algorithms with a database of over a million words and will include voice commands for editing functions. The focus is on staying safe while staying connected.