Track 3

Invited Talk 1

Computers in the Human Interaction Loop


By Alexander Waibel

(Carnegie Mellon University; University of Karlsruhe)

Review by:

Himanshu Vajaria

University of South Florida


In today’s electronic world, most of us are well connected via computers, cell phones and PDAs. Yet the quality of our communication—with other humans and with our own electronic devices—leaves much to be desired. For example, consider our irritation when someone’s cell phone rings in the middle of an important meeting. Wouldn’t it be nice if our devices were more aware of our surroundings and could modify their behavior accordingly?


The CHIL project aims to facilitate just that: to put “Computers in the Human Interaction Loop”. Professor Alexander Waibel, the coordinator of CHIL, described ongoing efforts for automatic analysis of human behavior. Characterizing human behavior and understanding the social context involves solving many sub-problems, such as determining the identities of the people involved; determining who is speaking to whom and what they are saying; and analyzing non-verbal communication, such as pointing gestures and the raising of hands.


Dr. Waibel suggested that these problems can be solved by the effective integration of various perceptual technologies. Face and speaker recognition are used to determine a person’s identity. Source localization and gaze detection help identify the current speaker and the intended audience. Speech recognition enables topical classification of meetings. Acoustic event classification determines the current social environment of the participants (lecture, meeting, etc.). Emotion and activity recognition analyze verbal and non-verbal communication.
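As a rough illustration of what such integration might look like, the sketch below combines the outputs of three of these components into a single decision about the ringing cell phone from the opening example. It is not taken from the talk or from the CHIL software: the `Percepts` record, the `phone_policy` function, the environment labels and the rule itself are all hypothetical, chosen only to show how separate perceptual outputs could feed one context-aware behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Percepts:
    """Hypothetical outputs of the perceptual components at one moment."""
    active_speaker: Optional[str]  # assumed to come from source localization
    addressee: Optional[str]       # assumed to come from gaze detection
    environment: str               # assumed label from acoustic event classification


def phone_policy(p: Percepts, owner: str) -> str:
    """Toy rule: decide how the owner's phone should handle an incoming call."""
    if p.environment in {"lecture", "meeting"}:
        # The owner is speaking or being spoken to: do not interrupt at all.
        if owner in (p.active_speaker, p.addressee):
            return "silent"
        # The owner is only listening: a discreet alert is acceptable.
        return "vibrate"
    # No formal social setting detected: behave as usual.
    return "ring"


if __name__ == "__main__":
    in_meeting = Percepts(active_speaker="alice", addressee="bob", environment="meeting")
    print(phone_policy(in_meeting, owner="bob"))
```

A real system would work with uncertain, time-varying estimates rather than clean labels, which is why the robustness of the underlying technologies matters so much.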


Dr. Waibel also pointed out that, in addition to integration, more research is needed to make the basic technologies more robust for real scenarios. Real-world scenarios require placing sensors at a distance, and this poses problems for both audio and video algorithms. Speech recognition suffers due to noise from cross talk and because, by its very nature, conversational speech is harder to recognize than read speech. Similar problems arise in video processing: face recognition suffers because of non-frontal poses and uncontrolled illumination conditions.


The CHIL project aims to provide a platform to help various researchers collaborate in a competitive environment to solve such problems. By conducting technological evaluations using a large database, standardized metrics, and benchmark performances, it intends to provide researchers with valuable feedback about their algorithms. Currently, CHIL has 15 partners from 9 countries and has now joined forces with NIST (National Institute of Standards and Technology) in the USA to hold joint workshops. More on this research can be found at http://chil.server.de