26th International Conference on Pattern Recognition

August 21–25, 2022 • Montréal, Québec

Keynote Abstracts

C. V. Jawahar
Professor at the International Institute of Information Technology (IIIT), India.

Towards Multimodality in Perception Tasks

Abstract: A number of perception tasks (especially in vision, language and speech) are solved today with very high accuracy using data-driven techniques. We are now seeing the emergence of a set of more natural tasks that are inherently multimodal (e.g., VQA). They are closer to the way we interact with the world around us or perceive our sensory inputs. As a result, today's AI systems (a.k.a. deep learning architectures) are also becoming increasingly capable of jointly processing inputs from different modalities. In fact, they benefit from processing multiple modalities (e.g., text, speech and visual content) together, yielding superior solutions. Such algorithms are also now discovering interesting correlations across the modalities. In this talk, we focus especially on the interplay between text, speech and visual content in talking-face videos. We present some recent results (including some from our own research) and discuss ongoing trends and the challenges facing the community. For example, how well can lip movements explain the speech that is produced, and vice versa? How does the multimodal nature of our inputs open up new avenues and innovative solutions? Can one modality substitute for or supplement another? Initial results hint at new possibilities in education, healthcare and assistive technologies.

Hatice Gunes
Professor of Affective Intelligence and Robotics (AFAR) and the Head of the AFAR Lab at the University of Cambridge’s Department of Computer Science and Technology, UK. 

Artificial Emotional Intelligence: Quo Vadis?

Abstract: Emotional intelligence for artificial systems is not a luxury but a necessity. It is paramount for many applications that require both short- and long-term engaging human–technology interactions, including entertainment, hospitality, education, and healthcare. However, creating artificially intelligent systems and interfaces with social and emotional skills is a challenging task. Progress in industry and developments in academia give us a positive outlook; however, the artificial emotional intelligence of current technology is still quite limited. Creating technology with artificial emotional intelligence requires the development of perception, learning, action and adaptation capabilities, and the ability to execute these pipelines in real time during human–AI interactions. Truly addressing these challenges relies on cross-fertilization of multiple research fields, including psychology, nonverbal behaviour understanding, psychiatry, vision, social signal processing, affective computing, and human–computer and human–robot interaction. My lab's research has been pushing the state of the art across a wide spectrum of topics in this area, including the design and creation of new datasets; novel feature representations and learning algorithms for sensing and understanding human nonverbal behaviours in solo, dyadic and group settings; designing short- and long-term human–robot interactions for wellbeing; and investigating the bias that creeps into these systems. In this talk, I will present some of my research team's explorations in these areas, including modelling person-specific cognitive processes for personality recognition, continual learning for facial expression recognition, mitigating bias in affect recognition, learning the social appropriateness of robot actions, and creating robotic wellbeing coaches with continual adaptation.

Kristen Grauman
Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR).

Audio-visual learning

Abstract: Perception systems that can both see and hear have great potential to unlock problems in video understanding, augmented reality, and embodied AI.  I will present our recent work in audio-visual (AV) perception.
First, we explore how audio’s spatial signals can augment visual understanding of 3D environments.  This includes ideas for self-supervised feature learning from echoes, AV floorplan reconstruction, and active source separation, where an agent intelligently moves to hear things better in a busy environment.  Throughout this line of work, we leverage our open-source SoundSpaces platform, which allows state-of-the-art rendering of highly realistic audio in real-world scanned environments. 
Next, building on these spatial AV ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike.  Finally, I will overview Ego4D, a massive new egocentric video dataset built via a multi-institution collaboration that supports an array of exciting multimodal tasks.

Marleen de Bruijne
Professor of AI in medical image analysis at Erasmus MC, The Netherlands.

Learning with less in medical imaging

Abstract: Supervised learning approaches have had tremendous success in medical imaging in the past few years. Automated analysis using convolutional neural networks is now in many cases as accurate as the assessment of an expert observer. A major factor still hampering the adoption of these techniques in practice is that it can be very expensive, time-consuming, or even impossible to obtain enough representative, well-annotated training images to train reliable models. On the other hand, weaker labels are often readily available, for instance in the form of a radiologist's assessment of the presence or absence of certain abnormalities. In this talk, we will discuss various approaches to exploit such information and to make machine learning techniques work in real-life situations, where (annotated) training data is limited, available annotations may be wrong, data is highly heterogeneous, and training data may not be representative of the target data to be analyzed. I will present examples from several medical imaging applications.

Dr. Xian-Sheng Hua
Vice President of Alibaba Group and Head of the City Brain Lab, DAMO Academy.

Scalable Real-World Visual Intelligence System – From Algorithm to Platform to Application

Abstract: Visual intelligence is one of the key aspects of Artificial Intelligence. Considerable technological progress has been made along this direction in the past decade. However, how to incubate the right technologies to solve real-world problems at scale and convert them into real business value remains a challenge. In this talk, we will analyze the current challenges of visual intelligence and summarize a few key points that have helped us successfully develop and apply scalable technologies to solve core problems. In particular, we will introduce a few key visual intelligence technologies that have been successfully applied in exemplar application areas, including smart city, industrial vision, visual design, and medical analysis, moving from problem discovery and definition, to key algorithm development, to scalable platform building, and finally to realizing core value in the related applications.

Award Winners

Prof. Tieniu Tan
Institute of Automation, Chinese Academy of Sciences (CASIA), China

Iris Recognition: Progress and Challenges

King-Sun Fu Prize Lecture

Abstract: Iris recognition has proven to be one of the most reliable biometric solutions for personal identification and has received much attention from the pattern recognition community. However, it is far from a solved problem, as many open issues remain to be resolved to make iris recognition more user-friendly and robust. In this talk, I will present an overview of our decades of work on iris recognition, including iris image acquisition, iris image pre-processing, iris feature extraction and the security issues of iris recognition systems. I will discuss our most recent work on light-field iris recognition and all-in-focus simultaneous iris recognition of multiple people at a distance. Examples will be given to demonstrate the successful routine use of our work in a wide range of fields such as mobile payment, banking, access control and welfare distribution. I will also address some of the remaining challenges as well as promising future research directions before closing the talk.

Jiliang Tang
MSU Foundation Professor, Data Science and Engineering Lab, Michigan State University, USA

Graph Neural Networks: Models, Trustworthiness, and Applications

J. K. Aggarwal Prize Lecture

Abstract: Graph Neural Networks (GNNs) have shown their power in graph representation learning. They have advanced numerous recognition and learning tasks in many domains, such as biology and healthcare. In this talk, I will first introduce a novel perspective for understanding and unifying existing GNNs that paves a principled and innovative way to design new GNN models. As GNNs become more pervasive, there is an ever-growing concern over whether they can be trusted; I will therefore discuss how to build trustworthy GNNs. Finally, given that graphs are widely used to represent data in real-world systems, I will demonstrate representative applications of GNNs.

Prof. Yunhong Wang
School of Computer Science and Engineering, Beihang University, China

Towards Practical Biometrics: Face and Gait

Maria Petrou Prize Lecture

Abstract: Biometrics are unique physical or behavioural characteristics that can be used for identification. In the last few years, substantial advancements have been made in this field with the development of deep learning theories and technologies. This is evidenced not only by strong results on large-scale benchmarks but also by attempts to account for soft biometrics, including gender, expression, age, etc. Meanwhile, recent studies reveal additional challenges in uncontrolled conditions, such as severe variations in scale, pose, illumination, occlusion and cluttered backgrounds, which must be handled well for real-world applications. This talk focuses on two typical representatives, face recognition and gait recognition, with dedicated deep-learning-based methodologies designed for practical use, covering tasks from identity recognition to attribute analysis and presenting the latest progress on the interpretability and robustness of deep neural networks. Finally, some perspectives are discussed to facilitate future research.