FEATURE:
ICPR 2010 Plenary Talk

Embracing Uncertainty: The New Machine Intelligence
By Christopher M. Bishop (UK)
Reviewed by Cem Keskin (Turkey)

Prof. Bishop began his talk with an overview of the major changes in approaches to machine intelligence over the last few decades. Specifically, he described the shift from data-driven standalone applications towards cloud computing on very large, distributed databases. He said that services are now replacing applications, and that diverse data sources are being fused in place of isolated databases. Most importantly, hand-crafted solutions to machine intelligence problems are being replaced by solutions learned from data sets. To illustrate the increasing importance of data, he showed a figure visualizing the growth of stored data over the years. According to the figure, there were 280 exabytes of stored data in 2008, and the amount doubles every 18 months.
Before turning to what he calls the new age of machine intelligence, Prof. Bishop briefly reviewed its history, starting with first-generation machine intelligence, which began in the 1960s and ended in the 1980s. The main approach in this era relied on human experts and their ability to define rules describing the system. Although researchers at the time were optimistic about the progress of machine intelligence, the combinatorial explosion of rules required for more complex systems proved too hard to deal with. According to Prof. Bishop, the second generation of machine intelligence, which started in the 1990s and has not yet been abandoned, made use of statistical tools such as neural networks and support vector machines. The general idea has been to collect positive samples and to train a system using these tools. This system could also be adapted somewhat to the end user through a final phase of fine-tuning. The main disadvantage of these methods, he said, is the difficulty of incorporating complex domain knowledge. He showed some basic examples of why prior knowledge is important, and then called these methods black-box statistical models.
The aim of the third generation is to integrate domain knowledge with statistical learning methods. Prof. Bishop said that there are three key ideas. The first is to use probability distributions to model uncertainty, i.e., Bayesian learning, which iteratively updates the uncertainty as new knowledge is introduced to the system. The second is to use probabilistic graphical models, which are especially well suited to representing domain knowledge. Most well-known models and methods, such as Kalman filters, hidden Markov models, principal component analysis, and factor analysis, fall into this category. The final key idea is to use efficient inference methods. At this point he showed some basic examples of how a reformulation of terms can lead to huge speed gains, possibly changing the time complexity from exponential to polynomial.
Bayesian methods usually give an answer by integrating over the uncertainty, which is not always possible, as integrating the true distributions associated with a problem can be intractable. A common solution, Prof. Bishop said, is to use Monte Carlo methods, but these are very costly. Therefore, approximate methods are usually employed, such as variational message passing, loopy belief propagation, and expectation propagation, which are not exact but have good accuracy. He then demonstrated these approximate methods on a toy problem.
Prof. Bishop gave Bayesian ranking as an example of a real-world problem: estimating a global ranking from noisy partial rankings. He showed that, by employing an approximate method, they managed to adapt the system to 20 million active users in multiple teams. This system, called TrueSkill, converges to the correct result an order of magnitude faster than its exact counterpart. Another case study Prof. Bishop showed involved search engines and the number of clicks an ad would receive.
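Returning briefly to the third key idea, the remark that a reformulation of terms can turn exponential cost into polynomial cost can be illustrated on a chain-structured graphical model. The sketch below is not taken from the talk: the chain model, the random factor tables, and the function names (`marginal_brute_force`, `marginal_message_passing`, `make_chain`) are all assumptions made for illustration. It computes the marginal of the last variable twice, once by summing over every joint assignment and once by pushing each sum inside the product (the distributive law), which is the basic step behind message passing.

```python
import itertools
import numpy as np

def marginal_brute_force(unary, pairwise):
    """Marginal of the last variable by enumerating all K**N joint states."""
    n, k = unary.shape
    marginal = np.zeros(k)
    for states in itertools.product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i, s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i], states[i + 1]]
        marginal[states[-1]] += weight
    return marginal / marginal.sum()

def marginal_message_passing(unary, pairwise):
    """Same marginal with each sum pushed inside the product: O(N * K**2)."""
    message = unary[0].copy()
    for i in range(len(unary) - 1):
        # Sum out variable i, then absorb the local factor of variable i + 1.
        message = (message @ pairwise[i]) * unary[i + 1]
        # Rescaling only guards against underflow; it does not change the result.
        message /= message.sum()
    return message / message.sum()

def make_chain(n=8, k=3, seed=0):
    """Hypothetical chain with random positive factors (illustrative only)."""
    rng = np.random.default_rng(seed)
    unary = rng.random((n, k)) + 0.1
    pairwise = rng.random((n - 1, k, k)) + 0.1
    return unary, pairwise

if __name__ == "__main__":
    unary, pairwise = make_chain()
    print(marginal_brute_force(unary, pairwise))
    print(marginal_message_passing(unary, pairwise))
```

Both functions return the same distribution, but the first visits K^N assignments while the second performs N - 1 small matrix-vector products. On graphs with loops this exact trick no longer applies directly, which is where approximate schemes such as loopy belief propagation come in.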
Basically, the system tries to estimate the number of clicks an ad would receive if it were shown on a page for specific keywords. This problem has an interesting property: the ad must first be shown in order to collect data about it, which gives rise to the exploration vs. exploitation trade-off. Prof. Bishop showed that their system achieved remarkable results. Finally, Prof. Bishop mentioned Infer.NET, a framework they developed for running Bayesian inference in graphical models. The framework can be used to solve many different kinds of machine learning problems. He then summarized his talk and finished by answering questions from the audience.
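The exploration vs. exploitation trade-off in the ad example can be made concrete with a small simulation. The review does not say how the system in the talk resolves the trade-off, so the sketch below swaps in a standard technique, Thompson sampling with a Beta posterior over each ad's click probability, purely as an illustration. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the click rates in the usage example are all assumptions, not details from the talk.

```python
import numpy as np

def thompson_sampling(true_ctr, n_rounds=5000, seed=0):
    """Simulate ad selection with a Beta-Bernoulli model per ad.

    true_ctr holds hypothetical click probabilities that the learner never
    sees directly; it only observes click / no-click for the ad it shows.
    """
    rng = np.random.default_rng(seed)
    n_ads = len(true_ctr)
    alpha = np.ones(n_ads)  # 1 + observed clicks (Beta(1, 1) prior, assumed)
    beta = np.ones(n_ads)   # 1 + observed non-clicks
    shown = np.zeros(n_ads, dtype=int)

    for _ in range(n_rounds):
        # Draw one plausible click rate per ad from its current posterior.
        # Uncertain ads sometimes draw high values, so they still get shown
        # (exploration); well-estimated good ads win most rounds (exploitation).
        sampled_rates = rng.beta(alpha, beta)
        ad = int(np.argmax(sampled_rates))

        clicked = bool(rng.random() < true_ctr[ad])
        if clicked:
            alpha[ad] += 1.0
        else:
            beta[ad] += 1.0
        shown[ad] += 1

    posterior_mean = alpha / (alpha + beta)
    return shown, posterior_mean

if __name__ == "__main__":
    shown, posterior_mean = thompson_sampling([0.02, 0.05, 0.20])
    print("times shown:", shown)
    print("estimated click rates:", posterior_mean)
```

Each update is the Bayesian step described in the talk's first key idea: the posterior after one impression becomes the prior for the next. Because rarely shown ads keep wide posteriors, their estimates stay rough, which is the price paid for concentrating impressions on the ad that appears best.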

Professor Chris Bishop is Chief Research Scientist at Microsoft Research Cambridge. He also holds a Chair in computer science at the University of Edinburgh and is a Fellow of Darwin College, Cambridge. Chris is the author of the leading textbook “Pattern Recognition and Machine Learning” (Springer, 2006). His research interests include probabilistic approaches to machine learning, as well as their application to fields such as biomedical sciences and healthcare.