The monograph is mainly dedicated to the mathematical proofs of a few new theorems established by the author, together with a mathematical framework that is new to statistical learning: algebraic geometry. I should say that I review it not from the point of view of an expert, but from that of a practitioner in the pattern recognition field. However, the book does not explicitly illustrate applications to real pattern recognition systems. I would mostly recommend it to mathematicians who want to get involved in the pattern recognition world or to formalize ideas about the convergence or singularity issues of their algorithms.

The author is an experienced researcher in the field and has developed an original theory of how singularities are detected and handled in machine learning processes. This theory aims to analyze zeta functions, Schwartz distributions, empirical processes, and statistical learning together by means of algebraic geometry. Essentially, the outcome of the book is the proof of four new theorems in this field. In doing so, the author extends the scope of regular machine learning towards the foundations of an innovative singular learning theory.

In this new framework, linear algebra is replaced by the algebra of rings and ideals, parameter estimation by probability distribution estimation, the information criterion AIC by an equation of state, and, above all, differential geometry by algebraic geometry. Likewise, the central limit theorem and the maximum likelihood estimator no longer apply as they do in the regular case. And in a sense, almost all machine learning systems, such as artificial neural networks, mixture models, and hidden Markov models, behave as singular machines. This is why the theory might be of utmost importance even for mere practitioners.

The book is organized into eight chapters. Chapter 1 is a brief overview of classical statistical learning concepts and methods. It also introduces the four main formulas of the new theory that Sumio Watanabe develops on the basis of algebraic geometry. For instance, the log density ratio function of any statistical model can be written in a common standard form based on resolution of singularities, and the generalization and training errors in the maximum likelihood method are symmetrical. Chapter 2 lays the foundations of singularity theory, and Chapter 3 gives a brief description of algebraic geometry tools. Chapters 4 and 5 deal with zeta function analysis and the convergence in law of empirical processes, leading in Chapter 6 to thorough proofs of the four formulas. Applications of singular learning theory to information science are presented in the last two chapters.

It is interesting to note that regular statistical theory was constructed by R.A. Fisher in 1925, but its extension to singular statistics has yet to be accepted in our community. I believe that this monograph can contribute to this effort. However, despite all the efforts to make it accessible to non-mathematician readers, practitioners and researchers in the field of pattern recognition will need a lot of time to go through all the proofs and master the new theory. To me, its real value for practical applications remains to be proved. To sum up, I suspect that this book could be of utmost importance for predicting the behavior of an information system, but it will need a thorough investigation by a small, dedicated team worldwide.

Algebraic Geometry and Statistical Learning Theory


by Sumio Watanabe

Cambridge University Press

Monographs on Applied and Computational Mathematics Series


Reviewed by

Nicolas Loménie (Singapore)
