Information Theory, Inference,

and Learning Algorithms


By  David J. C. Makay


Reviewed by Jason Dowling ( )


Click here for Top of Page
Right Arrow: Next
Right Arrow: Previous

In this 628-page book, Professor David Mackay, from the University of Cambridge, has combined information theory and inference in an entertaining and thorough manner.  The book covers many topics: teaching roadmaps are provided for it’s use as a course text for pattern recognition, information theory and neural networks; introductory information theory, error correcting codes; and machine learning.  It does assume a certain level of knowledge (probably first or second year mathematical statistics and calculus would be sufficient).


One of the best features of this book is the author’s web site at .  Not only is the full text available for download, but also there are additional exercises, chapters, errata, software and other interesting topics (even a comparison with Harry Potter).  It also introduced me to the open-source Matlab-like Octave language (available from ).


Information Theory, Inference, and Learning Algorithms (ITILA) consists of 50 short chapters: an introductory section followed by seven main parts.  The first three introductory chapters present an overview of information theory (focusing on error correcting codes), probability, entropy and inference.  The last of the introductory chapters discusses the use of Bayes theorem for inference and provides a comparison with traditional sampling theory statistics.  If you are after an understanding of ANOVA or linear regression, this book is not a good choice. The chapter concludes with the sentence “The p-values and ‘significance levels’ of classical statistics should be treated with extreme caution. Shun them!  Here ends the sermon”!


The first main part of the book, “Data compression”, covers the source coding theorem (measuring information content and its application to lossy compression), symbol codes (lossless compression, including Huffman coding), stream codes (arithmetic codes and the Lempel-Ziv algorithms), and concludes with a short chapter on codes for integers.


Part two focuses on noisy-channel coding.  Topics covered include dependent random variables (entropy and mutual information), communication over a noisy channel, proofs for the noisy-channel coding theorems, and error-correcting codes (capabilities and Gaussian channels).


The third part of the book explains hash codes and collision resolution, binary codes (Hamming distance, perfect and dual codes), proof of the source coding and noisy channel coding theorems, data compression by linear hash codes, message passing, communication over constrained noiseless channels, and language modeling (crosswords and code-breaking).  The final chapter in this part (entitled, “Why have sex?”) is an entertaining and interesting discussion on information and evolution.


The longest part of the book is the fourth, which is focused on probabilities and inference.  The first chapter presents a good explanation of K-means clustering.  This is followed by chapters on exact inference by complete enumeration, maximum likelihood and clustering, useful probability distributions (binomial, Poisson and exponential), exact marginalization (including trellises and graphs), Laplace’s Method, Occam’s razor, Monte Carlo Methods, Ising models, variational methods, independent component analysis (ICA), decision theory, and finally a chapter comparing Bayesian inference and sampling theory.


In part five, Mackay moves on to neural networks.  After an introductory chapter, the author explains the single neuron as a classifier and it’s capacity, learning as inference, Hopfield networks, Boltzmann machines, supervised learning in multiplayer networks, Gaussian processes, and finally presents an interesting detour into image deconvolution.


Part six covers sparse graph codes.  Three families of error-correction codes are described (low-density parity-checks codes, turbo codes, and repeat-accumulate codes), followed by a chapter on digital fountain codes for erasure correction.


The final part of ITILA are the appendices.  An excellent glossary of notation is included, followed by an appendix on phase transitions in physics, and finally an appendix describing finite theory, eigenvalues and eigenvectors, and, lastly, perturbation theory.


I thought this book was excellent: carefully written with good illustrations and appropriate doses of humor.  I did not notice any editing errors.  I liked the introductions to chapters, which described what I should have read previously.  And I also liked the margin notes, which offered useful comments, and links to previously discussed topics.   Exercises were provided as an integral part of the text and presented a continuous development and explanation of the chapter contents.  Many questions have well-written answers in the text.  Having said that, Chapter 15 completely consists of questions without answers!


I think this book would be very useful for students studying computer science and electrical engineering, or for anyone interested in information theory, machine learning, signal processing, data mining, pattern recognition, or cognitive science.  But don’t take my word for it: download a free copy for your laptop before buying a copy for your bookshelf!

Click above to go to the publisher’s web page where you see a Description of the book, the Table of Contents, an Excerpt, the Index, Copyright information, and Frontmatter. 

Book Reviews Published in

the IAPR Newsletter


Dynamic Vision for Perception and Control of Motion

by Dicmkanns

             (see review in this issue)



by Polanski and Kimmel

             (see review in this issue)


Introduction to clustering large and high-dimensional data

by Kogan

             (see review in this issue)


The Text Mining Handbook

by Feldman and Sanger

             (see review in this issue)


Geometric Tomography

by Gardner

           Oct ‘07   [html]     [pdf]


“Foundations and Trends in Computer Graphics and Vision”

Curless, Van Gool, and Szeliski., Editors

           Oct ‘07   [html]     [pdf]


Applied Combinatorics on Words

by M. Lothaire

           Jul ‘07    [html]     [pdf]



Human Identification Based on Gait

by Nixon, Tan and Chellappar

             Apr ‘07   [html]     [pdf]


Mathematics of Digital Images

by Stuart Hogan

             Apr ‘07   [html]     [pdf]


Advances in Image and Video Segmentation

Zhang, Editor

             Jan ‘07 [html]      [pdf]


Graph-Theoretic Techniques for Web Content Mining

by Schenker, Bunke, Last and Kandel

             Jan ‘07 [html]      [pdf]


Handbook of Mathematical Models in Computer Vision

by Paragios, Chen, and Faugeras (Editors)

           Oct ‘06     [html]     [pdf]


The Geometry of Information Retrieval

by van Rijsbergen

           Oct ‘06     [html]     [pdf]


Biometric Inverse Problems

by Yanushkevich, Stoica, Shmerko and Popel

           Oct ‘06     [html]     [pdf]


Correlation Pattern Recognition

by Kumar, Mahalanobis, and Juday

           Jul. ‘06     [html]     [pdf]


Pattern Recognition 3rd Edition

by Theodoridis and Koutroumbas

           Apr. ‘06    [html]     [pdf]


Dictionary of Computer Vision and

Image Processing

by R.B. Fisher, et. Al

           Jan. ‘06    [html]     [pdf]


Kernel Methods for Pattern Analysis

by Shawe-Taylor and Cristianini

           Oct. ‘05    [html]     [pdf]


Machine Vision Books

           Jul. ‘05     [html]     [pdf]


CVonline:  an overview

           Apr. ‘05    [html]     [pdf]


The Guide to Biometrics by Bolle, et al

           Jan. ‘05    [html]     [pdf]


Pattern Recognition Books

           Jul. ‘04                  [pdf]