Books 5

BOOKSBOOKSBOOKS

Information Theory, Inference,

and Learning Algorithms

By David J. C. Makay

Reviewed by Jason Dowling ( j.dowling@ieee.org )

Click here for Top of Page

Return to Home Page

Click here for

Free Books!

In this 628-page book, Professor David Mackay, from the University of Cambridge, has combined information theory and inference in an entertaining and thorough manner. The book covers many topics: teaching roadmaps are provided for it’s use as a course text for pattern recognition, information theory and neural networks; introductory information theory, error correcting codes; and machine learning. It does assume a certain level of knowledge (probably first or second year mathematical statistics and calculus would be sufficient).

One of the best features of this book is the author’s web site at www.inference.phy.cam.ac.uk/mackay/itila/ . Not only is the full text available for download, but also there are additional exercises, chapters, errata, software and other interesting topics (even a comparison with Harry Potter). It also introduced me to the open-source Matlab-like Octave language (available from www.gnu.org/software/octave/ ).

Information Theory, Inference, and Learning Algorithms (ITILA) consists of 50 short chapters: an introductory section followed by seven main parts. The first three introductory chapters present an overview of information theory (focusing on error correcting codes), probability, entropy and inference. The last of the introductory chapters discusses the use of Bayes theorem for inference and provides a comparison with traditional sampling theory statistics. If you are after an understanding of ANOVA or linear regression, this book is not a good choice. The chapter concludes with the sentence “The p-values and ‘significance levels’ of classical statistics should be treated with extreme caution. Shun them! Here ends the sermon”!

The first main part of the book, “Data compression”, covers the source coding theorem (measuring information content and its application to lossy compression), symbol codes (lossless compression, including Huffman coding), stream codes (arithmetic codes and the Lempel-Ziv algorithms), and concludes with a short chapter on codes for integers.

Part two focuses on noisy-channel coding. Topics covered include dependent random variables (entropy and mutual information), communication over a noisy channel, proofs for the noisy-channel coding theorems, and error-correcting codes (capabilities and Gaussian channels).

The third part of the book explains hash codes and collision resolution, binary codes (Hamming distance, perfect and dual codes), proof of the source coding and noisy channel coding theorems, data compression by linear hash codes, message passing, communication over constrained noiseless channels, and language modeling (crosswords and code-breaking). The final chapter in this part (entitled, “Why have sex?”) is an entertaining and interesting discussion on information and evolution.

The longest part of the book is the fourth, which is focused on probabilities and inference. The first chapter presents a good explanation of K-means clustering. This is followed by chapters on exact inference by complete enumeration, maximum likelihood and clustering, useful probability distributions (binomial, Poisson and exponential), exact marginalization (including trellises and graphs), Laplace’s Method, Occam’s razor, Monte Carlo Methods, Ising models, variational methods, independent component analysis (ICA), decision theory, and finally a chapter comparing Bayesian inference and sampling theory.

In part five, Mackay moves on to neural networks. After an introductory chapter, the author explains the single neuron as a classifier and it’s capacity, learning as inference, Hopfield networks, Boltzmann machines, supervised learning in multiplayer networks, Gaussian processes, and finally presents an interesting detour into image deconvolution.

Part six covers sparse graph codes. Three families of error-correction codes are described (low-density parity-checks codes, turbo codes, and repeat-accumulate codes), followed by a chapter on digital fountain codes for erasure correction.

The final part of ITILA are the appendices. An excellent glossary of notation is included, followed by an appendix on phase transitions in physics, and finally an appendix describing finite theory, eigenvalues and eigenvectors, and, lastly, perturbation theory.

I thought this book was excellent: carefully written with good illustrations and appropriate doses of humor. I did not notice any editing errors. I liked the introductions to chapters, which described what I should have read previously. And I also liked the margin notes, which offered useful comments, and links to previously discussed topics. Exercises were provided as an integral part of the text and presented a continuous development and explanation of the chapter contents. Many questions have well-written answers in the text. Having said that, Chapter 15 completely consists of questions without answers!

I think this book would be very useful for students studying computer science and electrical engineering, or for anyone interested in information theory, machine learning, signal processing, data mining, pattern recognition, or cognitive science. But don’t take my word for it: download a free copy for your laptop before buying a copy for your bookshelf!

Click above to go to the publisher’s web page where you see a Description of the book, the Table of Contents, an Excerpt, the Index, Copyright information, and Frontmatter.