by Andrzej Polanski and Marek Kimmel
Reviewed by: Scott F. Smith,
Boise State University
This book is written from the point of view of applying the techniques of Computational Statistics to application areas in Bioinformatics. The authors are university researchers with backgrounds in Statistics and Computer Science, and readers with similar backgrounds who want to tackle problems in the biological sciences are the ones most likely to benefit from reading this book.
The book is divided into two parts, nearly equal in size. The first part is a solid review of mathematical and computational methods, focusing on those that are frequently used in common bioinformatics applications. While most of this material will be very familiar to most readers, it is useful to have the details compiled concisely in a single place. Topics covered in this part include: maximum likelihood, statistical testing, Markov models, sorting, string searches, classification, clustering, optimization, and dynamic programming.
The second part, Applications, is the real reason for reading this book. Bioinformatics is a very broad field and any attempt at comprehensive coverage of topics would result in many hundreds of pages of superficial discussion. The authors have chosen to focus on a few of the most commonly used and very well researched topics. These topics are drawn from: sequence alignment, phylogenetics, genomics, proteomics, RNA structure, and microarray analysis. With a few exceptions, the most relevant aspects of these topics are covered at a level of detail and length of discussion that seem appropriate.
The chapter on sequence alignment has a good presentation of the material involved in optimal pair-wise sequence alignment using dynamic programming as exemplified by the Smith-Waterman algorithm. However, by far the most commonly used pair-wise alignment method is the suboptimal, but much faster, BLAST algorithm. Many other sequence analysis tasks in bioinformatics use the basic heuristics underlying BLAST to make the algorithms fast enough to be useful. It seems that devoting a few pages to these heuristics would have been helpful, even if the ideas do not have the mathematical beauty of dynamic programming. The concept of multiple alignment is briefly addressed in the final section of the chapter, but without enough detail to do anything with it. This is a shame, since most of the really interesting and difficult problems in sequence alignment involve multiple alignment.
The chapter on RNA is much too short and appears to be an afterthought. The entire topic of non-coding RNA gene search is missing. The use of covariance models for gene search would seem to have fit well here, since they are an extension of hidden Markov models, which are discussed at length elsewhere in the book. The coverage of RNA secondary-structure prediction using the Zuker algorithm is adequate, but the significant difficulties associated with pseudoknots, while mentioned, are not adequately dealt with.
The book ends with a chapter on Internet resources. This chapter is woefully inadequate and would have been better left out. In general, it seems that the best place for references to Internet resources is on the web itself where the lack of space limitations allows one to be comprehensive and where the links can be added, updated, or removed as the resources evolve.
The book does have end-of-chapter exercises. These are fairly well developed in the first part of the book (the mathematical and computational foundations). However, the exercises in the second part generally do not have very detailed descriptions. Using these problems for graded homework assignments would be difficult unless the instructor adds considerable detail as to exactly what is expected.
One disappointing aspect of this book is that it was not proofread as thoroughly as one might like. There are numerous instances of spelling errors, incorrect punctuation, and incorrect words (such as then/than substitutions). These generally do not keep the information being conveyed from getting across, but they are definitely distracting.
Overall, this is a useful book for computationally minded individuals who are looking to move into the bioinformatics field. As a textbook, it would be OK (but there are also very few good alternatives). For biologists wanting to move into the computational part of the field, this book would likely be a very hard read since there is a lot of presumed specific background knowledge associated with the statistical and computer science communities.