Introduction to clustering large and high-dimensional data
by Jacob Kogan
Cambridge University Press, 2007
Reviewed by: Nicolas Loménie
Although the book is entitled Introduction to clustering large and high-dimensional data, it focuses on the k-means numerical scheme and text mining applications. At first glance, one might consider it a challenge to write an interesting 200-page book with 149 references on a subject as narrow as the k-means algorithm. However, in the course of my research, I have come to apply the k-means scheme to stereoscopic data for visual computing in many more ways than those generally accepted in the computer science community. Arguing with colleagues who are experts in data mining and classification, I have claimed that the k-means scheme is too often reduced to its basic, primal formulation as a quadratic distance-based algorithm used to discover structures. To me, the k-means scheme is much more general and subtle. And that is exactly the topic of this book.
This book consists of 11 chapters. Each chapter ends with thorough bibliographic notes and references. Chapter 1 introduces the topic of the book: clustering of sparse data in high-dimensional space, especially for document retrieval. Chapter 2 deals with the classic formulation of the quadratic k-means algorithm in Euclidean spaces. Chapter 3 is a brief chapter dedicated to the BIRCH algorithm, which operates on large amounts of data when the available memory is limited. Chapter 4 deals with the spherical k-means algorithm, an adaptation of the k-means scheme to a particular space (a hypersphere) embedded in the Euclidean space and usually adopted in document retrieval applications. Chapters 5 to 8 broaden the classic quadratic k-means scheme to various formulations, demonstrating that this numerical scheme has broader applicability than is usually conveyed in lectures or even research papers. Chapter 9 moves on to the assessment of clustering results. Finally, Chapters 10 and 11 provide an interesting appendix on optimization and linear algebra background, and solutions to selected problems and exercises posed in the preceding chapters.
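To make the distinction between the quadratic scheme of Chapter 2 and the spherical scheme of Chapter 4 concrete, here is a minimal sketch. It is not taken from the book: the function name `kmeans`, its parameters, and the toy data are assumptions made for illustration, and the code is the plain batch iteration (assign, then recompute centroids) rather than any refinement the book may discuss.

```python
import math

def _normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def _mean(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, centroids, spherical=False, max_iter=100):
    """Batch k-means from given initial centroids (hypothetical helper).

    spherical=False: assign each point to the centroid at the smallest
                     squared Euclidean distance.
    spherical=True:  project points and centroids onto the unit sphere and
                     assign by the largest dot product (cosine similarity).
    Returns (labels, centroids).
    """
    if spherical:
        points = [_normalize(p) for p in points]
        centroids = [_normalize(c) for c in centroids]
    else:
        centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            if spherical:
                # Negated similarity, so that "smaller is better" in both modes.
                costs = [-sum(a * b for a, b in zip(p, c)) for c in centroids]
            else:
                costs = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(costs.index(min(costs)))
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                m = _mean(members)
                centroids[j] = _normalize(m) if spherical else m
    return labels, centroids

# Two directions, each with one long and one short vector.
points = [[10.0, 1.0], [1.0, 0.1], [0.1, 1.0], [1.0, 10.0]]
init = [[10.0, 1.0], [0.1, 1.0]]
print(kmeans(points, init)[0])                  # [0, 1, 1, 1]
print(kmeans(points, init, spherical=True)[0])  # [0, 0, 1, 1]
```

On this toy data the quadratic variant groups the short vector [1.0, 0.1] with the other vectors near the origin, while the spherical variant groups vectors by direction, which is the behavior usually wanted for length-normalized document vectors.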
The author is a professor in the Department of Mathematics at the University of Maryland, Baltimore. Accordingly, the book is a formal treatment of the topic with numerous definitions, theorems and lemmas. It also provides many numerical experiments and discussions with simple examples to clarify the behavior of this stimulating scheme. Hence, this book may serve as a useful reference for scientists and engineers who need to understand the concepts of clustering in general and/or to focus on text mining applications. It is also appropriate for students who are attending a course in pattern recognition, data mining, or classification and are interested in learning more about issues related to the k-means scheme for an undergraduate or master's thesis project. Finally, it supplies very interesting material for instructors.
To improve the second edition, I would suggest:
- giving many more ready-to-implement algorithms in pseudocode, or at least making the existing ones more visible in the text;
- providing many more references to work from the pattern recognition and computer science communities, which have been facing these issues as well: the book by J. Bezdek et al. on fuzzy clustering, for example.
From a general point of view, it is interesting to note that, even in such a narrow scientific area, the community does not use a common vocabulary; for instance, the term "fuzzy" hardly appears even once in this book!