The Text Mining Handbook


by Ronen Feldman and James Sanger

Cambridge University Press, 2007


Reviewed by L. Venkata Subramaniam


Text mining today covers a broad range of topics. This handbook gives a high-level perspective on the field by covering many of the important ones. It is aimed at a wide audience of students, academic researchers, and professional practitioners.


The first two chapters of the book introduce text mining and the operations it involves. Chapter I presents text mining definitions and the general architecture of a text mining system. Chapter II presents core text mining operations, covering various pattern-discovery algorithms.


The next six chapters present basic preprocessing techniques in text mining. Chapter III gives an extremely brief introduction to linguistic preprocessing techniques. Chapter IV covers text categorization, Chapter V looks at text clustering, and Chapter VI covers information extraction (IE); these chapters cover the main definitions and techniques. Chapter VII covers probabilistic models for information extraction, and Chapter VIII applies the models from the previous chapter to different IE tasks. In particular, these two chapters treat hidden Markov models, stochastic context-free grammars, and maximum entropy models from a mathematical perspective and show their application to IE.


The next two chapters cover the user interface part of text mining systems. Chapter IX looks at aspects of browsing large text collections. Chapter X covers visualization approaches for viewing document collections and the results of various text mining operations on them.


Chapter XI covers link analysis and presents techniques for analyzing large networks of entities. The first eight chapters discuss how entities can be extracted from text; this chapter focuses on finding specific patterns within the network of entities.


Finally, Chapter XII presents real-world applications: text mining systems in the areas of corporate finance, patent research, and life sciences.


The Appendix explains DIAL (declarative information analysis language), a dedicated information extraction language.


Notes at the end of each chapter discuss related work. These are very helpful for placing the chapter in context and for looking up related work to gain a better understanding. There is a common bibliography at the end of the book.


One topic that I think the authors should have covered, but did not cover at all, is text mining in the presence of noise. Real-world user-generated text is noisy, and today it is important to deal with it. Blogs, newsgroup postings, emails, and other spontaneously written text, found in abundance, are very noisy. Further, there is also deliberately added noise in the form of spam and splogs. As a text mining practitioner, I would have liked to see some coverage of this. But that is something for the authors to add in the next edition.


In their preface, the authors mention that they have tried to blend theory and practice by providing many real-life scenarios that show how the different techniques are used. I think they have largely succeeded. They have addressed the needs of both developers and users of text mining systems.


My recommendation to readers is to buy the book. It is definitely worth having on your bookshelf as a handy reference.



Book Reviews Published in the IAPR Newsletter


Dynamic Vision for Perception and Control of Motion

by Dickmanns

             (see review in this issue)



by Polanski and Kimmel

             (see review in this issue)


Introduction to Clustering Large and High-Dimensional Data

by Kogan

             (see review in this issue)


Information Theory, Inference, and Learning Algorithms

by MacKay

                 (see review in this issue)


Geometric Tomography

by Gardner

           Oct ‘07   [html]     [pdf]


“Foundations and Trends in Computer Graphics and Vision”

Curless, Van Gool, and Szeliski, Editors

           Oct ‘07   [html]     [pdf]


Applied Combinatorics on Words

by M. Lothaire

           Jul ‘07    [html]     [pdf]



Human Identification Based on Gait

by Nixon, Tan and Chellappa

             Apr ‘07   [html]     [pdf]


Mathematics of Digital Images

by Stuart Hoggar

             Apr ‘07   [html]     [pdf]


Advances in Image and Video Segmentation

Zhang, Editor

             Jan ‘07 [html]      [pdf]


Graph-Theoretic Techniques for Web Content Mining

by Schenker, Bunke, Last and Kandel

             Jan ‘07 [html]      [pdf]


Handbook of Mathematical Models in Computer Vision

by Paragios, Chen, and Faugeras (Editors)

           Oct ‘06     [html]     [pdf]


The Geometry of Information Retrieval

by van Rijsbergen

           Oct ‘06     [html]     [pdf]


Biometric Inverse Problems

by Yanushkevich, Stoica, Shmerko and Popel

           Oct ‘06     [html]     [pdf]


Correlation Pattern Recognition

by Kumar, Mahalanobis, and Juday

           Jul. ‘06     [html]     [pdf]


Pattern Recognition 3rd Edition

by Theodoridis and Koutroumbas

           Apr. ‘06    [html]     [pdf]


Dictionary of Computer Vision and Image Processing

by R.B. Fisher, et al.

           Jan. ‘06    [html]     [pdf]


Kernel Methods for Pattern Analysis

by Shawe-Taylor and Cristianini

           Oct. ‘05    [html]     [pdf]


Machine Vision Books

           Jul. ‘05     [html]     [pdf]


CVonline:  an overview

           Apr. ‘05    [html]     [pdf]


The Guide to Biometrics by Bolle, et al.

           Jan. ‘05    [html]     [pdf]


Pattern Recognition Books

           Jul. ‘04                  [pdf]