Dr. Qiang Huo
Microsoft Research Asia
Title: OneOCR For Digital Transformation
Abstract: Optical Character Recognition (OCR), or more broadly Document Analysis and Recognition (DAR), is an important enabling technology that empowers people and organizations to do more and achieve more. In a mobile-first world, we have cameras everywhere, which makes “OCR in the wild” very common in our everyday life. At Microsoft, we have been developing a new-generation OCR engine (aka OneOCR), which can detect both printed and handwritten text in an image captured by a camera or mobile phone, and recognize the detected text for follow-up actions. Our unified OneOCR engine can recognize mixed printed and handwritten English text lines with arbitrary orientations (even flipped), significantly outperforming other leading industrial OCR engines on a wide range of application scenarios such as document, invoice, receipt, business card, slide, menu, book cover, poster, GIF/MEME, street view, product label, handwritten note and whiteboard. Powered by the OneOCR engine, the Computer Vision Read capability and the Cognitive Search capability of Azure Search are generally available, and a Form Recognizer with Receipt Understanding capability is available for preview, all in Azure Cognitive Services, to democratize OCR technologies. In this keynote talk, I will demonstrate the capabilities of Microsoft’s latest OneOCR engine, highlight its core component technologies, and explain the roadmap ahead. I will argue that now is the best time for the ICDAR community to make a big impact by developing better technologies and solutions for page object (especially table) detection, table structure recognition, and extraction of entities and key-value pairs in forms and receipts, which can power enterprise workflows and Robotic Process Automation (RPA) to spur digital transformation.
Bio: Dr. Qiang Huo is a Partner Research Manager of the Speech Group at Microsoft Research Asia (MSRA), Beijing, China. Prior to joining MSRA in August 2007, he had been a faculty member at the Department of Computer Science, The University of Hong Kong, for about ten years. Many of his students have become leaders in both academia and industry. From 1995 to 1997, Dr. Huo worked on speech recognition for the world’s first spoken language translation system at the Advanced Telecommunications Research Institute (ATR) in Kyoto, Japan. Over the past 30 years, he has been doing research and making fundamental contributions in the areas of speech recognition, handwriting recognition, OCR, gesture recognition, biometric-based user authentication, and hardware design for speech and image processing. Many core technologies developed by his teams have been deployed widely in industry, including in Microsoft’s products and services such as Windows, Office, Azure Cognitive Services, and Bing.
Prof. Enrique Vidal
Universidad Politecnica de Valencia
Title: Text Search and Information Retrieval in Large Historical Collections of Untranscribed Manuscripts
Abstract: Despite recent great advances in handwritten text recognition technology, accurate transcription of large historical manuscript collections remains elusive. In most cases, however, transcripts are only or mainly needed to enable textual search in the documents considered. In this talk we show how plain-text search and many other usual tasks of Information Retrieval and big-data Text Analytics can be accomplished without any previous explicit transcription of the manuscript images.
To this end, some years ago we drew on Lexicon-Free, Word-Segmentation-Free, Query by String, Keyword Spotting concepts and ideas to develop a Probabilistic Indexing approach aimed at supporting arbitrary textual queries on unconstrained text images. In this approach, a layout-agnostic, pixel-level “heat map” (called a posteriorgram) is produced for each text image and each character string that proves sufficiently likely to constitute a real word written in the image. Posteriorgrams are huge, but they are simplified and pruned into manageable lists of promising character-string hypotheses, along with their corresponding image locations (bounding boxes) and probabilities. Finally, these lists are indexed to allow extremely efficient confidence-threshold-controlled text search and retrieval at query time.
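The last two steps (pruned hypothesis lists, then threshold-controlled search) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the speakers' implementation: the class `ProbabilisticIndex`, the record `Spot`, the page identifiers, and all probabilities below are hypothetical, and the posteriorgram computation and pruning are assumed to have happened already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """One pruned hypothesis: where a word may appear, and how likely."""
    page: str
    bbox: tuple   # (x, y, width, height) in pixels
    prob: float   # probability that the word is written in this box


class ProbabilisticIndex:
    """Hypothetical inverted index from character strings to spots."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, word, page, bbox, prob):
        # Index under a case-folded key so queries are case-insensitive.
        self._postings[word.lower()].append(Spot(page, bbox, prob))

    def search(self, word, threshold=0.5):
        # Keep only hypotheses at or above the confidence threshold,
        # most confident first.
        hits = [s for s in self._postings.get(word.lower(), [])
                if s.prob >= threshold]
        return sorted(hits, key=lambda s: s.prob, reverse=True)


# Made-up entries: two competing hypotheses share almost the same box.
index = ProbabilisticIndex()
index.add("carta", "page_001", (120, 340, 90, 28), 0.92)
index.add("corte", "page_001", (125, 338, 88, 30), 0.08)
index.add("carta", "page_007", (40, 95, 85, 30), 0.35)

strict = index.search("carta", threshold=0.5)   # favors precision
loose = index.search("carta", threshold=0.1)    # favors recall
print([s.page for s in strict], [s.page for s in loose])
```

Raising the threshold returns fewer, more reliable hits; lowering it returns more candidates at the cost of false positives, which is the precision/recall trade-off that a confidence threshold exposes to the user.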
Using this approach, several very large collections of historical manuscripts have recently been indexed and made available for real, effective textual search: Chancery (82,000 page images of Latin/French manuscripts, 14th-15th c.); TSO – Spanish Golden Age Theatre (41,000 page images of Spanish comedies, 16th-18th c.); Bentham Papers (90,000 page images of mostly English text, 18th-19th c.); Finnish Court Records (102,000 images of about 140,000 pages of Swedish text, 18th-19th c.); Carabela and CarabelaFull – manuscripts of interest to underwater archaeology (31,000 images of Spanish documents, most written in abstruse scripts, 15th-16th c.).
Probabilistic Indexing allows us to go beyond basic word spotting. Specifically, we will explain how probabilistic indexes can be used for more complex tasks such as searching for hyphenated words, and for words described by wildcards or approximate spelling. Moreover, these indexes enable probabilistic versions of typical Natural Language Processing and Text Analytics tasks, such as estimating the evolution of word usage, estimating the vocabulary or the number of running words of a manuscript or a collection, computing estimated Zipf curves, etc. Finally, we will explain how Probabilistic Indexes also allow for more content-oriented, “semantic” Information Retrieval concepts and tasks such as Boolean (AND/OR/NOT) and Sequence (phrase) queries, layout-agnostic, SQL-like “database queries” in handwritten table images, content-based image classification, or even searching for melodic patterns in images of handwritten music notation.
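As a rough illustration of how such probabilistic versions of Boolean queries and word counts might work, the Python sketch below combines per-page word probabilities. The combination rules are assumptions made for this example, not necessarily those presented in the talk: a word's page-level probability is taken as the maximum over its spots on that page, AND/OR/NOT treat words as independent events, and the number of running words is estimated as the sum of all spot probabilities. All function names and data are hypothetical.

```python
# Each entry is (word, page, probability); the values are made up.
spots = [
    ("carta", "p1", 0.9),
    ("corte", "p1", 0.1),   # competing hypothesis for the same word image
    ("rey",   "p1", 0.5),
    ("rey",   "p2", 0.8),
]


def page_prob(entries, word, page):
    """Assumed page-level probability: the best spot of `word` on `page`."""
    return max((p for (w, pg, p) in entries if w == word and pg == page),
               default=0.0)


def p_and(a, b):
    """Both words on the page, assuming independence."""
    return a * b


def p_or(a, b):
    """At least one of the words on the page, assuming independence."""
    return a + b - a * b


def p_not(a):
    """The word is absent from the page."""
    return 1.0 - a


def expected_running_words(entries):
    """Estimated word count: each hypothesis contributes its probability."""
    return sum(p for (_, _, p) in entries)


carta_p1 = page_prob(spots, "carta", "p1")
rey_p1 = page_prob(spots, "rey", "p1")

print("carta AND rey on p1:", p_and(carta_p1, rey_p1))
print("carta OR rey on p1:", p_or(carta_p1, rey_p1))
print("carta AND NOT rey on p1:", p_and(carta_p1, p_not(rey_p1)))
print("estimated running words:", expected_running_words(spots))
```

Note how the two competing hypotheses "carta" and "corte" together contribute a total of about one running word, which is why summing probabilities can give a sensible count without committing to a single transcription.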
Online, live demonstrators of these capabilities can be found at http://transcriptorium.eu/demots/KWSdemos
Bio: Enrique Vidal is a professor emeritus of the Universitat Politècnica de València (Spain) and former co-leader of the PRHLT research center at this university. He has published more than two hundred and fifty research papers in the fields of Pattern Recognition, Multimodal Interaction and applications to Language, Speech and Image Processing and has led many important projects in these fields. Dr. Vidal is a member of the IEEE and a fellow of the International Association for Pattern Recognition (IAPR).
Prof. Andreas Dengel
German Research Center for Artificial Intelligence (DFKI)
Title: From Hindsights to Insights – 30 Years in Document Analysis and Recognition
Abstract: We use text and graphics editors and other technical means, such as cameras, recorders, and messaging channels, all of which allow us to produce a document, i.e. a resource for furnishing evidence of information or proving its authenticity. As a result, we obtain an artifact that may become a subject of study and interpretation. This could be a printed photograph or a sheet of paper with printed text, graphics, or writing, all of which, in their specific and individual combination, bear the original or legal form of something. If we accept this attempt at a definition, then a document is associated with surfaces that capture the information. The more we think about this very traditional view of a document, the more we face the growing challenges caused by the way we communicate these days, which confronts us with the question: What is a document, and what would document evolution mean for the field of document analysis and recognition? This question was a guiding motivation throughout my scientific endeavors, which began in the mid-1980s. During this time, I have gone through all phases of a scientific career, starting as a young scientist who asked curious questions, built up his own team of students, successfully launched research projects, and started to establish and develop DFKI, which today is the largest AI research center in the world. In my talk, I will give insights into these more than 30 years, with a special focus on my findings, initiatives, and contributions to the field of document analysis and recognition.
Bio: Andreas Dengel is the Site Head at the German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern and the Scientific Director of the Smart Data & Knowledge Services Research Department at DFKI. In 1993, he became a Professor at the Computer Science Department of the University of Kaiserslautern. Since 2009, he has also held a Professorship (kyakuin) at the Dept. of Computer Science and Intelligent Systems, Graduate School of Engineering of the Osaka Prefecture University. Andreas has been program/technical chair of many international conferences and serves as an editorial board member of international journals and book series. He has written or edited 13 books and is the author of more than 350 peer-reviewed scientific publications, several of which received a Best-Paper Award. He has supervised more than 250 PhD, master and bachelor theses. Moreover, he is the founder, initiator and mentor of many successful start-up companies, two of which received a “Pioneer Spirit Award” as well as the “Cebit Innovation Award”. For his contributions, he was honored with the prize “Founding Promoter of the Year”. Furthermore, Andreas is a Fellow of the International Association for Pattern Recognition (IAPR) and the Chairman of the Flexible Factory Partner Alliance (FFPA). He serves as an advisor for academic institutions, research programs, and national and international ministries. For his scientific findings, Andreas has received, among others, one of the most prestigious personal scientific awards in Germany, the Alcatel/SEL Award on Technical Communication, and was appointed “Distinguished Honorary Professor” (tokubetu eiyo kyoju) at the Osaka Prefecture University, an honor only five researchers have received within 135 years. His main research interests are in the areas of machine learning, pattern recognition, immersive quantified learning, data mining, and semantic technologies.