ICDAR2017

Special Workshop Speaker

Title

Large scale datasets – Is data holding us back?


Name

Dimosthenis Karatzas

Abstract

The document image analysis community has always struggled with data. In a counter-intuitive situation, while paper documents are abundant and large quantities of them have been digitised, document analysis researchers fall short of using public, large and representative datasets in their research for a number of valid and not-so-valid reasons.

Copyright regulations and information privacy concerns, among other reasons, have traditionally created barriers for our research, substantially limiting our access to large-scale, public datasets, and making our research irreproducible and therefore marginal.

The wider computer vision community moves by data. It often seems that the availability of data defines which topics are worth researching, and not vice versa. A focus on maximising quantity while accepting certain sacrifices in quality (e.g. by crowdsourcing annotations) is predominant. Would this be the right mentality for document image analysis?

And when large quantities of data are not easily available, they are synthetically generated. And what better domain for synthesising data than document image analysis, where the images (documents) are by definition “synthetic” and generated through well-understood processes? As a matter of fact, recent success stories in our own domain have been driven by synthetic data. The ability to do word spotting with large-scale dictionaries, the considerable improvements in the latest version of the Tesseract OCR engine, scene text localisation, etc., have all been driven by good-quality synthetic data. Can this be extended to full documents? Would this be a way forward?

At a time when data is king, what should the document image analysis community be doing to deal with all the above restrictions and challenges?


Short Bio

Dimosthenis Karatzas, a physicist by education, received his PhD in Computer Science from the University of Liverpool, UK in 2003. From 2002 to 2007 he worked as a Research Fellow at the Universities of Liverpool and Southampton, UK. Since 2007 he has been with the Computer Vision Centre at the Universitat Autònoma de Barcelona, Spain, where he leads the Robust Reading research unit. He has been associate director of the Centre since 2014.

His main research interests are computer vision and pattern recognition, and in particular robust reading systems, document image analysis, human-document interaction, and human perception modelling. He has produced more than 100 peer-reviewed publications, including 21 in indexed journals, and has an h-index of 20. He is the principal investigator of numerous research projects.

In 2013, he received the IAPR/ICDAR Young Investigator Award for innovative research in human perception-based document analysis. In 2016, he received a Google Research Award for research on modelling the interplay between visual and textual information in images.

He has extensive technology transfer experience. In 2007 he set up, with fellow researchers, the spin-off company TruColour Ltd, UK, specialising in perception-based colour calibration solutions. He conceived and led the creation of the “Library Living Lab” (L3), converting a public library in Barcelona into an open, participatory innovation space.

Dr Karatzas has served the international community in various roles. He is the chair of Technical Committee 11 (Reading Systems) of the International Association for Pattern Recognition (IAPR), a member of the IAPR Industrial Liaison Committee and of the IEEE, and a founding member and past member of the executive committee of the UK Chapter of the SPIE. He has been involved in the organisation of numerous international events in document image analysis, including leading the Robust Reading Competition series, which to date counts more than 3,000 registered researchers and 10,000 evaluated submissions.