Last updated on August 12, 2018. This conference program is tentative and subject to change.

Technical Program for Friday, August 24, 2018

FrAMOT1 Applications of Classification and Learning (Ballroom C, 1st Floor)
Oral Session

10:30-10:50, Paper FrAMOT1.1
Expected Hypervolume Improvement with Constraints
Abdolshah, Majid | Deakin Univ |
Shilton, Alistair | Deakin Univ |
Rana, Santu | Deakin Univ |
Gupta, Sunil Kumar | Deakin Univ |
Venkatesh, Svetha | Deakin Univ |
Keywords: Applications of pattern recognition and machine learning, Reinforcement learning
Abstract: Bayesian optimisation has become a powerful framework for global optimisation of black-box functions that are expensive to evaluate and possibly noisy. In addition to expensive objective functions, many real-world optimisation problems involve similarly expensive black-box constraints. However, there are few studies on the role of constraints in multi-objective Bayesian optimisation. In this paper, we extend the Expected Hypervolume Improvement by introducing the expected satisfaction of the constraints and merging the two into a new acquisition function called Expected Hypervolume Improvement with Constraints (EHVIC). We analyse the performance of our algorithm by estimating the feasible region dominated by the Pareto front on four benchmark functions. The proposed method is also evaluated on a real-world alloy design problem. We demonstrate that EHVIC is an effective algorithm that delivers promising performance compared to a well-known related method.
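The merging the abstract describes can be read as the unconstrained EHVI scaled by the probability that every black-box constraint is satisfied. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the `ehvi` callable and the GP posterior interface are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def prob_constraints_satisfied(mu_c, sigma_c):
    """P(c_j(x) <= 0) for each constraint modelled by a GP posterior
    with mean mu_c[j] and standard deviation sigma_c[j]."""
    return norm.cdf(-np.asarray(mu_c) / np.asarray(sigma_c))

def ehvic(x, ehvi, constraint_posterior):
    """Sketch: unconstrained EHVI scaled by the probability that all
    constraints are satisfied (constraints assumed independent)."""
    mu_c, sigma_c = constraint_posterior(x)  # hypothetical GP interface
    return ehvi(x) * np.prod(prob_constraints_satisfied(mu_c, sigma_c))
```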

10:50-11:10, Paper FrAMOT1.2
Multi-Scale Generative Adversarial Networks for Crowd Counting
Yang, Jianxing | Tianjin Univ |
Zhou, Yuan | Tianjin Univ |
Kung, Sun-Yuan | Princeton Univ |
Keywords: Applications of pattern recognition and machine learning, Visual surveillance
Abstract: We investigate generative adversarial networks as an effective solution to the crowd counting problem. These networks not only learn the mapping from a crowd image to the corresponding density map, but also learn a loss function to train this mapping. The task of crowd counting poses many challenges, such as severe occlusions in extremely dense crowd scenes, perspective distortion, and high visual similarity between pedestrians and background elements. To address these problems, we propose a multi-scale generative adversarial network that generates high-quality crowd density maps for scenes of arbitrary crowd density. We utilize the adversarial loss from the discriminator to improve the quality of the estimated density map, which is critical for accurately predicting crowd counts. The proposed multi-scale generator can extract features from the crowd image at multiple hierarchical levels. The results show that the proposed method outperforms current state-of-the-art methods.

11:10-11:30, Paper FrAMOT1.3
Hybrid Path Planner for Efficient Navigation in Urban Road Networks through Analysis of Trajectory Traces
Sinha, Sayan | IIT Kharagpur |
Nirala, Mehul Kumar | IIT Kharagpur |
Ghosh, Shreya | IIT Kharagpur |
Ghosh, Soumya K. | IIT Kharagpur |
Keywords: Data mining, Applications of pattern recognition and machine learning, Support vector machine and kernel methods
Abstract: Computing optimal routes in a road network is both space- and time-consuming. This paper proposes an innovative way of finding a route given the traffic conditions of every edge present in the map. It aims to bring about a convergence between methods used for routing data over a network and path-planning techniques used in AI. It makes use of concepts from network routing tables to develop path planners suitable for urban conditions, and the resulting planner has linear space complexity.

11:30-11:50, Paper FrAMOT1.4
Image Exploration Procedure Classification with Spike-Timing Neural Network for the Blind
Zhang, Ting | Purdue Univ |
Zhou, Tian | Purdue Univ |
Duerstock, Bradley S. | Purdue Univ |
Wachs, Juan | Purdue Univ |
Keywords: Applications of pattern recognition and machine learning, Neural networks, Pattern recognition for human computer interaction
Abstract: Individuals who are blind use exploration procedures (EPs) to navigate and understand digital images. The ability to model and detect these EPs can help the assistive technologies community build efficient and accessible interfaces for the blind and enhance human-machine interaction overall. In this paper, we propose a framework to classify various EPs using spike-timing neural networks (SNNs). While users interact with a digital image using a haptic device, rotation- and translation-invariant features are computed directly from exploration trajectories acquired from the haptic control. These features are further encoded as model strings through trained SNNs. A classification scheme is then proposed to distinguish these model strings and identify the EPs. The framework adapts a modified Dynamic Time Warping (DTW) for spatio-temporal matching and Dempster-Shafer Theory (DST) for multimodal fusion. Experimental results (87.05% EP detection accuracy) indicate the effectiveness of the proposed framework and its potential application in human-machine interfaces.
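For reference, the standard dynamic-programming recurrence that DTW variants build on can be written in a few lines; this is a textbook sketch, not the authors' modified version.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between feature sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```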

11:50-12:10, Paper FrAMOT1.5
Efficient Statistical Face Recognition Using Trigonometric Series and CNN Features
Savchenko, Andrey | National Res. Univ. Higher School of Economics
Keywords: Classification, Face recognition, Density estimation
Abstract: In this paper we deal with unconstrained face recognition from few training samples. The facial images are described with off-the-shelf high-dimensional features extracted by a deep convolutional neural network (CNN) that was preliminarily trained on a very large external dataset. We focus on the drawbacks of the conventional probabilistic neural network (PNN), namely its low recognition performance and high memory space complexity. We propose to modify the PNN by replacing the exponential activation function in the Gaussian Parzen kernel with trigonometric functions, using an orthogonal series density estimate of the CNN features. We demonstrate that the proposed approach significantly decreases the runtime complexity of face recognition if the classes are rather balanced and there are more than five training images per subject. An experimental study with both traditional (VGGNet, Light CNN) and contemporary (VGGFace2_ft, MobileNet trained on the VGGFace2 dataset) networks has shown that our algorithm is very efficient and rather accurate in comparison with instance-based learning classifiers.
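The orthogonal-series idea can be illustrated in one dimension with a cosine basis on [0, 1]; the basis choice and truncation order below are illustrative assumptions, not the paper's exact construction (high-dimensional CNN features would require a per-component or tensor-product treatment).

```python
import numpy as np

def cosine_series_density(samples, K=10):
    """Orthogonal-series density estimate on [0, 1] with the orthonormal
    cosine basis phi_k(x) = sqrt(2) * cos(pi * k * x):
    p_hat(x) = 1 + sum_k a_k * phi_k(x), with a_k the sample mean of phi_k."""
    samples = np.asarray(samples)
    coeffs = [np.mean(np.sqrt(2) * np.cos(np.pi * k * samples))
              for k in range(1, K + 1)]

    def p_hat(x):
        return 1.0 + sum(a * np.sqrt(2) * np.cos(np.pi * k * x)
                         for k, a in enumerate(coeffs, start=1))
    return p_hat
```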

12:10-12:30, Paper FrAMOT1.6
Multi-Classification of Parkinson's Disease Via Sparse Low-Rank Learning
Lei, Haijun | Shenzhen Univ |
Zhao, Yujia | Shenzhen Univ |
Huang, Zhongwei | Shenzhen Univ |
Zhou, Feng | Univ. of Michigan |
Huang, Limin | Shenzhen People’s Hospital |
Lei, Baiying | Shenzhen Univ |
Keywords: Dimensionality reduction, Classification, Computer-aided detection and diagnosis
Abstract: Neuroimaging techniques have been widely applied to the analysis of various neurodegenerative diseases to reveal intricate brain structure. High-dimensional neuroimaging features and limited sample sizes are the main challenges for the diagnosis task due to the unbalanced input data. To address this, a sparse low-rank learning framework is proposed, which unveils the underlying relationships between input data and output targets by building a matrix-regularized feature network. We then obtain the feature weights from the network based on local clustering coefficients. By discarding irrelevant features and preserving discriminative structured features, our proposed method can select the most relevant features and distinguish different stages of Parkinson's disease (PD) from normal controls. Extensive experimental results on the Parkinson's Progression Markers Initiative (PPMI) dataset demonstrate that the proposed method achieves promising classification performance and outperforms conventional algorithms. Furthermore, it can detect potential brain regions related to PD for future medical analysis.

FrAMOT2 Object Recognition and Scene Understanding (309B, 3rd Floor)
Oral Session

10:30-10:50, Paper FrAMOT2.1
Real-Time Texture-Less Object Recognition on Mobile Devices
Chan, Jacob | Osaka Univ. Inst. of Datability Science |
Lee, Jimmy Addison | Cixi Inst. of Biomedical Engineering, Chinese Acad. of Sci |
Qian, Kemao | Nanyang Tech. Univ |
Keywords: Object recognition, Object detection, Applications of computer vision
Abstract: This paper presents a technique for real-time texture-less object recognition and tracking on mobile devices. Our proposed algorithm is an even lighter-weight version of the recent state-of-the-art binary-based texture-less object detector BIND (Binary Integrated Net Descriptor), customized primarily for mobile device applications. This modification, termed BIND-Lite, employs various techniques to overcome the low computational power of current mobile devices while largely retaining the texture-less object detection robustness of the original BIND. On current-generation mobile devices, BIND-Lite achieves runtime rates of up to 30 frames per second. To evaluate our algorithm, we have also designed a mobile augmented reality application coined IMPRINT, which renders logos/images onto detected objects to showcase BIND-Lite in a real-time augmented reality setting.

10:50-11:10, Paper FrAMOT2.2
Selective Multi-Convolutional Region Feature Extraction Based Iterative Discrimination CNN for Fine-Grained Vehicle Model Recognition
Tian, Yanling | Shaanxi Normal Univ
Zhang, Weitong | Shaanxi Normal Univ |
Zhang, Qieshi | Chinese Acad. of Sciences (CAS) |
Lu, Gang | Shaanxi Normal Univ |
Wu, Xiaojun | Shaanxi Normal Univ |
Keywords: Image classification, Scene understanding, Deep learning
Abstract: With the rapid rise of computer vision and driverless technology, vehicle model recognition plays an important role in everyday applications and in industry. Fine-grained vehicle model recognition is often influenced by multi-level information, such as image perspective, inter-feature similarity, and vehicle details. Furthermore, pivotal region extraction and fine-grained feature learning remain vital obstacles to fine-grained recognition of vehicle models. In this paper, we propose an iterative discrimination CNN (ID-CNN) based on selective multi-convolutional region (SMCR) feature extraction. The SMCR features, which consist of global and local SMCR features, are extracted from the regions of the original image with the highest activation response. In the ID-CNN, we use the global and local SMCR features iteratively to localize deep pivotal features and concatenate them in a fully-connected fusion layer to predict the vehicle categories. Our method improves accuracy to 91.8% on the Stanford Cars-196 dataset and to 96.2% on the CompCars dataset.

11:10-11:30, Paper FrAMOT2.3
A Shortly and Densely Connected Convolutional Neural Network for Vehicle Re-Identification
Zhu, Jianqing | Huaqiao Univ |
Zeng, Huanqiang | Huaqiao Univ
Lei, Zhen | Inst. of Automation, Chinese Acad. of Sciences |
Liao, Shengcai | Inst. of Automation, Chinese Acad. of Sciences |
Zheng, Lixin | Huaqiao Univ
Cai, Canhui | Huaqiao Univ
Keywords: Applications of computer vision, Object recognition, Deep learning
Abstract: In this paper, we propose a shortly and densely connected convolutional neural network (SDC-CNN) for vehicle re-identification. The proposed SDC-CNN mainly consists of short and dense units (SDUs), plus the necessary pooling and normalization layers. The main contribution lies in the design of the short and dense connection mechanism, which effectively improves the feature learning ability. Specifically, in the proposed mechanism, each SDU contains a short list of densely connected convolutional layers, and each convolutional layer has the same appropriate number of channels. Consequently, the number of connections and the input channels of each convolutional layer are limited in each SDU, and the architecture of the SDC-CNN remains simple. Extensive experiments on both the VeRi and VehicleID datasets show that the proposed SDC-CNN is clearly superior to multiple state-of-the-art vehicle re-identification methods.

11:30-11:50, Paper FrAMOT2.4
Visual Relationship Detection Using Joint Visual-Semantic Embedding
Li, Binglin | Simon Fraser Univ |
Wang, Yang | Univ. of Manitoba |
Keywords: Scene understanding, Deep learning, Object detection
Abstract: Visual relationship detection can serve as an intermediate building block for higher-level tasks such as image captioning, visual question answering, and image-text matching. Due to the long tail of the relationship distribution in real-world images, zero-shot prediction of relationships never seen before can alleviate the burden of collecting every possible relationship. Following zero-shot learning (ZSL) strategies, we propose a joint visual-semantic embedding model for visual relationship detection. In our model, the visual vector and the semantic vector are projected into a shared latent space to learn the similarity between the two branches. In the semantic embedding, sequential features are learned to provide context information and are then concatenated with the corresponding component vector of the relationship triplet. Experiments show that the proposed model achieves superior performance in zero-shot visual relationship detection and comparable results in the non-zero-shot scenario.

11:50-12:10, Paper FrAMOT2.5
Time-Dependent Pre-Attention Model for Image Captioning
Wang, Fuwei | Shanghai Jiao Tong Univ |
Gong, Xiaolong | Shanghai Jiao Tong Univ |
Huang, Linpeng | Shanghai Jiao Tong Univ |
Keywords: Image captioning, Neural networks
Abstract: The task of automatically generating image captions has drawn much attention in the past few years because it shows great potential in a wide range of application scenarios. The encoder-decoder structure with an attention mechanism has been extensively applied to this task. However, most studies apply attention only to image features and neglect the relations between image features, which we believe play an important role in scene understanding. To tackle this problem, we propose a novel attention mechanism named "attention to Time-Dependent Pre-Attention" (TDPA-attention), which is combined with a hierarchical LSTM decoder to compose our captioning model (TDPA-model). Within TDPA-attention, at every time step, every image feature attends to all image features according to a semantic context, and the attended feature is treated as an aggregated feature that captures the relations between this image feature and all image features. These aggregated features form a new feature set that the hierarchical LSTM decoder attends to. We evaluate our model on the public Microsoft COCO image captioning dataset and achieve state-of-the-art performance on most evaluation metrics.
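The pre-attention step described above, where each image feature attends to all image features under a semantic context, resembles context-modulated self-attention. A minimal sketch under that reading follows; the additive modulation scheme here is an assumption, not the paper's exact formulation.

```python
import numpy as np

def pre_attention(V, context):
    """Each image feature attends to all image features, with affinities
    modulated by a semantic context vector.
    V: (n, d) image features; context: (d,) semantic context."""
    scores = (V + context) @ V.T / np.sqrt(V.shape[1])  # context-modulated affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ V  # one aggregated feature per image feature
```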

12:10-12:30, Paper FrAMOT2.6
Towards Unconstrained Pointing Problem of Visual Question Answering: A Retrieval-Based Method
Cheng, Wenlong | Inst. of Automation, Chinese Acad. of Sciences |
Huang, Yan | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Liang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Vision and language, Deep learning, Applications of computer vision
Abstract: The pointing problem of visual question answering (VQA) is: given an image and a question asking for the location of an object of interest, find a region that answers the question. It is an important research problem in VQA and has many potential applications in daily life. However, most existing work on this task can only solve it in the form of multiple choice, i.e., ground-truth candidate answers are given in advance and a correct one is selected. In this paper, we propose a retrieval model that not only solves the multiple-choice task but also provides a feasible solution for the no-candidate-answer task. The principle of our method is to pull a question and its correct answer close together in a common feature space, and to push the question and incorrect answers apart. To the best of our knowledge, we are the first to use a retrieval method to solve the pointing problem of VQA. Furthermore, our proposed method outperforms state-of-the-art methods on the Visual7W dataset for the pointing problem of VQA.

FrAMOT3 Image and Video Retrieval (310, 3rd Floor)
Oral Session

10:30-10:50, Paper FrAMOT3.1
Action Recognition with Visual Attention on Skeleton Images
Yang, Zhengyuan | Univ. of Rochester |
Li, Yuncheng | Snapchat Inc |
Yang, Jianchao | Snapchat Inc |
Luo, Jiebo | Univ. of Rochester
Keywords: Video processing and analysis, Affective multimedia processing/analysis
Abstract: Action recognition with 3D skeleton sequences is becoming popular due to its speed and robustness. Recently proposed Convolutional Neural Network (CNN) based methods have shown good performance in learning spatio-temporal representations for skeleton sequences. Despite the good recognition accuracy achieved by previous CNN-based methods, two problems potentially limit performance. First, previous skeleton representations are generated by chaining joints in a fixed order; the corresponding semantic meaning is unclear, and the structural information among the joints is lost. Second, previous models have no ability to focus on informative joints. The attention mechanism is important for skeleton-based action recognition because there exist spatio-temporal key stages, and joint predictions can be inaccurate. To solve these two problems, we propose a novel CNN-based method for skeleton-based action recognition. We first redesign the skeleton representation with a depth-first tree traversal order, which enhances the semantic meaning of skeleton images and better preserves the structural information. We then propose a two-branch attention architecture that focuses on spatio-temporal key stages and filters out unreliable joint predictions. A base attention model with the simplest structure is first introduced to illustrate the two-branch attention architecture. By improving the structures in both branches, we further propose a Global Long-sequence Attention Network (GLAN). Experimental results on the NTU RGB+D dataset and the SBU Kinect Interaction dataset show that our proposed approach outperforms the state-of-the-art and demonstrate the effectiveness of each component.
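The depth-first traversal idea, revisiting a parent joint when backtracking so that neighbouring skeleton joints stay adjacent in the chained sequence, can be sketched as follows; the toy five-joint skeleton used here is hypothetical.

```python
def dfs_joint_order(tree, root=0):
    """Depth-first traversal of a skeleton graph; re-appending the parent
    when backtracking keeps physically adjacent joints adjacent in the
    resulting joint sequence."""
    order = []

    def visit(j):
        order.append(j)
        for child in tree.get(j, []):
            visit(child)
            order.append(j)  # come back through the parent
    visit(root)
    return order

# hypothetical toy skeleton: 0 torso, 1 neck, 2 head, 3/4 shoulders
print(dfs_joint_order({0: [1], 1: [2, 3, 4]}))  # [0, 1, 2, 1, 3, 1, 4, 1, 0]
```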

10:50-11:10, Paper FrAMOT3.2
Human Activity Recognition Via Discriminative Fusion of Insole Embedded Multi-Pressure Sensors
Dehzangi, Omid | West Virginia Univ |
Bache, Bhavani | Univ. of Michigan, Dearborn |
Iftikhar, Omar | Univ. of Michigan-Dearborn |
Keywords: Sensor array & multichannel signal processing, Human behavior analysis, Classification
Abstract: Monitoring day-to-day activities is important for remote health monitoring of patients. Several motion-based activity monitoring systems have been developed, in which wearable motion sensors with an accelerometer and gyroscope are used to identify human activities. However, such systems are intrusive, making them unsuited for continuous monitoring. In this paper, we design and develop a non-intrusive way of monitoring and identifying activities. We acquire data from high-density pressure sensors embedded at 13 different locations of an insole. The insole data is characterized and analyzed to identify different activities, including sitting, standing, walking, running, cycling, and jumping. We develop a unique methodology that investigates the discriminative capability of individual pressure sensors and further conduct fusion of multiple pressure sensors for improved activity monitoring. In our experiments, 91.8% activity recognition accuracy was achieved using the best individual insole pressure sensor. We then investigated multi-sensor fusion to capture an improved class-discriminative score space using a Multi-Layer Neural Network (MLNN), which improved the activity identification accuracy of the system to 97.63%.

11:10-11:30, Paper FrAMOT3.3
Fine-Grained Video Retrieval Using Query Phrases - Waseda_Meisei TRECVID 2017 AVS System -
Ueki, Kazuya | Meisei Univ |
Hirakawa, Koji | Waseda Univ |
Kikuchi, Kotaro | Waseda Univ |
Kobayashi, Tetsunori | Waseda Univ |
Keywords: Multimedia analysis, indexing and retrieval, Deep learning for multimedia analysis, Scene understanding
Abstract: In this paper, a joint team from Waseda University and Meisei University (team name: Waseda_Meisei) reports its efforts on the ad-hoc video search (AVS) task of the TRECVID benchmark, which is conducted annually by the National Institute of Standards and Technology (NIST). In the AVS task, a system must perform a fine-grained search of target videos in a large-scale video database using a query phrase containing multiple keywords, such as objects, persons, scenes, and actions. The system we submitted has two main characteristics. First, to improve the coverage of classes corresponding to keywords in query phrases, we prepared a large number of classifiers that can detect objects, persons, scenes, and actions, trained on various image and video datasets. Second, when choosing a concept classifier corresponding to a keyword, we introduced a mechanism for selecting additional concept classifiers by incorporating natural language processing techniques. We submitted multiple systems with these characteristics to the TRECVID 2017 AVS task, and one of our systems ranked highest among all submitted systems.

11:30-11:50, Paper FrAMOT3.4
Discriminate Cross-Modal Quantization for Efficient Retrieval
Sun, Peng | Beihang Univ |
Yan, Cheng | Beihang Univ |
Wang, Shuai | Beihang Univ |
Bai, Xiao | Beihang Univ |
Keywords: Multimedia analysis, indexing and retrieval, Knowledge and semantic modeling for multimedia
Abstract: Efficient cross-modal retrieval involves searching for similar items across different modalities, e.g., using an image (text) to search for texts (images). To speed up cross-modal retrieval, hashing-based methods threshold continuous embeddings into binary codes, inducing a substantial loss of retrieval accuracy. To improve retrieval performance, several quantization-based methods quantize embeddings into real-valued codewords to maximally preserve inter-modal and intra-modal similarity relations, but the discrimination between dissimilar data is ignored. To address these challenges, we propose, for the first time, a novel discriminate cross-modal quantization (DCMQ), which nonlinearly maps different modalities into a common space where irrelevant data points are semantically separable: the points belonging to a class lie in a cluster that does not overlap with the clusters of other classes. An effective optimization algorithm is developed for the proposed method to jointly learn the modality-specific mapping functions, the shared codebooks, the unified binary codes, and a linear classifier. Experimental comparison with state-of-the-art algorithms on three benchmark datasets demonstrates that DCMQ achieves significant improvements in search accuracy.

11:50-12:10, Paper FrAMOT3.5
DeepFirearm: Learning Discriminative Feature Representation for Fine-Grained Firearm Retrieval
Hao, Jiedong | Inst. of Automation, Chinese Acad. of Sciences |
Dong, Jing | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Wei | Inst. of Automation, Chinese Acad. of Sciences |
Tan, Tieniu | CASIA
Keywords: Multimedia analysis, indexing and retrieval, Deep learning, Applications of computer vision
Abstract: There is great demand for automatically regulating the inappropriate appearance of shocking firearm images in social media and for identifying firearm types in forensics. Image retrieval techniques have great potential to solve these problems. To facilitate research in this area, we introduce Firearm 14k, a large dataset consisting of over 14,000 images in 167 categories. It can be used for both fine-grained recognition and retrieval of firearm images. Recent advances in image retrieval are mainly driven by fine-tuning state-of-the-art convolutional neural networks for the retrieval task. The conventional single margin contrastive loss, known for its simplicity and good performance, has been widely used. We find that it performs poorly on the Firearm 14k dataset because: (1) the loss contributed by positive and negative image pairs is unbalanced during the training process; (2) a huge domain gap exists between this dataset and ImageNet. We propose to deal with the unbalanced loss by employing a double margin contrastive loss. We tackle the domain gap issue with a two-stage training strategy, where we first fine-tune the network for classification and then fine-tune it for retrieval. Experimental results show that our approach outperforms the conventional single margin approach by a large margin (up to 88.5% relative improvement) and even surpasses the strong triplet-loss-based approach.
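A double margin contrastive loss of the kind described, with one margin for positive pairs and a separate one for negative pairs so that the two contributions can be rebalanced, might look like the following sketch; the margin values are illustrative assumptions.

```python
import torch

def double_margin_contrastive(d, y, m_pos=0.5, m_neg=1.5):
    """Double-margin contrastive loss sketch.
    d: pairwise embedding distances, y: pair labels (1 same, 0 different).
    Positive pairs are penalized only beyond m_pos; negative pairs only
    inside m_neg. Margins here are illustrative, not the paper's values."""
    pos = y * torch.clamp(d - m_pos, min=0).pow(2)
    neg = (1 - y) * torch.clamp(m_neg - d, min=0).pow(2)
    return (pos + neg).mean()
```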

12:10-12:30, Paper FrAMOT3.6
Person Re-Identification Using Two-Stage Convolutional Neural Network
Zhang, Yonghui | Univ. of Electronic Science and Tech. of China |
Shao, Jie | Univ. of Electronic Science and Tech. of China |
Ouyang, Deqiang | Univ. of Electronic Science and Tech. of China |
Shen, Heng Tao | Univ. of Electronic Science and Tech. of China |
Keywords: Deep learning for multimedia analysis, Video processing and analysis, Video analysis
Abstract: Person re-identification is a fundamental task in automated video surveillance and has been an area of intensive research in the past few years. Several person re-identification methods based on deep learning have been proposed and have achieved remarkable performance. However, extracting more useful spatial and temporal information from input images and designing a more effective approach to matching the same persons remain challenging. In this paper, we present a novel Two-Stage Convolutional Neural Network (TSCNN), which effectively extracts spatio-temporal features with a two-stream network in two directions and matches persons with a novel convolutional neural network. Extensive experiments are conducted on three public benchmarks, i.e., the iLIDS-VID, PRID2011, and MARS datasets. The experimental results demonstrate that our TSCNN performs better than state-of-the-art methods. The code of TSCNN is available at https://github.com/zyoohv/TSCNN.

Poster Session FrPMP, Coffee Break (South Lobby, Outside of Ballroom C, 1st Floor)
Poster Session

14:00-16:00, Paper FrPMP.1
Human Gait Recognition with Micro-Doppler Radar and Deep Autoencoder
Le, Hoang Thanh | Univ. of Wollongong |
Phung, Son Lam | Univ. of Wollongong |
Bouzerdoum, Abdesselam | Univ. of Wollongong |
Keywords: Gait recognition, Signal analysis, Deep learning for multimedia analysis
Abstract: The micro-Doppler signals from moving objects contain useful information about their motions. This paper introduces a novel approach to human gait recognition based on backscattered signals from a micro-Doppler radar. Three different signal processing techniques are utilized to extract micro-Doppler features via time-frequency and time-scale representations. To classify the human motions into various types, this paper presents a deep autoencoder applied to local patches extracted along the spectrogram and scalogram. The network configuration and learning parameters of the deep autoencoder, treated as hyperparameters, are optimized by a Bayesian optimization algorithm. Experimental results produced by the proposed technique on real radar data show a significant improvement over several existing approaches.

14:00-16:00, Paper FrPMP.2
Deep Difference Analysis in Similar-Looking Face Recognition
Zhong, Yaoyao | Beijing Univ. of Posts and Telecommunications |
Deng, Weihong | Beijing Univ. of Posts and Telecommunications |
Keywords: Face recognition, Pattern recognition for human computer interaction, Deep learning
Abstract: Deep convolutional neural networks (DCNNs) have recently demonstrated impressive performance in face recognition. However, there is no clear understanding of what differences they find between two similar-looking faces. In this paper, we propose a visualization method that gives insight into the differences between similar-looking faces found by DCNNs. This method, used in an assistive role, could help humans identify people who try to invade a biometric system using a similar-looking face. We designed a crowdsourcing task to evaluate our method. With the assistance of our method, the accuracy of participants increased by 8%, exceeding the accuracy of the network itself, while participants gained little improvement from the assistance of a Deconvolutional network or Gradient Back-propagation. The experimental results suggest that our method makes a difference in human-machine cooperation.

14:00-16:00, Paper FrPMP.3
One-Class Random Maxout Probabilistic Network for Mobile Touchstroke Authentication
Choi, Seokmin | Yonsei Univ |
Chang, Inho | Yonsei Univ |
Teoh, Andrew | Yonsei Univ |
Keywords: Other biometrics
Abstract: Continuous authentication (CA) with touch-stroke dynamics is an emerging problem in mobile identity management. In this paper, we focus on one of the essential problems in CA, namely the one-class classification problem. We propose a novel analytic probabilistic one-class classifier coined the One-Class Random MaxOut Probabilistic Network (OC-RMPNet). The OC-RMPNet is a single-hidden-layer network tailored to capture individual users' touch-stroke profiles. The input-hidden layer of the network projects the input vector onto a high-dimensional random maxout feature space, and the hidden-output layer acts as a one-class probabilistic predictor that is trained by means of the least-squares principle and hence requires no iterative learning. We also put forward a sequential feature fusion mechanism for accuracy improvement. We scrutinize and compare the proposed methods with existing works on the Touchalytics and HMOG datasets. The empirical results reveal that the OC-RMPNet prevails over its predecessors in touch-stroke authentication tasks on mobile phones.

14:00-16:00, Paper FrPMP.4
Multimodal Gesture Recognition Using Densely Connected Convolution and BLSTM
Li, Dexu | Shanghai Univ |
Chen, Yimin | Shanghai Univ |
Gao, Mingke | Shanghai Univ |
Jiang, Siyu | Shanghai Univ |
Huang, Chen | Shanghai Univ |
Keywords: Gesture recognition, Deep learning
Abstract: In this paper, we present a multimodal gesture recognition method based on densely connected convolution and bidirectional long short-term memory (BLSTM). The proposed method learns spatial features of gestures through a densely connected convolutional network, and then learns long-term temporal features with BLSTM networks. In addition, fusion methods are evaluated on our model, and we find that fusing features carrying different information can significantly improve recognition accuracy. This purely data-driven approach achieves state-of-the-art recognition accuracy on the ChaLearn LAP 2014 dataset (98.80%) and the Sheffield Kinect Gesture (SKIG) dataset (99.07%).

14:00-16:00, Paper FrPMP.5
Detecting Disguise Attacks on Multi-Spectral Face Recognition through Spectral Signatures
Ramachandra, Raghavendra | NTNU |
Vetrekar, Narayan | Goa Univ |
Raja, Kiran | Norwegian Univ. of Science and Tech
Gad, Rajendra | Goa Univ |
Busch, Christoph | NTNU |
Keywords: Biometric anti-spoofing, Biometric systems and applications, Biometric sensors
Abstract: Presentation attacks on face recognition systems (FRS) have increasingly posed challenges that call for new detection methods. Among the various presentation attacks, disguise attacks allow concealing the identity of the attacker, thereby increasing the vulnerability of the FRS. In this paper, we present a new approach for attack detection in multi-spectral systems in which face disguise attacks are carried out. The approach is based on spectral signatures obtained from a spectral camera operating in eight narrow spectral bands across the Visible (VIS) and Near-Infrared (NIR) spectrum (530 nm to 1000 nm) and on learning deeply coupled auto-encoders. The robustness of the proposed approach is validated using a newly collected spectral face database of subjects performing both bona fide (i.e., real) presentations and disguise attack presentations. The database is designed to capture two different kinds of attacks from 54 subjects, amounting to a total of 6,480 samples. Extensive experiments carried out on the multi-spectral face database indicate the robust performance of the proposed scheme when benchmarked against three different state-of-the-art methods.

14:00-16:00, Paper FrPMP.6
Temporal Hierarchical Dictionary with HMM for Fast Gesture Recognition
Chen, Haoyu | Univ. of Oulu |
Liu, Xin | Univ. of Oulu |
Zhao, Guoying | Univ. of Oulu |
Keywords: Gesture recognition, Neural networks, Probabilistic graphical model
Abstract: In this paper, we propose a novel temporal hierarchical dictionary with a hidden Markov model (HMM) for the gesture recognition task. Dictionaries with spatio-temporal elements have been commonly used for gesture recognition. However, existing spatio-temporal dictionary based methods need whole pre-segmented gestures for inference and thus struggle with non-stationary sequences. The proposed method combines HMMs with Deep Belief Networks (DBNs) to tackle both gesture segmentation and recognition by inference at the frame level. Besides, we investigate the redundancy in the dictionary and introduce relative entropy to measure the information richness of a dictionary, which can be used to compress the dictionary with less redundancy. Furthermore, with a flat dictionary, the whole dictionary must be searched every time an element is inferred, so the temporal structure of gestures is not utilized sufficiently. The proposed temporal hierarchical dictionary is organized by HMM states and can limit the search range to distinct states. Our framework includes three key novel properties: (1) a temporal hierarchical structure with HMMs, which makes both the HMM transitions and Viterbi decoding more efficient; (2) a relative entropy model to compress the dictionary with less redundancy; (3) an unsupervised hierarchical clustering algorithm to build a hierarchical dictionary automatically. Our method is evaluated on two gesture datasets and consistently achieves state-of-the-art performance. The results indicate that dictionary redundancy has a significant impact on performance, which can be tackled by a temporal hierarchy and an entropy model.
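The relative-entropy redundancy measure can be pictured as follows: if two dictionary elements induce nearly the same distribution, their symmetric KL divergence is small and one of them becomes a candidate for removal. The threshold and the discrete-distribution representation below are illustrative assumptions, not the paper's exact entropy model.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float) + eps, np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def redundant_pairs(dictionary, threshold=0.1):
    """Flag element pairs whose symmetric relative entropy falls below a
    threshold as merge candidates (a sketch of the redundancy idea)."""
    pairs = []
    for i in range(len(dictionary)):
        for j in range(i + 1, len(dictionary)):
            d = 0.5 * (kl_divergence(dictionary[i], dictionary[j])
                       + kl_divergence(dictionary[j], dictionary[i]))
            if d < threshold:
                pairs.append((i, j))
    return pairs
```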

14:00-16:00, Paper FrPMP.7
Facial Expression Recognition by Multi-Scale CNN with Regularized Center Loss
Li, Zhenghao | Southwest Univ |
Wu, Song | Southwest Univ |
Xiao, Guoqiang | Southwest Univ |
Keywords: Facial expression recognition, Deep learning, Deep learning for multimedia analysis
Abstract: Facial Expression Recognition (FER) has attracted considerable attention due to its potential applications in computer vision. Recently, convolutional neural networks (CNNs) have shown excellent performance on FER. However, most established deeper, wider, and more complex network structures trained on small facial expression datasets run a risk of overfitting. Moreover, most existing CNN models use the softmax loss as the supervision signal to penalize classification errors, which enhances inter-class separation but does not take intra-class compactness into consideration. In this paper, we propose a novel multi-scale CNN integrated with an attention-based learning layer (AMSCNN) for robust facial expression recognition. The attention-based learning layer is designed to automatically learn the importance of different receptive fields in the face during training. Moreover, the multi-scale CNN is optimized by the proposed Regularized Center Loss (RCL). The regularized center loss learns a center for the deep features of each class and penalizes the distance between deep features and their corresponding centers, aiming to strengthen the discriminability between different facial expressions. Extensive experiments conducted on two popular FER benchmarks (the CK+ and Oulu-CASIA datasets) demonstrate the effectiveness of our proposed AMSCNN, which obtains competitive results compared to the state-of-the-art.
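For orientation, the base center-loss term that the RCL builds on pulls each deep feature toward a learned per-class center; the paper's specific regularization is not spelled out in the abstract, so the sketch below shows only the common base formulation. In practice this term is typically added to the softmax loss with a weighting factor.

```python
import torch

def center_loss(features, labels, centers):
    """Base center-loss term: 0.5 * ||x_i - c_{y_i}||^2 averaged over the
    batch. features: (B, d) deep features, labels: (B,) class indices,
    centers: (num_classes, d) learnable class centers."""
    return 0.5 * (features - centers[labels]).pow(2).sum(dim=1).mean()
```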

14:00-16:00, Paper FrPMP.8
Cancelable Biometrics Using Noise Embedding
Lee, Dae-Hyun | Seoul National Univ |
Lee, Sang Hwa | Seoul National Univ |
Cho, Nam Ik | Seoul National Univ |
Keywords: Security and privacy in biometrics, Iris and ocular recognition, Biometric systems and applications
Abstract: This paper presents a cancelable biometric (CB) scheme for iris recognition systems. CB approaches are roughly classified into two categories, depending on whether the method stresses non-invertibility or discriminability. The former uses non-invertibly transformed data for recognition instead of the original, so that impostors cannot retrieve the original biometric information from stolen data. The latter uses a salting method that mixes in random signals generated by user-specific keys, so that impostors cannot retrieve the original data without the key. The proposed CB can be considered a combination of these methods, applying a non-invertible transform to the salted binary biocode input. We use the reduced random permutation and binary salting (RRP-BS) method as the biometric salting, and use the Hadamard product to enhance the non-invertibility of the salted data. Moreover, we generate several templates for an input and define non-coherent and coherent matching regions among these templates. We show that salting the non-coherent matching regions has little influence on overall performance. Specifically, embedding noise in this region does not affect performance, while making the data difficult to invert back to the original.

14:00-16:00, Paper FrPMP.9
Saliency Detection Using Iterative Dynamic Guided Filtering
Wang, Chen | Northwestern Pol. Univ |
Fan, Yangyu | Northwestern Pol. Univ |
Keywords: Pattern recognition for human computer interaction, Low-level vision, Image processing and analysis
Abstract: Saliency detection is a basic and complex task in computer vision, which can guide a computer to extract key information from an image by simulating human visual habits. When image characteristics are unevenly distributed, the accuracy of saliency detection methods decreases. Unfortunately, this issue is common in natural images and poses a challenge for contrast-based methods. We propose an iterative dynamic guided filtering approach to analyze saliency cues. A new, simple kernel function is designed by combining information from the filtering results and the input image, which ensures good structure transfer from the input image to the guidance image. The saliency of an image pixel is defined based on a novel contrast model using image boundary and center regions. Finally, we enhance the result with an exponential function. Experimental results show that the proposed method is superior to others in terms of detection accuracy and recall rate.

14:00-16:00, Paper FrPMP.10
Face Anti-Spoofing with Multi-Scale Information
Luo, Shiying | Chinese Acad. of Sciences |
Kan, Meina | Chinese Acad. of Sciences |
Wu, Shuzhe | Chinese Acad. of Sciences |
Chen, Xilin | Inst. of Computing Tech |
Shan, Shiguang | Inst. of Computing Tech., Chinese Acad. of Sciences
Keywords: Biometric anti-spoofing
Abstract: Face anti-spoofing has encountered increasing demand as one of the key technologies for reliable and safe authentication with faces. Current face anti-spoofing methods generally take a single crop of the face region as input for classification, i.e., they exploit information at only one scale. This single-scale scheme focuses mainly on facial characteristics and does not utilize the surrounding information, causing poor generalization to different scenarios with varied means of attack. Besides, it is tedious or highly empirical to determine an optimal scale for face crops. To overcome the limitations of single-scale methods, in this work we propose to integrate Multi-Scale information for better Face ANti-Spoofing (MS-FANS). Specifically, the proposed MS-FANS method takes multiple face crops at different scales as input, followed by a convolutional neural network (CNN) for feature extraction. The features from the different scales form a sequence, which is fed into a Long Short-Term Memory (LSTM) network for adaptive fusion of multi-scale information, constructing the final representation for classification. Benefiting from this multi-scale design, MS-FANS can adaptively utilize context information from multiple scales, leading to promising performance on two challenging face anti-spoofing datasets, Idiap REPLAY-ATTACK and CASIA-FASD, with significant improvement over existing methods.

14:00-16:00, Paper FrPMP.11
Extended YouTube Faces: A Dataset for Heterogeneous Open-Set Face Identification
Ferrari, Claudio | Univ. of Florence |
Berretti, Stefano | Univ. of Florence |
Del Bimbo, Alberto | Univ. of Florence |
Keywords: Face recognition, Deep learning, Performance evaluation
Abstract: In this paper, we propose an extension of the well-known YouTube Faces (YTF) dataset. In the YTF dataset, the goal was to state whether or not two videos contain the same subject (video-based face verification). We enrich YTF with still images and an identification protocol. In classic face identification, given a probe image (or video), the correct identity has to be retrieved among the gallery identities; the main peculiarity of this protocol is that each probe identity has a correspondent in the gallery (closed-set). To resemble a realistic and practical scenario, we devised a protocol in which probe identities are not guaranteed to be in the gallery (open-set). Compared to closed-set identification, the latter is definitely more challenging, inasmuch as the system first needs to reject impostors (i.e., probe identities missing from the gallery) and subsequently, if the probe is accepted as genuine, retrieve the correct identity. In our case, the probe set is composed of full-length videos from the original dataset, while the gallery is composed of templates, i.e., sets of still images. An automatic application was developed to collect the images. The main motivations behind this work are the lack of open-set identification protocols in the literature and the undeniable complexity of such protocols. We also argue that extending an existing and widely used dataset makes its distribution easier, and that data heterogeneity makes the problem even more challenging and realistic. We named the dataset Extended YTF (E-YTF). Finally, we report baseline recognition results using two well-known DCNN architectures.

14:00-16:00, Paper FrPMP.12
Confidence-Driven Network for Point-To-Set Matching
Leng, Mengjun | Univ. of Houston |
Kakadiaris, Ioannis | Univ. of Houston |
Keywords: Face recognition, Deep learning, Segmentation, features and descriptors
Abstract: The goal of point-to-set matching is to match a single image with a set of images from a subject. Within an image set, different images contain various levels of discriminative information and should thus contribute differently to the result. However, the discriminative level is not accessible directly. To this end, we propose a confidence-driven network to perform point-to-set matching. The proposed system comprises a feature extraction network (FEN) and a performance prediction network (PPN). Given an input image, the FEN generates a template, while the PPN generates a confidence score measuring the discriminative level of the template. At matching time, the template is used to compute point-to-point similarities. The similarity scores from the different samples in the set are integrated at the score level, weighted by the predicted confidence scores. Extensive multi-probe face recognition experiments on the IJB-A and UHDB-31 datasets demonstrate performance improvements over state-of-the-art algorithms.
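The confidence-weighted score-level integration described above can be sketched directly; cosine similarity is an assumption here, as the abstract does not specify the point-to-point similarity measure.

```python
import numpy as np

def point_to_set_score(probe, templates, confidences):
    """Confidence-weighted point-to-set similarity: cosine similarities
    against each template in the set, averaged with the predicted
    confidence scores as weights (a minimal sketch)."""
    probe = probe / np.linalg.norm(probe)
    T = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    sims = T @ probe                      # point-to-point similarities
    w = np.asarray(confidences)
    return float(np.sum(w * sims) / np.sum(w))
```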

14:00-16:00, Paper FrPMP.13
Video Gesture Analysis for Autism Spectrum Disorder Detection
Zunino, Andrea | IIT |
Morerio, Pietro | Istituto Italiano Di Tecnologia |
Cavallo, Andrea | Univ. of Torino |
Ansuini, Caterina | Italian Inst. of Tech |
Podda, Jessica | Italian Inst. of Tech |
Battaglia, Francesca | Istituto G. Gaslini |
Veneselli, Edvige | Istituto G. Gaslini |
Becchio, Cristina | Italian Inst. of Tech |
Murino, Vittorio | Istituto Italiano Di Tecnologia |
Keywords: Gesture recognition, Applications of computer vision, Video analysis
Abstract: Autism is a behavioral neurological disorder affecting a significant percentage of the worldwide population. It typically starts manifesting at a very young age, but it is difficult to diagnose early since no specific exam or trial can spot it reliably. Its detection mainly depends on the medical expertise used to assess the patient's behavior during direct interviews. This work aims at providing automatic, objective support to the doctor for the assessment of (early) diagnosis of possibly autistic subjects using only video sequences. The underlying rationale comes from psychological and neuroscience studies claiming that the execution of simple motor acts differs between pathological and healthy subjects, and that this can be sufficient to discriminate between them. To this end, we devised an experiment in which we recorded, using a standard video camera, patient and healthy children performing the same simple gesture of grasping a bottle. By processing only the video clips depicting the grasping action with a recurrent deep neural network, we are able to discriminate with good accuracy between the two classes of subjects. The designed deep model can also provide a sort of attention map in which the zones of major interest in the video are identified in space and time: this ``explains'' in a certain way which areas the model deems most relevant to the classification, and could also be used by the doctor in making the diagnosis. This work constitutes a first step towards the development of an automatic computational system for the early diagnosis of autistic subjects, providing the medical expert with a supportive, objective method that is potentially simple to use in clinical and more open settings.

14:00-16:00, Paper FrPMP.14
Action Recognition from 3D Skeleton Sequences Using Deep Networks on Lie Group Features
Rhif, Manel | Univ. of Lille / Univ. of Manouba
Wannous, Hazem | Univ. of Lille |
Farah, Imed Riadh | Univ. of Manouba |
Keywords: Gesture recognition, Behavior recognition, Deep learning
Abstract: This paper addresses the problem of human action recognition from sequences of 3D skeleton data. For this purpose, we combine a deep learning network with geometric features extracted from data lying in a non-Euclidean space, which have recently been shown to be very effective at capturing the geometric structure of the human pose. In particular, our approach incorporates the intrinsic Lie-group nature of the data into deep neural networks to learn more adequate geometric features for the 3D action recognition problem. First, geometric features are extracted from the 3D joints of skeleton sequences using the Lie group representation. Then, the network model is built from stacked units of 1-dimensional CNNs across the temporal domain. Finally, the CNN features are used to train an LSTM layer to model dependencies in the temporal domain and to perform action recognition. The experimental evaluation is performed on three public datasets containing various challenges: UT-Kinect, Florence 3D-Action, and MSR-Action 3D. Results reveal that our approach achieves state-of-the-art performance in most cases.

14:00-16:00, Paper FrPMP.15
Dynamic Facial Expression Recognition Based on Convolutional Neural Networks with Dense Connections
Dong, Jiayu | Sun Yat-Sen Univ |
Zheng, Huicheng | Sun Yat-Sen Univ |
Lian, Lina | Sun Yat-Sen Univ |
Keywords: Facial expression recognition, Emotion recognition, Image classification
Abstract: Facial expression recognition (FER) is a challenging problem with important applications. Applying deep learning techniques for dynamic FER is advantageous in terms of automatically generating discriminative expression features. However, inadequate training data in small expression databases aggravates overfitting and hinders the performance of deep networks. Recent studies have shown that dense connectivity in convolutional neural networks can encourage feature sharing and alleviate overfitting when training with small datasets. Still, traditional dense structures are too deep for FER with insufficient training samples. In this paper, we propose a relatively shallow CNN structure with densely connected short paths for FER. Instead of using transition layers to down-sample feature maps between dense blocks, we introduce dense connectivity across pooling to enforce feature sharing in the shallow CNN structure. Extensive experiments show that our method achieves competitive performance on benchmark datasets CK+ and Oulu-CASIA.

14:00-16:00, Paper FrPMP.16
Person Recognition at a Distance: Improving Face Recognition through Body Static Information
Gonzalez-Sosa, Ester | Univ. Autónoma De Madrid |
Vera-Rodriguez, Ruben | Univ. Autonoma De Madrid |
Hernandez-Ortega, Javier | Univ. Autonoma De Madrid |
Fierrez, Julian | Univ. Autonoma De Madrid |
Keywords: Soft biometrics, Visual surveillance, Multi-biometrics
Abstract: In this paper we evaluate body static information to improve the performance of face recognition at a distance. To this aim, we assess one state-of-the-art face recognition system based on deep features and three body-based person recognition systems, namely: i) row profiles with the correlation coefficient, ii) row and column profiles with Support Vector Machines, and iii) contour coordinates with Dynamic Time Warping. Results are reported on the Multi-Biometric Tunnel Database, emphasizing three distance settings: far, medium, and close, ranging from full-body exposure to head-and-shoulders exposure. Several conclusions can be drawn from this work: a) row and column profiles are more robust than contour coordinates; b) face-based systems perform poorly at far distances, body-based information being more reliable at those distances; c) in general, face-based systems perform better than body-based approaches at medium and close distances; and d) the multimodal fusion approach manages to outperform face-only recognition at a distance in all distance settings considered.

14:00-16:00, Paper FrPMP.17
Deep Learning-Based Face Recognition and the Robustness to Perspective Distortion
Damer, Naser | Fraunhofer Inst. for Computer Graphics Res. IGD |
Wainakh, Yaza | Fraunhofer IGD |
Henniger, Olaf | Fraunhofer IGD |
Croll, Christian | KIS PhotoMe Group |
Berthe, Benoit | IDEMIA |
Braun, Andreas | Fraunhofer IGD |
Kuijper, Arjan | Fraunhofer IGD |
Keywords: Face recognition, Biometric systems and applications, Visual surveillance
Abstract: Face recognition technology is spreading into a wide range of applications, driven mainly by social acceptance and the performance boost achieved by deep learning-based solutions in recent years. Perspective distortion is an understudied distortion in face recognition that causes converging verticals when imaging 3D objects, depending on the distance to the object. The effect of this distortion on face recognition was previously studied for algorithms based on hand-crafted features, with a clear negative effect on verification performance. Previously proposed solutions compensate for the distortion at the face image level, which requires knowing the camera settings and capturing a high-quality image. This work investigates the effect of perspective distortion on the performance of a deep learning-based face recognition solution. It also provides a device parameter-independent way to decrease this effect by creating more perspective-robust face representations. This was achieved by training the deep learning model on perspective-diverse data, without increasing the size of the training data. Experiments performed on the deep model in hand and a specifically collected database conclude that perspective distortion affects face verification performance if not considered in the training process, and that this can be improved by our proposal of creating robust face representations through proper selection of the training data.

14:00-16:00, Paper FrPMP.18
CNN+RNN Depth and Skeleton Based Dynamic Hand Gesture Recognition
Lai, Kenneth | Univ. of Calgary |
Yanushkevich, Svetlana | Univ. of Calgary |
Keywords: Gesture recognition, Deep learning, Neural networks
Abstract: Human activity and gesture recognition is an important component of the rapidly growing domain of ambient intelligence, in particular in assisted living and smart homes. In this paper, we propose to combine the power of two deep learning techniques, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for automated hand gesture recognition using both depth and skeleton data. Each of these types of data can be used separately to train neural networks to recognize hand gestures. While RNNs have previously been reported to perform well in recognizing sequences of movement for each skeleton joint given the skeleton information only, this study aims at utilizing depth data and applying CNNs to extract important spatial information from the depth images. Together, the tandem CNN+RNN recognizes a sequence of gestures more accurately. In addition, various types of fusion are studied to combine the skeleton and depth information in order to extract spatio-temporal information. An overall accuracy of 85.46% is achieved on the dynamic hand gesture 14/28 dataset.

14:00-16:00, Paper FrPMP.19
A Joint Density Based Rank-Score Fusion for Soft Biometric Recognition at a Distance
Guo, Bingchen | Univ. of Southampton |
Nixon, Mark | Univ. of Southampton |
Carter, John | Univ. of Southampton |
Keywords: Soft biometrics, Multi-biometrics, Biometric systems and applications
Abstract: In order to improve recognition performance, fusion has become a key technique in recent years. Compared with single-mode biometrics, multi-modal biometric systems achieve improved recognition rates and more confident final decisions. This paper introduces a novel joint density distribution based rank-score fusion strategy that combines rank and score information. Recognition at a distance has only recently become of interest in soft biometrics. We create a new soft biometric database containing human face, body, and clothing attributes at three different distances to investigate the influence of distance on soft biometric fusion. A comparative study of our method against other state-of-the-art rank-level and score-level fusion methods is also conducted, with experiments performed on the soft biometric database we created. The results demonstrate that recognition performance is significantly improved by our proposed method.

14:00-16:00, Paper FrPMP.20
Optimizing Energies for Pose-Invariant Face Recognition
Hanselmann, Harald | RWTH Aachen |
Ney, Hermann | RWTH Aachen Univ |
Keywords: Face recognition, Image classification, Graph matching
Abstract: One of the most difficult challenges in face recognition is the large variation in pose. One approach to handle this problem is to use a 2D-Warping algorithm in a nearest-neighbor classifier. The 2D-Warping algorithm optimizes an energy function that captures the cost of matching pixels between two images while respecting the 2D dependencies defined by local pixel neighborhoods. Optimizing this energy function is an NP-complete problem and is therefore approached with algorithms that aim to approximate the optimal solution. In this paper we compare two algorithms that do this without discarding any 2D dependencies and we study the effect of the quality of the approximate solutions on the classification performance. Additionally, we propose a new algorithm that is capable of finding better solutions and obtaining better energies than the other methods. The experimental evaluation on the CMU-MultiPIE database shows that the proposed algorithm also achieves state-of-the-art recognition accuracies.
|
|
14:00-16:00, Paper FrPMP.21 | |
Multi-Level Feature Abstraction from Convolutional Neural Networks for Multimodal Biometric Identification |
Soleymani, Sobhan | West Virginia Univ |
Dabouei, Ali | West Virginia Univ |
Kazemi, Hadi | West Virginia Univ |
Dawson, Jeremy | West Virginia Univ |
Nasrabadi, Nasser | LCSEE/WVU |
Keywords: Multi-biometrics, Neural networks, Deep learning
Abstract: In this paper, we propose a deep multimodal fusion network to fuse multiple modalities (face, iris, and fingerprint) for person identification. The proposed deep multimodal fusion algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Multiple features are extracted at several different convolutional layers from each modality-specific CNN for joint feature fusion, optimization, and classification. Features extracted at different convolutional layers of a modality-specific CNN represent the input at several different levels of abstraction. We demonstrate that efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations extracted from all the modality-specific CNNs. We demonstrate an increase in multimodal person identification performance by utilizing the proposed multi-level feature representations in our multimodal fusion, rather than using only the features from the last layer of each modality-specific CNN. We show that our deep multimodal CNNs with fusion at several different feature abstraction levels can significantly outperform unimodal representations in accuracy. We also demonstrate that joint optimization of all the modality-specific CNNs outperforms score- and decision-level fusion of independently optimized CNNs.
|
|
14:00-16:00, Paper FrPMP.22 | |
Person Re-Identification Based on Feature Fusion and Triplet Loss Function |
Xiang, Jun | South-Central Univ. for Nationalities |
Lin, Ranran | South-Central Univ. for Nationalities |
Hou, Jianhua | South-Central Univ. for Nationalities |
Huang, Wenjun | Wuhan Univ |
Keywords: Visual surveillance, Face recognition, Applications of pattern recognition and machine learning
Abstract: The task of person re-identification (re-ID) is to recognize an individual observed by non-overlapping cameras. Robust feature representation is a crucial problem in re-ID. With the rise of deep learning, most current approaches adopt convolutional neural networks (CNN) to extract features. However, the feature representation learned by a CNN is often global and lacks detailed local information. To address this issue, this paper proposes a simple CNN architecture consisting of a re-ID sub-network and an attribute sub-network. In the re-ID sub-network, global and semantic features are extracted and fused in a weighted manner, and a triplet loss is adopted to further improve the discriminative ability of the learned fusion feature. The attribute sub-network, in turn, focuses on local aspects of a person and offers local structural information that is helpful for re-ID. The two sub-networks are combined at the loss level and their complementary aspects are leveraged to improve re-ID accuracy. Comparative evaluations demonstrate that our method outperforms several state-of-the-art ones. On the challenging Market1501 and DukeMTMC datasets, 86.3% rank-1 accuracy and 69.4% mAP, and 72.1% rank-1 accuracy and 53.4% mAP are achieved, respectively.
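The triplet loss mentioned above has a standard form; a minimal sketch follows (the margin value and the fusion weight alpha are hypothetical, as the abstract does not specify them):

    import torch
    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.3):
        d_ap = F.pairwise_distance(anchor, positive)   # same identity
        d_an = F.pairwise_distance(anchor, negative)   # different identity
        return F.relu(d_ap - d_an + margin).mean()     # positive must be closer by `margin`

    def fuse(global_feat, semantic_feat, alpha=0.5):
        # weighted fusion of the global and semantic features; alpha is hypothetical
        return alpha * global_feat + (1 - alpha) * semantic_feat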
|
|
14:00-16:00, Paper FrPMP.23 | |
iCushion: A Pressure Map Algorithm for High-Accuracy Human Identification |
Ai, Haojun | Wuhan Univ |
Zhang, Liezhuo | Wuhan Univ |
Yuan, Zhiyu | Wuhan Univ |
Huang, Haitao | Wuhan Univ |
Keywords: Biometric systems and applications, Sensor array & multichannel signal processing
Abstract: Intelligent cushion (iCushion) technology, with embedded pressure-array sensors enabling individual-specific sitting experiences, is currently booming. An iCushion has the built-in capability to identify users throughout its use in a continuous and non-intrusive manner. Due to the variability in sitting posture and the angle of seated deflection, the accuracy of user identification remains unstable or unclear with existing solutions. Aiming at this problem, this study develops a two-stage pressure map algorithm based on robust spatial-temporal features. First, pressure maps are collected continuously without limiting the user's posture, from which an accumulated identity library is established for sitting postures by extracting features from the pressure maps. Specifically, we create a decision tree to classify maps by the distance between the two ischia, and then analyze the variances in the two areas around the ischia. Second, the similarity between two maps is measured by the Euclidean distance between the feature vectors around the ischia. A k-NN voting mechanism is developed to achieve reliable identification. The resulting iCushion prototype successfully identified 92.2% of maps from three randomly chosen individuals during four hours of non-stop testing. It holds potential for non-intrusive and reliable activity recognition in other pervasive applications.
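A minimal sketch of the matching stage as described (feature extraction around the ischia is assumed given; k and the feature layout are illustrative):

    # Identify a sitter by k-NN voting over Euclidean distances between per-map
    # feature vectors; `library_vecs`/`library_ids` form the identity library.
    import numpy as np
    from collections import Counter

    def knn_identify(query_vec, library_vecs, library_ids, k=5):
        dists = np.linalg.norm(library_vecs - query_vec, axis=1)  # Euclidean distances
        nearest = np.argsort(dists)[:k]                           # k closest library maps
        votes = Counter(library_ids[i] for i in nearest)          # majority vote on identity
        return votes.most_common(1)[0][0]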
|
|
14:00-16:00, Paper FrPMP.24 | |
FV-Net: Learning a Finger-Vein Feature Representation Based on a CNN |
Hu, Hui | South China Univ. of Tech |
Kang, Wenxiong | South China Univ. of Tech |
Lu, Yuting | South China Univ. of Tech |
Fang, Yuxun | South China Univ. of Tech |
Liu, Hongda | South China Univ. of Tech |
Zhao, Junhong | South China Univ. of Tech |
Deng, Feiqi | South China Univ. of Tech |
Keywords: Other biometrics, Biometric systems and applications
Abstract: In this paper, we propose a deep convolutional neural network (CNN) model to learn finger-vein feature representations that are more discriminative and robust than handcrafted features. First, to address the issue of insufficient samples when applying deep learning to vein pattern recognition, we focus on the following aspects: 1) we propose a database expansion strategy that merges heterogeneous databases and applies data augmentation to enlarge the training set, yielding what we call the vein-like dataset, since it is composed of vein and palmprint images; 2) we transplant a pre-trained model as the bottom structure of our proposed FV-Net to further reduce the demand for training samples. Second, to address translation and rotation in vein imaging, we propose a template-like matching strategy and design the top architecture of FV-Net to extract features with spatial information. Finally, extensive experimental results show that our proposed method achieves excellent performance on several public datasets.
|
|
14:00-16:00, Paper FrPMP.25 | |
Incorporating High-Level and Low-Level Cues for Pain Intensity Estimation |
Yang, Ruijing | Northwest Univ |
Hong, Xiaopeng | Univ. of Oulu |
Peng, Jinye | Northwest Univ |
Feng, Xiaoyi | Northwestern Pol. Univ |
Zhao, Guoying | Univ. of Oulu |
Keywords: Facial expression recognition
Abstract: Pain is a transient physical reaction that manifests on human faces. Automatic pain intensity estimation is of great importance in clinical and healthcare applications. Pain expression is identified by a set of deformations of facial features; hence, features are essential for pain estimation. In this paper, we propose a novel method that encodes low-level descriptors and powerful high-level deep features through a weighting process to form an efficient representation of facial images. To obtain a powerful and compact low-level representation, we explore second-order pooling over the local descriptors. Instead of direct concatenation, we develop an efficient fusion approach that unites the low-level local descriptors and the high-level deep features. To the best of our knowledge, this is the first approach that incorporates low-level local statistics together with high-level deep features in pain intensity estimation. The method is evaluated on benchmark pain databases. The results demonstrate that the proposed low-to-high-level representation outperforms other methods and achieves promising results.
|
|
14:00-16:00, Paper FrPMP.26 | |
Advancing Surface Feature Description and Matching for More Accurate Biometric Recognition |
Cheng, Kevin Ho Man | The Hong Kong Pol. Univ |
Kumar, Ajay | The Hong Kong Pol. Univ |
Keywords: Other biometrics, Biometric systems and applications, Soft biometrics
Abstract: Accurate and efficient feature descriptors are crucial for the success of many pattern recognition tasks, including human identification. Existing studies have shown that features extracted from 3D depth images are more reliable than those from 2D intensity images, because intensity images are generally noisy and sensitive to illumination variation, which is challenging for many real-world applications such as biometrics. Recently introduced 3D feature descriptors like Binary Shape and Surface Code have shown improved effectiveness for 3D palm recognition. However, both methods lack theoretical support for the construction of their feature templates, which limits their matching accuracy and efficiency. In this paper, we further advance the Surface Code method and introduce the Efficient Surface Code, which describes whether a point tends to be concave or convex using only one bit per pixel. Our investigation also reveals that the discriminative abilities of convex and concave regions are not necessarily equal; for example, line patterns on human palms and finger knuckles are expected to reveal more discriminative information than non-line regions. Therefore, we also propose a weighted similarity measure used in conjunction with the Efficient Surface Code, instead of the traditional Hamming distance adopted in both Binary Shape and Surface Code. Comparative experimental results on both 3D palmprint and 3D finger knuckle databases show superior performance over the aforementioned state-of-the-art methods, which validates our theoretical arguments.
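A hedged sketch of the two ideas named above, one bit per pixel plus a weighted similarity (thresholding the sign of the surface Laplacian stands in for the paper's code construction, and the region weights are hypothetical):

    import numpy as np
    from scipy.ndimage import laplace

    def efficient_surface_code(depth):
        # 1 = concave-leaning, 0 = convex-leaning, one bit per pixel
        return laplace(depth.astype(float)) > 0

    def weighted_similarity(code_a, code_b, w_concave=0.7, w_convex=0.3):
        # unlike a plain Hamming distance, concave (line-like) pixels can count more
        agree = (code_a == code_b)
        w = np.where(code_a, w_concave, w_convex)
        return (w * agree).sum() / w.sum()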
|
|
14:00-16:00, Paper FrPMP.27 | |
FaceLiveNet: End-To-End Networks Combining Face Verification with Interactive Facial Expression-Based Liveness Detection |
Ming, Zuheng | Univ. of La Rochelle |
Joseph, Chazalon | Univ. of La Rochelle |
Luqman, Muhammad Muzzamil | L3i Lab. Univ. of La Rochelle, France |
Visani, Muriel | Univ. of La Rochelle |
Burie, Jean-Christophe | Univ. of La Rochelle |
Keywords: Facial expression recognition, Biometric anti-spoofing, Face recognition
Abstract: The effectiveness of state-of-the-art face verification/recognition algorithms and the convenience of face recognition have greatly boosted face-based biometric authentication applications. However, existing face verification architectures seldom integrate liveness detection, or keep that stage isolated from face verification as if it were irrelevant. This may potentially expose the system to spoof attacks between the two stages. This work introduces FaceLiveNet, a holistic end-to-end deep network that performs face verification and liveness detection simultaneously. An interactive scheme based on facial expression recognition is proposed to perform liveness detection, providing better generalization capacity and a higher security level. The proposed framework is low-cost, relying on commodity hardware instead of costly sensors, and lightweight, with far fewer parameters than other popular deep networks such as VGG16 and FaceNet. Experimental results on the benchmarks LFW, YTF, CK+, OuluCASIA, SFEW and FER2013 demonstrate that the proposed FaceLiveNet achieves state-of-the-art performance or better for both face verification and facial expression recognition. We also introduce a new protocol to evaluate the global performance of face authentication with the fusion of face verification and interactive facial expression-based liveness detection.
|
|
14:00-16:00, Paper FrPMP.28 | |
Effect of Artefact Removal Techniques on EEG Signals for Video Category Classification |
Mutasim, Aunnoy K | Independent Univ. Bangladesh |
Bashar, M. Raihanul | Independent Univ. Bangladesh |
Tipu, Rayhan Sardar | Independent Univ. Bangladesh |
Islam, Md. Kafiul | Independent Univ. Bangladesh |
Amin, M Ashraful | Independent Univ. Bangladesh |
Keywords: Pattern recognition for human computer interaction, Classification, Brain-computer interface
Abstract: Pre-processing, feature extraction, feature selection and classification are the four submodules of the signal processing module of a typical BCI system. Pattern recognition is mainly involved in this module, and in this paper we experiment with different state-of-the-art algorithms for each of these submodules on two separate datasets acquired using the Emotiv EPOC and the Muse headband from 38 college-aged young adults. For our experiment, we used two artefact removal techniques, namely a Stationary Wavelet Transform (SWT) based denoising technique and an extended SWT technique (SWTSD). We found that SWTSD improves average classification accuracy by up to 7.2% and performs better than SWT. However, this does not imply that SWTSD will outperform SWT when applied to other BCI paradigms or other EEG-based applications. In our study, the highest average accuracies achieved with data from the Muse headband and the Emotiv EPOC were 77.7% and 66.7%, respectively. From our results we conclude that the performance of a BCI system depends on several factors, including artefact removal techniques, filters, feature extraction and selection algorithms, and classifiers, and that appropriate choice and usage of such methods can have a significant positive impact on the end results.
|
|
14:00-16:00, Paper FrPMP.29 | |
High-Quality Facial Keypoints Matching with Motion Smoothness Constraint and 3D Model Constraint |
Zeng, Xianxian | Guangdong Univ. of Tech |
Wang, Xiaodong | GuangDong Univ. of Tech |
Chen, Kairui | Guangdong Univ. of Tech |
Ye, Peichu | Guangdong Univ. of Tech |
Hu, Xiaorui | Guangdong Univ. of Tech |
Li, Dong | Guangdong Univ. of Tech |
Zhang, Yun | Guangdong Univ. of Tech |
Keywords: Other biometrics, Face recognition
Abstract: Pore-scale facial features, like fingerprints and irises, are effective for distinguishing human identities. Nonetheless, there are few pore-scale facial feature databases, which constrains the use of deep learning methods for generating pore-scale facial features. In this paper, we propose a novel method that merges a motion smoothness constraint and a 3D model constraint to generate a large and complex pore-scale facial feature database. The proposed method uses a powerful motion smoothness constraint in feature matching, rather than the standard ratio test, and utilizes the 3D model constraint to eliminate incorrect matches. In our experiments, the proposed method produces twice as many high-quality matches as the state-of-the-art with the same local features.
|
|
14:00-16:00, Paper FrPMP.30 | |
Air Signature Recognition Using Deep Convolutional Neural Network-Based Sequential Model |
Behera, Santosh Kumar | Indian Inst. of Tech. Bhubaneswar |
Dash, Ajaya Kumar | IIT Bhubaneswar |
Dogra, Debi Prosad | IIT Bhubaneswar |
Roy, Partha Pratim | IIT |
Keywords: Biometric systems and applications, Signature verification and writer identification, Pattern recognition for human computer interaction
Abstract: Deep convolutional neural networks are becoming extremely popular in classification, especially when the inputs are non-sequential in nature. Though it may seem unrealistic to adopt such networks as sequential classifiers, researchers have started to use them for applications that primarily deal with sequential data. This is possible if the sequential data can be represented in the conventional form in which inputs are provided to CNNs. Signature recognition is one of the important tasks in biometric applications, as signatures represent the signer's identity. Air signatures can make traditional biometric systems more secure and robust than conventional pen-paper or stylus-guided interfaces. In this paper, we propose a new set of geometrical features to represent 3D air signatures captured using a Leap Motion sensor. The features are then arranged so that they can be fed to a deep convolutional neural network architecture with application-specific tuning of the model parameters. It has been observed that the proposed features in combination with the CNN architecture can act as a good sequential classifier when tested on a moderately sized air signature dataset. Experimental results reveal that the proposed biometric system performs better than state-of-the-art geometrical features, with an average accuracy improvement of 4%.
|
|
14:00-16:00, Paper FrPMP.31 | |
Uniface: A Unified Network for Face Detection and Recognition |
Liao, Zhouyingcheng | Shanghai Jiao Tong Univ |
Zhou, Peng | Shanghai Jiao Tong Univ |
Wu, Qinlong | China Mobile Suzhou Software Tech. Co., Ltd |
Ni, Bingbing | Shanghai Jiao Tong Univ |
Keywords: Face recognition, Object detection, Image classification
Abstract: Typically, cropped and aligned face images are required as the input to a face recognition model. In contrast, popular object detectors based on deep convolutional networks usually locate and classify objects simultaneously, which eliminates redundant computation. This work presents a single-network model called the Uniface network for simultaneous face detection, landmark localization and recognition. We develop a feature-sharing infrastructure to seamlessly integrate the detection/localization module and the recognition module. To facilitate large-scale end-to-end training, we propose a method that encourages the top-level features of our model to mimic those of a well-trained single-task face recognition model. Comprehensive experiments on face detection, landmark localization and verification tasks demonstrate that the proposed network achieves competitive performance on both a face recognition benchmark (99.0% on LFW for a single model) and a face detection benchmark (86.4% against 2000 false positives on FDDB for a single model).
|
|
14:00-16:00, Paper FrPMP.32 | |
Adaptive Convolution Local and Global Learning for Class-Level Joint Representation of Face Recognition with Single Sample Per Person |
Wen, Wei | Shenzhen Univ |
Wang, Xing | Shenzhen Univ |
Shen, Linlin | Shenzhen Univ |
Yang, Meng | Sun Yat-Sen Univ |
Keywords: Face recognition, Sparse learning, Neural networks
Abstract: Due to the absence of samples with intra-class variation, extracting discriminative facial features and building powerful classifiers are the bottlenecks in improving the performance of face recognition (FR) with a single sample per person (SSPP). In this paper, we propose to learn regional adaptive convolution features that are locally and globally discriminative for face identity and robust to face variation. With collected generic facial variations, a novel class-level joint representation framework is presented to exploit the distinctiveness and class-level commonality of different facial features. The proposed class-level joint representation with regional adaptive convolution features (CJR-RACF) fully exploits both discriminative facial features that are robust to various facial variations and a powerful representation for classification with generic facial variations, which overcomes the small-sample-size problem. CJR-RACF has been evaluated on several popular databases, including the large-scale CMU Multi-PIE and LFW databases. Experimental results demonstrate the much higher robustness and effectiveness of CJR-RACF under complex facial variations compared to state-of-the-art methods.
|
|
14:00-16:00, Paper FrPMP.33 | |
Meta Transfer Learning for Facial Emotion Recognition |
Nguyen, Dung | Queensland Univ. of Tech |
Nguyen, Kien | Queensland Univ. of Tech |
Sridha, Sridharan | Queensland Univ. of Tech |
Abbasnejad, Iman | Queensland Univ. of Tech |
Dean, David Brendan | Queensland Univ. of Tech |
Fookes, Clinton | Queensland Univ. of Tech |
Keywords: Facial expression recognition, Transfer learning, Deep learning
Abstract: The use of deep learning techniques for automatic facial expression recognition has recently attracted great interest, but developed models still fail to generalize well due to the lack of large emotion datasets for deep learning. To overcome this problem, in this paper we propose a novel transfer learning approach relying on PathNet and investigate how knowledge can be accumulated within a given dataset and how the knowledge captured from one emotion dataset can be transferred to another in order to improve overall performance. To evaluate the robustness of our system, we conducted various sets of experiments on two emotion datasets: SAVEE and eNTERFACE. The experimental results demonstrate that our proposed system improves emotion recognition performance and performs significantly better than recent state-of-the-art schemes adopting a fine-tuning/pre-training approach.
|
|
14:00-16:00, Paper FrPMP.34 | |
Speaker Clustering Using Dominant Sets |
Hibraj, Feliks | Ca' Foscari Univ |
Vascon, Sebastiano | Univ. Ca' Foscari of Venice |
Stadelmann, Thilo | Zurich Univ. of Applied Sciences |
Pelillo, Marcello | Ca' Foscari Univ |
Keywords: Speaker recognition, Clustering
Abstract: Speaker clustering is the task of forming speaker-specific groups from a set of utterances. In this paper, we address this task using Dominant Sets (DS). DS is a graph-based clustering algorithm with interesting properties that fits our problem well and has never before been applied to speaker clustering. We report on a comprehensive set of experiments on the TIMIT dataset against standard clustering techniques and specific speaker clustering methods. Moreover, we compare performance under different features, using both features learned via a deep neural network directly on TIMIT and features extracted from a pre-trained VGGVox network. To assess stability, we perform a sensitivity analysis on the free parameters of our method, showing that performance is stable under parameter changes. The extensive experimentation carried out confirms the validity of the proposed method, with state-of-the-art results under three different standard metrics. We also report reference baseline results for speaker clustering on the entire TIMIT dataset for the first time.
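For reference, a dominant set can be extracted with the standard discrete replicator dynamics (sketch below; the utterance-similarity matrix A is assumed symmetric, non-negative, with zero diagonal, and the support threshold is illustrative):

    import numpy as np

    def dominant_set(A, iters=1000, tol=1e-8):
        n = A.shape[0]
        x = np.full(n, 1.0 / n)              # start from the simplex barycenter
        for _ in range(iters):
            y = x * (A @ x)                  # replicator update: x_i <- x_i (Ax)_i / x'Ax
            s = y.sum()
            if s == 0:
                break
            y /= s
            if np.abs(y - x).sum() < tol:
                x = y
                break
            x = y
        return np.flatnonzero(x > 1e-5)      # indices forming the dominant set (one cluster)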
|
|
14:00-16:00, Paper FrPMP.35 | |
Show Me Your Face and I Will Tell You Your Height, Weight and BMI |
Dantcheva, Antitza | INRIA Méditerranée |
Bremond, Francois | INRIA (Inst. National De Recherche En Informatique Etautomat |
Bilinski, Piotr | Univ. of Oxford |
Keywords: Soft biometrics
Abstract: Body height, weight, and the associated composite body mass index (BMI) are human attributes of pertinence due to their use in a number of applications including surveillance, re-identification, image retrieval systems, and healthcare. Previous work on automated estimation of height, weight and BMI has predominantly focused on 2D and 3D full-body images and videos; little attention has been given to the use of the face for estimating such traits. Motivated by the above, we explore the possibility of estimating height, weight and BMI from single-shot facial images by proposing a regression method based on the 50-layer ResNet architecture. In addition, we present a novel dataset consisting of 1026 subjects and show results which suggest that facial images contain discriminatory information pertaining to height, weight and BMI, comparable to that of body images and videos. Finally, we perform a gender-based analysis of the prediction of height, weight and BMI.
|
|
14:00-16:00, Paper FrPMP.36 | |
Generation of Textured Contact Lens Iris Images Based on 4DCycle-GAN |
Zou, Hang | Minzu Univ. of China |
Zhang, Hui | Beijing IrisKing Co., Ltd |
Li, Xingguang | Beijing IrisKing Co., Ltd |
Liu, Jing | Beijing IrisKing Co., Ltd |
He, Zhaofeng | Beijing IrisKing Co., Ltd |
Keywords: Biometric anti-spoofing, Deep learning, Classification
Abstract: With the development of iris recognition, many identity authentication applications have begun to use this inherent biometric ID. Despite breakthroughs in iris recognition technology, one primary problem remains unsolved: the presentation spoof attack. In this paper, we present a novel algorithm, 4DCycle-GAN, for expanding a spoof iris image database by synthesizing fake iris images wearing textured contact lenses. The proposed 4DCycle-GAN follows the Cycle-Consistent Adversarial Networks (Cycle-GAN) framework, which translates between one kind of image (genuine iris images) and another (textured contact lens iris images). 4DCycle-GAN introduces two additional discriminators to address Cycle-GAN's lack of diversity. These two new discriminators 'prefer' images generated by the generators, while the original Cycle-GAN discriminators 'prefer' real captured images. The added confrontations keep 4DCycle-GAN from over-generating the contact lens textures that make up a larger percentage of the training iris database. The synthesized textured contact lens iris images are then used in spoof detection training to improve the robustness of the classification algorithm. Images synthesized by both Cycle-GAN and 4DCycle-GAN improve spoof classification results; moreover, with 4DCycle-GAN, spoof classification results are distinctly improved in experiments on unrelated, non-homologous databases. Extensive experimental results show that the proposed method can improve the anti-spoofing ability of an iris recognition system.
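For orientation, the Cycle-GAN objective that 4DCycle-GAN extends can be written as below; this is a hedged reading of the abstract, in which the two extra discriminators D'_X, D'_Y contribute adversarial terms of the same form but with generated rather than real images as their positive class:

    \mathcal{L}(G, F) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)
                      + \lambda \, \mathcal{L}_{cyc}(G, F)
                      + \mathcal{L}_{GAN}(G, D'_Y, X, Y) + \mathcal{L}_{GAN}(F, D'_X, Y, X)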
|
|
14:00-16:00, Paper FrPMP.37 | |
MSU-AVIS Dataset: Fusing Face and Voice Modalities for Biometric Recognition in Indoor Surveillance Videos |
Chowdhury, Anurag | Michigan State Univ |
Atoum, Yousef | Univ |
Tran, Luan | Michigan State Univ |
Liu, Xiaoming | Michigan State Univ |
Ross, Arun | Michigan State Univ |
Keywords: Multi-biometrics, Face recognition, Speaker recognition
Abstract: Indoor video surveillance systems often use the face modality to establish the identity of a person of interest. However, the face image may not offer sufficient discriminatory information in many scenarios due to substantial variations in pose, illumination, expression, resolution and distance between the subject and the camera. In such cases, the inclusion of an additional biometric modality can benefit the recognition process. In this regard, we consider the fusion of voice and face modalities for enhancing recognition accuracy. The contribution of this work is three-fold. First, we construct a multimodal (face and voice), semi-constrained, indoor video surveillance dataset referred to as the Audio-Video Indoor Surveillance (AVIS) dataset. We use a consumer-grade camera with built-in microphone to acquire data for this purpose. Second, we design robust deep-learning based methods to perform face and speaker recognition on surveillance-type data. Third, we explore multiple fusion schemes to combine the face and speaker recognition algorithms to perform effective person recognition on audio-video surveillance data. Experiments convey the efficacy of the proposed multimodal fusion scheme (face and voice) over unimodal approaches in surveillance scenarios.
|
|
14:00-16:00, Paper FrPMP.38 | |
Hard Zero Shot Learning for Gesture Recognition |
Madapana, Naveen | Purdue Univ |
Wachs, Juan | Purdue Univ |
Keywords: Gesture recognition, Transfer learning
Abstract: Gesture-based systems allow humans to interact with devices and robots in a natural way. Yet current gesture recognition systems cannot recognize gestures outside a limited lexicon. This opposes the idea of lifelong learning, which requires systems to adapt to unseen object classes. These issues can best be addressed using Zero Shot Learning (ZSL), a paradigm in machine learning that leverages semantic information to recognize new classes. ZSL systems developed in the past used hundreds of training examples to detect new classes and assumed that test examples come from unseen classes. This work introduces two more complex and realistic learning problems, referred to as Hard Zero Shot Learning (HZSL) and Generalized HZSL (G-HZSL), that are necessary to achieve lifelong learning. The main objective of these problems is to recognize unseen classes with limited training information and to relax the assumption that test instances come from unseen classes. We propose to leverage one shot learning (OSL) techniques coupled with ZSL approaches to address and solve the HZSL problem for gesture recognition. Further, supervised clustering techniques are used to discriminate seen classes from unseen classes. We assessed and compared the performance of various existing algorithms on HZSL for gestures using two standard datasets: MSRC-12 and CGD2011. For four unseen classes, results show that the marginal accuracies of HZSL (15.2%) and G-HZSL (14.39%) are comparable to the performance of conventional ZSL. Given that we used only one instance and did not assume that test classes are unseen, the performance of the HZSL and G-HZSL models is remarkable.
|
|
14:00-16:00, Paper FrPMP.39 | |
SynRhythm: Learning a Deep Heart Rate Estimator from General to Specific |
Niu, Xuesong | Inst. of Computing Tech. CAS |
Han, Hu | Inst. of Computing Tech. CAS |
Shan, Shiguang | Inst. of Computing Tech. ChineseAcademy of Sciences |
Chen, Xilin | Inst. of Computing Tech |
Keywords: Affective computing, Video analysis, Face recognition
Abstract: Remote photoplethysmography (rPPG) based non-contact heart rate (HR) measurement from face video has drawn increasing attention recently because of its potential applications in many scenarios, such as training aids, health monitoring, and nursing care. Although a number of methods have been proposed, most are designed under certain assumptions and can fail when those assumptions do not hold. At the same time, while deep learning based methods have been reported to achieve promising results in many computer vision tasks, their use in rPPG-based heart rate estimation has been limited by the very limited data available in the public domain. To overcome this limitation and leverage the strong modeling ability of deep neural networks, in this paper we propose a novel spatial-temporal representation for the HR signal and design a general-to-specific transfer learning strategy to train a deep heart rate estimator from a large volume of synthetic rhythm signals and a limited amount of available face video data. Experimental results on public-domain databases show the effectiveness of the proposed approach.
|
|
14:00-16:00, Paper FrPMP.40 | |
Revised Contrastive Loss for Robust Age Estimation from Face |
Pan, Hongyu | Chinese Acad. of Sciences |
Han, Hu | Inst. of Computing Tech. CAS |
Shan, Shiguang | Inst. of Computing Tech. ChineseAcademy of Sciences |
Chen, Xilin | Inst. of Computing Tech |
Keywords: Estimation of demographic factors, Face recognition
Abstract: Age estimation has broad application prospects in many fields, such as video surveillance, social networks, and human-computer interaction. Many existing approaches treat age estimation as a classification problem; however, individual age values are not independent classes: they have an ordinal relationship, which classification losses such as softmax cannot model. In this paper, we propose a new loss function, called the revised contrastive loss, to model the ordinal relationship between ages. Specifically, the revised contrastive loss penalizes the distance between each pair of deep features according to their age difference, which makes the deep features more discriminative for the age estimation task. The revised contrastive loss and the softmax loss are embedded into Convolutional Neural Networks (CNNs), and the networks are optimized via Stochastic Gradient Descent (SGD) in an end-to-end manner. Experimental results on a number of challenging face aging databases (FG-NET, MORPH Album II, and CLAP2016) show that the proposed approach outperforms state-of-the-art methods by a large margin using a single model.
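One hypothetical instantiation of such a loss (the paper's exact formula may differ) ties pairwise feature distances to age gaps:

    # Pairs with similar ages are pulled together in feature space and pairs
    # with distant ages pushed apart; `scale` is a hypothetical hyper-parameter.
    import torch

    def revised_contrastive(feats, ages, scale=0.1):
        d_feat = torch.cdist(feats, feats)                # pairwise feature distances
        d_age = torch.cdist(ages.view(-1, 1).float(),
                            ages.view(-1, 1).float())     # pairwise age differences
        return ((d_feat - scale * d_age) ** 2).mean()     # distances should track age gaps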
|
|
14:00-16:00, Paper FrPMP.41 | |
Automatic Facial Attractiveness Prediction by Deep Multi-Task Learning |
Gao, Lian | Beihang Univ |
Li, Weixin | Beihang Univ |
Huang, Zehua | Tusimple |
Huang, Di | Beihang Univ |
Wang, Yunhong | Beihang Univ |
Keywords: Face recognition, Multitask learning, Deep learning
Abstract: Facial Attractiveness Prediction (FAP) is a useful yet challenging problem in the domain of computer vision. In this paper, we propose a deep learning based approach. Different from existing deep methods, the proposed one models both texture and shape cues within a multi-task learning framework consisting of attractiveness score prediction and fiducial landmark localization, thus highlighting the roles of both in assessing the attractiveness of faces. Considering that the training data are not extensive, a lightweight CNN is designed to jointly learn the facial representation, landmark locations, and facial attractiveness score. The proposed method is evaluated on the SCUT-FBP database and delivers a prediction correlation of 0.92, which shows its effectiveness. Furthermore, two additional experiments comparing facial images before and after make-up or beautification are conducted. The results also prove the advantage of the proposed method.
|
|
14:00-16:00, Paper FrPMP.42 | |
Unobtrusive Driver Drowsiness Prediction Using Driving Behavior from Vehicular Sensors |
Dehzangi, Omid | Univ. of Michigan-Dearborn |
Selvamani, Masilamani | Univ. of Michigan |
Keywords: Human behavior analysis, Computer-aided detection and diagnosis, Modeling, simulation and visualization
Abstract: Falling asleep is an eventual result of drowsiness; while driving, it can also cause major disasters. A driver's attempt to operate a vehicle while drowsy can lead to life-threatening accidents and even fatalities. In this paper, we develop a framework to capture the drowsiness state of the driver from vehicle measures in an unobtrusive way. A well-tested VR-based simulated driving environment was employed to monitor the driver's drowsiness based on the vehicle's acceleration, braking, and steering wheel axis patterns, in tandem with self-estimated ratings from the subject using the Karolinska Sleepiness Scale (KSS). We propose two prediction models to accomplish drowsiness detection, using both classification and regression techniques, which produced a maximum accuracy of 99.10% and a minimum error of 0.34 RMSE; performance was evaluated with ensemble classifiers and decision-tree algorithms. As a result, we found that a system built using a decision tree with the proposed 4-second window segmentation can determine the driver's drowsiness as early as 4.4 s with classification and 4.5 s with regression, respectively.
|
|
14:00-16:00, Paper FrPMP.43 | |
Fused Text Segmentation Networks for Multi-Oriented Scene Text Detection |
Dai, Yuchen | Shanghai Jiao Tong Univ |
Huang, Zheng | Shanghai Jiao Tong Univ |
Gao, Yuting | Shanghai Jiao Tong Univ |
Xu, Youxuan | Xiamen No.1 High School of Fujian |
Chen, Kai | ShangHai JiaoTong Univ |
Guo, Jie | Shanghai Jiao Tong Univ |
Qiu, Weidong | Shanghai Jiao Tong Univ |
Keywords: Scene text detection and recognition, Segmentation, features and descriptors, Deep learning
Abstract: In this paper, we introduce a novel end-to-end framework for multi-oriented scene text detection from an instance-aware semantic segmentation perspective. We present Fused Text Segmentation Networks, which combine multi-level features during feature extraction, as text instances may rely on finer feature expression compared to general objects. The networks detect and segment text instances jointly and simultaneously, leveraging merits from both the semantic segmentation task and region-proposal-based object detection. Without involving any extra pipelines, our approach surpasses the current state of the art on multi-oriented scene text detection benchmarks, reaching Hmean of 84.1% on ICDAR2015 Incidental Scene Text and 82.0% on MSRA-TD500. Moreover, we report a baseline on Total-Text, which contains curved text, suggesting the effectiveness of the proposed approach.
|
|
14:00-16:00, Paper FrPMP.44 | |
R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection |
Jiang, Yingying | Samsung R&D Inst. China - Beijing |
Zhu, Xiangyu | Samsung R&D Inst. China - Beijing |
Wang, Xiaobing | Samsung R&D Inst. China - Beijing |
Yang, Shuli | Samsung R&D Inst. China - Beijing |
Li, Wei | Samsung R&D Inst. China - Beijing |
Wang, Hua | Samsung R&D Inst. China - Beijing |
Fu, Pei | Samsung R&D Inst. China - Beijing |
Luo, Zhenbo | Beijing Samsung Telecom R&D Center |
Keywords: Scene text detection and recognition, Object detection, Applications of computer vision
Abstract: Scene text detection is challenging as the input may have different orientations, sizes, font styles, lighting conditions, perspective distortions and languages. This paper addresses the problem by designing a Rotational Region CNN (R2CNN). R2CNN includes a Text Region Proposal Network (Text-RPN) to estimate approximate text regions and a multi-task refinement network to get the precise inclined box. Our work has the following features. First, we use a novel multi-task regression method to support arbitrarily-oriented scene text detection. Second, we introduce multiple ROIPoolings to address the scene text detection problem for the first time. Third, we use an inclined Non-Maximum Suppression (NMS) to post-process the detection candidates. Experiments show that our method outperforms the state-of-the-art on standard benchmarks: ICDAR 2013, ICDAR 2015, COCO-Text and MSRA-TD500.
|
|
14:00-16:00, Paper FrPMP.45 | |
Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents |
Wei, Hongxi | Inner Mongolia Univ |
Zhang, Hui | Inner Mongolian Univ |
Gao, Guanglai | Inner Mongolia Univ |
Keywords: Document retrieval, Historical document analysis, Applications of deep learning to document analysis
Abstract: This paper proposes a visual embedding approach to capturing semantic relatedness between visual words. Specifically, visual words are extracted and collected from a word image collection under the Bag-of-Visual-Words framework. A deep learning procedure is then used to map visual words into embedding vectors in a semantic space. To integrate spatial constraints into the representation of word images, each word image is segmented into several equally sized sub-regions along rows and columns. Each sub-region is then represented as the average of the embedding vectors, i.e., the centroid of the embedding vectors of all visual words within that sub-region. In this way, a word image can be converted into a fixed-length vector by concatenating the average embedding vectors of all its sub-regions. Euclidean distance is then used to measure similarity between word images. Experimental results demonstrate that the proposed representation outperforms Bag-of-Visual-Words, the visual language model, spatial pyramid matching, latent Dirichlet allocation, average visual word embeddings and recurrent neural networks.
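The descriptor construction described above reduces to a few lines; a sketch under stated assumptions (visual-word locations and their embedding vectors are given; the 2x4 grid is illustrative):

    import numpy as np

    def word_image_vector(points, embeddings, img_w, img_h, rows=2, cols=4):
        dim = embeddings.shape[1]
        sums = np.zeros((rows, cols, dim))
        counts = np.zeros((rows, cols, 1))
        for (x, y), e in zip(points, embeddings):
            r = min(int(y / img_h * rows), rows - 1)   # sub-region of this visual word
            c = min(int(x / img_w * cols), cols - 1)
            sums[r, c] += e
            counts[r, c] += 1
        sums /= np.maximum(counts, 1)                  # centroid of embeddings per sub-region
        return sums.reshape(-1)                        # concatenate into one fixed-length vector

    # similarity between two word images is then the Euclidean distance:
    # np.linalg.norm(word_image_vector(...) - word_image_vector(...))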
|
|
14:00-16:00, Paper FrPMP.46 | |
CG-DIQA: No-Reference Document Image Quality Assessment Based on Character Gradient |
Li, Hongyu | ZhongAn Information Tech. Service Co., Ltd |
Zhu, Fan | ZhongAn Information Tech. Service Co., Ltd |
Qiu, Junhua | ZhongAn Information Tech. Service Co., Ltd |
Keywords: Document image processing, Document understanding, Image quality assessment
Abstract: Document image quality assessment (DIQA) is an important and challenging problem in real applications. In order to predict the quality scores of document images, this paper proposes a novel no-reference DIQA method based on character gradient, where OCR accuracy is used as the ground-truth quality metric. The character gradient is computed on character patches detected with a maximally stable extremal regions (MSER) based method. Character patches are essential to character recognition and therefore suitable for estimating document image quality. Experiments on a benchmark dataset show that the proposed method outperforms state-of-the-art methods in estimating the quality score of document images.
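A rough sketch of the idea (using OpenCV's MSER and Sobel operators; reducing the character-gradient statistic to a single mean is a simplification, not the paper's exact score):

    import cv2
    import numpy as np

    def character_gradient_score(gray):                 # gray: uint8 grayscale document image
        regions, _ = cv2.MSER_create().detectRegions(gray)  # candidate character patches
        gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
        mag = np.sqrt(gx ** 2 + gy ** 2)                # gradient magnitude
        mask = np.zeros(gray.shape, dtype=bool)
        for pts in regions:
            mask[pts[:, 1], pts[:, 0]] = True           # pixels belonging to character patches
        return float(mag[mask].mean()) if mask.any() else 0.0  # sharper characters score higher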
|
|
14:00-16:00, Paper FrPMP.47 | |
Page Object Detection from PDF Document Images by Deep Structured Prediction and Supervised Clustering |
Li, Xiao-Hui | Inst. of Automation of Chinese Acad. of Sciences (CASIA) |
Yin, Fei | Inst. of Automation of CAS |
Liu, Cheng-Lin | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Document understanding, Object detection, Structured prediction
Abstract: Page object detection in document images remains a challenge because page objects are diverse in scale and aspect ratio, and an object may contain components that lie far apart. In this paper, we propose a hybrid method combining deep structured prediction and supervised clustering to detect formulas, tables and figures in PDF document images within a unified framework. The primitive region proposals extracted from each column region are classified and clustered with conditional random field (CRF) based graphical models, which can integrate both local and contextual information. Both the unary and pairwise potentials of the CRFs are formulated as convolutional neural networks (CNNs) to better exploit spatial contextual information. The CRF for clustering predicts the linked/cut label of between-region links. After CRF inference, the line regions of the same class within a cluster are grouped into a page object. The state-of-the-art performance obtained on the publicly available ICDAR2017 POD competition dataset demonstrates the effectiveness and superiority of the proposed method.
|
|
14:00-16:00, Paper FrPMP.48 | |
Watercolor, Segmenting Images Using Connected Color Components |
Eskenazi, Sébastien | L3i, Univ. of La Rochelle |
Gomez-Krämer, Petra | Univ. of La Rochelle |
Ogier, Jean-Marc | Univ. De La Rochelle |
Keywords: Computational document forensics, Segmentation, features and descriptors, Color analysis
Abstract: In the context of document security systems, there is a growing need for a stable segmentation method. State-of-the-art document image segmentation methods are not stable, as they rely on several parameters and thresholds, such as binarization thresholds. Hence, this paper presents a new segmentation method based on a new definition of connected color components and a new model of human vision. Our algorithm produces results that are three to four times more stable than state-of-the-art superpixel segmentation methods while maintaining similar segmentation accuracy.
|
|
14:00-16:00, Paper FrPMP.49 | |
Handwriting Trajectory Recovery Using End-To-End Deep Encoder-Decoder Network |
Bhunia, Ayan Kumar | Inst. of Engineering and Management, Kolkata |
Bhowmick, Abir | Inst. of Engineering & Management |
Bhunia, Ankan Kumar | Jadavpur Univ |
Konwer, Aishik | Inst. of Engineering & Management |
Banerjee, Prithaj | Inst. of Engineering & Management |
Roy, Partha Pratim | IIT |
Pal, Umapada | Indian Statistical Inst |
Keywords: Document understanding, Pen-based document analysis, Deep learning
Abstract: In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, the online acquisition approach has an advantage over its offline counterpart, as the online technique keeps track of pen movement. Hence, pen-tip trajectory retrieval from offline text can bridge the gap between online and offline methods. Our proposed framework employs a sequence-to-sequence model consisting of an encoder-decoder LSTM module. The encoder module consists of a Convolutional LSTM network, which takes an offline character image as input and encodes the feature sequence into a hidden representation. The output of the encoder is fed to a decoder LSTM, and we obtain successive coordinate points from every time step of the decoder LSTM. Although the sequence-to-sequence model is a popular paradigm in various computer vision and language translation tasks, the main contribution of our work lies in designing an end-to-end network for a decades-old popular problem in the document image analysis community. Tamil, Telugu and Devanagari characters from the LIPI Toolkit dataset are used in our experiments. Our proposed method achieves superior performance compared to conventional approaches.
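A simplified sketch of the decode loop (a plain CNN encoder stands in for the paper's Convolutional LSTM, and all sizes are illustrative): the decoder LSTM emits one pen coordinate per time step.

    import torch
    import torch.nn as nn

    class TrajectoryDecoder(nn.Module):
        def __init__(self, feat_dim=128, hidden=128, steps=50):
            super().__init__()
            self.encoder = nn.Sequential(              # encode the offline character image
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
            self.init_h = nn.Linear(feat_dim, hidden)  # image feature seeds the decoder state
            self.cell = nn.LSTMCell(2, hidden)         # input: previous (x, y) point
            self.out = nn.Linear(hidden, 2)            # output: next (x, y) point
            self.steps = steps

        def forward(self, img):
            h = torch.tanh(self.init_h(self.encoder(img)))
            c = torch.zeros_like(h)
            pt = torch.zeros(img.size(0), 2)           # pen starts at the origin
            points = []
            for _ in range(self.steps):
                h, c = self.cell(pt, (h, c))
                pt = self.out(h)
                points.append(pt)
            return torch.stack(points, dim=1)          # (batch, steps, 2) recovered trajectory

    traj = TrajectoryDecoder()(torch.randn(1, 1, 64, 64))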
|
|
14:00-16:00, Paper FrPMP.50 | |
Word Level Font-To-Font Image Translation Using Convolutional Recurrent Generative Adversarial Networks |
Bhunia, Ankan Kumar | Jadavpur Univ |
Bhunia, Ayan Kumar | Inst. of Engineering and Management, Kolkata |
Banerjee, Prithaj | Inst. of Engineering & Management |
Konwer, Aishik | Inst. of Engineering & Management |
Bhowmick, Abir | Inst. of Engineering & Management |
Roy, Partha Pratim | IIT |
Pal, Umapada | Indian Statistical Inst |
Keywords: Applications of deep learning to document analysis, Document image processing, Document understanding
Abstract: Converting one font to another is very useful in real-life applications. In this paper, we propose a convolutional recurrent generative model to solve the word-level font transfer problem. Our network is able to convert the font style of any printed text image from its current font to a required font. The network is trained end-to-end on complete word images, eliminating pre-processing steps such as character segmentation. We extend our model to a conditional setting that helps learn a one-to-many mapping function. We employ a novel convolutional recurrent architecture in the generator that efficiently deals with word images of arbitrary width and helps maintain the consistency of the final image after concatenating the generated image patches of the target font. Besides the generator and discriminator networks, we employ a classification network to classify the generated word images of converted font style into their respective font categories. Most earlier work on image translation has been performed on square images; our proposed architecture is the first of its kind that can handle images of varying width, as word images generally vary in width depending on the number of characters present. We test our model on a synthetically generated font dataset and compare our method with several state-of-the-art image translation methods. The superior performance of our network on the same dataset demonstrates its ability to learn font distributions.
|
|
14:00-16:00, Paper FrPMP.51 | |
Weighted-Gradient Features for Handwritten Line Segmentation |
Khare, Vijeta | Univ. Malaya |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Navya, B. J | Univ. of Mysore |
Swetha, G. C | Univ. of Mysore |
Guru, D. S | Univ. of Mysore |
Pal, Umapada | Indian Statistical Inst |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Keywords: Signature verification and writer identification, Pen-based document analysis, Document image processing
Abstract: Text line segmentation from handwritten documents is challenging when the document contains severely touching lines. In this paper, we propose a new idea based on weighted gradient features for segmenting text lines. The proposed method finds the number of zero-crossing points in every row of the Canny edge image of the input image, which are taken as the weights of the respective rows. The weights are then multiplied with the gradient values of the respective rows of the image to widen the gap between pixels in the middle portion of the text and other portions. Next, k-means clustering is performed on the weighted gradient matrix to separate middle pixels of the text from other pixels. The method performs a morphological operation to obtain word components as patches from the clustering result. The patches in the two clusters are matched to find common patch areas, which helps reduce the touching effect. The proposed method then checks linearity and non-linearity iteratively based on patch direction to segment text lines. The method is tested on our own dataset and on standard datasets, namely Alaei, the ICDAR 2013 handwriting segmentation contest dataset and ICDAR 2015-HTR, to evaluate performance. Further, the method is compared with state-of-the-art methods to show its effectiveness and usefulness.
|
|
14:00-16:00, Paper FrPMP.52 | |
Multi-Gradient Directional Features for Gender Identification |
Navya, B. J | Univ. of Mysore |
Swetha, G. C | Univ. of Mysore |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Roy, Sangheeta | Kolkata |
Guru, D. S | Univ. of Mysore |
Pal, Umapada | Indian Statistical Inst |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Keywords: Document image processing, Document analysis systems, Historical document analysis
Abstract: Gender identification based on handwriting analysis has received special attention from researchers in the field of document image analysis, as it is useful for several real-time applications such as forensics and population counting. In this paper, we explore Multi-Gradient Directional (MGD) features, which provide the direction of dominant pixels obtained from the Canny edge image and gradient direction symmetry. The proposed method further performs a histogram operation on the gradient angle information of dominant pixels of the respective multi-gradient directional images to select the angle that contributes the highest peak, resulting in feature vectors. Feature vector formation proceeds over the segmented first, second and third text lines of images written by males or females. Correlation is estimated between the vector of the first line and successive lines until a converging or diverging criterion is met. If convergence occurs, the document is classified as written by a female; otherwise, as written by a male. The method is tested on our own dataset, which includes images of different scripts, writers, papers, pens and ages, and on the standard QUWI database, which includes Arabic and English text, to demonstrate the efficacy of the proposed method. Comparative studies with state-of-the-art methods show that the proposed method is effective and useful.
|
|
14:00-16:00, Paper FrPMP.53 | |
Scene Text Rectification Using Glyph and Character Alignment Properties |
Kil, Taeho | Samsung Electronics |
Koo, Hyung Il | Seoul National Univ |
Cho, Nam Ik | Seoul National Univ |
Keywords: Scene text detection and recognition, Document image processing, Layout analysis
Abstract: Scene text images usually suffer from perspective distortions, and hence their rectification has been an essential pre-processing step for many applications. Existing methods for scene text rectification have mainly exploited the glyph property, i.e., that characters in many languages have horizontal/vertical strokes and some symmetry in their shapes. In this paper, we propose to use an additional property: characters need to be well aligned when rectified. For this, character alignment, as well as glyph properties, is encoded in the proposed cost function, and its minimization generates the transformation parameters. For encoding the alignment constraints, we perform character segmentation using a projection profile method before optimizing the cost function. Since better segmentation needs better rectification and vice versa, the overall algorithm performs character segmentation and rectification iteratively. We evaluate our method on real and synthetic scene text images, and the experimental results show that our method achieves a higher optical character recognition (OCR) rate than previous approaches and also yields visually pleasing results.
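The projection-profile segmentation the method relies on is the classical one; a minimal sketch follows (binarization and the iteration with rectification are omitted):

    import numpy as np

    def segment_characters(binary):          # binary: 2-D array with text pixels = 1
        profile = binary.sum(axis=0)         # vertical projection profile per column
        in_char, start, cuts = False, 0, []
        for i, v in enumerate(profile):
            if v > 0 and not in_char:
                in_char, start = True, i     # a character run of columns begins
            elif v == 0 and in_char:
                in_char = False
                cuts.append((start, i))      # run ends: one character segment
        if in_char:
            cuts.append((start, len(profile)))
        return cuts                          # list of (left, right) column bounds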
|
|
14:00-16:00, Paper FrPMP.54 | |
Detecting Phishing Websites and Targets Based on URLs and Webpage Links |
Yuan, Huaping | Guangdong Univ. of Tech |
Chen, Xu | Guangdong Univ. of Tech |
Li, Yukun | Guangdong Univ. of Tech |
Yang, Zhenguo | Guangdong Univ. of Tech |
Liu, Wenyin | Guangdong Univ. of Tech |
Keywords: Applications of document analysis, Document analysis systems, Document retrieval
Abstract: In this paper, we propose to extract features from URLs and webpage links to detect phishing websites and their targets. In addition to the basic features of a given URL, such as its length, suspicious characters, and number of dots, a feature matrix is constructed from these same basic features computed for the links in the given URL's webpage. Furthermore, statistical features are extracted from each column of the feature matrix, such as the mean, median, and variance. Lexical features are also extracted from the given URL and from the links and content of its webpage, such as the title and textual content. A number of machine learning models have been investigated for phishing detection, among which the Deep Forest model shows competitive performance, achieving a true positive rate of 98.3% and a false alarm rate of 2.6%. In addition, we design an effective strategy based on search operators in search engines to find phishing targets, which achieves an accuracy of 93.98%.
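The basic URL features named above are straightforward to compute; a sketch follows (the suspicious-character set is an assumption, and the real system uses many more features):

    import numpy as np

    SUSPICIOUS = set("@-_~%")                              # hypothetical character set

    def url_features(url):
        return [len(url),                                  # URL length
                url.count("."),                            # number of dots
                sum(ch in SUSPICIOUS for ch in url)]       # suspicious characters

    def link_matrix_stats(links):
        # feature matrix over the page's links, then per-column statistics
        m = np.array([url_features(u) for u in links], dtype=float)
        return np.concatenate([m.mean(0), np.median(m, 0), m.var(0)])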
|
|
14:00-16:00, Paper FrPMP.55 | |
Script Identification of Central Asia Based on Fused Texture Features |
Han, Xing-kun | Xinjiang Univ |
Aysa, Alim | Xinjiang Univ |
Mamat, Hornisa | Xinjiang Univ |
Yadikar, Nurbiya | Xinjiang Univ |
Ubul, Kurban | Xinjiang Univ |
Keywords: Applications of document analysis, Document understanding, Document image processing
Abstract: Script identification is an important step in multi-script recognition. Despite the results achieved in this field, the identification of Central Asian scripts has not been considered in depth. In the Central Asian region there are many similar scripts, and traditional texture features cannot discriminate them accurately. This paper proposes a script identification method based on fused texture features for Central Asian document images. On preprocessed multilingual document images, the method first performs the Non-subsampled Contourlet Transform (NSCT) and then extracts Tamura texture features from the generated sub-bands. A Support Vector Machine (SVM) classifier is trained for classification. For experimental evaluation, we collected a dataset of 30,000 document images covering 10 scripts: Arabic, Russian, Kazakh, Chinese, Kyrgyz, Turkish, Uyghur, Tibetan, Mongolian and English. The experimental results show that the proposed method can extract multi-scale and multi-directional texture features, and that the fusion of texture features leads to superior script identification performance.
|
|
14:00-16:00, Paper FrPMP.56 | |
Trajectory-Based Radical Analysis Network for Online Handwritten Chinese Character Recognition |
Zhang, Jianshu | Univ. of Science and Tech. of China |
Zhu, Yixing | Univ. of Science and Tech. of China |
Du, Jun | Univ. of Science and Tech. of China |
Dai, Li-Rong | Univ. of Science and Tech. of China |
Keywords: Character and text recognition, Pen-based document analysis, Neural networks
Abstract: Recently, great progress has been made in online handwritten Chinese character recognition due to the emergence of deep learning techniques. However, previous research has mostly treated each Chinese character as one class without explicitly considering its inherent structure, namely the radical components and their complicated geometry. In this study, we propose a novel trajectory-based radical analysis network (TRAN) that first identifies radicals and analyzes the two-dimensional structures among them simultaneously, then recognizes Chinese characters by generating captions based on the analysis of their internal radicals. The proposed TRAN employs recurrent neural networks (RNNs) as both encoder and decoder. The RNN encoder makes full use of online information by directly transforming the handwriting trajectory into high-level features. The RNN decoder generates the caption by detecting radicals and spatial structures through an attention model. Treating a Chinese character as a two-dimensional composition of radicals reduces the vocabulary size and enables TRAN to recognize unseen Chinese character classes, provided the corresponding radicals have been seen. Evaluated on the CASIA-OLHWDB database, the proposed approach significantly outperforms the state-of-the-art whole-character modeling approach, with a relative character error rate (CER) reduction of 10%. Meanwhile, for the recognition of 500 unseen Chinese characters, TRAN achieves a character accuracy of about 60%, while the traditional whole-character method cannot handle them at all.
|
|
14:00-16:00, Paper FrPMP.57 | |
A Fusion Strategy for the Single Shot Text Detector |
Yu, Zheng | East China Normal Univ |
Lyu, Shujing | East China Normal Univ |
Lu, Yue | East China Normal Univ |
Wang, Patrick | Northeastern Univ |
Keywords: Scene text detection and recognition, Image processing and analysis
Abstract: In this paper, we propose a new fusion strategy for scene text detection. The system is based on a single fully convolutional network, which outputs the coordinates of text bounding boxes at multiple scales. We improve text detection performance by incorporating a fusion strategy that obtains precise text bounding boxes according to the confidence of candidate text boxes. Fusing text boxes in this way yields surprising robustness and discriminative power. Experimental results on the ICDAR2011 and ICDAR2013 datasets demonstrate the effectiveness and robustness of the proposed fusion strategy, which achieves an F-measure of 87%, outperforming the base network by 2%.
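The fusion rule below is an illustrative confidence-weighted box merge, not the authors' exact strategy: overlapping candidate boxes are averaged with their confidences as weights instead of being suppressed outright, as in plain NMS.

```python
import numpy as np

def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2) arrays.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_boxes(boxes, scores, thr=0.5):
    # boxes: (N, 4) array, scores: (N,) array of confidences.
    order = np.argsort(scores)[::-1]
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in order:
        if used[i]:
            continue
        group = [j for j in order if not used[j] and iou(boxes[i], boxes[j]) > thr]
        used[group] = True
        w = scores[group][:, None]
        fused.append((boxes[group] * w).sum(axis=0) / w.sum())  # weighted mean
    return np.array(fused)
```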
|
|
14:00-16:00, Paper FrPMP.58 | |
A Randomized Hierarchical Trees Indexing Approach for Camera-Based Information Spotting |
Dang, Quoc Bao | L3i Lab. Univ. of La Rochelle |
Coustaty, Mickaël | Univ. of La Rochelle |
Luqman, Muhammad Muzzamil | L3i Lab. Univ. of La Rochelle, France |
Ogier, Jean-Marc | Univ. De La Rochelle |
De Cao, Tran | Can Tho Univ |
Keywords: Document retrieval, Applications of computer vision, Large scale document analysis
Abstract: In this paper, we propose an indexing approach for camera-based document image retrieval and spotting systems. The proposed approach is based on randomized hierarchical trees that do not store database vector points in memory. To construct the trees, k-means-based clustering splits the data points of every non-leaf node into two distinct groups. Instead of using all dimensions, a small number of dimensions is chosen randomly and combined with the dimension of highest variance, computed across all dimensions. Experimental results demonstrate the usefulness of the proposed approach in limited-memory situations: the proposed random trees approximately reach the accuracy of state-of-the-art methods on the Tobacco dataset without storing the database descriptors in memory.
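A minimal sketch of one node split in the spirit of the construction described above, assuming NumPy arrays of descriptors: a few randomly chosen dimensions plus the highest-variance dimension feed a 2-means split.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_node(points: np.ndarray, n_random_dims: int = 5, seed: int = 0):
    # Pick a few random dimensions and add the highest-variance dimension.
    rng = np.random.default_rng(seed)
    dims = rng.choice(points.shape[1], size=n_random_dims, replace=False)
    top_var = int(np.argmax(points.var(axis=0)))
    chosen = np.unique(np.append(dims, top_var))
    # 2-means on the chosen dimensions yields the two child groups.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(points[:, chosen])
    return points[labels == 0], points[labels == 1], chosen
```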
|
|
14:00-16:00, Paper FrPMP.59 | |
An End-To-End Neural Network for Multi-Line License Plate Recognition |
Cao, Yu | Beijing Univ. of Posts and Telecommunications |
Fu, Huiyuan | Beijing Univ. of Posts and Telecommunications |
Ma, Huadong | Beijing Univ. of Posts and Telecommunications |
Keywords: Character and text recognition, Visual surveillance, Object recognition
Abstract: License plate recognition currently plays an important role in numerous applications, and a number of technologies have been proposed. However, most of them can only work with single-line license plates, while many multi-line license plates exist in practical application scenarios. Traditional approaches need to segment the original input images of double-line license plates, which is very difficult in complex scenes. To solve this problem, we propose an end-to-end neural network for both single-line and double-line license plate recognition. It requires no segmentation of the original input license plate images: each whole image is treated as a single unit on the feature maps produced directly by a deep convolutional neural network. Extensive experiments show that our method is effective and outperforms state-of-the-art algorithms on the SYSU-ITS license plate dataset.
|
|
14:00-16:00, Paper FrPMP.60 | |
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects |
Tuggener, Lukas | Zürich Univ. of Applied Sciences, Univ. Della Svizzera |
Elezi, Ismail | Ca' Foscari Univ. of Venice; Zurich Univ. of Applied S |
Schmidhuber, Jürgen | Istituto Dalle Molle Di Studi Sull'intelligenza Artificial (IDSI |
Pelillo, Marcello | Ca' Foscari Univ |
Stadelmann, Thilo | Zurich Univ. of Applied Sciences |
Keywords: Large scale document analysis, Applications of deep learning to document analysis, Applications of computer vision
Abstract: We present the DeepScores dataset with the goal of advancing the state-of-the-art in small object recognition by placing the question of object recognition in the context of scene understanding. DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset of its kind. DeepScores comes with ground truth for object classification, detection and semantic segmentation, and thus poses a relevant challenge for computer vision in general, and optical music recognition (OMR) research in particular. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets such as PASCAL VOC, SUN, SVHN, ImageNet and MS-COCO, as well as with other OMR datasets. Finally, we provide baseline performances for object classification, give intuition for the inherent difficulty that DeepScores poses to state-of-the-art object detectors like YOLO or R-CNN, and point to future research based on this dataset.
|
|
14:00-16:00, Paper FrPMP.61 | |
Learning Topics Using Semantic Locality |
Zhao, Ziyi | Syracuse Univ |
Pugdeethosapol, Krittaphat | Syracuse Univ |
Lin, Sheng | Syracuse Univ |
Li, Zhe | Syracuse Univ |
Ding, Caiwen | Syracuse Univ |
Wang, Yanzhi | Syracuse Univ |
Qiu, Qinru | Syracuse Univ |
Keywords: Document retrieval, Document understanding, Applications of deep learning to document analysis
Abstract: Topic modeling discovers the latent topic probabilities of given text documents. To generate more meaningful topics that better represent a given document, we propose a new feature extraction technique that can be used in the data preprocessing stage. The method consists of three steps. First, it generates words and word pairs from every document. Second, it applies a two-way TF-IDF algorithm to these words and word pairs for semantic filtering. Third, it uses the K-means algorithm to merge word pairs that have similar semantic meanings. Experiments are carried out on the Open Movie Database (OMDb), the Reuters dataset and the 20NewsGroup dataset, with mean Average Precision as the evaluation metric. Compared with other state-of-the-art topic models, such as Latent Dirichlet Allocation and traditional Restricted Boltzmann Machines, our proposed data preprocessing improves the generated topic accuracy by up to 12.99%.
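A rough sketch of the preprocessing idea: build word/word-pair tokens, filter them by TF-IDF, then merge semantically similar terms with K-means over their embeddings. get_embedding is a hypothetical lookup (e.g. a word2vec model), and the two-way scoring is simplified to a single aggregate score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def filter_terms(docs, keep_top=5000):
    # Unigrams and bigrams ("word pairs"), scored by aggregate TF-IDF.
    vec = TfidfVectorizer(ngram_range=(1, 2))
    X = vec.fit_transform(docs)
    scores = np.asarray(X.sum(axis=0)).ravel()
    terms = np.array(vec.get_feature_names_out())
    return terms[np.argsort(scores)[::-1][:keep_top]]

def merge_similar(terms, get_embedding, n_clusters=500):
    # Cluster term embeddings; terms in one cluster share a merged token.
    emb = np.stack([get_embedding(t) for t in terms])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
    return {t: f"merged_term_{l}" for t, l in zip(terms, labels)}
```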
|
|
14:00-16:00, Paper FrPMP.62 | |
Historical Document Image Binarization Using Background Estimation and Energy Minimization |
Xiong, Wei | Hubei Univ. of Tech |
Jia, Xiuhong | Hubei Univ. of Tech |
Xu, Jingjing | HPE |
Xiong, Zijie | Hubei Univ. of Tech |
Liu, Min | Hubei Univ. of Tech |
Wang, Juan | Hubei Univ. of Tech |
Keywords: Document image processing, Historical document analysis, Applications of document analysis
Abstract: This paper presents an enhanced historical document image binarization technique that makes use of background estimation and energy minimization. Given a degraded historical document image, mathematical morphology is first carried out to compensate for the document background using a disk-shaped mask whose size is determined by the stroke width transform (SWT). Laplacian-energy-based segmentation is then performed on the enhanced document image. Finally, post-processing is applied to improve the binarization results. The proposed technique has been extensively evaluated on the recent DIBCO and H-DIBCO benchmark datasets. Experimental results show that our proposed method outperforms other state-of-the-art document image binarization techniques.
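A minimal sketch of the background-compensation step, assuming scikit-image: morphological closing with a disk whose radius would, in the paper, come from the stroke width transform (here passed in directly). The SWT and the Laplacian energy segmentation are not reproduced.

```python
import numpy as np
from skimage.morphology import closing, disk

def compensate_background(gray: np.ndarray, stroke_width: int) -> np.ndarray:
    # Closing fills dark features (text strokes) thinner than the disk,
    # leaving an estimate of the degraded background; dividing it out
    # flattens stains and uneven illumination.
    bg = closing(gray, disk(2 * stroke_width))
    return np.clip(gray.astype(float) / (bg.astype(float) + 1e-6), 0.0, 1.0)
```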
|
|
14:00-16:00, Paper FrPMP.63 | |
Improve Word Mover's Distance with Part-Of-Speech Tagging |
Chen, Xiaojun | Inst. of Information Engineering |
Bai, Li | Univ. of Chinese Acad. of Sciences |
Wang, Dakui | Inst. of Information Engineering |
Shi, Jinqiao | Inst. of Information Engineering |
Keywords: Document retrieval
Abstract: Word Mover’s Distance (WMD) is a document distance metric that is free of parameters, offers an intelligible interpretation, and achieves unprecedented accuracy on document classification. WMD is built on word embeddings and largely captures semantic rather than syntactic relationships, which limits its ability to measure document distance. To enhance the impact of syntactic information, we propose a new method called WMD with Part-of-Speech (PWMD) that integrates part-of-speech (POS) tags into the original WMD model. POS tags are a kind of syntactic information and, combined with WMD, provide more valuable features for measuring document distance. PWMD offers two strategies for combining the POS tags: “word level” and “document level”. Contrastive experiments show that PWMD yields better document distances than WMD.
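The sketch below shows one plausible way POS information can enter a WMD-style comparison, using NLTK for tagging and gensim's standard wmdistance call. It is a paraphrase of the general idea, not the authors' exact formulation of either strategy.

```python
from collections import defaultdict
import nltk  # requires the 'averaged_perceptron_tagger' data to be downloaded

def pos_buckets(tokens):
    # Group tokens by coarse POS class (N/V/J/...), via NLTK tagging.
    buckets = defaultdict(list)
    for word, tag in nltk.pos_tag(tokens):
        buckets[tag[0]].append(word)
    return buckets

def pos_aware_distance(doc_a, doc_b, wv):
    # wv: gensim KeyedVectors; wv.wmdistance is gensim's standard WMD call.
    # Compare documents only across matching POS classes, then average.
    a, b = pos_buckets(doc_a), pos_buckets(doc_b)
    dists = [wv.wmdistance(a[t], b[t]) for t in a if t in b]
    return sum(dists) / max(len(dists), 1)
```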
|
|
14:00-16:00, Paper FrPMP.64 | |
Handwritten Digit String Recognition Using Convolutional Neural Network |
Zhan, Hongjian | East China Normal Univ |
Lu, Shujing | East China Normal Univ |
Lu, Yue | East China Normal Univ |
Keywords: Character and text recognition
Abstract: String recognition is one of the most important tasks in computer vision applications. Recently, combinations of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been widely applied to string recognition. However, RNNs are not only hard to train but also time-consuming. In this paper, we propose a new architecture based on CNNs only and apply it to handwritten digit string recognition (HDSR). The network is composed of three parts from bottom to top: feature extraction layers, feature dimension transposition layers and an output layer. Motivated by the strong performance of DenseNet, we utilize dense blocks for feature extraction. At the top of the network, a CTC (connectionist temporal classification) output layer calculates the loss and decodes the feature sequence, while feature dimension transposition layers connect feature extraction to the output layer. Experiments demonstrate that, compared to other methods, the proposed method obtains significant improvements on the ORAND-CAR-A and ORAND-CAR-B datasets, with recognition rates of 92.2% and 94.02%, respectively.
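An illustrative PyTorch sketch of a CNN-only recognizer with a CTC head, in the spirit of the architecture above; dense blocks are replaced by plain conv layers for brevity, and all sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DigitStringNet(nn.Module):
    def __init__(self, n_classes=11):            # 10 digits + CTC blank
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                 # shrink height, keep width
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # collapse height entirely
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                         # x: (B, 1, H, W)
        f = self.features(x).squeeze(2)           # (B, 128, W')
        logits = self.classifier(f.transpose(1, 2))     # (B, W', classes)
        return logits.log_softmax(-1).transpose(0, 1)   # (T, B, C) for CTCLoss

net = DigitStringNet()
out = net(torch.randn(2, 1, 32, 100))             # pair with torch.nn.CTCLoss
print(out.shape)                                  # torch.Size([100, 2, 11])
```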
|
|
14:00-16:00, Paper FrPMP.65 | |
Sliding Line Point Regression for Shape Robust Scene Text Detection |
Zhu, Yixing | Univ. of Science and Tech. of China |
Du, Jun | Univ. of Science and Tech. of China |
Keywords: Scene text detection and recognition, Deep learning, Neural networks
Abstract: Traditional text detection methods mostly focus on quadrilateral text. In this study we propose a novel method named sliding line point regression (SLPR) to detect arbitrarily shaped text in natural scenes. SLPR regresses multiple points on the edge of a text line and then uses these points to sketch the outline of the text. The proposed SLPR can be adapted to many object detection architectures such as Faster R-CNN and R-FCN. Specifically, we first generate the smallest rectangular box enclosing the text with a region proposal network (RPN), then isometrically regress points on the edge of the text using vertically and horizontally sliding lines. To make full use of information and reduce redundancy, we fix the x-coordinate or y-coordinate of each target point from the rectangular box position and regress only the remaining coordinate. This not only reduces the number of system parameters but also constrains the points to form a more regular polygon. Our approach achieved competitive results on the ICDAR2015 Incidental Scene Text benchmark and the curved text detection dataset CTW1500.
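An illustrative sketch of the coordinate trick described above, for vertical sliding lines: the x-coordinate of each target point is fixed by the box, so only the y-coordinates need to be regressed. polygon_y_at is a hypothetical helper interpolating the annotated outline.

```python
import numpy as np

def vertical_sliding_targets(box, polygon_y_at, n_lines=7):
    # box: (x1, y1, x2, y2); polygon_y_at(x) -> (y_top, y_bottom) on the
    # annotated text outline (hypothetical helper, not provided here).
    x1, _, x2, _ = box
    xs = np.linspace(x1, x2, n_lines + 2)[1:-1]    # interior sliding lines
    tops, bots = zip(*(polygon_y_at(x) for x in xs))
    # xs are fixed by the box; only tops/bots become regression targets.
    return xs, np.array(tops), np.array(bots)
```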
|
|
14:00-16:00, Paper FrPMP.66 | |
Robust Scene Text Detection with Deep Feature Pyramid Network and CNN Based NMS Model |
Mohanty, Sabyasachi | IIT (BHU) Varanasi |
Dutta, Tanima | IIT (BHU) Varanasi |
Gupta, Hari Prabhat | IIT (BHU) Varanasi |
Keywords: Scene text detection and recognition, Deep learning
Abstract: Scene text detection has attracted great interest from the computer vision and pattern recognition communities, since text information plays an important role in image indexing and scene understanding. Deep neural networks have become popular for scene text detection, especially for their ability to learn strong text features. However, existing state-of-the-art deep learning based scene text detection methods detect text from only a single feature map, which cannot capture semantic information at all scales. In this paper, we propose a novel deep learning based model that leverages the pyramid structure of feature maps for accurate scene text detection. We also design a deep convolutional neural network model for non-maximum suppression. In addition, we develop a novel loss function and training method for end-to-end training. The experimental results validate that our end-to-end network is simple, fast, and achieves high accuracy on standard datasets, namely ICDAR 2015 and MSRA-TD500. We also create a dataset for scene text detection.
|
|
14:00-16:00, Paper FrPMP.67 | |
Scene Text Detection Via Deep Semantic Feature Fusion and Attention-Based Refinement |
Song, Yu | Chinese Acad. of Sciences |
Cui, Yuanshun | Chinese Acad. of Sciences |
Han, Hu | Inst. of Computing Tech. CAS |
Shan, Shiguang | Inst. of Computing Tech. ChineseAcademy of Sciences |
Chen, Xilin | Inst. of Computing Tech |
Keywords: Scene text detection and recognition, Object detection
Abstract: Despite tremendous progress in scene text detection in the past few years, efficient text detection in the wild remains challenging, particularly for text with large rotations and for complicated background regions that are easily confused with text. In this paper, we propose an effective approach for scene text detection that consists of initial text detection using the proposed deep semantic feature fusion within a fully convolutional network (FCN), followed by detection refinement with our attention-based text vs. non-text classifier learned in a fine-to-coarse fashion. The proposed approach outperforms state-of-the-art scene text detection algorithms on the public-domain ICDAR2015 dataset, achieving an accuracy of 0.83 in terms of F-measure.
|
|
14:00-16:00, Paper FrPMP.68 | |
Exploring Discriminative HMM States for Improved Recognition of Online Handwriting |
Mandal, Subhasis | Indian Inst. of Tech. Guwahati |
Choudhury, Himakshi | Indian Inst. of Tech. Guwahati |
Prasanna, S.R. Mahadeva | Indian Inst. of Tech. Guwahati |
Sundaram, Suresh | Indian Inst. of Tech |
Keywords: Character and text recognition, Pen-based document analysis, Document analysis systems
Abstract: In this paper, we propose a novel approach to online handwriting recognition (HR) based on hidden Markov models (HMMs). In a conventional HMM-based HR system, an input test sample is recognized by first measuring the log-likelihood score from each class-specific HMM and then assigning the class with the highest score. We observe that, for a given test sample, the difference in log-likelihood scores between the top-2 classes is often too small for reliable classification. The problem intensifies for scripts with a large set of similarly shaped characters, such as Indic scripts. To address this, we first analyze the HMM states corresponding to the top-2 classes and identify the subset of states that most discriminates between them. The final decision between the two classes is then made by comparing the log-likelihood scores of these chosen states. Since the proposed methodology focuses only on the most discriminative states of the two classes, it enhances classification confidence as well as overall recognition accuracy with minimal added complexity. The proposal is demonstrated on character and limited-vocabulary word recognition tasks and evaluated on locally collected Assamese character and word databases. The experimental results show promising improvements over the conventional HMM-based HR system.
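A generic NumPy sketch of the re-scoring idea: given frame-by-state log-likelihood matrices for the top-2 class HMMs (hypothetical inputs, e.g. gathered along each model's Viterbi alignment), the two classes are compared only on their chosen discriminative states.

```python
import numpy as np

def rescore_top2(ll_a: np.ndarray, ll_b: np.ndarray, states_a, states_b):
    # ll_*: (n_frames, n_states) per-state log-likelihood contributions;
    # states_*: indices of the most discriminative states for each class.
    score_a = ll_a[:, states_a].sum()
    score_b = ll_b[:, states_b].sum()
    return 0 if score_a >= score_b else 1   # index of the winning class
```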
|
|
14:00-16:00, Paper FrPMP.69 | |
Focus on Scene Text Using Deep Reinforcement Learning |
Wang, Haobin | South China Univ. of Tech |
Huang, Shuangping | SouthChina Univ. of Tech |
Jin, Lianwen | South China Univ. of Tech |
Keywords: Scene text detection and recognition, Reinforcement learning, Deep learning
Abstract: Scene text detection has attracted increasing interest in recent years, and a rich body of approaches has been proposed. Previous work on scene text detection has been dominated by region-proposal-based approaches, which generate far more text candidates than there are ground-truth bounding boxes. Only a few of those candidates are output as true predictions; most of the others are fruitlessly involved in regression or classification, consuming a great amount of time and storage. This creates a problem of low efficiency in generating text candidates. To address this issue, we propose a method that focuses on scene text gradually, guided by an active model. The model allows an agent to take the whole image as the only region proposal in each episode when locating text, and therefore significantly reduces the number of region proposals needed. The agent is trained by deep reinforcement learning to estimate the future returns of given states and sequentially make decisions to find scene text. Considering the characteristics of scene text, we additionally propose a flexible action scheme and a new reward scheme with lazy punishment. Experiments on the ICDAR 2013 dataset show that the proposed method achieves promising performance while using as few region proposals as there are ground-truth bounding boxes.
|
|
14:00-16:00, Paper FrPMP.70 | |
Variational Mode Decomposition-Based Heart Rate Estimation Using Wrist-Type Photoplethysmography During Physical Exercise |
He, Wenwen | Univ. of Electronic Science and Tech. of China |
Ye, Yalan | Univ. of Electronic Science and Tech. of China |
Li, Yunxia | Univ. of Electronic Science and Tech. of China |
Xu, Haijin | Univ. of Electronic Science and Tech. of China |
Lu, Li | Univ. of Electronic Science and Tech. of China |
Huang, Wenxia | West China Hospital of Sichuan Univ |
Sun, Ming | Univ. of Electronic Science and Tech. of China; Chengdu |
Keywords: Bioinformatics
Abstract: Heart rate (HR) monitoring based on photoplethysmography (PPG) has drawn increasing attention for modern wearable devices due to its simple hardware implementation and low cost. In this work, we propose a variational mode decomposition (VMD)-based HR estimation method using wrist-type PPG signals recorded during physical exercise. To remove motion artifacts (MA), VMD is first applied, followed by a proposed post-processing step that guarantees robust MA removal. The performance of our method was evaluated on the two PPG datasets used in the 2015 IEEE Signal Processing Cup. The method achieved an average absolute error of 1.45 beats per minute (BPM) on the 12 training sets and 3.19 BPM on the 10 testing sets.
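A hedged sketch of the decomposition step, assuming the third-party vmdpy package (pip install vmdpy); the paper's MA-removal post-processing is not reproduced, and the parameter values are illustrative.

```python
import numpy as np
from vmdpy import VMD  # third-party VMD implementation (assumed available)

def ppg_modes(ppg: np.ndarray, k: int = 6):
    # alpha: bandwidth constraint, tau: noise tolerance, DC: no DC mode,
    # init=1: uniform center-frequency init, tol: convergence tolerance.
    u, _, omega = VMD(ppg, alpha=2000, tau=0.0, K=k, DC=0, init=1, tol=1e-7)
    # u: the k extracted modes; omega: their center frequencies.
    # HR estimation would pick the mode whose frequency tracks the HR band.
    return u, omega
```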
|
|
14:00-16:00, Paper FrPMP.71 | |
Reliable Fused Lasso Approach for Recurrent Copy Number Variation Identification |
Alshawaqfeh, Mustafa | German Jordanian Univ |
Al Kawam, Ahmad | Texas A&M Univ |
Serpedin, Erchin | Texas A&M Univ |
Keywords: Bioinformatics, Modeling, simulation and visualization, Biological image and signal analysis
Abstract: The detection of recurrent copy number variations (CNVs) enables a more detailed study of the regions in which they occur and their relationship with disease onset and evolution. However, due to inter-sample variability and high noise levels, simple pattern detection methods face significant challenges in recovering recurrent CNV regions. This paper proposes a novel method for the reliable identification of recurrent CNV regions from noisy aCGH data. The proposed method exploits a special decomposition of the aCGH data and a fused lasso approach to accurately estimate the recurrent CNVs. The observed aCGH data matrix is decomposed into three matrices: a full-rank matrix of weighted piecewise generating signals that captures the recurrent CNVs, a sparse matrix that models the inter-sample variability, and a Gaussian noise matrix that captures experimental and other sources of error. The ability of the proposed method to detect recurrent CNVs is corroborated through comprehensive simulations on judiciously designed artificial datasets as well as realistic, publicly available datasets. Extensive computer simulations confirm that the proposed method is more accurate than existing state-of-the-art methods, and it presents the additional benefits of yielding clean aCGH signals and being robust to noise and outlier probe values.
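A generic form of such a three-matrix decomposition with a fused-lasso prior is sketched below; the symbols and weighting are illustrative, not the paper's exact objective.

```latex
% Y: observed aCGH matrix, X: recurrent-CNV component built from
% piecewise-constant generating signals, S: sparse inter-sample
% variability; the residual Y - X - S absorbs the Gaussian noise.
% \lambda_1, \lambda_2 are regularization weights (illustrative notation).
\min_{X,\,S}\;\tfrac{1}{2}\,\lVert Y - X - S \rVert_F^2
  \;+\; \lambda_1 \lVert S \rVert_1
  \;+\; \lambda_2 \sum_{i}\sum_{j \ge 2} \lvert X_{i,j} - X_{i,j-1} \rvert
```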
|
|
14:00-16:00, Paper FrPMP.72 | |
Deep Learning Based Bioresorbable Vascular Scaffolds Detection in IVOCT Images |
Cao, Yihui | Shenzhen Vivolight Medical Device & Tech. Co., Ltd |
Lu, Yifeng | Shenzhen Vivolight Medical Device & Tech. Co., Ltd |
Li, Jianan | Shenzhen Vivolight Medical Device & Tech. Co., Ltd |
Rui, Zhu | Shenzhen Vivolight Medical Device & Tech. Co., Ltd |
Jin, Qinhua | Department of Cardiology, Chinese PLA General Hospital |
Jing, Jing | Department of Cardiology, Chinese PLA General Hospital |
Yundai, Chen | Department of Cardiology, Chinese PLA General Hospital |
Keywords: Medical image and signal analysis, Deep learning, Object detection
Abstract: Bioresorbable Vascular Scaffolds (BVS) are currently one of the most frequently used types of stents in percutaneous coronary intervention, and it is very important to conduct strut malapposition analysis during the operation. Currently, BVS malapposition analysis in intravascular optical coherence tomography (IVOCT) images is mainly conducted manually, which is labor-intensive and time-consuming. In our previous work, a novel framework was presented to automatically detect and segment BVS struts for malapposition analysis; however, limited by its detection performance, that framework faced challenges under complex backgrounds. In this paper, we propose a robust BVS strut detection method based on a Region-based Fully Convolutional Network (R-FCN). The detection model consists of two modules: 1) a Region Proposal Network (RPN) that extracts strut regions of interest (ROIs) in the image, and 2) a detection module that classifies the ROIs and regresses a bounding box for each ROI. The network was initialized with a pre-trained ImageNet model and then trained on our labeled data of 1,231 IVOCT images. Tested on a total of 480 IVOCT images containing 4,096 BVS struts, our method achieved a 97.9% true positive rate with a 4.79% false positive rate. We conclude that the proposed method is efficient and robust for BVS strut detection.
|
|
14:00-16:00, Paper FrPMP.73 | |
Mining NMR Spectroscopy Using Topic Models |
Bicego, Manuele | Univ. of Verona |
Lovato, Pietro | Univ. of Verona |
De Bona, Marco | Univ. of Verona |
Guzzo, Flavia | Univ. of Verona |
Assfalg, Michael | Univ. of Verona |
Keywords: Bioinformatics
Abstract: Pattern Recognition techniques have been successfully exploited for the biomedical analysis of NMR spectra. In this context, it is crucial to derive a suitable representation for the data: among others, a successful line of research exploits the Bag of Words representation (called here ``Bag of Peaks''). However, despite its success, the Bag of Peaks paradigm has not been fully explored: for example, appropriate probabilistic models (such as topic models) can further distill the information contained in the Bag of Words, allowing for more interpretable and accurate solutions for the task-at-hand. This paper is aimed at filling this gap, by investigating the usefulness of topic models in the analysis of NMR spectra. In particular, we first introduce an unsupervised approach, based on topic models, that performs soft biclustering of NMR spectra -- this kind of unsupervised analysis being new in the NMR literature. Second, we show that descriptors extracted from topic models can be successfully employed for classification of NMR samples: compared to the original Bag of Words, we prove that our descriptors provide higher accuracies. Finally, we perform an empirical evaluation involving a complex dataset of spectra derived from fruits, and two datasets of medical NMR spectra: our analysis confirms the suitability of such models in the NMR spectra analysis.
|
|
14:00-16:00, Paper FrPMP.74 | |
Automatic Segmentation of Kidney and Renal Tumor in CT Images Based on 3D Fully Convolutional Neural Network with Pyramid Pooling Module |
Yang, Guanyu | Southeast Univ |
Li, Guoqing | Southeast Univ |
Pan, Tan | Southeast Univ |
Kong, Youyong | Southeast Univ |
Wu, Jia Song | Southeast Univ |
Tang, Lijun | Nanjing Medical Univ |
Zhu, Xiaomei | Nanjing Medical Univ |
Dillenseger, Jean Louis | Univ. De Rennes 1 |
Shu, Huazhong | Southeast Univ |
Luo, Limin | Southeast Univ |
Coatrieux, Jean Louis | LTSI, Univ. De Rennes 1, Rennes, France |
Keywords: Medical image and signal analysis, Deep learning, Segmentation, features and descriptors
Abstract: Renal cancer is one of the ten most common cancers in human beings, and laparoscopic partial nephrectomy (LPN) has become the main therapeutic approach for treating it. Accurate kidney and tumor segmentation in CT images is a prerequisite for surgery planning; however, automatic and accurate kidney and renal tumor segmentation in CT images remains a challenge. In this paper, we propose a new method to precisely segment the kidney and renal tumors in CT angiography images. The method relies on a three-dimensional (3D) fully convolutional network (FCN) combined with a pyramid pooling module (PPM). The proposed network is implemented as an end-to-end learning system operating directly on 3D volumetric images, and it exploits 3D spatial contextual information to improve the segmentation of both the kidney and the tumor lesion. Experiments conducted on 140 patients show that these target structures can be segmented with high accuracy: the resulting average Dice coefficients for kidney and renal tumor are 0.931 and 0.802, respectively, higher than those obtained with two other neural networks.
|
|
14:00-16:00, Paper FrPMP.75 | |
Multi-Label Semantic Decoding from Human Brain Activity |
Li, Dan | Inst. of Automation, Chinese Acad. of Sciences |
Du, Changde | Inst. of Automation, Chinese Acad. of Sciences |
Huang, Lijie | Inst. of Automation, Chinese Acad. of Sciences |
Chen, Zhiqiang | Inst. of Automation, Chinese Acad. of Sciences |
He, Huiguang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Brain and cognitive engineering, Multilabel learning, Deep learning
Abstract: It is meaningful to decode semantic information from functional magnetic resonance imaging (fMRI) brain signals evoked by natural images. Semantic decoding can be viewed as a classification problem. Since a natural image may contain semantic information about many different objects, a single-label classification model is not appropriate for the semantic decoding problem, which motivates a multi-label classification model. However, most multi-label models treat each label equally. In fact, if a dataset is associated with a large number of semantic labels, it is difficult to accurately predict labels that appear with low frequency, so the relative importance of labels associated with few instances should be increased. To improve multi-label prediction performance, we first propose a multinomial label distribution that estimates the importance of each associated label for an instance using conditional probability, and then establish a deep neural network (DNN) based model that incorporates both the multinomial label distribution and label co-occurrence information to realize multi-label classification of semantic information in fMRI brain signals. Experiments on three fMRI recording datasets demonstrate that our approach outperforms state-of-the-art methods on semantic information prediction.
|
|
14:00-16:00, Paper FrPMP.76 | |
Despeckling CNN with Ensembles of Classical Outputs |
Mishra, Deepak | Indian Inst. of Tech. Delhi |
Tyagi, Sarthak | Indian Inst. of Tech. Delhi |
Chaudhury, Santanu | Indian Inst. of Tech. Delhi |
Sarkar, Mukul | Indian Inst. of Tech. Delhi |
Soin, Arvinder Singh | Medanta Hospital |
Keywords: Medical image and signal analysis, Enhancement, restoration and filtering
Abstract: Ultrasound (US) image despeckling is a problem of high clinical importance. Machine learning solutions to the problem are considered impractical due to the unavailability of speckle-free US image datasets. On the other hand, the classical approaches, which are able to provide the desired outputs, have limitations such as input-dependent parameter tuning. In this work, a convolutional neural network (CNN) is developed that learns to remove speckle from US images using the outputs of these classical approaches. We observe that the existing approaches can be combined in a complementary manner to generate an output better than any of their individual outputs. The CNN is therefore trained using the individual outputs as well as the output ensembles. This eliminates the cumbersome process of parameter tuning required by the existing approaches for every new input. Further, the proposed CNN outperforms the state-of-the-art despeckling approaches and, for certain images, produces outputs even better than the ensembles.
|
|
14:00-16:00, Paper FrPMP.77 | |
Unsupervised Clustering of Mammograms for Outlier Detection and Breast Density Estimation |
Tlusty, Tal | IBM |
Ben-Ari, Rami | IBM-Res |
Amit, Guy | IBM Res |
Keywords: Computer-aided detection and diagnosis, Clustering, Neural networks
Abstract: The flourishing use of machine learning for cognitive tasks has driven an increased demand for large annotated training datasets. In the medical imaging domain, such datasets are scarce, and the process of labeling them is costly and error-prone and requires high expertise. Unsupervised learning is therefore an attractive approach for analyzing unlabeled medical images. In this paper we describe an unsupervised analysis method consisting of feature learning by stacked auto-encoders, K-means clustering to build a data model, and encoding of new images using the model. We utilize this method for image-level and patch-level analysis of breast mammograms. At the image level, we demonstrate that our cluster-based image encoding can identify outlier images such as images with implants or non-standard acquisition views. At the patch level, we show that image signatures based on patch clustering can be used for unsupervised semantic segmentation of breast tissues, as well as for separating mammograms with high and low breast density. We evaluate the suggested methods on large datasets and discuss potential applications for data curation, machine-guided annotation and automatic interpretation of medical images.
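A sketch of the encoding step: patches are embedded (here by a hypothetical encode function standing in for the trained stacked auto-encoder), assigned to K-means clusters, and each image is represented by its normalized cluster histogram, against which outliers can be scored.

```python
import numpy as np
from sklearn.cluster import KMeans

def image_signature(patches, encode, kmeans: KMeans):
    # encode: hypothetical patch embedder (the stacked auto-encoder's role);
    # kmeans: model previously fitted on encoded training patches.
    codes = kmeans.predict(np.stack([encode(p) for p in patches]))
    hist = np.bincount(codes, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()   # cluster histogram as the image signature
```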
|
|
14:00-16:00, Paper FrPMP.78 | |
Fully Convolutional Neural Networks for Prostate Cancer Detection Using Multi-Parametric Magnetic Resonance Images: An Initial Investigation |
Wang, Yunzhi | Univ. of Oklahoma |
Zheng, Bin | Univ. of Oklahoma |
Gao, Dashan | 12 Sigma Tech |
Wang, Jiao | 12 Sigma Tech |
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis
Abstract: Prostate cancer is one of the leading causes of cancer death among men in the United States. The motivation of this study is to investigate the feasibility of a deep learning based computer-aided detection (CAD) scheme for prostate cancer detection from multi-parametric magnetic resonance images (mpMRIs). The proposed scheme consists of a prostate segmentation stage and a tumor detection stage. In the first stage, we adopt a state-of-the-art fully convolutional network (FCN) architecture with residual connections to segment the prostate from mpMRIs including T2-weighted (T2W), T1 and diffusion-weighted imaging (DWI). We demonstrate that the proposed mpMRI-based segmentation scheme yields better performance than previous T2W-based schemes. In the second stage, we present a cascaded training strategy to train an FCN with a weighted cross-entropy loss function for tumor detection from mpMRIs including T2W, DWI, apparent diffusion coefficient (ADC) and K-trans. Experiments demonstrate the promising detection performance of the proposed CAD scheme.
|
|
14:00-16:00, Paper FrPMP.79 | |
Automatic Prostate Segmentation on MR Images Using Enhanced Holistically-Nested Networks |
Ji, Dong | Hefei Univ. of Tech |
Zhan, Shu | Hefei Univ. of Tech |
Qian, Jinzhao | Tsinghua Univ |
Kurihara, Toru | Kochi Univ. of Tech |
Keywords: Medical image and signal analysis, Image processing and analysis, Deep learning
Abstract: Magnetic resonance (MR) imaging has been shown to succeed in detecting and visualizing the prostate. Accurate segmentation of the prostate gland from MR images is necessary for clinical applications. However, prostate segmentation is a challenging task because the shape of the prostate varies significantly and intensity distributions are inhomogeneous across scans. In this paper, we present an automatic deep learning method for prostate MR image segmentation using an enhanced holistically-nested framework. The Holistically-Nested Network (HNN) was originally proposed as an image-to-image solution for visually extracting object edges and boundaries. We modify the HNN by adding skip connections from later stages to earlier stages in order to combine low-level and high-level features. The deeper framework exploits multi-level and multi-scale information for image-to-image prediction in a holistic manner. Experimental evaluation demonstrates that our enhanced holistically-nested network achieves significantly better segmentation accuracy than other deep learning approaches.
|
|
14:00-16:00, Paper FrPMP.80 | |
DeephESC: An Automated System for Generating and Classification of Human Embryonic Stem Cells |
Theagarajan, Rajkumar | Univ. of California, Riverside |
Guan, Benjamin | Univ. of California, Riverside |
Bhanu, Bir | Univ. of California |
Keywords: Biological image and signal analysis
Abstract: Human embryonic stem cells (hESCs) are promising for the treatment of many diseases, such as cancer, Parkinson's, Huntington's and diabetes mellitus, and for toxicological testing. Automated detection and classification of hESC videos is of great interest among biologists for the quantified analysis of the various states of hESCs in experimental work; to date, biologists who study hESCs have had to analyze stem cell videos manually. In this paper we introduce a hierarchical classification system consisting of convolutional neural networks (CNNs) and triplet CNNs to classify hESC images into six different classes. We also design an ensemble of Generative Adversarial Networks (GANs) for generating synthetic hESC images. We validate the quality of the generated images by training all of our CNNs exclusively on the synthetic images generated by the GANs and evaluating them on the original hESC images. Experimental results show that we classify the original hESC images with an accuracy of 85.67% using the CNN alone, 91.38% using the CNN and triplet CNN, and 94.11% by fusing the outputs of the CNN and triplet CNNs, outperforming existing state-of-the-art approaches.
|
|
14:00-16:00, Paper FrPMP.81 | |
2D and 3D Convolutional Neural Network Fusion for Predicting the Histological Grade of Hepatocellular Carcinoma |
Dou, Tianyou | Guangzhou Univ. of Chinese Medicine, |
Zhou, Wu | Guangzhou Univ. of Chinese Medicine |
Keywords: Medical image and signal analysis, Classification, Deep learning
Abstract: Preoperative knowledge of the histological grade of hepatocellular carcinoma (HCC) is significant for patient management and prognosis in clinical practice. Recent studies reported that 3D convolutional neural networks (CNNs) outperform 2D CNNs for lesion characterization. Since 2D and 3D deep features derived from CNNs embed different spatial information about a neoplasm, lesion characterization might be improved by taking full advantage of both. In this work, we propose a 2D and 3D CNN fusion architecture that integrates both 2D and 3D spatial information of the neoplasm for lesion characterization. Specifically, correlated and individual component analysis (CICA) is performed to fuse the 2D deep features from three orthogonal views and the 3D deep feature, derived from a 2D CNN and a 3D CNN separately. Experimental results on 46 clinical patients with HCCs demonstrate several encouraging properties of the proposed fusion framework: (1) fusing the 2D deep features from the three orthogonal views using CICA yields better results than the 3D deep feature alone for predicting the histological grade of HCC; (2) fusing the 2D and 3D deep features using CICA achieves better results than either the 2D or the 3D deep features alone; and (3) deep feature fusion by CICA outperforms both conventional deep feature concatenation and the deep correlation model.
|
|
14:00-16:00, Paper FrPMP.82 | |
Fully Automatic Segmentation of the Left Ventricle Using Multi-Scale Fusion Learning |
Yuan, Tianchen | Wuhan Univ |
Tong, Qianqian | Wuhan Univ |
Liao, Xiangyun | Shenzhen Inst. of Advanced Tech. Chinese Acad. of S |
Du, Xinling | Huazhong Univ. of Science and Tech |
Zhao, Jianhui | Wuhan Univ |
Keywords: Medical image and signal analysis, Deep learning
Abstract: Segmentation of the left ventricle (LV) is essential for the quantitative calculation of clinical indices used to analyze cardiac contractile function. However, automatically segmenting small-contour cardiac magnetic resonance (CMR) images is challenging for traditional convolutional neural networks (ConvNets) because of their low robustness to scale variation. In this paper, we propose a multi-scale fusion learning method to improve the performance of ConvNets for LV segmentation. To realize this, single-scale input and multi-scale output (SIMO) networks are first trained to construct a SIMO-based multi-scale fusion network (SIMO-based MSF_Net). The trained SIMO networks produce coarse prediction results at different scales, which are then fused in another multi-scale network, and the coarse results are progressively refined to yield finer segmentations. Our method is evaluated on the MICCAI 2009 challenge database for LV segmentation. Experimental results demonstrate the robustness of the SIMO-based MSF_Net on challenging CMR images: the "Good contours" metric reaches 98.35% on the testing set, a substantial improvement over state-of-the-art methods.
|
|
14:00-16:00, Paper FrPMP.83 | |
Model-Based Graph Segmentation in 2-D Fluorescence Microscopy Images |
Abreu, Arnaud | Univ. of Strasbourg |
Frenois, François-Xavier | Inst. Univ. Du Cancer De Toulouse |
Valitutti, Salvatore | INSERM |
Brousset, Pierre | INSERM |
Denefle, Patrice | Inst. Roche |
Naegel, Benoît | Univ. De Strasbourg |
Wemmert, Cédric | Strasbourg Univ |
Keywords: Biological image and signal analysis, Segmentation, features and descriptors, Applications of pattern recognition and machine learning
Abstract: In biology and pathology, immunofluorescence microscopy is a leading technique for deciphering the molecular mechanisms of cell activation and disease progression. Although several solutions for image analysis exist, fully non-subjective image analysis remains difficult. There is therefore a strong need for highly reproducible analysis procedures that avoid manual thresholds and hand-selection of objects of interest. To address this need, we describe a fully automatic segmentation of cell nuclei in 2-D immunofluorescence images. The method merges segments of the image to fit a nucleus model learned by a trained Random Forest classifier. The merging procedure efficiently explores the fusion configuration space of an over-segmented image by using the minimum spanning tree of its region adjacency graph.
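A sketch of the graph side of the method using scikit-image and NetworkX: over-segment, build a region adjacency graph, and explore merges along its minimum spanning tree. The Random Forest nucleus model is abstracted as the hypothetical accept_merge callback, and the merge order is a simplification.

```python
import networkx as nx
from skimage.segmentation import slic
from skimage.graph import rag_mean_color  # skimage >= 0.20; older: skimage.future.graph

def mst_merge(image, accept_merge):
    labels = slic(image, n_segments=400)          # over-segmentation
    rag = rag_mean_color(image, labels)           # region adjacency graph
    mst = nx.minimum_spanning_tree(rag)           # cheapest fusion edges
    for u, v, d in sorted(mst.edges(data=True), key=lambda e: e[2]["weight"]):
        if accept_merge(labels, u, v):            # model says: fits a nucleus
            labels[labels == v] = u
    return labels
```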
|
|
14:00-16:00, Paper FrPMP.84 | |
Mammographic Mass Detection Based on Convolution Neural Network |
Li, Yanfeng | Beijing Jiaotong Univ |
Chen, Houjin | Beijing Jiaotong Univ |
Zhang, Linlin | Beijing Jiaotong Univ |
Cheng, Lin | Peking Univ. People's Hospital |
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis, Image processing and analysis
Abstract: Mammography is one of the most broadly used imaging modalities for breast cancer screening and detection. Locating masses within the whole breast is an important task in computer-aided detection. Traditionally, handcrafted features are employed to capture the difference between a mass region and a normal region. Recently, convolutional neural networks (CNNs), which automatically discover features from images, have shown promising results in many pattern recognition tasks. In this paper, three CNN-based mass detection schemes are evaluated. First, a suspicious-region locating method based on heuristic knowledge is employed. Then, three different CNN schemes are designed to classify each suspicious region as mass or normal. The proposed schemes are evaluated on a dataset of 352 mammograms. Compared with several handcrafted features, the CNN-based methods show better mass detection performance in terms of the free-response receiver operating characteristic (FROC) curve.
|
|
14:00-16:00, Paper FrPMP.85 | |
Encoded Texture Features to Characterize Bone Radiograph Images |
Su, Ran | Tianjin Univ |
Chen, Weijun | Anyang Normal Univ |
Wei, Leyi | TIanjin Univ |
Li, Xiuting | Nanyang Tech. Univ |
Jin, Qiangguo | Tianjin Univ |
Tao, Wenyuan | Tianjin Univ |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Modeling, simulation and visualization
Abstract: Osteoporosis is the most common cause of fractures among the elderly. For convenience and safety, 2D texture analysis has been used to diagnose osteoporosis. In this study, we propose a supervised method that uses novel texture features to distinguish osteoporotic cases from healthy ones. We designed two groups of new features, Encoded GLCM and Encoded LBP, each containing two subgroups obtained by encoding Gabor and Hessian information into Gray Level Co-occurrence Matrix (GLCM) features and Local Binary Pattern (LBP) features, respectively. These two groups, together with a raw feature group containing the plain GLCM and LBP features (560 features in total), were organized into various groups and used to train a Random Forest classifier. Classification performance was compared across and within groups/subgroups, and the performance of each individual feature is also reported. We conducted feature selection based on Recursive Feature Elimination (RFE) inside a voting scheme to further increase efficiency. The results indicate that the Encoded GLCM and Encoded LBP features are more discriminative than the raw GLCM and LBP features for identifying osteoporosis. The best individual feature comes from the Encoded LBP group and achieves a balanced accuracy of 70%; furthermore, using only ten of the proposed features obtained through feature selection, the balanced accuracy improves from 60% to 71%. This shows that the proposed method is promising for assisting the early diagnosis of osteoporosis.
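A minimal sketch of the feature-selection step with scikit-learn: recursive feature elimination around a Random Forest, reducing the 560 texture features to a small discriminative subset (the voting wrapper from the paper is omitted).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def select_features(X, y, n_keep=10):
    # X: (n_samples, 560) texture features; y: osteoporotic vs. healthy.
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rfe = RFE(rf, n_features_to_select=n_keep, step=0.1)
    rfe.fit(X, y)
    return rfe.support_   # boolean mask over the 560 features
```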
|
|
14:00-16:00, Paper FrPMP.86 | |
Towards Personalized Autism Diagnosis: Promising Results |
Elnakieb, Yaser A. | Univ. of Louisville |
Nitzken, Matthew | Univ. of Louisville |
Shalaby, Ahmed | UofL |
Dekhil, Omar | Univ. of Louisville |
Mahmoud, Ali | Bioengineering Department, Univ. of Louisville, Louisville, |
Switala, Andrew | Univ. of Louisville |
Elmaghraby, Adel | Univ. of Louisville |
Keyntone, Robert | Univ. of Louisville |
Ghazal, Mohammed | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Khalil, Ashraf | Computer Science and Information Tech. Department, Abu Dhab |
Barnes, Gregory | Univ. of Louisville |
El-Baz, Ayman | Univ. of Louisville |
Keywords: Computer-aided detection and diagnosis, Medical image and signal analysis
Abstract: The ultimate goal of this paper is to develop a novel personalized, comprehensive computer-aided diagnostic (CAD) system for the precise diagnosis of autism spectrum disorder (ASD) based on 3D shape analysis of the cerebral cortex (Cx). To achieve this goal, we used the structural MRI (sMRI) modality to extract shape features of the brain cortex. After segmenting the brain cortex from sMRI, we used spherical harmonics analysis to measure surface complexity, in addition to studying surface curvatures. Finally, a multi-stage deep network based on several autoencoders and softmax classifiers is constructed to provide the final global diagnosis. The presented CAD system was tested on several datasets, achieving an average accuracy of 92.15%. In addition to its global diagnostic accuracy, the local diagnostic accuracies of the most significant areas demonstrate the ability of the proposed system to construct promising local maps of ASD-related brain abnormalities, which can be considered an important step towards personalized medicine for autistic individuals.
|
|
14:00-16:00, Paper FrPMP.87 | |
WGAN Latent Space Embeddings for Blast Identification in Childhood Acute Myeloid Leukaemia |
Licandro, Roxane | TU Wien, Medical Univ. of Vienna |
Schlegl, Thomas | Medical Univ. of Vienna |
Reiter, Michael | TU Wien |
Diem, Markus | Vienna Univ. of Tech |
Dworzak, Michael | Medical Univ. of Vienna, Labdia Lab. GmbH |
Schumich, Angela | Labdia Lab. GmbH |
Langs, Georg | Medical Univ. of Vienna |
Kampel, Martin | Vienna Univ. of Tech |
Keywords: Medical image and signal analysis, Semi-supervised learning, Dimensionality reduction
Abstract: Acute Myeloid Leukaemia (AML) is a rare type of childhood acute leukaemia. During treatment, assessing the number of cancer cells is particularly important for determining treatment response and, if necessary, adapting the treatment scheme. Minimal Residual Disease (MRD) is a diagnostic measure based on Flow CytoMetry (FCM) data that captures the amount of blasts in a blood sample; it is a clinical tool for planning a patient's individual therapy and requires reliable blast identification. In this work we propose a novel semi-supervised learning approach, applicable whenever large amounts of unlabeled data but only a small amount of annotated data are available. The proposed approach is based on Wasserstein Generative Adversarial Network (WGAN) latent space embeddings learned in an unsupervised fashion, and a simple fully connected neural network (FNN) trained on labeled data leveraging the learned embedding. We apply this approach to the semi-supervised classification of blasts vs. non-blasts and compare it with two baselines: 1) semi-supervised learning based on a Principal Component Analysis (PCA) embedding, and 2) a deep FNN trained only on the annotated data without leveraging an embedding. Results suggest that our semi-supervised WGAN embedding outperforms semi-supervised learning based on PCA embeddings and, when only small amounts of annotated data are available, even outperforms the FNN classifier.
|
|
14:00-16:00, Paper FrPMP.88 | |
Nonlocal Low-Rank and Total Variation Constrained PET Image Reconstruction |
Xie, Nuobei | Zhejiang Univ |
Chen, Yunmei | Univ. of Florida |
Liu, Huafeng | Zhejiang Univ |
Keywords: Medical image and signal analysis, Biological image and signal analysis
Abstract: Many efforts have been made over the decades to improve the accuracy of the radioactivity map in positron emission tomography (PET) images, which has important clinical implications for the better diagnosis and understanding of diseases. However, reconstructing high-resolution images from limited acquired photon counts remains challenging. In this paper, we present a nonlocal self-similarity constraint for exploiting the structured sparsity within reconstructed PET images. It operates on image patches and is approached by low-rank approximation. Moreover, we adopt total variation regularization to further denoise and compensate for the shortcomings inherent in patch-based methods. These two regularization terms are employed in the Poisson model and jointly solved in a distributed optimization framework. Experiments show that our proposed PNLTV method substantially outperforms existing state-of-the-art PET reconstruction methods.
|
|
14:00-16:00, Paper FrPMP.89 | |
Automatic Multi-Atlas Segmentation for Abdominal Images Using Template Construction and Robust Principal Component Analysis |
Zhao, Yu | Tech. Univ. München |
Li, Hongwei | Tech. Univ. of Munich |
Zhou, Rong | Southeast Univ. Inst. of Automation, Chinese Acad. O |
Tetteh, Giles | Tech. Univ. Muenchen |
Menze, Bjoern | Tum |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Biological image and signal analysis
Abstract: The automatic and accurate segmentation of different organs is a critical step for computer-aided diagnosis, treatment planning, and clinical decision support. However, for small organs such as the gallbladder, pancreas, and thyroid, accurate segmentation remains challenging due to their high anatomical variability and inhomogeneity. This paper presents a new fully automated multi-atlas segmentation approach for small organs using template construction, robust principal component analysis, and a K-nearest neighbor classifier. Additionally, a patch-based pipeline is employed to further improve segmentation accuracy. Qualitative and quantitative evaluations were conducted on the VISCERAL challenge dataset. Experimental results show that the proposed system outperforms other multi-atlas-based and forest-based methods in the segmentation of small organs.
|
|
14:00-16:00, Paper FrPMP.90 | |
Early Diagnosis of Diabetic Retinopathy in OCTA Images Based on Local Analysis of Retinal Blood Vessels and Foveal Avascular Zone |
Eladawi, Nabila | Information Systems Dept., Faculty of Computers and Information, |
Elmogy, Mohammed | Faculty of Computers and Information, Mansoura Univ |
Fraiwan, Luay | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Pichi, Francesco | Cleveland Clinic, Abu Dhabi, UAE |
Ghazal, Mohammed | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Abouelfetouh, Ahmed | Information Systems Dept., Faculty of Computers and Information, |
Riad, Alaa | Information Systems Dept., Faculty of Computers and Information, |
Keyntone, Robert | Univ. of Louisville |
Schaal, Shlomit | Department of Ophthalmology & Visual Sciences, Univ. of Ma |
El-Baz, Ayman | Univ. of Louisville |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Image processing and analysis
Abstract: This paper introduces a diagnosis system for detecting early signs of diabetic retinopathy (DR) using optical coherence tomography angiography (OCTA) images. We developed a segmentation technique that extracts blood vessels from both the superficial and deep retinal maps. It is based on a higher-order joint Markov-Gibbs random field (MGRF) model, which combines both current and spatial appearance information of the retinal blood vessels. To train and test a support vector machine (SVM) classifier, three local features were extracted from the segmented images: the density and the appearance of the retinal blood vessels, and the distance map of the foveal avascular zone (FAZ). We then used an SVM with a linear kernel to distinguish sub-clinical DR patients from normal cases. On 105 subjects, the presented computer-aided diagnosis (CAD) system demonstrated an overall accuracy (ACC) of 97.3% and a Dice similarity coefficient (DSC) of 97.9%.
|
|
14:00-16:00, Paper FrPMP.91 | |
A Novel Two-Stage Deep Method for Mitosis Detection in Breast Cancer Histology Images |
Ma, Minglin | Nanjing Univ |
Shi, Yinghuan | Nanjing Univ |
Li, Wenbin | Nanjing Univ |
Gao, Yang | Nanjing Univ |
Xu, Jun | Nanjing Univ. of Information Science and Tech |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis
Abstract: The accurate detection and counting of mitoses in breast cancer histology images is very important for computer-aided diagnosis; it is currently completed manually by pathologists according to their clinical experience. This procedure is extremely time-consuming and tedious, and it often results in low agreement among different pathologists. Although several computer-aided detection methods have been developed recently, they suffer from high false-negative (FN) and false-positive (FP) rates because they treat detection simply as a binary classification problem. In this paper, we present a novel two-stage detection method with multi-scale and similarity learning convnets (MSSN). First, a large number of candidates is generated in order to reduce FNs (i.e., to avoid treating mitoses as non-mitoses), using different square and non-square filters to capture spatial relations at different scales. Second, a similarity prediction model is applied to the obtained candidates for the final detection to reduce FPs, realized by imposing a large-margin constraint. On both the 2014 and 2012 ICPR MITOSIS datasets, our MSSN achieved promising results, with the highest recall (outperforming other methods by a large margin) and a comparable F-score.
|
|
14:00-16:00, Paper FrPMP.92 | |
A New 3D CNN-Based CAD System for Early Detection of Acute Renal Transplant Rejection |
Abdeltawab, Hisham | Univ. of Louisville |
Shehata, Mohamed | BioImaging Lab, Bioengineering Department, Univ. of Louisvi |
Shalaby, Ahmed | UofL |
Mesbah, Samineh | Univ. of Louisville |
El-Baz, Maryam | BioImaging Lab, Bioengineering Department, Univ. of Louisvi |
Ghazal, Mohammed | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Al Khalil, Yasmina | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Abou El-Ghar, Mohamed | Urology and Nephrology Department, Univ. of Mansoura, Egypt |
Dwyer, Amy | Kidney Transplantation--Kidney Disease Center, Univ. of Lou |
El-Melegy, Moumen | Department of Electrical Engineering, Assiut Univ. Assiut |
El-Baz, Ayman | Univ. of Louisville |
Keywords: Computer-aided detection and diagnosis, Neural networks, Classification
Abstract: Recently, diffusion-weighted magnetic resonance imaging (DW-MRI) has been explored for the non-invasive assessment of renal transplant function. In this paper, a computer-aided diagnostic (CAD) system based on a 3D convolutional neural network (CNN) is developed to assess renal transplant functionality using diffusion MRI-derived markers extracted from (3D + b-value) DW-MRI. The framework performs the following image processing steps: (i) 3D DW-MRI kidney segmentation using a level-set approach guided by shape and visual appearance features; (ii) feature extraction, in which voxel-wise apparent diffusion coefficients (3D ADCs) of the segmented DW-MRIs are estimated at different b-values (i.e., gradient field strengths and durations); and (iii) renal transplant status classification, in which the extracted 3D ADCs are used as input to train and test a 3D CNN-based classifier. The proposed CAD system achieved 94% accuracy, 94% sensitivity, and 94% specificity under a leave-one-subject-out cross-validation scenario in distinguishing non-rejection (NR) from acute rejection (AR) renal transplants. These preliminary results suggest that the presented CAD system can reliably and non-invasively diagnose renal transplant status.
|
|
14:00-16:00, Paper FrPMP.93 | |
Automatic Quantification of Stomata for High-Throughput Plant Phenotyping |
Bhugra, Swati | Indian Inst. of Tech. Delhi |
Mishra, Deepak | Indian Inst. of Tech. Delhi |
Anupama, Anupama | IIT Delhi |
Chaudhury, Santanu | Indian Inst. of Tech. Delhi |
Lall, Brejesh | Indian Inst. of Tech. Delhi |
Chugh, Archana | IIT Delhi |
Keywords: Biological image and signal analysis, Applications of pattern recognition and machine learning
Abstract: Stomatal morphology is a key phenotypic trait for analyzing plants’ responses to various environmental stresses (e.g., drought, salinity). Stomata exhibit diverse characteristics with respect to orientation, size, shape and degree of papillae occlusion. Biologists therefore currently rely on manual or semi-automatic approaches to compute stomatal morphological traits accurately from scanning electron microscope (SEM) images of the leaf surface. In contrast to these subjective, low-throughput methods, we propose a novel automated framework for stomata quantification. It follows a hybrid approach in which candidate stomata regions are first detected by a convolutional neural network (CNN) and occlusion is then handled by an inpainting algorithm. In addition, we propose a segmentation-based quantification framework that addresses shape, scale and occlusion in an end-to-end manner. The performance of both automated frameworks is evaluated by comparing the derived traits with manually computed morphological traits. With no prior information about stomatal size or location, the hybrid and end-to-end machine learning frameworks achieve correlations of 0.94 and 0.93, respectively, on rice stomata images. Furthermore, they successfully enable wheat stomata quantification, demonstrating generalizability across cultivars.
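A hedged illustration of the quantification step: given a binary stomata mask, per-stoma traits can be derived with scikit-image's regionprops and compared against manual measurements via Pearson correlation. The specific traits and the toy mask below are assumptions, not the paper's exact protocol:

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.measure import label, regionprops

def stomatal_traits(mask):
    """Per-stoma morphological traits (area, eccentricity, orientation)
    from a binary segmentation mask."""
    return np.array([[p.area, p.eccentricity, p.orientation]
                     for p in regionprops(label(mask))])

# Toy mask with four elliptical "stomata" of different sizes.
yy, xx = np.mgrid[0:120, 0:120]
mask = np.zeros((120, 120), dtype=bool)
for cy, cx, ax_r, ax_c in [(20, 20, 8, 4), (20, 80, 6, 3),
                           (80, 20, 7, 5), (80, 80, 9, 4)]:
    mask |= ((yy - cy) / ax_r) ** 2 + ((xx - cx) / ax_c) ** 2 < 1
auto = stomatal_traits(mask)

# Agreement with (hypothetical) manual measurements via Pearson correlation.
manual_area = auto[:, 0] * np.random.normal(1.0, 0.02, size=len(auto))
r, _ = pearsonr(auto[:, 0], manual_area)
print(f"area correlation: {r:.3f}")
```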
|
|
14:00-16:00, Paper FrPMP.94 | |
Spatial Pyramid Dilated Network for Pulmonary Nodule Malignancy Classification |
Zhang, Guokai | Tongji Univ |
Luo, Ye | Tongji Univ |
Zhu, Dandan | Tongji Univ |
Xu, Yixuan | Tongji Univ |
Sun, Yunxin | Tongji Univ |
Lu, Jianwei | Tongji Univ |
Keywords: Medical image and signal analysis, Deep learning, Classification
Abstract: Lung cancer is the most prevalent cancer in the world, and an effective way to diagnose it at an early stage is to detect pulmonary nodules with a computer-aided system. However, pulmonary nodules vary in size, and those with small diameters are generally among the most difficult cases to diagnose. Under this condition, traditional convolutional-network-based nodule classification methods fail to achieve satisfactory results because pooling operations discard tiny but vital features. To tackle this problem, we propose a novel 3D spatial pyramid dilated convolution network to classify the malignancy of pulmonary nodules. Instead of using pooling layers, we utilize 3D dilated convolutions to capture and preserve more detailed characteristic information about the nodules. Moreover, a multiple-receptive-field fusion strategy is applied to extract multi-scale features from the nodule CT images. Extensive experimental results show that our model achieves an accuracy of 88.6%, outperforming other state-of-the-art methods.
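A minimal PyTorch sketch of the core idea, assuming parallel 3D convolution branches whose dilation rates and channel widths are illustrative choices rather than the authors' configuration:

```python
import torch
import torch.nn as nn

class SpatialPyramidDilated3D(nn.Module):
    """Parallel 3D convolutions with different dilation rates, fused by
    concatenation.  Padding = dilation keeps the spatial size fixed, so
    no pooling (and no loss of tiny nodule detail) is needed."""
    def __init__(self, in_ch=1, branch_ch=16, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv3d(branch_ch * len(rates), branch_ch, kernel_size=1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]  # multi-receptive-field features
        return torch.relu(self.fuse(torch.cat(feats, dim=1)))

# Toy nodule patch: batch of 2, single channel, 32^3 voxels.
x = torch.randn(2, 1, 32, 32, 32)
print(SpatialPyramidDilated3D()(x).shape)  # torch.Size([2, 16, 32, 32, 32])
```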
|
|
14:00-16:00, Paper FrPMP.95 | |
Breast Segmentation in MRI Via U-Net Deep Convolutional Neural Networks |
Piantadosi, Gabriele | Federico II Di Napoli |
Sansone, Mario | Univ. of Naples Federico II |
Sansone, Carlo | Univ. of Naples Federico II |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Deep learning
Abstract: Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI) has demonstrated, in recent years, great potential as a complementary diagnostic method for early detection and diagnosis of breast cancer. However, due to the large amount of data, manual inspection of DCE-MRI is error prone and can hardly be handled without a Computer Aided Diagnosis (CAD) system. In a typical CAD pipeline, segmentation of the breast parenchyma is a crucial stage aimed at reducing computational effort and increasing reliability. In recent years, deep convolutional networks have outperformed the state of the art in many visual tasks, such as image classification and object recognition; however, very few deep-learning-based proposals have so far been applied to segmentation tasks in the biomedical field. The aim of this work is to apply a suitably modified convolutional neural network to fully automate the non-trivial task of breast tissue segmentation in 3D MR data, accurately separating breast parenchyma from air and other tissues (such as the chest wall). The proposed approach has been validated on 42 DCE-MRI studies. The median segmentation accuracy and Dice similarity index were 98.93 (±0.15) and 95.90 (±0.74), respectively (p < 0.05), with 100% coverage of neoplastic lesions.
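The Dice similarity index reported above has the standard definition DSC = 2|A∩B| / (|A|+|B|); a minimal NumPy sketch on toy masks:

```python
import numpy as np

def dice_index(pred, gt):
    """Dice similarity index between two binary masks:
    DSC = 2 |A ∩ B| / (|A| + |B|), in [0, 1]."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

# Toy 3D mask vs. a slightly shifted prediction.
gt = np.zeros((8, 64, 64), dtype=bool); gt[:, 16:48, 16:48] = True
pred = np.zeros_like(gt);               pred[:, 18:50, 16:48] = True
print(f"DSC = {dice_index(pred, gt):.3f}")  # 0.938
```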
|
|
14:00-16:00, Paper FrPMP.96 | |
A Novel ADCs-Based CNN Classification System for Precise Diagnosis of Prostate Cancer |
Reda, Islam | Mansoura Univ. - Univ. of Louisville |
Ghazal, Mohammed | Electrical and Computer Engineering Department, Abu Dhabi Univ |
Shalaby, Ahmed | UofL |
Elmogy, Mohammed | Faculty of Computers and Information, Mansoura Univ |
Abouelfetouh, Ahmed | Information Systems Dept., Faculty of Computers and Information, |
Ayinde, Babajide | Univ. of Louisville |
Abou El-Ghar, Mohamed | Urology and Nephrology Department, Univ. of Mansoura, Egypt |
Elmaghraby, Adel | Univ. of Louisville |
Keyntone, Robert | Univ. of Louisville |
El-Baz, Ayman | Univ. of Louisville |
Keywords: Computer-aided detection and diagnosis, Image classification, Deep learning
Abstract: This paper addresses the issue of early diagnosis of prostate cancer from diffusion-weighted magnetic resonance imaging (DWI) using a convolutional neural network (CNN) based computer-aided diagnosis (CAD) system. The proposed CNN-based CAD system first segments the prostate using a geometric deformable model. The evolution of this model is guided by a stochastic speed function that exploits first- and second-order appearance models in addition to a shape prior. The fusion of these guiding criteria is accomplished using a nonnegative matrix factorization (NMF) model. Then, the apparent diffusion coefficients (ADCs) within the segmented prostate are calculated at each b-value and used as imaging markers for the blood diffusion of the scanned prostate. For classification/diagnosis, a three-dimensional CNN is trained to extract the most discriminatory features of these ADC maps for distinguishing malignant from benign prostate tumors. The performance of the proposed CNN-based CAD system is evaluated using DWI datasets acquired from 45 patients (20 benign and 25 malignant) at seven different b-values. These DWI datasets were acquired using two scanners with different magnetic field strengths (1.5 Tesla and 3 Tesla). The conducted experiments on in-vivo data confirm that the use of ADCs makes the proposed system invariant to the magnetic field strength.
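A hedged illustration of NMF-based fusion of guiding criteria, using scikit-learn's NMF on a synthetic voxel-by-criterion matrix; the shapes, criteria columns and single-component choice are assumptions, not the authors' exact formulation:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical per-voxel guiding terms for the level-set speed function:
# columns = first-order appearance, second-order appearance, shape prior.
rng = np.random.default_rng(0)
n_voxels = 5000
V = np.abs(rng.normal(size=(n_voxels, 3)))  # NMF requires nonnegative input

# Factor V ~ W @ H with one component: W is a fused, nonnegative
# per-voxel score combining the three criteria.
model = NMF(n_components=1, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)          # shape (n_voxels, 1)
H = model.components_               # shape (1, 3): learned mixing weights
fused = W[:, 0] / W[:, 0].max()     # normalised fused guidance term
print(H, fused[:5])
```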
|
|
14:00-16:00, Paper FrPMP.97 | |
An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN |
Mo, Xi | Univ. of Kansas |
Tao, Ke | The First Hospital of Jilin Univ |
Wang, Quan | The First Hospital of Jilin Univ |
Wang, Guanghui | Univ. of Kansas |
Keywords: Medical image and signal analysis, Deep learning for multimedia analysis, Object detection
Abstract: Polyps have long been considered one of the major etiologies of colorectal cancer, a fatal disease worldwide; thus, early detection and recognition of polyps play a crucial role in clinical routines. Accurate diagnosis of polyps through physician-operated endoscopy is a challenging task, not only because of the varying expertise of physicians but also because of the inherent nature of endoscopic inspections. To facilitate this process, computer-aided techniques, ranging from conventional image processing to novel machine-learning-based approaches, have been designed for polyp detection in endoscopic videos and images. Among the proposed algorithms, deep-learning-based methods lead on multiple metrics in evaluations of algorithmic performance. In this work, a highly effective model, the faster region-based convolutional neural network (Faster R-CNN), is implemented for polyp detection. In comparison with the reported results of state-of-the-art approaches to polyp detection, extensive experiments demonstrate that Faster R-CNN achieves very competitive results and is an efficient approach for clinical practice.
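A minimal sketch of the model family using torchvision's off-the-shelf Faster R-CNN with its box head swapped for a two-class (background + polyp) predictor; this illustrates the architecture only and is not the authors' trained network:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Off-the-shelf Faster R-CNN (randomly initialised here), with the box
# head replaced for a 2-class problem: background + polyp.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
frame = [torch.rand(3, 480, 640)]          # one synthetic endoscopic frame
with torch.no_grad():
    out = model(frame)[0]                   # dict with boxes, labels, scores
print(out["boxes"].shape, out["scores"].shape)
```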
|
|
14:00-16:00, Paper FrPMP.98 | |
A Hybrid Framework for Tumor Saliency Estimation |
Xu, Fei | Utah State Univ |
Xian, Min | Univ. of Idaho |
Zhang, Yingtao | Harbin Inst. of Tech |
Huang, Kuan | Utah State Univ |
Cheng, Heng-Da | Utah State Univ |
Zhang, Boyu | Utah State Univ |
Ding, Jianrui | Harbin Inst. of Tech |
Ning, Chunping | The Affiliated Hospital of Qingdao Univ |
Wang, Ying | The Second Hospital of Hebei Medical Univ |
Keywords: Medical image and signal analysis, Modeling, simulation and visualization, Object detection
Abstract: Automatic tumor segmentation of breast ultrasound (BUS) images is quite challenging due to the complicated anatomic structure of the breast and poor image quality. Most tumor segmentation approaches achieve good performance on BUS images collected in controlled settings; however, performance degrades greatly on BUS images from different sources. Tumor saliency estimation (TSE) has attracted increasing attention as a way to address this problem by modeling radiologists’ attention mechanisms. In this paper, we propose a novel hybrid framework for TSE that integrates high-level domain knowledge with robust low-level saliency assumptions, overcoming the drawbacks caused by direct mapping in traditional TSE approaches. The new framework integrates the Neutro-Connectedness (NC) map, an adaptive center, correlation, and a layer-structure-based weighted map. Experimental results demonstrate that the proposed approach outperforms state-of-the-art TSE methods.
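A hedged sketch of pixel-wise weighted fusion of saliency cues in NumPy, standing in for the paper's integration of the NC map, adaptive center, correlation and layer-structure maps; the cues and weights below are illustrative assumptions:

```python
import numpy as np

def fuse_saliency(maps, weights):
    """Pixel-wise weighted fusion of saliency maps, each normalised to
    [0, 1] first.  A generic stand-in for combining cues such as a
    connectedness map, a center prior and a layer-structure map."""
    fused = np.zeros_like(maps[0], dtype=float)
    for m, w in zip(maps, weights):
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)
        fused += w * m
    return fused / sum(weights)

h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
center_prior = np.exp(-(((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * 15 ** 2)))
texture_cue = np.random.rand(h, w)          # placeholder low-level cue
sal = fuse_saliency([center_prior, texture_cue], weights=[0.7, 0.3])
print(sal.min(), sal.max())                  # fused map stays in [0, 1]
```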
|
|
14:00-16:00, Paper FrPMP.99 | |
Nuclei Segmentation of Cervical Cell Images Based on Intermediate Segment Qualifier |
Wang, Rui | Waseda Univ. Japan |
Kamata, Sei-ichiro | Waseda Univ |
Keywords: Medical image and signal analysis
Abstract: Accurate nuclei segmentation of cervical cell images is a vital step in the automated diagnosis of cervical diseases. However, segmentation is challenging because of nuclei embedded in folded or overlapping cytoplasm, impurity interference, low contrast, and variation in nuclear shape and size, all of which can degrade segmentation results. This paper presents an automated method for nuclei detection in cervical cell images. We propose an intermediate segment qualifier (ISQ) to categorize the results of a nuclei segmentation stage that integrates a convolutional neural network (CNN) with the simple linear iterative clustering (SLIC) superpixel method. We then apply a gradient vector flow (GVF) snake model for further refinement. We evaluate the proposed method on the ISBI 2014 challenge dataset and demonstrate that it performs well and compares favorably with state-of-the-art approaches.
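A minimal illustration of the classical building blocks named above, using scikit-image's SLIC superpixels and its active_contour snake (a classic snake standing in for the GVF snake) on a sample image; all parameters are illustrative:

```python
import numpy as np
from skimage import data, segmentation

# SLIC superpixels, as used to support candidate nuclei segmentation.
img = data.coins()                          # grayscale sample image
labels = segmentation.slic(img, n_segments=200, compactness=10,
                           channel_axis=None)
print("superpixels:", labels.max() + 1)

# Circular initial contour refined toward an object boundary by a snake.
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 40 * np.sin(s), 100 + 40 * np.cos(s)])
snake = segmentation.active_contour(img, init, alpha=0.01, beta=1.0)
print("refined contour points:", snake.shape)
```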
|
|
14:00-16:00, Paper FrPMP.100 | |
SequentialSegNet: Combination with Sequential Feature for Multi-Organ Segmentation |
Zhang, Yao | Inst. of Computing Tech. Chinese Acad. of Sciences |
Jiang, Xuan | Lenovo Res. Beijing |
Zhong, Cheng | Lenovo |
Zhang, Yang | Lenovo Company |
Shi, Zhongchao | Lenovo Company |
Li, Zhensheng | Lenovo Res |
He, Zhiqiang | Lenovo Company |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Deep learning
Abstract: Multi-organ segmentation from computed tomography (CT) images is essential for computer aided diagnosis (CAD), and recent advances in fully convolutional networks (FCNs) for volumetric image segmentation have demonstrated the importance of leveraging spatial information. In this paper, we propose a novel framework called SequentialSegNet, which efficiently combines features within a single CT slice (intra-slice) and among multiple adjacent slices (inter-slice) for multi-organ segmentation. Experimental results show that our approach effectively improves segmentation performance on both large and small abdominal organs, including the liver, spleen and gallbladder.
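A minimal PyTorch sketch of intra-/inter-slice feature fusion, assuming a 2D convolution per slice followed by a 3D convolution across adjacent slices; the block name and channel counts are hypothetical and this is not the authors' SequentialSegNet:

```python
import torch
import torch.nn as nn

class IntraInterSliceBlock(nn.Module):
    """Minimal sketch of intra-/inter-slice fusion: a 2D conv extracts
    per-slice features, then a 3D conv mixes features across adjacent
    slices.  Illustrative only."""
    def __init__(self, in_ch=1, feat_ch=16):
        super().__init__()
        self.intra = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.inter = nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1)

    def forward(self, x):                    # x: (batch, slices, 1, H, W)
        b, s, c, h, w = x.shape
        f = torch.relu(self.intra(x.reshape(b * s, c, h, w)))  # per-slice features
        f = f.reshape(b, s, -1, h, w).permute(0, 2, 1, 3, 4)   # (b, C, slices, H, W)
        return torch.relu(self.inter(f))                       # mix adjacent slices

x = torch.randn(2, 5, 1, 64, 64)           # 2 stacks of 5 adjacent CT slices
print(IntraInterSliceBlock()(x).shape)     # torch.Size([2, 16, 5, 64, 64])
```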
|
| |