Last updated on August 12, 2018. This conference program is tentative and subject to change
Technical Program for Wednesday August 22, 2018

WeAMOT1 Deep Learning 2 (Ballroom C, 1st Floor)
Oral Session

11:10-11:30, Paper WeAMOT1.1
Deep Temporal Feature Encoding for Action Recognition
Li, Lin | CASIA |
Zhang, Zhaoxiang | Inst. of Automation, Chinese Acad. of Sciences |
Huang, Yan | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Liang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Video processing and analysis, Human behavior analysis
Abstract: Human action recognition is an important task in computer vision. Recently, deep learning methods for video action recognition have developed rapidly. A popular way to tackle this problem is the two-stream approach, which takes both spatial and temporal modalities into consideration. These methods often treat sparsely-sampled frames as input and video labels as supervision. Because of this sampling strategy, they are typically limited to processing short sequences, which can cause problems such as confusion from partial observation. In this paper we propose a novel video feature representation method, called Deep Temporal Feature Encoding (DTE), which aggregates frame-level features into a robust and global video-level representation. Firstly, we sample sufficient RGB frames and optical flow stacks across the whole video. Then we use a deep temporal feature encoding layer to construct a strong video feature. Lastly, end-to-end training is applied so that our video representation is global and sequence-aware. Comprehensive experiments are conducted on two public datasets: HMDB51 and UCF101. Experimental results demonstrate that DTE achieves competitive state-of-the-art performance on both datasets.
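
The abstract does not give the exact form of the deep temporal feature encoding layer, so the following PyTorch sketch uses a learned attention-weighted pooling as a generic stand-in for aggregating frame-level features, sampled across the whole video, into one video-level vector.

```python
import torch
import torch.nn as nn

class TemporalFeatureEncoder(nn.Module):
    """Aggregates frame-level features into a single video-level feature.

    A generic stand-in for the paper's encoding layer: learned
    attention-weighted pooling over densely sampled frames.
    """
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # one relevance score per frame

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (num_frames, feat_dim), sampled across the whole video.
        weights = torch.softmax(self.score(frame_feats), dim=0)
        # The weighted sum yields a global video-level representation that can
        # be trained end-to-end together with the frame feature extractor.
        return (weights * frame_feats).sum(dim=0)
```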

11:30-11:50, Paper WeAMOT1.2
Learning an Order Preserving Image Similarity through Deep Ranking
Gupta, Nitin | IBM Res |
Mujumdar, Shashank | IBM Res. India |
Samanta, Suranjana | IBM Res |
Mehta, Sameep | IBM Res |
Keywords: Deep learning, Applications of computer vision, Multimedia analysis, indexing and retrieval
Abstract: Recently, deep learning frameworks have been shown to learn a feature embedding that captures fine-grained image similarity using image triplets or quadruplets that consider pairwise relationships between image pairs. In real-world datasets, a class contains fine-grained categories that exhibit within-class variability. In such a scenario, these frameworks fail to learn the relative ordering among (i) samples belonging to the same category, (ii) samples from a different category within a class, and (iii) samples belonging to a different class. In this paper, we propose the quadlet loss function, which learns an order-preserving fine-grained image similarity through quadlets (query: q, positive: p, intermediate: i, negative: n), where p is sampled from the same category as q, i belongs to a fine-grained category within the class of q, and n is sampled from a different class than that of q. We propose a deep quadlet network to learn the feature embedding using the quadlet loss function. We present an extensive evaluation of our proposed ranking model against state-of-the-art baselines on three datasets with fine-grained categorization. The results show significant improvement over the baselines for both the order-preserving fine-grained ranking task and the general image ranking task.
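
The ordering constraint implied by the abstract, d(q,p) < d(q,i) < d(q,n), can be written as two hinge terms. The sketch below is one plausible PyTorch rendering with illustrative margins; the paper's exact loss may differ.

```python
import torch.nn.functional as F

def quadlet_loss(q, p, i, n, margin1=0.2, margin2=0.2):
    """Order-preserving ranking loss over embedded quadlets (q, p, i, n),
    each a batch of embedding vectors of shape (N, D)."""
    d_qp = F.pairwise_distance(q, p)  # same category as q
    d_qi = F.pairwise_distance(q, i)  # same class, different fine-grained category
    d_qn = F.pairwise_distance(q, n)  # different class
    # Enforce d(q,p) + margin1 < d(q,i) and d(q,i) + margin2 < d(q,n).
    return (F.relu(d_qp - d_qi + margin1) + F.relu(d_qi - d_qn + margin2)).mean()
```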

11:50-12:10, Paper WeAMOT1.3
Anomaly Detection Via Minimum Likelihood Generative Adversarial Networks
Wang, Chu | Inst. of Automation, Chinese Acad. of Sciences |
Zhang, Yan-Ming | Inst. of Automation, Chinese Acad. of Sciences |
Liu, Cheng-Lin | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Data mining
Abstract: Anomaly detection aims to detect abnormal events using a model of normality. It plays an important role in many domains such as network intrusion detection and criminal activity identification. With the rapidly growing size of accessible training data and high computation capacities, deep learning based anomaly detection has become more and more popular. In this paper, a new domain-based anomaly detection method based on generative adversarial networks (GAN) is proposed. Minimum likelihood regularization is introduced to make the generator produce more anomalies and prevent it from converging to the normal data distribution. A proper ensemble of anomaly scores is shown to effectively improve the stability of the discriminator. The proposed method achieves significant improvement over other anomaly detection methods on the CIFAR-10 and UCI datasets.

12:10-12:30, Paper WeAMOT1.4
Deep Generative Adversarial Networks for the Sparse Signal Denoising
Wu, Kailun | Tsinghua Univ |
Zhang, Changshui | Tsinghua Univ |
Keywords: Sparse learning, Neural networks, Deep learning
Abstract: In many practical denoising problems, the noisy signal contains a great deal of sparse information, which is helpful for denoising. However, common denoising methods such as low-pass filtering and wavelet denoising have a number of limitations, such as a rigid projection space or the loss of high-frequency components. In this paper, we propose a deep learning framework based on Generative Adversarial Networks (GANs) to deal with sparse denoising tasks. We design the Generative Network (G-net) as the denoising model with three parts: a decoder part, a denoising part, and a linear recovery part. To maintain the original features of the data, we utilize the Discriminator Network (D-net) to help the denoising model G-net learn. The experimental results show that our framework is more effective than some traditional methods and state-of-the-art deep learning methods. In particular, the sparse denoising GAN recovers picture details better on the MNIST image tasks.

WeAMOT2 Low Level Vision (309B, 3rd Floor)
Oral Session

11:10-11:30, Paper WeAMOT2.1
Spatially Coherent Matching for Robust Registration
Wang, Gang | Shanghai Univ. of Finance and Economics
Chen, Yufei | Tongji Univ |
Zhang, Haotian | Tongji Univ |
Keywords: Multiple view geometry, Graph matching, Image processing and analysis
Abstract: To solve the registration problem, we propose a robust method called Spatially Coherent Matching (SCM), which recovers the underlying correspondences from given putative sets of feature points for robust matching and estimates the transformation for robust registration. Recovering correct matches and fitting transformations between image pairs are key components in the field of pattern recognition. The proposed SCM starts with a putative correspondence set which is contaminated by degradations (e.g., occlusion, deformation, rotation, and outliers), and the main goal is to identify the true correspondences and estimate the underlying transformation. We then formulate this challenging problem with a spatially coherent matching model comprising a robust exponential distance loss and a spatial constraint. Based on regularization theory, SCM preserves the topological structure of adjacent features. Moreover, a sparse approximation strategy is used to improve efficiency. Finally, the experimental results reveal that the proposed method outperforms current state-of-the-art methods in most test scenarios on several real image datasets and synthesized datasets.

11:30-11:50, Paper WeAMOT2.2
Masked Label Learning for Optical Flow Regression
Yang, Guorun | Tsinghua Univ |
Deng, Zhidong | Tsinghua Univ |
Wang, Shiyao | School |
Li, Zeping | Tsinghua Univ |
Keywords: Motion and tracking, Low-level vision, Multiple view geometry
Abstract: Optical flow estimation is a challenging task in computer vision. Recent methods formulate this task as a supervised learning problem, but they often suffer from limited realistic ground truth. In this paper, a compact network, embedded with a cost volume, a residual encoder and a deconvolutional decoder, is presented to regress optical flow in an end-to-end manner. To overcome the lack of flow labels, we propose a novel data-driven strategy called masked label learning, in which a large number of masked labels are generated from the FlowNet 2.0 model and filtered by warping calibration for model training. We also present an extended-Huber loss to handle large displacements. With pretraining on massive masked flow data, followed by finetuning on a small number of sparse labels, our method achieves state-of-the-art accuracy on the KITTI flow benchmark.
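
The abstract names an "extended-Huber loss" for large displacements without defining it. As a plausible stand-in, the sketch below applies the classic Huber loss to the per-pixel flow end-point error: quadratic near zero, linear for large errors.

```python
import torch

def huber_flow_loss(pred_flow, gt_flow, delta=1.0):
    """Huber-style robust loss on the flow end-point error.

    Sketch only: the paper's 'extended' variant is not specified in the
    abstract, so this is the standard Huber loss on per-pixel EPE.
    Flow tensors have shape (N, 2, H, W).
    """
    epe = torch.norm(pred_flow - gt_flow, dim=1)   # per-pixel end-point error
    quad = 0.5 * epe ** 2                          # used where error <= delta
    lin = delta * (epe - 0.5 * delta)              # linear tail for large errors
    return torch.where(epe <= delta, quad, lin).mean()
```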

11:50-12:10, Paper WeAMOT2.3
Saliency Guided Fast Interpolation for Large Displacement Optical Flow
Zu, Yueran | Beihang Univ |
Gao, Ke | Inst. of Computing Tech. of Chinese Acad. of Sciences |
Bao, Xiuguo | CNCERT |
Tang, Wenzhong | Beihang Univ |
Keywords: Motion and tracking, Low-level vision, Video processing and analysis
Abstract: Optical flow estimation is still an open problem, and one of its bottlenecks is interpolation speed. In this paper, a saliency-guided fast interpolation method is proposed that is about two times faster than the traditional one. The method runs on a CPU without any supervision or semantic segmentation information. To make it faster, a fast saliency detection method is introduced to separate the image into two parts: non-salient superpixels are interpolated faster with random search only, while salient superpixels are interpolated by propagation and random search. To keep it accurate, the relative initial movement is used to guide the search area when computing the affine model, and a soft affine model evaluation is introduced to make the optical flow result more robust. Extensive experiments on the challenging MPI-Sintel and KITTI-15 datasets show that our method is efficient and effective.

12:10-12:30, Paper WeAMOT2.4
BTF Compound Texture Model with Non-Parametric Control Field
Haindl, Michael | Inst. of Information Theory and Automation |
Havlicek, Vojtech | Inst. of Information Theory and Automation |
Keywords: Illumination and reflectance modeling, Image based modeling, Physics-based vision
Abstract: This paper introduces a novel multidimensional statistical model for realistic modeling, enlargement, editing, and compression of the state-of-the-art bidirectional texture function (BTF) textural representation. The presented multispectral compound Markov random field model (CMRF) efficiently fuses a non-parametric random field model with several parametric random field models. The primary purpose of our texture modeling approach is to reproduce, compress, and enlarge a given measured natural or artificial texture image so that, ideally, the natural and synthetic textures are visually indiscernible for any observation or illumination direction. The model can also easily be applied to BTF material texture editing. The CMRF model consists of several parametric sub-models, each having different characteristics, along with an underlying switching structure model which controls transitions between these sub-models. The proposed model uses the non-parametric random field to distribute local texture models, in the form of an analytically solvable wide-sense BTF Markov representation for single regions, among the fields of a mosaic approximated by the random field structure model. The non-parametric control field of BTF-CMRF is reiteratively generated to guarantee identical region-size histograms for all material sub-classes present in the target example texture. The local texture regions (not necessarily continuous) are represented by analytical BTF models based on the adaptive 3D causal auto-regressive (3DCAR) random field model, which can be analytically estimated as well as synthesized. The visual quality of the resulting complex synthetic textures generally surpasses the outputs of previously published simpler non-compound BTF-MRF models. The model reaches huge compression ratios incomparable with any standard image compression method.

WeAMOT3 Image Analysis and Segmentation (310, 3rd Floor)
Oral Session

11:10-11:30, Paper WeAMOT3.1
Convexity Invariance of Voxel Objects under Rigid Motions
Ngo, Phuc | LORIA - Lorraine Univ |
Passat, Nicolas | Univ. De Reims Champagne-Ardenne |
Kenmochi, Yukiko | Univ. Paris-Est |
Debled-Rennesson, Isabelle | LORIA - Nancy Univ |
Keywords: Image processing and analysis, Shape modeling and encoding, Vision for graphics
Abstract: Volume data can be represented by voxels. In many applications of computer graphics (e.g., animation, simulation) and image processing (e.g., shape registration), such voxel data require manipulation. Among the simplest manipulations, we are interested in rigid motions, namely motions that do not change the shape of voxel objects but do change their position and orientation. Such motions are well known as isometric transformations in continuous spaces. However, when they are applied to voxel data, some important geometric and topological properties are generally lost. In this article, we discuss this issue, and we provide a method for rigid motions of voxel objects that preserves the global convexity properties of objects, with digital topology guarantees. This method is based on the standard notion of H-convexity and a new notion of quasi-regularity.

11:30-11:50, Paper WeAMOT3.2
Multi-Scale Cross-Band Encoding of Sectored Local Binary Pattern for Robust Texture Classification
Song, Tiecheng | Chongqing Univ. of Posts and Telecommunications |
Luo, Lin | Chongqing Univ. of Posts and Telecommunications |
Xin, Liangliang | Chongqing Univ. of Posts and Telecommunications |
Gao, Chenqiang | School of Communication and Information Engineering, Chongqing Univ. of Posts and Telecommunications
Keywords: Texture analysis, Image classification, Classification
Abstract: The original Local Binary Pattern (LBP) has limited discriminative power and is sensitive to noise. In view of this, this paper proposes a novel image descriptor called Multi-Scale Cross-Band Encoding of Sectored Local Binary Pattern (MCE-SLBP) for robust texture classification. First, pyramid decomposition is used to obtain multi-scale low-frequency and high-frequency (difference) images. To encode more discriminative features, the high-frequency images are further decomposed into positive and negative high-frequency images via polarity splitting. Then, a robust Sectored Local Binary Pattern (SLBP) is proposed to compute texture feature codes on the decomposed images via cross-band joint coding. Finally, a multi-scale histogram representation is obtained by concatenating the histograms of texture codes computed at all decomposition levels. Experiments on three benchmark texture databases (i.e., Outex, Brodatz and CUReT) demonstrate that the proposed method achieves state-of-the-art classification accuracy both under noise-free conditions and in the presence of different levels of Gaussian noise.

11:50-12:10, Paper WeAMOT3.3
Locality Preserving Discriminative Complex-Valued Latent Variable Model
Chen, Sih-Huei | National Central Univ |
Lee, Yuan-Shan | National Central Univ |
Wang, Jia-Ching | National Central Univ |
Keywords: Emotion recognition, Image classification, Dimensionality reduction
Abstract: Techniques for analyzing complex-valued data are required in numerous fields, such as signal processing. This work develops a novel complex-valued latent variable model, named the locality-preserving discriminative complex-valued Gaussian process latent variable model (LPD-CGPLVM), for discovering a compressed complex-valued representation of data. The developed LPD-CGPLVM operates in the complex-valued domain. Additionally, we attempt to preserve both global and local data structures while promoting discrimination. A new objective function that imposes a locality-preserving and a discriminative term for complex-valued data is presented. Complex-valued gradient descent is then utilized to obtain a complex-valued representation of high-dimensional data and the hyperparameters of the LPD-CGPLVM. The proposed method was evaluated on two pattern recognition applications: face recognition with occlusion and music emotion recognition. The experimental results demonstrate the superior accuracy of the proposed method, especially in situations with only a small amount of training data.

12:10-12:30, Paper WeAMOT3.4
Segmentation Edit Distance
Pucher, Daniel | TU Wien |
Kropatsch, Walter | TU Vienna |
Keywords: Image processing and analysis, Segmentation, features and descriptors
Abstract: In this paper, we present a novel distance metric called Segmentation Edit Distance (SED) and its use as a segmentation evaluation metric. In segmentation evaluation, the difference or distance between a test segmentation and the associated ground truth segmentation is measured in order to compare different algorithms. Our proposed edit distance extends the idea of other edit distances, such as the string edit distance or the graph edit distance, to the domain of image segmentations. The distance is based on the cost of edit operations that are needed to transform one segmentation into another. Only one edit operation, the deletion of an error region, is considered. Unlike other edit distances, the costs assigned to this operation are based on properties of the error regions and the image processing method used to delete a region. As a segmentation evaluation metric, it combines the assessment of accuracy and efficiency into a single metric. Evaluations on synthetic and real-world data show promising results compared to other state-of-the-art segmentation evaluation metrics.

WeAMOT4 Medical Image Analysis (311A, 3rd Floor)
Oral Session

11:10-11:30, Paper WeAMOT4.1
An Automated Airway Segmentation Algorithm for CT Images Using Topological Leakage Detection and Volume Freezing
Nadeem, Syed Ahmed | Univ. of Iowa |
Hoffman, Eric A | Univ. of Iowa Carver Coll. of Medicine |
Saha, Punam Kumar | Univ. of Iowa |
Keywords: Medical image and signal analysis, Segmentation, features and descriptors
Abstract: Numerous multi-center studies related to chronic obstructive pulmonary disease use computed tomography (CT) based characterization of the lung parenchyma and bronchial tree to understand the disease's status and progression. To our knowledge, there are no fully automated methods for airway tree segmentation that do not require post-segmentation manual review and intervention. In this paper, we present a novel CT-based airway tree segmentation algorithm using topological leakage detection and volume freezing. The method is fully automated, requiring no manual inputs or post-segmentation editing. It uses intensity-based connectivity and novel approaches of leakage detection and volume freezing to iteratively grow an airway tree starting from an initial seed inside the trachea. It begins with a conservative threshold and then iteratively shifts towards more generous threshold values. The method was applied to chest CT scans of ten non-smoking healthy subjects at total lung capacity, and the results were highly promising, with no visual segmentation leakages.

11:30-11:50, Paper WeAMOT4.2
A Method for PET-CT Lung Cancer Segmentation Based on Improved Random Walk
Liu, Zhe | School of Computer Science and Communication Engineering, Jiangsu Univ
Song, Yuqing | School of Computer Science and Communication Engineering, Jiangsu Univ
Maere, Charlie | Jiangsu Univ |
Liu, Qingfeng | School of Computer Science and Communication Engineering, Jiangsu Univ
Zhu, Yan | Affiliated Hospital of Jiangsu Univ |
Lu, Hu | Fudan Univ |
Yuan, Deqi | Zhenjiang First People's Hospital Branch |
Keywords: Medical image and signal analysis, Image processing and analysis, Computer-aided detection and diagnosis
Abstract: Segmentation methods that work on a single imaging modality usually suffer from the low spatial resolution of positron emission tomography (PET) or the low contrast of computed tomography (CT) when the tumor region is inhomogeneous or not obvious. To address this problem, we develop a segmentation method that combines the complementary strengths of PET and CT. Firstly, initial contours are obtained by pre-segmentation of the PET images using region growing and mathematical morphology. The initial contours can be used to automatically obtain the seed points required for random walk on the PET and CT images; at the same time, they can also be used as a constraint in the random walk on the CT images to overcome the shortcoming that tumor areas are not obvious if the CT images have not been enhanced. Because CT provides essential details on anatomic structures, these structures can be used to improve the weights of the random walk on the PET images. Finally, the similarity matrices obtained by random walk on the PET and CT images are weighted to obtain identical results on both modalities. Our method achieves an average DSC of 0.8456±0.0703 on 14 patients with lung cancer, and performs much better when the tumors are inhomogeneous in the PET images and not obvious in the CT images.

11:50-12:10, Paper WeAMOT4.3
Medical Knowledge Constrained Semantic Breast Ultrasound Image Segmentation
Huang, Kuan | Utah State Univ |
Cheng, Heng-Da | Utah State Univ |
Zhang, Yingtao | Harbin Inst. of Tech |
Zhang, Boyu | Utah State Univ |
Xing, Ping | The First Affiliated Hospital of Harbin Medical Univ |
Ning, Chunping | The Affiliated Hospital of Qingdao Univ |
Keywords: Medical image and signal analysis, Segmentation, features and descriptors, Computer-aided detection and diagnosis
Abstract: Computer-aided diagnosis (CAD) can help doctors diagnose breast cancer. Breast ultrasound (BUS) imaging is harmless, effective, portable, and the most popular modality for breast cancer detection/diagnosis. Many researchers work on improving the performance of CAD systems. However, there are two main shortcomings: (1) most of the existing methods are based on the prerequisite that there is one and only one tumor in the image; and (2) the results depend on the datasets, i.e., an algorithm may obtain different performance on different datasets, implying that the performance of traditional methods is dataset-dependent. In this paper, we propose an effective approach: (1) using information-extended images to train a fully convolutional network (FCN) to semantically segment a BUS image into 3 categories: mammary layer, tumor, and background; and (2) applying layer structure information - breast cancers are located inside the mammary layer - to the conditional random field (CRF) for conducting breast cancer segmentation and making the segmentation result more accurate. The proposed method is evaluated on BUS images of 325 cases, and the results are the best compared with those of existing methods, achieving a true positive rate of 92.80%, a false positive rate of 9%, and an Intersection over Union of 82.11%. The proposed approach solves the two above-mentioned shortcomings of existing methods.

12:10-12:30, Paper WeAMOT4.4
Interactive Segmentation of Glioblastoma for Post-Surgical Treatment Follow-Up
Dhara, Ashis Kumar | Centre for Image Analysis, Department of Information Tech., Uppsala Univ
Arvids, Erik | Uppsala Univ |
Fahlström, Markus | Department of Surgical Sciences, Radiology, Uppsala Univ |
Wikström, Johan | Department of Surgical Sciences, Radiology, Uppsala Univ |
Larsson, Elna-Marie | Department of Surgical Sciences, Radiology, Uppsala Univ |
Strand, Robin | Uppsala Univ |
Keywords: Medical image and signal analysis, Computer-aided detection and diagnosis, Deep learning
Abstract: In this paper, we present a novel framework for interactive segmentation of glioblastoma in contrast-enhanced T1-weighted magnetic resonance images. A U-net-based fully convolutional network is combined with an interactive refinement technique. Initial segmentation of the brain tumor is performed using U-net, and the result is further improved by including complex foreground regions or removing background regions in an iterative manner. The method is evaluated on a research database containing post-operative glioblastoma of 15 patients. Radiologists can refine initial segmentation results in about 90 seconds, which is well below the time of interactive segmentation from scratch using state-of-the-art interactive segmentation tools. The experiments revealed that the segmentation results (Dice score) before and after the interaction step (performed by expert users) are similar. This is most likely due to the limited information in the contrast-enhanced T1-weighted magnetic resonance images used for evaluation. The proposed method is computationally fast and efficient, and could be useful for post-surgical treatment follow-up.

WePMP Poster Session, Coffee Break (North Foyer & Park View Foyer, 3rd Floor)
Poster Session

15:00-17:00, Paper WePMP.1
Common Random Subgraph Modeling Using Multiple Instance Learning
Xu, Tao | Univ. of Guelph |
Chiu, David K.Y. | Univ. of Guelph |
Gondra, Iker | St. Francis Xavier Univ |
Keywords: Probabilistic graphical model, Data mining, Graph matching
Abstract: In balancing information that is typical of a class and discriminatory between classes, we aim at synthesizing a common random subgraph (CRSG) model from an ensemble of attributed graph data. The common random subgraph model incorporates both structural and probabilistic information of the data that is common to a class in the ensemble, while multiple instance learning provides an effective process for handling a large number of samples and is tolerant of substantial irrelevant graph elements. The proposed two-level multiple instance learning compares the data between graphs at one level (as bags of instances) and takes into account structural relationships between graph elements (as instances) at the other level. The method is evaluated using benchmark structural datasets taken from the IAM graph repository. The experimental results show that the method can generate a meaningful and informative common random subgraph model of a class and is also effective when applied to classification tasks discriminating between classes.

15:00-17:00, Paper WePMP.2
Fast Descriptor Extraction for Contextless 3D Registration Using a Fully Convolutional Network
Garrett, Timothy | Iowa State Univ |
Radkowski, Rafael | Iowa State Univ |
Keywords: Deep learning, Segmentation, features and descriptors, 3D vision
Abstract: In recent years, numerous consumer devices have emerged that are capable of capturing 3D point data originating from depth images. Many computer vision tasks, such as object recognition, environment mapping, augmented reality, and more, rely on accurately registering 3D point sets. One method to compute this registration is to use 3D local feature descriptors for a coarse alignment, and to further refine the alignment with a variant of the Iterative Closest Point algorithm. While robust feature descriptors work well for this approach, online computation for all points in a single depth image is typically intractable. In this work, a method to facilitate real-time 3D registration by performing descriptor extraction on depth images using a Fully Convolutional Network (FCN) is presented. The method takes a raw depth image as input and produces a 33-bin descriptor for each pixel, enabling a general-purpose 3D registration process that does not require future network retraining and refinement. Experimental results on consumer hardware demonstrate that the proposed method significantly outperforms the state-of-the-art in terms of computation time and approaches depth sensor frame capture times, with only a slight reduction in descriptiveness.

15:00-17:00, Paper WePMP.3
Rotational Invariant Discriminant Subspace Learning for Image Classification
Ye, Qiaolin | Nanjing Univ. of Science and Tech |
Zhang, Zhao | City Univ. of Hong Kong |
Keywords: Dimensionality reduction
Abstract: A novel discriminant analysis technique for feature extraction, referred to as Robust Discriminant Subspace (RDS) learning with L2,p-/L2,s-norm distance maximization-minimization (max-min), is proposed. In its objective, the within-class and between-class distances are measured by the L2,p-norm and L2,s-norm, respectively, such that it is robust and rotationally invariant. An efficient, non-greedy iterative algorithm is designed to solve the resulting objective. These characteristics make RDS more intuitive and powerful than previous efforts. We also conduct insightful analysis on the convergence of the proposed algorithm. Theoretical insights and the effectiveness of RDS are further supported by promising experimental results on several image databases.

15:00-17:00, Paper WePMP.4
FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
Qiu, Suo | South China Univ. of Tech |
Xu, Xiangmin | South China Univ. of Tech |
Cai, Bolun | South China Univ. of Tech |
Keywords: Deep learning, Image classification
Abstract: The rectified linear unit (ReLU) is a widely used activation function for deep convolutional neural networks. However, because of its zero-hard rectification, ReLU networks lose the benefits of negative values. In this paper, we propose a novel activation function called the flexible rectified linear unit (FReLU) to further explore the effects of negative values. By redesigning the rectified point of ReLU as a learnable parameter, FReLU expands the states of the activation output. When a network is successfully trained, FReLU tends to converge to a negative value, which improves expressiveness and thus performance. Furthermore, FReLU is designed to be simple and effective, without exponential functions, to maintain low-cost computation. FReLU is self-adaptive and does not rely on strict assumptions, so it can easily be used in various network architectures. We evaluate FReLU on three standard image classification datasets: CIFAR-10, CIFAR-100, and ImageNet. Experimental results show that FReLU achieves fast convergence and competitive performance on both plain and residual networks.
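
The activation itself is simple to state: FReLU shifts the rectified output by a learnable offset, so the unit can produce negative values once the offset converges below zero. A minimal PyTorch sketch follows; the per-channel parameterization and zero initialization are assumptions, as the abstract does not fix them.

```python
import torch
import torch.nn as nn

class FReLU(nn.Module):
    """Flexible ReLU: relu(x) + b, with a learnable offset b.

    Sketch of the idea in the abstract, assuming one offset per channel
    of a 4-D (N, C, H, W) input, initialized at zero.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shifting the rectified output lets activations go negative
        # whenever the learned offset converges below zero.
        return torch.relu(x) + self.bias
```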

15:00-17:00, Paper WePMP.5
Adaptive Latent Representation for Multi-View Subspace Learning
Zhang, Yuemei | Xidian Univ |
Wang, Xiumei | Xidian Univ |
Gao, Xinbo | Xidian Univ |
Keywords: Multiview learning, Clustering, Dimensionality reduction
Abstract: Multi-view datasets contain rich information because different views describe different aspects of the data. In fact, the structure of such a dataset is embedded in a union of certain low-dimensional subspaces, so multi-view subspace learning is a powerful technique for finding the underlying structure and clustering data points correctly. In practice, the information contained in different views differs, and the original data may be noisy. However, most existing multi-view subspace clustering methods treat each view equally and learn the self-expressiveness coefficient matrix of each view on the original data, which can decrease clustering performance. To solve these problems, we propose a new method that learns a latent representation in an adaptive way. Meanwhile, the local geometrical structure is maintained on the latent representation by graph regularization. At the same time, a basic subspace clustering method is performed on the latent representation to obtain the self-expressiveness coefficient matrix. We formulate the above problems in a unified optimization framework. Experimental results on several real-world datasets show the effectiveness of the proposed method.

15:00-17:00, Paper WePMP.6
Appearance-Based Data Augmentation for Image Datasets Using Contrast Preserving Sampling
Merchant, Alishan | NUCES |
Syed, Tahir | NUCES |
Khan, Behraj | NUCES |
Keywords: Deep learning, Learning-based vision, Object detection
Abstract: Data augmentation techniques have been employed to overcome the problem of model over-fitting in deep convolutional neural networks and have consistently shown improvements in classification. Most data augmentation techniques perform affine transformations on the image domain. However, these techniques cannot be used when object position is significant in the image. In this work we propose a data augmentation technique based on sampling a representation built from inequality constraints propagated from local binary patterns. We sample nine distinct variations of an image in a manner meant to preserve local structure, differing only in the amount of contrast between pixels. These contrast invariants are then used to augment the original images. We present evaluations on CIFAR-10 and validate our gains in performance across four criteria (accuracy, precision, recall and F1-score) using a 2-layer convolutional neural network with different filter configurations, and report improvements of about 13%, 9%, 12%, and 22%, respectively, over the baseline.

15:00-17:00, Paper WePMP.7
Local Binary Patterns for Graph Characterization
Jawad, Muhammad | Inst. of Management Sciences |
Aziz, Furqan | Inst. of Management Sciences |
Hancock, Edwin | Univ. of York |
Keywords: Graph matching, Clustering, Object recognition
Abstract: In this paper we propose a novel approach for defining Local Binary Patterns (LBP) that directly encode graph structure. LBP is a simple and widely used technique for texture analysis in static 2D images, and there is no work in the literature describing its generalisation to graphs. The proposed method (GraphLBP) is efficient and yet effective as a noise-tolerant graph-based representation. We compute the new feature representation for graphs by combining LBP with Galois fields, using irreducible polynomials. The proposed method is scalable as it preserves the local and global properties of the graph. Experimental results show that GraphLBP increases recognition accuracy while being both simpler and more computationally efficient than state-of-the-art techniques.

15:00-17:00, Paper WePMP.8
Nonnegative and Adaptive Multi-View Clustering
Zou, Peng | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Zhang, Li | Soochow Univ |
Keywords: Multiview learning, Clustering, Data mining
Abstract: This paper proposes a novel Nonnegative and Adaptive Multi-view Clustering (NAMC) method. NAMC integrates nonnegative matrix factorization (NMF), adaptive neighborhood learning, and consensus adaptive similarity matrix fusion. More specifically, NAMC performs nonnegative weight learning over the original data and the parts-based representations of NMF for more accurate measurement and representation. For nonnegative adaptive feature extraction, our model first utilizes NMF to obtain a local parts-based representation of the original high-dimensional data. To keep the local structure of the parts-based representations, we minimize the adaptive neighborhood reconstruction error. The optimal consensus similarity matrix is then obtained iteratively from the nonnegative adaptive similarity matrix of each view. Moreover, the proposed NAMC is entirely self-weighted. Once the target graph is obtained, it can be partitioned into specific clusters directly. Extensive simulations show that NAMC achieves good performance on several public databases for multi-view clustering compared with other related methods.

15:00-17:00, Paper WePMP.9
Stabilizing Actor Policies by Approximating Advantage Distributions from K Critics
Labao, Alfonso | Univ. of the Philippines Computer Science Department |
Naval, Prospero | Univ. of the Philippines |
Keywords: Reinforcement learning, Deep learning
Abstract: Reinforcement learning algorithms that use policy gradient methods approach an optimal policy faster than Q-learning, but at the cost of incurring high variance in gradients. Among variance reduction techniques are actor-critic methods that use value and advantage functions to train a policy actor. We propose an algorithm in the actor-critic family that further reduces gradient variance through estimation of advantage distributions from K deep network critics. We combine the outputs of the K critics into an advantage distribution using a histogram approach followed by kernel convolution. We show in our analysis that using the K-critic advantage distribution provides variance reduction properties that result in more stable performance, even over long training runs. We test our algorithm on a set of high-dimensional VizDoom experiments. Our experimental results show that our proposed algorithm attains the highest average rewards compared to other methods, with less noise than the 1-critic method.
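
A minimal NumPy sketch of the histogram-plus-kernel-convolution step described above; the bin count, value range, and smoothing kernel here are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def advantage_distribution(critic_values, returns, bins=51, vmin=-10.0, vmax=10.0):
    """Combine advantage estimates from K critics into one distribution.

    critic_values: per-critic value estimates V_k(s) for one state, shape (K,).
    returns: the observed return R for that state (scalar).
    """
    advantages = returns - np.asarray(critic_values)        # A_k = R - V_k(s)
    hist, edges = np.histogram(advantages, bins=bins, range=(vmin, vmax))
    # Smooth the empirical histogram with a small kernel convolution.
    kernel = np.array([0.25, 0.5, 0.25])
    smoothed = np.convolve(hist.astype(float), kernel, mode="same")
    probs = smoothed / (smoothed.sum() + 1e-12)
    # The distribution's mean serves as a variance-reduced advantage signal.
    centers = (edges[:-1] + edges[1:]) / 2.0
    return probs, float(np.sum(probs * centers))
```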

15:00-17:00, Paper WePMP.10
Single-Image Dehazing Algorithm Based on Convolutional Neural Networks
Xiao, Jinsheng | Wuhan Univ |
Luo, Li | Wuhan Univ |
Liu, Enyu | Wuhan Univ |
Lei, Junfeng | School of Electronic Information, Wuhan Univ |
Klette, Reinhard | Auckland Univ. of Tech |
Keywords: Deep learning, Low-level vision
Abstract: This paper proposes a new single-image dehazing method based on a convolutional neural network. Our method directly learns an end-to-end mapping between haze images and their corresponding haze layers (i.e., residual images between haze images and haze-free images). A convolutional neural network takes the haze image as input and the residual image as output; a recovered dehazed image is then obtained by removing the residual from the haze image. Residual learning allows the network to directly estimate the initial haze layer with relatively high learning rates, which reduces computational complexity and speeds up convergence. Since the initial haze layer is only approximate, we use a guided filter to refine this layer to avoid halos and block artefacts, which makes the recovered image more similar to a real scene. The algorithm is tested on fog images with different fog densities and compared with other dehazing algorithms. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods on both synthetic and real-world images, qualitatively and quantitatively.
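
The pipeline reduces to three steps: predict the haze layer, refine it with a guided filter, and subtract. The sketch below is written under assumptions: residual_net is any CNN that predicts the residual from a hazy image, and guided_filter is an off-the-shelf edge-preserving filter with the keyword signature shown (both hypothetical stand-ins, not the paper's exact components).

```python
import torch

def dehaze(hazy: torch.Tensor, residual_net: torch.nn.Module, guided_filter) -> torch.Tensor:
    """Residual-learning dehazing sketch. hazy: (N, 3, H, W) in [0, 1]."""
    # The network predicts the haze layer (residual), not the clean image.
    raw_residual = residual_net(hazy)
    # Refine the approximate haze layer to avoid halos and block artefacts,
    # using the hazy input itself as the guidance image.
    refined_residual = guided_filter(guide=hazy, src=raw_residual)
    # Subtracting the refined residual recovers the dehazed image.
    return (hazy - refined_residual).clamp(0.0, 1.0)
```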

15:00-17:00, Paper WePMP.11
Driving Maneuver Detection Via Sequence Learning from Vehicle Signals and Video Images
Peng, Xishuai | Univ. of Michigan-Dearborn |
Liu, Ruirui | Univ. of Michgan-Dearborn |
Murphey, Yi | Univ. of Michigan-Dearborn |
Stent, Simon | Univ. of Cambridge |
Li, Yuanxiang | Shanghai Jiao Tong Univ |
Keywords: Applications of pattern recognition and machine learning, Video processing and analysis, Sensor array & multichannel signal processing
Abstract: Driving maneuver detection is one of the most challenging tasks in Advanced Driver Assistance Systems (ADAS). Research has shown that early notification of improper driving maneuvers helps avoid fatalities and serious accidents. In this paper, we introduce a driving maneuver detection (DMD) system. The DMD system contains three major computational components: a distance-based representation of the driving context; combined features of the vehicle trajectory and VGG-19 network features extracted from video images of the vehicle front view; and a Long Short-Term Memory (LSTM) based neural network model that learns sequence knowledge of driving maneuvering events. We show through experiments that the DMD system is capable of learning the latent features of five different classes of driving maneuvers and achieves significantly better performance than traditional classification methods on real-world driving trips.

15:00-17:00, Paper WePMP.12
Background Subtraction Via 3D Convolutional Neural Networks
Gao, Yongqiang | National Univ. of Defense Tech
Cai, Huayue | National Univ. of Defense Tech
Zhang, Xiang | National Univ. of Defense Tech
Lan, Long | National Univ. of Defense Tech
Luo, Zhigang | National Univ. of Defense Tech
Keywords: Applications of pattern recognition and machine learning, Deep learning, Neural networks
Abstract: Background subtraction can be treated as the binary classification problem of highlighting the foreground region in a video whilst masking the background region, and has been broadly applied in various vision tasks such as video surveillance and traffic monitoring. However, it remains a challenging task due to complex scenes and a lack of prior knowledge about temporal information. In this paper, we propose a novel background subtraction model based on 3D convolutional neural networks (3D CNNs) which combines temporal and spatial information to effectively separate the foreground from all the sequences in an end-to-end manner. Different from conventional models, we view background subtraction as a three-class classification problem, i.e., the foreground, the background, and the boundary. This design obtains more reasonable results than existing baseline models. Experiments on the Change Detection 2012 dataset verify the potential of our model both quantitatively and qualitatively.

15:00-17:00, Paper WePMP.13
Universal Perturbation Generation for Black-Box Attack Using Evolutionary Algorithms
Wang, Siyu | Tianjin Univ. School of Computer Science and Tech |
Shi, Yucheng | Tianjin Univ |
Han, Yahong | Tianjin Univ |
Keywords: Neural networks, Image classification, Information forensics and security
Abstract: Image classifiers based on deep neural networks (DNNs) are vulnerable to tiny, imperceptible perturbations. Maliciously generated adversarial examples can exploit the instability of DNNs and mislead them into outputting wrong classification results. Prior work showed the transferability of adversarial perturbations between models and between images. In this work, we shed light on the combination of source/target misclassification, black-box attack, and universal perturbation by employing improved evolutionary algorithms. We additionally find that the use of 'adversarial initialization' enhances the efficiency with which evolutionary algorithms find universal perturbations. Experiments demonstrate impressive misclassification rates and surprising transferability for the proposed attack method using different models trained on the CIFAR-10 and CIFAR-100 datasets. Our attack method also shows robustness against defensive measures such as adversarial training.
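
As a simplified illustration of black-box universal-perturbation search with an evolutionary algorithm (the paper's operators, fitness function, and 'adversarial initialization' are not reproduced here), consider a hill-climbing evolutionary loop where fitness is the fraction of images a single shared perturbation misclassifies:

```python
import numpy as np

def evolve_universal_perturbation(images, predict, true_labels, eps=8 / 255,
                                  pop=20, iters=200, sigma=0.02,
                                  rng=np.random.default_rng(0)):
    """Evolutionary search for one perturbation applied to all images.

    `predict` is any black-box function returning predicted labels for a
    batch of images in [0, 1]; no gradients are used.
    """
    best = rng.uniform(-eps, eps, size=images.shape[1:])
    best_fit = np.mean(predict(np.clip(images + best, 0, 1)) != true_labels)
    for _ in range(iters):
        for _ in range(pop):
            # Mutate the current best, keeping it inside the L-inf ball.
            cand = np.clip(best + sigma * rng.standard_normal(best.shape), -eps, eps)
            fit = np.mean(predict(np.clip(images + cand, 0, 1)) != true_labels)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```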

15:00-17:00, Paper WePMP.14
Probabilistic Graph Embedding for Unsupervised Domain Adaptation
Xiao, Pan | Wuhan Univ. Wuhan 430072 |
Du, Bo | School of Computer, Wuhan Univ. Wuhan 430079, China |
Yun, Shuang | Wuhan Univ |
Li, Xue | Wuhan Univ |
Zhang, YiPeng | Department of Electrical Engineering & Computer Science, Syracuse Univ
Wu, Jia | Department of Computing, Macquarie Univ. Sydney |
Keywords: Domain adaptation, Classification, Probabilistic graphical model
Abstract: Unsupervised domain adaptation aims to predict unlabeled target domain data by taking advantage of labeled source domain data. This problem is hard to solve mainly because of the lack of target domain labels and the difference between the two domains. To overcome these two difficulties, many researchers have proposed to assign pseudo labels to the unlabeled target domain data and then project the data of both domains into a common subspace. The use of pseudo labels, however, lacks a theoretical basis and thus may not yield ideal classification results. Therefore, in this work we propose a novel method called Probabilistic Graph Embedding (PGE). PGE first derives the probabilities that the target domain instances belong to each category, which is believed to explore the target domain better. We then obtain a projection matrix by constructing a within-class probabilistic graph. This projection matrix can embed both domains into a shared subspace where domain shift is largely diminished. Experiments on cross-domain object recognition datasets show that PGE is more effective and robust than state-of-the-art unsupervised domain adaptation methods.

15:00-17:00, Paper WePMP.15
Reservoir Computing with Untrained Convolutional Neural Networks for Image Recognition
Tong, Zhiqiang | The Univ. of Tokyo
Tanaka, Gouhei | The Univ. of Tokyo |
Keywords: Neural networks, Classification, Image processing and analysis
Abstract: Reservoir computing has attracted much attention for its easy training process as well as its ability to deal with temporal data. A reservoir computing system consists of a reservoir part, represented as a sparsely connected recurrent neural network, and a readout part, represented as a simple regression model. In machine learning tasks, the reservoir part is fixed and only the readout part is trained. Although reservoir computing has been mainly applied to time series prediction and recognition, it can be applied to image recognition as well by considering an image as a sequence of pixel values. However, to achieve high performance in image recognition with raw image data, a large-scale reservoir containing a large number of neurons is required. This is a bottleneck in terms of computer memory and computational cost. To overcome this bottleneck, we propose a new method which combines reservoir computing with untrained convolutional neural networks. We use an untrained convolutional neural network to transform raw image data into a set of smaller feature maps in a preprocessing step of the reservoir computing. We demonstrate that our method achieves high classification accuracy in an image recognition task with a much smaller number of trainable parameters compared with a previous study.
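
The reservoir part is a fixed random recurrent network; only a linear readout on its states is trained. The sketch below is a generic echo-state-network rendering of that idea (the untrained CNN preprocessing is omitted); sizes and scalings are illustrative assumptions.

```python
import numpy as np

def reservoir_states(pixel_seq, n_reservoir=200, spectral_radius=0.9,
                     rng=np.random.default_rng(0)):
    """Drive a fixed random reservoir with a scalar input sequence.

    pixel_seq: iterable of scalar pixel values (one image flattened to a
    sequence). Returns the final reservoir state; in reservoir computing
    only a simple regression readout on such states is trained.
    """
    W_in = rng.uniform(-0.5, 0.5, (n_reservoir, 1))     # fixed input weights
    W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
    # Rescale recurrent weights for the echo-state property.
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    x = np.zeros(n_reservoir)
    for u in pixel_seq:                                 # feed pixels one by one
        x = np.tanh(W_in @ np.array([u]) + W @ x)
    return x
```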

15:00-17:00, Paper WePMP.16
EMD-Based Entropy Features for Micro-Doppler Mini-UAV Classification
Ma, Xinyue | Nanyang Tech. Univ |
Oh, Beomseok | Nanyang Tech. Univ |
Sun, Lei | Beijing Inst. of Tech |
Toh, Kar-Ann | Yonsei Univ |
Lin, Zhiping | Nanyang Tech. Univ |
Keywords: Classification, Signal analysis
Abstract: In this paper, we first investigate six popular entropies extracted from a set of intrinsic mode functions (IMFs) as feature patterns for radar-based mini-size unmanned aerial vehicle (mini-UAV) classification. The six entropies are Shannon entropy, spectral entropy, log energy entropy, approximate entropy, fuzzy entropy and permutation entropy. Via an empirical comparison of the six entropies on real measured radar data, the first three are selected as representatives due to their high efficiency and accuracy. To enhance classification accuracy, the three selected entropies are then extracted from eight different sets of IMFs obtained by signal downsampling, and fused at the feature level. A nonlinear support vector machine classifier is adopted to predict the class labels of unseen test radar signals. Our empirical results on a set of real-world continuous-wave radar data show that the proposed method outperforms the state-of-the-art method in terms of mini-UAV classification accuracy.
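
The three selected entropies follow common definitions. In the paper they are computed per IMF after empirical mode decomposition; the sketch below applies them directly to a 1-D signal, with an illustrative histogram binning for the Shannon entropy.

```python
import numpy as np

def shannon_entropy(x, bins=64):
    """Shannon entropy of the signal's amplitude histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def log_energy_entropy(x, eps=1e-12):
    """Sum of the log energies of the samples."""
    return np.sum(np.log(x ** 2 + eps))
```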

15:00-17:00, Paper WePMP.17
Context-Aware Attention LSTM Network for Flood Prediction
Wu, Yirui | Hohai Univ |
Liu, Zhaoyang | Nanjing Univ |
Xu, Weigang | Hohai Univ |
Feng, Jun | Hohai Univ |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Keywords: Applications of pattern recognition and machine learning, Neural networks, Deep learning
Abstract: To minimize the negative impacts brought by floods, researchers from the pattern recognition community utilize artificial intelligence based methods to solve the problem of flood prediction. Inspired by the significant power of Long Short-Term Memory (LSTM) networks in modeling the dynamics and dependencies of sequential data, we utilize LSTM networks to predict sequential flow rate values based on a set of collected flood factors. Since not all factors are informative for flood prediction, and irrelevant factors often bring a lot of noise, we need to pay more attention to the informative ones. However, the original LSTM does not have a strong attention capability. Hence we propose a context-aware attention LSTM (CA-LSTM) network for flood prediction, which is capable of selectively focusing on informative factors. During training, the local context-aware attention model is constructed by learning probability distributions between the flow rate and the hidden output of each LSTM cell. During testing, the learned local attention model assigns weights to adjust the relations between input factors and predictions at all steps of the LSTM network. We conduct experiments on a flood dataset with several comparative methods, demonstrating the high accuracy of the proposed method and the effectiveness of the proposed context-aware attention model.

15:00-17:00, Paper WePMP.18
An Online Kernel Selection Wrapper Via Multi-Armed Bandit Model
Li, Junfan | Tianjin Univ |
Liao, Shizhong | Tianjin Univ |
Keywords: Model selection, Online learning, Classification
Abstract: Online kernel selection is critical to online kernel learning, but most existing online kernel learning methods ignore the online kernel selection process; instead, they empirically preset and fix a kernel, or adjust kernel parameters by gradient descent, which is sensitive to the initial setting and has no theoretical guarantee. In this work, we propose an online kernel selection wrapper based on the multi-armed bandit model, which can select a kernel at each round from a set of candidate kernels with a theoretical guarantee and can be applied to any online kernel learning model. Specifically, the wrapper consists of two layers. In the outer layer, the wrapper associates each candidate kernel with an arm of the multi-armed bandit model and chooses an arm at each round according to the probability distribution maintained by the model. In the inner layer, the wrapper updates the probability distribution according to the loss of the selected arm, which is incurred by the prediction of the online kernel learning algorithm. We propose a new online kernel selection regret to measure the performance of the proposed wrapper, and prove that the wrapper enjoys a sub-linear expected online kernel selection regret with respect to the cumulative loss of the optimal kernel among the candidate kernels. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed wrapper.
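
The two-layer loop maps naturally onto an EXP3-style adversarial bandit. The sketch below is a generic rendering, not the paper's exact update or regret-optimal parameters; losses_fn(k, t) is an assumed hook that runs one step of the online kernel learner with candidate kernel k at round t and returns a loss in [0, 1].

```python
import numpy as np

def exp3_kernel_selection(losses_fn, n_kernels, rounds, eta=0.1,
                          rng=np.random.default_rng(0)):
    """EXP3-style wrapper that picks one candidate kernel per round."""
    weights = np.ones(n_kernels)
    for t in range(rounds):
        # Outer layer: sample an arm (kernel) from the maintained distribution,
        # mixed with uniform exploration.
        probs = (1 - eta) * weights / weights.sum() + eta / n_kernels
        k = rng.choice(n_kernels, p=probs)
        # Inner layer: the online kernel learner predicts and incurs a loss.
        loss = losses_fn(k, t)
        # Importance-weighted exponential update of the chosen arm only.
        weights[k] *= np.exp(-eta * loss / probs[k])
    return weights / weights.sum()
```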

15:00-17:00, Paper WePMP.19
Variational Bayes Block Sparse Modeling with Correlated Entries
Sharma, Shruti | Indian Inst. of Tech. Delhi |
Chaudhury, Santanu | Indian Inst. of Tech. Delhi |
Jayadeva, Dr. | IIT Delhi |
Keywords: Sparse learning, Signal analysis
Abstract: This paper addresses the problem of Bayesian block sparse modeling when coefficients within the blocks are correlated. In contrast to current hierarchical methods, which do not exploit the correlation structure within blocks, we propose a three-level hierarchical estimation framework. It employs heavy-tailed priors for block sparse modeling and variational inference for Bayesian estimation. This paper also describes the relationship between the proposed framework and some existing Block Sparse Bayesian Learning (SBL) methods, and shows that these SBL methods can be viewed as its special cases. Extensive experimental results for synthetic signals are provided, demonstrating the superior performance of the proposed framework in terms of failure rate and relative reconstruction error, among other measures. We also demonstrate the applicability of this framework to telemonitoring of fetal electrocardiograms.

15:00-17:00, Paper WePMP.20
Semi-Supervised Convolutional Neural Networks with Label Propagation for Image Classification
Chen, Lin | ShenZhen Univ |
Yu, Shiqi | Shenzhen Univ |
Yang, Meng | Sun Yat-Sen Univ |
Keywords: Semi-supervised learning, Image classification, Neural networks
Abstract: Over the past several years, deep learning has achieved promising performance in many visual tasks, e.g., face verification and object classification. However, the limited number of labeled training samples available in practical applications is still a huge bottleneck to achieving satisfactory performance. In this paper, we integrate class estimation of unlabeled training data with a deep learning model, which yields a novel semi-supervised convolutional neural network (SSCNN) trained on both labeled training data and unlabeled data. In the SSCNN framework, deep convolutional feature extraction and class estimation of the unlabeled data are jointly learned. Specifically, deep convolutional features are learned from the labeled training data and the unlabeled data with confident class estimates. After the deep features are obtained, the label propagation algorithm is utilized to estimate the identities of unlabeled training samples. The alternating optimization of SSCNN makes the class estimation of unlabeled data more and more accurate as the learned CNN features become more and more discriminative. We compared the proposed SSCNN with some representative semi-supervised learning approaches on the MNIST and CIFAR-10 databases. Extensive experiments on benchmark databases show the effectiveness of our semi-supervised deep learning framework.
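
For the label-estimation step, a classic graph-based label propagation (in the style of Zhou et al.) can serve as a sketch; the paper's exact variant may differ. Here labels holds class ids for labeled samples and -1 for unlabeled ones.

```python
import numpy as np

def propagate_labels(features, labels, n_iter=20, sigma=1.0, alpha=0.99):
    """Propagate labels over a Gaussian-affinity graph built on deep features.

    features: (n, d) feature matrix; labels: (n,) int array, -1 = unlabeled.
    Returns an estimated class id for every sample.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # normalized affinity
    n_classes = labels.max() + 1
    Y = np.zeros((len(labels), n_classes))
    Y[labels >= 0, labels[labels >= 0]] = 1.0           # one-hot seed labels
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y             # diffuse labels over the graph
    return F.argmax(1)
```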

15:00-17:00, Paper WePMP.21
Occluded Joints Recovery in 3D Human Pose Estimation Based on Distance Matrix
Guo, Xiang | Australian National Univ |
Dai, Yuchao | Northwestern Pol. Univ |
Keywords: Applications of pattern recognition and machine learning, 3D reconstruction, Sparse learning
Abstract: Despite the recent progress in single-image 3D human pose estimation driven by convolutional neural networks, it is still challenging to handle real scenarios such as highly occluded scenes. In this paper, we propose to address the problem of single-image 3D human pose estimation with occluded measurements by exploiting the Euclidean distance matrix (EDM). Specifically, we present two approaches based on EDM which can effectively handle occluded joints in 2D images. The first approach is based on 2D-to-2D distance matrix regression achieved by a simple CNN architecture. The second approach is based on sparse coding along with a learned over-complete dictionary. Experiments on the Human3.6M dataset show the excellent performance of these two approaches in recovering occluded observations and demonstrate improvements in accuracy for 3D human pose estimation with occluded joints.

15:00-17:00, Paper WePMP.22
Scalable Semi-Supervised Learning by Graph Construction with Approximate Anchors Embedding
Zhu, Hao | JD Finance |
Xia, Minxue | JD Finance |
Keywords: Semi-supervised learning, Image classification
Abstract: Semi-supervised learning (SSL) generalizes and improves supervised learning using labeled and unlabeled data. With the rapid development of the Internet and the increasing availability of data on the open web, collecting tremendous amounts of unlabeled data has become more feasible. As the central notion in SSL, smoothness is often defined on a graph representation of the data. However, only a few studies to date adapt graph-based SSL approaches to the large-scale setting, and even though approximation approaches are commonly used in large-scale SSL methods, it is difficult for them to outperform the original methods. In this paper, we propose an efficient method to construct a graph based on anchors. There are two major concerns with the anchor graph: how to learn better anchors, and how to represent the input both more efficiently and more effectively. Intuitively, compared with sparse representation (e.g., local anchor embedding), a more straightforward approach is marginal regression with a non-negativity constraint. Rather than using clustering algorithms, the anchors are trained using dictionary learning with sparsity constraints; thus, our approach takes into consideration not only the relation between anchors and data but also the relation between anchors. In the evaluation section, we demonstrate that our method outperforms other large-scale SSL methods as well as traditional ones in classification performance on several classical datasets. Furthermore, the proposed method solves the large-scale SSL problem more efficiently than current methods, making it an efficient and effective alternative for handling large-scale SSL problems.
|
|
15:00-17:00, Paper WePMP.23 | |
Facial Attribute Editing by Latent Space Adversarial Variational Autoencoders |
Li, Defang | Sun Yat-Sen Univ |
Zhang, Min | Sun Yat-Sen Univ |
Chen, Weifu | Sun Yat-Sen Univ. Guangzhou, China |
Feng, Guocan | Sun Yat-Sen Univ |
Keywords: Neural networks
Abstract: In this work, we focus on the problem of editing facial images by manipulating specified attributes of interest. To learn latent representations disentangled with respect to specified face attributes, we propose a novel attribute-disentangled generative model that combines the Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) frameworks. The proposed model includes only two deep mappings, an encoder and a decoder, similar to their counterparts in VAEs. The latent space mapped by the encoder is split into two parts: a style space and an attribute space. The former represents attribute-irrelevant factors such as identity, position, illumination, and background; the latter represents the attributes, such as hair color, gender, and the presence of glasses, with each dimension representing one single attribute. By treating constraints on the output of the encoder as discriminative objectives, the encoder acts not only as a discriminator that distinguishes real samples from generated ones, but also as an attribute classifier that determines whether a sample has the specified attributes. Combining the reconstruction and Kullback-Leibler (KL) divergence regularization errors used in VAEs with adversarial training losses defined for the style and attribute parts of the latent space, the proposed model generates images whose distribution is close to the real data distribution in the latent space. The model was evaluated on the CelebA dataset, and experimental results showed its effectiveness in disentangling face attributes and generating high-quality face images.
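The latent-space split can be sketched as follows; fully connected layers stand in for the paper's convolutional encoder, and CelebA's 40 binary attributes are assumed for the attribute part.

    import torch
    import torch.nn as nn

    class SplitEncoder(nn.Module):
        """Encoder whose latent code is split into a style part
        (attribute-irrelevant factors) and an attribute part with one
        unit per binary face attribute."""
        def __init__(self, in_dim=64 * 64 * 3, style_dim=128, n_attrs=40):
            super().__init__()
            self.backbone = nn.Sequential(nn.Flatten(),
                                          nn.Linear(in_dim, 512), nn.ReLU())
            self.to_style = nn.Linear(512, style_dim)
            self.to_attr = nn.Linear(512, n_attrs)

        def forward(self, x):
            h = self.backbone(x)
            return self.to_style(h), torch.sigmoid(self.to_attr(h))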
|
|
15:00-17:00, Paper WePMP.24 | |
MMGAN: Manifold-Matching Generative Adversarial Network |
Park, Noseong | Univ. of North Carolina, Charlotte |
Anand, Ankesh | Montreal Inst. for Learning Algorithms |
Moniz, Joel Ruben Antony | Carnegie Mellon Univ |
Lee, Kookjin | Univ. of Maryland Coll. Park |
Choo, Jaegul | Korea Univ |
Park, David Keetae | Korea Univ |
Chakraborty, Tanmoy | IIIT Delhi, India |
Keywords: Deep learning, Neural networks, Image processing and analysis
Abstract: It is well known that GANs are difficult to train, and several techniques have been proposed to stabilize their training. In this paper, we propose a novel training method called manifold-matching and a new GAN model called the manifold-matching GAN (MMGAN). MMGAN finds two manifolds representing the vector representations of real and fake images; if these two manifolds match, the real and fake images are statistically identical. To assist the manifold-matching task, we also use i) kernel tricks to find better manifold structures, ii) manifolds moving-averaged across mini-batches, and iii) a regularizer based on the correlation matrix to suppress mode collapse. We conduct in-depth experiments with three image datasets and compare against several state-of-the-art GAN models. 32.4% of the images generated by the proposed MMGAN were recognized as fake during our user study (a 16% improvement over another state-of-the-art model), and MMGAN achieved an inception score of 7.8 on CIFAR-10.
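Ingredient iii) can be illustrated in a few lines; this is a generic correlation-matrix penalty in the spirit the abstract describes, not the authors' exact regularizer.

    import torch

    def correlation_penalty(feats, eps=1e-8):
        """Penalize off-diagonal correlations between feature dimensions of a
        mini-batch of generated samples, discouraging mode collapse."""
        f = feats - feats.mean(dim=0, keepdim=True)
        f = f / (f.norm(dim=0, keepdim=True) + eps)
        corr = f.t() @ f                              # (d, d) correlation matrix
        off = corr - torch.diag(torch.diag(corr))
        return off.pow(2).sum()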
|
|
15:00-17:00, Paper WePMP.25 | |
Multi-Source Domain Adaptation for Face Recognition |
Yi, Haiyang | Guilin Univ. of Electronic Tech |
Xu, Zhi | Guilin Univ. of Electronic Tech |
Wen, Yimin | Guilin Univ. of Electronic Tech |
Fan, Zhigang | Zebra Tech. Corp |
Keywords: Domain adaptation, Transfer learning, Face recognition
Abstract: For transfer learning, many studies have demonstrated that effective use of information from multiple source domains improves classification performance. In this paper, we propose a method called Targetized Multi-Source Domain Bridged by a Common Subspace (TMSD) for face recognition, which transfers rich supervision knowledge from more than one labeled source domain to the unlabeled target domain. Specifically, a common subspace is learned for several domains by maximizing the total correlation. In this way, the discrepancy of each domain is reduced, and the structures of both the source and target domains are well preserved for classification. In the common subspace, each sample projected from the source domains is sparsely represented as a linear combination of several samples projected from the target domain, so that samples projected from different domains can be well interlaced. Then, in the original image space, each source-domain image can be represented as a linear combination of its neighbors in the target domain. Finally, the discriminant subspace is obtained from the targetized multi-source domain images using a supervised learning algorithm. The experimental results illustrate the superiority of TMSD over competing methods.
|
|
15:00-17:00, Paper WePMP.26 | |
Domain Translation with Conditional GANs: From Depth to RGB Face-To-Face |
Fabbri, Matteo | Univ. Degli Studi Di Modena E Reggio Emilia |
Borghi, Guido | Univ. of Modena and Reggio Emilia |
Lanzi, Fabio | Univ. Degli Studi Di Modena E Reggio Emilia |
Vezzani, Roberto | Univ. of Modena and Reggio Emilia |
Calderara, Simone | Univ. of Modena and Reggio Emilia |
Cucchiara, Rita | Univ. Degli Studi Di Modena E Reggio Emilia |
Keywords: Deep learning, Applications of pattern recognition and machine learning, Applications of computer vision
Abstract: Can faces acquired by low-cost depth sensors be used to capture characteristic details of the face? Typically the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper, we propose a new deterministic conditional GAN, trained on annotated RGB-D face datasets, that is effective for face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces whose appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with several perceptual probes, for instance face aspect classification and landmark detection. Depth faces can thus be used in place of the corresponding RGB images, which are often unavailable due to difficult lighting conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.
|
|
15:00-17:00, Paper WePMP.27 | |
Temporal Pattern Localization Using Mixed Integer Linear Programming |
Zhao, Rui | RPI |
Schalk, Gerwin | BCI R&D Progr, Wadsworth Ctr, NYS Dept of Health |
Ji, Qiang | RPI |
Keywords: Applications of pattern recognition and machine learning, Brain-computer interface, Sequence modeling
Abstract: In this paper, we consider the problem of localizing the subsequence of a time series that contains the dynamic pattern of interest. This is motivated by brain-computer interface applications, where we need to analyze the dynamic pattern of brain signals in response to an external stimulus. We treat the localization as a binary label assignment problem and formulate it as a mixed integer linear programming (MILP) problem. The optimal solution is obtained by minimizing a cost function associated with label assignment, subject to empirical constraints induced by the data acquisition process. We experiment with synthetic data to evaluate the effectiveness of the proposed MILP formulation and achieve a 10.8% improvement in F1-score. We then experiment with electrocorticographic (ECoG) data for a classification task and achieve an 8.8% improvement in accuracy using subsequences localized by our method compared to other methods.
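A toy version of such a formulation, selecting one contiguous run of frames that minimizes a per-frame cost, can be written with the PuLP modeler; the costs and the contiguity encoding below are illustrative, not the paper's exact constraint set.

    import pulp  # pip install pulp

    costs = [3.0, 1.2, -2.0, -1.5, -0.3, 2.2]       # illustrative per-frame costs
    T = len(costs)
    prob = pulp.LpProblem("subsequence", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y{t}", cat="Binary") for t in range(T)]  # frame labels
    s = [pulp.LpVariable(f"s{t}", cat="Binary") for t in range(T)]  # run starts
    prob += pulp.lpSum(costs[t] * y[t] for t in range(T))
    prob += pulp.lpSum(s) == 1                      # exactly one start marker
    prob += y[0] <= s[0]
    for t in range(1, T):
        prob += y[t] - y[t - 1] <= s[t]             # each 0->1 jump needs a start
    prob += pulp.lpSum(y) >= 1                      # select a non-empty run
    prob.solve()
    print([int(v.value()) for v in y])              # -> [0, 0, 1, 1, 1, 0]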
|
|
15:00-17:00, Paper WePMP.28 | |
Net4lap: Neural Laplacian Regularization for Ranking and Re-Ranking |
Curado, Manuel | Univ. of Alicante |
Escolano, Francisco | Univ. of Alicante |
Lozano, Miguel Angel | Univ. of Alicante |
Hancock, Edwin | Univ. of York |
Keywords: Neural networks, Manifold learning, Learning-based vision
Abstract: In this paper, we propose net4Lap, a novel architecture for Laplacian-based ranking. The two main ingredients of the approach are: a) pre-processing graphs with neural embeddings before performing Laplacian ranking, and b) introducing a global measure of centrality to modulate the diffusion process. We explicitly formulate ranking as an optimization problem in which regularization is emphasized; this formulation serves as a theoretical tool to validate our approach. Finally, our experiments show that the proposed architecture significantly outperforms state-of-the-art rankers and is also well suited to re-ranking.
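The Laplacian ranking step that the architecture builds on has the standard closed form below (plain manifold ranking, without the paper's neural embedding or centrality modulation).

    import numpy as np

    def laplacian_rank(S, query_idx, alpha=0.9):
        """Diffusion ranking f = (I - alpha * D^{-1/2} S D^{-1/2})^{-1} y,
        where S is a symmetric affinity matrix and y indicates the query."""
        d = S.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
        S_norm = D_inv_sqrt @ S @ D_inv_sqrt
        y = np.zeros(S.shape[0])
        y[query_idx] = 1.0
        f = np.linalg.solve(np.eye(S.shape[0]) - alpha * S_norm, y)
        return np.argsort(-f)        # node indices ranked by diffusion score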
|
|
15:00-17:00, Paper WePMP.29 | |
Part-Based Multi-Stream Model for Vehicle Searching |
Sun, Ya | Nanjing Univ. of Science and Tech |
Li, Minxian | Nanjing Univ. of Science and Tech |
Lu, Jianfeng | Nanjing Univ. of Science & Tech |
Keywords: Neural networks, Multilabel learning, Image classification
Abstract: Due to the tremendous requirements of public security and intelligent transportation systems, searching for an identical vehicle has become more and more important. Current studies usually treat the vehicle as a single whole and then train a distance metric to measure the similarity among vehicles. However, raw images of vehicles with different identities may look nearly identical, and background pixels may disturb the deep distance metric learning. In this paper, we propose a novel method to segment an original vehicle image into several discriminative foreground parts, where these parts consist of fine-grained regions called discriminative patches. These parts, combined with the raw image, are then fed into a new deep learning network, and the similarity of two vehicle images can easily be measured by computing the Euclidean distance of the features from the FC layer. The two main contributions of this paper are as follows. First, a method is proposed to estimate whether a patch in a raw vehicle image is discriminative. Second, a new Part-based Multi-Stream Model (PMSM) is designed and optimized for vehicle retrieval and re-identification tasks. We evaluate our method on the VehicleID dataset, and the experimental results show that it outperforms the baseline.
|
|
15:00-17:00, Paper WePMP.30 | |
Dynamic Learning Rate for Neural Networks: A Fixed-Time Stability Approach |
Aldana Lopez, Rodrigo | Intel Corp |
Campos Macias, Leobardo Emmanuel | Intel Corp |
Zamora Esquivel, Julio | Intel Corp |
Gomez-Gutierrez, David | Intel Labs |
Cruz Vargas, Jesus Adan | Intel |
Keywords: Online learning, Deep learning, Neural networks
Abstract: Neural networks (NNs) have become important tools that have demonstrated their value in solving complex problems in pattern recognition, natural language processing, and automatic speech recognition, among others. Recently, the number of applications that require running the training process at the front end in an online manner has increased dramatically. Unfortunately, in state-of-the-art (SoA) methods, the training process is an unbounded function of the initial conditions, so there is no insight into the number of epochs required, making online training a difficult problem. Speeding up the training process therefore plays a key role in machine learning. In this work, an algorithm for a dynamic learning rate is proposed based on recent results on the fixed-time stability of continuous-time nonlinear systems, which ensures a bound on the convergence time to the equilibrium point independently of the initial conditions. We show experimentally that our discrete-time implementation presents promising results, demonstrating that the number of epochs required for training remains bounded, independently of the initial weights. This constitutes an important feature for learning systems with real-time constraints. The efficiency of the proposed method is illustrated under different scenarios, including the public MNIST database, showing that our algorithm outperforms SoA methods in terms of the number of epochs required for training.
|
|
15:00-17:00, Paper WePMP.31 | |
Convolutional Networks for Semantic Heads Segmentation Using Top-View Depth Data in Crowded Environment |
Liciotti, Daniele | Univ. Pol. Delle Marche |
Paolanti, Marina | Univ. Pol. Delle Marche |
Pietrini, Rocco | Univ. Pol. Delle Marche |
Frontoni, Emanuele | Univ. Pol. Delle Marche |
Zingaretti, Primo | Univ. Pol. Delle Marche |
Keywords: Applications of pattern recognition and machine learning
Abstract: Detecting and tracking people is a challenging task in persistently crowded environments (e.g., retail stores, airports, stations) for human behaviour analysis and security purposes. This paper introduces an approach to detect and track people under heavy occlusion, based on CNNs for semantic segmentation using top-view depth visual data. The contribution is the design of a novel U-Net architecture, U-Net3, modified from previous variants at the end of each layer: a batch normalization is added after the first ReLU activation function and after each max-pooling and up-sampling function. The approach was applied and tested on a new, publicly available dataset, the TVHeads Dataset, consisting of depth images of people recorded by an RGB-D camera installed in a top-view configuration. Our variant outperforms baseline architectures while remaining computationally efficient at inference time. Results show high accuracy, demonstrating the effectiveness and suitability of our approach.
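The stated modification pins down where the normalization sits; a sketch of one encoder block, in PyTorch with illustrative channel sizes, is given below.

    import torch.nn as nn

    def unet3_down_block(in_ch, out_ch):
        """Encoder block with batch normalization after the first ReLU and
        after the max-pooling step, as described above."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),                  # BN after the first ReLU
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(out_ch),                  # BN after max-pooling
        )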
|
|
15:00-17:00, Paper WePMP.32 | |
Conditional Information Gain Networks |
Biçici, Ufuk Can | Boğaziçi Univ. / Idea Teknoloji |
Keskin, Cem | PerceptiveIO |
Akarun, Lale | Bogazici Univ |
Keywords: Deep learning, Image classification, Neural networks
Abstract: Deep neural network models owe their representational power to their large number of learnable parameters. It is often infeasible to run these heavily parameterized deep models in limited-resource environments, such as mobile phones. Network models employing conditional computing can reduce computational requirements while achieving high representational power, thanks to their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow feed-forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and on decision mechanisms inserted into the architecture. These decision mechanisms are trained using cost functions based on differentiable information gain, inspired by the training procedures of decision trees. Because the information-gain-based decision mechanisms are differentiable, they can be trained end-to-end in a unified framework with a general cost function covering both classification and decision losses. We test the effectiveness of the proposed method on the MNIST and recently introduced Fashion-MNIST datasets and show that our information-gain-based conditional execution approach achieves better or comparable classification results using significantly fewer parameters than standard convolutional neural network baselines.
|
|
15:00-17:00, Paper WePMP.33 | |
Face Aging with Improved Invertible Conditional GANs |
Li, Jia | Xi’an Jiaotong Univ |
Song, Yonghong | Xi'an Jiaotong Univ |
Zhang, Yuanlin | Xian JiaoTong Univ |
Keywords: Deep learning
Abstract: Due to the continuous development of GANs, vivid faces can now be generated, and using GANs for face aging has become a new trend. However, many existing face aging methods require tedious pre-processing of datasets, which brings a large computational burden and limits the application of face aging. To solve these problems, a face aging network is constructed using an IcGAN without any data pre-processing, which maps a face image into personality and age vector spaces through encoders Z and Y. Unlike previous work, we emphasize the preservation of both personalized and aging features. A minimal absolute reconstruction loss is proposed to optimize the vector z, which retains personal characteristics while preserving the pose, hairstyle, and background of the input face. Additionally, we introduce a novel age-vector optimization approach based on a classification reconstruction loss, together with a parameter that balances large-scale aging features against subtle texture features. The experimental results demonstrate that our proposed AIGAN produces better aged faces than other state-of-the-art age progression methods.
|
|
15:00-17:00, Paper WePMP.34 | |
IMU-Based Robust Human Activity Recognition Using Feature Analysis, Extraction, and Reduction |
Dehzangi, Omid | Univ. of Michigan-Dearborn |
Sahu, Vaishali | Univ. of Michigan Dearborn |
Keywords: Classification, Dimensionality reduction, Performance evaluation
Abstract: In recent years, research on recognizing human activities to assess the physical and cognitive capabilities of humans has gained importance. This paper presents the development of a robust human activity recognition system for real-world conditions. The activities considered are walking, walking upstairs, walking downstairs, sitting, standing, and sleeping. The proposed system consists of three main elements: feature extraction from an IMU (Inertial Measurement Unit) based on spectral and temporal analysis; feature dimensionality reduction techniques to reduce the high-dimensional feature representation; and various model training algorithms to recognize the activities. Different feature extraction methods based on time- and frequency-domain signal properties are evaluated. The high dimensionality of the extracted features results in complex model training and suffers from the curse of dimensionality; we therefore evaluate feature selection and transformation algorithms to improve robustness without decreasing prediction accuracy. Our results show that the random forest feature selection method, when used with an ensemble bagged classifier, provides an accuracy of 96.9% with 15 features, compared to the current benchmark system that employs 561 features. We further obtain a less complex recognition system via neighborhood component analysis with an ensemble bagged classifier, which yields a classification accuracy of 96.3% with only 9 features.
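The best-performing configuration reads naturally as a scikit-learn pipeline; the sketch below assumes illustrative hyperparameters, not the authors' exact settings.

    import numpy as np
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    # Random-forest importances pick the top 15 features, which then feed
    # an ensemble of bagged decision trees.
    model = make_pipeline(
        SelectFromModel(RandomForestClassifier(n_estimators=100),
                        threshold=-np.inf, max_features=15),
        BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    )
    # model.fit(X_train, y_train); model.score(X_test, y_test)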
|
|
15:00-17:00, Paper WePMP.35 | |
Multi-Label Classification of Stem Cell Microscopy Images Using Deep Learning |
Witmer, Adam | Univ. of California, Riverside |
Bhanu, Bir | Univ. of California |
Keywords: Applications of pattern recognition and machine learning
Abstract: This paper develops a pattern recognition and machine learning system to localize cell colony subtypes in multi-label, phase-contrast microscopy images. A convolutional neural network is trained to recognize homogeneous cell colonies and is used in a sliding-window, patch-based testing method to localize these homogeneous cell types within heterogeneous, multi-label images. The method is used to determine the effects of nicotine on induced pluripotent stem cells expressing the Huntington's disease phenotype. The results of the network are compared to those of an ECOC classifier trained on texture features. The ability of the network to localize cell phenotypes within heterogeneous colonies is visualized, and the temporal behavior of the stem cells is analyzed.
|
|
15:00-17:00, Paper WePMP.36 | |
D-NND: A Density-Based Hierarchical Clustering Method Via Nearest Neighbor Descent |
Qiu, Teng | Univ. of Electronic Science and Tech. of China |
Li, Chaoyi | Univ. of Electronic Science and Tech. of China |
Li, Yongjie | Univ. of Electronic Science and Tech. of China |
Keywords: Clustering
Abstract: Most density-based clustering methods rely heavily on how well the underlying density is estimated. However, like clustering, density estimation is itself a challenging unsupervised learning problem, especially the determination of the kernel bandwidth. In this paper, we propose a density-based multi-layer hierarchical clustering method, called Deep Nearest Neighbor Descent (D-NND), which largely alleviates the impact of density estimation. Unlike previous density-based methods, D-NND learns the underlying density distribution layer by layer while organizing the dataset sparsely and effectively into a directed tree. Experiments on three real-world datasets and several challenging synthetic datasets demonstrate that the proposed method has a strong ability to discover the underlying cluster structures and is not very sensitive to the density estimation method, the parameter settings, or clusters of multiple scales.
|
|
15:00-17:00, Paper WePMP.37 | |
Deep Age Estimation Model Stabilization from Images to Videos |
Ji, Zhipeng | Beijing Jiaotong Univ. School of Computer and Information |
Lang, Congyan | BJTU School of Computer and Information Tech |
Li, Kai | Chinese Acad. of Sciences |
Xing, Junliang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Applications of pattern recognition and machine learning, Video processing and analysis
Abstract: Deep learning models for age estimation from a single image have significantly improved the state of the art. However, when a deep age estimation model trained on images is deployed directly on videos, it often suffers from fluctuation: the estimated age varies considerably across face frames of the same person. To deal with this problem, this work presents a new deep age estimation model specifically designed for video facial age estimation, which produces very stable and accurate results. The proposed architecture combines a convolutional neural network with an attention mechanism: the convolutional neural network extracts facial features, and an attention block aggregates the per-frame feature vectors into a single feature representation for final age estimation. The whole model is trained with a novel loss function that guarantees both the accuracy of each frame and the stability of the age estimates across all frames. To evaluate the proposed model, a new dataset is collected and annotated. Extensive experimental analyses and comparisons demonstrate the effectiveness of the proposed model and its state-of-the-art performance compared to many competing methods.
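The aggregation step can be sketched compactly; the scoring network below is a minimal stand-in for the paper's attention block.

    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        """Aggregate per-frame feature vectors into a single video-level
        feature using learned attention weights."""
        def __init__(self, feat_dim):
            super().__init__()
            self.score = nn.Linear(feat_dim, 1)

        def forward(self, frame_feats):                        # (T, feat_dim)
            w = torch.softmax(self.score(frame_feats), dim=0)  # (T, 1) weights
            return (w * frame_feats).sum(dim=0)                # (feat_dim,)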
|
|
15:00-17:00, Paper WePMP.38 | |
Person Re-Identification with Weighted Spatio-Temporal Features |
Zhang, Dongyu | Sun Yat-Sen Univ |
Chen, Rongcong | Sun Yat-Sen Univ |
Qiu, Zhilin | Sun Yat-Sen Univ |
Zhang, Wei | Sun Yat-Sen Univ |
Wang, Qing | Sun Yat-Sen Univ |
Keywords: Classification, Deep learning, Object recognition
Abstract: Person re-identification (re-id), which aims to recognize a person across non-overlapping cameras, has received increasing research attention. In this paper, we address a new person re-id problem, image-to-video (ImtoV) person re-id, in which the probe is an image and the gallery consists of videos from non-overlapping cameras with different views of the probe. This differs from traditional image-based person re-id, where the probe and gallery are all images. Although video brings more information into ImtoV, it remains a challenging problem because of large variations in lighting conditions, viewing angles, body pose, and occlusion across videos. Moreover, most current models ignore the fact that different frames carry different importance in matching and assign equal weights to the feature vectors of all frames, even though frames with serious occlusion or dramatic illumination changes hurt re-id performance. To overcome this problem, we propose a novel framework for this task. We adopt CNNs for the feature extraction of images and videos and further employ an LSTM network for the spatio-temporal feature representation of videos, adding a weighting module that adaptively learns the weights of different video frames. We evaluated the proposed framework on three public person re-id datasets, and the experimental results show that the proposed approach is effective for ImtoV person re-id.
|
|
15:00-17:00, Paper WePMP.39 | |
Transductive Label Augmentation for Improved Deep Network Learning |
Elezi, Ismail | Ca' Foscari Univ. of Venice; Zurich Univ. of Applied S |
Torcinovich, Alessandro | Ca' Foscari Univ. of Venice |
Vascon, Sebastiano | Univ. Ca' Foscari of Venice |
Pelillo, Marcello | Ca' Foscari Univ |
Keywords: Semi-supervised learning, Deep learning, Classification
Abstract: A major impediment to the application of deep learning to real-world problems is the scarcity of labeled data. Small training sets are of little use to deep networks because, due to their large number of trainable parameters, they will very likely overfit. On the other hand, increasing the training set size through further manual or semi-automatic labelling can be costly, if not impossible at times. The standard techniques to address this issue are transfer learning and data augmentation, the latter consisting of applying some sort of "transformation" to existing labeled instances to let the training set grow in size. Although this approach works well in applications such as image classification, where it is relatively simple to design suitable transformation operators, it is not obvious how to apply it in more structured scenarios. Motivated by the observation that in virtually all application domains it is easy to obtain unlabeled data, in this paper we take a different perspective and propose a label augmentation approach. We start from a small, curated labeled dataset and let the labels propagate through a larger set of unlabeled data using graph transduction techniques. This allows us to naturally exploit the (second-order) similarity information residing in the data, a source of information typically neglected by standard augmentation techniques. In particular, we show that by using known game-theoretic transductive processes we can create larger and sufficiently accurate labeled datasets, which in turn yield better-trained neural networks. Preliminary experiments demonstrate a consistent improvement on standard image classification datasets.
|
|
15:00-17:00, Paper WePMP.40 | |
Fast Skin Lesion Segmentation Via Fully Convolutional Network with Residual Architecture and CRF |
Luo, Wenfeng | South China Univ. of Tech |
Yang, Meng | Sun Yat-Sen Univ |
Keywords: Deep learning, Segmentation, features and descriptors, Medical image and signal analysis
Abstract: Melanoma is known to be the most fatal form of skin cancer. To achieve automated diagnosis of this disease, a system is needed that accurately locates suspicious skin lesions in images captured by standard digital cameras. Recently, there has been a trend toward using fully convolutional networks (FCNs) for image segmentation tasks. In this paper, we propose an FCN-based processing pipeline that incorporates a deep neural network and a graphical model to obtain a segmentation mask separating the lesion region from normal skin. Our method extends the residual network by adding a transposed convolution layer, yielding an FCN architecture. We demonstrate that the noisy output of the FCN can be refined by a fully connected conditional random field (CRF). Our model enjoys three major advantages over existing algorithms: a simpler processing pipeline, state-of-the-art accuracy in terms of segmentation sensitivity (95.6%), and fast inference time.
|
|
15:00-17:00, Paper WePMP.41 | |
Improved Robust Discriminant Analysis for Feature Extraction |
Chen, Xiaobo | Jiangsu Univ |
Keywords: Dimensionality reduction, Classification
Abstract: Dimensionality reduction (DR) has emerged as a crucial step in developing effective pattern recognition approaches. However, the performance of many DR algorithms degrades in noisy environments. To address this problem, we propose a novel algorithm, termed robust large margin discriminant analysis (RLMDA), to enhance the robustness of features. In the spirit of the large margin principle applied in support vector machines, RLMDA maximizes the minimum between-class dispersion while simultaneously minimizing the within-class dispersion in the reduced subspace. Moreover, the l1-norm rather than the traditional squared l2-norm is used to describe these dispersions, making the resulting algorithm robust to noisy features. The solution of RLMDA boils down to a nonconvex and nonsmooth optimization problem; we therefore take advantage of both the constrained concave-convex procedure (CCCP) and the Lagrangian dual method and develop an efficient iterative algorithm. Experimental results show that RLMDA achieves better performance than other related methods.
|
|
15:00-17:00, Paper WePMP.42 | |
An Improved Self-Representation Approach for Missing Value Imputation |
Chen, Xiaobo | Jiangsu Univ |
Keywords: Applications of pattern recognition and machine learning
Abstract: Recovering missing values (MVs) from incomplete data is an important problem for many real-world applications. In this work, we propose a novel MV imputation method that combines a sample self-representation strategy with the underlying local structure of the data in a unified framework. Specifically, the proposed method first obtains a first-round estimate of the MVs using an existing method. Then, a graph characterizing the local proximity structure of the data is constructed. Next, a novel model, coined graph-regularized local self-representation (GRLSR), is proposed by integrating two crucial elements: local self-representation and graph regularization. The former assumes each sample can be well represented (reconstructed) by linearly combining its neighboring samples, while the latter further requires that neighboring samples not deviate too much from each other after reconstruction. In this way, MVs can be more accurately restored through joint imputation and local linear reconstruction. We also develop an effective alternating optimization algorithm to solve the GRLSR model and thereby achieve the final imputation. Experimental results on real-world road network traffic flow data and several UCI benchmark datasets demonstrate the effectiveness of the proposed method.
|
|
15:00-17:00, Paper WePMP.43 | |
Convolutional Discriminant Analysis |
Zhong, Guoqiang | Ocean Univ. of China |
Zheng, Yan | Ocean Univ. of China |
Zhang, Xu-Yao | Inst. of Automation, Chinese Acad. of Sciences |
Wei, Hongxu | Ocean Univ. of China |
Ling, Xiao | Ocean Univ. of China |
Keywords: Deep learning, Image classification
Abstract: The softmax regressor is arguably the most commonly used classifier in convolutional neural networks (CNNs). However, the cross-entropy-based softmax loss only supervises the network to learn effective representations of the data; it does not explicitly enforce separability between the classes. In this paper, we propose a novel convolutional neural network model, called convolutional discriminant analysis (CDA). Beyond the softmax loss, CDA employs a convolutional discriminant loss (CD-Loss), which minimizes the distance between a sample and its class center while maximizing the distance between the sample and its adversarial class center in the space of the learned deep representations. Extensive experiments on two benchmark datasets, Fashion-MNIST and CIFAR-10, demonstrate the superiority of CDA over traditional deep CNNs on image classification tasks.
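A sketch of a loss with the stated pull/push structure is given below; the margin formulation and the choice of the nearest wrong-class center as the "adversarial" one are our assumptions, not necessarily the paper's exact CD-Loss.

    import torch

    def cd_style_loss(feats, labels, centers, margin=1.0):
        """Pull each sample toward its own class center and push it away
        from the nearest wrong-class center."""
        own = centers[labels]                                  # (B, d)
        d_own = (feats - own).pow(2).sum(dim=1)
        d_all = torch.cdist(feats, centers)                    # (B, C)
        d_all.scatter_(1, labels.unsqueeze(1), float("inf"))   # mask own class
        d_adv = d_all.min(dim=1).values.pow(2)
        return torch.relu(d_own - d_adv + margin).mean()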
|
|
15:00-17:00, Paper WePMP.44 | |
Merging Neurons for Structure Compression of Deep Networks |
Zhong, Guoqiang | Ocean Univ. of China |
Yao, Hui | Ocean Univ. of China |
Zhou, Huiyu | Univ. of Leicester |
Keywords: Deep learning, Image classification
Abstract: Deep neural networks are increasingly used in many fields, such as pattern recognition, computer vision, and natural language processing. However, how to deploy deep neural networks in mobile settings has become an urgent issue as mobile devices gain more and more popularity. This is mainly because mobile devices usually have very limited computation and storage resources, which prevents them from running large-scale deep networks. This paper proposes a novel method for structure compression of deep neural networks. The main idea is to merge the neurons and connections of the original network using clustering methods. As a result, the compressed network possesses far fewer parameters, reducing the computation and storage requirements. Experiments on benchmark datasets demonstrate that the proposed method can greatly improve the efficiency of deep neural networks while retaining their learning capability.
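For one fully connected layer, the merging idea can be sketched as below; k-means is one possible clustering choice, and the merge rule (average incoming weights, sum outgoing weights) is a common convention we assume here.

    import numpy as np
    from sklearn.cluster import KMeans

    def merge_neurons(W_in, W_out, n_merged):
        """Cluster neurons by their incoming-weight vectors and merge each
        cluster into one unit. W_in: (n_neurons, d_in); W_out: (d_out, n_neurons)."""
        km = KMeans(n_clusters=n_merged).fit(W_in)
        W_in_merged = km.cluster_centers_                   # averaged inputs
        W_out_merged = np.zeros((W_out.shape[0], n_merged))
        for j, c in enumerate(km.labels_):
            W_out_merged[:, c] += W_out[:, j]               # summed outputs
        return W_in_merged, W_out_merged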
|
|
15:00-17:00, Paper WePMP.45 | |
Effects of Sampling Skewness of the Importance-Weighted Risk Estimator on Model Selection |
Kouw, Wouter Marco | Delft Univ. of Tech |
Loog, Marco | Delft Univ. of Tech. / Univ. of Copenhagen |
Keywords: Domain adaptation, Model selection, Performance evaluation
Abstract: Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for data sets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to sub-optimal regularization parameters when used for importance-weighted validation.
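The estimator under study is the standard importance-weighted empirical risk, sketched below; the densities are assumed known here, whereas in practice they are themselves estimated.

    import numpy as np

    def importance_weighted_risk(losses, p_target, p_source):
        """Reweight source-domain losses by w(x) = p_target(x) / p_source(x);
        unbiased in expectation, yet skewed in small samples as the paper shows."""
        w = p_target / np.maximum(p_source, 1e-12)
        return float(np.mean(w * losses))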
|
|
15:00-17:00, Paper WePMP.46 | |
A Convolutional Neural Network Approach for Estimating Tropical Cyclone Intensity Using Satellite-Based Infrared Images |
Combinido, Jay Samuel | Advanced Science and Tech. Inst |
Mendoza, John Robert | Electrical and Electronics Engineering Inst. Univ. Of |
Aborot, Jeffrey | Advanced Science and Tech. Inst |
Keywords: Applications of pattern recognition and machine learning, Transfer learning, Domain adaptation
Abstract: Existing techniques for satellite-based tropical cyclone (TC) intensity estimation involve an explicit feature extraction step to model TC intensity on a set of relevant TC features or patterns, such as eye formation and cloud organization. However, crafting such a feature set is often time-consuming and requires expert knowledge. In this paper, a convolutional neural network (CNN) approach that eliminates explicit feature extraction is proposed for estimating the intensity of tropical cyclones. Utilizing a Visual Geometry Group 19-layer CNN (VGG19) model pre-trained on ImageNet, transfer learning experiments were performed using grayscale IR images of TCs obtained from various geostationary satellites in the Western North Pacific region (1996-2016). The model re-trained on TC images achieved a root-mean-square error (RMSE) of 13.23 knots, a performance comparable to existing feature-based approaches (RMSE ranging from 12 to 20 knots). Moreover, the model was able to learn generic TC features that were previously identified in feature-based approaches as important indicators of TC intensity.
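The transfer-learning setup translates directly into a few lines of Keras; the regression head and the replication of grayscale IR images to three channels are our assumptions about unstated details.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG19

    # Grayscale IR frames are replicated to 3 channels to match VGG19's input.
    base = VGG19(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    intensity = layers.Dense(1)(x)                 # regression output, in knots
    model = models.Model(base.input, intensity)
    model.compile(optimizer="adam", loss="mse")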
|
|
15:00-17:00, Paper WePMP.47 | |
Driver Distraction Detection Using MEL Cepstrum Representation of Galvanic Skin Responses and Convolutional Neural Networks |
Dehzangi, Omid | West Virginia Univ |
Taherisadr, Mojtaba | Univ. of Michigan |
Keywords: Applications of pattern recognition and machine learning, Human behavior analysis, Biological image and signal analysis
Abstract: Driver distraction is one of the major causes of road accidents, which can lead to severe physical injuries and deaths. Statistics indicate the need for a reliable system that can continuously and ubiquitously monitor driver distraction and alert the driver before a roadway disaster has a chance to occur; early detection of driver distraction can thus help decrease the cost of roadway disasters. Physiological signals such as the galvanic skin response (GSR) have been used extensively to monitor distraction at the physiological level and to develop systems that alert drivers well in advance. In this paper, we introduce a driver distraction detection system based on Mel cepstrum analysis of GSR signals using convolutional neural networks (CNNs). The proposed model operates by calculating a two-dimensional (2D) representation of the GSR data and feeding it as input to a deep CNN. We present a recipe to extract Mel frequency filter bank coefficients in the time and frequency domains. The deep CNN is structured to automatically learn reliable discriminative patterns in the 2D Mel cepstrum space as features, thus replacing the traditional ad hoc hand-crafted features used with high-dimensional time-series data. The classification accuracy of the proposed algorithm is evaluated on GSR signals recorded from 7 subjects, aged 24 to 45, who actively participated in a naturalistic driving experiment during the recordings. The experimental results demonstrate that the proposed algorithm achieves 93.28% accuracy.
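The 2D input construction can be sketched with librosa's Mel-cepstrum routine; the GSR sampling rate, frame sizes, and filter count below are illustrative, not the authors' values.

    import numpy as np
    import librosa

    sr = 16                                          # assumed 16 Hz GSR stream
    gsr = np.random.randn(sr * 60).astype(float)     # one minute of signal
    mel_map = librosa.feature.mfcc(y=gsr, sr=sr, n_mfcc=13,
                                   n_fft=64, hop_length=16, n_mels=20)
    print(mel_map.shape)                             # (13, frames) 2D map for the CNN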
|
|
15:00-17:00, Paper WePMP.48 | |
Exploring Spatio-Temporal Correlations Via Deep Convolutional Neural Networks for Short-Term Traffic Flow Prediction with Incomplete Data |
Hou, Jiaxin | School of Software Engineering, Chongqing Univ |
Chen, Jing | School of Big Data & Software Engineering, Chongqing Univ |
Liao, Shijie | Chongqing Univ |
Xiong, Qingyu | Chongqing Univ |
Wen, Junhao | Chongqing Univ |
Keywords: Applications of pattern recognition and machine learning, Sequence modeling, Deep learning
Abstract: Traffic flow prediction is a crucial task for intelligent traffic management and control. Various machine learning methods have been applied in this field, and most encounter three fundamental issues: feature representation of traffic patterns, learning from a single location versus the whole network, and data quality. To address these issues, we present a deep architecture for traffic flow prediction that learns deep hierarchical feature representations with spatio-temporal relations over the traffic network. Furthermore, we design an ensemble learning strategy based on random subspace learning that makes the model tolerant of incomplete data. The experimental results corroborate the effectiveness of the proposed approach compared with state-of-the-art methods.
|
|
15:00-17:00, Paper WePMP.49 | |
A Joint Optimization Framework of Low-Dimensional Projection and Collaborative Representation for Discriminative Classification |
Liu, Xiaofeng | Carnegie Mellon Univ |
Li, Zhaofeng | Univ. of Chinese Acad. of Sciences |
Kong, Lingsheng | Chinese Acad. of Sciences |
Diao, Zhihui | Chinese Acad. of Sciences |
Yan, Junliang | Chinese Acad. of Sciences |
Zou, Yang | Carnegie Mellon Univ |
Yang, Chao | Univ. of Southern California |
Jia, Ping | Changchun Inst. of Optics, Fine Mechanies and Physics, CAS |
You, Jane | The Hong Kong Pol. Univ |
Keywords: Image classification, Object recognition, Dimensionality reduction
Abstract: Various representation-based methods have been developed and have shown great potential for pattern classification. To further improve their discriminability, we propose a bi-level optimization framework covering both low-dimensional projection and collaborative representation. Specifically, during the projection phase, we try to minimize the intra-class similarity and inter-class dissimilarity, while in the representation phase, our goal is to achieve the lowest correlation among the representation results. Solving this joint optimization mutually reinforces both feature projection and representation. Experiments on face recognition, object categorization, and scene classification datasets demonstrate the remarkable performance improvements brought by the proposed framework.
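The representation phase typically reduces to ridge-regularized coding, which has the closed form below; this is the generic collaborative-representation step, not the authors' full bi-level solver.

    import numpy as np

    def collaborative_code(D, y, lam=0.01):
        """Code a (projected) test sample y over the training matrix D
        (d x n) with an l2 penalty: alpha = (D^T D + lam*I)^{-1} D^T y."""
        n = D.shape[1]
        return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)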
|
|
15:00-17:00, Paper WePMP.50 | |
Action Classification Via Concepts and Attributes |
Rosenfeld, Amir | York Univ |
Ullman, Shimon | Weizmann Inst |
Keywords: Vision and language, Classification, Deep learning
Abstract: Classes in natural images tend to follow long-tail distributions. This is problematic when there are insufficient training examples for rare classes, and the effect is amplified in compound classes involving the conjunction of several concepts, such as those appearing in action-recognition datasets. In this paper, we propose to address this issue by learning how to utilize common visual concepts that are readily available. We detect the presence of prominent concepts in images and use them to infer the target labels instead of using visual features directly, combining tools from vision and natural language processing. We validate our method on the recently introduced HICO dataset, reaching a mAP of 31.54%, and on the Stanford 40 Actions dataset, where the proposed method outperforms direct visual features, obtaining an accuracy of 83.12%. Moreover, the method provides for each class a semantically meaningful list of keywords and relevant image regions relating it to its constituent concepts.
|
|
15:00-17:00, Paper WePMP.51 | |
RelationNet: Learning Deep-Aligned Representation for Semantic Image Segmentation |
Zhuang, Yueqing | Peking Univ |
Tao, Li | Peking Univ |
Yang, Fan | Peking Univ |
Ma, Cong | Peking Univ |
Zhang, Ziwei | Peking Univ |
Jia, Huizhu | Peking Univ |
Xie, Xiaodong | Peking Univ |
Keywords: Scene understanding, Deep learning, Mid-level vision
Abstract: Semantic image segmentation, which assigns labels at the pixel level, plays a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning; however, one central problem of these methods is that deep convolutional neural networks give little consideration to the correlation among pixels. To handle this issue, we propose a novel deep neural network named RelationNet, which utilizes a CNN and an RNN to aggregate context information. In addition, a spatial correlation loss is applied to train RelationNet to align the features of spatial pixels belonging to the same category. Importantly, since pixel-wise annotations are expensive to obtain, we exploit a new training method that combines coarsely and finely labeled data. Experiments show the detailed improvements of each proposal, and the results demonstrate the effectiveness of the proposed method for semantic image segmentation, obtaining state-of-the-art performance on the Cityscapes benchmark and the Pascal Context dataset.
|
|
15:00-17:00, Paper WePMP.52 | |
Color Image Reconstruction with Perceptual Compressive Sensing |
Du, Jiang | Xidian Univ |
Xie, Xuemei | Xidian Univ |
Wang, Chenye | Xidian Univ |
Shi, Guangming | Xidian Univ |
Keywords: Learning-based vision, Deep learning, Perceptual organization
Abstract: We propose a novel compressive sensing framework for color images. Compressive sensing (CS) has recently gained popularity with the development of deep learning. To the best of our knowledge, existing methods all process RGB images channel by channel, which introduces redundancy among the measurements. In this paper, instead of recovering RGB images channel by channel at a uniform rate, we adopt non-uniform sampling across the channels of the YCbCr color space: the luminance component receives more measurements while the chrominance channels receive fewer. This greatly enhances CS performance on color images. Moreover, a perceptual loss provides a powerful means of capturing structural information. We use a measurement rate of 2% as an example in the experiments, and the results show that the proposed method outperforms all existing methods with better image structure.
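The non-uniform sampling idea can be sketched as below; the per-channel rates (chosen so the three channels average to the 2% example) and the Gaussian sensing matrices are illustrative assumptions.

    import numpy as np

    def sample_ycbcr(img_ycbcr, rates=(0.045, 0.0075, 0.0075), seed=0):
        """Compressively sample each YCbCr channel at its own rate; the
        luminance channel receives most of the measurement budget."""
        rng = np.random.default_rng(seed)
        measurements = []
        for ch, rate in enumerate(rates):
            x = img_ycbcr[..., ch].ravel()
            m = max(1, int(rate * x.size))
            phi = rng.standard_normal((m, x.size)) / np.sqrt(m)
            measurements.append(phi @ x)
        return measurements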
|
|
15:00-17:00, Paper WePMP.53 | |
Local Compact Binary Patterns for Background Subtraction in Complex Scenes |
He, Wei | Hunan Inst. of Science and Tech |
Kim, Yongkwan | Hoseo Univ |
Qi, Qi | Hunan Inst. of Science and Tech |
Wu, Jianhui | Hunan Inst. of Science and Tech |
Zhang, Guoyun | Hunan Inst. of Science and Tech |
Guo, Longyuan | Hunan Inst. of Science and Tech |
Tu, Bing | Hunan Inst. of Science and Tech |
Ou, Xianfeng | Hunan Inst. of Science and Tech |
Huang, Feng | Hunan Inst. of Science and Tech |
Keywords: Object detection, Motion and tracking
Abstract: Background modeling in complex scenes is a challenging problem. In this paper, a novel background subtraction method is proposed to address it. First, textures are modeled with local compact binary patterns (LCBP), which offer excellent robustness, strong discriminative power, and fast computation. To make LCBP more robust to appearance changes in complex scenarios, spatio-temporal local compact binary patterns (STLCBP) are then introduced, combining spatial texture information with temporal motion information. Multiple color spaces are also employed to separate foreground pixels from the background more accurately. To our knowledge, this is the first time LCBP have been used for background modeling. Extensive experimental results on a widely used dataset clearly show that the proposed method outperforms other state-of-the-art methods and works effectively in complex scenes.
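For reference, the plain 3x3 local binary pattern that LCBP compacts is computed as follows (the base encoding only, not the LCBP compaction itself).

    import numpy as np

    def lbp_code(patch):
        """8-bit LBP code for one pixel: threshold the 8 neighbors of a 3x3
        patch against its center."""
        center = patch[1, 1]
        ring = patch.ravel()[[0, 1, 2, 5, 8, 7, 6, 3]]   # clockwise neighbors
        bits = (ring >= center).astype(np.uint8)
        return int(np.packbits(bits)[0])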
|
|
15:00-17:00, Paper WePMP.54 | |
Facial Expression Recognition for Different Pose Faces Based on Special Landmark Detection |
Wu, Wenqi | Inst. of Automation, Chinese Acad. of Sciences |
Yin, Yingjie | Inst. of Automation Chinese Acad. of Sciences |
Wang, Yingying | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Xingang | Inst. of Automation, Chinese Acad. of Sciences |
Xu, De | Inst. of Automation, Chinese Acad. of Sciences, Beijing 10 |
Keywords: Object recognition, Deep learning, Neural networks
Abstract: Facial expression recognition from a single facial image is a challenging task in computer vision. Since human faces are roughly convex, the self-occlusion caused by face pose seriously affects the accuracy of expression recognition. To solve this problem, we propose a novel facial expression recognition method for faces in different poses based on special landmark detection (FER-MPI-SFL). Our method is built on two shared networks. The outputs of the first network are 29 special landmarks and one face box, which serve as the inputs to the second network and are used to estimate the face pose. RoIAlign and feature-map concatenation are introduced in the second network to recognize the facial expression, with the weight allocation of the concatenated feature maps guided by the pose estimation result. In addition, an improved center loss is proposed to enlarge the distances between the features of different expressions, making them easier to classify in the feature space. As a result, performance superior to other state-of-the-art methods is achieved on the facial expression databases CK+, MMI, and Oulu-CASIA VIS, as well as on a newly created database, CASIA-MFE, which contains more faces in different poses.
|
|
15:00-17:00, Paper WePMP.55 | |
Plant Identification from Bark: A Texture Description Based on Statistical Macro Binary Pattern |
Boudra, Safia | LaSTIC, Univ. of Batna 2 |
Yahiaoui, Itheri | CReSTIC, Univ. De Reims Champagne-Ardenne |
Behloul, Ali | Univ. of Batna 2 |
Keywords: Applications of computer vision, Texture analysis, Image classification
Abstract: This paper presents a novel yet compact texture descriptor for plant species identification based on bark texture images. Termed the Statistical Macro Binary Pattern (SMBP), the descriptor is informative, rotation invariant, and designed to encode texture information from a large support area. The main novelty of the approach is the use of a statistical description to represent the intensity distribution in the large support area, together with an LBP-like encoding scheme that derives a statistical macro pattern by thresholding against an adaptive statistical prototype. We test three neighborhood sampling schemes according to the angular quantization at each level of the macrostructure. Comprehensive experiments on three challenging bark datasets (BarkTex, Trunk12, AFF) show that our descriptor achieves higher and more consistent identification rates compared with LBP-like texture descriptors.
|
|
15:00-17:00, Paper WePMP.56 | |
Person in Vehicle Counting Method of HOV HOT System |
Miyamoto, Shinichi | NEC Corp |
Keywords: Object recognition, Video analysis, Learning-based vision
Abstract: An automated system for counting the number of passengers in a vehicle is desired for HOV (High Occupancy Vehicle) and HOT (High Occupancy Toll) applications. In outdoor environments, many complicating factors, such as sunlight and darkly tinted vehicle windows, degrade captured image quality, and as a result counting accuracy has remained low. In this paper, a simple but refined image acquisition scheme and a novel passenger counting algorithm are proposed. In particular, by incorporating a deep neural network for face detection, integrating face detection results over multiple frames, and computing a confidence value for the estimation result, useful and highly accurate counting is realized.
|
|
15:00-17:00, Paper WePMP.57 | |
Kernel Dual Linear Regression for Face Image Set Classification |
Gao, Xizhan | Nanjing Univ. of Science and Tech |
Sun, Quansen | Nanjing Univ. of Science and Tech |
Xu, Haitao | Liaocheng Univ |
Li, Yanmeng | Nanjing Univ. of Science and Tech |
Keywords: Image classification, Regression, Support vector machine and kernel methods
Abstract: Dual linear regression classification (DLRC) extends linear regression classification (LRC) from conventional still-image-based classification to image-set-based classification and has demonstrated good performance on image set classification. However, when the image sets of different objects are not linearly separable, or when the linear regression axes of class-specific samples from different classes intersect, DLRC may fail to classify the image sets well. In this paper, a new classification method, kernel dual linear regression classification (KDLRC), is proposed. KDLRC is a nonlinear version of DLRC that overcomes this drawback: it first embeds the input data into a high-dimensional Hilbert space, where the data become easier to classify. Extensive experiments on four well-known databases show that the performance of KDLRC is better than that of DLRC and several state-of-the-art classifiers.
|
|
15:00-17:00, Paper WePMP.58 | |
Flexible Rotation Invariant Bases from Orthogonal Moments |
Yang, Bo | Northwestern Pol. Univ |
Chen, Xiaofeng | Northwestern Pol. Univ |
Zhang, Yuye | Xianyang Normal Univ |
Keywords: Object recognition, Image processing and analysis, Signal analysis
Abstract: Rotation is a basic but fundamental geometric transformation, and the design of rotation invariants is an indispensable part of research on moment invariants. Thanks to their better numerical stability and efficient invariant construction, Gaussian-Hermite moments have become powerful tools in pattern recognition. However, the existing rotation invariants of Gaussian-Hermite moments are constructed either under special constraints or for special patterns: the invariant bases neither contain all possible rotation invariants nor are available for all images. In this paper, we propose flexible rotation-invariant bases from Gaussian-Hermite moments. The invariants generated from such bases are available for any image, and they are complete and exact representations of all rotation invariants of Gaussian-Hermite moments. The inherent properties of these flexible bases, such as rotation invariance, completeness, and independence, are proven, and rotation invariance is verified on real data. Experiments on template matching and image recognition show that the invariants generated by the flexible bases have better feature representation ability and numerical stability than traditional complex moment invariants.
|
|
15:00-17:00, Paper WePMP.59 | |
Anchor Free Network for Multi-Scale Face Detection |
Wang, Chengji | Xiamen Univ |
Luo, Zhiming | Xiamen Univ |
Lian, Lancer | Xiamen Univ |
Li, Shaozi | Xiamen Univ |
Keywords: Object detection, Deep learning, Multitask learning
Abstract: Anchor-based deep methods are the most widely used methods for face detection and have achieved state-of-the-art results. Whereas anchor-based methods estimate bounding boxes relative to pre-defined anchor boxes, anchor-free methods perform localization by predicting the offsets from a pixel inside a face to its outer boundaries, which is much more precise. However, anchor-free methods suffer from low recall, mainly because 1) using only single-scale features leads to missed detections of small faces, and 2) there is a severe intra-class imbalance among faces of different sizes. In this paper, to address these problems, we propose a unified anchor-free network for detecting multi-scale faces by leveraging the local and global contextual information of multi-layer features. We also utilize a scale-aware sampling strategy that adaptively selects positive samples to mitigate the intra-class imbalance issue. Furthermore, a revised focal loss function is adopted to deal with the foreground/background imbalance. Experimental results on two benchmark datasets demonstrate the effectiveness of our proposed method.
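For reference, the standard binary focal loss that the revised loss builds on is shown below (the original formulation, not the authors' revision).

    import torch

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Binary focal loss: down-weight easy examples to counter the
        foreground/background imbalance."""
        p = torch.sigmoid(logits)
        pt = torch.where(targets == 1, p, 1 - p)
        alpha_t = torch.where(targets == 1,
                              alpha * torch.ones_like(p),
                              (1 - alpha) * torch.ones_like(p))
        return (-alpha_t * (1 - pt).pow(gamma)
                * pt.clamp(min=1e-8).log()).mean()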
|
|
15:00-17:00, Paper WePMP.60 | |
3DMAX-Net: A Multi-Scale Spatial Contextual Network for 3D Point Cloud Semantic Segmentation |
Ma, Yanxin | National Univ. of Defense Tech |
Guo, Yulan | National Univ. of Defense Tech |
Lei, Yinjie | Sichuan Univ |
Lu, Min | National Univ. of Defense Tech |
Zhang, Jun | National Univ. of Defense Tech |
Keywords: 3D vision, Scene understanding, Deep learning
Abstract: Semantic segmentation of 3D scenes is a fundamental problem in 3D computer vision. In this paper, we propose a deep neural network for the 3D semantic segmentation of raw point clouds. A multi-scale feature learning block is first introduced to obtain informative contextual features in 3D point clouds. A global and local feature aggregation block is then added to improve the feature learning ability of the network. Based on these strategies, a powerful architecture named 3DMAX-Net is provided for semantic segmentation of raw 3D point clouds. Experiments conducted on the Stanford Large-Scale 3D Indoor Spaces Dataset using only geometric information clearly show the superiority of the proposed network.
|
|
15:00-17:00, Paper WePMP.61 | |
Beyond Two-Stream: Skeleton-Based Three-Stream Networks for Action Recognition in Videos |
Xu, Jianfeng | KDDI Res. Inc |
Tasaka, Kazuyuki | KDDI Res. Inc |
Yanagihara, Hiromasa | KDDI Res. Inc |
Keywords: Behavior recognition, Deep learning for multimedia analysis, Neural networks
Abstract: The two-stream architecture, trained on stacked optical flows and image frames, has demonstrated excellent performance for human action recognition in videos and has served as the basis of many advanced architectures. To improve performance, we propose a novel three-stream architecture based on skeletons, extending the two-stream architecture while retaining video as the only input. It has recently become possible to reliably detect skeletons from videos; these contain high-level joint-motion information and are complementary to low-level optical flows and image frames. However, skeleton data lack information about objects (e.g., a book) involved in the action, which is essential to distinguish actions such as reading, writing, and typing. Therefore, our three-stream networks fuse the complementary information from skeleton data, optical flows, and image frames. Furthermore, we crop the human and object regions using skeleton data for sparse spatial sampling, and we calculate adaptive weights when fusing the three streams to further improve performance. The experimental results demonstrate that the proposed three-stream networks outperform state-of-the-art techniques on the NTU RGB+D dataset, significantly improving the accuracy from 80.8% to 86.4%.
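The adaptive late-fusion step can be sketched as below; the gating network over concatenated stream scores is our assumption about how the adaptive weights are produced.

    import torch
    import torch.nn as nn

    class AdaptiveFusion(nn.Module):
        """Per-sample softmax weights over the skeleton, flow, and RGB
        stream scores."""
        def __init__(self, n_classes):
            super().__init__()
            self.gate = nn.Linear(3 * n_classes, 3)

        def forward(self, s_skel, s_flow, s_rgb):      # each (B, n_classes)
            w = torch.softmax(
                self.gate(torch.cat([s_skel, s_flow, s_rgb], dim=1)), dim=1)
            return (w[:, 0:1] * s_skel + w[:, 1:2] * s_flow
                    + w[:, 2:3] * s_rgb)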
|
|
15:00-17:00, Paper WePMP.62 | |
Sparse Representation and Weighted Clustering Based Abnormal Activity Detection |
Jin, Dongliang | Nanjing Univ. of Posts and Telecommunications |
Songhao, Zhu | Nanjing Univ. of Posts and Telecommunications |
Songsong, Wu | Nanjing Univ. of Posts and Telecommunications |
Jing, Xiaoyuan | Nanjing Univ. of Posts and Telecommunications |
Keywords: Behavior recognition, Sparse learning, Image processing and analysis
Abstract: Abnormal activity detection is a challenging problem that can be divided into global abnormal activity detection and local abnormal activity detection. First, the hybrid histogram of optical flow feature is extracted; then, a double sparse representation is proposed to tackle global abnormal activity detection; finally, for local abnormal activity detection, the foreground of the region of interest within the current frame is first detected, and then an online weighted clustering method is utilized to detect the local abnormal activity. Experimental results on the UMN and UCSD datasets validate the advantages of the proposed method.
|
|
15:00-17:00, Paper WePMP.63 | |
Weak Supervised Learning Based Abnormal Behavior Detection |
Sun, Xian | Nanjing Univ. of Posts and Telecommunications |
Songhao, Zhu | Nanjing Univ. of Posts and Telecommunications |
Songsong, Wu | Nanjing Univ. of Posts and Telecommunications |
Jing, Xiaoyuan | Nanjing Univ. of Posts and Telecommunications |
Keywords: Behavior recognition, Semi-supervised learning, Image processing and analysis
Abstract: Hand-crafted features are adopted in most existing abnormal behavior detection methods; however, it is difficult to choose and design an effective behavior feature because of high computation costs and complex scenarios. To solve this problem, a weakly supervised abnormal behavior detection method based on temporal consistency is proposed in this paper. First, a temporal Gram matrix is constructed for a given pair of videos. Then, a pair of behavior units (candidate action fragments) is formed by exploiting the temporal consistency and smoothness of human behavior, which aims to locate the start and end frames of the related abnormal action and train the corresponding classifier. Finally, sparse reconstruction is utilized to detect abnormal behavior. Experimental results on the CAVIAR and BOSS datasets demonstrate the effectiveness of the proposed method.
|
|
15:00-17:00, Paper WePMP.64 | |
A New Method for Face Alignment under Extreme Poses and Occlusion |
Li, Jun | Nanjing Univ |
Xiao, Qiongling | Nanjing Univ |
Yang, Ruoyu | Nanjing Univ |
Keywords: Object detection, Applications of computer vision, Applications of pattern recognition and machine learning
Abstract: In real-world conditions, robust face alignment is challenging due to the large variability of occlusion and pose. Many methods aim to solve the problem, but they can handle either images with occlusion only or images with arbitrary poses only. In this paper, we propose a unified framework that ignores the points which cannot be seen under occlusion and extreme poses: we first obtain facial parts by classification and then train regression models to locate the key points. This leads to higher accuracy when locating the truly visible points, without considering the occluded and non-existent points. Besides, we observed that the drift and shape of face detection results affect face alignment. As far as we know, we are the first to explicitly raise this issue and solve it to some extent. Finally, our method outperforms the state-of-the-art methods on the AFLW and COFW datasets. It is also comparable to other methods on the LFPW dataset.
|
|
15:00-17:00, Paper WePMP.65 | |
Latent Linear Dynamics for Modeling Pedestrian Behaviors |
Dhaka, Devendra | NEC Corp. Japan |
Ishii, Masato | NEC |
Sato, Atsushi | NEC |
Keywords: Behavior recognition, Clustering
Abstract: We consider the problem of generative pedestrian modeling for capturing the common behaviors constituting a trajectory dataset. We present a model that simultaneously represents the trajectory data and the latent dynamics associated with different behaviors. The model represents the trajectory dynamics as a scaled component of the cluster dynamics, where the cluster dynamics is shared among all trajectories belonging to a cluster, thus giving rise to similarity. The cluster dynamics is modeled with Bayesian nonparametrics, in particular a Dirichlet process mixture model, which avoids fixing the number of unique behaviors or clusters in advance. Additionally, a relative velocity scaling term encapsulates the relative nature of an individual trajectory with respect to its cluster dynamics. Model parameters and latent states are inferred using a sequential blocked Gibbs sampler, which can be scaled to large datasets.
|
|
15:00-17:00, Paper WePMP.66 | |
SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction |
Liang, Lingyu | South China Univ. of Tech |
Lin, Luojun | South China Univ. of Tech |
Jin, Lianwen | South China Univ. of Tech |
Xie, Duorui | South China Univ. of Tech |
Li, Mengru | South China Univ. of Tech |
Keywords: Image classification
Abstract: Facial beauty prediction (FBP) is a significant visual recognition problem whose aim is to assess facial attractiveness consistently with human perception. To tackle this problem, various data-driven models, especially state-of-the-art deep learning techniques, have been introduced into FBP, and benchmark datasets have become one of the essential elements for achieving FBP. Previous works have formulated the recognition of facial beauty as a specific supervised learning problem of classification or regression, which indicates that FBP is intrinsically a computation problem with multiple paradigms. However, most FBP benchmark datasets were built under specific computational constraints, which limits the performance and flexibility of the computational models trained on them. In this paper, we argue that FBP is a multi-paradigm computation problem, and we build a new diverse benchmark dataset, called SCUT-FBP5500, to enable multi-paradigm facial beauty prediction. The SCUT-FBP5500 dataset contains 5500 frontal faces with diverse properties (male/female, Asian/Caucasian, ages) and diverse labels (face landmarks, beauty scores within [1, 5], beauty score distributions), which allows computational models with different FBP paradigms, such as appearance-based/shape-based facial beauty classification/regression/ranking models for male/female Asian/Caucasian subjects. We evaluated the SCUT-FBP5500 dataset for FBP using different combinations of features and predictors, as well as various deep learning methods. The results indicate improvements in FBP and potential applications based on SCUT-FBP5500.
|
|
15:00-17:00, Paper WePMP.67 | |
Fast and Robust Pose Estimation Algorithm for Bin Picking Using Point Pair Feature |
Li, Mingyu | Tohoku Univ |
Hashimoto, Koichi | Tohoku Univ |
Keywords: Object recognition, 3D vision
Abstract: Bin picking refers to picking up objects randomly piled in a container (bin), and robotic bin picking is often used to improve industrial production efficiency. A pose estimation algorithm is necessary to tell the robot the poses of the objects. This paper proposes a pose estimation algorithm for bin picking using 3D point cloud data. The Point Pair Feature algorithm is performed in a fast way to propose possible poses, and the poses are verified by a voxel-based verification method. Iterative Closest Point is used to refine the resulting poses. Our algorithm is shown to be more accurate and faster than the Curve Set Feature and Point Pair Feature algorithms, robust to occlusion, and able to detect multiple poses in one scene.
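The point pair feature itself is compact enough to state directly; below is a minimal numpy sketch of the classic 4D feature of Drost et al., on which the paper builds (the accelerated variant and the voting stage are not reproduced):

```python
import numpy as np

def angle(a, b):
    """Angle in [0, pi] between two 3D vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """Classic 4D point pair feature for two oriented points
    (position p, normal n):
        F = (||d||, ang(n1, d), ang(n2, d), ang(n1, n2)).
    Pairs are hashed on a quantised F to vote for candidate poses.
    """
    d = p2 - p1
    return np.array([np.linalg.norm(d),
                     angle(n1, d), angle(n2, d), angle(n1, n2)])
```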
|
|
15:00-17:00, Paper WePMP.68 | |
Hybrid 3D Surface Description with Global Frames and Local Signatures of Histograms |
Shen, Zhiqiang | School of Control Science and Engineering |
Ma, Xin | Shandong Univ |
Zeng, Xianglei | Shandong Univ |
Keywords: 3D vision, Object recognition
Abstract: This paper presents a novel 3D descriptor named Frame-SHOT that combines a global structural frame with local surface information. Global feature descriptors are generally more descriptive for surface representation, but are susceptible to occlusion and clutter. In contrast, local feature descriptors are more robust, but less discriminative due to the limits of the support region. We exploit the advantages of both by combining local and global information for surface description. The Signature of Histograms of OrienTations (SHOT) descriptor is used to characterize the local surface, and structural frame points of the object are used to encode the global surface. We have compared the proposed Frame-SHOT descriptor with state-of-the-art global and local methods on two public datasets. The results show that our proposed descriptor is more descriptive and robust for surface matching and recognition.
|
|
15:00-17:00, Paper WePMP.69 | |
A Structural Approach to Person Re-Identification Problem |
Mahboubi, Amal | Greyc Umr Cnrs 6072 |
Brun, Luc | ENSICAEN |
Conte, Donatello | Univ. of Tours |
Keywords: Object recognition, Image processing and analysis
Abstract: Although it has been studied extensively during the past decades, object tracking is still a difficult problem due to many challenges. Several improvements have been made, but increasingly complex scenes (dense crowds, complex interactions) require more sophisticated approaches. In particular, long-term tracking is an interesting problem that allows objects to be tracked even after they become occluded for a long time or leave and re-enter the field of view. In this case, the major challenges are significant changes in appearance, scale, and so on. At the heart of the solution to long-term tracking is the re-identification technique, which allows an object to be identified when it becomes visible again after an occlusion or re-enters the scene. This paper proposes an approach for pedestrian re-identification based on a structural representation of people. The experimental evaluation is carried out on two public datasets (ETHZ and CAVIAR4REID) and shows promising results compared to other state-of-the-art approaches.
|
|
15:00-17:00, Paper WePMP.70 | |
Pre-Trained VGG-Net Architecture for Remote-Sensing Image Scene Classification |
Usman, Muhammad | Univ. of Chinese Acad. of Sciences |
Wang, Weiqiang | Univ. of Chinese Acad. of Sciences |
Shahbaz Pervaiz, Chattha | Yanbu Univ. Coll |
Sajid, Ali | Univ. of Education |
Keywords: Image classification, Deep learning, Transfer learning
Abstract: The visual geometry group network (VGGNet) is widely used for image classification and has proven to be a very effective method. Most existing approaches use features of just one type, and traditional fusion methods generally combine multiple manually created features. However, obtaining the benefits of multi-layer features remains a significant challenge in the remote-sensing domain. To address this challenge, we present a simple yet powerful framework based on canonical correlation analysis and a 4-layer SVM classifier. Specifically, the pretrained VGGNet is employed as a deep feature extractor to extract mid-level and deep features from remote-sensing scene images. We then choose two convolutional (mid-level) layers and two fully-connected layers produced by VGGNet, in which each layer is treated as a separate feature descriptor. Next, canonical correlation analysis (CCA) is used as a feature fusion strategy to refine the extracted features and fuse them with more discriminative power. Finally, the support vector machine (SVM) classifier is used to construct the 4-layer representation of the scene images. Experiments on the UC Merced and WHU-RS datasets demonstrate that the proposed approach, even without data augmentation, fine-tuning, or a coding strategy, outperforms current state-of-the-art methods.
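A minimal sketch of the CCA fusion step with sklearn, assuming hypothetical feature matrices extracted from two VGGNet layers (dimensions, names, and the concatenation fusion rule are illustrative, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

# Hypothetical features from two VGGNet layers for N training scenes.
N = 200
fc7 = np.random.rand(N, 1024)    # deep (fully-connected) features
conv = np.random.rand(N, 512)    # mid-level (convolutional) features
labels = np.random.randint(0, 21, N)

# CCA projects both views into a shared space where they are maximally
# correlated; concatenating the projections is one common fusion rule.
cca = CCA(n_components=32)
cca.fit(fc7, conv)
a, b = cca.transform(fc7, conv)
fused = np.hstack([a, b])

clf = SVC(kernel="linear").fit(fused, labels)  # scene classifier
```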
|
|
15:00-17:00, Paper WePMP.71 | |
Long-Term Object Tracking with Instance Specific Proposals |
Liu, Hao | National Univ. of Defense Tech |
Hu, Qingyong | National Univ. of Defense Tech |
Li, Biao | National Univ. of Defense Tech |
Guo, Yulan | National Univ. of Defense Tech |
Keywords: Motion and tracking, Online learning, Regression
Abstract: Correlation filter based trackers have been extensively investigated for their superior efficiency and fairly good robustness. However, it remains challenging to achieve long-term tracking when the object is under occlusion and severe deformation. In this paper, we propose a tracker named Complementary Learners with Instance-specific Proposals (CLIP). The CLIP tracker consists of three main components: a translation filter, a scale filter, and an error correction module. Complementary features are incorporated into the translation filter to cope with illumination changes and deformation, and an adaptive updating mechanism is proposed to prevent model corruption. The translation filter aims to provide excellent real-time inference. Furthermore, the error correction module is activated to correct the localization error using an instance-specific proposal generator, especially when the target suffers from dramatic appearance changes. Experimental results on the OTB, Temple-Color 128 and UAV20L datasets demonstrate that the CLIP tracker performs favorably against existing competitive trackers in terms of accuracy and robustness. Moreover, our proposed CLIP tracker runs at 33 fps on OTB, making it highly suitable for real-time applications.
|
|
15:00-17:00, Paper WePMP.72 | |
Which Part Is Better: Multi-Part Competition Network for Person Re-Identification |
Du, Peng | Xi’an Jiaotong Univ. School of Software Engineering |
Song, Yonghong | Xi'an Jiaotong Univ |
Zhang, Yuanlin | Xian JiaoTong Univ |
Keywords: Image classification, Deep learning
Abstract: Person re-identification is a challenging task due to background clutter, occlusion and illumination variations. In addition, pedestrian misalignment always exists in automatically detected datasets. In this paper, we propose a Multi-Part Competition Network (MPCN) consisting of a Multi-Part Network (MPN) and a Part Competition Network (PCN), which aims to solve the misalignment problem caused by detector errors and human pose variations. First, we construct original body parts and enlarged body parts using a human pose estimation algorithm. These two kinds of body parts not only alleviate the misalignment caused by background and varying human poses but also compensate for the missing details and imprecise body parts introduced by the human pose estimator. Then, we use the MPN to acquire global features and two different kinds of body part features. The components of the MPN, a global branch and two part branches, are combined by an ROI pooling layer. Finally, we apply the PCN to achieve a tradeoff between the original body parts and the enlarged body parts and to acquire discriminative part features from these two kinds of body parts. Extensive evaluations on three widely used re-id datasets, Market-1501, CUHK03, and VIPeR, demonstrate that our proposed network achieves competitive results compared to the state-of-the-art methods.
|
|
15:00-17:00, Paper WePMP.73 | |
Explain Black-Box Image Classifications Using Superpixel-Based Interpretation |
Wei, Yi | Univ. at Albany, State Univ. of New York |
Chang, Ming-Ching | Univ. at Albany - SUNY |
Ying, Yiming | SUNY Albany |
Lim, Ser Nam | GE |
Lyu, Siwei | SUNY Albany |
Keywords: Image classification, Deep learning, Applications of pattern recognition and machine learning
Abstract: How best to understand and interpret the decisions of deep neural networks is a crucial topic, as the impact of intelligent deep network systems is prevalent in many applications. We propose a superpixel-based method to interpret and explain the results of black-box deep networks in widely-applied image classification tasks. We perform probabilistic prediction difference analysis upon one or more superpixels clustered from image pixels. Our method calculates a superpixel score map visualization that can provide rich interpretation regarding image components. Such interpretation provides the supportive likelihoods of image regions for the decisions made by the black-box classifier. We compare our method against state-of-the-art pixelwise interpretation methods using the latest deep neural network classifiers on the ImageNet dataset. Results show that our method produces more consistent interpretations in less computation time. Our method also supports interactive interpretation, where users can acquire explanations on specified regions through a convenient interface for a prompt reaction.
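A minimal sketch of superpixel-level prediction-difference analysis, assuming a hypothetical black-box `predict` callable that maps an image to class probabilities (the probabilistic marginalization used in the paper is simplified here to a mean-fill occlusion test):

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_saliency(image, predict, target_class, n_segments=100):
    """Score each SLIC superpixel by how much the classifier's confidence
    in `target_class` drops when that superpixel is blanked out.
    Returns a per-pixel score map."""
    segments = slic(image, n_segments=n_segments)
    base = predict(image)[target_class]
    score_map = np.zeros(image.shape[:2])
    for s in np.unique(segments):
        masked = image.copy()
        masked[segments == s] = image.mean(axis=(0, 1))  # neutral fill
        drop = base - predict(masked)[target_class]
        score_map[segments == s] = drop                  # support for the decision
    return score_map
```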
|
|
15:00-17:00, Paper WePMP.74 | |
Improved Correlation Filter Tracking with Hard Negative Mining |
Qie, Chunguang | Xiamen Univ |
Guanjun, Guo | Xiamen Univ |
Yan, Yan | Xiamen Univ |
Liming, Zhang | Univ. of Macau |
Wang, Hanzi | Xiamen Univ |
Keywords: Motion and tracking, Video analysis
Abstract: Recently, correlation filter based trackers have achieved very good tracking performance. However, due to the boundary effects of the circulant matrix and the usage of a cosine window, the lack of effective negative samples becomes a challenging problem for correlation filter based trackers. This problem may cause overfitting, making these trackers very sensitive to deformation and occlusion. In this paper, we propose a novel object tracker (i.e., STAPLE HNM), which can effectively select hard negative samples and assign adaptive weights to these samples to train the correlation filter. Experimental results demonstrate that the proposed STAPLE HNM tracker effectively improves the performance of the baseline STAPLE CA tracker on the OTB-50 and OTB-100 datasets. Moreover, the proposed STAPLE HNM tracker also achieves superior performance compared with several state-of-the-art trackers.
|
|
15:00-17:00, Paper WePMP.75 | |
A Rigorous Solution for Closed-Form Correlation Filter Tracking |
Li, Dongdong | National Univ. of Defense Tech |
Wen, Gongjian | National Univ. of Defense Tech |
Kuai, Yangliu | National Univ. of Defense Tech |
Keywords: Motion and tracking, Video analysis, Scene understanding
Abstract: Recently, Discriminative Correlation Filters (DCF) have achieved enormous popularity in the tracking community due to their high efficiency and fair robustness. Exploiting the circular structure, DCFs transform the computationally expensive spatial correlation into efficient element-wise operations in the Fourier domain. In this paper, we argue that this element-wise solution can be derived only in the case of single-channel features. For tracking with multi-channel features, the element-wise solution trains each feature dimension independently and fails to learn a joint correlation filter. To tackle this problem, we propose a rigorous solution to closed-form correlation filter tracking. This rigorous solution can be computed pixel by pixel from small linear equation systems. Experimental results demonstrate that our rigorous pixel-wise solution achieves better tracking performance than the baseline element-wise solution.
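The contrast between the two solutions can be sketched in numpy: the well-known element-wise single-channel solution, and a per-bin joint solve over channels. This is the standard ridge-regression derivation; whether it matches the paper's exact formulation is an assumption:

```python
import numpy as np

# Single-channel case: ridge regression with circulant samples has a
# closed-form, element-wise solution in the Fourier domain.
x = np.random.rand(64, 64)           # template patch
y = np.random.rand(64, 64)           # desired (e.g. Gaussian) response
lam = 1e-2
X, Y = np.fft.fft2(x), np.fft.fft2(y)
H = np.conj(X) * Y / (np.conj(X) * X + lam)      # filter, element-wise

# Multi-channel case: each Fourier bin couples the channels and requires
# a small C x C linear solve instead of an element-wise division.
C = 3
Xc = np.fft.fft2(np.random.rand(C, 64, 64), axes=(-2, -1))
Hc = np.zeros_like(Xc)
for u in range(64):
    for v in range(64):
        xv = Xc[:, u, v]                                 # C channel values
        A = np.outer(np.conj(xv), xv) + lam * np.eye(C)  # C x C system
        Hc[:, u, v] = np.linalg.solve(A, np.conj(xv) * Y[u, v])
```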
|
|
15:00-17:00, Paper WePMP.76 | |
Incremental 3D Line Segment Extraction from Semi-Dense SLAM |
He, Shida | Univ. of Alberta |
Qin, Xuebin | Univ. of Alberta |
Zhang, Zichen | Univ. of Alberta |
Jagersand, Martin | Univ. of Alberta |
Keywords: 3D reconstruction, Multiple view geometry, Image based modeling
Abstract: Although semi-dense Simultaneous Localization and Mapping (SLAM) has become more popular over the last few years, there is a lack of efficient methods for representing and processing the large-scale point clouds it produces. In this paper, we propose using 3D line segments to simplify the point clouds generated by semi-dense SLAM. Specifically, we present a novel incremental approach for 3D line segment extraction. This approach reduces a 3D line segment fitting problem to two 2D line segment fitting problems and takes advantage of both images and depth maps. In our method, 3D line segments are fitted incrementally along detected edge segments by minimizing fitting errors on two planes. By clustering the detected line segments, the resulting 3D representation of the scene achieves a good balance between compactness and completeness. Our experimental results show that the 3D line segments generated by our method are highly accurate. As an application, we demonstrate that these line segments greatly improve the quality of 3D surface reconstruction compared to a feature-point based baseline.
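A minimal sketch of reducing a 3D line fit to two 2D total-least-squares fits, as a hypothetical simplification of the incremental method described above (the incremental update and edge-segment tracking are omitted):

```python
import numpy as np

def fit_line_2d(pts):
    """Total-least-squares 2D line fit: returns a point on the line and
    its unit direction (dominant right singular vector)."""
    mean = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - mean)
    return mean, vt[0]

def fit_line_3d_from_two_planes(uv, depth):
    """Illustrative reduction of a 3D fit to two 2D fits: one fit in the
    image plane (u, v) and one in the (arc-length, depth) plane, echoing
    the idea of minimizing errors on two planes.

    uv    : (N, 2) pixel coordinates along a detected edge segment
    depth : (N,) depths of the same points
    """
    p0, d = fit_line_2d(uv)
    t = (uv - p0) @ d                           # 1D position along the 2D line
    q0, e = fit_line_2d(np.column_stack([t, depth]))
    return (p0, d), (q0, e)                     # the two plane fits
```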
|
|
15:00-17:00, Paper WePMP.77 | |
Robust Locality-Constrained Label Consistent KSVD by Joint Sparse Embedding |
Zhang, Zhao | Soochow Univ |
Jiang, Weiming | Soochow Univ. the School of Computer Science and Tech |
Li, Sheng | Nanjing Univ. of Posts and Telecommunications |
Qin, Jie | ETH Zurich |
Liu, Guangcan | Cornell |
Yan, Shuicheng | National Univ. of Singapore |
Keywords: Learning-based vision, Image and video coding, Classification
Abstract: We propose a robust Embedded Locality-Constrained Label Consistent Dictionary Learning (ELC2DL) framework for discriminative classification. ELC2DL improves representation and classification performance by performing DL in a noise-removed sparse embedding space, since most real data contain noise and performing DL over noisy data for reconstruction may potentially decrease performance. To reduce the noise in the data, our model jointly computes a sparse projection for noise reduction and then uses the noise-removed data for DL. By incorporating a noise-reduction term with a discriminative locality-constrained label consistent term, which associates label information with each dictionary atom to preserve the local structure of the training data, a noise-reduction projection, an over-complete dictionary and discriminative sparse codes are obtained jointly. Simulations on several image databases show that our algorithm delivers enhanced performance over other state-of-the-art methods.
|
|
15:00-17:00, Paper WePMP.78 | |
Perceptual Face Completion Using a Local-Global Generative Adversarial Network |
Ma, Ruijun | Sun Yat-Sen Univ |
Hu, Haifeng | Sun Yat-Sen Univ |
Keywords: Mid-level vision, Learning-based vision, Inpainting
Abstract: Face completion is one of the most challenging problems, as the reconstruction algorithm must render the missing pixels with semantically plausible contents. Recent methods have achieved promising advances in photorealistic human face synthesis. However, these approaches are limited when completing highly-structured images because they cannot faithfully preserve identity information during training. In this paper, we propose a Two-Pathway Perceptual Generative Adversarial Network (TPP-GAN) for face completion that learns identity-preserving representations from both the global structure and the local details of a face. We combine a reconstruction network and a perceptual network containing two adversarial pathways (local and global) in our framework to ensure that the features prominent for identity classification are transferred to the occluded parts, encouraging a high identity-preserving quality in the synthesized results. Experimental results demonstrate that our proposed framework not only generates locally semantic as well as globally consistent fragments, but also outperforms existing methods on unaligned faces and on the synthesis of part components.
|
|
15:00-17:00, Paper WePMP.79 | |
Joint Identification-Verification Model for Visual Tracking |
Wu, Min | Air Force Engineering Univ |
Zha, Yufei | Air Force Engineering Univ |
Zhang, Yuanqinag | Air Force Engineering Univ |
Ku, Tao | Air Force Engineering Univ |
Zhang, Lichao | Air Force Engineering Univ |
Chen, Bin | Air Force Engineering Univ |
Keywords: Motion and tracking, Deep learning, Applications of pattern recognition and machine learning
Abstract: Similarity-based tracking algorithms determine the location of the target by the similarity between the template and the candidates: the candidate most similar to the template is considered the target. Most trackers only make use of the intra-class similarity, while the inter-class separability is ignored. In this paper, a joint identification-verification model is proposed to learn similarity together with category attribution for visual tracking. The approach constructs a cost function based on both the inter-class category and the intra-class similarity. Then, the training dataset is fed into the network, and discriminative features are learned in the embedding space. During tracking, the template and candidates are fed into the network simultaneously, and in the learned embedding space the object can be located correctly by the similarity metric between the template and candidates. We evaluate the proposed approach on the OTB50 and UAV123 tracking benchmarks. Extensive experimental results show that the inter-class category effectively increases the discrimination against similar distractors and boosts the performance of trackers based on similarity learning.
|
|
15:00-17:00, Paper WePMP.80 | |
Incremental Kernel Null Foley-Sammon Transform for Person Re-Identification |
Huang, Xinyu | Aviation Univ. of Airforce |
Xu, Jiaolong | Computer Vision Center |
Guo, Gang | Aviation Univ. of Airforce |
Keywords: Object recognition, Online learning, Visual surveillance
Abstract: Person re-identification (Re-ID) is an important technique for video surveillance and security systems. Most existing Re-ID methods assume a fixed size of training data. Given newly collected training data, these models have to be re-trained from scratch with both new and old data, which is time-consuming. Accelerating training with ever-increasing data is therefore desirable and critical for Re-ID. In this work, we propose to apply incremental learning to address this problem. We build the Re-ID model on the null Foley-Sammon transform (NFST) method. The idea is to extract new information from newly-added samples and integrate it with the existing NFST-trained model through an efficient updating scheme. We derive the incremental learning algorithm for both the non-kernelized and kernelized versions of NFST. Extensive experiments have been carried out on three public datasets, including VIPeR, PRID2011 and CUHK01. The results show that our proposed method achieves accuracy comparable to the batch learning method while significantly reducing the computational complexity.
|
|
15:00-17:00, Paper WePMP.81 | |
Simultaneous Context Feature Learning and Hashing for Large Scale Loop Closure Detection |
Fu, Zhiheng | Coll. of Electronic Science, National Univ. of Defense Te |
Guo, Yulan | National Univ. of Defense Tech |
An, Wei | National Univ. of Defense Tech |
Keywords: Vision for robotics, Deep learning, Clustering
Abstract: Visual loop closure is important for pose tracking and relocalization in many robotics and Augmented Reality (AR) systems. For large and highly repetitive environments, sparse keypoint-based methods face several challenges, especially regarding the discriminability of descriptors. In this paper, we propose an augmented descriptor that combines the ORB feature with a context descriptor to increase its discriminability and matching performance. An end-to-end network is adopted to perform simultaneous feature learning and code hashing for the context. In addition, feature position clustering is used to reduce the number of contexts, and hash mapping is adopted to reduce the dimensionality of the ORB features. Finally, the context descriptors and the dimensionality-reduced ORB features are stacked. Experimental results on the NewCollege and TUM datasets demonstrate that our algorithm achieves higher precision/recall and faster speed than the original algorithm proposed by Antonio et al. [1].
|
|
15:00-17:00, Paper WePMP.82 | |
From Text to Video: Exploiting Mid-Level Semantics for Large-Scale Video Classification |
Zhang, Ji | Inst. of Artificial Intelligence and Robotics, Xi'an Jiaoton |
Mei, Kuizhi | Inst. of Artificial Intelligence and Robotics, Xi'an Jiaoton |
Wang, Xiao | Inst. of Artificial Intelligence and Robotics, Xi'an Jiaoton |
Zheng, Yu | Xidian Univ |
Fan, Jianping | Univ. of North Carolina - Charlotte |
Keywords: Video analysis, Classification, Video processing and analysis
Abstract: Automatically classifying large-scale video data is an urgent yet challenging task. To bridge the semantic gap between low-level features and high-level video semantics, we propose a method to represent videos by their mid-level semantics. Inspired by the problem of text classification, we regard the visual objects in videos as the words in documents, and adapt the TF-IDF word weighting method to encode videos by their visual objects. Some extensions of the proposed method are also made according to the characteristics of videos. We integrate the proposed semantic encoding method with the popular two-stream CNN model for video classification. Experiments are conducted on two large-scale video datasets, CCV and ActivityNet. The experimental results validate the effectiveness of our method.
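A minimal sketch of the TF-IDF encoding over detected visual objects, using sklearn and toy counts (the paper's video-specific extensions are not reproduced):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Rows: videos; columns: visual-object "words" (counts of how often each
# detected object class appears across the sampled frames of a video).
counts = np.array([[12, 0, 3, 1],
                   [0,  7, 0, 5],
                   [4,  1, 9, 0]])

# TF-IDF re-weights the counts so objects occurring in every video
# contribute less than distinctive ones, exactly as in text retrieval.
video_repr = TfidfTransformer().fit_transform(counts).toarray()
print(np.round(video_repr, 3))
```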
|
|
15:00-17:00, Paper WePMP.83 | |
Occlusion Handling Human Detection with Refocused Images |
Kataoka, Hirokatsu | National Inst. of Advanced Industrial Science and Tech |
Shuhei, Ohki | AIST, Tsukuba Univ |
Iwata, Kenji | National Inst. of Advanced Industrial Science and Tech |
Satoh, Yutaka | National Inst. of Advanced Industrial Science and Tech |
Keywords: Vision sensors, Object detection, Object recognition
Abstract: The paper presents a novel robust human detection method based on a camera array system, broadening the application range of human detection. Currently, even with a deep neural network (DNN), it is difficult to detect a heavily occluded human. With the camera array system, we consider how to distinctly reveal a human occluded by environmental conditions. The refocused images generated by the camera array system allow us to remove the effect of such noise. Although refocused images have not been utilized in conventional human detection, we believe they are beneficial for improving detection performance, especially under severe conditions. For the experiments, we collected the Refocused Human DataBase (RHDB) with the camera array system. Compared with HOG+SVM on a monocular camera (a near-chance rate of 54.8%), the refocused images yielded a +10.1% improvement (64.9%) by making the human noticeably visible. The combined representation of refocused images and AlexNet achieved 94.6% on the RHDB. Moreover, our final model recorded 98.0% with an attention layer and fine-tuned parameters.
|
|
15:00-17:00, Paper WePMP.84 | |
Fourier Transform Based Features for Clean and Polluted Water Image Classification |
Wu, Xuerong | Nanjing Univ |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Zhu, Liping | Nanjing Univ |
Zhang, Hualu | NARI GROUP Corp. GRID ELECTRIC POWER Res. Inst |
Shi, Jie | NARI GROUP Corp. GRID ELECTRIC POWER Res. Inst |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Pal, Umapada | Indian Statistical Inst |
Blumenstein, Michael | Univ. of Tech. Sydney |
Keywords: Image classification, Applications of computer vision
Abstract: Water image classification is challenging because water images of oceans or rivers share the same properties as images of polluted water containing fungus, waste and rubbish. In this paper, we present a method for classifying clean and polluted water images. The proposed method explores Fourier transform based features for extracting the texture properties of clean and polluted water images. The Fourier spectrum of each input image is divided into several sub-regions based on angle and spatial information. For each region of the spectrum, the proposed method extracts mean and variance features from the intensity values, which results in a feature matrix. The feature matrix is then passed to an SVM classifier for the classification of clean and polluted water images. Experimental results on classes of clean and polluted water images show that the proposed method is effective. Furthermore, a comparative study with the state-of-the-art method shows that the proposed method outperforms the existing method in terms of classification rate, recall, precision and F-measure.
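A minimal numpy sketch of region-wise mean/variance features over the Fourier spectrum; the exact angular/radial partitioning used in the paper is an assumption:

```python
import numpy as np

def spectrum_features(img, n_angles=8, n_radii=4):
    """Mean/variance of the Fourier magnitude spectrum over sub-regions
    defined by angle and radius. `img` is a 2D grayscale array."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ang = np.arctan2(yy - h / 2.0, xx - w / 2.0)
    rad = np.hypot(yy - h / 2.0, xx - w / 2.0)
    # Quantise every frequency bin into an (angle, radius) sector.
    ai = np.minimum((ang + np.pi) / (2 * np.pi / n_angles), n_angles - 1).astype(int)
    ri = np.minimum(rad / (rad.max() / n_radii), n_radii - 1).astype(int)
    feats = []
    for a in range(n_angles):
        for r in range(n_radii):
            region = mag[(ai == a) & (ri == r)]
            feats += [region.mean(), region.var()]
    return np.array(feats)  # one row of the feature matrix for the SVM

features = spectrum_features(np.random.rand(64, 64))
```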
|
|
15:00-17:00, Paper WePMP.85 | |
A Robust and Efficient Method for License Plate Recognition |
Meng, Ajin | Univ. of Science and Tech. of China |
Yang, Wei | Univ. of Science and Tech. of China |
Xu, Zhenbo | Univ. of Science and Tech. in China |
Huang, Huan | Xingtai Financial Holdings Group Co., Ltd |
Huang, Liusheng | Univ. of Science and Tech. of China |
Ying, Changchun | Xingtai Financial Holdings Group Co., Ltd |
Keywords: Object detection, Object recognition, Image processing and analysis
Abstract: License plate recognition is an essential step in automatic license plate recognition systems, since it is the key technology for recognizing detected license plates. Though there is extensive research on license plate recognition, it is still challenging to recognize license plates under conditions such as large tilt angles, uneven illumination, and distortion. Based on the observation that accurate shape correction can significantly improve recognition accuracy on such images, this paper proposes a robust methodology named LCR for license plate recognition that is free of conventional image analysis operations. The approach is based on three neural networks with three different purposes: (i) predicting the locations of the four vertices; (ii) predicting cutting locations; (iii) character classification. To the best of our knowledge, LCR is the first to address shape correction by designing neural networks that accurately predict the coordinates of license plate vertices. Experiments on over 250,000 unique images show that LCR significantly outperforms several state-of-the-art license plate recognition approaches. Moreover, in our evaluations, the application of shape correction significantly improves the recognition accuracy.
|
|
15:00-17:00, Paper WePMP.86 | |
Object-Adaptive LSTM Network for Visual Tracking |
Du, Yihan | Xiamen Univ |
Yan, Yan | Xiamen Univ |
Chen, Si | Xiamen Univ. of Tech |
Hua, Yang | Queen’s Univ. Belfast |
Wang, Hanzi | Xiamen Univ |
Keywords: Motion and tracking, Video analysis, Deep learning
Abstract: Convolutional Neural Networks (CNNs) have shown outstanding performance in visual object tracking. However, most classification-based tracking methods using CNNs are time-consuming due to the expensive computation of complex online fine-tuning and massive feature extraction. Besides, these methods suffer from over-fitting, since the training and testing stages of CNN models are based on videos from the same domain. Recently, matching-based tracking methods (such as Siamese networks) have shown remarkable speed superiority, but they cannot well handle target appearance variations and complex scenes due to an inherent lack of online adaptability and background information. In this paper, we propose a novel object-adaptive LSTM network, which can effectively exploit sequence dependencies and dynamically adapt to temporal object variations by constructing an intrinsic model of object appearance and motion. In addition, we develop an efficient strategy for proposal selection, where densely sampled proposals are first pre-evaluated using the fast matching-based method and then the well-selected high-quality proposals are fed to the sequence-specific LSTM network. This strategy enables our method to adaptively track an arbitrary object and operate faster than conventional CNN-based classification tracking methods. To the best of our knowledge, this is the first work to apply an LSTM network for classification in visual object tracking. Experimental results on the OTB and TC-128 benchmarks show that the proposed method achieves state-of-the-art performance, which exhibits the great potential of recurrent structures for visual object tracking.
|
|
15:00-17:00, Paper WePMP.87 | |
Context-Aware and Depthwise-Based Detection on Orbit for Remote Sensing Image |
Fu, Yanmei | Inst. of Software Chinese Acad. of Sciences |
Wu, Fengge | Inst. of Software Chinese Acad. of Sciences |
Zhao, Junsuo | Inst. of Software Chinese Acad. of Sciences |
Keywords: Object detection, Image processing and analysis, Deep learning
Abstract: Automatic detection on orbit is an efficient way to filter out useless data before it is downloaded to the ground. However, on-orbit detection is a challenging task due to the limited computational resources on a satellite. In this paper, a context-aware and depthwise-based detection framework for remote sensing images is proposed that can be used on orbit. Because of the limited computational resources on the satellite, on-orbit object detection should run with low memory cost and at fast speed while maintaining accuracy. To keep the model small during feature extraction, depthwise convolutions are applied instead of standard convolutions. In this light, a small deep neural network is built to run on orbit, using the Single Shot Multibox Detector (SSD) as the basic detection module. Motivated by its weak performance on remote sensing images, owing to the few pixels covering each target object, context information about the target object is added to improve performance. To further investigate the influence of the context information, we add a balance factor that trades off the context information against the background noise it brings. Then an experiment on a real remote sensing image dataset is conducted, comparing our extended model with other current state-of-the-art detection models. Results show that our extended model outperforms the other models in accuracy and speed. Deploying the pre-trained model on the Android platform with only 60M of memory confirms the feasibility of on-orbit detection. This detection system is to be verified on the TZ-1 satellite, which will be launched in 2018.
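A minimal PyTorch sketch of the depthwise separable convolution that replaces the standard convolution, illustrating the parameter savings (layer sizes are illustrative, not the paper's network):

```python
import torch
import torch.nn as nn

def depthwise_separable(c_in, c_out, k=3):
    """Per-channel k x k depthwise filter (groups=c_in) followed by a
    1x1 pointwise mix. For c_in = c_out = 64 and k = 3 this needs
    64*9 + 64*64 parameters instead of 64*64*9 for a standard conv,
    roughly an 8x reduction, which is what makes a small on-orbit
    model feasible."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
    )

x = torch.randn(1, 64, 32, 32)
print(depthwise_separable(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```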
|
|
15:00-17:00, Paper WePMP.88 | |
Which Content in a Booklet Is He/she Reading? Reading Content Estimation Using an Indoor Surveillance Camera |
Kawanishi, Yasutomo | Nagoya Univ |
Murase, Hiroshi | Nagoya Univ |
Xu, Jianfeng | KDDI Res. Inc |
Tasaka, Kazuyuki | KDDI Res. Inc |
Yanagihara, Hiromasa | KDDI Res. Inc |
Keywords: Behavior recognition, Visual surveillance, Image classification
Abstract: In this paper, we propose a method for estimating the content someone is reading in a booklet, using an image captured by an indoor surveillance camera. Here, we assume that the reading content can be specified by estimating the following: which booklet, which page of the booklet, and which region of the page. We propose a reading booklet/page estimation method based on image search, and a reading region estimation method focusing on the body pose of the reader. We evaluated the method as a 44-class classification problem consisting of eleven booklet pages with four regions on each page, and achieved an accuracy of 25.6% for reading content estimation.
|
|
15:00-17:00, Paper WePMP.89 | |
Hybrid Sparse Subspace Clustering for Visual Tracking |
Ma, Lin | Samsung Company |
Liu, Zhihua | Samsung |
Keywords: Motion and tracking, Clustering, Object detection
Abstract: In many conditions, object samples are distributed across a number of different subspaces. By segmenting the subspaces with spectral-clustering-based subspace clustering, a more accurate sample distribution is obtained. The LSR (Least Squares Regression) sparse subspace clustering method, which fulfills the EBD (Enforced Block Diagonal) criterion and has a closed-form solution, is an important spectral-clustering-based sparse subspace clustering method. However, LSR uses no discriminative information, which is important for discriminating positive samples from negative samples. Thus, we propose a new hybrid sparse subspace clustering method which makes the clustering discriminative by incorporating the discriminative information provided by graph embedding into LSR. The sub-subspaces obtained with the new subspace clustering method both retain the object distribution information and make the object samples less easily confused with the surrounding environment. Experimental results on a set of challenging videos for visual tracking demonstrate the effectiveness of our method in discriminating the object from the background.
|
|
15:00-17:00, Paper WePMP.90 | |
A Co-Occurrence Background Model with Hypothesis on Degradation Modification for Object Detection in Strong Background Changes |
Zhou, Wenjun | Graduate School of Information Science and Tech. Hokkaido |
Kaneko, Shun'ichi | Hokkaido Univ |
Hashimoto, Manabu | Chukyo Univ |
Satoh, Yutaka | National Inst. of Advanced Industrial Science and Tech |
Liang, Dong | Nanjing Univ. of Aeronautics and Astronautics |
Keywords: Object detection, Video analysis, Low-level vision
Abstract: Object detection has become an indispensable part of video processing, yet current background models are sensitive to background changes. In this paper, we propose a novel background model using an algorithm called Co-occurrence Pixel-Block Pairs (CPB) that is robust against background changes, such as illumination changes and background motion. We utilize the co-occurrence "pixel to block" structure to extract the spatial-temporal information of each pixel to build the background model, and then employ an efficient evaluation strategy, named the correlation-dependent decision function, to identify the current state of each pixel. Furthermore, we introduce a Hypothesis on Degradation Modification (HoD) into the CPB structure to reinforce its robustness. Experimental results on the PETS 2001, AIST-Indoor, SBMnet and CDW-2012 datasets show that our models can detect objects robustly under strong background changes.
|
|
15:00-17:00, Paper WePMP.91 | |
High-Quality and Memory-Efficient Volumetric Integration of Depth Maps Using Plane Priors |
Liu, YangDong | National Lab. of Pattern Recognition, Inst. of Automat |
Gao, Wei | Inst. of Automation, Chinese Acad. of Sciences |
Hu, Zhanyi | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: 3D reconstruction, Image based modeling
Abstract: Volumetric integration is widely used to fuse depth maps in dense 3D reconstruction systems, but a high memory footprint is one of its main disadvantages. We introduce a method to denoise depth maps and reduce memory usage during the volumetric integration of depth maps with the use of plane priors. We develop a new planar region detection method based on depth gradients and then denoise the planar regions of the depth maps. During volumetric integration, we also allocate voxels and integrate depth maps with the use of plane priors. Extensive experiments show that our method saves approximately 30% of the memory footprint and achieves higher reconstruction quality compared with some current state-of-the-art systems. These characteristics enable our method to be used for 3D scanning on mobile devices, which have limited memory resources.
|
|
15:00-17:00, Paper WePMP.92 | |
Appearance Variation Insensitive State Regression for Visual Tracking |
Ma, Lin | Samsung Company |
Liu, Zhihua | Samsung |
Keywords: Motion and tracking, Regression, Object detection
Abstract: In visual tracking, many methods first sample a set of candidate states and then select the optimal state with the best evaluation value. In this way, tracking avoids becoming trapped in local optima. However, the obtained state is not accurate when the appearance undergoes large challenges or the sample number is small, while the prediction information provided by the surrounding candidates is useful for improving the robustness of state determination. Thus, in this paper we propose a new object localization method which infers the object state by regression over the surrounding states. By acquiring the state weights according to two constraints, i.e., a constraint representing the confidence of a single state and a constraint that similar states should have similar weights, the sensitivity to appearance variation in state regression is reduced. Experimental results on a set of benchmark videos demonstrate the robustness of the proposed method.
|
|
15:00-17:00, Paper WePMP.93 | |
Online Temporal Calibration of Camera and IMU Using Nonlinear Optimization |
Liu, Jinxu | Inst. of Automation, Chinese Acad. of Sciences |
Gao, Wei | Inst. of Automation, Chinese Acad. of Sciences |
Hu, Zhanyi | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Vision for robotics, Motion and tracking
Abstract: In this paper, we aim to calibrate the time delay between the timestamps of camera frames and IMU measurements on Android smartphones or other low-cost devices whose camera and IMU are not temporally aligned. The time delay is estimated online in an iterative way through nonlinear optimization over sliding windows. We add new terms that depend on the time delay to the pre-integration results of the IMU measurements rather than to the feature observations, in order to improve the precision of the temporal calibration. The experimental results indicate that our calibration result is closer to the real value than that of the state-of-the-art system and that our method converges faster. With our temporal calibration, the visual-inertial odometry algorithm is less likely to suffer from fast turns or sudden stops.
|
|
15:00-17:00, Paper WePMP.94 | |
Local Regression Based Hourglass Network for Hand Pose Estimation from a Single Depth Image |
Li, Jia | Univ. of Science and Tech. of China |
Wang, Zengfu | Univ. of Science and Tech. of China |
Keywords: Mid-level vision, Deep learning, Pattern recognition for human computer interaction
Abstract: Hand pose estimation plays an important role in many applications such as human-computer interaction. With the advent of commodity depth sensors and developments in deep learning, noticeable improvements have been made in this field recently. Nevertheless, the accuracy and robustness of existing approaches are still unsatisfying. In this paper, we propose an end-to-end local regression based hourglass network with a modified loss function to estimate the 3D pose of the hand in a depth image. We use a third-order hourglass block to extract features of the hand. At the top of our network, we slice the feature map into several regions and first regress each region independently. Then we merge the regression results and feed them to the final regressor. Besides, we compare the performance of different loss functions for the task. The results indicate that the structure of the network and the loss function designed here lead to an obvious improvement, and the proposed approach is comparable or superior to the state-of-the-art on a challenging public dataset. Our system runs at over 910 FPS on a single GPU, and the mean estimation error is reduced to 12.36 mm.
|
|
15:00-17:00, Paper WePMP.95 | |
Action Recognition Method Based on Sets of Time Warped ARMA Models |
Sogi, Naoya | Univ. of Tsukuba |
Fukui, Kazuhiro | Univ. of Tsukuba |
Keywords: Behavior recognition, Classification, Sequence modeling
Abstract: In this paper, we propose a novel method for recognizing human actions from sequential body skeleton data. Our method is based on the ARMA (Autoregressive Moving Average) model, which is constructed from the matrix of 3D joint position time-series. The intrinsic structure of an action can be compactly summarized by the observability matrix of the ARMA model. Since the column vectors of an observability matrix span a subspace, given two ARMA models we can measure the similarity between them by the canonical angles between the corresponding subspaces. This framework based on subspace representation is useful for action recognition. However, it does not work well when handling various actions with different speeds, since the optimal row size of each observability matrix depends on the action speed. To address this limitation, we randomly sample the row elements of each observability matrix while preserving the order of the elements. By repeating this operation, we generate a set of time-warped ARMA models with various local motion speeds. The essence of this idea is that the whole set of such time-warped ARMA models is invariant to changes in action speed. Furthermore, to construct an effective classification framework, we apply Grassmann discriminant analysis to the time-warped ARMA models. The effectiveness of the proposed method is demonstrated through comparison experiments with state-of-the-art methods on two public datasets: the MSR 3D action dataset and the UT-Kinect dataset.
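A minimal numpy sketch of the subspace similarity via canonical angles (matrix sizes are illustrative; the ARMA construction and the random-sampling step for time warping are omitted):

```python
import numpy as np

def subspace_similarity(A, B):
    """Similarity between the column spans of two observability matrices
    via canonical angles: orthonormalise each span, then the singular
    values of Q_a^T Q_b are the cosines of the canonical angles."""
    qa, _ = np.linalg.qr(A)
    qb, _ = np.linalg.qr(B)
    cos = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.mean(np.clip(cos, 0.0, 1.0) ** 2)  # mean squared cosine

# Observability-style matrices stacked from 3D joint time-series
# (dimensions are illustrative).
A = np.random.rand(60, 5)
B = np.random.rand(60, 5)
print(subspace_similarity(A, B))
```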
|
|
15:00-17:00, Paper WePMP.96 | |
Multi-Spectral Fusion and Denoising of RGB and NIR Images Using Multi-Scale Wavelet Analysis |
Jung, Cheolkon | Xidian Univ |
Su, Haonan | Xidian Univ |
Keywords: Computational photography, Enhancement, restoration and filtering, Image processing and analysis
Abstract: In this paper, we propose multi-spectral fusion and denoising (MFD) of RGB and NIR images using multi-scale wavelet analysis. We formulate MFD of RGB and NIR images as a maximum a posteriori (MAP) estimation problem in the wavelet domain. The direct fusion of noisy RGB and NIR images often leads to contrast attenuation due to the discrepancy between the two modalities. Thus, we generate a wavelet scale map for fusion and denoising based on the correlation between NIR and RGB wavelet coefficients. To account for the local contrast and visibility of the NIR data on the RGB components, we introduce a contrast preservation term for scale map estimation based on local contrast and visibility. We use a regularization term to select NIR wavelet coefficients of high visibility and contrast in the scale map. Since noise generally appears in the high frequency band, we use the gradients of the NIR wavelet coefficients as weights for weighted least squares (WLS) smoothing of the scale map. Based on the wavelet scale map, we perform fusion and denoising of the RGB and NIR wavelet coefficients. Experimental results show that the proposed method successfully performs fusion of RGB and NIR images with noise reduction and detail preservation, and outperforms the state-of-the-art in terms of discrete entropy (DE) and the feature-based blind image quality evaluator (FBIQE).
|
|
15:00-17:00, Paper WePMP.97 | |
Visual Localization in Changing Environments Using Place Recognition Techniques |
Xin, Zhe | Inst. of Automation, Chinese Acad. of Sciences |
Cai, Yinghao | Chinese Acad. of Sciences |
Cai, Shaojun | UISEE Tech. Beijing Co., Ltd |
Zhang, Jixiang | Inst. of Automation, Chinese Acad. of Sciences |
Yang, Yiping | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Yanqing | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Vision for robotics, Applications of computer vision, Neural networks
Abstract: This paper proposes a visual localization system combining Convolutional Neural Networks (CNNs) and sparse point features to estimate the 6-DOF pose of a robot. The challenge of visual localization across time lies in the fact that the same place captured at different times appears dramatically different due to varying illumination and weather conditions, viewpoint variations and dynamic objects. In this paper, a novel CNN-based place recognition approach is proposed which requires no time-consuming feature generation process and no task-specific training. Moreover, we demonstrate that the rich semantic context information obtained from place recognition can greatly improve the subsequent feature matching process for pose estimation. The semantic constraint performs much better than traditional Bag-of-Words based methods at establishing correspondences between the query image and the map. To evaluate the robustness of the algorithm, the proposed system is integrated into ORB-SLAM2 and verified on data collected under various illumination and weather conditions. Extensive experimental results show that even with weak ORB descriptors, the proposed system can significantly improve the success rate of localization under severe appearance changes.
|
|
15:00-17:00, Paper WePMP.98 | |
Probabilistic Voting for Sequence Based Visual Place Recognition |
Xin, Zhe | Inst. of Automation, Chinese Acad. of Sciences |
Cai, Yinghao | Chinese Acad. of Sciences |
Zhang, Jixiang | Inst. of Automation, Chinese Acad. of Sciences |
Yang, Yiping | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Yanqing | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Vision for robotics, Applications of computer vision
Abstract: Visual place recognition is the task of recognizing the place of a query image within a set of dataset images. It is a challenging problem in computer vision due to frequent and unpredictable environmental changes. In this paper, a novel approach is proposed that treats visual place recognition as a probabilistic voting problem over coherent image sequences. According to the co-visibility relationships of images in the dataset, each query can be represented by a categorical variable; the whole sequence is therefore a collection of independent, non-identically distributed categorical variables. Introducing the probabilistic framework not only removes the need for heuristic parameters but also recognizes locations efficiently and effectively. Two widely used datasets are used to evaluate the performance of the proposed method. The probabilistic voting algorithm achieves superior performance compared with state-of-the-art methods and satisfies real-time requirements.
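A minimal sketch of combining per-frame categorical distributions over a sequence by voting (toy numbers; the co-visibility construction and the paper's exact combination rule are omitted):

```python
import numpy as np

def sequence_vote(frame_probs):
    """Combine per-frame categorical place distributions over a sequence
    by summing log-probabilities (independent, non-identical categorical
    variables), then pick the most probable place."""
    log_p = np.log(np.asarray(frame_probs) + 1e-12)
    return int(np.argmax(log_p.sum(axis=0)))

# Three frames voting over four candidate places (toy numbers).
votes = [[0.5, 0.2, 0.2, 0.1],
         [0.4, 0.3, 0.2, 0.1],
         [0.6, 0.1, 0.2, 0.1]]
print(sequence_vote(votes))  # 0
```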
|
|
15:00-17:00, Paper WePMP.99 | |
Convolutional Features-Based CRF Graph Matching for Tracking of Densely Packed Cells |
Qian, Weili | Hunan Univ |
Wei, Yangliu | Hunan Univ |
Wang, Xueping | Hunan Univ |
Liu, Min | Hunan Univ |
Keywords: Motion and tracking, Biological image and signal analysis, Deep learning
Abstract: The tracking of plant cells across large-scale microscopy image sequences is very challenging, because plant cells are densely packed in a specific honeycomb structure, and the microscopy images can be randomly translated, rotated and scaled during imaging. This paper proposes a convolutional-features-based conditional random field (CRF) graph matching method to track plant cells in unregistered image sequences, exploiting deep features extracted by deep convolutional neural networks and the tight spatial topology of neighboring cells as contextual information. Because the extracted convolutional features and spatial topology features are resilient to image translation, rotation and scaling, the proposed CRF matching approach is able to track plant cells across unregistered image sequences. Compared with other plant cell tracking methods, the experimental results show that the proposed method improves the tracking accuracy rate by about 30% on unregistered cell image sequences.
|
|
15:00-17:00, Paper WePMP.100 | |
SPCNet: Scale Position Correlation Network for End-To-End Visual Tracking |
Wang, Qiang | Inst. of Automation, Chinese Acad. of Sciences, Beijing, C |
Gao, Jin | Inst. of Automation Chinese Acad. of Sciences |
Zhang, Mengdan | Chinese Acad. of Sciences |
Xing, Junliang | Inst. of Automation, Chinese Acad. of Sciences |
Hu, Weiming | National Lab. of Pattern Recognition, Inst |
Keywords: Motion and tracking
Abstract: We present a novel Scale Position Correlation Network (SPCNet) for learning to track objects robustly and efficiently. Different from most previous Correlation Filter (CF) based tracking models, SPCNet unifies feature representation learning and CF based appearance modeling within one end-to-end learnable framework. In particular, SPCNet learns to track objects within a joint scale-position space and is very effective in learning features for the accurate prediction of object scale and position. To learn our model end to end, SPCNet introduces a differentiable correlation filter layer into a Siamese architecture. Therefore, the localization error can be effectively back-propagated through the whole network, enabling fast adaptation of feature learning and appearance modeling to the objects being tracked. Such task-driven feature learning admits a very lightweight design that can be efficiently pre-trained. In addition, the dense appearance modeling in the joint scale-position space is also efficient, benefiting from the computation of gradients within the Fourier frequency domain. Such careful architecture design ensures that SPCNet is effective and efficient with a small model size. Extensive experimental analyses and evaluations on three large benchmarks, OTB-2013, OTB-2015, and VOT2015, demonstrate its superiority over many state-of-the-art algorithms.
|
|
15:00-17:00, Paper WePMP.101 | |
Online Multi-Target Tracking with Tensor-Based High-Order Graph Matching |
Zhou, Zongwei | Inst. of Automation, Chinese Acad. of Sciences |
Xing, Junliang | Inst. of Automation, Chinese Acad. of Sciences |
Zhang, Mengdan | Chinese Acad. of Sciences |
Hu, Weiming | National Lab. of Pattern Recognition, Inst |
Keywords: Motion and tracking, Applications of computer vision
Abstract: In this paper we formulate multi-target tracking (MTT) as a high-order graph matching problem and propose an l1-norm tensor power iteration solution. Concretely, the search for trajectory-observation correspondences in the MTT task is cast as a hypergraph matching problem that maximizes a multilinear objective function over all permutations of the associations. This function is defined by a tensor representing the affinity between association tuples, in which pair-wise similarities, motion consistency and spatial structural information can be embedded conveniently. To solve the matching problem, a dual-direction unit l1-norm constrained tensor power iteration algorithm is proposed. Additionally, since measuring appearance affinity with features extracted from a rectangular patch, as most methods do, discriminates poorly when bounding boxes overlap heavily, we present a deep pair-wise appearance similarity metric based on object masks, in which only features from the true target region are utilized. Experimental evaluation shows that our approach achieves accuracy comparable to state-of-the-art online trackers. Our code will be made available soon.
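A hedged sketch of a unit l1-norm constrained tensor power iteration over a third-order affinity tensor is given below; the dense tensor, iteration count and simplex projection are simplifying assumptions, not the paper's dual-direction algorithm:

```python
import numpy as np

def l1_tensor_power_iteration(T, iters=50):
    """T: (n, n, n) affinity tensor over triples of association pairs;
    returns soft assignment scores over the n candidate pairs."""
    n = T.shape[0]
    v = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Multilinear product: v_i <- sum_jk T[i, j, k] * v_j * v_k
        v = np.einsum('ijk,j,k->i', T, v, v)
        v = np.maximum(v, 0)
        v /= v.sum() + 1e-12  # project back onto the unit l1 ball (simplex)
    return v
```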
|
|
15:00-17:00, Paper WePMP.102 | |
Dense Receptive Field for Object Detection |
Yao, Yongqiang | Beijing Univ. of Posts and Telecommunications |
Dong, Yuan | Beijing Univ. of Posts and Telecommunications |
Huang, Zesang | Beijing Univ. of Posts and Telecommunications |
Bai, Hongliang | Beijing Faceall Co.Ltd |
Keywords: Object detection, Deep learning, Neural networks
Abstract: Current one-stage single-shot detectors, such as DSSD and StairNet, which aggregate context information from multiple scales, have shown promising accuracy. However, existing multi-scale context fusion techniques are insufficient for detecting objects of different scales. In this paper, we investigate how to detect objects of different scales with a good accuracy-vs-speed trade-off. We propose a novel single-shot detector, called DRFNet, which fuses feature maps with different receptive field sizes to boost detection accuracy. Our final DRFNet detector effectively unifies comprehensive context information from various receptive fields, enabling it to detect objects of different sizes with higher accuracy. Experimental results on the PASCAL VOC 2007 benchmark (79.6% mAP, 68 FPS) demonstrate that DRFNet outperforms other state-of-the-art one-stage detectors with FPN-like feature fusion. Code will be made publicly available soon.
|
|
15:00-17:00, Paper WePMP.103 | |
Online Learning of Spatial-Temporal Convolution Response for Robust Real-Time Tracking |
Zhou, Jinglin | People's Public Security Univ. of China |
Wang, Rong | People's Public Security Univ. of China |
Ding, Jianwei | People's Public Security Univ. of China |
Keywords: Motion and tracking, Occlusion and shadow detection
Abstract: The challenges of generic visual tracking have attracted great attention. However, it is still difficult for most existing trackers to track objects accurately in real time. We propose a framework that integrates a verifying mechanism and a correcting mechanism to improve the accuracy of real-time tracking. Under online learning, the target location and the sample model are updated in parallel. Validation is carried out in every frame according to the spatial-temporal convolution response. Furthermore, the correcting mechanism is activated when the current tracking result is considered unreliable. Synchronously, an online target model updating strategy is constructed to filter out contributive samples, so that the sample model is updated confidently. The proposed tracker is evaluated on four popular benchmarks, achieving state-of-the-art performance while running at real-time speed.
|
|
15:00-17:00, Paper WePMP.104 | |
Partial Descriptor Update and Isolated Point Avoidance Based Template Update for High Frame Rate and Ultra-Low Delay Deformation Matching |
Xu, Yuhao | Waseda Univ |
Hu, Tingting | Waseda Univ |
Du, Songlin | Waseda Univ |
Ikenaga, Takeshi | Waseda Univ. Japan |
Keywords: Motion and tracking, Video processing and analysis, Applications of computer vision
Abstract: High frame rate and ultra-low delay matching systems play an important role in various human-machine interactive applications, which demand better performance in matching deformable and out-of-plane rotating objects. Although many algorithms have been proposed for deformation tracking and matching, few of them are suitable for hardware implementation due to complicated operations and large time consumption. This paper proposes a hardware-oriented template update method for a high frame rate and ultra-low delay deformation matching system. In the proposed method, the new template is generated in real time by partially updating the template descriptor and adding new keypoints simultaneously with the pixel-wise matching process, and incorrect boundary points are avoided when judged to be isolated by distance-reachability, solving the problem of template drift. Evaluation results indicate that the proposed method supports real-time processing of a 784 fps, 640×480 resolution system on a field-programmable gate array (FPGA) with a delay of 0.808 ms/frame, and achieves satisfactory deformation matching results in comparison with other general methods.
|
|
15:00-17:00, Paper WePMP.105 | |
Human Routine Change Detection Using Bayesian Modelling |
Xu, Yangdi | Univ. of Bristol |
Damen, Dima | Univ. of Bristol |
Keywords: Video analysis, Behavior recognition, Applications of pattern recognition and machine learning
Abstract: Automatic discovery of changes in a human’s routine is one of the requirements for the future of smart home living, and its contribution to the E-health of the community. In this paper, a Bayesian modelling approach is used which models routine change discovery as a pairwise model selection problem. The method is evaluated on a collected office kitchen dataset that captures snapshots of the routine of the same person over multiple years (2014-2017). The results show that our method is able to detect not only the presence of routine changes, but also which activity patterns have been changed, fully automatically, and in a fully unsupervised manner. Moreover, changes within the same activity pattern can be discovered. Interestingly, discovered changes demonstrate subtle variations that are missed by the visual inspection of a human observer.
|
|
15:00-17:00, Paper WePMP.106 | |
Attention-Based Neural Network for Traffic Sign Detection |
Zhang, Jing | Nanjing Univ. of Science and Tech |
Hui, Le | Nanjing Univ. of Science and Tech |
Lu, Jianfeng | Nanjing Univ. of Science & Tech |
Zhu, Yuhua | Nanjing Univ. of Science and Tech |
Keywords: Object detection, Neural networks, Deep learning
Abstract: Existing object detection pipelines show superior performance for large, high-resolution objects but fail to detect very small objects such as traffic signs, so traffic sign detection is a notoriously challenging problem. In this paper, we propose a novel end-to-end architecture that improves small object detection by combining Faster R-CNN with an attention mechanism. Specifically, we focus on channel-wise features and utilize the attention mechanism to enhance the feature responses by explicitly modeling the interdependencies between channels. Finally, the regression of bounding boxes and the classification of traffic signs are performed after selecting discriminative features with the attention mechanism. Extensive evaluations demonstrate that the attention mechanism improves detection performance, especially for small targets. For the traffic sign detection task, our method achieves better performance than many state-of-the-art approaches on the largest traffic sign detection dataset, Tsinghua-Tencent 100K.
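For readers unfamiliar with channel-wise attention, the following is a minimal squeeze-and-excitation style block in PyTorch; it is one plausible form of the mechanism described, and the exact module in the paper may differ:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # excite: reweight channels
```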
|
|
15:00-17:00, Paper WePMP.107 | |
Depth-Assisted RefineNet for Indoor Semantic Segmentation |
Chang, Manyu | Xiamen Univ |
Guo, Feng | Xiamen Univ |
Ji, Rongrong | Department of Computer Science, Xiamen Univ |
Keywords: Scene understanding, Deep learning
Abstract: This paper focuses on indoor semantic segmentation using RGB-D data. It has been shown that incorporating depth information alongside RGB information helps improve segmentation accuracy. However, previous studies reveal two problems. One concerns model size: recent state-of-the-art methods generally build a separate network branch for depth images, inherently increasing the model size. The other concerns boundary segmentation: complex and varied object configurations with severe occlusions reduce the segmentation precision at object boundaries. To address these two problems, we propose a depth-assisted RefineNet (D-RefineNet) for refining boundary segmentation. The proposed network uses only RGB images to predict segmentation results; depth images are used only in the proposed loss function, without increasing the model size. When the depth values of adjacent pixels change drastically but the adjacent pixels have the same predicted semantic labels, the proposed loss function penalizes the predicted result. Experimental evaluations demonstrate that the proposed method is effective on two challenging RGB-D indoor datasets, NYUDv2 and SUN RGB-D.
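A speculative sketch of the depth-boundary penalty idea described above: it penalizes predictions that keep the same label across a strong depth discontinuity. The threshold tau and the soft label-agreement form are our assumptions, not the paper's definition:

```python
import torch

def depth_boundary_penalty(logits, depth, tau=0.1):
    """logits: (B, C, H, W) segmentation scores; depth: (B, 1, H, W),
    normalized depth. Horizontal neighbours only, for brevity."""
    p = logits.softmax(dim=1)
    # Probability that horizontally adjacent pixels share the same label
    agree = (p[:, :, :, 1:] * p[:, :, :, :-1]).sum(dim=1)
    depth_jump = (depth[:, 0, :, 1:] - depth[:, 0, :, :-1]).abs()
    # Penalize label agreement across strong depth discontinuities
    return (agree * (depth_jump > tau).float()).mean()
```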
|
|
15:00-17:00, Paper WePMP.108 | |
Robust Projective Low-Rank and Sparse Representation by Robust Dictionary Learning |
Ren, JiaHuan | Soochow Univ |
Zhang, Zhao | Soochow Univ |
Li, Sheng | Nanjing Univ. of Posts and Telecommunications |
Liu, Guangcan | Cornell |
Wang, Meng | Microsoft Res. Asia |
Yan, Shuicheng | National Univ. of Singapore |
Keywords: Learning-based vision, Image and video coding, Classification
Abstract: In this paper, we study the robust-factorization-based robust dictionary learning problem for data representation. A Robust Projective Low-Rank and Sparse Representation model (R-PLSR) is proposed. Our R-PLSR model integrates L1-norm based robust factorization and robust low-rank and sparse representation by robust dictionary learning into a unified framework. Specifically, R-PLSR performs joint low-rank and sparse representation over the informative low-dimensional representations obtained by robust sparse factorization, so that the results are more accurate. To make the factorization and representation procedures robust to noise and outliers, R-PLSR imposes the sparse L2,1-norm jointly on the reconstruction errors of the factorization and the dictionary learning. Note that the L2,1-norm also minimizes the reconstruction error as much as possible, since it theoretically tends to force many rows of the reconstruction error matrix to zero. The Nuclear-norm and L1-norm are jointly imposed on the representation coefficients so that salient representations can be obtained. Extensive results on several image datasets show that our R-PLSR formulation delivers superior performance over other state-of-the-art methods.
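For reference, the L2,1-norm used above is the sum of the l2 norms of a matrix's rows, which is why it drives entire rows of the reconstruction error matrix toward zero; a short NumPy version:

```python
import numpy as np

def l21_norm(E):
    # Sum over rows of each row's Euclidean norm: ||E||_{2,1}
    return np.sqrt((E ** 2).sum(axis=1)).sum()

E = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(l21_norm(E))  # 5.0 + 0.0 + 1.0 = 6.0
```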
|
|
15:00-17:00, Paper WePMP.109 | |
A Multi-Part Convolutional Attention Network for Fine-Grained Image Recognition |
Zhong, Weilin | Shanghai Jiao Tong Univ |
Jiang, Linfeng | Shanghai Jiao Tong Univ |
Zhang, Tao | Shanghai Jiao Tong Univ |
Ji, Jinsheng | Shanghai Jiao Tong Univ |
Xiong, Huilin | Shanghai Jiao Tong Univ |
Keywords: Applications of computer vision, Image classification, Deep learning
Abstract: The goal of fine-grained image recognition is to recognize hundreds of sub-categories affiliated with the same basic-level category (e.g., bird species). It is a highly challenging task due to the large intra-class variance and small inter-class variance. Existing approaches deal with the subtle differences among object classes by learning and localizing discriminative parts. However, most part localization methods follow a step-by-step manner that first localizes larger parts and then generates smaller parts from the larger ones, which is not efficient. In this paper, we present a Multi-part Convolutional Attention Network (M-CAN), which simultaneously focuses on discriminative image parts at multiple scales. Specifically, a convolutional attention based part localization network is presented to localize multi-scale parts from different layers of a deep Convolutional Neural Network (CNN). Importantly, our part localization network requires no part annotations but only image labels, which avoids the heavy labor of complex part labeling. Comprehensive experiments show that our method outperforms the state-of-the-art approaches on three challenging fine-grained datasets: CUB-Birds, Stanford-Dogs and Stanford-Cars.
|
|
15:00-17:00, Paper WePMP.110 | |
Improving Image Classification Performance with Automatically Hierarchical Label Clustering |
Chen, Zhiqiang | Inst. of Automation, Chinese Acad. of Sciences |
Du, Changde | Inst. of Automation, Chinese Acad. of Sciences |
Huang, Lijie | Inst. of Automation, Chinese Acad. of Sciences |
Li, Dan | Inst. of Automation, Chinese Acad. of Sciences |
He, Huiguang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Image classification, Deep learning, Clustering
Abstract: Image classification is a common and foundational problem in computer vision. In traditional image classification, each category is assigned a single label, which makes it difficult for networks to learn better features. In contrast, hierarchical labels depict the structure of categories better, which helps networks learn more hierarchical features and improves classification performance. Though many datasets contain images with multiple labels, the labels in these datasets usually lack hierarchy. To overcome this problem, we propose a new method to improve image classification performance with Automatically Hierarchical Label Clustering (AHLC). Firstly, AHLC calculates the similarity between each pair of original categories by how easily they are misclassified by a pre-trained classifier. Secondly, AHLC obtains hierarchical labels by merging similar categories using hierarchical clustering. Finally, AHLC trains a new classifier with hierarchical labels to improve on the original classification performance. We evaluate our method on the MNIST and CIFAR-100 datasets and the results demonstrate its superiority. The main contribution of this work is that an existing classification network can be improved by AHLC without extra information or heavy architecture redesign.
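A rough sketch of the AHLC pipeline using SciPy: derive a category similarity from a pre-trained classifier's confusion matrix, then merge confusable categories by agglomerative clustering. The linkage method and symmetrization are our assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def hierarchical_labels(confusion, n_super):
    """confusion: (C, C) counts from a pre-trained classifier;
    returns a super-label in 1..n_super for each of the C classes."""
    C = confusion / confusion.sum(axis=1, keepdims=True)
    sim = (C + C.T) / 2                      # symmetric misclassification rate
    np.fill_diagonal(sim, 0)
    dist = 1.0 - sim / (sim.max() + 1e-12)   # more confusable -> closer
    iu = np.triu_indices_from(dist, k=1)
    Z = linkage(dist[iu], method='average')  # condensed distances -> dendrogram
    return fcluster(Z, t=n_super, criterion='maxclust')
```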
|
|
15:00-17:00, Paper WePMP.111 | |
MSFD: Multi-Scale Receptive Field Face Detector |
Guo, Qiushan | Beijing Univ. of Posts and Telecommunications |
Dong, Yuan | Beijing Univ. of Posts and Telecommunications |
Guo, Yu | Beijing Univ. of Posts and Telecommunications |
Bai, Hongliang | Beijing Faceall Co.Ltd |
Keywords: Object detection, Deep learning, Neural networks
Abstract: We study the multi-scale receptive fields of a single convolutional neural network for detecting faces of varied scales. This paper presents our Multi-Scale Receptive Field Face Detector (MSFD), which has superior performance in detecting faces at different scales and enjoys real-time inference speed. MSFD agglomerates context and texture through a hierarchical structure. The additional information and rich receptive fields bring significant improvement at marginal extra time cost. We also propose an anchor assignment strategy that covers faces with a wide range of scales, improving the recall rate for small and rotated faces. To reduce the false positive rate, we train our detector with the focal loss, which keeps easy samples from overwhelming training. As a result, MSFD reaches superior results on the FDDB, Pascal-Faces and WIDER FACE datasets, and can run at 31 FPS on a GPU for VGA-resolution images.
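The focal loss mentioned above (Lin et al.) down-weights easy examples so they do not dominate training; a standard binary form in PyTorch, not necessarily MSFD's exact variant:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits/targets: same shape; targets in {0, 1}."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = p * targets + (1 - p) * (1 - targets)       # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma vanishes for well-classified (easy) examples
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```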
|
|
15:00-17:00, Paper WePMP.112 | |
Generic Calibration of Cameras with Non-Parallel Optical Elements |
Fasogbon, Peter | Nokia Tech |
Fan, Lixin | Nokia Tech |
Keywords: 3D vision
Abstract: Multiple cameras are increasingly prevalent in autonomous driving, and the increased need for 360-degree perception of the world has led to combinations of cameras with narrow-angle and wide-angle fields of view. This has raised issues with the quality of these optics as a result of bad and cheap designs. Intrinsic calibration is indispensable for accurate perception of the environment and for downstream tasks such as camera pose estimation and 3-D reconstruction. In this work, we propose a lens distortion model motivated by unintentional tilt in the optical lens system. The proposed distortion model is added to the state-of-the-art generic camera model of Kannala to form an extended model. To our knowledge, this is the first time that tilt distortion has been introduced for wide-angle and fish-eye cameras. Our experiments show improvements of 4 to 13 percent.
|
|
15:00-17:00, Paper WePMP.113 | |
Em-SLAM: A Fast and Robust Monocular SLAM Method for Embedded Systems |
Wu, Yirui | Hohai Univ |
Li, Zhi-Kai | National Key Lab for Novel Software Tech. Nanjing Univ |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Keywords: Vision for robotics, Motion and tracking, Applications of computer vision
Abstract: Simultaneous Localization and Mapping (SLAM) is difficult to deploy on embedded systems due to its high computational cost and its requirement for stable input. Building on excellent algorithms of recent years, we present Em-SLAM, a monocular SLAM method that is fast and robust on embedded systems. Em-SLAM comprises three stages: initial pose estimation, iterative pose optimization and correspondence estimation, and mapping with a nearest-frame queue. In the first stage, we perform stable initial pose estimation based on matched ORB features extracted around selected key points. Taking the initial pose and corresponding key points as input, the second stage iteratively optimizes these values by tracking key points in new frames. In the last stage, we first determine keyframes with the help of the proposed nearest-frame queue and then design a greedy search algorithm to find matched ORB features between keyframes, which are adopted for compact and robust map reconstruction. Owing to these designs for embedded systems, Em-SLAM is highly accurate and fast on all SLAM tasks: tracking, mapping and loop closing. We evaluate Em-SLAM on the most popular datasets by comparing it with a recent SLAM method.
|
|
15:00-17:00, Paper WePMP.114 | |
Non-Negative Subspace Representation Learning Scheme for Correlation Filter Based Tracking |
Xu, Tianyang | Jiangnan Univ |
Wu, Xiaojun | Jiangnan Univ |
Kittler, Josef | Univ. of Surrey |
Keywords: Motion and tracking
Abstract: Discriminative correlation filter (DCF) based tracking methods have achieved great success recently. However, the temporal learning scheme in the current paradigm is a linear recursion with a fixed learning rate, which cannot adaptively respond to appearance variations. In this paper, we propose a unified non-negative subspace representation constrained learning scheme for DCF. The subspace is constructed from several templates with auxiliary memory mechanisms. The current template is then projected onto the subspace to find the non-negative representation and to determine the corresponding template weights. Our learning scheme enables an efficient combination of the correlation filter and the subspace structure. Experimental results on OTB50 demonstrate the effectiveness of our learning formulation.
|
|
15:00-17:00, Paper WePMP.115 | |
Radial Lens Distortion Correction by Adding a Weight Layer with Inverted Foveal Models to Convolutional Neural Networks |
Shi, Yongjie | Peking Univ |
Zhang, Danfeng | Peking Univ |
Wen, Jingsi | Peking Univ |
Tong, Xin | Peking Univ |
Ying, Xianghua | Peking Univ |
Zha, Hongbin | Peking Univ |
Keywords: Low-level vision, Image based modeling
Abstract: Radial lens distortion often exists in images taken by commercial cameras, which do not satisfy the assumptions of the pinhole camera model. Eliminating the radial lens distortion of an image is a necessary preprocessing step for many vision applications. Some papers have employed Convolutional Neural Networks (CNNs) to achieve radial distortion correction. They generated a large number of training images with high variation of radial distortion, which can be well exploited by deep CNNs with high learning capacity, and reached state-of-the-art results. In this paper, we show that a weight layer with inverted foveal models can be added to these existing CNN methods for radial distortion correction. With the widely used deep Resnet-18 model, our method achieves about a 20 percent decrease in the loss function with faster convergence compared to previous methods.
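One plausible reading of an inverted foveal weight layer, sketched below, is a per-pixel weight map that grows toward the image periphery, where radial distortion is strongest; the weighting function and its parameter are purely assumptions for illustration:

```python
import torch

def inverted_foveal_weights(h, w, alpha=2.0):
    """Return an (h, w) weight map that increases with radial distance
    from the image center (the opposite of foveal emphasis)."""
    ys = torch.linspace(-1, 1, h).view(-1, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, -1).expand(h, w)
    r = torch.sqrt(xs ** 2 + ys ** 2) / (2 ** 0.5)  # normalized radius in [0, 1]
    return 1.0 + alpha * r                          # heavier weight at the edges
```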
|
|
15:00-17:00, Paper WePMP.116 | |
Semi-Supervised Learning Via Convolutional Neural Network for Hyperspectral Image Classification |
Ling, Zhigang | Hunan Univ |
Li, Xiuxin | Hunan Univ |
Zou, Wen | Hunan Univ |
Siyu, Guo | Hunan Univ |
Keywords: Image classification, Deep learning, Semi-supervised learning
Abstract: In order to make use of unlabeled data in hyperspectral images (HSIs), a simple but effective semi-supervised learning method based on a convolutional neural network (CNN) is proposed for HSI classification. First, we define a loss function by integrating a clustering loss for unlabeled data with a softmax loss for labeled data. Here, the labeled features extracted from the CNN are not only used to train the classifier but also provide anchors to initialize a set of clustering centers via the K-means method. Then, all data are used to jointly train the deep network for HSI classification. The experimental results demonstrate that our method achieves results competitive with the traditional supervised CNN-based learning method. Meanwhile, our method has a simple network structure and can be easily trained.
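A hedged sketch of such a joint objective: softmax cross-entropy on labeled samples plus a clustering term that pulls unlabeled features toward their nearest of K centers (the centers would be initialized from labeled features via K-means). The weighting lam is an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def joint_loss(logits_l, y_l, feats_u, centers, lam=0.1):
    """logits_l: (N_l, C) labeled logits; y_l: (N_l,) labels;
    feats_u: (N_u, D) unlabeled features; centers: (K, D)."""
    sup = F.cross_entropy(logits_l, y_l)
    d = torch.cdist(feats_u, centers)          # (N_u, K) distances to centers
    cluster = d.min(dim=1).values.pow(2).mean()  # pull to nearest center
    return sup + lam * cluster
```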
|
|
15:00-17:00, Paper WePMP.117 | |
Single Shot Feature Aggregation Network for Underwater Object Detection |
Zhang, Lu | Inst. of Automation, Chinese Acad. of Sciences |
Yang, Xu | Inst. of Automation, Chinese Acad. of Sciences |
Liu, Zhiyong | Inst. of Automation, Chinese Acad. of Sciences |
Qi, Lu | The Chinese Univ. of Hong Kong |
Zhou, Hao | Harbin Engineering Univ |
Charles, Chiu | School for Higher and Professional Education, Chai Wan, Hong Kong |
Keywords: Applications of computer vision, Object detection, Deep learning
Abstract: Rapidly developing ocean exploration and observation make the demand for underwater object detection increasingly urgent. Recently, deep convolutional neural networks (CNNs) have shown strong feature representation ability, and CNN-based detectors have achieved remarkable performance, but they still face a big challenge when detecting multi-scale objects in a complex underwater environment. To address this challenge, we propose a novel underwater object detector that introduces multi-scale features and complementary context information for better classification and localization. Using the proposed method for real coastal underwater object detection, we won first place in the auto-grabbing contest of the 2017 Underwater Robot Picking Contest sponsored by the National Natural Science Foundation of China (NSFC).
|
|
15:00-17:00, Paper WePMP.118 | |
Visual Tracking by Combining the Structure-Aware Network and Spatial-Temporal Regression |
Xu, Dezhong | Beijing Univ. of Tech |
Wu, Lifang | Beijing Univ. of Tech |
Jian, Meng | Beijing Univ. of Tech |
Wang, Qi | Beijing Univ. of Tech |
Keywords: Motion and tracking
Abstract: In this paper, we propose a novel visual tracking algorithm that combines the structure-aware network (SA-Net) and a spatial-temporal regression model. We first use SA-Net to obtain an initial location proposal, and deep features are extracted using a fine-tuned convolutional neural network model. Finally, both the location proposal and the deep features, including historical information, are input into a long short-term memory (LSTM) network for end-to-end spatial-temporal regression that adjusts the initial location proposal from SA-Net. Experimental results on the challenging OTB dataset demonstrate that the proposed scheme is robust to tracking loss caused by occlusion or object deformation. Additionally, comparative experiments show that the proposed scheme is more competitive than state-of-the-art algorithms.
|
|
15:00-17:00, Paper WePMP.119 | |
Generative Band Feature Enhancement for Hyperspectral Image Classification |
Li, Jiming | Zhejiang Pol. Coll |
Chen, Fangjie | Zhejiang Univ. of Tech |
Yang, Dongyong | Zhejiang Univ. of Tech |
Keywords: Image classification, Neural networks, Deep learning
Abstract: In this paper, we propose a generative method for feature enhancement of hyperspectral image bands. The method can significantly improve the discriminative information and visual quality of hyperspectral images. Based on the generative adversarial network scheme, a randomly sampled small band subset from the original hyperspectral image cube can be used to disentangle spectral signals from noisy bands and to generate new bands that achieve much better performance in land-cover classification. Experiments on real hyperspectral datasets demonstrate the effectiveness of the generative band feature enhancement method.
|
|
15:00-17:00, Paper WePMP.120 | |
A Selective Tracking and Detection Framework with Target Enhanced Feature |
Ding, Xinyao | South China Univ. of Tech |
Li, Lian | Tencent Company |
Zhang, Xin | South China Univ. of Tech |
Keywords: Motion and tracking, Applications of computer vision
Abstract: In long-term tracking, object representation and occlusion handling are two important issues. We propose a novel selective tracking and detection framework into which a new probabilistic object-enhanced feature is integrated. Firstly, besides a precise object appearance feature, we believe the neighboring foreground-background contrast is another key factor in tracking. Hence, we propose a foreground probability map to enhance the target and weaken the surrounding background; it is computed from the object color distribution and its comparison with the surrounding background. Secondly, we introduce the selective tracking and detection framework, which has two sets of conditions to control detector activation and final result selection. The detector is only activated when the tracker is not trustworthy, which is determined by the tracking confidence and the foreground parochiality value. Then, given the tracking and detection results, the final output is selected in terms of their individual correspondence values. We have evaluated our method on two popular benchmark datasets. Extensive experiments demonstrate that our algorithm performs favorably compared with state-of-the-art methods.
|
|
15:00-17:00, Paper WePMP.121 | |
Voting-Based Incremental Structure-From-Motion |
Cui, Hainan | Inst. of Automation, Chinese Acad. of Sciences |
Shen, Shuhan | Inst. of Automation, Chinese Acad. of Sciences |
Gao, Wei | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: 3D reconstruction, 3D vision
Abstract: The incremental Structure-from-Motion (SfM) technique is the most prevalent approach to image-based reconstruction, but its robustness relies heavily on each camera registration, where one false calibration can make everything that follows fail. In this paper, we propose a voting-based incremental SfM approach to improve the camera registration process. First, the degree of closeness between cameras is used as a vote to determine which cameras to register. Then, for each camera, two methods are simultaneously used to estimate the camera pose, and the number of inliers is used as a vote to determine which pose is more accurate. Finally, by estimating a priori global camera rotations from the view-graph, the camera poses that are consistent with these rotations are considered to receive double votes and are preferentially kept. After all these prioritized cameras are calibrated, the remaining cameras are incrementally registered. Compared to state-of-the-art incremental SfM approaches, extensive experiments demonstrate that our system performs similarly or better in terms of reconstruction efficiency, while achieving better robustness and accuracy. Especially for ambiguous datasets, our system has better potential to reconstruct them.
|
|
15:00-17:00, Paper WePMP.122 | |
Object Classification of Remote Sensing Images Based on Partial Randomness Supervised Discrete Hashing |
Kang, Ting | Nanjing Univ. of Science and Tech |
Liu, Yazhou | Nanjing Univ. of Science and Tech |
Sun, Quansen | Nanjing Univ. of Science and Tech |
Keywords: Image classification, Multilabel learning, Applications of pattern recognition and machine learning
Abstract: Recently, object classification of remote sensing images has attracted increasing research interest due to the development of satellite and aerial vehicle technologies. Hashing learning is an efficient method to handle the huge amount of remote sensing data. In this paper, we propose a novel hashing learning method named partial randomness supervised discrete hashing (PRSDH), which combines data-dependent and data-independent methods. It jointly learns a discrete binary code generation and partial random constraint optimization model. Through random projection, the computational complexity is reduced effectively. With the weight matrix derived from the training data, the semantic similarity between the data can be well preserved while generating the hashing codes. For the discrete constraint problem, this paper adopts the discrete cyclic coordinate descent (DCC) algorithm to optimize the codes bit by bit. The experimental results show that PRSDH outperforms other comparative methods and demonstrate that PRSDH adapts well to the characteristics of remote sensing objects.
|
|
15:00-17:00, Paper WePMP.123 | |
Context-Aware Trajectory Prediction |
Bartoli, Federico | Univ. of Florence |
Lisanti, Giuseppe | Univ. Degli Studi Di Pavia |
Ballan, Lamberto | Univ. of Padova |
Del Bimbo, Alberto | Univ. of Florence |
Keywords: Behavior recognition, Motion and tracking
Abstract: Human motion and behaviour in crowded spaces is influenced by several factors, such as the dynamics of other moving agents in the scene, as well as static elements that might be perceived as points of attraction or obstacles. In this work, we present a new model for human trajectory prediction which is able to take advantage of both human-human and human-space interactions. The future trajectories of humans are generated by observing their past positions and interactions with the surroundings. To this end, we propose a ''context-aware'' recurrent neural network LSTM model, which can learn and predict human motion in crowded spaces such as a sidewalk, a museum or a shopping mall. We evaluate our model on public pedestrian datasets, and we contribute a new challenging dataset that collects videos of humans navigating a real crowded space such as a big museum. Results show that our approach predicts human trajectories better than previous state-of-the-art forecasting models.
|
|
15:00-17:00, Paper WePMP.124 | |
A Multi-Modal Multi-View Dataset for Human Fall Analysis and Preliminary Investigation on Modality |
Tran, Thanh-Hai | Hanoi Univ. of Science and Tech |
Le, Thi-Lan | MICA Inst. Hanoi Univ. of Science and Tech |
Dinh-Tan, Pham | MICA Inst. Hanoi Univ. of Science and Tech |
Van-Nam, Hoang | MICA Inst. Hanoi Univ. of Science and Tech |
Van-Minh, Khong | MICA |
Quoc-Toan, Tran | MICA Inst. Hanoi Univ. of Science and Tech |
Thai-Son, Nguyen | PTIT |
Van-Cuong, Pham | PTIT |
Keywords: Video analysis, Behavior recognition, Deep learning for multimedia analysis
Abstract: Over the last decade, a large number of methods have been proposed for human fall detection. Most methods were evaluated on trimmed datasets. More importantly, these datasets lack variety in falls, subjects, views and modalities. This paper makes two contributions to the topic of automatic human fall detection. Firstly, to address the above issues, we introduce a large continuous multimodal multi-view dataset of human falls, namely CMDFALL. Our proposed dataset was captured from 50 subjects, with seven overlapping Kinect sensors and two wearable accelerometers. Each subject performs 20 activities, including 8 falls of different styles and 12 daily activities. All multi-modal multi-view data (RGB, depth, skeleton, acceleration) are time-synchronized and annotated for evaluating the performance of recognition algorithms for human activities, and human falls in particular, in an indoor environment. Secondly, based on the multimodal property of the dataset, we investigate the role of each modality and of their combinations in producing the best results in the context of human activity recognition. To this end, we adopt existing baseline techniques which have been shown to be efficient for each data modality, such as the C3D convnet on RGB, DMM-KDES on depth, Res-TCN on skeleton and a 2D convnet on acceleration data. We analyze which modality gives the best performance.
|
|
15:00-17:00, Paper WePMP.125 | |
A Novel Model for Multi-Label Image Annotation |
Wu, Xinjian | Soochow Univ |
Zhang, Li | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Wang, Bangjun | Soochow Univ |
Keywords: Image captioning, Deep learning, Multilabel learning
Abstract: Multi-label image annotation is one of the most important open problems in computer vision. In this paper, we propose a novel model for image annotation. Unlike existing works that usually use conventional visual features to annotate images, this paper adopts features based on a convolutional neural network (CNN), which have shown the potential to achieve outstanding performance. In particular, we use a CNN to extract image features with higher semantic meaning and apply them to the image annotation method Tag Propagation (TagProp). Experimental results on four challenging datasets indicate that our model makes a marked improvement over the current state of the art.
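A simplified illustration of nearest-neighbour tag transfer in the spirit of TagProp over CNN features; the real TagProp learns rank- or distance-based weights, which are replaced here with a softmax over distances, and all names are hypothetical:

```python
import numpy as np

def propagate_tags(feat_q, feats_train, tags_train, k=10, tau=1.0):
    """feat_q: (D,) query CNN feature; feats_train: (N, D);
    tags_train: (N, L) binary tag matrix; returns (L,) tag scores."""
    d = np.linalg.norm(feats_train - feat_q, axis=1)
    nn = np.argsort(d)[:k]                 # k nearest training images
    w = np.exp(-d[nn] / tau)
    w /= w.sum()                           # normalized neighbour weights
    return w @ tags_train[nn]              # per-tag relevance in [0, 1]
```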
|
|
15:00-17:00, Paper WePMP.126 | |
Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles |
Vaquero, Victor | Iri, Upc-Csic |
Sanfeliu, Alberto | Univ. Pol. De Catalunya |
Moreno-Noguer, Francesc | CSIC-UPC |
Keywords: Applications of computer vision, Deep learning, Vision for robotics
Abstract: In this paper we propose a novel approach to estimating dense optical flow from sparse lidar data acquired on an autonomous vehicle. It is intended as a drop-in replacement for any image-based optical flow system when images are not reliable, e.g. due to adverse weather conditions or at night. In order to infer high-resolution 2D flow from discrete range data, we devise a three-block architecture of multiscale filters that combines multiple intermediate objectives in both the lidar and image domains. To train this network we introduce a dataset of approximately 20K lidar samples from the Kitti dataset, which we have augmented with a pseudo ground truth of image-based optical flow computed using FlowNet2. We demonstrate the effectiveness of our approach on Kitti, and show that despite using the low-resolution and sparse measurements of the lidar, we can regress dense optical flow maps which are on par with those estimated with image-based methods.
|
|
15:00-17:00, Paper WePMP.127 | |
Gaze-Aided Eye Detection Via Appearance Learning |
Cao, Lin | Inst. of Automation, Chinese Acad. of Sciences |
Gou, Chao | Chinese Acad. of Sciences |
Wang, Kunfeng | Inst. of Automation, Chinese Acad. of Sciences |
Xiong, Gang | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Fei-Yue | Chinese Acad. of Sciences |
Keywords: Learning-based vision, Object detection, Image processing and analysis
Abstract: Image based eye detection and gaze estimation have a wide range of potential applications, such as medical treatment, biometric recognition, and human-computer interaction. Though a large number of researchers have attempted to solve these two problems, challenges remain due to variation in appearance and the lack of annotated images. In addition, most related works perform eye detection first, followed by gaze estimation via appearance learning. In this paper, we propose a unified framework that performs gaze estimation and eye detection simultaneously by learning cascade regression models from the appearance around eye-related key points. Intuitively, there is a coupled relationship among the location of the eye center, the shape of eye-related key points, the appearance representation, and the gaze information. To incorporate this information, at each cascade level we first learn a model that maps the shape and appearance around the current eye-related key points to the three-dimensional gaze. Then, with the help of the estimated gaze, we further learn a regression model that maps the gaze, shape and appearance information to eye location updates. By leveraging the power of cascade learning, the proposed method alternately optimizes the two tasks of eye detection and gaze estimation. Experiments are conducted on the GI4E and MPIIGaze benchmarks. Experimental results show that our proposed method achieves preferable results in gaze estimation and outperforms state-of-the-art methods in eye detection.
|
|
15:00-17:00, Paper WePMP.128 | |
Fish Detection from Low Visibility Underwater Videos |
Shevchenko, Violetta | Lappeenranta Univ. of Tech |
Eerola, Tuomas | Lappeenranta Univ. of Tech |
Kaarna, Arto | Lappeenranta Univ. of Tech |
Keywords: Applications of computer vision, Motion and tracking, Video analysis
Abstract: Counting and tracking fish populations is important for conservation purposes as well as for the fishing industry. Various non-invasive automatic fish counters exist based on principles such as resistivity, light beams and sonar. However, such methods typically cannot distinguish fish from other passing objects and, moreover, cannot recognize different species. Computer vision techniques provide an attractive alternative for building a more robust and versatile fish counting system. In this paper we present a fish detection framework for noisy videos captured in water with low visibility. For this purpose, we compare three background subtraction methods for the task. Moreover, we propose the necessary post-processing steps and heuristics to detect fish and separate them from other moving objects. The results show that by choosing an appropriate background subtraction method, it is possible to achieve satisfying detection accuracies of 80% and 60% on two challenging datasets. The proposed method will form a basis for the future development of fish species identification methods.
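A minimal OpenCV background-subtraction loop of the kind compared in the paper (MOG2 here), followed by simple morphological post-processing; the video filename is hypothetical and the fish-specific heuristics are omitted:

```python
import cv2

cap = cv2.VideoCapture('underwater.avi')  # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                                   # foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # suppress noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes

cap.release()
```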
|
|
15:00-17:00, Paper WePMP.129 | |
Accurate 3-D Reconstruction with RGB-D Cameras Using Depth Map Fusion and Pose Refinement |
Ylimäki, Markus | Univ. of Oulu |
Kannala, Juho | Aalto Univ |
Heikkilä, Janne | Univ. of Oulu |
Keywords: 3D reconstruction, Image based modeling, Multiple view geometry
Abstract: Depth map fusion is an essential part in both stereo and RGB-D based 3-D reconstruction pipelines. Whether produced with a passive stereo reconstruction or using an active depth sensor, such as Microsoft Kinect, the depth maps have noise and may have poor initial registration. In this paper, we introduce a method which is capable of handling outliers, and especially, even significant registration errors. The proposed method first fuses a sequence of depth maps into a single non-redundant point cloud so that the redundant points are merged together by giving more weight to more certain measurements. Then, the original depth maps are re-registered to the fused point cloud to refine the original camera extrinsic parameters. The fusion is then performed again with the refined extrinsic parameters. This procedure is repeated until the result is satisfying or no significant changes happen between iterations. The method is robust to outliers and erroneous depth measurements as well as even significant depth map registration errors due to inaccurate initial camera poses.
|
|
15:00-17:00, Paper WePMP.130 | |
Gender Recognition from Face Images Using Trainable Shape and Color Features |
Azzopardi, George | Univ. of Malta |
Foggia, Pasquale | Univ. Di Salerno |
Greco, Antonio | Univ. of Salerno |
Saggese, Alessia | Univ. of Salerno |
Vento, Mario | Univ. Degli Studi Di Salerno |
Keywords: Image classification, Biologically inspired vision, Soft biometrics
Abstract: Gender recognition from face images is an important application and is still an open computer vision problem, even though it is trivial for the human visual system. Variations in pose, lighting, and expression are a few of the issues that make such an application challenging for a computer system. Neurophysiological studies demonstrate that the human brain is able to distinguish men and women even in the absence of external cues, by analyzing the shape of specific parts of the face. In this paper, we describe an automatic procedure that combines trainable shape and color features for gender classification. In particular, the proposed method fuses edge-based and color-blob-based features by means of trainable COSFIRE filters. The former type of feature extracts information about the shape of a face, whereas the latter extracts information about shades of color in different parts of the face. We use these two sets of features to create a stacked SVM classification model and demonstrate its effectiveness on the GENDER-COLOR-FERET dataset, where we achieve an accuracy of 96.4%.
|
|
15:00-17:00, Paper WePMP.131 | |
Semantic-Only Visual Odometry Based on Dense Class-Level Segmentation |
Mahé, Howard | Airbus Defence and Space/cnrs-I3s/uca |
Marraud, Denis | Airbus Defence and Space |
Comport, Andrew Ian | CNRS-I3S/UCA |
Keywords: Motion and tracking, Learning-based vision, Segmentation, features and descriptors
Abstract: This paper proposes a novel approach called Semantic Visual Odometry (SemVO) which incorporates class-level consistency priors into the problem of 6-DoF Visual Odometry. Dense class-level labels are learnt for each pixel of the image using a CNN trained for semantic segmentation. A semantic error is formulated penalising the sum of squared differences (SSD) on class-level feature maps extracted from the decoder of a RefineNet. It will be shown how the proposed approach allows dense RGB-D camera tracking using solely a semantic error term. SemVO is evaluated on the ScanNet dataset and the results demonstrate how the number of classes affects performance. Results are also provided showing how best to fuse the new error function with classic dense photometric and geometric methods. Finally, it is demonstrated that SemVO improves over standard approaches for large camera motion applications.
|
|
15:00-17:00, Paper WePMP.132 | |
A Multi-Scale Feature Extraction Method for Single Sample |
Xu, Xiaoxiang | Soochow Univ |
Zhang, Li | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Keywords: Image classification
Abstract: Face recognition matches a detected face image against a database of stored faces. Single-sample face recognition means that only one face image per subject is available as a training sample. In many practical settings, such as public security, airports and customs, most subjects have only one or a few face images. Research shows that the number of training samples has a great influence on face recognition performance: many excellent face recognition algorithms suffer a sharp decline in performance, or even fail, when dealing with single-sample problems. Single-sample face recognition has therefore long been a hot but difficult issue in face recognition research. Methods to solve this problem generally fall into two categories. One finds and selects features that are robust for face recognition, from the viewpoint of feature selection. The other generates multiple virtual samples, from the perspective of sample expansion, so as to reduce the influence of the small number of samples. However, existing algorithms generally consider only a single aspect, or combine the two mechanically. Motivated by this, we consider both perspectives, combined with the human learning and cognition mechanism: we use support vector transformation to generate multi-scale virtual samples for a single image, and extract the best support vector transformation feature for identification.
|
|
15:00-17:00, Paper WePMP.133 | |
Automatic Inspection of Aerospace Welds Using X-Ray Images |
Dong, Xinghui | The Univ. of Manchester |
Taylor, Chris | Univ. of Manchester |
Cootes, Tim | The Univ. of Manchester |
Keywords: Applications of computer vision, Applications of pattern recognition and machine learning
Abstract: The non-destructive testing (NDT) of components is very important to the aerospace industry. Welds in these components may contain porosities and other defects. These reduce the fatigue life of components and may result in catastrophic accidents if they end up in the aircraft. Currently such welds are inspected by humans studying radiographs of the welds. We describe an automatic system for detecting defects in welds, with the aim of creating a triage system to reduce the workload on human inspectors. Given an X-ray image of the aerospace weld, the system locates the weld line, then analyses the region around the line to identify abnormalities. Our results show that the weld can be precisely extracted from X-ray images and the defect detection operation can identify 83% of defects with fewer than 3 false positives per image, and thus may be useful for prompting human inspectors to reduce their workload.
|
|
15:00-17:00, Paper WePMP.134 | |
2D-To-3D Facial Expression Transfer |
Rotger, Gemma | Computer Vision Center and Dpt. Ciències De La Computació, Univ |
Felipe, Lumbreras | Computer Vision Center and Dpt. Ciències De La Computació, Univ |
Moreno-Noguer, Francesc | CSIC-UPC |
Agudo, Antonio | Iri, Csic-Upc |
Keywords: 3D vision, Applications of computer vision, 3D reconstruction
Abstract: Automatically changing the expression and physical features of a face from an input image is a topic that has been traditionally tackled in a 2D domain. In this paper, we bring this problem to 3D and propose a framework that given an input RGB video of a human face under a neutral expression, initially computes his/her 3D shape and then performs a transfer to a new and potentially non-observed expression. For this purpose, we parameterize the rest shape --obtained from standard factorization approaches over the input video-- using a triangular mesh which is further clustered into larger macro-segments. The expression transfer problem is then posed as a direct mapping between this shape and a source shape, such as the blend shapes of an off-the-shelf 3D dataset of human facial expressions. The mapping is resolved to be geometrically consistent between 3D models by requiring points in specific regions to map on semantic equivalent regions. We validate the approach on several synthetic and real examples of input faces that largely differ from the source shapes, yielding very realistic expression transfers even in cases with topology changes, such as a synthetic video sequence of a single-eyed cyclops.
|
|
15:00-17:00, Paper WePMP.135 | |
Segmentation-Guided Tracking with Prior Map Decision |
Ma, Ding | Harbin Inst. of Tech |
Bu, Wei | Harbin Inst. of Tech |
Wu, Xiangqian | Harbin Inst. of Tech |
Xie, Yuying | Michigan State Univ. Department of Computational Mathemati |
Cui, YueHua | Michigan State Univ. Department of Statistics and Probabil |
Keywords: Motion and tracking, Video analysis, Applications of computer vision
Abstract: For visual tracking, the target object is represented by an appearance model and the location of the target is estimated in each frame. Numerous tracking algorithms model the appearance of the target with a confidence score and rarely take into account the semantic information of the target. In this paper, we propose an efficient tracking algorithm that models the appearance of the target based on semantic segmentation. The overall architecture consists of two parts: the segmentation part and the tracking part. In the segmentation part, an attention model is employed, providing spatial highlights of the candidate region of the target. In the tracking part, the tracker is constructed by an online-updated convolutional neural network that identifies the target in subsequent frames, taking advantage of the segmentation information of the target from the segmentation part. To enhance the performance of this architecture, we design an incrementally updated prior map that takes both the segmentation signal and the tracking signal into consideration. Extensive experiments on three benchmarks, OTB-50, OTB-100, and Temple-Color, show that the proposed method outperforms other trackers.
|
|
15:00-17:00, Paper WePMP.136 | |
Fast Single Image Dehazing Via Positive Correlation |
Li, Bingheng | Xi’an Univ. of Posts and Telecommunications |
Lai, Yi | Xi’an Univ. of Posts and Telecommunications |
Wu, Chaoyan | Xi’an Univ. of Posts and Telecommunications |
Liu, Ying | Xi’an Univ. of Posts and Telecommunications |
Keywords: Applications of computer vision, Enhancement, restoration and filtering, Image processing and analysis
Abstract: In this paper, we propose a fast single image dehazing method based on positive correlation. Firstly, a linear model is built to describe the positive correlation between the minimum channel of the hazy image and its corresponding depth map. Then, the transmission map and the atmospheric light are separately obtained using the created linear model. Finally, based on the traditional atmospheric scattering model, the haze-free image can be recovered from the transmission map and the atmospheric light. Experimental results on numerous hazy images demonstrate that the proposed method has better performance and lower time complexity than state-of-the-art methods.
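For reference, the recovery step of the traditional atmospheric scattering model, J = (I - A)/t + A, with the usual lower bound on the transmission to avoid amplifying noise; the clipping value t0 is an assumed parameter:

```python
import numpy as np

def recover_scene(I, t, A, t0=0.1):
    """I: (H, W, 3) hazy image in [0, 1]; t: (H, W) transmission map;
    A: (3,) atmospheric light. Inverts I = J * t + A * (1 - t)."""
    t = np.clip(t, t0, 1.0)[..., None]   # lower-bound t to limit noise blow-up
    return np.clip((I - A) / t + A, 0.0, 1.0)
```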
|
|
15:00-17:00, Paper WePMP.137 | |
Temporal Action Detection by Joint Identification-Verification |
Wang, Wen | UESTC |
Yongjian, Wu | State Key Lab. of Synthetical Automation for Process Indus |
Liu, Haijun | UESTC |
Wang, Shiguang | Univ. of Electronic Science and Tech. of China |
Cheng, Jian | Univ. of Electronic Science and Tech. of China |
Keywords: Video analysis, Classification
Abstract: Temporal action detection aims not only to recognize the action category but also to detect the start time and end time of each action instance in an untrimmed video. The key challenge of this task is to accurately classify the actions and determine the temporal boundaries of each action instance. On the temporal action detection benchmark THUMOS 2014, large variations exist within the same action category while many similarities exist across different action categories, which limits the performance of temporal action detection. To address this problem, we propose to use a joint Identification-Verification network to reduce intra-action variations and enlarge inter-action differences. The joint Identification-Verification network is a Siamese network based on 3D ConvNets, which simultaneously predicts the action categories and the similarity scores for input pairs of video proposal segments. Extensive experimental results on the challenging THUMOS 2014 dataset demonstrate the effectiveness of our proposed method compared to existing state-of-the-art methods for temporal action detection in untrimmed videos. We further demonstrate that our model is a general framework by evaluating our approach on the Charades dataset.
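A hedged sketch of a joint identification-verification objective on a Siamese pair: a classification loss on each branch plus a contrastive term on the embedding distance. The margin and weight are assumed values, and the paper's exact similarity head may differ:

```python
import torch
import torch.nn.functional as F

def id_verif_loss(logits1, logits2, y1, y2, f1, f2, margin=1.0, lam=0.5):
    """logits*: (B, C) class scores; y*: (B,) labels; f*: (B, D) embeddings."""
    ident = F.cross_entropy(logits1, y1) + F.cross_entropy(logits2, y2)
    d = F.pairwise_distance(f1, f2)
    same = (y1 == y2).float()
    # Pull same-class pairs together, push different-class pairs apart
    verif = same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)
    return ident + lam * verif.mean()
```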
|
|
15:00-17:00, Paper WePMP.138 | |
Vehicle Re-Identification by Deep Feature Fusion Based on Joint Bayesian Criterion |
Li, Siyu | Beijing Inst. of Tech |
Pei, Mingtao | Beijing Inst. of Tech |
Zhu, Leyi | Univ. of Science and Tech. of China |
Keywords: Object recognition, Deep learning, Neural networks
Abstract: Vehicle re-identification is a challenging task, as the differences between vehicles of the same model are extremely small. In this paper, we propose to fuse deep features extracted by two different CNNs for vehicle re-identification. CNNs can extract discriminative features for classification tasks, and features extracted by different CNNs describe different aspects of the input image and are complementary to each other. We propose a new loss function, called the Joint Bayesian loss, to fuse the different deep features. The proposed Joint Bayesian loss minimizes the intra-class variations and simultaneously maximizes the inter-class variations of the fused features, making it well suited to vehicle re-identification. Experiments on a large-scale vehicle dataset demonstrate the effectiveness of the proposed method.
|
|
15:00-17:00, Paper WePMP.139 | |
Robust Attentional Pooling Via Feature Selection |
Zheng, Jian | State Univ. of New York at Binghamton |
Lee, Teng-Yok | Mitsubishi Electric Res. Lab. (MERL) |
Feng, Chen | Mitsubishi Electric Res. Lab. (MERL) |
Li, Xiaohua | State Univ. of New York at Binghamton |
Zhang, Ziming | Mitsubishi Electric Res. Lab. (MERL) |
Keywords: Object recognition, Deep learning
Abstract: In this paper we propose a novel network module, namely Robust Attentional Pooling (RAP), that potentially can be applied in an arbitrary network to generate single vector representations for classification. By taking a feature matrix for each data sample as the input, our RAP learns data-dependent weights that are used to generate a vector through linear transformations of the matrix. We utilize feature selection to control the sparsity in weights for compressing the data matrices as well as enhancing the robustness of attentional pooling. As exemplary applications, we plug RAP into PointNet and ResNet for 3D point cloud and 2D image recognition, respectively. We demonstrate that our RAP significantly improves the recognition performance for both networks whenever sparsity is high. For instance, in extreme cases where only one feature per matrix is selected for recognition, RAP achieves more than 60% improvement in terms of accuracy on the ModelNet40 dataset.
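A hedged PyTorch sketch of attentional pooling with explicit sparsity: data-dependent weights are learned over the rows of the feature matrix, only the top-k are kept, and the matrix is pooled into a single vector. The linear scoring head and hard top-k projection are our assumptions, not necessarily the RAP module:

```python
import torch
import torch.nn as nn

class SparseAttentionPool(nn.Module):
    def __init__(self, dim, k):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # data-dependent attention scores
        self.k = k

    def forward(self, X):                         # X: (B, N, D) feature matrix
        w = self.score(X).squeeze(-1)              # (B, N) attention logits
        topv, topi = w.topk(self.k, dim=1)         # keep only the k best rows
        sparse_w = torch.zeros_like(w).scatter(1, topi, topv.softmax(dim=1))
        return (sparse_w.unsqueeze(-1) * X).sum(dim=1)  # (B, D) pooled vector
```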
|
|
15:00-17:00, Paper WePMP.140 | |
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders |
Hui, Le | Nanjing Univ. of Science and Tech |
Li, Xiang | NJUST |
Chen, Jiaxin | Nanjing Univ. of Science and Tech |
He, Hongliang | Nanjing Univ. of Science and Tech |
Yang, Jian | Nanjing Univ. of Science and Tech |
Keywords: Mid-level vision, Deep learning, Domain adaptation
Abstract: Unsupervised image-to-image translation has advanced spectacularly in recent years. However, recent approaches mainly focus on one model with two domains, which incurs a heavy burden in training time and model parameters when n (n > 2) domains must be freely transferred to each other in a general setting. To address this problem, we propose a novel and unified framework named Domain-Bank, which consists of a globally shared auto-encoder and n domain-specific encoders/decoders, under the assumption that a universal shared latent space exists onto which all domains can be projected. Thus, we not only reduce the number of model parameters but also achieve a huge reduction in the time budget. Besides the high efficiency, we show comparable (or even better) image translation results over the state of the art on various challenging unsupervised image translation tasks, including face image translation and painting style translation. We also apply the proposed framework to the domain adaptation task and achieve state-of-the-art performance on digit benchmark datasets.
|
|
15:00-17:00, Paper WePMP.141 | |
A Light CNN Based Method for Hand Detection and Orientation Estimation |
Yang, Li | Southeast Univ |
Qi, Zhi | Southeast Univ |
Liu, Zeheng | Southeast Univ |
Zhou, Shanshan | Southeast Univ |
Zhang, Yang | Southeast Univ |
Liu, Hao | Southeast Univ |
Wu, Jianhui | Southeast Univ |
Shi, Longxing | Southeast Univ |
Keywords: Object detection, Pattern recognition for human computer interaction
Abstract: Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting various hands under cluttered backgrounds, motion blur or changing light remains a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, yet at a high computational expense. In this paper, we propose a light CNN network that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by the CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. Evaluated on the challenging Oxford hand dataset, our method reaches 83.2% average precision (AP) at 139 FPS on an Nvidia Titan X, outperforming previous methods in both accuracy and efficiency.
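The orientation recovery can be pictured as follows: if a side vector of the rotated box has horizontal and vertical projections (px, py), its length and angle follow directly. This NumPy sketch is a plausible reading of the abstract, not the paper's exact estimator.

import numpy as np

def box_from_projections(px, py, qx, qy):
    # (px, py) and (qx, qy): axis projections of the box's two orthogonal
    # side vectors, as regressed by the network.
    w = np.hypot(px, py)          # length of the first side
    h = np.hypot(qx, qy)          # length of the orthogonal side
    theta = np.arctan2(py, px)    # rotation angle of the box
    return w, h, theta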
|
|
15:00-17:00, Paper WePMP.142 | |
Orientation-Guided Similarity Learning for Person Re-Identification |
Jiang, Na | Beihang Univ |
Liu, Junqi | Beihang Univ |
Sun, Chenxin | Beihang Univ |
Wang, Yuehua | Beihang Univ |
Zhou, Zhong | Beihang Univ |
Wu, Wei | Beihang Univ |
Keywords: Image classification, Deep learning, Visual surveillance
Abstract: Person re-identification (re-id) is a promising topic in computer vision that concentrates on similarity learning of individuals across different camera views. It remains challenging due to unpredictable orientation variations, partial occlusions, and inaccurate detections. To solve these problems, we present an orientation-guided similarity learning architecture to learn discriminative feature representations and define a similarity metric for person re-id. Our proposed architecture explicitly leverages pedestrian orientation and body part cues to enhance generalization ability. In the architecture, an orientation-guided loss function that pulls positive samples with the same orientation closer is designed to alleviate orientation variations. Meanwhile, an aligned dense network with pose estimation is presented to extract robust global-local fusion representations, which effectively exploits local features to overcome partial occlusions. Finally, we introduce a two-stage Top-k re-ranking strategy to optimize initial re-id results by min-hash and weighted distance. Extensive experimental results demonstrate that our proposed approach significantly outperforms state-of-the-art re-id methods on the popular CUHK03, Market1501, and DukeMTMC-reID datasets.
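A hedged sketch of one way to realize an orientation-guided loss: a standard triplet loss in which positives sharing the anchor's orientation receive a larger pulling weight. The weighting scheme and names are assumptions, not the paper's exact formulation.

import torch

def orientation_guided_triplet(anchor, pos, neg, same_orient, margin=0.3, boost=2.0):
    # anchor/pos/neg: (N, D) embeddings; same_orient: (N,) bool mask that is
    # True when the positive shares the anchor's estimated orientation.
    d_ap = (anchor - pos).pow(2).sum(1)
    d_an = (anchor - neg).pow(2).sum(1)
    w = torch.where(same_orient,
                    torch.full_like(d_ap, boost),   # pull same-orientation pairs harder
                    torch.ones_like(d_ap))
    return torch.clamp(w * d_ap - d_an + margin, min=0).mean()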
|
|
WePMOT1 |
Ballroom C, 1st Floor |
WePMOT1.A Multitask and Multilabel Learning (Ballroom C, 1st Floor) |
Oral Session |
|
17:00-17:20, Paper WePMOT1.1 | |
Learning Multi-View Generator Network for Shared Representation |
Han, Tian | Univ. of California, Los Angeles |
Xing, Xianglei | Harbin Engineering Univ |
Wu, Yingnian | Univ. of California, Los Angeles |
Keywords: Multiview learning, Gait recognition, Deep learning
Abstract: Multi-view representation learning is challenging because different views contain both common structure and complex view-specific information. Traditional generative models may not be effective in such a situation, since view-specific and common information cannot be well separated, which may cause problems for downstream vision tasks. In this paper, we introduce a multi-view generator model to solve multi-view generation and recognition in a unified framework. We propose a multi-view alternating back-propagation algorithm that learns multi-view generator networks by allowing them to share common latent factors. Our experiments show that the proposed method is effective for both image generation and recognition. Specifically, we first qualitatively demonstrate that our model can rotate and complete faces accurately. We then show that our model achieves state-of-the-art or competitive recognition performance through quantitative comparisons.
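A minimal PyTorch sketch of per-view generators conditioned on a shared latent factor plus a view-specific one; the alternating back-propagation learning procedure is omitted, and all sizes and names are placeholders.

import torch
import torch.nn as nn

class MultiViewGenerator(nn.Module):
    # One generator per view; each consumes the shared latent factor z_c
    # concatenated with its own view-specific factor.
    def __init__(self, n_views, z_shared=64, z_view=16, out=784):
        super().__init__()
        self.gens = nn.ModuleList(
            nn.Sequential(nn.Linear(z_shared + z_view, 256), nn.ReLU(),
                          nn.Linear(256, out), nn.Tanh())
            for _ in range(n_views))

    def forward(self, z_c, z_views):        # z_c: (N, z_shared); z_views: list of (N, z_view)
        return [g(torch.cat([z_c, z_v], 1)) for g, z_v in zip(self.gens, z_views)]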
|
|
17:20-17:40, Paper WePMOT1.2 | |
Multi-Task Learning of Cascaded CNN for Facial Attribute Classification |
Zhuang, Ni | Xiamen Univ |
Yan, Yan | Xiamen Univ |
Chen, Si | Xiamen Univ. of Tech |
Wang, Hanzi | Xiamen Univ |
Keywords: Classification, Deep learning, Multitask learning
Abstract: Recently, facial attribute classification (FAC) has attracted significant attention in the computer vision community, and great progress has been made along with the availability of challenging FAC datasets. However, conventional FAC methods usually first pre-process the input images (i.e., perform face detection and alignment) and then predict facial attributes. These methods ignore the inherent dependencies among these tasks (i.e., face detection, facial landmark localization and FAC). Moreover, some methods using convolutional neural networks are trained with fixed loss weights, without considering the differences between facial attributes. To address the above problems, we propose a novel multi-task learning of cascaded convolutional neural network method, termed MCFA, for predicting multiple facial attributes simultaneously. Specifically, the proposed method takes advantage of three cascaded sub-networks (i.e., S_Net, M_Net and L_Net, corresponding to neural networks at different scales) to jointly train multiple tasks in a coarse-to-fine manner, achieving end-to-end optimization. Furthermore, the proposed method automatically assigns a loss weight to each facial attribute based on a novel dynamic weighting scheme, making the proposed method concentrate on predicting the more difficult facial attributes. Experimental results show that the proposed method outperforms several state-of-the-art FAC methods on the challenging CelebA and LFWA datasets.
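The dynamic weighting scheme is not detailed in the abstract; as a hypothetical stand-in, this sketch weights each attribute's binary cross-entropy by its current relative difficulty, so harder attributes contribute more to the gradient.

import torch
import torch.nn.functional as F

def dynamic_attribute_loss(logits, targets, eps=1e-6):
    # logits, targets: (N, A) for A binary facial attributes.
    per_attr = F.binary_cross_entropy_with_logits(
        logits, targets, reduction='none').mean(0)          # (A,) loss per attribute
    w = per_attr.detach() / (per_attr.detach().sum() + eps) # harder attribute -> larger weight
    return (w * per_attr).sum()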
|
|
17:40-18:00, Paper WePMOT1.3 | |
Learning with Latent Label Hierarchy from Incomplete Multi-Label Data |
Pei, Yuanli | Oregon State Univ |
Fern, Xiaoli Z | Oregon State Univ |
Raich, Raviv | Oregon State Univ |
Keywords: Multilabel learning, Probabilistic graphical model
Abstract: Exploiting hierarchical label structure for multi-label classification can significantly improve classification performance and also benefit the labeling process. Existing work either cannot make use of such structure or assumes the hierarchy is given as a prior. In practice, such a hierarchy is not always available beforehand, and it is desirable to learn it from data. Moreover, the labels in the training data may be incomplete due to an inconsistent labeling process, which raises another learning challenge. This paper studies multi-label learning with a latent label hierarchy and incomplete label assignments. Our goal is to simultaneously learn the hierarchy as well as a multi-label classifier given the input features and incomplete label assignments. We propose a probabilistic model that captures the hierarchical structure and the incompleteness of the labels, and introduce an Expectation-Maximization (EM) procedure for maximum likelihood estimation.
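As a toy illustration of EM over incomplete labels only (this ignores both the latent hierarchy and the input features, so it is far simpler than the paper's model), the sketch below alternates between imputing missing labels and re-estimating per-label rates under an independent-Bernoulli assumption.

import numpy as np

def em_fill_labels(Y, iters=50):
    # Y: (N, L) array with 1/0 for observed labels and np.nan where missing.
    # Assumes every label is observed for at least one instance.
    obs = ~np.isnan(Y)
    rates = np.nanmean(Y, axis=0)        # initial per-label rates
    E = np.where(obs, Y, rates)          # E-step: expected values of missing labels
    for _ in range(iters):
        rates = E.mean(axis=0)           # M-step: maximum-likelihood rates
        E = np.where(obs, Y, rates)      # E-step with the updated rates
    return E, rates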
|
|
18:00-18:20, Paper WePMOT1.4 | |
Learning Fixation Point Strategy for Object Detection and Classification |
Lyu, Jie | Xi'an Jiaotong Univ |
Yuan, Zejian | Xi'an Jiaotong Univ |
Chen, Dapeng | Xi'an Jiaotong Univ |
Zhao, Yun | Xi'an Jiaotong Univ |
Zhang, Hui | Shenzhen Forward Innovation Digital Tech. Co. Ltd. China |
Keywords: Multitask learning, Object detection, Deep learning
Abstract: We propose a novel recurrent attentional structure to localize and recognize objects jointly. The network learns to extract a sequence of local observations with detailed appearance and rough context, instead of sliding windows or convolutions over the entire image. These observations are then fused to complete the detection and classification tasks. For training, we present a hybrid loss function to learn the parameters of the multi-task network end-to-end. In particular, the combination of the stochastic and object-awareness strategies, named SA, selects richer context and ensures the last fixation is close to the object. In addition, we build a real-world dataset to verify the capacity of our method in detecting objects of interest, including small ones. Our method can predict a precise bounding box on an image and achieves high speed on large images. Experimental results indicate that the proposed method can mine effective context from several local observations. Moreover, precision and speed are easily improved by changing the number of recurrent steps. Finally, source code is available at https://github.com/jielyu/RADCN.
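A minimal sketch of a recurrent fixation loop: crop a local observation at the current fixation, update a recurrent state, and predict the next fixation point. Glimpse size, the GRU cell, and all names are assumptions; the authors' actual architecture is in the linked repository.

import torch
import torch.nn as nn

class FixationNet(nn.Module):
    # A toy recurrent glimpse loop (names and sizes are placeholders).
    def __init__(self, glimpse=32, hidden=256):
        super().__init__()
        self.g, self.h = glimpse, hidden
        self.encode = nn.Sequential(nn.Flatten(),
                                    nn.Linear(3 * glimpse * glimpse, hidden), nn.ReLU())
        self.rnn = nn.GRUCell(hidden, hidden)
        self.where = nn.Linear(hidden, 2)     # next fixation (x, y) in [-1, 1]

    def forward(self, img, steps=4):          # img: (N, 3, H, W) with H, W >= glimpse
        N, _, H, W = img.shape
        state = img.new_zeros(N, self.h)
        loc = img.new_zeros(N, 2)             # start looking at the image centre
        for _ in range(steps):
            # map the normalized fixation to a top-left crop corner
            cx = ((loc[:, 0] + 1) / 2 * (W - self.g)).long().tolist()
            cy = ((loc[:, 1] + 1) / 2 * (H - self.g)).long().tolist()
            crops = torch.stack([img[i, :, cy[i]:cy[i] + self.g, cx[i]:cx[i] + self.g]
                                 for i in range(N)])
            state = self.rnn(self.encode(crops), state)
            loc = torch.tanh(self.where(state))   # where to look next
        return state                              # fused observation for task heads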
|
|
WePMOT2 |
309A, 3rd Floor |
WePMOT1.B Clustering (309A, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper WePMOT2.1 | |
Probabilistic Sparse Subspace Clustering Using Delayed Association |
Jaberi, Maryam | Univ. of Central Florida |
Pensky, Marianna | Univ. of Central Florida |
Foroosh, Hassan | Univ. of Central Florida |
Keywords: Clustering, Dimensionality reduction, Sparse learning
Abstract: Discovering and clustering subspaces in high-dimensional data is a fundamental problem of machine learning with a wide range of applications in data mining, computer vision, and pattern recognition. Earlier methods divided the problem into two separate stages of finding the similarity matrix and finding the clusters. Similar to some recent works, we integrate these two steps using a joint optimization approach. We make the following contributions: (i) we estimate the reliability of the cluster assignment for each point before assigning it to a subspace, grouping the data points into "certain" and "uncertain" groups, with the assignment of the latter group delayed until their subspace association certainty improves. (ii) We demonstrate that delayed association is better suited for clustering subspaces that have ambiguities, i.e., when subspaces intersect or data are contaminated with outliers/noise. (iii) We demonstrate experimentally that such delayed probabilistic association leads to more accurate self-representation and final clusters; the proposed method has higher accuracy both for points that lie exclusively in one subspace and for those on the intersection of subspaces. (iv) We show that delayed association leads to a huge reduction in computational cost, since it allows for incremental spectral clustering.
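A small sketch of the delayed-association idea: given per-point posteriors over subspaces, points are split into "certain" and "uncertain" groups, with the latter held back until their certainty improves. The simple threshold rule here is an assumption.

import numpy as np

def split_by_certainty(P, tau=0.8):
    # P: (N, K) posterior of each point over K candidate subspaces.
    best = P.max(axis=1)
    certain = np.where(best >= tau)[0]   # assign these points now
    delayed = np.where(best < tau)[0]    # defer until their posterior sharpens
    return certain, delayed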
|
|
17:20-17:40, Paper WePMOT2.2 | |
Constrained Sparse Subspace Clustering with Side-Information |
Li, Chun-Guang | Beijing Univ. of Posts and Telecommunications |
Zhang, Junjian | Beijing Univ. of Posts and Telecommunications |
Guo, Jun | Beijing Univ. of Posts and Telecommunications |
Keywords: Clustering, Performance evaluation, Semi-supervised learning
Abstract: Subspace clustering refers to the problem of segmenting high-dimensional data drawn from a union of subspaces into the respective subspaces. In some applications, partial side-information indicating "must-link" or "cannot-link" constraints for clustering is available. This leads to the task of subspace clustering with side-information. However, prior work has not fully exploited the supervision value of the side-information for subspace clustering. To this end, in this paper, we present an enhanced approach for constrained subspace clustering with side-information, termed Constrained Sparse Subspace Clustering plus (CSSC+), in which the side-information is used not only in the stage of learning an affinity matrix but also in the stage of spectral clustering. Moreover, we propose to estimate clustering accuracy based on the partial side-information and theoretically justify its connection to the ground-truth clustering accuracy in terms of the Rand index. We conduct experiments on three cancer gene expression datasets to validate the effectiveness of our proposals.
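A sketch of how clustering quality might be estimated from partial side-information alone, in the spirit of the Rand index: count the fraction of constrained pairs the clustering satisfies. The paper's exact estimator may differ.

import numpy as np

def rand_index_from_side_info(labels, must_link, cannot_link):
    # labels: (N,) cluster assignments; must_link / cannot_link: lists of
    # (i, j) index pairs. Assumes at least one constrained pair is given.
    agree = sum(labels[i] == labels[j] for i, j in must_link)     # kept together
    agree += sum(labels[i] != labels[j] for i, j in cannot_link)  # kept apart
    return agree / (len(must_link) + len(cannot_link))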
|
|
17:40-18:00, Paper WePMOT2.3 | |
Stream Clustering with Dynamic Estimation of Emerging Local Densities |
Wang, Ziyin | Indiana Univ.-Purdue Univ. Indianapolis |
Tsechpenakis, Gavriil | Indiana Univ.-Purdue Univ. Indianapolis |
Keywords: Clustering, Image classification, Online learning
Abstract: We present a method for clustering data streams incrementally, designed to discover all valid density peaks in a single pass, in a non-parametric fashion. It detects emerging clusters along the stream by dynamically locating kernels in the most promising areas and performing a Stochastic Mean Shift procedure to find clustering centers. We present a density estimation approach for dynamic initialization, considering every sub-stream that follows 'emerging data' as a sample set and applying hypothesis testing (a p-value approach) to estimate its local density. The sub-stream size and the p-value are determined in a way that provides a provable accuracy guarantee. We compare our method with the state-of-the-art on realistic and complex datasets and show that it outperforms not only stream algorithms but also their more complex, non-stream foundational paradigms.
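A hedged sketch of a p-value test for an emerging dense region: under a background hit rate p0, test whether the fraction of sub-stream points near a candidate kernel is significantly higher (normal approximation). The test statistic and all parameters are assumptions, not the paper's construction.

import numpy as np
from math import erf, sqrt

def emerging_density_pvalue(substream, center, r, p0):
    # substream: (n, d) points following 'emerging data'; center: (d,) kernel;
    # r: radius; p0: background probability of landing within r of any point.
    n = len(substream)
    phat = np.mean(np.linalg.norm(substream - center, axis=1) <= r)
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    return 0.5 * (1 - erf(z / sqrt(2)))   # small p-value -> emerging dense region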
|
|
WePMOT3 |
309B, 3rd Floor |
WePMOT2 Motion Analysis (309B, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper WePMOT3.1 | |
A Benchmark for Full Rotation Head Tracking |
Li, Yulin | Inst. of Computing Tech. Chinese Acad. of Sciences |
Ma, Bingpeng | Univ. of Chinese Acad. of Sciences |
Chang, Hong | Inst. of Computing Tech. CAS |
Chen, Xilin | Inst. of Computing Tech |
Keywords: Motion and tracking, Visual surveillance
Abstract: This paper introduces a new benchmark for 360-degree rotation head tracking, named Full Rotation Head Tracking (FRHT). The benchmark consists of 50 color sequences containing diverse human activities with complicated head motions. Specifically, FRHT covers most of the challenges of head tracking and focuses on the appearance variations of heads during 360-degree rotation. It also pays attention to clutter from the heads of nearby people. Further, we propose a baseline tracker that guides selective adaptive updating via verification strategies, thus alleviating error accumulation. Extensive experiments validate the advantages of FRHT for head rotation and similar-object clutter.
|
|
17:20-17:40, Paper WePMOT3.2 | |
Depth Masked Discriminative Correlation Filter |
Kart, Ugur | Tampere Univ. of Tech |
Kamarainen, Joni-Kristian | Tampere Univ. of Tech |
Matas, Jiri | CTU Prague |
Fan, Lixin | Nokia Tech |
Cricri, Francesco | Nokia Tech |
Keywords: Motion and tracking, Video analysis, Vision for robotics
Abstract: Depth information provides a strong cue for occlusion detection and handling, but until recently it has been largely omitted in generic object tracking due to the lack of suitable benchmark datasets and applications. In this work, we propose a Depth Masked Discriminative Correlation Filter (DM-DCF) which adopts a novel depth-segmentation-based occlusion detection that stops correlation filter updating, and depth masking that adaptively adjusts the spatial support of the correlation filter. On the Princeton RGBD Tracking Benchmark, DM-DCF is among the state-of-the-art in overall ranking and the winner in multiple categories. Moreover, since it is based on DCF, DM-DCF runs an order of magnitude faster than its competitors, making it suitable for time-constrained applications.
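A minimal NumPy sketch of depth masking for a correlation filter: keep only pixels near the target's depth, then correlate in the Fourier domain. The tolerance rule, units, and shapes are assumptions, not the paper's segmentation.

import numpy as np

def depth_mask(depth_patch, target_depth, tol=0.3):
    # Binary mask keeping pixels whose depth lies within tol (assumed metres)
    # of the tracked target; restricts the filter's spatial support.
    return (np.abs(depth_patch - target_depth) <= tol).astype(np.float32)

def masked_response(filt, features, mask):
    # Apply the depth mask to the feature patch before cross-correlation.
    # filt, features, mask: same-shape 2D arrays.
    masked = features * mask
    return np.real(np.fft.ifft2(np.fft.fft2(masked) * np.conj(np.fft.fft2(filt))))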
|
|
17:40-18:00, Paper WePMOT3.3 | |
OWP: Objectness Weighted Patch Descriptor for Visual Tracking |
Jiang, Bo | Anhui Univ |
Zhang, Yuan | Anhui Univ |
Tang, Jin | Anhui Univ |
Luo, Bin | Anhui Univ |
Keywords: Motion and tracking, Video analysis, Learning-based vision
Abstract: Visual object tracking is an active research problem that has been widely used in the computer vision and pattern recognition area. Existing visual tracking methods usually localize the object with a bounding box, a representation that is often disturbed by introduced background information and partial occlusion. To deal with this problem, in this paper, we propose a novel Objectness Weighted Patch (OWP) descriptor for object feature description in visual tracking. The aim of OWP is to assign different objectness weights to the patches of the bounding box to reduce the influence of background information and partial occlusion. We propose to compute the objectness weights of patches in OWP by integrating multiple cues (background, foreground and local spatial consistency) in a general optimization model. The proposed model has a simple closed-form solution and can thus be computed efficiently. We incorporate OWP into a structured SVM tracking framework to obtain a new robust tracking method. Extensive experiments on two standard benchmark datasets, OTB100 and Temple-Color, demonstrate the effectiveness and benefits of the proposed tracking method.
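A toy sketch of an objectness-weighted patch descriptor: derive per-patch weights from a simple cue fusion, then concatenate the weight-scaled patch features into one descriptor. The paper instead obtains the weights from a closed-form optimization over background, foreground, and spatial-consistency cues; this convex combination is a hypothetical simplification.

import numpy as np

def objectness_weights(bg_cue, fg_cue, alpha=0.5):
    # Convex combination of foreground and inverted background cues,
    # normalized to sum to one. bg_cue, fg_cue: (P,) in [0, 1].
    w = alpha * fg_cue + (1 - alpha) * (1 - bg_cue)
    return w / (w.sum() + 1e-8)

def owp_descriptor(patch_feats, weights):
    # patch_feats: (P, D) per-patch features; weights: (P,) objectness weights.
    # Down-weighted patches (likely background or occlusion) contribute less.
    return (patch_feats * weights[:, None]).ravel()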
|
|
18:00-18:20, Paper WePMOT3.4 | |
Dual-SVM Tracker Via Multiple Support Instance and LEVER Strategy |
Ma, Ding | Harbin Inst. of Tech |
Wu, Xiangqian | Harbin Inst. of Tech |
Bu, Wei | Harbin Inst. of Tech |
Keywords: Motion and tracking, Video analysis, Applications of computer vision
Abstract: Visual tracking can be modeled as a binary classification problem, and classic support vector machine (SVM) based methods have demonstrated encouraging performance on recent object tracking benchmarks. However, the performance of an SVM is very sensitive to noisy training data during online updates. In this paper, we propose an efficient dual-SVM based tracker to improve classification performance for visual tracking. The proposed tracker consists of two models: a holistic model and a part model. To learn the holistic model, support instances are derived from an RMI-SVM trained in a deep feature space. For the part model, which highlights the local structure of the target, a linear SVM is learned to further encode local details of the target, taking as input candidate instances selected from the support instances by confidence. To fuse the holistic model and the part model, we design a simple but efficient decision strategy (LEVER) that enforces the dual SVMs to focus on the target. The proposed LEVER is updated incrementally to capture changes in the appearance of the target. Extensive experimental results show that the proposed tracker performs favorably against state-of-the-art methods.
|
|
WePMOT4 |
311B, 3rd Floor |
WePMOT4 Gait and Person Re-Identification (311B, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper WePMOT4.1 | |
3D Gait Recognition Based on Functional PCA on Kendall's Shape Space |
Hosni, Nadia | Ec. Nationale des Sciences Informatiques, Univ. of Manouba |
Drira, Hassen | LIFL (UMR Lille1/CNRS 8022), Univ. de Lille 1 |
Chaieb, Faten | CRISTAL Lab. ENSI, Manouba Univ |
Ben Amor, Boulbaba | IMT Lille Douai/CRIStAL (UMR CNRS 9189) |
Keywords: Gait recognition, Shape modeling and encoding, Classification
Abstract: In this paper we propose a novel gait recognition approach based on animated 3D skeletal data. Our approach combines two disparate ideas from shape analysis and Functional Data Analysis (FDA) for a joint geometric-functional analysis: skeletal sequences are viewed as time-parametrized trajectories on Kendall's shape space once scaling, translation and rotation variations are filtered out from the fixed-time 3D skeletons. A Riemannian Functional Principal Component Analysis (RFPCA) is carried out on our manifold-valued trajectories to build a new basis of principal functions, termed EigenTrajectories. Each trajectory can then be projected onto this eigenbasis, which gives rise to a compact signature, or EigenScores. The latter is fed to pre-trained 'One-vs-All' SVM classifiers for identity recognition and authentication. Based on the geometry of the underlying shape space, tools for re-sampling and synchronizing trajectories are naturally derived in order to apply the proposed variant of FPCA. We have conducted experiments on a subset of the CMU dataset. Our approach shows promising results compared to the state-of-the-art when a compact and robust signature is considered.
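A Euclidean stand-in for the pipeline (ignoring the Riemannian geometry of Kendall's shape space and rotation alignment, so this is only a flavor of RFPCA): normalize each skeleton for translation and scale, flatten the synchronized trajectories, run ordinary PCA to obtain scores, and feed them to one-vs-all SVMs.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def normalize_frame(X):
    # Crude Kendall-style normalization of one skeleton (J, 3): remove
    # translation and scale (rotation alignment is omitted in this sketch).
    X = X - X.mean(0)
    return X / (np.linalg.norm(X) + 1e-8)

def eigen_scores(trajs, n_comp=20):
    # trajs: (N, T, J, 3) synchronized skeletal sequences, with N > n_comp.
    flat = np.stack([np.concatenate([normalize_frame(f).ravel() for f in t])
                     for t in trajs])
    pca = PCA(n_components=n_comp).fit(flat)
    return pca.transform(flat), pca

# identification on the compact signatures, e.g.:
# clf = SVC(kernel='linear', decision_function_shape='ovr').fit(scores, ids)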
|
|
17:20-17:40, Paper WePMOT4.2 | |
Person Re-Identification with Vision and Language |
Yan, Fei | Univ. of Surrey |
Kittler, Josef | Univ. of Surrey |
Mikolajczyk, Krystian | Univ. of Surrey |
Keywords: Soft biometrics, Vision and language, Neural networks
Abstract: In this paper we propose a new approach to person re-identification using images and natural language descriptions. We propose a joint vision and language model based on CNN and LSTM architectures to match across the two modalities as well as to enrich visual examples for which there are no language descriptions. We also introduce new annotations in the form of natural language descriptions for two standard Re-ID benchmarks, namely CUHK03 and VIPeR. We perform experiments on these two datasets with techniques based on CNN, hand-crafted features as well as LSTM for analysing visual and natural description data. We investigate and demonstrate the advantages of using natural language descriptions compared to attributes as well as CNN compared to LSTM in the context of Re-ID. We show that the joint use of language and vision can significantly improve the state-of-the-art performance on standard Re-ID benchmarks.
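A minimal PyTorch sketch of a joint vision-and-language embedding in the spirit of the abstract: a small CNN embeds images and an LSTM embeds descriptions into one space, where cross-modal matching can use cosine similarity. The architecture and sizes are placeholders, not the paper's model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointVLEmbedding(nn.Module):
    def __init__(self, vocab, dim=256):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, dim))
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, img, tokens):          # img: (N, 3, H, W); tokens: (N, T) ids
        v = F.normalize(self.cnn(img), dim=1)
        _, (h, _) = self.lstm(self.embed(tokens))
        t = F.normalize(h[-1], dim=1)
        return v, t   # train with a cross-modal ranking loss on v.t similarities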
|
|
17:40-18:00, Paper WePMOT4.3 | |
Does a Body Image Tell Age? |
Yuan, Baoyu | Sun Yat-Sen Univ |
Wu, Ancong | Sun Yat-Sen Univ |
Zheng, Wei-Shi | Sun Yat-Sen Univ |
Keywords: Soft biometrics, Visual surveillance
Abstract: Age estimation is an important task in computer vision and is widely used in applications. However, this technology is heavily affected by the resolution of the face, and estimating the age of a person at a distance is a challenge. While the body image of a person is often captured more clearly, when and how to use body-based visual cues for age estimation is largely understudied. In this work, we argue that body-based visual cues are better for estimating the age group and can assist the estimation of the exact age value. For this purpose, we develop a Body-based Age Net (BAN) that unifies selective local convolution features and contextual convolution features. The network is designed based on two assumptions: 1) a person's clothing is closely related to his/her age group; 2) some selective local parts of the body are more discriminative for age group estimation. We have contributed a large-scale and publicly available Body Age (BAG) dataset, on which we quantitatively evaluate the proposed model.
|
|
18:00-18:20, Paper WePMOT4.4 | |
Attend and Align: Improving Deep Representations with Feature Alignment Layer for Person Retrieval |
Xu, Qin | Tsinghua Univ |
Sun, Yifan | Tsinghua Univ |
Li, Yali | Tsinghua Univ |
Wang, Shengjin | Tsinghua Univ |
Keywords: Other biometrics, Learning-based vision, Visual surveillance
Abstract: In fine-grained recognition, object misalignment and background noise are two long-standing factors that affect the robustness of deep learning models. This paper focuses on person re-identification (re-ID) and introduces a feature alignment layer (FAL) which alleviates target misalignment and background noise simultaneously. Through an attention mechanism, FAL indicates the underlying importance of each pixel on the feature maps, i.e., whether the pixel is beneficial for discriminating different persons. The discriminative regions are then relocated to the center and stretched to fill the feature maps. This “attend and align” mechanism comprises two steps: target position prediction and value assignment. In the first step, each pixel on the feature maps learns to find a target position that is ID-discriminative. In the second step, the pixel is assigned a new value using the context of the predicted position. Moreover, FAL can easily be plugged into a canonical Convolutional Neural Network (CNN) and learned in an end-to-end manner. In experiments, our method yields competitive results compared with state-of-the-art approaches on three person re-ID datasets: Market-1501, DukeMTMC-reID and CUHK03. We also demonstrate that our method improves a competitive fine-grained recognition baseline on CUB-200-2011.
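A plausible PyTorch reading of the two-step “attend and align” mechanism: a convolution predicts a target position per output pixel, and grid sampling assigns each output pixel the value from that position. This is an interpretation of the abstract, not the authors' exact layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignLayer(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.offset = nn.Conv2d(ch, 2, 3, padding=1)   # predicted (dx, dy) per pixel

    def forward(self, x):                              # x: (N, C, H, W)
        N, _, H, W = x.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=x.device),
                                torch.linspace(-1, 1, W, device=x.device),
                                indexing='ij')
        base = torch.stack((xs, ys), dim=-1).expand(N, H, W, 2)  # identity grid
        flow = self.offset(x).permute(0, 2, 3, 1)      # step 1: target positions
        return F.grid_sample(x, base + torch.tanh(flow),
                             align_corners=True)       # step 2: value assignment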
|
| |