Last updated on August 12, 2018. This conference program is tentative and subject to change
Technical Program for Tuesday August 21, 2018
|
TuAMOT1 |
Ballroom C, 1st Floor |
TuAMOT1 Machine Learning and Classification (Ballroom C, 1st Floor) |
Oral Session |
|
10:30-10:50, Paper TuAMOT1.1 | |
Boosting Black-Box Variational Inference by Incorporating the Natural Gradient |
Trusheim, Felix | Robert Bosch GmbH |
Keywords: Regression, Probabilistic graphical model, Scene understanding
Abstract: In this paper we present a modification of the popular Black-Box Variational Inference (BBVI) approach which significantly improves the computational efficiency of inference. We achieve this performance boost by replacing the standard gradient in the stochastic gradient ascent framework of BBVI with the natural gradient. Our experimental results (e.g. training of neural networks) show that the proposed method outperforms the original BBVI algorithm on both synthetic and real data.
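The abstract's core step, replacing the Euclidean gradient with the natural gradient (the gradient preconditioned by the inverse Fisher information), can be sketched for the simplest variational family. The (mu, log sigma) parameterization, diagonal Fisher, and step size below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def natural_gradient_step(params, grad, lr=0.1):
    """One natural-gradient ascent step for a univariate Gaussian
    q(z) = N(mu, sigma^2) parameterized as (mu, log_sigma).
    The Fisher information here is diag(1/sigma^2, 2), so the natural
    gradient F^{-1} g is a per-coordinate rescaling of the gradient."""
    mu, log_sigma = params
    sigma2 = np.exp(2.0 * log_sigma)
    fisher_diag = np.array([1.0 / sigma2, 2.0])
    nat_grad = grad / fisher_diag          # F^{-1} g for a diagonal F
    return params + lr * nat_grad

params = np.array([0.0, 0.0])              # mu=0, log_sigma=0 (sigma=1)
grad = np.array([1.0, 2.0])                # a stochastic ELBO gradient
print(natural_gradient_step(params, grad)) # -> [0.1 0.1]
```

For a diagonal Fisher the preconditioning reduces to a cheap per-coordinate rescaling, which is what keeps the natural-gradient variant compatible with the black-box setting.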
|
|
10:50-11:10, Paper TuAMOT1.2 | |
Introduce More Characteristics of Samples into Cross-Domain Sentiment Classification |
Fu, Xianghua | Shenzhen Univ |
Liu, Wangwang | Shenzhen Univ |
Keywords: Domain adaptation, Document understanding, Neural networks
Abstract: Because of the discrepancy between different domains, a sentiment classifier trained in a source domain cannot achieve good performance in a target domain. Domain adaptation algorithms aim at solving such problems. One main line of algorithms aims at finding domain-invariant representations of inputs, which pays more attention to the features common to different domains and ignores the characteristics of the samples themselves. In this paper, we propose a Fuzziness Based Domain-Adversarial Neural Network with Auto-Encoder (Fuzzy-DAAE). It not only uses a domain classifier to find domain-invariant features, but also uses an auto-encoder to reconstruct the inputs so as to keep the characteristics of the samples. In order to introduce more supervised information about target samples, we also add unlabeled target samples and their predicted labels to the original training data according to their fuzziness and then retrain the whole model. Experiments on Amazon product reviews show that our proposed model achieves the best or comparable results compared with existing models. It is worth noticing that our model can be used in any other domain adaptation task, not only cross-domain sentiment classification.
|
|
11:10-11:30, Paper TuAMOT1.3 | |
Discriminative Collaborative Representation and Its Application to Audio Signal Classification |
Jiang, Yuechi | The Hong Kong Pol. Univ |
Leung, Frank Hung Fat | The Hong Kong Pol. Univ |
Keywords: Classification, Audio and acoustic processing and analysis
Abstract: In this paper, we propose Discriminative Collaborative Representation (DCR) as an extension to Collaborative Representation (CR), by adding an extra discriminative term to the original formulation of CR. In the literature, both CR and Sparse Representation (SR) have been shown to perform well in signal classification. Compared to SR, CR is more computationally efficient, but does not give an obvious performance improvement. Therefore, we propose DCR, which aims at improving the performance of CR in signal classification. Besides, we extend DCR to Kernel DCR (KDCR), which generalizes DCR by introducing kernel functions. Comparisons among SR, CR and DCR are made on two audio signal classification tasks. Experimental results show that DCR can outperform CR and SR in both classification tasks, which demonstrates the effectiveness of our proposed DCR and the usefulness of the extra discriminative term.
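The plain CR classifier that DCR extends has a simple closed form: code the test sample over the whole dictionary with an l2 penalty, then pick the class whose atoms reconstruct it best. The sketch below shows only this baseline (the paper's discriminative term is not reproduced), with an illustrative regularization weight:

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Collaborative-representation classification (the CR baseline):
    solve the ridge problem alpha = (D^T D + lam I)^{-1} D^T y over the
    whole dictionary D (columns = training atoms), then assign the
    class whose atoms give the smallest reconstruction residual."""
    n = D.shape[1]
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        residuals[c] = np.linalg.norm(y - D[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)

# toy dictionary: two classes roughly aligned with the two axes
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
print(crc_classify(D, [0, 0, 1, 1], np.array([1.0, 0.05])))
```

Because the l2-regularized coding has this closed form, CR is far cheaper than l1-based sparse representation, which is the efficiency advantage the abstract refers to.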
|
|
11:30-11:50, Paper TuAMOT1.4 | |
Cross-Domain Semantic Feature Learning Via Adversarial Adaptation Networks |
Li, Rui | City Univ. of Hong Kong |
Cao, Wenming | City Univ. of Hong Kong |
Qian, Sheng | City Univ. of Hong Kong |
Wong, Hau-San | City Univ. of Hong Kong |
Wu, Si | South China Univ. of Tech |
Keywords: Domain adaptation, Deep learning, Image classification
Abstract: Existing domain adaptation approaches generalize models trained on labeled source domain data to unlabeled target domain data by forcing the feature distributions of the two domains closer. However, these approaches are likely to ignore semantic information during the feature alignment between the source and target domains. In this paper, we propose a new unsupervised domain adaptation framework to learn cross-domain features and disentangle the semantic information concurrently. Specifically, we first combine task-specific classification and domain adversarial learning to obtain cross-domain features by mapping the data of both domains with a shared feature extractor. Second, we integrate domain adversarial learning and within-domain reconstruction to disentangle the semantic information from the domain information. Third, we include a cross-domain transformation to further refine the feature extractor, which in turn improves the performance of the task classifier. We compare our proposed model to previous state-of-the-art methods on domain adaptation digit classification tasks. Experimental results show that our model achieves better performance than its counterparts, which demonstrates the superiority and effectiveness of our model.
|
|
11:50-12:10, Paper TuAMOT1.5 | |
Cayley-Klein Metric Learning with Shrinkage-Expansion Constraints |
Bi, Yanhong | Inst. of Automation, Chinese Acad. of Sciences (CASIA) |
Fan, Bin | Inst. of Automation, Chinese Acad. of Sciences |
Wu, Fuchao | Inst. of Automation, Chinese Acad. of Science |
Keywords: Manifold learning, Image classification
Abstract: The Cayley-Klein metric is a specific kind of non-Euclidean metric in projective space. Recently, it has been introduced into metric learning with encouraging performance on computer vision tasks. However, the original Cayley-Klein metric learning methods with conventional pairwise and triplet-wise constraints, which are fixed-bound constraints, may not perform well when the intra- and inter-class variations of the data distribution become complex. Pairwise constraints restrict the distance between samples of a similar pair to be lower than a fixed upper bound, and the distance between samples of a dissimilar pair to be higher than a fixed lower bound. Triplet-wise constraints restrict the distance between samples of a similar pair to be smaller than that between a pair of samples from different classes. In this paper, we propose a novel Cayley-Klein metric learning method (CKseML) with adaptive shrinkage-expansion pairwise constraints. CKseML is very effective in learning a metric from data with complex distributions. Our experimental results demonstrate that CKseML achieves better performance than the original Cayley-Klein metric learning methods.
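The fixed-bound pairwise constraints the abstract argues against can be written as two hinge penalties. The bound values below are illustrative, not the paper's:

```python
def pairwise_constraint_loss(dist, similar, upper=1.0, lower=2.0):
    """Fixed-bound pairwise constraint: a similar pair is penalized for
    exceeding the upper bound, a dissimilar pair for falling below the
    lower bound. CKseML's contribution is to let these bounds shrink or
    expand adaptively with the local data distribution instead of
    being fixed constants as here."""
    if similar:
        return max(0.0, dist - upper)   # want dist <= upper
    return max(0.0, lower - dist)       # want dist >= lower
```

When class variations are complex, a single global (upper, lower) pair over- or under-constrains many pairs, which is the failure mode the adaptive constraints address.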
|
|
12:10-12:30, Paper TuAMOT1.6 | |
Estimating Prediction Qualities without Ground Truth: A Revisit of the Reverse Testing Framework |
Bhaskaruni, Venkata Sai Krishna Dheeraj | Univ. of Wyoming |
Moss, Fiona | Univ. of Wyoming |
Lan, Chao | Univ. of Wyoming |
Keywords: Performance evaluation, Model selection
Abstract: To evaluate the prediction qualities of machine learning models, it is typically assumed that testing samples are labeled. However, testing labels are not always available in practice. A traditional solution is to approximate prediction qualities on testing samples by the qualities on labeled training samples, but this is limited in that it completely ignores the testing samples. In this paper, we present a new approach to estimate prediction qualities on unlabeled testing samples, based on the reverse testing framework. We evaluate the approach with various quality metrics in classification and anomaly detection tasks, and over numerous real-world data sets. Experimental results show the proposed approach gives a more accurate estimate of prediction qualities on testing samples than the estimates obtained on training samples.
|
|
TuAMOT2 |
309A, 3rd Floor |
TuAMOT2 Deep Learning 1 (309A, 3rd Floor) |
Oral Session |
|
10:30-10:50, Paper TuAMOT2.1 | |
RotateConv: Making Asymmetric Convolutional Kernels Rotatable |
Ma, Jiabin | Inst. of Automation, Chinese Acad. of Sciences |
Guo, Weiyu | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Wei | National Lab. of Pattern Recognition |
Wang, Liang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Neural networks, Image classification, Deep learning
Abstract: In deep Convolutional Neural Networks (CNNs), the kernel shape design strongly influences model size and performance. In this work, our proposed method, RotateConv, applies a novel kernel shape to massively reduce the number of parameters while maintaining considerable performance. The new shape is extremely simple, a line segment, and we equip it with the ability to rotate, which aims to learn diverse features with respect to different angles. The kernel weights and angles are learned simultaneously during end-to-end training with the standard back-propagation algorithm. There are two variants of RotateConv that have only 2 and 4 parameters respectively, depending on whether weight sharing is used, which is much more compact than the normal 3 × 3 kernel with 9 parameters. In experiments, we validate RotateConv with two classical models, ResNet and DenseNet, on four image classification benchmark databases: MNIST, CIFAR10, CIFAR100 and SVHN.
|
|
10:50-11:10, Paper TuAMOT2.2 | |
Learning Combinations of Activation Functions |
Manessi, Franco | Lastminute.com Group |
Rozza, Alessandro | Lastminute.com Group |
Keywords: Deep learning, Neural networks, Image classification
Abstract: In the last decade, an active area of research has been devoted to designing novel activation functions that help deep neural networks converge, obtaining better performance. The training procedure of these architectures usually involves optimizing the weights of their layers only, while non-linearities are generally pre-specified and their (possible) parameters are usually treated as hyper-parameters to be tuned manually. In this paper, we introduce two approaches to automatically learn different combinations of base activation functions (such as the identity function, ReLU, and tanh) during the training phase. We present a thorough comparison of our novel approaches with well-known architectures (such as LeNet-5, AlexNet, and ResNet-56) on three standard datasets (Fashion-MNIST, CIFAR-10, and ILSVRC-2012), showing substantial improvements in overall performance, such as an increase in top-1 accuracy for AlexNet on ILSVRC-2012 of 3.01 percentage points.
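A minimal sketch of the idea: a convex combination of base activations whose mixing weights would be learned by back-propagation alongside the layer weights. The base set and softmax parameterization below are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def softmax(w):
    e = np.exp(w - np.max(w))
    return e / e.sum()

class CombinedActivation:
    """Convex combination of base activations with learnable mixing
    logits `w`. With zero logits the mix is uniform; training would
    push mass toward whichever non-linearity helps the layer most."""
    def __init__(self):
        self.bases = [lambda x: x,                 # identity
                      lambda x: np.maximum(x, 0),  # ReLU
                      np.tanh]
        self.w = np.zeros(len(self.bases))         # logits -> uniform mix

    def __call__(self, x):
        coeffs = softmax(self.w)
        return sum(c * f(x) for c, f in zip(coeffs, self.bases))

act = CombinedActivation()
print(act(np.array([-1.0, 0.0, 2.0])))
```

Because the softmax keeps the coefficients positive and summing to one, the combined function stays a well-behaved interpolation of its bases throughout training.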
|
|
11:10-11:30, Paper TuAMOT2.3 | |
Learning Evasion Strategy in Pursuit-Evasion by Deep Q-Network |
Zhu, Jiagang | Chinese Acad. of Sciences, Inst. of Automation |
Zou, Wei | CASIA |
Zhu, Zheng | CASIA |
Keywords: Deep learning, Reinforcement learning, Vision for robotics
Abstract: This paper presents an approach for learning the evasion strategy of the evader in pursuit-evasion against pursuers with a Deep Q-network (DQN). To give an immediate reward to the agent, a reward function is handcrafted that considers both the evader escaping from being surrounded by the pursuers and keeping its distance from the pursuers. This is a combination of the artificial potential field method with deep reinforcement learning. The evasion strategy is verified by a series of experiments in three different game scenarios. The training process, stability, and value function are analyzed respectively. The three learned agents are compared with a random agent and a repulsive agent, and we show the effectiveness of our method.
|
|
11:30-11:50, Paper TuAMOT2.4 | |
Data Augmentation with Improved Generative Adversarial Networks |
Shi, Hongjiang | Shanghai Univ |
Wang, Lu | Shanghai Univ |
Ding, Guangtai | Shanghai Univ |
Yang, Fenglei | Shanghai Univ |
Li, Xiaoqiang | Shanghai Univ |
Keywords: Deep learning, Classification
Abstract: Data augmentation is a routine trick in neural network training to improve the generalization of a model. However, traditional transformation-based methods are domain-specific, and the transformations must be carefully designed. Recently, Generative Adversarial Networks (GANs) have been proposed to generate new samples that match the real data distribution. But directly using GAN-generated samples for data augmentation faces the problems of label absence and uncertain data quality. In this paper, we propose an efficient and robust data augmentation method using GAN-generated samples. The method uses a modified GAN to generate more diverse samples and labels them with a soft distribution labeling method. With an improved stochastic gradient descent, all the data are used to train the final classifier. The experiments are conducted on the widely used datasets MNIST, SVHN and CIFAR-10. Our method empirically obtains promising results, even with few original data.
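One plausible form of "soft distribution" labeling for generated samples is a label-smoothing-style mix of the predicted class with a uniform distribution; this is a stand-in for illustration, as the paper's exact scheme is not specified in the abstract:

```python
import numpy as np

def soft_labels(class_idx, num_classes, eps=0.1):
    """Soft label for a generated sample: put most mass on the
    predicted class and spread eps uniformly over all classes,
    reflecting the uncertainty of labels assigned to GAN outputs.
    (Illustrative stand-in, not the paper's exact scheme.)"""
    y = np.full(num_classes, eps / num_classes)
    y[class_idx] += 1.0 - eps
    return y
```

Training the classifier against such a distribution rather than a hard one-hot target limits the damage a mislabeled generated sample can do.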
|
|
11:50-12:10, Paper TuAMOT2.5 | |
Artsy–GAN: A Style Transfer System with Improved Quality, Diversity and Performance |
Liu, Hanwen | BOE |
Navarrete Michelini, Pablo | BOE Tech. Group Co., Ltd |
Zhu, Dan | BOE |
Keywords: Deep learning, Image processing and analysis
Abstract: This paper proposes Artsy-GAN: a generative adversarial approach for style transfer. Style transfer has focused mostly on transferring the style of one image (e.g. a painting) to another image (e.g. a photograph). Important progress has been made on processing any image in real time and, more recently, with arbitrary style images. A different approach has been proposed based on Generative Adversarial Networks (GAN), by translating an image from one context (e.g. photograph) to another (e.g. Van Gogh painting). To achieve this image-to-image translation, for example, Cycle-GAN uses a cycle consistency requirement to be able to recover the original image after translation and thus keep the content of the input images. This is complex and slow to train. Another disadvantage of these systems is that they take their source of randomness only from the input image, limiting the diversity of the output. In this work, we improve quality, efficiency and diversity in three ways. First, we use a perceptual loss to replace the reconstructor, with significant improvement in quality and speed of training. Second, we improve prediction speed by processing images with chroma sub-sampling. Third, we improve diversity by introducing noise at the input of the generator and a new loss function that encourages generating different details for the same content image. Experimental results show that, compared to the state of the art, our method improves the quality and diversity of the output, as well as the speed.
|
|
TuAMOT3 |
309B, 3rd Floor |
TuAMOT3 Learning Based Vision (309B, 3rd Floor) |
Oral Session |
|
10:30-10:50, Paper TuAMOT3.1 | |
R^2-ResNeXt: A ResNeXt-Based Regression Model with Relative Ranking for Facial Beauty Prediction |
Lin, Luojun | South China Univ. of Tech |
Liang, Lingyu | South China Univ. of Tech |
Jin, Lianwen | South China Univ. of Tech |
Keywords: Image classification
Abstract: The purpose of facial beauty prediction (FBP) is to develop a machine that automatically evaluates facial attractiveness in a human perceptual manner. One of the essential problems of FBP is the discriminative facial representation of the prediction model. Previous methods formulate FBP as a specific supervised learning problem of classification, regression, or ranking. We find that relative ranking information is useful for improving the regression model of FBP. Based on this observation, this paper proposes a regression model guided by relative ranking with the state-of-the-art ResNeXt structure for FBP, which we call R^2-ResNeXt. The R^2-ResNeXt learns the representation and predictor guided by relative ranking for facial attractiveness assessment in an end-to-end manner. To train the R^2-ResNeXt, we develop an aggregated loss that linearly combines a regression loss and a pairwise ranking loss. We also design a method to construct a dataset containing relatively-labelled image pairs whose individual images are sampled from the SCUT-FBP benchmark database. The experimental results on the SCUT-FBP benchmark show that our R^2-ResNeXt achieves state-of-the-art performance compared with related works, and further indicate the effectiveness of incorporating the deep residual architecture and relative beauty ranking into the regression task for facial beauty prediction.
|
|
10:50-11:10, Paper TuAMOT3.2 | |
Learning Intrinsic Image Decomposition by Deep Neural Network with Perceptual Loss |
Han, Guangyun | Sun Yat-Sen Univ |
Xie, Xiaohua | Sun Yat-Sen Univ |
Zheng, Wei-Shi | Sun Yat-Sen Univ |
Lai, Jian-huang | Sun Yat-Sen Univ |
Keywords: Illumination and reflectance modeling, Learning-based vision, Low-level vision
Abstract: Intrinsic Image Decomposition (IID) refers to recovering the albedo and shading from images, and it plays an important role in computer vision tasks such as illumination-invariant object recognition and image recoloring. IID is an ill-posed problem and lacks actual labelled samples for learning. This paper presents a deep neural network (DNN) based method to address this problem. To facilitate the training of the DNN, we synthesize an intrinsic image dataset by rendering 3D models. To make the learnt model generalize well to real-world images with better visual results, we employ a perceptual loss in model learning. The perceptual loss is constructed upon the activations of a neural network pre-trained on real images, such as the VGG network. Such a loss function implicitly introduces knowledge from real-world images and provides multi-level semantic understanding of the decomposed results. Experimental results show that our model trained on a synthetic single-object dataset produces good decomposition results not only on synthetic images but also on real-world scene-level images (containing multiple objects).
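The perceptual loss the abstract describes compares images in the feature space of a fixed pre-trained network rather than in pixel space. A minimal sketch, where `phi` stands in for the pre-trained feature extractor (VGG activations in the paper):

```python
import numpy as np

def perceptual_loss(phi, pred, target):
    """Perceptual loss: mean squared difference between the feature
    representations of two images. `phi` is any callable mapping an
    image array to a feature array; in the paper it would be the
    activations of a VGG network pre-trained on real images."""
    diff = phi(pred) - phi(target)
    return float(np.mean(diff ** 2))
```

Because `phi` is trained on real photographs, matching its activations implicitly pulls the decomposed outputs toward statistics of real-world images, which is the generalization mechanism the abstract relies on.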
|
|
11:10-11:30, Paper TuAMOT3.3 | |
Learning to Learn Second-Order Back-Propagation for CNNs Using LSTMs |
Roy, Anirban | SRI International |
Todorovic, Sinisa | Oregon State Univ |
Keywords: Learning-based vision, Deep learning, Transfer learning
Abstract: Convolutional neural networks (CNNs) typically suffer from slow convergence rates in training, which limits their wider application. This paper presents a new CNN learning approach, based on second-order methods, aimed at improving: a) Convergence rates of existing gradient-based methods, and b) Robustness to the choice of learning hyper-parameters (e.g., learning rate). We derive an efficient back-propagation algorithm for simultaneously computing both gradients and second derivatives of the CNN's learning objective. These are then input to a Long Short Term Memory (LSTM) to predict optimal updates of CNN parameters in each learning iteration. Both meta-learning of the LSTM and learning of the CNN are conducted jointly. Evaluation on image classification demonstrates that our second-order back-propagation has faster convergence rates than standard gradient-based learning for the same CNN, and that it converges to better optima leading to better performance under a budgeted time for learning. We also show that an LSTM learned to learn a small CNN network can be readily used for learning a larger network.
|
|
11:30-11:50, Paper TuAMOT3.4 | |
Multi-Scale Recurrent Encoder-Decoder Network for Dense Temporal Classification |
Choo, Sungkwon | Seoul National Univ |
Seo, Wonkyo | Seoul National Univ |
Jeong, Dong-ju | Seoul National Univ |
Cho, Nam Ik | Seoul National Univ |
Keywords: Video analysis, Learning-based vision, Object detection
Abstract: The temporal events in video sequences often have long-term dependencies which are difficult for a convolutional neural network (CNN) to handle. In particular, the dense pixel-wise prediction of video frames is a difficult problem for the CNN, as huge memory and a large number of parameters are needed to learn the temporal correlation. To overcome these difficulties, we propose a recurrent encoder-decoder network which compresses the spatiotemporal features at the encoder and restores them to original-sized results at the decoder. We adopt a convolutional long short-term memory (LSTM) in the encoder-decoder architecture, which successfully learns the spatiotemporal relation with a relatively small number of parameters. The proposed network is applied to one of the dense pixel-prediction problems, specifically background subtraction in video sequences. The proposed network is trained with video frames of limited duration, and yet it shows good generalization performance for different videos and time durations. Also, with additional video-specific learning, it shows the best performance on a benchmark dataset (CDnet 2014).
|
|
11:50-12:10, Paper TuAMOT3.5 | |
A Convolutional Neural Network for Pixelwise Illuminant Recovery in Colour and Spectral Images |
Robles-Kelly, Antonio | Deakin Univ |
Wei, Ran | NICTA |
Keywords: Illumination and reflectance modeling, Physics-based vision, Learning-based vision
Abstract: Here, we present a pixelwise illuminant recovery method for both trichromatic and multi- or hyperspectral images which employs a convolutional neural network. The network used here is based upon the simple, yet effective architecture employed by the CIFAR10-quick net [snoek:2012]. The network is trained using a loss function which employs the angular difference between the target illuminant and the estimated one as the data term. The loss used here also includes a regularisation term which encourages smoothness in the spectral domain. Moreover, the network takes as input a tensor which is constructed making use of an image patch at different scales. This allows the network to predict the illuminant per pixel using locally supported multiscale information. We illustrate the utility of our method for both colour and hyperspectral illuminant recovery and compare our results against other techniques elsewhere in the literature.
|
|
12:10-12:30, Paper TuAMOT3.6 | |
Bottom-Up Pose Estimation of Multiple Person with Bounding Box Constraint |
Li, Miaopeng | Zhejiang Univ |
Zhou, Zimeng | Zhejiang Univ |
Jie, Li | Zhejiang Univ |
Liu, Xinguo | Zhejiang Univ |
Keywords: Learning-based vision, Applications of pattern recognition and machine learning, Deep learning
Abstract: In this work, we propose a new method for multi-person pose estimation which combines the traditional bottom-up and top-down methods. Specifically, we perform the network feed-forwarding in a bottom-up manner, and then parse the poses with bounding box constraints in a top-down manner. In contrast to previous top-down methods, our method is robust to bounding box shift and tightness. We extract features from the original image with a residual network and train the network to learn both the confidence maps of joints and the connection relationships between joints. During testing, the predicted confidence maps, connection relationships and bounding boxes are used to parse the poses of all persons. The experimental results show that our method learns more accurate human poses, especially in challenging situations, and achieves better time performance, compared with bottom-up and top-down methods.
|
|
TuAMOT4 |
310, 3rd Floor |
TuAMOT4 Image Processing (310, 3rd Floor) |
Oral Session |
|
10:30-10:50, Paper TuAMOT4.1 | |
Connected Components Labeling on DRAGs |
Bolelli, Federico | Univ. Degli Studi Di Modena E Reggio Emilia |
Baraldi, Lorenzo | Univ. of Modena and Reggio Emilia |
Cancilla, Michele | Univ. Degli Studi Di Modena E Reggio Emilia |
Grana, Costantino | Univ. Degli Studi Di Modena E Reggio Emilia |
Keywords: Image processing and analysis, Applications of pattern recognition and machine learning, Performance evaluation
Abstract: In this paper we introduce a new Connected Components Labeling (CCL) algorithm which exploits a novel approach to model decision problems as Directed Acyclic Graphs with a root, which will be called Directed Rooted Acyclic Graphs (DRAGs). This structure supports the use of sets of equivalent actions, as required by CCL, and optimally leverages these equivalences to reduce the number of nodes (decision points). The advantage of this representation is that a DRAG, differently from decision trees usually exploited by the state-of-the-art algorithms, will contain only the minimum number of nodes required to reach the leaf corresponding to a set of condition values. This combines the benefits of using binary decision trees with a reduction of the machine code size. Experiments show a consistent improvement of the execution time when the model is applied to CCL.
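For reference, the raster-scan labeling whose per-pixel decision logic DRAG-based algorithms optimize is the classic two-pass scheme with union-find. This is the textbook baseline, not the paper's optimized decision model:

```python
def label_components(img):
    """Classic two-pass connected-components labeling (4-connectivity)
    over a binary grid. First pass assigns provisional labels and
    records equivalences with union-find; second pass resolves them.
    DRAGs compress the branching of exactly this kind of scan."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                      # parent[i] = i for each root label

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    next_label = 1
    for y in range(h):                # first pass: provisional labels
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                a, b = find(up), find(left)
                labels[y][x] = min(a, b)
                parent[max(a, b)] = min(a, b)  # merge equivalence classes
            elif up or left:
                labels[y][x] = find(up or left)
            else:
                parent.append(next_label)      # fresh label
                labels[y][x] = next_label
                next_label += 1
    for y in range(h):                # second pass: resolve equivalences
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

The mask-based algorithms the paper targets replace the `if up and left ... elif ... else` cascade with a larger neighborhood decision structure; the DRAG representation minimizes the number of such decision nodes.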
|
|
10:50-11:10, Paper TuAMOT4.2 | |
Lightweight Deep Residue Learning for Joint Color Image Demosaicking and Denoising |
Huang, Tao | Xidian Univ |
Wu, FangFang | Xidian Univ |
Dong, Weisheng | Xidian Univ |
Shi, Guangming | Xidian Univ |
Li, Xin | West Virginia Univ |
Keywords: Enhancement, restoration and filtering, Image processing and analysis, Sparse learning
Abstract: Color demosaicking and image denoising each play an important role in digital cameras. Conventional model-based methods often fail around areas of strong texture and produce disturbing visual artifacts such as aliasing and zippering. Recently developed deep-learning-based methods are capable of obtaining images of better quality, though at the price of high computational cost, which makes them unsuitable for real-time applications. In this paper, we propose a lightweight convolutional neural network for the joint demosaicking and denoising (JDD) problem with the following salient features. First, the densely connected network is trained in an end-to-end manner to learn the mapping from the noisy low-resolution space (CFA image) to the clean high-resolution space (color image). Second, the concepts of deep residue learning and aggregated residual transformations are extended from image denoising and classification to JDD, supporting more efficient training. Third, the design of our end-to-end network architecture is inspired by a rigorous analysis of JDD using sparsity models. Experimental results conducted for both demosaicking-only and JDD tasks show that the proposed method performs much better than existing state-of-the-art methods (i.e., higher visual quality, smaller training set and lower computational cost).
|
|
11:10-11:30, Paper TuAMOT4.3 | |
Joint Haze-Relevant Features Selection and Transmission Estimation Via Deep Belief Network for Efficient Single Image Dehazing |
Ling, Zhigang | Hunan Univ |
Li, Xiuxin | Hunan Univ |
Zou, Wen | Hunan Univ |
Liu, Min | Hunan Univ |
Keywords: Enhancement, restoration and filtering, Regression, Deep learning
Abstract: Haze-relevant image features are widely used in haze density perception and transmission estimation for single image dehazing, since hazy images usually have characteristics distinct from those of clean images. However, little attention has been paid to identifying haze-relevant features, and further selecting compact but informative image features for image dehazing. In this paper, we propose a novel joint feature selection and transmission estimation model named JFSTE via a Deep Belief Network (DBN) for image dehazing. First, we develop a one-to-one linear feature selection layer in the DBN, in which each input feature connects to only one node in the next layer with a binary weight. Meanwhile, a simple feature-selection strategy is proposed to determine these binary weights by considering the importance of features and the expected number of selected features. Second, in order to identify haze-relevant features of a hazy image and reduce their redundancy, the minimum-redundancy maximum-relevancy criterion is introduced into the joint optimization of our network. A comparative study with state-of-the-art approaches demonstrates the competitive performance of our proposed method.
|
|
11:30-11:50, Paper TuAMOT4.4 | |
In2I : Unsupervised Multi-Image-To-Image Translation Using Generative Adversarial Networks |
Perera, Pramuditha | Rutgers Univ |
Mahdi, Abavisani | Rutgers Univ |
Patel, Vishal | Rutgers, the State Univ. of New Jersey |
Keywords: Image processing and analysis, Low-level vision, Deep learning
Abstract: In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to a multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN) based framework along with a multi-modal generator structure and a new loss term, the latent consistency loss. Through various experiments we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.
|
|
11:50-12:10, Paper TuAMOT4.5 | |
SESR: Single Image Super Resolution with Recursive Squeeze and Excitation Networks |
Cheng, Xi | Nanjing Univ. of Science and Tech |
Li, Xiang | NJUST |
Yang, Jian | Nanjing Univ. of Science and Tech |
Tai, Ying | Youtu Lab, Tencent |
Keywords: Super-resolution
Abstract: Single image super resolution is a very important computer vision task, with a wide range of applications. In recent years, the depth of super-resolution models has been constantly increasing, but the small gains in performance have come with a huge amount of computation and memory consumption. In this work, in order to make super resolution models more effective, we propose a novel single image super resolution method via recursive squeeze and excitation networks (SESR). By introducing the squeeze and excitation module, our SESR can model the interdependencies and relationships between channels, which makes our model more efficient. In addition, the recursive structure and progressive reconstruction method in our model minimize the layers and parameters and enable SESR to train multi-scale super resolution simultaneously in a single model. Evaluations on four benchmark test sets show that our model surpasses state-of-the-art methods in terms of speed and accuracy.
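The squeeze-and-excitation module the abstract builds on can be sketched in a few lines: globally pool each channel, pass the pooled vector through a small bottleneck, and rescale the channels by the resulting gates. Shapes and the reduction ratio below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W):
    global-average-pool per channel (squeeze), run the pooled vector
    through a two-layer ReLU bottleneck (excitation), then rescale
    each channel by its sigmoid gate. w1: (C//r, C), w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excitation gates: (C,)
    return x * e[:, None, None]              # channel-wise rescale
```

The gating lets the network emphasize informative channels at negligible cost, which is how SESR models channel interdependencies without adding depth.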
|
|
12:10-12:30, Paper TuAMOT4.6 | |
Deep Joint Noise Estimation and Removal for High ISO JPEG Images |
Yue, Huanjing | Tianjin Univ |
Zhou, Shengdi | Tianjin Univ |
Yang, Jingyu | Tianjin Univ |
Sun, Xiaoyan | Microsoft Res |
Hou, Chunping | Tianjin Univ |
Keywords: Enhancement, restoration and filtering, Image processing and analysis, Low-level vision
Abstract: Capturing images under high ISO mode introduces much noise. The statistics of high ISO noise are quite different from those of Gaussian noise. Therefore, this kind of noise is difficult to remove with traditional Gaussian noise removal methods. This paper proposes a convolutional neural network (CNN) based method to jointly estimate and remove high ISO noise. There are two contributions in this paper. First, we propose a CNN based noise estimation method to estimate the pixel-wise noise level. Due to the Bayer down-sampling process in imaging, the noise variance map is characterized by Bayer patterns. Therefore, we propose packing the 2x2 blocks of a noisy image into 4D vectors, which makes pixels with similar noise levels neighbors. Second, the noise variance map is correlated with the image content. Thus, we propose concatenating the estimated noise variance map with the noisy image and feeding the fused data to the denoising network. The two networks are trained together in an end-to-end fashion. Experimental results demonstrate that the proposed method outperforms state-of-the-art noise estimation and removal methods.
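The 2x2 packing step the abstract describes amounts to splitting the mosaic into its four CFA phases. A minimal sketch (the R/G1/G2/B ordering is an assumed convention):

```python
import numpy as np

def pack_bayer(raw):
    """Pack each 2x2 Bayer block of an (H, W) mosaic into a 4-channel
    (4, H//2, W//2) tensor. Pixels sharing a CFA position, and hence
    similar noise statistics, become aligned along the channel axis."""
    return np.stack([raw[0::2, 0::2],   # e.g. R sites
                     raw[0::2, 1::2],   # G1 sites
                     raw[1::2, 0::2],   # G2 sites
                     raw[1::2, 1::2]])  # B sites
```

After packing, each channel is spatially homogeneous in noise level, which is what makes the pixel-wise variance map easier for the estimation CNN to learn.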
|
|
TuPMP |
North Foyer & Park View Foyer, 3rd Floor |
Poster Session TuPMP, Coffee Break (North Foyer & Park View Foyer, 3rd Floor) |
Poster Session |
|
15:00-17:00, Paper TuPMP.1 | |
Linear Discriminative Sparsity Preserving Projections for Dimensionality Reduction |
Zhang, Jianbo | Northeastern Univ. at Qinhuangdao |
Wang, Jin-Kuan | Northeastern Univ |
Keywords: Dimensionality reduction, Sparse learning, Face recognition
Abstract: Linear discriminant analysis (LDA) is a traditional method for dimensionality reduction. However, it cannot be applied well to data from a high-dimensional space. Recently, sparse subspace learning (SSL) has been proposed and shown to be more suitable for high-dimensional data such as face datasets. To exploit the merits of both SSL and the traditional methods, in this paper we propose a new dimensionality reduction method called linear discriminative sparsity preserving projections (LDSPP). LDSPP establishes its objective function by adding a sparsity reconstruction matrix to the LDA model. Experiments on the Yale and ORL face image datasets demonstrate the effectiveness of the proposed method.
|
|
15:00-17:00, Paper TuPMP.2 | |
Human Activity Recognition Based on Convolutional Neural Network |
Xu, Wenchao | Shanghai Key Lab. of Multidimensional Information Processi |
Pang, Yuxin | Shanghai Key Lab. of Multidimensional Information Processi |
Yang, Yanqin | Shanghai Key Lab. of Multidimensional Information Processi |
Liu, Yanbo | Shanghai Jiaotong Univ |
Keywords: Classification, Neural networks
Abstract: Smartphones are ubiquitous and becoming increasingly sophisticated, with ever-growing sensing powers. In recent years, more and more sensor-based activity recognition applications have been developed for routine behavior monitoring and for helping users form healthy habits. In this field, recognizing physical activities (e.g., sitting, walking, jogging) is the core issue. In this study, we construct a Convolutional Neural Network (CNN) to identify human activities using data collected from the three-axis accelerometer integrated in users' smartphones. The daily human activities chosen to be recognized are walking, jogging, sitting, standing, going upstairs and going downstairs. The three-dimensional (3D) raw accelerometer data is directly used as the input for training the CNN without any complex preprocessing. Our CNN-based method achieved 91.97% accuracy for multi-class human activity recognition, outperforming a Support Vector Machine (SVM) approach (82.27%) trained and tested on six kinds of features extracted from the 3D raw accelerometer data. Our proposed approach therefore achieves high recognition accuracy with low computational cost.
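Feeding a raw three-axis accelerometer stream to a CNN typically starts with fixed-length sliding windows; the sketch below shows this standard preprocessing step (the window length and step size are illustrative choices, not values stated in the paper):

```python
def sliding_windows(samples, length, step):
    """Cut a stream of (x, y, z) accelerometer samples into
    fixed-length windows, the usual input format for a 1-D CNN.

    Returns every complete window of `length` samples, advancing
    `step` samples each time (windows overlap when step < length).
    """
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, step)]
```

With, say, 50 Hz sampling, a window of 128 samples with 50% overlap (step 64) is a common choice in the activity-recognition literature.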
|
|
15:00-17:00, Paper TuPMP.3 | |
A Comprehensive Study on Upper-Body Detection with Deep Neural Networks |
Zhu, Yamei | Tongji Univ |
Zhang, Lin | Tongji Univ |
Keywords: Deep learning, Object detection
Abstract: The pedestrian detection task, which aims to predict bounding-boxes of all the pedestrian instances in an image, is of paramount importance for many real-world applications and has attracted much attention within the computer vision community. However, researchers generally ignore a critical issue: because of partial occlusion or being partly out of the field of view (FOV), the definition of "pedestrian" is ill-posed in many cases, and even humans find it difficult to give accurate bounding-boxes. In many real applications, pedestrian detection can be substituted by upper-body detection, which is more robust and much less affected by occlusion or by being partially out of the FOV. However, few studies have been conducted in this area. To fill this research gap to some extent, we make two contributions in this paper. Firstly, to facilitate the study of upper-body detection, a large-scale benchmark dataset is established. This dataset comprises 9585 images extracted from typical surveillance video clips, and for each image all the upper-body instances were carefully labeled. Secondly, the performances of four state-of-the-art object-detection frameworks were thoroughly evaluated in the context of upper-body detection, which can serve as a baseline for other researchers to develop even more sophisticated methods. To make the results fully reproducible, the collected dataset has been made publicly available at https://github.com/AmazingMei/upper-body-detection.
|
|
15:00-17:00, Paper TuPMP.4 | |
Structure Learning of Bayesian Networks by Finding the Optimal Ordering |
He, Chuchao | Northwestern Pol. Univ |
Gao, Xiao-guang | Northwestern Pol. Univ |
Guo, Zhigao | Northwestern Pol. Univ |
Keywords: Probabilistic graphical model, Data mining, Model selection
Abstract: Ordering-based search methods have advantages over graph-based search methods for structure learning of Bayesian networks in terms of both efficiency and accuracy. To further increase the accuracy of ordering-based search methods, we propose to enlarge the search space, which facilitates escaping from local optima. We present search operators with majorizations, which are easy to implement. Experiments demonstrate that the proposed algorithm achieves significant accuracy improvements while exhibiting high efficiency on both synthetic and real data sets. To further improve efficiency when learning large-scale networks, we discuss a solution at the end of the paper.
|
|
15:00-17:00, Paper TuPMP.5 | |
Precision Learning: Towards Use of Known Operators in Neural Networks |
Maier, Andreas | Friedrich-Alexander-Univ. Erlangen-Nürnberg |
Schebesch, Frank | FAU Erlangen-Nürnberg |
Syben, Christopher | FAU Erlangen-Nuremberg |
Würfl, Tobias | FAU Erlangen-Nuremberg |
Steidl, Stefan | Friedrich-Alexander-Univ. Erlangen-Nürnberg |
Choi, Jang-Hwan | Ewha Womans Univ |
Fahrig, Rebecca | FAU Erlangen-Nuremberg |
Keywords: Neural networks, Regression, Classification
Abstract: In this paper, we consider the use of prior knowledge within neural networks. In particular, we investigate the effect of a known transform within the mapping from input data space to the output domain. We demonstrate that use of known transforms is able to change maximal error bounds and that these are additive for the entire sequence of transforms. To explore the effect further, we consider the problem of X-ray material decomposition as an example of incorporating additional prior knowledge. We demonstrate that inclusion of a non-linear function known from the physical properties of the system reduces prediction errors, thereby improving prediction quality from an SSIM of 0.54 to 0.88. This approach is applicable to a wide set of applications in physics and signal processing that provide prior knowledge on such transforms. Maximal error estimation and network understanding could also be facilitated using this new concept of precision learning.
|
|
15:00-17:00, Paper TuPMP.6 | |
Free Space, Visible and Missing Lane Marker Estimation Using the PsiNet and Extra Trees Regression |
John, Vijay | Toyota Tech. Inst |
Meenakshi Karunakaran, Nithilan | Toyota Tech. Inst |
Guo, Chunzhao | Toyota Central R&D Labs., Inc |
Kidono, Kiyosumi | TOYOTA Central R&D Labs., Inc |
Mita, Seiichi | Toyota Tech. Inst |
Keywords: Multilabel learning, Applications of pattern recognition and machine learning, Deep learning
Abstract: In this paper, a vision-based multilabel deep learning framework is combined with an extra trees regression framework to estimate the free space, visible ego-lane markers and missing ego-lane markers. The multilabel deep learning framework, the PsiNet, with two semantic segmentation layers and one multiclass classifier layer, estimates the free space and visible ego-lane markers, while the deep learning-based extra trees regression framework estimates the missing ego-lane markers. The missing ego-lane markers are predicted using image-based deep features extracted from the multilabel framework. To account for spatial variation in the missing ego-lane markers, multiple extra trees regression models are trained. During testing, the multiclass label estimated by the multilabel framework is used to retrieve the corresponding extra trees regression model. The proposed framework combining the deep learning-based semantic segmentation and regression frameworks is termed the PsiNet-ET framework. We validate our proposed framework using multiple acquired datasets. A comparative analysis with baseline algorithms and a parametric analysis are performed. The experimental results show that the proposed framework robustly estimates the free space and the visible and missing lane markers even in challenging road scenes.
|
|
15:00-17:00, Paper TuPMP.7 | |
Approximate Cluster Heat Maps of Large High-Dimensional Data |
Rathore, Punit | The Univ. of Melbourne |
Bezdek, James C. | - |
Kumar, Dheeraj | Purdue Univ |
Rajasegarar, Sutharshan | Deakin Univ |
Palaniswami, Marimuthu | The Univ. of Melbourne |
Keywords: Clustering, Applications of pattern recognition and machine learning, Data mining
Abstract: The problem of determining whether clusters are present in numerical data (tendency assessment) is an important first step of cluster analysis. One tool for cluster tendency assessment is the visual assessment of tendency (VAT) algorithm. VAT and improved VAT (iVAT) produce an image that provides visual evidence about the number of clusters to seek in the original dataset. These methods have been successful in determining potential cluster structure in various datasets, but they can be computationally expensive for datasets with a very large number of samples. A scalable version of iVAT called siVAT approximates iVAT images, but siVAT can be computationally expensive for big datasets. In this article, we introduce a modification of siVAT called siVAT+ which approximates cluster heat maps for large volumes of high dimensional data much more rapidly than siVAT. We compare siVAT+ with siVAT on six large, high dimensional datasets. Experimental results confirm that siVAT+ obtains images similar to siVAT images in a few seconds, and is 8-55 times faster than siVAT.
|
|
15:00-17:00, Paper TuPMP.8 | |
An Approximate Bayesian Long Short-Term Memory Algorithm for Outlier Detection |
Chen, Chao | Univ. of South Carolina |
Lin, Xiao | Univ. of South Carolina |
Terejanu, Gabriel | Univ. of South Carolina |
Keywords: Neural networks, Regression, Computer-aided detection and diagnosis
Abstract: Long Short-Term Memory networks trained with gradient descent and back-propagation have achieved great success in various applications. However, point estimation of the network weights is prone to over-fitting and lacks the important uncertainty information associated with the estimation. Moreover, exact Bayesian neural network methods are intractable and inapplicable to real-world applications. In this study, we propose an approximate estimation of the weight uncertainty using an Ensemble Kalman Filter, which scales easily to a large number of weights. Furthermore, we optimize the covariance of the noise distribution in the ensemble update step using maximum likelihood estimation. To assess the proposed algorithm, we apply it to outlier detection in five real-world events retrieved from the Twitter platform.
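A single stochastic Ensemble Kalman Filter update for a weight ensemble can be sketched as follows (scalar weights for clarity; this is a minimal illustration of the generic EnKF step, not the paper's exact algorithm, which additionally fits the observation-noise covariance by maximum likelihood):

```python
import random

def enkf_update(ensemble, predict, y, obs_var):
    """One stochastic Ensemble Kalman Filter update for an ensemble
    of scalar weights (illustrative; a real network has a weight
    vector per ensemble member).

    `predict(w)` maps a weight to a predicted observation, `y` is
    the observed value and `obs_var` the observation-noise variance.
    Each member moves by a Kalman-style gain times its innovation,
    with perturbed observations to preserve ensemble spread.
    """
    n = len(ensemble)
    preds = [predict(w) for w in ensemble]
    w_bar = sum(ensemble) / n
    p_bar = sum(preds) / n
    # Sample cross-covariance and predicted-observation variance
    cov_wp = sum((w - w_bar) * (p - p_bar)
                 for w, p in zip(ensemble, preds)) / (n - 1)
    var_p = sum((p - p_bar) ** 2 for p in preds) / (n - 1)
    gain = cov_wp / (var_p + obs_var)
    return [w + gain * (y + random.gauss(0, obs_var ** 0.5) - p)
            for w, p in zip(ensemble, preds)]
```

After the update, the spread of the ensemble serves as the approximate weight uncertainty that a point estimate lacks.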
|
|
15:00-17:00, Paper TuPMP.9 | |
Online Low-Rank Metric Learning Via Parallel Coordinate Descent Method |
Gan, Sun | Shenyang Inst. of Automation, Chinese Acad. of Sciences, U |
Yang, Cong | Shenyang Inst. of Automation, Chinese Acad. of Sciences |
Qiang, Wang | Shenyang Inst. of Automation, Chinese Acad. of Sciences, U |
Xiaowei, Xu | Department of Information Science, Univ. of Arkansas at Lit |
Keywords: Online learning, Classification, Dimensionality reduction
Abstract: Many machine learning problems rely on a valuable tool: metric learning. However, large-scale applications embedded in a high-dimensional feature space can cause both computation and storage requirements to grow quadratically. To tackle these challenges, in this paper we establish a robust metric learning formulation in which online metric learning and parallel optimization handle large-scale and high-dimensional data efficiently, respectively. Specifically, based on a matrix factorization strategy, the first step learns a similarity function in the objective formulation for similarity measurement; in the second step, we derive a variational trace norm to promote low-rankness of the transformation matrix. After converting this variational regularization into its separable form, we present a parallel block coordinate descent method to learn the optimal metric parameters, which can handle high-dimensional data efficiently. Crucially, our method shares the efficiency and flexibility of the block coordinate descent method and is also guaranteed to converge to the optimal solution. Finally, we evaluate our approach on a scene categorization dataset with tens of thousands of dimensions, and the experimental results show the effectiveness of our proposed model.
|
|
15:00-17:00, Paper TuPMP.10 | |
Graph Embedding-Based Ensemble Learning for Image Clustering |
Luo, Xiaohui | Soochow Univ |
Zhang, Li | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Wang, Bangjun | Soochow Univ |
Keywords: Ensemble learning, Clustering, Dimensionality reduction
Abstract: As a manifold learning algorithm, unsupervised large graph embedding (ULGE) has been proposed to cluster large-scale datasets. This paper improves ULGE and proposes a graph embedding-based ensemble learning (GEEL) algorithm. We take the dimensionality reduction algorithm in ULGE together with K-means clustering as an individual learner in our ensemble. For each individual learner, K-means is first used to generate anchors. Then, the low-dimensional embedding of the sample data is obtained. Finally, K-means is applied again to the low-dimensional data, which yields a clustering. The diversity of the ensemble comes from the instability of K-means. To combine multiple clusterings, we first match them against a reference clustering using the bestMap method, where the reference clustering is randomly chosen from the multiple ones. A majority voting rule over the matched clusterings then generates the final clustering. Extensive experiments show the efficiency and effectiveness of the proposed method.
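The combination step can be sketched as follows. For brevity, the sketch aligns clusterings with a greedy overlap-based matching rather than the exact Hungarian assignment that bestMap uses, then applies point-wise majority voting:

```python
from collections import Counter

def align_labels(labels, reference):
    """Relabel `labels` so its cluster ids match `reference`:
    greedily map each cluster to the reference cluster it overlaps
    most (a simple stand-in for the Hungarian-based bestMap)."""
    overlap = Counter(zip(labels, reference))
    mapping, used = {}, set()
    for (a, b), _ in overlap.most_common():
        if a not in mapping and b not in used:
            mapping[a] = b
            used.add(b)
    return [mapping.get(a, a) for a in labels]

def majority_vote(clusterings):
    """Combine clusterings point-wise by majority vote after
    aligning each one to the first clustering as the reference."""
    reference = clusterings[0]
    aligned = [reference] + [align_labels(c, reference)
                             for c in clusterings[1:]]
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*aligned)]
```

For example, three 4-point clusterings that agree up to label permutation and one noisy assignment are first brought onto common labels and then resolved by voting.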
|
|
15:00-17:00, Paper TuPMP.11 | |
An Extensive Study of Cycle-Consistent Generative Networks for Image-To-Image Translation |
Liu, Yu | Leiden Univ |
Guo, Yanming | National Univ. of Defense Tech |
Chen, Wei | Leiden Univ |
Lew, Michael | Leiden Univ |
Keywords: Deep learning, Applications of computer vision, Learning-based vision
Abstract: Image-to-image translation between different domains has been an important research direction, with the aim of arbitrarily manipulating the source image content to become similar to a target image. Recently, cycle-consistent generative network (CycleGAN) has become a fundamental approach for general-purpose image-to-image translation, while almost no work has examined what factors may influence its performance. To provide more insights, we propose two new models roughly based on CycleGAN, namely Long CycleGAN and Nest CycleGAN. First, Long CycleGAN cascades several generators to perform the domain translation in a long cycle. It shows the benefit of stacking more generators on the generation quality. In addition to the long cycle, Nest CycleGAN develops new inner cycles to bridge intermediate generators directly, which can help constrain the unsupervised mappings. In the experiments, we conduct qualitative and quantitative comparisons for tasks including photo<->label, photo<->sketch, and photo colorization. The quantitative and qualitative results demonstrate the effectiveness of our two proposed models.
|
|
15:00-17:00, Paper TuPMP.12 | |
Local and Global Bayesian Network Based Model for Flood Prediction |
Wu, Yirui | Hohai Univ |
Xu, Weigang | Hohai Univ |
Feng, Jun | Hohai Univ |
Palaiahnakote, Shivakumara | National Univ. of Singapore |
Lu, Tong | State Key Lab. for Software Tech. Nanjing Univ |
Keywords: Applications of pattern recognition and machine learning, Probabilistic graphical model, Regression
Abstract: To minimize the negative impacts brought by floods, researchers from the pattern recognition community pay special attention to the problem of flood prediction using machine learning technologies. In this paper, we propose to construct a hierarchical Bayesian network to predict floods for small rivers, which appropriately embeds hydrology expert knowledge for high rationality and robustness. We construct the hierarchical Bayesian network in two stages: local and global network construction. During local network construction, we first divide the river watershed into small local regions. Following the idea of a well-known hydrology model, the Xinanjiang model, we establish the entities and connections of the local Bayesian network to represent the variables and physical processes of the Xinanjiang model, respectively. During global network construction, intermediate variables for the local regions, computed by the local Bayesian networks, are coupled to estimate time-varying flow rates through proper inference on the global network. Finally, we improve the output of the Bayesian network by utilizing former flow rate values. We demonstrate the accuracy and robustness of the proposed method through experiments on a collected dataset, against several comparative methods.
|
|
15:00-17:00, Paper TuPMP.13 | |
Multi-Source Clustering Based on Spectral Recovery |
Yin, Hongwei | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Zhang, Li | Soochow Univ |
Keywords: Clustering, Multiview learning, Data mining
Abstract: The research and analysis of multi-source data is one of the important tasks in information science. Compared with traditional single-source learning algorithms, multi-source learning algorithms can describe objects more realistically and completely, and the learning process for multi-source data is more in line with the cognitive mechanism of the human brain. So far, research on multi-source data learning falls into three classes: multi-source transfer learning, multi-source collaborative learning and multi-source multi-view learning. Traditional multi-source multi-view learning algorithms cannot handle missing data, i.e., they require the multi-source data to be complete. This paper proposes a multi-source clustering algorithm. Based on the spectral properties of the Laplace operator, we first obtain a complete representation of the multi-source data. Then, we utilize multi-view spectral embedding (MVSE) to construct the fusion model. Experimental results show that our proposed method can efficiently improve clustering performance when data are missing.
|
|
15:00-17:00, Paper TuPMP.14 | |
A Voting-Near-Extreme-Learning-Machine Classification Algorithm |
Hou, Hui-Rang | Tianjin Univ |
Meng, Qing-Hao | Tianjin Univ |
Zhang, Xiao-Nei | Tianjin Univ |
Keywords: Classification, Applications of pattern recognition and machine learning, Brain-computer interface
Abstract: For binary classification tasks, a feature extraction method combining principal component analysis (PCA) and linear discriminant analysis (LDA) is adopted, and an improved extreme learning machine (ELM), the near extreme learning machine (NELM) classification algorithm, is presented. To further improve classification performance, a voting NELM (VNELM) is proposed. To examine the performance of our proposed classification algorithm, two different tests were carried out: slow cortical potential (SCP) signal classification and Chinese liquor (true or false) recognition. Experimental results reveal that for SCP signal classification on the BCI Competition II dataset Ia, the VNELM algorithm achieves an accuracy of 93.52%, better than the 92.30% of the state-of-the-art method (the best improved ELM algorithm, V-ELM). When the VNELM algorithm is applied to Chinese liquor recognition, all the single-sensor-based classification results are better than those of the ELM and the V-ELM, and an average accuracy of 99.25% is obtained from the multi-sensor response signals, an increase of 26% and 6.25% over the ELM (73.25%) and the V-ELM (93.00%), respectively.
|
|
15:00-17:00, Paper TuPMP.15 | |
Wasserstein Generative Recurrent Adversarial Networks for Image Generating |
Zhang, Chunping | Chongqing Univ |
Feng, Yong | Chongqing Univ |
Shang, Jiaxing | Chongqing Univ |
Qiang, Baohua | Guilin Univ. of Electronic Tech |
Keywords: Neural networks, Deep learning, Scene understanding
Abstract: Most generative models produce an image in a single pass, but in fact painting is usually done iteratively and repeatedly. Generative Adversarial Networks (GAN) are well known for generating images; however, they are hard to train stably. To tackle this problem, we propose a framework named Wasserstein generative recurrent adversarial networks (WGRAN), which merges the Wasserstein distance with recurrent neural networks to iteratively generate realistic-looking images and trains the model in an adversarial way. Our generative model thus gradually generates images using feedback from the discriminative model, and our approach allows us to control the number of generation iterations. We train our model on various image datasets and compare it with recurrent generative adversarial networks (GRAN) and other state-of-the-art generative models using the Generative Adversarial Metric. These experiments provide evidence that our model is able to generate high-quality images.
|
|
15:00-17:00, Paper TuPMP.16 | |
Image Captioning Using Adversarial Networks and Reinforcement Learning |
Yan, Shiyang | Xi'an Jiaotong-Liverpool Univ |
Wu, Fangyu | Xi’an Jiaotong-Liverpool Univ |
Smith, Jeremy Simon | Univ. of Liverpool |
Lu, Wenjin | Xi'an Jiaotong-Liverpool Univ |
Zhang, Bailing | XianJiaoTong-Liverpool Univ |
Keywords: Deep learning, Image processing and analysis, Reinforcement learning
Abstract: Image captioning is a significant task in artificial intelligence which connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensates for the exposure bias problem of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to a discrete task like language processing, due to the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. Also, to obtain intermediate rewards during language generation, a Monte Carlo roll-out sampling method is utilized. Experimental results on the COCO dataset validate the improvement from each ingredient of the proposed model. The overall effectiveness is also evaluated.
|
|
15:00-17:00, Paper TuPMP.17 | |
Transparent Random Dot Markers |
Uchiyama, Hideaki | Kyushu Univ |
Oyamada, Yuji | Tottori Univ |
Keywords: Applications of pattern recognition and machine learning, Image classification
Abstract: This paper presents random dot markers printed on transparent sheets as transparent fiducial markers. They are extremely unobtrusive and useful for novel user interfaces. However, marker identification must be robust to the backgrounds visible through the transparent sheets. To realize such markers, we propose a graph-based framework for robust, geometric-feature-based point matching between two sets of points. Instead of building one-to-one correspondences, we first build one-to-many correspondences using a 2D affinity matrix, and then globally optimize the matching assignment from the matrix. In particular, we incorporate pairwise relationships between neighboring points into the matrix using local geometric descriptors, and finally solve it with spectral matching. In the evaluation, we investigate the effectiveness of the global assignment from one-to-many correspondences, and show that our proposed method is robust enough to identify overlapping markers.
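The spectral matching step can be sketched with a power iteration: the principal eigenvector of the affinity matrix scores the candidate correspondences, and a greedy pass keeps the strong ones. This is an illustrative simplification (a full implementation would also enforce one-to-one constraints between the two point sets):

```python
def spectral_match(affinity, n_iter=100):
    """Score candidate correspondences by the principal eigenvector
    of their affinity matrix (power iteration), then greedily keep
    strong candidates. `affinity[i][j]` holds the pairwise geometric
    consistency of candidates i and j.
    """
    n = len(affinity)
    v = [1.0] * n
    for _ in range(n_iter):  # power iteration toward the top eigenvector
        w = [sum(affinity[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    order = sorted(range(n), key=lambda i: -v[i])
    # Keep candidates whose score is comparable to the best one
    # (a real solver would also enforce one-to-one assignments).
    return [i for i in order if v[i] > 0.5 * v[order[0]]]
```

Mutually consistent candidates reinforce each other through the affinity matrix, so they dominate the eigenvector while inconsistent candidates fade out.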
|
|
15:00-17:00, Paper TuPMP.18 | |
A Deep Graphical Model for Layered Knowledge Transfer |
Lu, Wei | Univ. of Electronic Science and Tech. of China |
Chung, Fu-lai | Hong Kong Pol. Univ |
Keywords: Probabilistic graphical model, Domain adaptation, Deep learning
Abstract: Deep architectures can now be well trained on massive labeled data. However, there exist many application scenarios where labeled data are sparse or absent. Domain adaptation and multi-task transfer learning provide attractive options when related labeled data or tasks are abundant from different domains. In this paper, a new graphical modeling approach to multi-layer factorization based domain adaptation is explored to address scenarios in which insufficient labeled data are available for supervised learning. A deep convolutional factorization based transfer learning (DCFTL) algorithm is proposed to facilitate layer-wise transfer learning between domains. Built entirely on a graphical model representation, the proposed framework can seamlessly merge inference and learning, and has clear interpretability of conditional independence. The empirical performances on image classification tasks in both supervised and semi-supervised adaptation settings illustrate the effectiveness and generalization of the proposed deep layered knowledge transfer framework.
|
|
15:00-17:00, Paper TuPMP.19 | |
Target Group Distribution Pattern Discovery Via Convolutional Neural Network |
Xu, Xin | Nanjing Res. Inst. of Electronic Engineering (NRIEE) |
Wang, Wei | Nanjing Univ |
Keywords: Applications of pattern recognition and machine learning, Deep learning, Classification
Abstract: Target group distribution pattern analysis has wide potential application in various domains, e.g., weather forecasting based on cloud system distribution, target correlation and tracking based on distribution relationships, and sustainable forest management based on tree distribution patterns. However, existing work on target group distribution pattern analysis generally concentrates on either the distribution tendency or the distribution shape of the group, while ignoring subtle differences in density variation between patterns. To address this issue, we propose an effective target group distribution pattern discovery method via convolutional neural network (CNN) to discriminate such delicate patterns. First, we transform the spatial target group distribution samples into 2D images. On this basis, we design a bagged CNN model. Finally, we apply the bagged CNN model to target group distribution pattern identification. Extensive experiments on synthetic data sets indicate that our method significantly outperforms classical machine learning methods.
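The first step, rasterizing a spatial target group into a 2D image for the CNN, can be sketched as a simple occupancy-count grid (the grid size and bounds handling are our own illustrative choices):

```python
def density_grid(points, size, bounds):
    """Rasterize a set of 2-D target positions into a size x size
    occupancy-count grid (the kind of image fed to the CNN).
    `bounds` is (xmin, xmax, ymin, ymax); points on the max edge
    are clamped into the last cell.
    """
    xmin, xmax, ymin, ymax = bounds
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x - xmin) / (xmax - xmin) * size), size - 1)
        row = min(int((y - ymin) / (ymax - ymin) * size), size - 1)
        grid[row][col] += 1
    return grid
```

Because cell counts preserve local density, two groups with the same shape but different density variation produce visibly different images, which is exactly the distinction the CNN is trained to pick up.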
|
|
15:00-17:00, Paper TuPMP.20 | |
Grouped Multi-Task CNN for Facial Attribute Recognition |
Yip, Chitung | Sun Yat-Sen Univ |
Hu, Haifeng | Sun Yat-Sen Univ |
Keywords: Multitask learning, Deep learning, Other biometrics
Abstract: The main goal of facial attribute recognition is to determine various attributes of human faces, e.g. facial expressions, shapes of mouth and nose, headwear, age and race, by extracting features from images of human faces. Facial attribute recognition has a wide range of potential applications, including security surveillance and social networking. The available approaches, however, fail to consider the correlations and heterogeneities between different attributes. This paper proposes that by utilizing these correlations properly, an improvement can be achieved in the recognition of different attributes. We therefore propose a facial attribute recognition approach based on grouping the facial attribute tasks within a multi-task CNN structure. Our approach can fully utilize the correlations between attributes, and achieves satisfactory recognition results on a large number of attributes with a limited number of parameters. Several modifications to the traditional architecture are tested in the paper, and experiments are conducted to examine the effectiveness of our approach.
|
|
15:00-17:00, Paper TuPMP.21 | |
Joint Semi-Supervised Learning and Re-Ranking for Vehicle Re-Identification |
Wu, Fangyu | Xi’an Jiaotong-Liverpool Univ |
Yan, Shiyang | Xi'an Jiaotong-Liverpool Univ |
Smith, Jeremy Simon | Univ. of Liverpool |
Zhang, Bailing | XianJiaoTong-Liverpool Univ |
Keywords: Deep learning, Semi-supervised learning, Applications of computer vision
Abstract: Vehicle re-identification (re-ID) remains a challenging problem due to the complicated variations in vehicle appearance across multiple camera views. Most existing algorithms for this problem are developed in the fully-supervised setting, requiring access to a large amount of labeled training data. However, it is impractical to expect large quantities of labeled data because of the high cost of data annotation. Besides, re-ranking is a significant way to improve performance when vehicle re-ID is considered as a retrieval process, yet limited effort has been devoted to re-ranking in vehicle re-ID. To address these problems, in this paper we propose a semi-supervised learning system based on a Convolutional Neural Network (CNN) and a re-ranking strategy for vehicle re-ID. Specifically, we adopt the structure of a Generative Adversarial Network (GAN) to obtain more vehicle images and enrich the training set; a uniform label distribution is then assigned to the unlabeled samples according to the Label Smoothing Regularization for Outliers (LSRO), which regularizes the supervised learning model and improves re-ID performance. To optimize the re-ID results, an improved re-ranking method is exploited to refine the initial rank list. Experimental results on the publicly available VeRi-776 and VehicleID datasets demonstrate that the method significantly outperforms the state-of-the-art.
|
|
15:00-17:00, Paper TuPMP.22 | |
Multi-Frequency Decomposition with Fully Convolutional Neural Network for Time Series Classification |
Han, Yongming | Beijing Univ. of Chemical Tech |
Zhang, Shuheng | Beijing Univ. of Chemical Tech |
Geng, Zhiqiang | Beijing Univ. of Chemical Tech |
Keywords: Classification, Deep learning
Abstract: The fully convolutional neural network (FCN) has achieved state-of-the-art performance in time series classification without any heavy preprocessing. However, the FCN cannot effectively capture features of different frequencies. Therefore, this paper proposes a novel FCN structure based on a multi-frequency decomposition (MFD) method. To extract more features of different frequencies, the MFD, based on the real fast Fourier transform (RFFT), is set as a layer of the FCN that decomposes the original signal into n sub-signals of different frequency bands. The improved FCN then fuses these features of different frequencies to perform time series classification. Finally, the proposed method is verified against existing state-of-the-art methods on several datasets from the UCR Time Series Classification Archive.
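The MFD idea, splitting a signal's spectrum into disjoint frequency bands whose sub-signals sum back to the original, can be illustrated with a naive DFT (the paper uses an RFFT layer inside the network; this pure-Python O(n²) version is only for exposition):

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def band_split(signal, n_bands):
    """Decompose a real signal into `n_bands` sub-signals occupying
    disjoint frequency bands; the sub-signals sum back to the input.
    """
    n = len(signal)
    spec = dft(signal)
    half = n // 2 + 1  # a real signal is determined by bins 0..n//2
    edges = [round(b * half / n_bands) for b in range(n_bands + 1)]
    subs = []
    for b in range(n_bands):
        masked = [0j] * n
        for k in range(edges[b], edges[b + 1]):
            masked[k] = spec[k]
            if 0 < k < n - k:  # keep the mirror bin so the result is real
                masked[n - k] = spec[n - k]
        # Inverse DFT, real part
        subs.append([sum(masked[k] * cmath.exp(2j * cmath.pi * k * t / n)
                         for k in range(n)).real / n for t in range(n)])
    return subs
```

Each sub-signal can then be processed by its own convolutional branch before the features are fused for classification.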
|
|
15:00-17:00, Paper TuPMP.23 | |
A Novel Asymmetric Embedding Model for Knowledge Graph Completion |
Geng, Zhiqiang | Beijing Univ. of Chemical Tech |
Li, Zhongkun | Coll. of Information Science and Tech. Beijing Univ |
Han, Yongming | Beijing Univ. of Chemical Tech |
Keywords: Structured prediction, Human document interaction, Data mining
Abstract: Modeling knowledge graph completion by encoding each entity and relation into a continuous vector space has attracted great attention. Many models, including TransE, TransH, TransR, CTransR, TransD, TranSparse, TransDR, STransE, DT, FT and OrbitE, have been proposed for knowledge graph completion. However, these previous works pay little attention to the asymmetry and imbalance of many relations (some relations link one subject to many objects, and other relations link many subjects to many objects). Therefore, this paper proposes a novel asymmetric embedding model (AEM) for knowledge graph completion. Because the head and tail entities in triplets of the same relation have different properties, every head entity vector and every tail entity vector are weighted by the corresponding head relation vector and tail relation vector, respectively. New entity vector representations are thereby obtained, and the new entity vectors in the same triple are similar. Because the AEM weights each dimension of the entity vectors, it can accurately represent the latent attributes of entities and relations. Moreover, the number of parameters of the AEM is small, making it easier to train. Finally, compared with previous embedding models, the AEM obtains better link prediction performance on two benchmark datasets, FB15K and WN18.
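The weighting idea can be illustrated with a TransE-style energy in which the head and tail entity vectors are elementwise-weighted by relation-specific vectors. The exact functional form used by the AEM is not given in the abstract, so this is an assumed sketch.

```python
import numpy as np

def aem_score(h, r, t, w_rh, w_rt):
    """Translation energy with relation-specific elementwise weights on the
    head and tail entity vectors (assumed form, not the paper's equation)."""
    return np.linalg.norm(h * w_rh + r - t * w_rt, ord=1)

rng = np.random.default_rng(0)
d = 8
h, r = rng.normal(size=(2, d))
w_rh, w_rt = rng.uniform(0.5, 1.5, size=(2, d))  # per-dimension weights
t_good = (h * w_rh + r) / w_rt   # a tail that satisfies the translation
t_bad = rng.normal(size=d)       # a random, non-matching tail
assert aem_score(h, r, t_good, w_rh, w_rt) < 1e-9
assert aem_score(h, r, t_bad, w_rh, w_rt) > 0.0
```

Because the weights act per dimension, each relation can emphasize different latent attributes of its head and tail entities.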
|
|
15:00-17:00, Paper TuPMP.24 | |
Enhancing Knowledge Graph Completion with Positive Unlabeled Learning |
Niu, Jinghao | Inst. of Automation, Chinese Acad. of Sciences
Sun, Zhengya | Inst. of Automation, Chinese Acad. of Sciences |
Zhang, Wensheng | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Semi-supervised learning, Data mining, Applications of pattern recognition and machine learning
Abstract: Knowledge graphs have proven to be incredibly useful for many artificial intelligence applications. Although typical knowledge graphs may contain a huge number of facts, they are far from complete, which motivates an increasing research interest in learning statistical models for knowledge graph completion. Learning such models relies on sampling an appropriate number of negative examples, as only positive examples are contained in the data set. However, this can introduce errors or heuristic biases which prevent the sampler from visiting other potentially reliable negative examples that would yield better prediction models. In this paper, we present a novel perspective on carefully selecting negative examples for knowledge graph completion. We develop a two-stage logistic regression filter under the positive-unlabeled learning (PU learning) framework, which enables an automatic and iterative refinement of the negative candidate pools. We then contrast positive examples with the resulting negative ones based on improved embedding-based models. In particular, we work with a cost-sensitive loss function that weights the semantic differences between negative examples and particular positive ones; this weighting scheme reflects the importance of predicting the preferences between them correctly. In experiments, we validate the effectiveness of the negative example refining and weighting schemes, respectively. Besides this, our proposed prediction model also outperforms the state-of-the-art methods on two public datasets.
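The core PU-learning idea, fitting positives against the unlabeled pool and keeping only the least positive-looking unlabeled examples as reliable negatives, can be sketched with a plain logistic regression. This is a single refinement pass under assumed thresholds; the paper's two-stage filter and its stopping criteria are not specified in the abstract.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, steps=500):
    # plain logistic regression by gradient descent (bias folded in)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def pu_refine_negatives(X_pos, X_unl, keep=0.5):
    """PU filter sketch: fit positives vs. unlabeled, then retain only the
    unlabeled examples scored least positive as 'reliable negatives'."""
    X = np.vstack([X_pos, X_unl])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
    w = fit_logreg(X, y)
    Xb = np.hstack([X_unl, np.ones((len(X_unl), 1))])
    scores = Xb @ w                      # higher = more positive-looking
    order = np.argsort(scores)
    return X_unl[order[: int(keep * len(X_unl))]]

rng = np.random.default_rng(1)
X_pos = rng.normal(+2.0, 1.0, size=(50, 2))
X_unl = np.vstack([rng.normal(+2.0, 1.0, size=(20, 2)),   # hidden positives
                   rng.normal(-2.0, 1.0, size=(30, 2))])  # true negatives
reliable_neg = pu_refine_negatives(X_pos, X_unl, keep=0.5)
# Most retained 'negatives' should come from the true-negative cluster.
assert (reliable_neg[:, 0] < 0).mean() > 0.8
```

Iterating this filter, refitting on the refined pool, is what makes the refinement "two-stage" and automatic.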
|
|
15:00-17:00, Paper TuPMP.25 | |
Model-Free Knockoffs for SLOPE-Adaptive Variable Selection with Controlled False Discovery Rate |
Humayoo, Mahammad | Inst. of Computing Tech. UCAS |
Cheng, Xueqi | Inst. of Computing Tech. UCAS |
Keywords: Model selection, Regression, Dimensionality reduction
Abstract: Automatic selection of true explanatory variables and control of the false discovery rate (FDR) in the linear model has received considerable attention in machine learning. Ordered regularization is an important component of the linear model and plays a key role in solving such problems. Some models have been proposed for determining relevant features in either low-dimensional or high-dimensional space, but there exists no single sorted model that works in both the low- and high-dimensional cases (n ≥ p or p ≫ n). This paper introduces a model called mSLOPE (model-free SLOPE), a mixed methodology based on model-free (MF) knockoffs and sorted L-one penalized estimation (SLOPE). mSLOPE uses the original design matrix augmented with an MF knockoffs matrix; the original feature matrix and the MF matrix have the same covariance structure. Advantages of mSLOPE include: (i) it identifies true regressors in any dimension, (ii) it is adaptive and computationally tractable, and (iii) it gains power relative to its competitors through exact control of the FDR in any dimension. Experimental results on both synthetic and real data show that mSLOPE achieves superior power and more accurate FDR control than state-of-the-art baselines.
|
|
15:00-17:00, Paper TuPMP.26 | |
Generating Mesh-Based Shapes from Learned Latent Spaces of Point Clouds with VAE-GAN |
Kingkan, Cherdsak | Tohoku Univ |
Hashimoto, Koichi | Tohoku Univ |
Keywords: Deep learning, 3D reconstruction, Shape modeling and encoding
Abstract: We study the problem of mesh-based object generation. We propose a framework that generates mesh-based objects from point clouds in an end-to-end manner using a combination of a variational autoencoder and a generative adversarial network. Instead of converting the point cloud to another representation, such as voxels, before feeding it into the network, our network directly consumes the point cloud and generates the corresponding 3D object. Given point clouds of objects, our network encodes local and global geometric structures of the point clouds into latent representations. These latent vectors are then leveraged to generate implicit surface representations of the objects corresponding to those point clouds. Here, the implicit surface representation is the Signed Distance Function (SDF), which preserves the inside-outside information of objects, so polygon mesh surfaces can easily be reconstructed. This can be very helpful when 3D shapes are needed but only point clouds of objects are available. Experiments demonstrate that our network, which makes use of both local and global geometric structure, can generate high-quality mesh-based objects from the corresponding point clouds. We also show that using a PointNet-like structure as the encoder helps to achieve better results.
|
|
15:00-17:00, Paper TuPMP.27 | |
Scalable Spectral Clustering with Cosine Similarity |
Chen, Guangliang | San Jose State Univ |
Keywords: Clustering, Large scale document analysis
Abstract: We propose a unified scalable computing framework for three versions of spectral clustering - Normalized Cut (Shi and Malik, 2000), the Ng-Jordan-Weiss (NJW) algorithm (2001), and Diffusion Maps (Coifman and Lafon, 2006), in the setting of cosine similarity. We assume that the input data is either sparse (e.g., as a document-term frequency matrix) or of only a few hundred dimensions (e.g., for small images or data obtained through PCA). We show that in such cases, spectral clustering can be implemented solely based on efficient operations on the data matrix such as elementwise manipulation, matrix-vector multiplication and low-rank SVD, thus entirely avoiding the weight matrix. Our algorithm is simple to implement, fast to run, accurate and robust to outliers. We demonstrate its superior performance through extensive experiments which compare our scalable algorithm with the plain implementation on several benchmark data sets.
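The key computational claim, that under cosine similarity the n-by-n weight matrix never needs to be formed, follows because W = ÃÃᵀ for the row-normalized data matrix Ã, so the top eigenvectors of W are exactly the top left singular vectors of Ã. A small numpy sketch of this equivalence (degree normalization and self-similarity handling are simplified away):

```python
import numpy as np

def cosine_spectral_embed(A, k):
    """Spectral embedding under cosine similarity without forming the
    weight matrix: with rows of A scaled to unit length, W = At @ At.T,
    so the top eigenvectors of W are the top left singular vectors of At."""
    At = A / np.linalg.norm(A, axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(At, full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(2)
A = np.vstack([rng.normal(loc, 0.1, size=(40, 3))
               for loc in ([1, 0, 0], [0, 1, 0])])   # two clusters
emb = cosine_spectral_embed(A, k=2)

# Check against the naive route that builds W explicitly.
At = A / np.linalg.norm(A, axis=1, keepdims=True)
W = At @ At.T
evals, evecs = np.linalg.eigh(W)
top = evecs[:, ::-1][:, :2]
# Same 2-D subspace (compare projectors, which are sign/rotation invariant).
assert np.allclose(emb @ emb.T, top @ top.T, atol=1e-6)
```

For a sparse document-term matrix the SVD step is a low-rank sparse SVD, which is where the scalability comes from.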
|
|
15:00-17:00, Paper TuPMP.28 | |
Selective Ensemble Network for Accurate Crowd Density Estimation |
Jeong, Jiyeoup | Seoul National Univ |
Jeong, Hawook | Samsung Electronics Co., Ltd |
Lim, Jongin | Seoul National Univ |
Choi, Jongwon | Seoul National Univ |
Yun, Sangdoo | Seoul National Univ |
Choi, Jin Young | Automation and System Res. Inst. Seoul National Univ |
Keywords: Deep learning, Scene understanding, Applications of computer vision
Abstract: This paper proposes a selective ensemble deep network architecture for crowd density estimation and people counting. In contrast to existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation: one to learn sparse density regions and one to learn dense density regions. The locally estimated density maps from the two sub-networks are selectively combined in an ensemble fashion using a gating network to estimate an initial crowd density map. The initial density map is then refined into a high-resolution map using another sub-network that draws on contextual information in the image. In training, a novel adaptive loss scheme is applied to resolve ambiguity in crowded regions. The proposed scheme improves both density map accuracy and counting accuracy by adjusting the weighting between the density loss and the counting loss according to the degree of crowdedness and the training epoch. Experiments using public datasets confirm that the proposed method outperforms state-of-the-art methods. Through self-evaluation, the effectiveness of each part of the network is also verified.
|
|
15:00-17:00, Paper TuPMP.29 | |
Enhanced Network Embedding with Text Information |
Yang, Shuang | Jilin Univ |
Yang, Bo | Jilin Univ |
Keywords: Data mining, Classification
Abstract: Network embedding aims at learning a low-dimensional, continuous vector representation for each node in a network, which is useful in many real applications. While most existing network embedding methods focus only on the network structure, the rich text information associated with nodes, which is often closely related to the network structure, is widely neglected. Thus, how to effectively incorporate text information into network embedding is a problem worth studying. To solve this problem, we propose a Text Enhanced Network Embedding (TENE) method under the framework of non-negative matrix factorization, integrating network structure and text information together. We explore the consistent relationship between node representations and text cluster structure to make the network embedding more informative and discriminative. TENE learns the representations of nodes under the guidance of both the proximity matrix, which captures the network structure, and the text cluster membership matrix, derived by clustering the text information. We evaluate the quality of the network embedding on the task of multi-class node classification. Experimental results on three real-world datasets show the superior performance of TENE compared with baselines.
|
|
15:00-17:00, Paper TuPMP.30 | |
Knowledge Graph Embedding with Multiple Relation Projections |
Do, Kien | Deakin Univ |
Tran, Truyen | Deakin Univ |
Venkatesh, Svetha | Deakin Univ |
Keywords: Structured prediction, Deep learning
Abstract: Knowledge graphs contain rich relational structures of the world, and thus complement data-driven knowledge discovery from heterogeneous data. One of the most effective methods is to embed symbolic relations and entities into continuous spaces, where relations are approximately linear translations between projected images of entities in the relation space. However, state-of-the-art relation projection methods such as TransR, TransD and TranSparse do not model the correlation between relations, and thus do not scale to complex knowledge graphs with thousands of relations, both in computational demand and in statistical robustness. To this end, we introduce TransF, a novel translation-based method which mitigates the burden of relation projection by explicitly modeling the basis subspaces of the projection matrices. As a result, TransF is far more lightweight than existing projection methods, and is robust when the number of relations is large. Experimental results on the canonical link prediction task show that our proposed model outperforms competing methods by a large margin and achieves state-of-the-art performance. In particular, TransF improves by 9%/5% on the head/tail entity prediction task with respect to N-to-1/1-to-N relations over the best-performing translation-based method.
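The factorization idea, building each relation's projection matrix as a mixture of a small set of shared basis matrices, can be sketched in a few lines. The basis size and the linear-mixture form are assumptions for illustration; the abstract only states that basis subspaces of the projection matrices are modeled explicitly.

```python
import numpy as np

def relation_projection(alphas, basis):
    """Relation-specific projection matrix as a mixture of shared basis
    matrices: M_r = sum_i alpha_{r,i} * B_i (assumed factorized form)."""
    return np.tensordot(alphas, basis, axes=1)

d, n_basis, n_rel = 16, 4, 1000
rng = np.random.default_rng(3)
basis = rng.normal(size=(n_basis, d, d))     # shared across all relations
alphas = rng.normal(size=(n_rel, n_basis))   # per-relation coefficients
M_r = relation_projection(alphas[0], basis)
assert M_r.shape == (d, d)

# Parameter count: per-relation cost drops from d*d to n_basis.
full = n_rel * d * d
factored = n_rel * n_basis + n_basis * d * d
assert factored < full
```

Sharing the basis across relations is what gives both the lightweight parameter count and statistical strength when relations are numerous but sparsely observed.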
|
|
15:00-17:00, Paper TuPMP.31 | |
Unsupervised Domain Adaptation for Neural Machine Translation |
Yang, Zhen | Chinese Acad. of Science, Inst. of Automation |
Chen, Wei | Inst. of Automation, Chinese Acad. of Sciences |
Wang, Feng | Inst. of Automation, Chinese Acad. of Sciences |
Xu, Bo | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Domain adaptation, Deep learning, Sequence modeling
Abstract: Impressive neural machine translation (NMT) results have been achieved in domains with large-scale, high-quality bilingual training corpora. However, transferring to a target domain with significant domain shift but no bilingual training corpora remains largely unexplored. To address this setting of unsupervised domain adaptation, we propose a novel adversarial training procedure for NMT that leverages the widespread monolingual data in the target domain. Two discriminative networks, namely a domain discriminator and a pair discriminator, are introduced to guide the translation model. The domain discriminator evaluates whether the sentences generated by the translation model are indistinguishable from those in the target domain. The pair discriminator assesses whether the generated sentences are paired with the source-side sentences. The translation model acts as an adversary to the two discriminators, aiming to generate sentences that cannot easily be discriminated. We tested our approach on Chinese-English and English-German translation tasks. Experimental results show that our approach achieves strong performance in unsupervised domain adaptation for NMT.
|
|
15:00-17:00, Paper TuPMP.32 | |
A Graph-Based Approach for Static Ensemble Selection in Remote Sensing Image Analysis |
Faria, Fabio Augusto | Federal Univ. of São Paulo
Sarkar, Sudeep | Univ. of South Florida |
Keywords: Ensemble learning, Classification, Applications of pattern recognition and machine learning
Abstract: Many works in the literature have used machine learning techniques to solve classification problems in different knowledge areas, e.g., medicine, agriculture, and remote sensing. Since there is no single machine learning technique that achieves the best results for all kinds of applications, a good alternative is the fusion of classification techniques, also known as multiple classifier systems (MCS). A common challenge in MCS is the selection of a few classifiers among the many available in the literature; using all possible classifiers is not feasible. The choice of classifiers thus becomes an essential factor, i.e., we need an ensemble selection approach. In this work, we propose a novel graph-based approach for static ensemble selection (GASES) to choose the best classifier set for remote sensing image classification. Experiments demonstrate that GASES improves performance by up to 70% over different baseline approaches when fusing classifiers. It decreases the number of classifiers used while retaining the effectiveness of using all of them. Furthermore, our proposed method is a more straightforward and intuitive technique for static ensemble selection than other baseline approaches such as Consensus and Kendall.
|
|
15:00-17:00, Paper TuPMP.33 | |
Deep Spatiotemporal Representation of the Face for Automatic Pain Intensity Estimation |
Tavakolian, Mohammad | Univ. of Oulu |
Hadid, Abdenour | Univ. of Oulu
Keywords: Deep learning, Video analysis, Computer-aided detection and diagnosis
Abstract: Automatic pain intensity assessment is highly valuable in disease diagnosis applications. Inspired by the fact that many diseases and brain disorders can interrupt normal facial expression formation, we aim to develop a computational model for automatic pain intensity assessment from spontaneous and micro facial variations. For this purpose, we propose a 3D deep architecture for dynamic facial video representation. The proposed model is built by stacking several convolutional modules, where each module encompasses a 3D convolution kernel with a fixed temporal depth, several parallel 3D convolution kernels with different temporal depths, and an average pooling layer. Deploying variable temporal depths allows the model to effectively capture a wide range of spatiotemporal variations of the face. Extensive experiments on the UNBC-McMaster Shoulder Pain Expression Archive database show that our proposed model yields promising performance compared to the state-of-the-art in automatic pain intensity estimation.
|
|
15:00-17:00, Paper TuPMP.34 | |
Accelerating the Classification of Very Deep Convolutional Network by a Cascading Approach |
Zheng, Wu | Inst. of Automation, Chinese Acad. of Science |
Zhang, Zhaoxiang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Object recognition
Abstract: Large convolutional networks have recently achieved impressive classification performance. To achieve better performance, convolutional networks tend to become deeper. However, increasing network depth causes linear growth in computational complexity without bringing an equivalent increase in classification accuracy. To alleviate this inconsistency, we propose a cascading approach to accelerate the classification of very deep convolutional neural networks. By exploiting an entropy metric to analyze the statistical differences of the base networks between correctly and mistakenly classified images, we can assign easily distinguished images to shallow networks to reduce computational complexity, and leave difficult-to-classify images to deep networks to maintain overall performance. Moreover, the proposed cascaded networks can take advantage of the complementarity between different networks, which may boost classification accuracy compared to the deepest network alone. We perform experiments using residual networks of different depths on the CIFAR-100 dataset. Under the condition of obtaining accuracy similar to the deepest network, the results show that our cascaded ResNet32-ResNet110 and ResNet32-ResNet164 reduce computation time by 48.6% and 44.3% compared to ResNet110 and ResNet164, respectively, and the cascaded ResNet32-ResNet110-ResNet164 reduces computation time by 85.4% compared to the very deep ResNet1001.
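The routing step can be sketched with the softmax entropy of the shallow network's output. The threshold value here is an assumption; the paper derives its criterion from the entropy statistics of correctly vs. mistakenly classified images.

```python
import numpy as np

def route_by_entropy(shallow_probs, threshold):
    """Cascade routing sketch: accept the shallow network's prediction when
    its softmax entropy is low; otherwise forward to the deeper network."""
    p = np.clip(shallow_probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return entropy <= threshold          # True = handled by shallow net

probs = np.array([[0.98, 0.01, 0.01],   # confident -> stays in shallow net
                  [0.40, 0.35, 0.25]])  # ambiguous -> sent to deep net
easy = route_by_entropy(probs, threshold=0.5)
assert easy.tolist() == [True, False]
```

Only the images flagged `False` pay the cost of the deep network, which is where the reported 44-85% time savings come from.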
|
|
15:00-17:00, Paper TuPMP.35 | |
Prediction Defaults for Networked-Guarantee Loans |
Cheng, Dawei | Shanghai Jiao Tong Univ |
Niu, Zhibin | Tianjin Univ |
Tu, Yi | Shanghai Jiao Tong Univ |
Zhang, Liqing | Shanghai Jiao Tong Univ |
Keywords: Applications of pattern recognition and machine learning, Data mining
Abstract: Networked-guarantee loans raise systemic-risk concerns for the government and banks in China. Predicting defaults of enterprise loans is a typical, extremely imbalanced prediction problem, and the guarantee network makes this problem even more difficult to solve. Since a guaranteed loan is a debt obligation promise, if one enterprise in the guarantee network falls into financial crisis, the debt risk may spread like a virus across the network, and may even lead to a systemic financial crisis. In this paper, we propose an imbalanced network risk diffusion model to forecast enterprise default risk in the near future. A positive weighted k-nearest neighbors (p-wkNN) algorithm is developed for the stand-alone case, when there is no default contagion; a data-driven default diffusion model is then integrated to further improve prediction accuracy. We perform an empirical study on a real-world three-year loan record from a major commercial bank. The results show that our proposed method outperforms conventional credit risk methods in terms of AUC. In summary, our quantitative risk evaluation model shows promising prediction performance on real-world data, which could be useful to both regulators and stakeholders.
|
|
15:00-17:00, Paper TuPMP.36 | |
Identification of ASD Children Based on Video Data |
Li, Jing | Nanchang Univ |
Zhong, Yihao | Nanchang Univ |
Ouyang, Gaoxiang | Beijing Normal Univ |
Keywords: Classification, Applications of pattern recognition and machine learning, Video analysis
Abstract: Autism spectrum disorder (ASD) is a serious neurodevelopmental disorder that impairs a child's ability to communicate and interact with others. Usually, recognizing a child with ASD requires diagnosis by pediatric psychiatrists, which is not only expensive and time-consuming, but also influenced by subjective factors such as the doctor's experience. In this paper, we propose a novel method to automatically recognize ASD children in raw video data. Firstly, we use an eye tracking method to obtain the trajectory of eye movement. Then, an accumulative histogram is introduced to analyze these trajectories. Afterwards, dimension reduction is applied to reduce the histogram dimension. Finally, a support vector machine is used for classification. Since public videos of autism are hard to find, we collected a video dataset containing 189 videos captured from 53 ASD children and 136 typically developing children. Experimental results on our dataset show a high classification accuracy of 93.7%, which demonstrates that our method can effectively help recognize ASD children in a more efficient way.
|
|
15:00-17:00, Paper TuPMP.37 | |
LD-CNN: A Lightweight Dilated Convolutional Neural Network for Environmental Sound Classification |
Zhang, Xiaohu | Peking Univ. Shenzhen Graduate School |
Zou, Yuexian | Peking Univ |
Wang, Wenwu | Univ. of Surrey |
Keywords: Neural networks, Classification, Audio and acoustic processing and analysis
Abstract: Environmental Sound Classification (ESC) plays a vital role in machine auditory scene perception. Deep learning based ESC methods, such as the Dilated Convolutional Neural Network (D-CNN), have achieved state-of-the-art results on public datasets. However, the D-CNN ESC model is often larger than 100MB and is only suitable for systems with powerful GPUs, which prevents its application on handheld devices. In this study, we take the D-CNN ESC framework and focus on reducing the model size while maintaining the ESC performance. As a result, a lightweight D-CNN (termed LD-CNN) ESC system is developed. Our work is twofold. First, we propose to reduce the number of parameters in the convolution layers by factorizing a two-dimensional convolution filter (L×W) into two separable one-dimensional convolution filters (L×1 and 1×W). Second, we propose to replace the first fully connected layer (FCL) with a Feature Sum Layer (FSL) to further reduce the number of parameters. This is motivated by our finding that features of environmental sounds have a weak absolute-locality property, so a global sum operation can be applied to compress the feature map. Experiments on three public datasets (ESC50, UrbanSound8K, and CICESE) show that the proposed system offers comparable classification performance with a much smaller model size. For example, the model size of our proposed system is about 2.05MB, 50 times smaller than the original D-CNN model, at a loss of only 1%-2% classification accuracy.
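The separable factorization relies on a rank-1 property: convolving with the outer product of an L×1 and a 1×W filter equals applying the two 1-D filters in sequence, at 2·5 = 10 weights instead of 25. A small numpy check (the naive full-convolution helper is for illustration only):

```python
import numpy as np

def conv2d_full(img, k):
    # naive 'full' 2-D convolution, enough for this sketch
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += k[i, j] * img
    return out

u = np.array([1.0, 2.0, 3.0, 2.0, 1.0]).reshape(-1, 1)    # L x 1 filter
v = np.array([1.0, -1.0, 0.0, -1.0, 1.0]).reshape(1, -1)  # 1 x W filter
img = np.arange(36, dtype=float).reshape(6, 6)

two_step = conv2d_full(conv2d_full(img, u), v)  # separable: two 1-D passes
one_step = conv2d_full(img, u @ v)              # one 5x5 rank-1 filter
assert np.allclose(two_step, one_step)
assert u.size + v.size < (u @ v).size           # 10 weights vs. 25
```

General 2-D filters are not rank-1, so the factorization is an approximation in the learned network; the abstract's 1%-2% accuracy loss is the price of that constraint.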
|
|
15:00-17:00, Paper TuPMP.38 | |
Joint Knowledge Base Embedding with Neighborhood Context |
Nie, Binling | ZheJiang Univ |
Sun, ShouQian | ZheJiang Univ |
Keywords: Deep learning, Data mining
Abstract: Knowledge graph embedding, which aims to encode both entities and relations into a low-dimensional semantic space, significantly promotes the performance of link prediction and knowledge reasoning. Existing translation-based methods have achieved state-of-the-art performance. However, the diversity of connectivity patterns observed in knowledge graphs, i.e., structural equivalences, is not effectively utilized to enhance knowledge graph embedding. To address this issue, we propose a concise but effective model, Context-enhanced Knowledge Graph Embedding (CKGE), for joint knowledge base embedding with neighborhood context. The neighborhood context obtained in our approach gives a deep insight into the diversity of connectivity patterns of the knowledge graph. We incorporate the rich structural information contained in neighborhood context to expand the semantic structure of the knowledge graph, enabling complex relations to be modeled more precisely. We conduct extensive experiments on link prediction and triplet classification on benchmark datasets. The experimental results show that CKGE achieves significant improvements over the baseline methods.
|
|
15:00-17:00, Paper TuPMP.39 | |
Privileged Multi-Target Support Vector Regression |
Wu, Guoqiang | Univ. of Chinese Acad. of Sciences |
Tian, Yingjie | Chinese Acad. of Sciences |
Liu, Dalian | Beijing Union Univ
Keywords: Regression, Multitask learning, Support vector machine and kernel methods
Abstract: Multi-target regression is the problem where each instance is associated with multiple continuous target outputs simultaneously. Its major challenges arise from jointly exploring the complex input-output relationships and the inter-target correlations. One representative approach is to build an independent single-target Support Vector Regression (SVR) model for each output target, which can capture complex input-output relationships via the kernel trick but does not exploit inter-target correlations to improve performance. Meanwhile, there are also many regularization-based methods which mainly explore linear inter-target correlations, e.g., via a low-rank constraint on the parameter matrix. In practice, however, it might be restrictive to assume the targets to be linearly related, and allowing for nonlinear relationships is a challenge. Motivated by Learning Using Privileged Information (LUPI), we propose a novel privileged multi-target support vector regression (MT-PSVR) model which can jointly explore the complex input-output relationships and nonlinear inter-target correlations. It explicitly explores inter-target correlations by viewing the other targets as privileged information when training each target model. Besides, it can naturally use the kernel trick to explore both the complex input-output relationships and the nonlinear inter-target correlations. Experimental results on many benchmark datasets validate the effectiveness of our approach.
|
|
15:00-17:00, Paper TuPMP.40 | |
Spectral Embedded Clustering on Multi-Manifold |
Huang, Shuning | Soochow Univ |
Zhang, Li | Soochow Univ |
Li, Fan-Zhang | Soochow Univ |
Keywords: Dimensionality reduction, Clustering, Manifold learning
Abstract: Due to the incorporation of dimensionality reduction, spectral clustering (SC) based methods have a unique advantage in dealing with high-dimensional data. However, data is often distributed on multiple low-dimensional manifolds, which some SC-based methods ignore. In this paper, we propose a new spectral multi-manifold embedded clustering (SMEC) method, which incorporates the local geometric information of the data into traditional SC. The designed affinity matrix in SMEC is thus able to capture both local and global discriminative information, resulting in improved clustering. Experimental results on seven benchmark datasets demonstrate the promising performance of SMEC.
|
|
15:00-17:00, Paper TuPMP.41 | |
Unsupervised Domain Adaptation by Regularizing Softmax Activation |
Gui, Cunbin | Beijing Univ. of Posts and Telecommunications |
Hu, Jiani | Beijing Univ. of Posts and Telecommunications |
Keywords: Domain adaptation, Deep learning, Image classification
Abstract: In recent years, deep learning has achieved very good results with large amounts of labeled data, but it cannot generalize well when there is a shift between the training data distribution (source domain) and the test data distribution (target domain). Deep domain adaptation is an effective way to solve this problem. Many previous deep domain adaptation methods are based on the Maximum Mean Discrepancy (MMD); they use MMD to regularize the feature layers to learn transferable features directly. In this paper, by contrast, we propose to use MMD to regularize the softmax predictions, learning more transferable features through backpropagation. At the same time, in order to obtain discriminative classifiers, we propose to separate yet bridge the domain-invariant features, learned by matching the feature distributions, and the classifying features, which precede the final softmax, using a residual block. Our method can be implemented in almost all deep networks with softmax classifiers. To compare with recent deep domain adaptation methods, we implement our method on AlexNet, and it outperforms almost all state-of-the-art methods on standard domain adaptation benchmarks.
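The MMD regularizer itself is a standard two-sample statistic; applying it to softmax predictions, as proposed above, just means the two samples hold predicted class probabilities rather than intermediate features. A minimal numpy estimator with an RBF kernel (the kernel choice and bandwidth are assumptions):

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimator of the squared MMD between two samples under an
    RBF kernel; here X and Y are softmax-like probability vectors."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(4)
src = rng.dirichlet([5, 1, 1], size=200)       # source-domain predictions
tgt_near = rng.dirichlet([5, 1, 1], size=200)  # similar prediction dist.
tgt_far = rng.dirichlet([1, 1, 5], size=200)   # shifted prediction dist.
# Distribution shift in the predictions shows up as a larger MMD.
assert mmd2_rbf(src, tgt_far) > mmd2_rbf(src, tgt_near)
```

Minimizing this quantity between source and target softmax outputs is the regularization signal that is backpropagated through the network.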
|
|
15:00-17:00, Paper TuPMP.42 | |
Graph-Based Semi-Supervised Classification with CRF and RNN |
Ye, Zhili | Univ. of Chinese Acad. of Sciences |
Du, Yang | Inst. of Software Chinese Acad. of Sciences |
Wu, Fengge | Inst. of Software Chinese Acad. of Sciences |
Keywords: Semi-supervised learning, Classification, Neural networks
Abstract: Given a partially labeled graph, the semi-supervised node classification problem is to infer the unknown labels of the unlabeled nodes. We aim to train graph-based classifiers end-to-end based on graph embedding. From the perspectives of classification and feature embedding, we present two novel neural network architectures for semi-supervised node classification. Motivated by pixel-level labeling tasks, we introduce Conditional Random Fields (CRFs) to smooth the classification results of the Graph Convolutional Network (GCN). By formulating mean-field approximate inference for CRFs as a Recurrent Neural Network, we develop a deep end-to-end network called GCN-CRF, trained with the usual back-propagation algorithm. Moreover, in order to capture k-step relational information, we present Graph Gated Recurrent Units (Graph-GRU), applying GRUs to graph-structured data as a feed-forward process with k hidden layers. Experiments on three benchmark citation network datasets demonstrate that our two approaches outperform several recently proposed methods.
|
|
15:00-17:00, Paper TuPMP.43 | |
Deep Epitome for Unravelling Generalized Hamming Network |
Fan, Lixin | Nokia Tech |
Keywords: Deep learning, Neural networks, Image classification
Abstract: This paper gives a rigorous analysis of trained Generalized Hamming Networks (GHNs) proposed by [9] and discloses an interesting finding about GHNs, i.e., the stacked convolution layers in a GHN are equivalent to a single yet wide convolution layer. On the theoretical side, the revealed equivalence can be regarded as a constructive manifestation of the universal approximation theorem [7], [16]. In practice, it has profound and multi-fold implications. For network visualization, the deep epitomes constructed at each layer provide a visualization of the network's internal representation that does not rely on the input data. Moreover, deep epitomes allow the direct extraction of features in just one step, without resorting to the regularized optimizations used in existing visualization tools.
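The collapsing of stacked convolution layers rests on the associativity of linear convolution: two activation-free convolution layers equal a single layer whose kernel is the convolution of the two kernels. A 1-D, single-channel numpy illustration (the multi-channel GHN construction is the paper's contribution and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=64)   # input signal
k1 = rng.normal(size=3)   # first 'layer' kernel
k2 = rng.normal(size=5)   # second 'layer' kernel

stacked = np.convolve(np.convolve(x, k1), k2)  # two stacked linear layers
epitome = np.convolve(k1, k2)                  # the combined single kernel
single = np.convolve(x, epitome)               # one wide layer
assert np.allclose(stacked, single)
```

The combined kernel plays the role of the "deep epitome": it can be computed once from the trained weights and visualized without any input data.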
|
|
15:00-17:00, Paper TuPMP.44 | |
Low Rank Multi-Label Classification with Missing Labels |
Guo, Baolin | National Univ. of Defense Tech |
Hou, Chenping | National Univ. of Defense Tech |
Shan, Jincheng | National Univ. of Defense Tech |
Yi, Dongyun | National Univ. of Defense Tech |
Keywords: Classification, Multilabel learning, Data mining
Abstract: Multi-label classification has attracted significant interest in various domains. In many applications, only partial labels are available and the others are missing or not provided. How to design an accurate multi-label classifier from such partially labeled data is a challenging problem. In this paper, we propose a Low Rank multi-label classification with Missing Labels method (LRML), which jointly performs label matrix recovery and multi-label classifier learning to address the classification problem. The proposed algorithm recovers the missing labels via Laplacian manifold regularization derived from the feature space. By utilizing a low-rank mapping, it can efficiently exploit label correlations and simultaneously analyze the high-dimensional data in the discriminant subspace. The resulting formulation is a convex but non-smooth optimization problem. An effective algorithm which divides the problem into multiple convex and smooth sub-problems is developed, together with some theoretical analyses. Experimental results validate that our method leads to a significant improvement in performance and robustness to missing labels over other well-established algorithms.
|
|
15:00-17:00, Paper TuPMP.45 | |
Deep Structured Energy-Based Image Inpainting |
Altinel, Fazil | Tohoku Univ |
Ozay, Mete | Tohoku Univ |
Okatani, Takayuki | Tohoku Univ |
Keywords: Deep learning, Inpainting
Abstract: In this paper, we propose a structured image inpainting method employing an energy-based model. In order to learn the structural relationship between patterns observed in images and missing regions of the images, we employ an energy-based structured prediction method. The structural relationship is learned by minimizing an energy function which is defined by a simple convolutional neural network. The experimental results on various benchmark datasets show that our proposed method significantly outperforms state-of-the-art methods which use Generative Adversarial Networks (GANs). We obtained 497.35 mean squared error (MSE) on the Olivetti face dataset, compared to 833.0 MSE provided by the state-of-the-art method. Moreover, we obtained 28.4 dB peak signal-to-noise ratio (PSNR) on the SVHN dataset and 23.53 dB on the CelebA dataset, compared to 22.3 dB and 21.3 dB provided by the state-of-the-art methods, respectively. The code is publicly available.
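The MSE and PSNR figures quoted here are related by the standard definition PSNR = 10·log10(MAX²/MSE). A small helper, assuming an 8-bit intensity range (the datasets' actual normalization may differ, so the two sets of numbers above need not map onto each other):

```python
import math

def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB from mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Lower MSE always means higher PSNR under the same range, e.g. `psnr(497.35) > psnr(833.0)`.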
|
|
15:00-17:00, Paper TuPMP.46 | |
Piecewise Linear Units for Fast Self-Normalizing Neural Networks |
Chang, Yuanyuan | Nanjing Univ. of Posts and Telecommunications |
Wu, Xiaofu | Nanjing Univ. of Posts and Telecommunications |
Zhang, Suofei | Nanjing Univ. of Posts and Telecommunications |
Keywords: Classification, Neural networks, Deep learning
Abstract: Recently, self-normalizing neural networks have been proposed with a scaled version of exponential linear units (SELUs), which force neuron activations to converge automatically towards zero mean and unit variance without the use of batch normalization. As the negative part of a SELU is an exponential function, it is computationally intensive. In this paper, we introduce self-normalizing piecewise linear units (SPeLUs) as fast approximations of SELUs, adopting piecewise linear functions in place of the exponential part. Various possible shapes with stable self-normalizing properties are discussed for the piecewise linear units. Experiments show that SPeLUs provide an efficient and fast alternative to SELUs, with comparable classification performance on the MNIST, CIFAR-10 and CIFAR-100 datasets. With SPeLUs, we also show that batch normalization can simply be omitted when constructing deep neural nets, which could be advantageous for fast implementation of deep neural networks.
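A sketch of the idea: SELU with its standard constants, next to a hypothetical two-segment piecewise-linear stand-in for the exponential negative branch. The breakpoint (`knee`) and segment slopes here are illustrative assumptions; the paper derives its own shapes with stable self-normalizing properties.

```python
import math

# SELU constants from Klambauer et al. (2017)
SCALE, ALPHA = 1.0507, 1.6733

def selu(x):
    return SCALE * (x if x > 0 else ALPHA * math.expm1(x))

def spelu(x, knee=-1.0):
    """Hypothetical piecewise-linear approximation of SELU:
    identical positive part, a chord on [knee, 0], flat saturation below."""
    if x >= 0:
        return SCALE * x                 # same as SELU for x >= 0
    if x >= knee:
        return selu(knee) / knee * x     # chord from (knee, selu(knee)) to (0, 0)
    return selu(knee)                    # saturate past the knee
```

The piecewise version needs only multiplies and comparisons, avoiding the exponential on the negative branch.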
|
|
15:00-17:00, Paper TuPMP.47 | |
Representing Relative Visual Attributes with a Reference-Point-Based Decision Model |
Law, Marc | Univ. of Toronto |
Weng, Paul | Shanghai Jiao Tong Univ |
Keywords: Support vector machine and kernel methods, Object recognition, Learning-based vision
Abstract: In many artificial intelligence, machine learning and computer vision tasks, the weighted sum model is used to value objects and define an order over them. In this paper, we consider two decision criteria defined as the (Euclidean and more generally Mahalanobis-like) distance to a reference point and investigate how they relate to the weighted sum model. In particular, we show that the distance-based representations can be seen as a relaxation of the representation induced by the weighted sum and we provide a characterization of the latter model with the former models in the case of strict orders. To illustrate our point, we consider the context of relative visual attributes. Nonetheless, our results also apply to other domains. More specifically, we present how these reference-point-based representations can be learned from pairwise comparisons and how they can be exploited for classification. Our experimental results show that those two criteria yield a more precise representation of the relative ordering for some attributes and that combining the best representations for each attribute improves recognition performance.
|
|
15:00-17:00, Paper TuPMP.48 | |
Curvature-Based Comparison of Two Neural Networks |
Yu, Tao | Shanghai Jiao Tong Univ |
Long, Huan | Shanghai Jiao Tong Univ |
Hopcroft, John | Cornell Univ |
Keywords: Neural networks, Manifold learning, Dimensionality reduction
Abstract: In this paper we focus on the comparison of two deep neural networks, AlexNets (with different initial parameters), especially their (dis)similarity. The main contributions of this paper are 1) proposing a new close-data generating algorithm which is crucial for determining the dimension of the manifold embedded in neural networks; 2) building a systematic strategy to compare curvatures of the manifolds from two given networks, which can reflect the intrinsic geometric properties of the manifolds. Based on the results and some interesting phenomena we have disclosed during the research, we believe our work can contribute to demystifying the intrinsic mechanism of deep neural networks.
|
|
15:00-17:00, Paper TuPMP.49 | |
Directed Graph Evolution from Euler-Lagrange Dynamics |
Wang, Jianjia | Univ. of York |
Wilson, Richard | Univ. of York |
Hancock, Edwin | Univ. of York |
Keywords: Structured prediction, Probabilistic graphical model, Applications of pattern recognition and machine learning
Abstract: In this paper, we develop a variational principle from the von Neumann entropy for directed graph evolution. We minimise the change of entropy over time to investigate how directed networks evolve under the Euler-Lagrange equation. We commence from our recent work in which we show how to compute the approximate von Neumann entropy for a directed graph based on simple in- and out-degree statistics. To formulate our variational principle, we compute the directed graph entropy difference between different time epochs. This is controlled by the ratios of the in-degrees and out-degrees at the two nodes forming a directed edge. It also reveals how the entropy change is related to correlations between the changes in the in-degree ratio and the in-degree, and their initial values. We conduct synthetic experiments with three widely studied complex network models, namely Erdős–Rényi random graphs, Watts–Strogatz small-world networks, and Barabási–Albert scale-free networks, to simulate the in-degree and out-degree distributions. Our model effectively captures the directed structural transitions in the dynamic network models. We also apply the method to real-world financial networks. These networks reflect stock price correlations on the New York Stock Exchange (NYSE) and can be used to characterise stable and unstable trading periods. Our model not only effectively captures how the directed network structure evolves with time, but also allows us to detect periods of anomalous network behaviour.
|
|
15:00-17:00, Paper TuPMP.50 | |
A New ECOC Algorithm for Multiclass Microarray Data Classification |
Sun, Mengxin | School of Software, Xiamen Univ |
Liu, Kunhong | School of Software in Xiamen Univ |
Hong, Qingqi | School of Software, Xiamen Univ |
Wang, Beizhan | School of Software in Xiamen Univ |
Keywords: Multilabel learning, Ensemble learning, Bioinformatics
Abstract: The classification of multi-class microarray datasets is a hard task because of the small sample size in each class and the heavy overlaps among classes. To effectively solve these problems, we propose a novel Error Correcting Output Code (ECOC) algorithm that exploits class-separability-related Data Complexity (DC) measures during the encoding process, named ECOCECS. In this algorithm, two nearest-neighbor-based DC measures are deployed to extract the intrinsic overlapping information from microarray data. Our ECOC algorithm searches for an optimal class split scheme by minimizing these measures. The class splitting process ends when each class is separated from the others, and the class assignment scheme is then mapped to a coding matrix. Experiments are carried out on seven microarray datasets, and the results demonstrate the effectiveness and robustness of our method in comparison with four state-of-the-art ECOC methods. In short, our work shows that it is promising to apply DC theory to the ECOC framework.
|
|
15:00-17:00, Paper TuPMP.51 | |
Structured Convex Optimization Method for Orthogonal Nonnegative Matrix Factorization |
Pan, JunJun | Hong Kong Baptist Univ |
Ng, Michael Kwok-po | Hong Kong Baptist Univ |
Zhang, Xiongjun | Central China Normal Univ |
Keywords: Clustering
Abstract: Orthogonal nonnegative matrix factorization plays an important role in data clustering and machine learning. In this paper, we propose a new optimization model for orthogonal nonnegative matrix factorization based on the structural properties of orthogonal nonnegative matrices. The new model can be solved by a novel convex relaxation technique which can be employed quite efficiently. Numerical examples in document clustering, image segmentation and hyperspectral unmixing are used to test the performance of the proposed model. Our method outperforms the other tested methods in terms of clustering accuracy.
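For context, plain NMF factorizes a nonnegative matrix V ≈ WH; the orthogonal variant additionally constrains the rows of H to be orthogonal (HHᵀ = I), which ties the factorization to clustering. The paper's convex relaxation is not reproduced here; the sketch below is only the classic Lee-Seung multiplicative-update baseline it improves on:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic nonnegative data with an exact rank-3 factorization
W_true, H_true = rng.random((30, 3)), rng.random((3, 20))
V = W_true @ H_true

# Lee-Seung multiplicative updates (plain NMF; no orthogonality constraint)
W, H = rng.random((30, 3)), rng.random((3, 20))
eps = 1e-12                               # guards against division by zero
err0 = np.linalg.norm(V - W @ H)
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + eps)  # update coefficients
    W *= (V @ H.T) / (W @ H @ H.T + eps)  # update basis
err1 = np.linalg.norm(V - W @ H)
```

The updates keep W and H nonnegative and monotonically decrease the Frobenius reconstruction error.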
|
|
15:00-17:00, Paper TuPMP.52 | |
Robust and Flexible Graph-Based Semi-Supervised Embedding |
Dornaika, Fadi | Univ. of the Basque Country |
El Traboulsi, Youssof | Doctoral School of Sciences and Tech. Lebanese Univ |
Zhu, Ruifeng | Univ. Bourgogne Franche-Comté |
Keywords: Classification, Semi-supervised learning, Manifold learning
Abstract: This paper introduces a robust and flexible graph-based semi-supervised embedding method for generic classification and recognition tasks. It combines the merits of sparsity-preserving projections, margin maximization, and a robust loss function. The latter reduces the effect of outliers on the regressor model needed for mapping unseen examples. Furthermore, unlike label-propagation semi-supervised schemes, our proposed method is a data embedding into a space whose dimension is not limited to the number of classes. The robust norm used combines the merits of the matrix L_{1,2} and L_2 norms. It is suited to the Laplacian distribution of outliers and the Gaussian distribution of samples with small losses. We provide experiments on four benchmark image datasets in order to study the performance of the proposed method. These experiments show that the proposed method can be more discriminative than other state-of-the-art methods.
|
|
15:00-17:00, Paper TuPMP.53 | |
Image-Based Air Pollution Estimation Using Hybrid Convolutional Neural Network |
Ma, Jian | Tianjin Univ |
Li, Kun | Tianjin Univ |
Han, Yahong | Tianjin Univ |
Yang, Jingyu | Tianjin Univ |
Keywords: Classification, Image classification
Abstract: Air pollution seriously affects our daily life, and measuring the air pollution level quickly and easily, without expensive equipment, is a challenging task. This paper proposes an air pollution estimation method using a deep hybrid convolutional neural network on a single image, e.g., one captured by a smartphone. The captured image is fed into the main network, a very deep network that mitigates the side effects of increased depth (the degradation problem) via skip connections, so that performance can be improved by simply increasing the depth of the network. A dark channel map is computed and fed into a secondary network to enrich the features with an implicit representation. We collected 1575 images of different scenes with different PM2.5 values to train the network in an end-to-end fusion mode. Experimental results on a synthetic dataset and a real captured dataset demonstrate that our method achieves excellent performance on classifying air pollution levels from a single captured image.
|
|
15:00-17:00, Paper TuPMP.54 | |
Dynamic Projected Segmentation Networks for Hand Pose Estimation |
Che, Yunlong | Beihang |
Qi, Yue | Beihang Univ |
Keywords: Classification, Deep learning, Pattern recognition for human computer interaction
Abstract: Hand pose estimation in depth images is a challenging problem for human-computer interaction. In this paper, we propose a novel approach for hand pose estimation that shares the merits of both deep-learning-based hand segmentation and dynamics-based pose optimization. For hand segmentation, we propose 'Dynamic Projected Segmentation Networks' applied to depth images, providing a pixel-wise classification result. To preserve the detailed hand-region topology, we design a dynamic-projection-based hand-region extraction method to crop the hand region from depth images. The projected hand region is then fed into a light-weight 'Encoder-Decoder Network' for segmentation. For pose optimization, we employ rigid-body dynamics to estimate the final pose based on the segmentation results, which are treated as hand geometry constraints. We verify the effectiveness of our approach by conducting experiments on two challenging datasets.
|
|
15:00-17:00, Paper TuPMP.55 | |
CascadeNet: Modified ResNet with Cascade Blocks |
Li, Xiang | Beijing Univ. of Chemical Tech |
Li, Wei | Beijing Univ. of Chemical Tech |
Xu, Xiaodong | Beijing Univ. of Chemical Tech |
Du, Qian | Mississippi State Univ |
Keywords: Neural networks, Image classification
Abstract: Various enhanced convolutional neural network (CNN) architectures have been proposed to surpass the very-deep-layer bottleneck by using shortcut connections. In this paper, we present an effective deep CNN architecture that modifies the typical Residual Network (ResNet), named Cascade Network (CascadeNet), built by repeating cascade building blocks. Each cascade block contains independent convolution paths that pass information from the previous layer and the middle one. This strategy introduces a concept of “cross-passing”, which differs from ResNet's stacking of simple building blocks with residual connections. A traditional residual building block does not fully utilize the middle-layer information, whereas the designed cascade block captures cross-passing information for more complete features. CascadeNet has several desirable characteristics: it enhances feature propagation and reuses features after each layer instead of after each block. To verify the performance of CascadeNet, the proposed architecture is evaluated in different ways on two datasets (i.e., CIFAR-10 and the HistoPhenotypes dataset), showing better results than its ResNet counterpart.
|
|
15:00-17:00, Paper TuPMP.56 | |
Dual-Resolution U-Net: Building Extraction from Aerial Images |
Lu, Kangkang | National Univ. of Singapore |
Sun, Ying | Inst. for Infocomm Res. Agency for Science, Tech |
Ong, Sim Heng | National Univ. of Singapore |
Keywords: Deep learning, Learning-based vision, Segmentation, features and descriptors
Abstract: Deep learning has been applied to segment buildings from high-resolution images with promising results. However, problems remain that stem from training on split patches and from class imbalance. To overcome these problems, we propose a dual-resolution U-Net that uses pairs of images as inputs to capture both high- and low-resolution features. We also employ a soft Jaccard loss to place more emphasis on the sparse and low-accuracy samples. The images from different regions are further balanced according to their building densities. With our architecture, we achieved state-of-the-art results on the Inria aerial image labeling dataset without any post-processing.
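A soft Jaccard loss relaxes intersection-over-union to continuous predictions, which penalizes sparse foreground classes more than pixel-wise cross-entropy does. A sketch of the commonly used form (the paper's exact variant and smoothing constant may differ):

```python
import numpy as np

def soft_jaccard_loss(pred, target, eps=1e-7):
    """1 - soft IoU; pred has probabilities in [0,1], target is binary."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)

perfect = soft_jaccard_loss(np.array([1., 0., 1.]), np.array([1., 0., 1.]))
wrong = soft_jaccard_loss(np.array([0., 1., 0.]), np.array([1., 0., 1.]))
```

The loss is 0 for a perfect mask and approaches 1 as overlap vanishes, regardless of how rare the foreground class is.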
|
|
15:00-17:00, Paper TuPMP.57 | |
Joint Head Pose Estimation with Multi-Task Cascaded Convolutional Networks for Face Alignment |
Cai, Zhenni | Nanjing Univ. of Information Science & Tech |
Liu, Qingshan | Jiangsu Key Lab. of Big Data Analysis Tech |
Wang, Shanmin | Nanjing Univ. of Information Science & Tech |
Yang, Bruce | Kiwi Tech. Inc |
Keywords: Neural networks, Deep learning, Multitask learning
Abstract: Face alignment has been studied widely over the past decades, but it has long been impeded by the problem of pose variation. Recent studies show that pose information, used as an additional source of information, can help address this problem. In this paper, we adopt a multi-task cascaded CNN framework for simultaneous face detection, dense face alignment and fine head pose estimation. In particular, our framework exploits the inherent correlation between face alignment and fine head pose estimation to boost landmark detection robustness under various poses. Experiments show that our method not only achieves real-time performance for face detection, dense face alignment and fine head pose estimation, but also outperforms most state-of-the-art methods for face alignment on the challenging 300-W benchmark. It achieves outstanding results especially in the case of large pose variations.
|
|
15:00-17:00, Paper TuPMP.58 | |
Maximum Gradient Dimensionality Reduction |
Luo, Xianghui | Univ. of Waikato |
Durrant, Robert John | Univ. of Waikato |
Keywords: Dimensionality reduction, Regression, Model selection
Abstract: We propose a novel dimensionality reduction approach based on the gradient of the regression function. Our approach is conceptually similar to Principal Component Analysis; however, instead of seeking a low-dimensional representation of the predictors that preserves the sample variance, we project onto a basis that preserves those predictors which induce the greatest change in the response. Our approach has the benefits of being simple and easy to implement and interpret, while remaining very competitive with sophisticated state-of-the-art approaches.
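One way to realize this idea is to eigendecompose the average outer product of estimated gradients, so that the leading directions are those along which the response changes most. The sketch below assumes query access to the regression function and uses central differences; the paper's actual gradient estimator is not specified here and may well differ:

```python
import numpy as np

def max_gradient_directions(f, X, h=1e-5):
    """Estimate gradients of f at each sample by central differences,
    then return eigenvectors of the average gradient outer product,
    sorted by decreasing eigenvalue."""
    n, d = X.shape
    G = np.empty((n, d))
    for j in range(d):
        e = np.zeros(d); e[j] = h
        G[:, j] = (f(X + e) - f(X - e)) / (2 * h)
    M = G.T @ G / n                 # average gradient outer product
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, ::-1]            # columns: most to least important

# toy response depending only on the first predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
f = lambda Z: np.sin(Z[:, 0])
B = max_gradient_directions(f, X)
```

On this toy example the top direction aligns with the first coordinate axis, while PCA on X alone would find no preferred direction.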
|
|
15:00-17:00, Paper TuPMP.59 | |
Single Image Super-Resolution with Learning Iteratively Non-Linear Mapping between Low and High-Resolution Sparse Representations |
Zeng, Kun | Xiamen Univ |
Zheng, Hong | Xiamen Univ |
Qu, Yanyun | Xiamen Univ |
Qu, Xiaobo | Xiamen Univ |
Bao, Lijun | Xiamen Univ |
Chen, Zhong | Xiamen Univ |
Keywords: Sparse learning, Super-resolution, Low-level vision
Abstract: Conventional sparse-coding-based super-resolution (SR) methods obtain promising performance by learning over-complete dictionaries for the low-resolution (LR) and high-resolution (HR) feature spaces, assuming that the sparse representation of an HR feature vector is identical or linearly related to the sparse representation of the corresponding LR one. In fact, however, the relationship between the LR and HR sparse domains is non-linear due to the complicated degradation of the observed image. To learn the relation more precisely, our proposed method adopts an assumption called the "same-support constraint", which forces LR/HR image patches to activate the atoms lying in the same locations of the LR/HR dictionaries. Under the same-support constraint, our approach first learns the LR dictionary, and then obtains the HR dictionary and a non-linear mapping between the LR/HR sparse domains by training them iteratively. The LR/HR dictionaries learned individually can explore the structural characteristics of their corresponding feature spaces well, while the mapping learned iteratively can accurately reveal the intrinsic non-linear relationship between the LR and HR sparse domains. Experimental results show that the proposed method outperforms the compared sparse-learning-based single-image super-resolution methods.
|
|
15:00-17:00, Paper TuPMP.60 | |
Generating Facial Line-Drawing with Convolutional Neural Networks |
Wang, Yixue | Harbin Engineering Univ |
Bing, Xinyang | Harbin Engineering Univ |
Zheng, Liying | Harbin Engineering Univ |
Zhao, Shuo | Harbin Engineering Univ |
Keywords: Deep learning, Neural networks, Image processing and analysis
Abstract: Due to the vigorous development of artificial intelligence, robots have moved into our lives. In order to let a robot paint more like ordinary people, this paper presents an accurate, fast and robust algorithm that uses convolutional neural networks to make facial line-drawings. We first improve the SFLG (Sketchy Facial Line-drawing Generation) network, and then design an FLG (Facial Line-drawing Generation) network. The FLG uses full convolution and deconvolution, generating images end-to-end. Moreover, the FLG imposes no requirement on the size of the input image. Experimental results illustrate the stability and accuracy of the proposed FLG network. The accuracy rate of the FLG network is over 95%, which is 7% higher than the SFLG network.
|
|
15:00-17:00, Paper TuPMP.61 | |
Improved Learning in Convolutional Neural Networks with Shifted Exponential Linear Units (ShELUs) |
Grelsson, Bertil | Linköping Univ |
Felsberg, Michael | Linköping Univ |
Keywords: Classification, Neural networks
Abstract: The Exponential Linear Unit (ELU) has been proven to speed up learning and improve classification performance over activation functions such as ReLU and Leaky ReLU for convolutional neural networks. The reasons behind the improved behavior are that ELU reduces the bias shift, it saturates for large negative inputs, and it is continuously differentiable. However, it remains open whether ELU has the optimal shape, and we address the quest for a superior activation function. We use a new formulation to tune a piecewise linear activation function during training, to investigate the above question, and to learn the shape of the locally optimal activation function. With this tuned activation function, classification performance is improved, and the resulting learned activation function turns out to be ELU-shaped irrespective of whether it is initialized as a ReLU, LReLU or ELU. Interestingly, the learned activation function does not exactly pass through the origin, indicating that a shifted ELU-shaped activation function is preferable. This observation leads us to introduce the Shifted Exponential Linear Unit (ShELU) as a new activation function. Experiments on CIFAR-100 show that classification performance is further improved when using the ShELU activation function in comparison with ELU. The improvement is achieved by learning an individual bias shift for each neuron.
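One plausible reading of a "shifted ELU" is an ELU with a per-neuron horizontal bias shift, so the curve no longer passes through the origin. The sketch below encodes that reading; the shift value and exact parameterization are assumptions, and in the paper the shift is learned per neuron during training:

```python
import math

def elu(x, alpha=1.0):
    """Standard ELU activation."""
    return x if x > 0 else alpha * math.expm1(x)

def shelu(x, shift=0.1):
    """Hypothetical ShELU: an ELU shifted horizontally by a learned
    per-neuron bias (`shift` is an illustrative fixed value here)."""
    return elu(x + shift)
```

With a nonzero shift, `shelu(0)` is nonzero, matching the observation that the optimal learned activation does not pass through the origin; with `shift=0` it reduces to a plain ELU.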
|
|
15:00-17:00, Paper TuPMP.62 | |
Learning Attribute Representation for Human Activity Recognition |
Moya Rueda, Wilmar Fernando | Tech. Univ. Dortmund |
Fink, Gernot | TU Dortmund Univ |
Keywords: Deep learning, Classification, Gesture recognition
Abstract: Attribute representations have become relevant in image recognition and word spotting, providing support in the presence of unbalanced and disjoint datasets. However, for human activity recognition using sequential data from on-body sensors, human-labeled attributes are lacking. This paper introduces a search for attributes that favorably represent signal segments for recognizing human activities. It presents three deep architectures, including temporal convolutions and an IMU-centered design, for predicting attributes. An empirical evaluation of random and learned attribute representations, as well as of the networks, is carried out on two datasets, outperforming the state of the art.
|
|
15:00-17:00, Paper TuPMP.63 | |
Reflective Field for Pixel-Level Tasks |
Zhang, Liang | Xidian Univ |
Kong, Xiangwen | Xidian Univ |
Shen, Peiyi | Xidian Univ |
Zhu, Guangming | Xidian Univ |
Song, Juan | Xidian Univ |
Shah, Syed Afaq Ali | The Univ. of Western Australia |
Bennamoun, Mohammed | The Univ. of Western Australia |
Keywords: Neural networks, Deep learning, Scene understanding
Abstract: PixelNet has achieved great success in dense prediction problems with a pure pixel-level architecture, but there is still much room for improvement. In this paper, we start from PixelNet and discuss the pixel-level architecture called the hypercolumn and its limitations in building feature representations with rich semantic information. To this end, we propose a concept in the context of neural networks called the reflective field, representing the area reflected by the original input. The proposed reflective field is then used to overcome the limitations of the hypercolumn architecture. Specifically, we give a method for calculating the size of the reflective field and analyze the effective reflective field within the calculated area. We then use the reflective field to build a new hypercolumn architecture with a more rational construction. Results on the PASCAL VOC segmentation dataset improve with our new architecture.
|
|
15:00-17:00, Paper TuPMP.64 | |
Adaptive Locality Preserving Based Discriminative Regression |
Wen, Jie | Shenzhen Graduate School, Harbin Inst. of Tech |
Fei, Lunke | Harbin Inst. of Tech |
Lai, Zhihui | Nanjing Univ. of Science and Tech |
Zhang, Zheng | Harbin Inst. of Tech |
Wu, Jian | Harbin Inst. of Tech |
Fang, Xiaozhao | Harbin Inst. of Tech. Shenzhen Graduate School |
Keywords: Classification, Regression, Image classification
Abstract: Classical linear regression not only lacks flexibility in fitting the labels, but also fails to preserve the intrinsic local geometric structure of the data, which leads to overfitting. In this paper, we propose a novel discriminative regression method, called adaptive locality preserving based discriminative regression (ALPDR), to address these problems. Firstly, a locality-preserving constraint regularized by an adaptive weight is introduced to preserve the intrinsic geometric structure of the data, in which similar points of the same class are adaptively pulled together by the projection. Secondly, ALPDR directly learns the discriminative target matrix from the data based on the given label information, which allows more freedom in label fitting and simultaneously enlarges the margins between different classes. Thirdly, ALPDR imposes a row-sparsity constraint on the projection, which enables the method to adaptively select the most discriminative features so that the negative influence of noise and redundant features can be eliminated. Finally, an efficient iterative algorithm is provided to optimize the model. Extensive experiments show that the proposed method outperforms other state-of-the-art methods, which proves its effectiveness.
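Row-sparsity on a projection matrix is commonly enforced with an L_{2,1} penalty, whose proximal operator zeroes out entire rows (i.e., deselects entire features). A sketch of that operator (one standard way to realize the constraint; ALPDR's own optimizer is not reproduced here):

```python
import numpy as np

def prox_l21(W, lam):
    """Proximal operator of lam * ||W||_{2,1}: each row is shrunk toward
    zero by its L2 norm; rows with norm below lam are zeroed entirely,
    which performs feature selection."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * W

W = np.array([[3.0, 4.0],    # row norm 5.0 -> kept, shrunk by factor 0.8
              [0.3, 0.4]])   # row norm 0.5 -> zeroed
W_sparse = prox_l21(W, lam=1.0)
```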
|
|
15:00-17:00, Paper TuPMP.65 | |
Learning Training Samples for Occlusion Edge Detection and Its Application in Depth Ordering Inference |
Zhou, Yu | Beijing Univ. of Posts and Telecommunications |
Ma, Jianxiang | Beijing Univ. of Posts and Telecommunications |
Ming, Anlong | Beijing Univ. of Posts and Telecommunications |
Bai, Xiang | Huazhong Univ. of Sci. and Tech |
Keywords: Classification, Dimensionality reduction, Scene understanding
Abstract: This paper studies the problem of occlusion edge detection, which is applied to infer the depth order of objects in a monocular image. The key observation is that, given a fixed regression objective, the accuracy of occlusion edge detection is effectively boosted by selecting appropriate training samples in a discriminative feature subspace. Specifically, l1-regularized logistic regression is employed to learn a sparser yet discriminative feature subspace, while training sample selection is formulated as a quadratic optimization with the robust Huber loss. The presented formulation suppresses noise effectively, and hence the desired occlusion edges can be detected. We validate the effectiveness of our approach on the depth order inference problem. Experiments are conducted on two well-known datasets, i.e., the Cornell depth-order dataset and the NYU2 dataset. Promising results demonstrate the superiority of our approach over state-of-the-art approaches.
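The Huber loss mentioned above is quadratic for small residuals and linear for large ones, which is what caps the influence of noisy samples during selection. The standard definition (the paper's threshold delta is not given here):

```python
import math

def huber(r, delta=1.0):
    """Huber loss: 0.5*r^2 for |r| <= delta, linear growth beyond,
    so outlying residuals contribute only linearly."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)
```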
|
|
15:00-17:00, Paper TuPMP.66 | |
Multi-Source Learning for Skeleton-Based Action Recognition Using Deep LSTM Networks |
Cui, Ran | China Univ. of Mining and Technology |
Zhu, Aichun | Nanjing Tech. Univ |
Zhang, Sai | China Univ. of Mining and Technology |
Hua, Gang | China Univ. of Mining and Technology |
Keywords: Applications of pattern recognition and machine learning, Human behavior analysis, Neural networks
Abstract: Skeleton-based action recognition has attracted wide attention because the skeletal information of the human body can express action features simply and clearly, and it is not affected by the physical appearance of the body. In this paper, action recognition is therefore based on skeletal information extracted from RGBD video. Since the skeleton coordinates we study are two-dimensional, our method can be applied directly to RGB video. Recently proposed deep-network methods focus only on the temporal dynamics of the action and ignore the spatial configuration. In this paper, a multi-source model is proposed based on the fusion of temporal and spatial models. The temporal model is divided into three branches, which perceive global-level, local-level, and detail-level information respectively. The spatial model is used to perceive the relative positions of the skeleton joints. The fusion of the two models improves recognition accuracy. The proposed method is compared with state-of-the-art methods on a large-scale dataset, and the experimental results demonstrate its effectiveness.
|
|
15:00-17:00, Paper TuPMP.67 | |
Conditional Transfer with Dense Residual Attention: Synthesizing Traffic Signs from Street-View Imagery |
Sebastian, Clint | Eindhoven Univ. of Tech |
Uittenbogaard, Ries | TU Delft |
Vijverberg, Julien | CycloMedia B.V |
Boom, Bastiaan Johannes | Cyclomedia |
De With, Peter H. N. | Eindhoven Univ. of Tech |
Keywords: Deep learning, Object recognition, Applications of computer vision
Abstract: Object detection and classification of traffic signs in street-view imagery is an essential element for asset management, map making and autonomous driving. However, some traffic signs occur rarely and consequently, they are difficult to recognize automatically. To improve the detection and classification rates, we propose to generate images of traffic signs, which are then used to train a detector/classifier. In this research, we present an end-to-end framework that generates a realistic image of a traffic sign from a given image of a traffic sign and a pictogram of the target class. We propose a residual attention mechanism with dense concatenation called Dense Residual Attention, that preserves the background information while transferring the object information. We also propose to utilize multi-scale discriminators, so that the smaller scales of the output guide the higher resolution output. We have performed detection and classification tests across a large number of traffic sign classes, by training the detector using the combination of real and generated data. The newly trained model reduces the number of false positives by 1.2 - 1.5% at 99% recall in the detection tests and an absolute improvement of 4.65% (top-1 accuracy) in the classification tests.
|
|
15:00-17:00, Paper TuPMP.68 | |
Improving Optimum-Path Forest Classification Using Unsupervised Manifold Learning |
Sugi Afonso, Luis Claudio | Federal Univ. of Sao Carlos |
Pedronette, Daniel Carlos Guimaraes | Sao Paulo State Univ |
Souza, Andre Nunes | Sao Paulo State Univ |
Papa, Joao Paulo | Sao Paulo State Univ. - UNESP |
Keywords: Manifold learning, Classification
Abstract: Appropriate metrics are paramount for machine learning and pattern recognition. In Content-Based Image Retrieval applications, low-level features and pairwise-distance metrics are usually not capable of representing similarity among objects as observed by humans. Therefore, learning a metric from the available data has become crucial in such applications, but only a few related approaches take into account the contextual information inherent in the samples for better accuracy. In this paper, we propose a novel approach that combines an unsupervised manifold learning algorithm with the Optimum-Path Forest (OPF) classifier to obtain more accurate recognition rates, and we show that it can outperform standard OPF-based classifiers trained over the original manifold. Experiments conducted on several public datasets evidence the validity of metric learning in the context of OPF classifiers.
|
|
15:00-17:00, Paper TuPMP.69 | |
Superframes, a Temporal Video Segmentation |
Sadeghi Sokeh, Hajar | Kingston Univ |
Argyriou, Vasileios | Kingston Univ. London |
Monekosso, Dorothy | Leeds Beckett Univ |
Remagnino, Paolo | Kingston Univ |
Keywords: Clustering, Segmentation, features and descriptors, Video processing and analysis
Abstract: The goal of video segmentation is to turn video data into a set of concrete motion clusters that can be easily interpreted as building blocks of the video. There is some work on similar topics, such as detecting scene cuts in a video, but little specific research on clustering video data into a desired number of compact segments. It is more intuitive, and more efficient, to work with perceptually meaningful entities obtained from a low-level grouping process, which we call ‘superframes’. This paper presents a new, simple and efficient technique to detect superframes of similar content patterns in videos. We calculate the similarity of content and motion to obtain the strength of change between consecutive frames. With the help of existing deep-model-based optical flow techniques, the proposed method is able to perform more accurate motion estimation efficiently. We also propose two criteria for measuring and comparing the performance of different algorithms on various databases. Experimental results on videos from benchmark databases demonstrate the effectiveness of the proposed method.
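The abstract's core mechanism, measuring the strength of change between consecutive frames and cutting where it peaks, can be sketched in a few lines. This is an illustrative numpy toy only: the feature extractor (deep optical flow in the paper) and the threshold are placeholder assumptions.

```python
import numpy as np

def superframe_boundaries(frames, threshold=0.5):
    """Detect candidate superframe boundaries from per-frame features.

    frames: (T, D) array of content/motion features, one row per frame.
    Returns the indices where the change between consecutive frames
    exceeds `threshold` (candidate superframe cuts).
    """
    # Strength of change between consecutive frames (Euclidean distance).
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    return np.where(diffs > threshold)[0] + 1

# Toy video: three stable segments with two abrupt content changes.
video = np.concatenate([
    np.zeros((5, 4)), np.ones((5, 4)), 2 * np.ones((5, 4))
])
print(superframe_boundaries(video))  # boundaries at frames 5 and 10
```

A real system would compute `frames` from optical flow and appearance features and pick the threshold per database.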
|
|
15:00-17:00, Paper TuPMP.70 | |
Graph Edit Distance Testing through Synthetic Graphs Generation |
Serratosa, Francesc | Univ. Rovira I Virgili |
Santacruz Muńoz, Jose Luis | Univ. Rovira Virgili |
Keywords: Graph matching
Abstract: Error-tolerant graph matching has been shown to be NP-hard; for this reason, several sub-optimal algorithms have been presented with the aim of making the runtime acceptable in some applications. These algorithms have been tested only on relatively small graphs, because computing the true distance for comparison purposes is too costly. We present a method to generate pairs of graphs together with an upper- and lower-bound distance at linear computational cost. With this method, we can test the behaviour of known or new sub-optimal error-tolerant graph matching algorithms against a lower and an upper bound on the graph edit distance on large graphs, even though we do not have the true distance. The computational cost of generating a pair of graphs together with their upper and lower bounds is linear in the number of nodes. Practical experimentation shows that the runtime to generate a pair of graphs is negligible compared to the runtime to match them.
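The key idea, generating a graph pair whose applied edits directly yield a bound on the edit distance without ever computing the true distance, can be illustrated as follows. This numpy toy assumes unit edge-flip costs and uses an O(n^2) edge enumeration for simplicity (the paper's generator is linear in the number of nodes).

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_graph(adj, n_edits):
    """Create a second graph by flipping `n_edits` distinct edges.

    Under unit edge-edit costs, the number of flips applied is by
    construction an upper bound on the graph edit distance of the pair.
    """
    g2 = adj.copy()
    n = adj.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for k in rng.choice(len(pairs), size=n_edits, replace=False):
        i, j = pairs[k]
        g2[i, j] = g2[j, i] = 1 - g2[i, j]  # flip edge, keep symmetry
    return g2

g1 = np.zeros((5, 5), dtype=int)      # empty graph on 5 nodes
g2 = perturb_graph(g1, 3)
# The pair differs in exactly 3 edges, so GED <= 3 under unit costs.
print(int(np.sum(np.triu(g1 != g2))))  # 3
```

The paper also derives a lower bound; deriving one here would require assumptions the abstract does not give, so it is omitted.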
|
|
15:00-17:00, Paper TuPMP.71 | |
Botnet Detection Based on Fuzzy Association Rules |
Lu, Jiazhong | Univ. of Electronic Science and Tech. of China |
Lv, Fengmao | Univ. of Electronic Science and Tech. of China, Chengd |
Liu, Quan-Hui | Univ. of Electronic Science and Tech. of China, Chengd |
Zhang, Malu Zhang | Univ. of Electronic Science and Tech. of China, Chengd |
Zhang, Xiaosong | Univ. of Electronic Science and Tech. of China, |
Keywords: Classification, Applications of pattern recognition and machine learning, Data mining
Abstract: Botnets are difficult to detect in complex network environments and pose a huge threat to network security. As the boundaries between normal and botnet traffic blur, commonly used botnet detection methods based on traffic analysis often suffer from high false positive rates. To overcome this issue, we propose an effective botnet detection method based on fuzzy association rules. The proposed method accurately computes botnet traffic features, which can be used to distinguish normal traffic from botnet traffic. We first collect data in the laboratory by deploying different botnets in a controlled experiment. The traffic features and the association-rule support, confidence, and membership values are calculated by the proposed method and further used to distinguish the type of botnet. Compared with other methods on our dataset, the proposed method performs better. To assess generality, we also test our method on a public dataset and again observe higher accuracy rates, demonstrating that the proposed method is effective in detecting botnets.
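As a hedged illustration of the association-rule quantities the abstract mentions, the sketch below computes crisp support and confidence over toy discretized flow records. The fuzzy membership component is omitted, and the feature names are placeholders, not the paper's.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Conditional support of the rule antecedent -> consequent."""
    return support(transactions, set(antecedent) | set(consequent)) / \
           support(transactions, antecedent)

# Toy flow records discretized into categorical traffic features.
flows = [
    {"small_pkts", "fixed_interval", "botnet"},
    {"small_pkts", "fixed_interval", "botnet"},
    {"small_pkts", "normal"},
    {"large_pkts", "normal"},
]
print(support(flows, {"small_pkts"}))                      # 0.75
print(confidence(flows, {"small_pkts", "fixed_interval"},
                 {"botnet"}))                              # 1.0
```

A fuzzy variant would replace set membership with a degree in [0, 1] per feature and aggregate with a t-norm.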
|
|
15:00-17:00, Paper TuPMP.72 | |
Efficient Text Classification Using Tree-Structured Multi-Linear Principal Component Analysis |
Su, Yuanhang | Univ. of Southern California |
Huang, Yuzhong | Univ. of Southern California |
Kuo, C.-C. Jay | Univ. of Southern California |
Keywords: Dimensionality reduction, Classification, Neural networks
Abstract: A novel text-data dimension reduction technique, called tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Unlike traditional text dimension reduction methods that operate on word-level representations, the TMPCA technique reduces the dimension of input sequences and sentences to simplify subsequent text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than ordinary principal component analysis (PCA). Furthermore, experimental results demonstrate that a support vector machine (SVM) applied to the TMPCA-processed data achieves comparable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
|
|
15:00-17:00, Paper TuPMP.73 | |
Prediction-Based Classification Using Learning on Riemannian Manifolds |
Tayanov, Vitaliy | Concordia Univ |
Krzyzak, Adam | -Concordia Univ |
Suen, Ching Y | Concordia Univ |
Keywords: Manifold learning, Ensemble learning, Deep learning
Abstract: This paper is concerned with learning from predictions. Predictions are obtained from an ensemble of classifiers such as random forests (RFs) or extra-trees. Assuming the estimators are semi-independent, they can be treated as a prediction space: we project our feature vector into the space of estimators by obtaining a response from each of them. For RFs the responses are conditional class probabilities, and they can be regarded as projections onto directions in a quasi-orthogonal space formed by the decision trees of the RF. We then construct a connected Riemannian manifold by computing the matrix of pairwise products of the predictions of all trees in the RF. These matrices are symmetric and positive definite, which is a necessary and sufficient condition for a connected Riemannian manifold (R-manifold). Because the tree outputs are conditional probabilities, one such matrix is created per class. Stacking these matrices together yields a tensor which is passed to a Convolutional Neural Network (CNN) for learning. We tested our algorithm on 12 datasets from the UCI repository representing difficult classification problems. The results show very fast learning and convergence of loss and prediction accuracy. The proposed algorithm outperforms feature-based classical classifier ensembles (RFs and extra-trees) on every tested UCI dataset.
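The construction described above, one pairwise-product matrix of tree predictions per class stacked into a tensor, can be sketched as follows. Note that the outer product of a single prediction vector is only positive semi-definite, so this toy adds a small ridge to obtain strict positive definiteness; that regularization is an assumption, since the abstract does not detail this step.

```python
import numpy as np

def prediction_spd_tensor(tree_probs, eps=1e-6):
    """Build one SPD matrix per class from per-tree class probabilities.

    tree_probs: (n_trees, n_classes) conditional class probabilities,
    one row per tree, for a single sample.
    Returns a (n_classes, n_trees, n_trees) tensor whose c-th slice is
    the pairwise-product matrix p_c p_c^T, ridge-regularized to be
    strictly positive definite.
    """
    n_trees, n_classes = tree_probs.shape
    slices = []
    for c in range(n_classes):
        p = tree_probs[:, c]
        slices.append(np.outer(p, p) + eps * np.eye(n_trees))
    return np.stack(slices)

# Simulated outputs of a 4-tree forest on a 3-class problem.
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(3), size=4)   # each row sums to 1
tensor = prediction_spd_tensor(probs)
eig = np.linalg.eigvalsh(tensor[0])
print(tensor.shape, bool(np.all(eig > 0)))  # (3, 4, 4) True
```

In the paper the resulting tensor is then fed to a CNN; here only the tensor construction is shown.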
|
|
15:00-17:00, Paper TuPMP.74 | |
Two-Stream Gated Fusion ConvNets for Action Recognition |
Zhu, Jiagang | Chinese Acad. of Sciences, Inst. of Automation |
Zou, Wei | CASIA |
Zhu, Zheng | CASIA |
Keywords: Deep learning, Classification, Video analysis
Abstract: Two-stream ConvNets for action recognition typically fuse the two streams' predictions by weighted averaging. This fixed-weight fusion is not adaptive to different action videos and requires trial and error on the validation set. To enhance the adaptability of two-stream ConvNets, an end-to-end trainable gated fusion method, namely the gating ConvNet, is proposed in this paper based on Mixture-of-Experts (MoE) theory. The gating ConvNet takes the combined convolutional-layer outputs of the spatial and temporal nets as input and outputs two fusion weights. To reduce the over-fitting of the gating ConvNet caused by parameter redundancy, a new multi-task learning method is designed that jointly learns the gating fusion weights for the two streams and trains the gating ConvNet for action classification. With the proposed gated fusion method and multi-task learning approach, competitive performance is achieved on the video action dataset UCF101.
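The gating idea, replacing fixed fusion weights with per-video weights predicted from the two streams' features, reduces to a softmax-weighted average. A minimal numpy sketch; the gating network that produces the logits is elided, and the inputs are toy class scores rather than real ConvNet outputs.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_fusion(spatial_pred, temporal_pred, gate_logits):
    """Fuse two streams' class scores with learned per-video weights.

    gate_logits: two raw scores that a gating network would produce
    from the concatenated convolutional features of both streams.
    Softmax turns them into fusion weights summing to one, replacing
    a fixed weighted average.
    """
    w = softmax(gate_logits)
    return w[0] * spatial_pred + w[1] * temporal_pred

spatial = np.array([0.7, 0.2, 0.1])
temporal = np.array([0.1, 0.8, 0.1])
fused = gated_fusion(spatial, temporal, np.array([0.0, 0.0]))
print(fused)  # equal gates -> plain average: [0.4 0.5 0.1]
```

With non-zero logits the fusion tilts toward whichever stream the gate trusts more for that particular video.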
|
|
15:00-17:00, Paper TuPMP.75 | |
Rethinking ReLU to Train Better CNNs |
Zhao, Gangming | Univ. of Chinese Acad. of Sciences |
Zhang, Zhaoxiang | Inst. of Automation, Chinese Acad. of Sciences |
Guan, He | Inst. of Automation, Chinese Acad. of Sciences |
Tang, Peng | Huazhong Univ. of Science and Tech |
Wang, Jingdong | Microsoft Res. Asia |
Keywords: Classification, Deep learning, Image classification
Abstract: Most convolutional neural networks share the same characteristic: each convolutional layer is followed by a nonlinear activation layer, of which the Rectified Linear Unit (ReLU) is the most widely used. In this paper, we argue that a one-to-one ratio between these two layers may not be the best choice, since it can result in poor generalization. We therefore investigate a more suitable way of using ReLU to explore better network architectures. Specifically, we propose a proportional module that keeps the ratio between the number of convolution and ReLU layers at N:M (N>M). The proportional module can be applied to almost any network with no extra computational cost to improve performance. Comprehensive experimental results indicate that the proposed method achieves better performance on different benchmarks with different network architectures, verifying the effectiveness of our work.
|
|
15:00-17:00, Paper TuPMP.76 | |
Bayesian Multi-Hyperplane Machine for Pattern Recognition |
Nguyen, Khanh | Deakin Univ |
Le, Trung | Deakin Univ |
Nguyen, Tu Dinh | Deakin Univ |
Phung, Dinh | Deakin Univ |
Keywords: Probabilistic graphical model, Model selection, Classification
Abstract: The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, it requires an excessively time-consuming search over the hyper-parameters. Another serious drawback is that it is often suboptimal, since the optimal hyper-parameter values are likely to lie outside the search space due to the discretization required by grid search. To address these challenges, we propose the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective and constructs an alternative probabilistic view in such a way that its maximum-a-posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to place prior distributions over hyper-parameters and to augment auxiliary variables so as to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model to ever-growing large datasets. Extensive experiments demonstrate that our method learns an optimal model without any parameter tuning, achieves accuracies comparable with state-of-the-art baselines, and seamlessly handles large-scale datasets.
|
|
15:00-17:00, Paper TuPMP.77 | |
Diversified Dual Domain-Adversarial Neural Networks |
Fang, Yuchun | Shanghai Univ |
Yuan, Qiulong | Shanghai Univ |
Zhang, Wei | Shanghai Univ |
Zhang, Zhaoxiang | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Domain adaptation
Abstract: The cost of large-scale data collection and annotation is often an obstacle to applying machine learning methods to new tasks or datasets. One way to circumvent this cost is to construct models that synthesize data and provide automatic annotation. Although such models are attractive, they often do not generalize from synthetic images to real-world images, so a domain adaptation algorithm is needed before they can be applied successfully. In this paper, we propose an unsupervised domain adaptation framework codenamed D-DANN that improves the representation power of convolutional neural networks (CNNs). Inspired by adversarial learning theory, we apply a discriminator to diversify the features extracted by a dual-branch CNN, obtaining a richer shared representation across domains through the proposed dual feature extractors. We implement D-DANN with several popular CNN models, including LeNet and AlexNet, and conduct extensive experiments on several pairs of domain adaptation validation datasets. The results show that our approach efficiently enhances the domain adaptation capability of general CNN models for unlabeled data.
|
|
15:00-17:00, Paper TuPMP.78 | |
Robust Discriminative Projective Dictionary Pair Learning by Adaptive Representations |
Sun, Yulin | Soochow Univ |
Zhang, Zhao | Soochow Univ |
Jiang, Weiming | Soochow Univ. the School of Computer Science and Tech |
Liu, Guangcan | Cornell |
Wang, Meng | Microsoft Res. Asia |
Yan, Shuicheng | National Univ. of Singapore |
Keywords: Data mining, Classification, Learning-based vision
Abstract: In this paper, we propose a Robust Adaptive Projective Dictionary Pair Learning (RA-DPL) framework based on adaptive discriminative representations. Our formulation seamlessly integrates robust projective dictionary pair learning and adaptive sparse representation learning into a unified model. RA-DPL improves the existing DPL algorithm in three ways. First, RA-DPL computes robust projective dictionary pairs by employing the sparse and robust l2,1-norm to encode the reconstruction error. Second, RA-DPL regularizes the analysis dictionary with the robust l2,1-norm so that it can explicitly extract sparse coefficients from the given samples; moreover, the l2,1-norm optimization is efficient, making the sparse coding step time-saving. Third, RA-DPL clearly preserves the local neighborhood relationships of the sparse coefficients within each class, which makes the learnt representations discriminative and also improves the discriminating power of the learnt dictionary. Extensive simulations on image databases demonstrate that RA-DPL obtains superior performance over other state-of-the-art methods.
|
|
15:00-17:00, Paper TuPMP.79 | |
Multiple Manifolds Metric Learning with Application to Image Set Classification |
Wang, Rui | School of Internet of Things Engineering, Jiangnan Univ |
Wu, Xiaojun | Jiangnan Univ |
Kittler, Josef | Univ. of Surrey |
Chen, Kai-Xuan | Jiangnan Univ |
Keywords: Manifold learning, Image classification
Abstract: In image set classification, considerable advances have been made by modeling the original image sets by second-order statistics or linear subspaces, which typically lie on Riemannian manifolds, namely the Symmetric Positive Definite (SPD) manifold and the Grassmann manifold respectively, and several algorithms have been developed on them for classification tasks. Motivated by the inability of existing methods to extract discriminative features for data on Riemannian manifolds, we propose a novel algorithm that combines multiple manifolds as features of the original image sets. To fuse these manifolds, well-studied Riemannian kernels are utilized to map the original Riemannian spaces into high-dimensional Hilbert spaces. A metric learning method is then devised to embed these kernel spaces into a lower-dimensional common subspace for classification. State-of-the-art results achieved on three datasets covering two different classification tasks, namely face recognition and object categorization, demonstrate the effectiveness of the proposed method.
|
|
15:00-17:00, Paper TuPMP.80 | |
Dense Convolutional Recurrent Neural Network for Generalized Speech Animation |
Xiao, Lei | Inst. of Intelligent Machines, Chinese Acad. of Sciences, |
Wang, Zengfu | Univ. of Science and Tech. of China |
Keywords: Deep learning, Audio and acoustic processing and analysis, Applications of pattern recognition and machine learning
Abstract: This paper presents a novel automated speech animation approach named Dense Convolutional Recurrent Neural Network (DenseCRNN). The approach learns a non-linear mapping from acoustic speech to multiple articulator movements in a unified framework that integrates feature extraction, context encoding and multi-parameter decoding. We propose DenseCRNN based on three insights: (1) a convolutional neural network with dense connectivity can effectively extract speaker-independent features from arbitrary spoken audio; (2) a bidirectional long short-term memory network can model the context information associated with phoneme coarticulation; and (3) multi-domain learning, where each domain is responsible for a single visual parameter, can achieve better performance on account of the implicit correlations and explicit differences among outputs. Experiments on the MNGU0 dataset demonstrate that our approach achieves significant improvements over state-of-the-art methods. Moreover, the proposed approach generalizes across genders and accents, and can be deployed on various character models.
|
|
15:00-17:00, Paper TuPMP.81 | |
Graph Memory Networks for Molecular Activity Prediction |
Pham, Trang | Deakin Univ |
Tran, Truyen | Deakin Univ |
Venkatesh, Svetha | Deakin Univ |
Keywords: Deep learning, Applications of pattern recognition and machine learning, Multitask learning
Abstract: Molecular activity prediction is critical in drug design. Machine learning techniques such as kernel methods and random forests have been successful for this task, but these models require fixed-size feature vectors as input while molecules vary in size and structure; as a result, fixed-size fingerprint representations handle the substructures of large molecules poorly. In addition, molecular activity tests, so-called BioAssays, typically cover relatively few molecules due to their complexity. Here we approach the problem with deep neural networks, as they are flexible in modeling structured data such as grids, sequences and graphs. We train on multiple BioAssays using a multi-task learning framework, which combines information from multiple sources to improve prediction performance, especially on small datasets. We propose the Graph Memory Network (GraphMem), a memory-augmented neural network that models the graph structure of molecules. GraphMem consists of a recurrent controller coupled with an external memory whose cells dynamically interact and change through a multi-hop reasoning process. Applied to molecules, these dynamic interactions enable an iterative refinement of the representation of molecular graphs with multiple bond types. GraphMem can be jointly trained on multiple datasets by feeding a task-specific query to the controller as input. We demonstrate the effectiveness of the proposed model for separate and joint training on more than 100K measurements spanning 9 BioAssay activity tests.
|
|
15:00-17:00, Paper TuPMP.82 | |
End-To-End Video-Level Representation Learning for Action Recognition |
Zhu, Jiagang | Chinese Acad. of Sciences, Inst. of Automation |
Zou, Wei | CASIA |
Zhu, Zheng | CASIA |
Keywords: Deep learning, Video analysis, Classification
Abstract: From frame/clip-level feature learning to video-level representation building, deep learning methods for action recognition have developed rapidly in recent years. However, current methods suffer from the confusion caused by partial-observation training, a lack of end-to-end learning, or a restriction to single-temporal-scale modeling. In this paper, we build upon two-stream ConvNets and propose Deep networks with Temporal Pyramid Pooling (DTPP), an end-to-end video-level representation learning approach, to address these problems. Specifically, RGB images and optical flow stacks are first sparsely sampled across the whole video. A temporal pyramid pooling layer is then used to aggregate the frame-level features, which consist of spatial and temporal cues. The trained model thus has a compact video-level representation with multiple temporal scales that is both global and sequence-aware. Experimental results show that DTPP achieves state-of-the-art performance on two challenging video action datasets, UCF101 and HMDB51, with either ImageNet or Kinetics pre-training.
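The temporal pyramid pooling layer can be sketched as binning the frame-level features at several temporal scales and concatenating the pooled results. This numpy toy assumes pyramid levels (1, 2, 4) and max pooling; the paper's exact pooling operator and level configuration may differ.

```python
import numpy as np

def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Aggregate frame-level features into a fixed video-level vector.

    features: (T, D) frame features. For each pyramid level L, the
    frames are split into L temporal bins and max-pooled per bin; the
    pooled vectors are concatenated, giving a representation that is
    both global (L=1) and sequence-aware (finer levels).
    """
    T, D = features.shape
    pooled = []
    for L in levels:
        for b in range(L):
            lo = (b * T) // L
            hi = max(((b + 1) * T) // L, lo + 1)  # bins are never empty
            pooled.append(features[lo:hi].max(axis=0))
    return np.concatenate(pooled)

frames = np.random.default_rng(0).normal(size=(16, 8))
video_repr = temporal_pyramid_pool(frames)
print(video_repr.shape)  # (1+2+4) bins x 8 dims = (56,)
```

The output dimension is fixed regardless of the number of sampled frames, which is what makes the representation video-level and end-to-end trainable.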
|
|
15:00-17:00, Paper TuPMP.83 | |
Riemannian Kernel Based Nyström Method for Approximate Infinite-Dimensional Covariance Descriptors with Application to Image Set Classification |
Chen, Kai-Xuan | Jiangnan Univ |
Wu, Xiaojun | Jiangnan Univ |
Wang, Rui | School of Internet of Things Engineering, Jiangnan Univ |
Kittler, Josef | Univ. of Surrey |
Keywords: Manifold learning, Classification, Dimensionality reduction
Abstract: In the domain of pattern recognition, using CovDs (Covariance Descriptors) to represent data and taking the metrics of the resulting Riemannian manifold into account have been widely adopted for the task of image set classification. Recently, it has been proven that infinite-dimensional CovDs are more discriminative than their low-dimensional counterparts. However, the form of infinite-dimensional CovDs is implicit and their computational load is high. We propose a novel framework for representing image sets by approximating infinite-dimensional CovDs in the paradigm of the Nyström method based on a Riemannian kernel. We start by modeling the images via CovDs, which lie on the Riemannian manifold spanned by SPD (Symmetric Positive Definite) matrices. We then extend the Nyström method to the SPD manifold and obtain approximations of CovDs in an RKHS (Reproducing Kernel Hilbert Space). Finally, we approximate infinite-dimensional CovDs via these approximations. Empirically, we apply our framework to the task of image set classification. The experimental results obtained on three benchmark datasets show that our proposed approximate infinite-dimensional CovDs outperform the original CovDs.
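As background, the standard Nyström step that the paper extends to the SPD manifold maps each sample to explicit features via the eigendecomposition of a landmark kernel matrix. A generic numpy sketch, with a toy RBF kernel on scalars standing in for the paper's Riemannian kernel on SPD matrices:

```python
import numpy as np

def nystrom_features(K_landmark, K_cross):
    """Nyström approximation of kernel feature maps.

    K_landmark: (m, m) kernel matrix among m landmark samples.
    K_cross:    (n, m) kernel values between all n samples and the
                landmarks.
    Returns (n, m) approximate explicit features Z with Z @ Z.T ~ K.
    """
    vals, vecs = np.linalg.eigh(K_landmark)
    vals = np.clip(vals, 1e-12, None)          # numerical safety
    # Z = K_cross @ U @ Lambda^{-1/2}
    return K_cross @ vecs @ np.diag(vals ** -0.5)

# Toy RBF kernel on 1-D points; the first 3 points serve as landmarks.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))

def rbf(a, b):
    return np.exp(-((a - b.T) ** 2))

landmarks = x[:3]
Z = nystrom_features(rbf(landmarks, landmarks), rbf(x, landmarks))
K_approx = Z @ Z.T
print(Z.shape)  # (50, 3): finite-dimensional features for all samples
```

On the landmark block the approximation is exact; elsewhere its quality depends on how well the landmarks cover the data.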
|
|
15:00-17:00, Paper TuPMP.84 | |
Automated Pruning for Deep Neural Network Compression |
Manessi, Franco | Lastminute.com Group |
Rozza, Alessandro | Lastminute.com Group |
Bianco, Simone | Univ. of Milano-Bicocca |
Napoletano, Paolo | Univ. of Milano-Bicocca |
Schettini, Raimondo | Univ. Degli Studi Di Milano-Bicocca |
Keywords: Deep learning, Neural networks, Transfer learning
Abstract: In this work we present a method to improve the pruning step of the current state-of-the-art methodology for compressing neural networks. The novelty of the proposed pruning technique is its differentiability, which allows pruning to be performed during the backpropagation phase of network training. This enables end-to-end learning and strongly reduces the training time. The technique is based on a family of differentiable pruning functions and a new regularizer specifically designed to enforce pruning. The experimental results show that the joint optimization of both the thresholds and the network weights makes it possible to reach a higher compression rate, reducing the number of weights of the pruned network by a further 14% to 33% compared with the current state-of-the-art. Furthermore, we believe this is the first study to analyze the generalization capabilities, in transfer learning tasks, of the features extracted by a pruned network. To this end, we show that the representations learned using the proposed pruning methodology maintain the same effectiveness and generality as those learned by the corresponding non-compressed network on a set of different recognition tasks.
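A minimal sketch of a differentiable pruning gate: a steep sigmoid acts as a soft step, so gradients flow to both the weights and the (learnable) threshold during backpropagation. The actual function family and regularizer used in the paper are not specified in the abstract, so this is one illustrative choice, not the authors' formulation.

```python
import numpy as np

def soft_pruning(w, t, beta=50.0):
    """Differentiable pruning gate.

    Weights whose magnitude is below the threshold t are smoothly
    driven toward zero; weights well above t pass through almost
    unchanged. Because the gate is a smooth function of both w and t,
    the threshold itself can be optimized by gradient descent.
    """
    gate = 1.0 / (1.0 + np.exp(-beta * (np.abs(w) - t)))
    return w * gate

w = np.array([0.001, -0.002, 0.5, -0.8])
pruned = soft_pruning(w, t=0.1)
print(np.round(pruned, 3))  # small weights ~0, large weights kept
```

In a real training loop a sparsity regularizer would push the threshold up, and after training the near-zero weights would be removed outright.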
|
|
15:00-17:00, Paper TuPMP.85 | |
LHONE: Label Homophily Oriented Network Embedding |
Zhang, Le | Chinese Acad. of Sciences |
Li, Xiang | Inst. of Information Engineering |
Xiang, Ji | Inst. of Information Engineering |
Qi, Ying | Chinese Acad. of Sciences |
Keywords: Data mining, Classification
Abstract: Network embedding learns effective low-dimensional vector representations for the nodes of a network and has attracted considerable attention in recent years. To date, existing methods mainly focus on network structure and cannot leverage the abundant label information that is potentially valuable for learning better vector representations. Due to the noise and incompleteness of label information, it is intractable to integrate it into the vector representations of a partially labeled network. To address this issue, we investigate the effects of label information through label homophily. Briefly, label homophily not only drives nodes sharing similar labels to connect to each other, but also produces a division of a network into densely connected, homogeneous parts that are weakly connected to one another. We propose a novel Label Homophily Oriented Network Embedding (LHONE) model that makes the best of label homophily by converting a partially labeled network into two bipartite networks and learning vector representations combined with a Gaussian mixture model (GMM). Extensive experiments on two real-world network datasets demonstrate the effectiveness of LHONE compared to state-of-the-art network embedding approaches.
|
|
15:00-17:00, Paper TuPMP.86 | |
Skin Lesion Segmentation Via Dense Connected Deconvolutional Network |
Li, Hang | Shenzhen Univ |
He, Xinzi | School of Biomedical Engineering, Health Science Center, Shenzhe |
Yu, Zhen | Shenzhen Univ |
Zhou, Feng | Department of Industrial and Manufacturing, Systems Engineering, |
Cheng, Jie Zhi | United-Imaging Healthcare, Shanghai, China |
Huang, Limin | Shenzhen People’s Hospital |
Wang, Tianfu | Shenzhen Univ |
Lei, Baiying | Shenzhen Univ |
Keywords: Deep learning, Neural networks, Transfer learning
Abstract: Dermoscopy imaging analysis is a routine procedure for the diagnosis and treatment of skin lesions. Segmentation is the very first step in demarcating skin lesions for further quantitative analysis. However, it is a challenging task due to the variation of skin lesions across viewpoints and scales. To handle these challenges, we devise a new dense deconvolutional network (DDN) for skin lesion segmentation based on encoding and decoding modules. The devised network consists of convolution units, dense deconvolutional layers (DDL) and chained residual pooling blocks. The DDL is adopted to restore the high resolution of the original input by upsampling, while chained residual pooling is utilized to fuse multi-level features. Hierarchical supervision is also added to capture low-level detailed boundary information. The DDN is trained end-to-end, free of prior knowledge and complicated post-processing procedures. By fusing local and global contextual information, a high-resolution prediction output is obtained. Validation on the public ISBI 2016 and 2017 skin lesion challenge datasets demonstrates the effectiveness of the proposed method.
|
|
15:00-17:00, Paper TuPMP.87 | |
Generating Adversarial Examples with Conditional Generative Adversarial Net |
Yu, Ping | Nanjing Univ. of Science and Tech |
Song, Kaitao | Nanjing Univ. of Science & Tech |
Lu, Jianfeng | Nanjing Univ. of Science & Tech |
Keywords: Deep learning, Image classification, Neural networks
Abstract: Recently, deep neural networks have made significant progress and been successfully applied in various fields, yet they are found to be vulnerable to attack instances, e.g., adversarial examples. State-of-the-art attack methods generate attack images by adding small perturbations to the source image; these images can fool the classifier while remaining barely perceptible to humans. Such attack instances are difficult to generate by searching the feature space, so designing an effective and robust generating method has become a focus of research. Inspired by adversarial examples, we propose two novel generative models that produce adaptive attack instances directly, adopting a conditional generative adversarial network with a distinctive training strategy. Compared with common methods such as the Fast Gradient Sign Method, our models reduce the generation cost, improve robustness, and take about one fifth of the running time to produce an attack instance.
|
|
15:00-17:00, Paper TuPMP.88 | |
An Effective Deep Learning Based Scheme for Network Intrusion Detection |
Zhang, Hongpo | Zhengzhou Science and Tech. Inst |
Wu, Chase Q. | New Jersey Inst. of Tech |
Gao, Shan | Zhengzhou Univ |
Wang, Zongmin | Zhengzhou Univ |
Xu, Yuxiao | Hangzhou DPtech Tech. Co., Ltd |
Liu, Yongpeng | Zhengzhou Univ |
Keywords: Deep learning
Abstract: Intrusion detection systems (IDS) play an important role in the protection of network operations and services. In this paper, we propose an effective network intrusion detection scheme based on deep learning techniques. The proposed scheme employs a denoising autoencoder (DAE) with a weighted loss function for feature selection, which determines a limited number of important features for intrusion detection to reduce feature dimensionality. The selected data is then classified by a compact multilayer perceptron (MLP) for intrusion identification. Extensive experiments are conducted on the UNSW-NB dataset to demonstrate the effectiveness of the proposed scheme. With a small feature selection ratio of 5.9%, the proposed scheme is still able to achieve a superior performance in terms of different evaluation criteria. The strategic selection of a reduced set of features yields satisfactory detection performance with low memory and computing power requirements, making the proposed scheme a promising solution to intrusion detection in high-speed networks.
|
|
15:00-17:00, Paper TuPMP.89 | |
Lifting Scheme Based Deep Network Model for Remote Sensing Imagery Classification |
Liu, Xinlong | Wuhan Univ |
He, Bokun | Wuhan Univ |
He, Chu | Wuhan Univ |
Keywords: Classification, Neural networks, Deep learning
Abstract: Deep learning has shown great success in many fields; however, transferring this potential to remote sensing imagery interpretation is still a challenging task due to special data properties, e.g., low Signal-to-Noise Ratio (SNR) and high variation. In this work, a lifting-scheme based deep model is presented for remote sensing imagery classification. The main idea underlying this scheme is that an innovative strategy decomposes the input image into two compact, low-resolution components, which are then fed into a standard Convolutional Neural Network (CNN) for the classification task. More precisely, (1) one decomposed component is devoted to enhancing the latent patterns while attenuating the random variation in the input, and (2) the other component captures the local structural information in the input. The experimental results show that the lifting deep model is computationally efficient and has promising potential, improving the classification accuracy by about 5.7% and obtaining a 2.69x speed-up compared with the counterpart CNN.
|
|
15:00-17:00, Paper TuPMP.90 | |
Cauchy Matching Pursuit for Robust Sparse Representation and Classification |
Wang, Yulong | Chengdu Univ |
Zou, Cuiming | Chengdu Univ |
Tang, YuanYan | Univ. of Macao |
Li, Luoqing | Hubei Univ |
Keywords: Sparse learning, Classification
Abstract: Various greedy algorithms have been developed for sparse signal recovery in recent years. However, most of them utilize an ℓ2-norm-based loss function and are sensitive to non-Gaussian noise and outliers. This paper proposes a Cauchy matching pursuit (CauchyMP) algorithm for robust sparse representation and classification. By leveraging a Cauchy estimator based loss function, the proposed approach can robustly learn the sparse representation of noisy data corrupted by various severe noises. As a greedy algorithm, CauchyMP is also computationally efficient. We also develop a CauchyMP based classifier for robust classification with application to face recognition. Experiments on datasets with gross corruptions demonstrate the efficacy and robustness of CauchyMP for learning robust sparse representations.
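The robustness argument rests on the shape of the Cauchy estimator's loss, which grows only logarithmically for large residuals instead of quadratically. A minimal sketch (the scale parameter `gamma` is an assumed free parameter, not a value from the paper):

```python
import numpy as np

def cauchy_loss(residual, gamma=1.0):
    """Cauchy estimator loss rho(r) = log(1 + (r/gamma)^2).

    Near zero it behaves like a scaled squared loss, but for large |r|
    it grows logarithmically, so outliers contribute far less than under
    the ell_2 loss used by standard matching pursuit variants.
    """
    return np.log1p((residual / gamma) ** 2)
```

In a CauchyMP-style greedy step, this loss would replace the squared residual when scoring candidate atoms, down-weighting samples corrupted by gross noise.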
|
|
15:00-17:00, Paper TuPMP.91 | |
Deeply Supervised Residual Network for HEp-2 Cell Classification |
Xie, Hai | Shenzhen Univ |
He, Yejun | Shenzhen Univ |
Lei, Haijun | Shenzhen Univ |
Han, Tao | Shenzhen Univ |
Yu, Zhen | Shenzhen Univ |
Lei, Baiying | Shenzhen Univ |
Keywords: Classification, Neural networks, Medical image and signal analysis
Abstract: Accurate Human Epithelial-2 (HEp-2) cell image classification is an important step in diagnosing various autoimmune diseases. Automatic classification of HEp-2 cells from microscope images is a highly challenging task due to the strong illumination changes arising from the low contrast of the cells. To address this challenge, we propose a deep residual network (ResNet) based framework to recognize HEp-2 cells automatically. Specifically, a residual network of 50 layers (ResNet-50) is adopted to acquire informative features for accurate recognition. To further boost the recognition performance, we devise a novel ResNet-based network with deep supervision. The deeply supervised ResNet (DSRN) can address the optimization problem of vanishing/exploding gradients and accelerate convergence. DSRN directly guides the training of the lower and upper levels of the network to counteract the effects of unstable gradient variations during training, and can thus extract more discriminative features. Experimental results show that the proposed DSRN method achieves average classification accuracies of 93.46% and 95.88% on the ICPR2012 and ICPR2016-Task1 datasets, respectively, outperforming traditional methods.
|
|
15:00-17:00, Paper TuPMP.92 | |
Pen Tip Motion Prediction for Handwriting Drawing Order Recovery Using Deep Neural Network |
Zhao, Bocheng | Chinese Acad. of Sciences, Inst. of Automation |
Yang, Minghao | National Lab. of Pattern Recognition (NLPR) Inst. of A |
Tao, Jianhua | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Regression, Deep learning, Applications of pattern recognition and machine learning
Abstract: Pen Tip Motion Prediction (PTMP) is the key step in Chinese handwriting drawing order recovery (DOR), which has been a challenging topic for the past few decades. We propose a novel algorithm framework using a Convolutional Neural Network (CNN) to predict pen tip movement from human handwriting pictures. The network is a regression CNN model whose inputs are a series of part-drawn handwriting images and whose output is a vector representing the probability of the next stroke point position. The predicted output vector is used by an iterative framework to generate pen movement sequences. Experiments on public Chinese and English on-line handwriting databases indicate that the proposed model performs competitively in multi-writer handwriting PTMP and DOR tasks. Furthermore, the experiments demonstrate that characters belonging to different languages share some common writing patterns, which the proposed method can learn effectively.
|
|
15:00-17:00, Paper TuPMP.93 | |
Optimising Ensemble of Two-Class Classifiers Using Spectral Analysis |
Windeatt, Terry | Univ. of Surrey
Keywords: Ensemble learning, Classification, Neural networks
Abstract: An approach to approximating the decision boundary of an ensemble of two-class classifiers is proposed. Spectral coefficients are used to approximate the discrete probability density function of a Boolean function. It is shown that the difference between first- and third-order coefficient approximations is a good indicator of optimal base classifier complexity. The theoretical analysis is supported by experimental results on a variety of artificial and real two-class problems.
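The spectral (Walsh) coefficients of a Boolean function, on which the analysis above rests, can be computed by correlating the function with parity basis functions of a given order. A minimal brute-force sketch (exponential in the number of inputs, so only illustrative for small ensembles; `spectral_coefficients` is a name chosen here, not from the paper):

```python
import numpy as np
from itertools import product, combinations

def spectral_coefficients(f, n, order):
    """Walsh spectral coefficients of a given order for f: {-1,1}^n -> {-1,1}.

    Each coefficient is the correlation of f with a parity (Walsh) basis
    function over a subset of the n inputs; first-order coefficients
    measure single-input influence, third-order coefficients measure
    three-way interactions.
    """
    xs = np.array(list(product([-1, 1], repeat=n)))   # all 2^n input patterns
    fx = np.array([f(x) for x in xs])
    coeffs = {}
    for idx in combinations(range(n), order):
        chi = np.prod(xs[:, idx], axis=1)             # parity over the chosen inputs
        coeffs[idx] = np.mean(fx * chi)
    return coeffs
```

For an ensemble of n base classifiers, f would be the combining (majority-vote) function of the base classifier outputs, and the first-vs-third-order gap above could then be read off directly.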
|
|
15:00-17:00, Paper TuPMP.94 | |
Neural Network Knowledge Transfer Using Unsupervised Similarity Matching |
Passalis, Nikolaos | Aristotle Univ. of Thessaloniki |
Tefas, Anastasios | Aristotle Univ. of Thessaloniki |
Keywords: Deep learning, Neural networks
Abstract: Transferring the knowledge from a large and complex neural network to a smaller and faster one allows for deploying more lightweight and accurate networks. In this paper, we propose a novel method that is capable of transferring the knowledge between any two layers of two neural networks by matching the similarity between the extracted representations. The proposed method is model-agnostic, overcoming several limitations of existing knowledge transfer techniques: the knowledge is transferred between layers that can have different architectures, and no information about the complex model is required apart from the output of the layers employed for the knowledge transfer. Three image datasets are used to demonstrate the effectiveness of the proposed approach, including a large-scale dataset for learning a lightweight model for facial pose estimation that can be directly deployed on devices with limited computational resources, such as embedded systems for drones.
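The key property claimed above, transfer between layers of different widths, follows because the matching happens between batch-wise similarity matrices, which are always batch×batch regardless of layer dimension. A minimal sketch under assumed choices (cosine similarity, Frobenius mismatch; the paper's exact similarity measure is not specified in the abstract):

```python
import numpy as np

def similarity_matrix(feats):
    """Cosine-similarity matrix of a batch of layer representations.

    feats: (batch, dim) array; the result is (batch, batch), so teacher
    and student layers may have different dims.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def transfer_loss(teacher_feats, student_feats):
    """Mean squared mismatch between teacher and student similarity matrices."""
    d = similarity_matrix(teacher_feats) - similarity_matrix(student_feats)
    return np.sum(d ** 2) / d.size
```

During training, only the student would be updated to minimize this loss; the teacher contributes nothing beyond its layer outputs, matching the "no information about the complex model" claim.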
|
|
15:00-17:00, Paper TuPMP.95 | |
Subspace Support Vector Data Description |
Sohrab, Fahad | Tampere Univ. of Tech |
Raitoharju, Jenni Karoliina | Tampere Univ. of Tech |
Moncef, Gabbouj | Tampere Univ. of Tech |
Iosifidis, Alexandros | Tampere Univ. of Tech |
Keywords: Support vector machine and kernel methods, Classification
Abstract: This paper proposes a novel method for solving one-class classification problems. The proposed approach, namely Subspace Support Vector Data Description, maps the data to a subspace that is optimized for one-class classification. In that feature space, the optimal hypersphere enclosing the target class is then determined. The method iteratively optimizes the data mapping along with data description in order to define a compact class representation in a low-dimensional feature space. We provide both linear and non-linear mappings for the proposed method. Experiments on 14 publicly available datasets indicate that the proposed Subspace Support Vector Data Description provides better performance compared to baselines and other recently proposed one-class classification methods.
|
|
15:00-17:00, Paper TuPMP.96 | |
Data Augmentation Via Latent Space Interpolation for Image Classification |
Liu, Xiaofeng | Carnegie Mellon Univ |
Zou, Yang | Carnegie Mellon Univ |
Kong, Lingsheng | Chinese Acad. of Sciences |
Diao, Zhihui | Chinese Acad. of Sciences |
Yan, Junliang | Chinese Acad. of Sciences |
Wang, Jun | Univ. of Chinese Acad. of Sciences |
Li, Site | Carnegie Mellon Univ |
Jia, Ping | Changchun Inst. of Optics, Fine Mechanies and Physics, CAS |
You, Jane | The Hong Kong Pol. Univ |
Keywords: Classification, Image classification, Deep learning
Abstract: Effective training of deep neural networks requires large amounts of data to avoid underdetermination and poor generalization. Data augmentation alleviates this by using existing data more effectively. However, standard data augmentation produces only limited plausible alternative data by, for example, flipping, distorting, adding noise to, or cropping a patch from the original samples. In this paper, we introduce the adversarial autoencoder (AAE) to impose a uniform distribution on the feature representations, and compare both linear interpolation and spherical linear interpolation (Slerp) in the latent space, which has the potential to generate a much broader set of augmentations for image classification. As a possible “recognition via generation” framework, it also has potential for several other classification tasks. Our experiments on the ILSVRC 2012, CIFAR-10, and CIFAR-100 datasets show that latent space inter-class sampling (LSIS) improves the generalization and performance of state-of-the-art deep neural networks.
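Spherical linear interpolation (Slerp), one of the two latent-space interpolation schemes compared above, is a standard formula; a minimal sketch of it applied to two latent codes:

```python
import numpy as np

def slerp(z1, z2, t):
    """Spherical linear interpolation between two latent vectors, t in [0, 1].

    Unlike linear interpolation, Slerp moves along the great circle between
    the two directions, which keeps intermediate samples in high-density
    regions of a (roughly) spherical latent distribution.
    """
    z1n, z2n = z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2)
    omega = np.arccos(np.clip(np.dot(z1n, z2n), -1.0, 1.0))  # angle between the codes
    if np.isclose(omega, 0.0):
        return (1 - t) * z1 + t * z2        # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)
```

Interpolating between the AAE encodings of two samples (possibly from different classes, as in the inter-class sampling above) and decoding the result yields the augmented images.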
|
|
15:00-17:00, Paper TuPMP.97 | |
Multi-View Classification and 3D Bounding Box Regression Networks |
Pramerdorfer, Christopher | Vienna Univ. of Tech |
Kampel, Martin | Vienna Univ. of Tech |
Van Loock, Mark | Toyota Motor Europe |
Keywords: Deep learning, Multiview learning, 3D vision
Abstract: We present a method for jointly classifying objects in depth maps and regressing amodal (extending beyond occluded parts) 3D bounding boxes in a way that is highly robust to occlusions. Our method is based on a novel multi-view convolutional neural network architecture with shared layers for both tasks, improving efficiency. The network processes views that encode object geometry and occlusion information and outputs class scores and bounding box coordinates in world coordinates, requiring no post-processing steps. We demonstrate the effectiveness of our method by example of fall detection, presenting a new dataset of 40k samples rendered from 3D models. On this dataset, our method achieves an average classification accuracy above 97% and a regression error below 10 cm at occlusion ratios of up to 90%. The dataset and trained models are publicly available.
|
|
15:00-17:00, Paper TuPMP.98 | |
Deep Recurrent Electricity Theft Detection in AMI Networks with Random Tuning of Hyper-Parameters |
Mahmoud Nabil, Mahmoud | Tennessee Tech. Univ |
Muhammad Ismail, Muhammad Ismail | Texas A&M Univ. at Qatar |
Mohamed, Mahmoud | Tennessee Tech. Univ |
Mostafa Shahin, Mostafa Shahin | Texas A&M Univ |
Khalid Qaraqe, Khalid Qaraqe | Texas A&M Univ |
Serpedin, Erchin | Texas A&M Univ |
Keywords: Deep learning, Neural networks, Classification
Abstract: Modern smart grids rely on advanced metering infrastructure (AMI) networks for monitoring and billing purposes. However, such an approach suffers from electricity theft cyberattacks. Different from existing research that utilizes shallow, static, and customer-specific electricity theft detectors, this paper proposes a generalized deep recurrent neural network (RNN)-based electricity theft detector that can efficiently thwart these cyberattacks. The proposed model exploits the time-series nature of the customers' electricity consumption to implement a gated recurrent unit (GRU)-RNN, hence improving the detection performance. In addition, the proposed RNN-based detector adopts a random search analysis in its learning stage to appropriately fine-tune its hyper-parameters. Extensive test studies are carried out to investigate the detector's performance using publicly available real data of 107,200 energy consumption days from 200 customers. Simulation results demonstrate the superior performance of the proposed detector compared with state-of-the-art electricity theft detectors.
|
|
15:00-17:00, Paper TuPMP.99 | |
A Segmented Local Offset Method for Imbalanced Data Classification Using Quasi-Linear Support Vector Machine |
Liang, Peifeng | Waseda Univ |
Yuan, Xin | Waseda Univ |
Li, Weite | Waseda Univ |
Hu, Jinglu | Waseda Univ |
Keywords: Support vector machine and kernel methods, Classification, Neural networks
Abstract: Within-class imbalance problems often occur in imbalanced classification, worsening the imbalanced distribution problem and increasing the learning concept complexity. However, most existing methods for imbalanced classification focus on rectifying the between-class imbalance, which is insufficient and inappropriate in many different scenarios. This paper proposes a novel quasi-linear SVM with a local offset adjustment method for the imbalanced classification problem. Our chief aim is to use leaning offsets of sub-clusters, obtained according to the imbalance ratios of the sub-clusters, to adjust the classifier for the best results. For this purpose, a geometry-based partitioning method for the imbalanced dataset is first introduced to partition the input space into several linearly separable partitions, so as to construct a quasi-linear kernel and obtain an SVM classifier. Then a local offset method based on the F-score value for linearly separable imbalanced datasets is introduced to obtain the leaning offset of each partition. Finally, the quasi-linear SVM with local offset adjustment is used to obtain the classifier for imbalanced datasets. Simulation results on different real-world datasets show that the proposed method is effective for imbalanced data classification.
|
|
15:00-17:00, Paper TuPMP.100 | |
Fully Convolutional Network for Head Detection with Depth Images |
Ballotta, Diego | Univ. of Modena and Reggio Emilia |
Borghi, Guido | Univ. of Modena and Reggio Emilia |
Vezzani, Roberto | Univ. of Modena and Reggio Emilia |
Cucchiara, Rita | Univ. Degli Studi Di Modena E Reggio Emilia |
Keywords: Deep learning, 3D vision, Object detection
Abstract: Head detection and localization are among the most investigated and demanding tasks in the Computer Vision community. They are also a key element for many disciplines, like Human Computer Interaction, Human Behavior Understanding, Face Analysis and Video Surveillance. In recent decades, many efforts have been made to develop accurate and reliable head or face detectors on standard RGB images, but only a few solutions concern other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system overcomes the classic sliding-window approach, which is often the main computational bottleneck of many object detectors, through a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are exploited to train and test the proposed network. Experimental results confirm the effectiveness of the method, which exceeds all the state-of-the-art works based on depth images and runs with real-time performance.
|
|
15:00-17:00, Paper TuPMP.101 | |
Visual Tree Convolutional Neural Network in Image Classification |
Liu, Yuntao | National Univ. of Defense Tech |
Dou, Yong | National Univ. of Defense Tech |
Jin, Ruochun | National Univ. of Defense Tech |
Qiao, Peng | National Univ. of Defense Tech |
Keywords: Deep learning, Image classification, Neural networks
Abstract: In image classification, Convolutional Neural Network (CNN) models have achieved high performance with the rapid development of deep learning. However, some categories in image datasets are more difficult to distinguish than others. Improving the classification accuracy on these confused categories benefits the overall performance. In this paper, we build a Confusion Visual Tree (CVT) based on confused semantic-level information to identify the confused categories. With the information provided by the CVT, we can lead the CNN training procedure to pay more attention to these confused categories. Therefore, we propose Visual Tree Convolutional Neural Networks (VT-CNN) based on the original deep CNN embedded with our CVT. We evaluate our VT-CNN model on the benchmark datasets CIFAR-10 and CIFAR-100. In our experiments, we build 3 different VT-CNN models, which obtain improvements over their base CNN models of 1.36%, 0.89% and 0.64%, respectively.
|
|
15:00-17:00, Paper TuPMP.102 | |
Superpixel-Based Feature Extraction and Fusion Method for Hyperspectral and LiDAR Classification |
Sen, Jia | Shenzhen Univ |
Meng, Zhang | Shenzhen Univ |
Junjian, Xian | Shenzhen Univ |
Jiayue, Zhuang | Shenzhen Univ |
Qiang, Huang | Shenzhen Univ |
Keywords: Classification, Texture analysis
Abstract: In this paper, we propose a new, efficient superpixel-based feature extraction and fusion method for hyperspectral and LiDAR data. The important fact that adjacent pixels belong to the same class with high probability is taken into consideration in our method, which means each superpixel can be regarded as a small region consisting of a number of pixels with similar spectral characteristics. In order to represent each superpixel well, we use our Gabor-wavelet-based feature extraction approach instead of morphological APs. A feature selection and fusion process is also used to reduce the redundancy among Gabor features and make the fused feature more discriminative. The results on several real datasets indicate that the proposed method provides state-of-the-art classification results, even when only a few samples, i.e., only three samples per class, are labeled.
|
|
15:00-17:00, Paper TuPMP.103 | |
Multiple-Instance Learning with Empirical Estimation Guided Instance Selection |
Yuan, Liming | Tianjin Univ. of Tech |
Wen, Xianbin | Tianjin Univ. of Tech |
Xu, Haixia | Tianjin Univ. of Tech |
Zhao, Lu | Tianjin Chengjian Univ |
Keywords: Multiview learning, Classification, Semi-supervised learning
Abstract: The embedding based framework handles multiple-instance learning (MIL) via instance selection and embedding. How to select instance prototypes is the main difference between the various algorithms. Most current studies depend on a single criterion for selecting instance prototypes. In this paper, we adopt two kinds of instance-selection criteria from two different views. To combine the two-view criteria, we also present an empirical estimator under which the two criteria compete for the instance selection. Experimental results validate the effectiveness of the proposed empirical estimator based instance-selection method for MIL.
|
|
15:00-17:00, Paper TuPMP.104 | |
Sequential Fish Catch Forecasting Using Bayesian State Space Models |
Kokaki, Yuya | Waseda Univ |
Tawara, Naohiro | Waseda Univ |
Kobayashi, Tetsunori | Waseda Univ |
Hashimoto, Kazuo | Waseda Univ |
Ogawa, Tetsuji | Waseda Univ |
Keywords: Sequence modeling, Applications of pattern recognition and machine learning
Abstract: A new state space model suitable for fixed shore net fishing is proposed and successfully applied to daily fish catch forecasting. Accurate prediction of daily fish catches makes it possible to support fishery workers with decision-making for efficient operations. For that purpose, the predictive model should be intuitive to the fishery workers and provide an estimate with a confidence. In the present paper, a fish catch forecasting method is developed using a state space model that emulates the process of fixed shore net fishing. In this method, the parameter estimation and prediction are sequentially performed using the Hamiltonian Monte Carlo method. Experimental comparisons using actual fish catch data and public meteorological information demonstrated that the proposed forecasting system yielded significant reductions in predictive errors over systems based on decision trees and legacy state-space models.
|
|
15:00-17:00, Paper TuPMP.105 | |
Extended Morphological Profile-Based Gabor Wavelets for Hyperspectral Image Classification |
Sen, Jia | Shenzhen Univ |
Huimin, Xie | Shenzhen Univ |
Xianglong, Deng | Shenzhen Univ |
Keywords: Classification, Texture analysis
Abstract: Hyperspectral imagery acquired by a hyperspectral sensor contains hundreds of narrow contiguous spectral bands. Since the spatial distribution of surface materials generally exhibits high regularity and local continuity, spatial texture information should be introduced to improve the classification accuracy of hyperspectral images. Extended morphological profiles (EMP) created from the raw hyperspectral image have proven to be effective and robust in reflecting the spatial structural features of hyperspectral data. Meanwhile, three-dimensional (3D) Gabor wavelets have been introduced to exploit the joint spectral-spatial features of hyperspectral images. In this paper, to combine the advantages of the EMP operator and the Gabor wavelet transform, an extended morphological profile-based Gabor wavelet method, named EMP-Gabor, is proposed for hyperspectral image classification. Specifically, principal components of the hyperspectral imagery are first computed; the most significant principal components are used as base images for an extended morphological profile, from which the EMP features are obtained. Secondly, 3D Gabor wavelets with particular orientations are directly convolved with the EMP feature cube. Finally, a support vector machine (SVM) classifier is utilized to carry out the classification task. Experimental results on two real hyperspectral data sets demonstrate the effectiveness of the proposed EMP-Gabor framework for hyperspectral image classification over several state-of-the-art methods.
|
|
15:00-17:00, Paper TuPMP.106 | |
Fine-Grained Age Group Classification in the Wild |
Zhang, Ke | North China Electric Power Univ |
Liu, Na | North China Electric Power Univ |
Yuan, Xingfang | Univ. of Missouri |
Guo, Xinyao | North China Electric Power Univ |
Gao, Ce | North China Electric Power Univ |
Zhao, Zhenbing | North China Electric Power Univ |
Keywords: Deep learning, Image classification, Face recognition
Abstract: Age estimation from a single face image has been an essential task in the fields of human-computer interaction and computer vision, with a wide range of practical applications. Since the accuracy of age estimation on face images under unconstrained conditions is relatively low for existing methods, we propose an Attention LSTM network for fine-grained age group classification in the wild, based on the ideas of fine-grained categories and visual attention. This method combines ResNets models with an LSTM unit to construct AL-ResNets networks that extract age-sensitive local regions, which effectively improves age estimation accuracy. Firstly, a ResNets model pre-trained on the ImageNet dataset is selected as the basic model, which is then fine-tuned on the IMDB-WIKI-101 dataset for age estimation. Then, we fine-tune ResNets on the Adience dataset to extract the global features of face images. To extract the local characteristics of age-sensitive areas, the LSTM unit is used to obtain the coordinates of the age-sensitive regions automatically. Finally, by combining the global and local features, we obtain the final prediction results. Our experiments illustrate the effectiveness of AL-ResNets for age group classification in the wild, where it achieves new state-of-the-art performance, surpassing all other CNN methods on the Adience dataset.
|
|
15:00-17:00, Paper TuPMP.107 | |
Region-Specific Metric Learning for Person Re-Identification |
Cao, Min | Inst. of Automation Chinese Acad. of Sciences |
Chen, Chen | Chinese Acad. of Sciences |
Hu, Xiyuan | Inst. of Automation, Chinese Acad. of Sciences |
Peng, Silong | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Multiview learning, Multilabel learning, Image classification
Abstract: Person re-identification addresses the problem of matching images of the same person captured by different non-overlapping camera views. Distance metric learning plays an effective role in addressing the problem. With features extracted from several regions of a person image, most distance metric learning methods learn cross-view transformations that are region-generic, i.e., all region-features share a homogeneous transformation. The spatial structure of the person image is ignored and the distribution difference among different region-features is neglected. Therefore, in this paper, we propose a novel region-specific metric learning method in which a series of region-specific sub-models are optimized for learning cross-view region-specific transformations. Additionally, we also present a novel feature pre-processing scheme that is designed to improve the features' discriminative power by removing weakly discriminative features. Experimental results on the publicly available VIPeR, PRID450S and QMUL GRID datasets demonstrate that the proposed method performs favorably against the state-of-the-art methods.
|
|
15:00-17:00, Paper TuPMP.108 | |
Robust Adaptive Low-Rank and Sparse Embedding for Feature Representation |
Wang, Lei | Soochow Univ |
Zhang, Zhao | Soochow Univ |
Liu, Guangcan | Cornell |
Ye, Qiaolin | Nanjing Univ. of Science and Tech |
Qin, Jie | ETH Zurich |
Wang, Meng | Microsoft Res. Asia |
Keywords: Classification, Sparse learning, Image processing and analysis
Abstract: Most existing low-rank sparse embedding models extract features of data in the original input space and usually separate the manifold preservation step from the coding process, which may result in decreased performance. In this paper, a novel Robust Adaptive Low-rank and Sparse Embedding (RALSE) framework is proposed for salient feature extraction from high-dimensional data by seamlessly integrating joint low-rank and sparse recovery with robust adaptive salient feature extraction. Specifically, RALSE integrates joint low-rank and sparse representation, adaptive neighborhood-preserving graph weight learning and robustness-promoting representation into a unified framework. For accurate similarity measurement, RALSE computes the adaptive weights by minimizing the reconstruction error over the noise-removed data and salient features simultaneously, where the L1-norm is regularized to ensure the sparsity of the learnt weights. RALSE also ensures that the learnt projection preserves the local neighborhood information of embedded features clearly and adaptively. The projection is not only modeled under joint low-rank and sparse regularization, but also computed from a clean subspace, making it powerful for salient feature extraction. Thus, the learnt low-rank sparse features are more accurate for subsequent classification. Extensive results demonstrate the effectiveness of our RALSE formulation for data representation and classification.
|
|
15:00-17:00, Paper TuPMP.109 | |
Riemannian Metric Learning Based on Curvature Flow |
Li, Yangyang | Acad. of Mathematics and Systems Science, Chinese Acad. of S |
Lu, Ruqian | Acad. of Mathematics and Systems Science, Chinese Acad. of S |
Keywords: Manifold learning, Dimensionality reduction
Abstract: In machine learning, a high-dimensional dataset such as digital images of human faces is often considered as a point set distributed on a (differentiable) manifold. In many cases the intrinsic dimension of this manifold is much lower than the representation dimension. In order to ease data processing, manifold-based dimension reduction algorithms have been put forward since the turn of the century. The real purpose of manifold learning (MAL) is to learn a suitable metric in the low-dimensional space. One main limitation of the existing MAL algorithms is that none of them considers the intrinsic curvature of the embedded manifold, which means that the intrinsic geodesic distance cannot be uncovered by these algorithms. The intrinsic geodesic distance on the manifold is measured by the Riemannian metric, which is highly affected by the Riemannian curvature. With this idea in mind, our work formulates a new algorithm by adding curvature information to metric learning. We study a new curvature flow derived from the input dataset. By employing this curvature flow, we obtain a Mahalanobis metric which can better uncover the intrinsic structure of the embedded manifold. To show the effectiveness of our proposed method, we compare our algorithm with several traditional MAL algorithms.
|
|
15:00-17:00, Paper TuPMP.110 | |
Cascade Deep Networks for Sparse Linear Inverse Problems |
Zhang, Huan | Tianjin Univ |
Shi, Hong | Tianjin Univ |
Wang, Wenwu | Univ. of Surrey |
Keywords: Sparse learning, Neural networks, Deep learning
Abstract: Sparse deep networks have been widely used in many linear inverse problems, such as image super-resolution and signal recovery. Their performance matches that of deep learning while using far fewer parameters. However, when the linear inverse problem involves several successive linear transformations, or the ratio of input dimension to output dimension is large, a single sparse deep network performs poorly. In this paper, we propose a cascade sparse deep network to solve this problem. In our model, we train two cascaded sparse networks derived from Gregor and LeCun's "learned ISTA" and "learned CoD". The cascade structure effectively improves performance compared to the non-cascade model. We use the proposed CLISTA and CLCoD for image sparse code prediction and signal recovery. The experimental results show that both algorithms perform favorably against a single sparse network.
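"Learned ISTA" (LISTA) unrolls the classical ISTA iteration into network layers with learnable matrices and thresholds. The underlying fixed iteration, which the cascaded networks above build on, can be sketched as follows (plain ISTA, not the learned or cascaded variant; `lam` and `n_iter` are illustrative defaults):

```python
import numpy as np

def soft_threshold(v, theta):
    """Proximal operator of the l1 norm (the nonlinearity LISTA layers use)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(A, y, lam=0.1, n_iter=100):
    """Plain ISTA for min_x 0.5*||y - Ax||^2 + lam*||x||_1.

    LISTA replaces the fixed matrices A.T/L and thresholds lam/L below
    with learned parameters shared across a small number of unrolled layers.
    """
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the data-fit gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x
```

Cascading, in this sketch's terms, would feed the output of one such unrolled network into a second one, each handling one of the successive linear transformations.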
|
|
15:00-17:00, Paper TuPMP.111 | |
Semi-Supervised Feature Selection by Mutual Information Based on Kernel Density Estimation |
Xu, Siqi | Tianjin Univ |
Dai, Jianhua | Hunan Normal Univ |
Shi, Hong | Tianjin Univ |
Keywords: Dimensionality reduction, Density estimation, Semi-supervised learning
Abstract: Feature selection, which improves computational efficiency by selecting relevant features and removing redundant ones, plays an important role in data mining and machine learning. In practice, collecting completely labeled data is difficult and time-consuming. Therefore, semi-supervised feature selection methods have become a necessity. However, most existing feature selection methods based on mutual information are only suitable for completely labeled data. In this paper, we first utilize kernel density estimation to learn the soft labels of unlabeled instances. For data whose class separation is small, we propose a concept of kernel purity to indicate the contribution of each labeled instance with regard to each class, which can reduce the negative influence of some labeled instances in predicting the soft labels of unlabeled instances. Additionally, we extend the definitions of kernel density estimation entropy and mutual information to handle partially labeled continuous data effectively. Experimental results over several datasets demonstrate the effectiveness of the proposed method.
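The kernel-density-estimation entropy underlying the extended mutual information can be sketched with a standard Gaussian-kernel resubstitution estimator (this is the textbook construction, not the paper's extended definition; the bandwidth is an assumed free parameter):

```python
import numpy as np

def kde_entropy(x, bandwidth=0.5):
    """Resubstitution entropy estimate H(X) for 1-D samples x.

    Estimates the density at each sample with a Gaussian kernel density
    estimate, then averages -log density. Mutual information between two
    features would follow as H(X) + H(Y) - H(X, Y) with a joint KDE.
    """
    diffs = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    p = k.mean(axis=1)                   # KDE density evaluated at each sample
    return -np.mean(np.log(p))
```

In the semi-supervised setting described above, the hard labels would be replaced by KDE-derived soft labels before such entropy and mutual-information terms are computed.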
|
|
15:00-17:00, Paper TuPMP.112 | |
ReNN: Rule Embedded Neural Networks |
Wang, Hu | LOHAS Tech. (Beijing) Corp. Limited |
Keywords: Neural networks, Deep learning, Biological image and signal analysis
Abstract: The artificial neural network shows powerful inference ability, but it is still criticized for its lack of interpretability and its prerequisite of big datasets. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome these shortcomings. ReNN first makes local-based inferences to detect local patterns, and then uses rules based on domain knowledge about the local patterns to generate a rule-modulated map. After that, ReNN makes global-based inferences that synthesize the local patterns and the rule-modulated map. To solve the optimization problem caused by the rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies that are difficult to learn from a limited empirical dataset, thus improving inference accuracy. The complexity of the neural network can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize the network can be reduced. Besides, inferences from ReNN can be analyzed with both local patterns and rules, and thus have better interpretability. In this paper, ReNN has been validated on a time-series detection problem.
|
|
15:00-17:00, Paper TuPMP.113 | |
Oil Price Forecasting Using Supervised GANs with Continuous Wavelet Transform Features |
Luo, Zhaojie | Kobe Univ |
Chen, Jinhui | Kobe Univ |
Cai, Xiao Jing | Kobe Univ |
Tanaka, Katsuyuki | Kobe Univ |
Takiguchi, Tetsuya | Kobe Univ |
Kinkyo, Takuji | Kobe Univ |
Hamori, Shigeyuki | Kobe Univ |
Keywords: Deep learning, Applications of pattern recognition and machine learning, Semi-supervised learning
Abstract: This paper proposes a novel approach based on a supervised Generative Adversarial Networks (GANs) model that forecasts crude oil prices with the Adaptive-Scales Continuous Wavelet Transform (AS-CWT). In our study, we first confirmed the possibility of using the Continuous Wavelet Transform (CWT) to decompose an oil price series into various components, such as sequences of days, weeks, months, and years, so that the decomposed new time series can be used as inputs for a deep-learning (DL) training model. Second, we found that applying the proposed adaptive scales in the CWT method can strengthen the dependence among inputs and provide more useful information, which improves forecasting performance. Finally, we use the supervised GANs model as the training model, which provides more accurate forecasts than the naive forecast (NF) model and other nonlinear models, such as Neural Networks (NNs) and Deep Belief Networks (DBNs), when dealing with a limited amount of oil price data.
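The decomposition step can be sketched with a plain-numpy continuous wavelet transform; the Ricker (Mexican-hat) wavelet and the daily/weekly/monthly-like scales below are illustrative choices, not the paper's AS-CWT:

```python
import numpy as np

def ricker(n, a):
    """Ricker (Mexican-hat) wavelet of length n at scale a."""
    t = np.arange(n) - (n - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt(signal, scales):
    """One row of coefficients per scale; rows become model inputs."""
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        w = ricker(min(10 * int(a), len(signal)), a)
        out[i] = np.convolve(signal, w, mode="same")
    return out

# e.g. daily-, weekly- and monthly-like scales for a daily price series
prices = np.cumsum(np.random.default_rng(0).normal(size=256))
coeffs = cwt(prices, scales=[1, 5, 21])
```

Each row of `coeffs` is a decomposed component of the original series at one time scale, ready to be stacked as a multi-channel input for a forecasting model.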
|
|
15:00-17:00, Paper TuPMP.114 | |
ThinNet: An Efficient Convolutional Neural Network for Object Detection |
Cao, Sen | Nanjing Univ. of Science and Tech |
Liu, Yazhou | Nanjing Univ. of Science and Tech |
Zhou, Changxin | Nanjing Univ. of Science and Tech |
Sun, Quansen | Nanjing Univ. of Science and Tech |
Lasang, Pongsak | Panasonic R&d Center Singapore |
Shen, Shengmei | Panasonic R&D Center Singapore |
Keywords: Deep learning, Object detection, Neural networks
Abstract: Great advances have been made with deep networks, but their relatively high memory and computation requirements limit their application on embedded devices. In this paper, we introduce a class of efficient network architectures named ThinNet, mainly for object detection applications on memory- and computation-limited platforms. The new architecture is based on two proposed modules: the Front module and the Tinier module. The Front module reduces the information loss from raw input images by utilizing more convolution layers with small filters. The Tinier module uses pointwise convolution layers before conventional convolution layers to decrease model size and computation while preserving detection accuracy. Experimental evaluations on ImageNet classification and the PASCAL VOC object detection dataset demonstrate the superior performance of ThinNet over other popular models. Our pretrained classification model (ThinNet_C) attains the same top-1 and top-5 performance as the classic AlexNet with only 1/50th of the parameters. The detection model also obtains significant improvements over other detection methods, while requiring a smaller model size to achieve high performance.
|
|
15:00-17:00, Paper TuPMP.115 | |
Indoor Scene Layout Estimation from a Single Image |
Lin, Hung Jin | National Tsing Hua Univ |
Huang, Sheng-Wei | National Tsing-Hua Univ |
Lai, Shang-Hong | National Tsing Hua Univ |
Chiang, Chen-Kuo | National Chung Cheng Univ |
Keywords: Deep learning, Neural networks
Abstract: With the popularity of handheld devices and intelligent agents, much work has aimed to explore machines' potential for interacting with reality. Scene understanding, among the many facets of reality interaction, has gained much attention for its relevance in applications such as augmented reality (AR). Scene understanding can be partitioned into several subtasks (e.g., layout estimation, scene classification, and saliency prediction). In this paper, we propose a deep learning-based approach for estimating the layout of a given indoor image in real time. Our method consists of a deep fully convolutional network, a novel layout-degeneration augmentation method, and a new training pipeline that integrates an adaptive edge penalty and smoothness terms into the training process. Unlike previous deep learning-based methods that depend on post-processing refinement (e.g., proposal ranking and optimization), our method promotes the generalization ability of the network and the smoothness of estimated layout edges without deploying post-processing techniques. Moreover, the proposed approach is time-efficient, since the model needs only one forward pass to render accurate layouts. We evaluate our method on the LSUN Room Layout and Hedau datasets and obtain estimation results comparable with state-of-the-art methods.
|
|
15:00-17:00, Paper TuPMP.116 | |
Region and Temporal Dependency Fusion for Multi-Label Action Unit Detection |
Mei, Chuanneng | Shanghai Jiao Tong Univ |
Jiang, Fei | Shanghai JiaoTong Univ |
Shen, Ruimin | Shanghai JiaoTong Univ |
Hu, Qiaoping | Shanghai Jiao Tong Univ |
Keywords: Deep learning, Emotion recognition, Multilabel learning
Abstract: Automatic facial Action Unit (AU) detection from videos has attracted increasing interest over the past years due to its importance for analyzing facial expressions. Many proposed methods face challenges in detecting the sparse face regions relevant to different AUs, in fusing temporal dependency, and in learning multiple AUs simultaneously. In this paper, we propose a novel deep neural network architecture for AU detection that models the above challenges jointly. First, to capture the region sparsity, we design a region pooling layer after a fully convolutional network to extract per-region features for each AU. Second, to integrate temporal dependency, Long Short-Term Memory (LSTM) networks are stacked on top of the regional features. Finally, the regional features and the outputs of the LSTMs are used together to produce per-frame multi-label predictions. Experimental results on three large spontaneous AU datasets, BP4D, GFT, and DISFA, demonstrate that our method outperforms state-of-the-art methods. On the three datasets, our method achieves the highest average F1 and AUC scores, with an average F1 score improvement of 4.8% on BP4D, 12.7% on GFT, and 14.3% on DISFA, and an average AUC score improvement of 27.4% on BP4D and 33.5% on DISFA.
|
|
15:00-17:00, Paper TuPMP.117 | |
A Method of Automatically Generating Labanotation from Human Motion Capture Data |
Wang, Jiaji | Inst. of Information Science, Beijing Jiaotong Univ |
Miao, Zhenjiang | Inst. of Information Science, Beijing Jiaotong Univ |
Keywords: Applications of pattern recognition and machine learning, Multimedia analysis, indexing and retrieval, Segmentation, features and descriptors
Abstract: This paper presents a method for automatically generating Labanotation scores from human motion capture data. Until now, Labanotation has mainly been acquired through manual recording by professionals. Our work allows users to convert human motions into Labanotation scores efficiently. The key components of our method are the analysis of motion capture data, the segmentation of motion, and the recognition of each motion fragment. In motion segmentation, we align the results with the beat of the Labanotation to ensure that the generated symbols are regular and accurate. In movement recognition, we process the data in ways suited to the different properties of human motion, so our recognition results are more reliable than those of previous works. The experiments show that our work is a useful tool for converting human dance motions into Labanotation scores. Furthermore, given its efficiency, the method can be used to record the large numbers of ethnic dances that are at risk of being lost.
|
|
15:00-17:00, Paper TuPMP.118 | |
Retraining: A Simple Way to Improve the Ensemble Accuracy of Deep Neural Networks for Image Classification |
Zhao, Kaikai | Kyushu Univ |
Matsukawa, Tetsu | Kyushu Univ |
Suzuki, Einoshin | Kyushu Univ |
Keywords: Ensemble learning, Deep learning, Image classification
Abstract: In this paper, we propose a new heuristic training procedure to help a deep neural network (DNN) repeatedly escape from a local minimum and move to a better local minimum. Our method repeats the following processes multiple times: randomly reinitializing the weights of the last layer of a converged DNN while preserving the weights of the remaining layers, and then conducting a new round of training. The motivation is to make the training in the new round learn better parameters based on the "good" initial parameters learned in the previous round. With multiple randomly initialized DNNs trained based on our training procedure, we can obtain an ensemble of DNNs that are more accurate and diverse compared with the normal training procedure. We call this framework "retraining". Experiments on eight DNN models show that our method generally outperforms the state-of-the-art ensemble learning methods. We also provide two variants of the retraining framework to tackle the tasks of ensemble learning in which 1) DNNs exhibit very high training accuracies (e.g., > 95%) and 2) DNNs are too computationally expensive to train.
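The core retraining step — preserve every layer except the last, which is re-randomized before a new round of training — can be sketched as follows; the weight-dictionary layout and `train_fn` interface are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def reinit_last_layer(weights, rng, scale=0.01):
    """Copy all layers but randomly re-initialize the last one."""
    new = {name: w.copy() for name, w in weights.items()}
    last = list(weights)[-1]          # assumes insertion order = depth
    new[last] = rng.normal(0.0, scale, size=weights[last].shape)
    return new

def retraining_ensemble(weights, train_fn, rounds, rng):
    """Collect one converged snapshot per round for the ensemble."""
    snapshots = []
    for _ in range(rounds):
        weights = train_fn(weights)   # new round of training to convergence
        snapshots.append({k: w.copy() for k, w in weights.items()})
        weights = reinit_last_layer(weights, rng)
    return snapshots
```

Each snapshot starts the next round from the "good" non-final-layer parameters of the previous one, so the collected models tend to be both accurate and diverse.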
|
|
15:00-17:00, Paper TuPMP.119 | |
Feature Selection Ensemble for Symbolic Data Classification with AHP |
Wang, Meiqian | Shanghai Univ |
Yue, Xiaodong | Shanghai Univ |
Gao, Can | The Hong Kong Pol. Univ |
Chen, Yufei | Tongji Univ |
Keywords: Symbolic learning, Data mining, Classification
Abstract: The ensemble of feature selections helps improve data generalization for learning tasks. However, existing feature selection ensemble methods have the following drawbacks. First, they focus on numerical data; work on feature selection ensembles for symbolic and mixed-type data is very limited. Second, voting-based ensemble strategies tend to select the top significant features, but the consistency between the ensemble result and the diverse feature selections cannot be guaranteed. To handle these problems, we propose a feature selection ensemble method based on the Analytic Hierarchy Process (AHP). The AHP-based ensemble method can integrate diverse feature selections into a consistent one under the multiple criteria of feature discernibility and independence. Moreover, the ensemble methodology is helpful for implementing feature selection on distributed data and for involving domain knowledge through extended criteria. Experimental results validate that the proposed feature selection ensemble method is effective for symbolic data classification.
|
|
15:00-17:00, Paper TuPMP.120 | |
Classifier Recommendation Using Data Complexity Measures |
Garcia, Luís Paulo | Leipzig Univ |
Lorena, Ana | Univ. Federal De São Paulo |
de Souto, Marcilio | Univ. of Orleans |
Ho, Tin Kam | IBM |
Keywords: Applications of pattern recognition and machine learning, Data mining, Classification
Abstract: Application of machine learning to new and unfamiliar domains calls for increasing automation in choosing a learning algorithm suitable for the data arising from each domain. Meta-learning can address this need, since it has been widely used in recent years to support the recommendation of the most suitable algorithms for a new dataset. The use of complexity measures can increase systematic comprehension of the meta-models and also make it possible to differentiate the performance of a set of techniques by taking into account the overlap between classes imposed by feature values and the separability and distribution of the data points. In this paper we compare the effectiveness of several standard regression models in predicting the accuracies of classifiers for classification problems from the OpenML repository. We show that the models can predict the classifiers' accuracies with low mean squared error and identify the best classifier for a problem, resulting in statistically significant improvements over a randomly chosen classifier or a fixed classifier believed to be good on average.
|
|
15:00-17:00, Paper TuPMP.121 | |
Zone2Vec: Distributed Representation Learning of Urban Zones |
Du, Jiahong | Beihang Univ |
Chen, Yujun | Beihang Univ |
Wang, Yue | Beihang Univ |
Pu, Juhua | Beihang Univ |
Keywords: Applications of pattern recognition and machine learning, Data mining, Deep learning
Abstract: A metropolis consists of zones segmented by major roads, and people travel between zones to conduct social activities. To analyze the characteristics of the entire city, we can explore the zones' features and find latent zone-wise relationships. In this paper, we propose a semantic associated zone embedding (SAZE) method using distributed representation learning. SAZE generates zone embeddings that extract more comprehensive characteristics of each zone and fit many urban computing tasks, unlike task-oriented methods. To feed SAZE, we not only consider the connections between zones via trajectories, but also embed each zone's intrinsic properties. Furthermore, we apply SAZE to two tasks, zone classification and zone clustering visualization. For each task, we compare SAZE with state-of-the-art baseline methods, and the results demonstrate the advantage of our model over these methods.
|
|
15:00-17:00, Paper TuPMP.122 | |
An Efficient Budget Allocation Algorithm for Multi-Channel Advertising |
Wang, Xingfu | Univ. of Science and Tech. of China |
Li, Pengcheng | Univ. of Science and Tech. of China |
Hawbani, Ammar | Univ. of Science and Tech. of China |
Keywords: Applications of pattern recognition and machine learning, Data mining, Reinforcement learning
Abstract: Budget allocation for multi-channel advertising deals with periodically distributing sub-budgets to different channels under a fixed total budget. However, the issue of sequential decision making, with the goal of maximizing total benefits accrued over a period of time instead of immediate benefits, has rarely been addressed. Besides, there is a lack of explicit linking between the advertising actions taken in one channel and the responses obtained in another. Moreover, the budget constraint restricts the feasible space of optimal strategies. In this paper, we address these challenges with a novel integrated algorithm based on both Reinforcement Learning (RL) and the Multi-Choice Knapsack Problem (MCKP), termed Q-MCKP. We also propose several improvements, such as a method for discretizing the costs so as to decrease the complexity of the model. Furthermore, the reward function of traditional Q-learning is rebuilt to incorporate an additional impact factor among channels. We conducted experiments using approximately two years of daily practical advertising data collected from an enterprise. Compared with state-of-the-art methods, our experimental results demonstrate greater effectiveness from several angles.
|
|
15:00-17:00, Paper TuPMP.123 | |
Recurrent Neural Networks for Financial Time-Series Modelling |
Tsang, Gavin | Swansea Univ |
Deng, Jingjing | Swansea Univ |
Xie, Xianghua | Swansea Univ |
Keywords: Sequence modeling, Applications of pattern recognition and machine learning, Deep learning
Abstract: The prediction of financial time series data is a challenging task due to the unpredictable behaviour of investors, which is influenced by a multitude of factors. In this paper, we present a novel deep Long Short-Term Memory (LSTM) based time-series model for stock market index prediction. A dataset comprising six market indices from around the world was chosen to demonstrate robustness under varying market conditions, with the aim of forecasting the next day's closing price. With experimental results showing an average annual profitability of up to 200%, our method demonstrates its feasibility and significance in time-series modelling and the prediction of financial markets.
|
|
15:00-17:00, Paper TuPMP.124 | |
Nonlinear Metric Learning through Geodesic Interpolation within Lie Groups |
Wang, Zhewei | Ohio Univ |
Shi, Bibo | Duke Univ |
Smith, Charles | Univ. of Kentucky |
Liu, Jundong | Ohio Univ |
Keywords: Classification, Medical image and signal analysis
Abstract: In this paper, we propose a nonlinear distance metric learning scheme based on the fusion of component linear metrics. Instead of merging displacements at each data point, our model calculates the velocities induced by the component transformations, via a geodesic interpolation on a Lie transformation group. Such velocities are later summed up to produce a global transformation that is guaranteed to be diffeomorphic. Consequently, pair-wise distances computed this way conform to a smooth and spatially varying metric, which can greatly benefit k-NN classification. Experiments on synthetic and real datasets demonstrate the effectiveness of our model.
|
|
15:00-17:00, Paper TuPMP.125 | |
Dynamic Texture Similarity Criterion |
Richtr, Radek | The Czech Acad. of Sciences, Czech Republic |
Haindl, Michael | Inst. of Information Theory and Automation |
Keywords: Performance evaluation, Sequence modeling, Model selection
Abstract: Dynamic texture similarity ranking is a challenging and still unsolved problem. Evaluating how similar various dynamic textures appear to human perception is extremely difficult even for static textures and requires tedious psycho-physical experiments. The principles of human perception are still largely not understood, and dynamic texture perception is further complicated by the distinct ways in which the spatial and temporal domains are perceived, which complicates any similarity criterion definition. We propose a novel dynamic texture criterion based on the Fourier transform and the properties of dynamic texture spatio-temporal frequencies. The presented criterion correlates well with the performed psycho-physical tests while maintaining sufficient diversity and descriptiveness.
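The spirit of a Fourier-based spatio-temporal frequency comparison can be sketched as correlating the 3-D FFT log-magnitude spectra of two sequences; this illustrates the ingredients only, not the authors' actual criterion:

```python
import numpy as np

def spectrum(seq):
    """Log-magnitude 3-D FFT of a (time, height, width) sequence."""
    return np.log1p(np.abs(np.fft.fftn(seq)))

def spectral_similarity(a, b):
    """Pearson correlation of the two spectra, in [-1, 1]."""
    sa, sb = spectrum(a).ravel(), spectrum(b).ravel()
    sa -= sa.mean()
    sb -= sb.mean()
    return float(sa @ sb / (np.linalg.norm(sa) * np.linalg.norm(sb) + 1e-12))
```

Identical sequences score 1; unrelated noise sequences score markedly lower.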
|
|
15:00-17:00, Paper TuPMP.126 | |
Cross-Dataset Data Augmentation for Convolutional Neural Networks Training |
Gasparetto, Andrea | Ca' Foscari |
Ressi, Dalila | Univ. Ca' Foscari Venezia |
Bergamasco, Filippo | Univ. Ca' Foscari Venezia |
Pistellato, Mara | Univ. Ca' Foscari Venezia |
Cosmo, Luca | Univ. Ca' Foscari Venezia |
Boschetti, Marco | Microtec Srl |
Ursella, Enrico | Microtec Srl |
Albarelli, Andrea | Univ. Ca' Foscari Di Venezia |
Keywords: Classification, Domain adaptation, Applications of pattern recognition and machine learning
Abstract: Within modern Deep Learning setups, data augmentation is the weapon of choice when dealing with narrow datasets or with a poor range of different samples. However, the benefits of data augmentation are abysmal when applied to a dataset which is inherently unable to cover all the categories to be classified with a significant number of samples. To deal with such desperate scenarios, we propose a possible last resort: Cross-Dataset Data Augmentation, that is, the creation of new samples by morphing observations from a different source into credible specimens for the training dataset. Of course, specific and strict conditions must be satisfied for this trick to work. In this paper we propose a general set of strategies and rules for Cross-Dataset Data Augmentation and demonstrate its feasibility in a concrete case study. Even without defining any new formal approach, we think the preliminary results of our paper warrant a broader discussion on this topic.
|
|
15:00-17:00, Paper TuPMP.127 | |
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval Using Text and Sketch |
Dey, Sounak | Computer Vision Center, Univ. Autonoma De Barcelona |
Dutta, Anjan | Computer Vision Centre, Univ. Autonoma De Barcelona |
Ghosh, Suman Kumar | Computer Vision Center, Autonomous Univ. of Barcelona |
Valveny, Ernest | Computer Vision Center - Univ. Autònoma De Barcelona |
Llados, Josep | Computer Vision Center |
Pal, Umapada | Indian Statistical Inst |
Keywords: Deep learning, Multimedia analysis, indexing and retrieval, Neural networks
Abstract: In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets.
|
|
15:00-17:00, Paper TuPMP.128 | |
Learning Parallel Canonical Correlations for Scale-Adaptive Low Resolution Face Recognition |
Yuan, Yun-Hao | YZU |
Zhang, Zhao | YZU |
Li, Yun | Yangzhou Univ |
Qiang, Ji-Peng | Yangzhou Univ |
Li, Bin | Yangzhou Univ |
Shen, Xiao-Bo | Nanyang Tech. Univ |
Keywords: Applications of pattern recognition and machine learning, Face recognition, Multiview learning
Abstract: Low resolution is one of the main obstacles in applying face recognition. Although many methods have been proposed to alleviate this problem, they assume that low-resolution (LR) face images have a uniform scale; in real scenarios, this prerequisite is very harsh. In this paper, we propose a scale-adaptive LR face recognition approach based on two-dimensional multi-set canonical correlation analysis (2DMCCA), in which the face image matrix does not need to be transformed into a vector beforehand. In the proposed method, training sets with different resolutions are treated as different views and projected in parallel into a latent coherent space where the consistency of multi-view face data is maximally enhanced. When a new LR face image with an arbitrary scale is input, we first transform it using the left and right projection matrices of an appropriate training view, and then reconstruct its high-resolution facial features by neighborhood reconstruction. Experimental results show that our proposed method is more effective and efficient than several existing methods.
|
|
15:00-17:00, Paper TuPMP.129 | |
Feature-Fusion HALS-Based Algorithm for Linked CP Decomposition Model in Application to Joint EMG/MMG Signal Classification |
Fonal, Krzysztof | Wroclaw Univ. of Science and Tech |
Zdunek, Rafal | Wroclaw Univ. of Science and Tech |
Wolczowski, Andrzej | Wroclaw Univ. of Science and Tech |
Keywords: Dimensionality reduction, Applications of pattern recognition and machine learning, Classification
Abstract: Basic tensor decomposition methods, such as CANDECOMP/PARAFAC (CP) or the Tucker decomposition, decompose a single tensor into a set of factors. However, there are practical cases when decomposing two or more tensors jointly is very useful, e.g., for the extraction of common features. In this study, we propose a new method, based on the Hierarchical Alternating Least Squares (HALS) algorithm, for a joint decomposition of two tensors (coming from different measurement modalities) into three sets of factor matrices: two sets of individual features and one set of common features. The individual features are then combined for joint classification of electromyography (EMG) and mechanomyography (MMG) signals registered for various grasping movements. The experimental results demonstrate that the proposed method significantly improves the performance of joint classification with respect to separate classification of EMG or MMG signals.
|
|
15:00-17:00, Paper TuPMP.130 | |
Quasimetric Graph Edit Distance As a Compact Quadratic Assignment Problem |
Blumenthal, David B. | Free Univ. of Bozen-Bolzano |
Daller, Evariste | Normandie Univ. UNICAEN, ENSICAEN, CNRS, GREYC |
Bougleux, Sébastien | Normandie Univ. UNICAEN, ENSICAEN, CNRS |
Brun, Luc | ENSICAEN |
Gamper, Johann | Free Univ. of Bolzano-Bozen |
Keywords: Graph matching, Classification
Abstract: The graph edit distance (GED) is a widely used distance measure for attributed graphs. It has recently been shown that the problem of computing GED, an NP-hard optimization problem, can be formulated as a quadratic assignment problem (QAP). This formulation is useful, since it allows well-performing approximate heuristics for GED to be derived from existing techniques for QAP. In this paper, we focus on the case where the edit costs that underlie GED are quasimetric, which holds in many applications of GED. We show that, for quasimetric edit costs, it is possible to reduce the size of the corresponding QAP formulation. An empirical evaluation shows that this reduction significantly speeds up the QAP-based approximate heuristics for GED.
|
|
15:00-17:00, Paper TuPMP.131 | |
An Efficient Deep Representation Based Framework for Large-Scale Terrain Classification |
Yan, Yupeng | Univ. of Florida |
Rangarajan, Anand | Univ. of Florida |
Ranka, Sanjay | Univ. of Florida |
Keywords: Classification, Transfer learning, Segmentation, features and descriptors
Abstract: In this paper, we present a novel terrain classification framework for large-scale remote sensing images. A well-performing multi-scale superpixel tessellation based segmentation approach is employed to generate homogeneous and irregularly shaped regions, and a transfer learning technique is then deployed to derive representative deep features by utilizing successful pre-trained convolutional neural network (CNN) models. This design aims to overcome the scarcity of available ground-truth data and to increase the generalization power of the multi-pixel descriptor. In the subsequent classification step, we train a fast and robust support vector machine (SVM) to assign pixel-level labels; its maximum-margin property can easily be combined with a graph Laplacian propagation approach. Moreover, we analyze the advantages of applying a feature selection technique to the deep CNN features extracted by transfer learning. In the experiments, we evaluate the whole framework on different geographical types. Compared with other region-based classification methods, the results show that our framework obtains state-of-the-art performance in both classification accuracy and computational efficiency.
|
|
15:00-17:00, Paper TuPMP.132 | |
Multi-Task Micro-Expression Recognition Combining Deep and Handcrafted Features |
Hu, Chunlong | Jiangsu Univ. of Science and Tech |
Jiang, Dengbiao | Jiangsu Univ. of Science and Tech |
Zou, Haitao | Jiangsu Univ. of Science and Tech |
Zuo, Xin | Jiangsu Univ. of Science and Tech |
Shu, Yucheng | Chongqing Univ. of Posts and Telecommunications |
Keywords: Multitask learning, Deep learning, Facial expression recognition
Abstract: Micro-expression recognition is a challenging problem due to the short duration and low intensity of micro-expressions. Most previous work on micro-expressions has relied on handcrafted features, while deep learning methods have recently been employed for difficult face recognition tasks. This paper presents a new framework that recognizes micro-expressions by combining handcrafted and deep features. The handcrafted feature is the Local Gabor Binary Pattern from Three Orthogonal Panels (LGBP-TOP), which combines spatial and temporal analysis to encode local facial movements. The deep feature is based on a Convolutional Neural Network (CNN) model trained on the micro-expression dataset. A sparse multi-task learning framework with an adaptive penalty term is then employed to remove irrelevant information from the combined LGBP-TOP and CNN features. The experimental evaluation is performed on two widely used micro-expression databases. The results demonstrate that the proposed approach achieves competitive performance compared with other popular micro-expression recognition methods.
|
|
15:00-17:00, Paper TuPMP.133 | |
Discernibility Matrix-Based Ensemble Learning |
Gao, Shuaichao | Tianjin Univ |
Dai, Jianhua | Hunan Normal Univ |
Shi, Hong | Tianjin Univ |
Keywords: Ensemble learning, Dimensionality reduction, Classification
Abstract: Ensemble learning is admittedly one of the main paradigms in machine learning, in which multiple individual learners are combined to obtain better performance by exploiting the significant diversity among the models. In some ensemble methods, however, the source of diversity lies only in the samples or only in the attributes. The concept of the discernibility matrix in rough set theory can yield several different attribute reducts, i.e., a series of selected attribute subsets. The attribute subsets obtained are all satisfactory attribute reduction results and differ from each other, which corresponds exactly to the diversity required by ensemble learning. In this paper, we embed the discernibility matrix into ensemble learning and propose a discernibility matrix-based ensemble learning algorithm named DMEL, in which attribute reduction and learning are fused together. On the one hand, learning performance can be improved by virtue of attribute reduction. On the other hand, a series of good but different attribute subsets obtained from the discernibility matrix ensures the diversity of ensemble learning from the angles of both samples and attributes. To make the algorithm more general, a k-means discernibility matrix is put forward to deal with numerical data. Experimental results on real-life datasets demonstrate the effectiveness of the proposed method.
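The discernibility matrix the method builds on is standard rough-set machinery: for each pair of objects with different decision labels, record the attributes that distinguish them. A minimal pure-Python sketch (the toy decision table is illustrative):

```python
def discernibility_matrix(table, labels):
    """For each pair of objects with different labels, collect the
    attribute indices on which their values differ."""
    matrix = {}
    for i in range(len(table)):
        for j in range(i + 1, len(table)):
            if labels[i] != labels[j]:
                matrix[(i, j)] = frozenset(
                    a for a in range(len(table[i]))
                    if table[i][a] != table[j][a]
                )
    return matrix

# toy decision table: rows = objects, columns = symbolic attributes
table = [("red", "s"), ("red", "m"), ("blue", "s")]
labels = ["yes", "yes", "no"]
m = discernibility_matrix(table, labels)
```

Every attribute subset that intersects all entries of `m` is a reduct candidate, and different reducts supply the diverse feature subsets the ensemble needs.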
|
|
15:00-17:00, Paper TuPMP.134 | |
Semi-Supervised Hashing for Semi-Paired Cross-View Retrieval |
Yu, Jun | Jiangnan Univ |
Wu, Xiaojun | Jiangnan Univ |
Kittler, Josef | Univ. of Surrey |
Keywords: Multiview learning, Semi-supervised learning, Classification
Abstract: Recently, hashing techniques have gained importance in large-scale retrieval tasks because of their retrieval speed. Most of the existing cross-view frameworks assume that data are well paired. However, the fully-paired multiview situation is not universal in real applications. The aim of the method proposed in this paper is to learn the hashing function for semi-paired cross-view retrieval tasks. To utilize the label information of partial data, we propose a semi-supervised hashing learning framework which jointly performs feature extraction and classifier learning. The experimental results on two datasets show that our method outperforms several state-of-the-art methods in terms of retrieval accuracy.
|
|
15:00-17:00, Paper TuPMP.135 | |
DivGroup: A Diversified Approach to Divide Collection of Patterns into Uniform Groups |
Bandyopadhyay, Sambaran | IBM Res |
Nandanwar, Sharad | Indian Inst. of Science, Bangalore |
Deshmukh, Rishabh | Indian Inst. of Science |
Musti, Narasimha Murty | Indian Inst. of Science |
Keywords: Data mining, Clustering, Applications of pattern recognition and machine learning
Abstract: Similarity-based grouping of patterns has been explored profusely under the well-celebrated clustering paradigm in pattern recognition and machine learning. In clustering, objects in the same cluster are similar to each other and objects belonging to different clusters are dissimilar in a corresponding sense. However, it is not rare to come across situations where, instead of similarity-based grouping, forming groups of diverse objects is needed. Resource allocation across different parts of an organization, performing cross-validation splits of a dataset with class imbalance, and heterogeneous or mixed-ability partitioning of students are applications of grouping that require each group to contain a diverse set of patterns. Moreover, these applications also demand that different groups be similar to each other on average. In this work, we propose a generic framework for partitioning a collection of patterns into a set of groups such that the above two criteria are fulfilled. To the best of our knowledge, this is the first work to propose such a framework independent of any particular application. Toward this end, it turns out that finding an optimal solution to the problem we formulate is NP-hard, so we propose an approximate solution. We conduct experiments on both synthetic and real-world datasets to evaluate the performance of the proposed algorithm and show its merit by comparing the results with related state-of-the-art baseline methods.
|
|
15:00-17:00, Paper TuPMP.136 | |
Scalable Knn Search Approximation for Time Series Data |
Filali Boubrahimi, Soukaina | GEORGIA STATE Univ |
Ma, Ruizhe | Georgia State Univ |
Aydin, Berkay | Georgia State Univ |
Hamdi, Shah Muhammad | Georgia State Univ |
Angryk, Rafal | Georgia State Univ |
Keywords: Classification, Clustering, Data mining
Abstract: k Nearest Neighbor (kNN) is a widely used classifier in time series data analytics due to its interpretability. kNN is often referred to as a lazy learning algorithm because it neither learns a discriminative function nor generates rules from the training data. Instead, the kNN classifier requires a search over all the training data to classify a single test sample, which makes it computationally demanding and hard to adopt for real-world applications. These applications are sometimes time-critical, such as solar flare prediction, where flares might have irreversible impacts on Earth. Therefore, scaling the nearest-neighbor search to large datasets is crucial. In this paper, we propose a new scalable methodology that mitigates the high computational cost of kNN by approximating the nearest neighbor(s) with the help of clustering as a preprocessing step. We tested our idea on a comprehensive set of datasets with varying sizes and label counts. Our results show that the performance of our approximate technique is comparable to the exact kNN classifier with up to 10x speed-up.
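The preprocessing idea described in the abstract (cluster the training set once, then search only the cluster nearest to the query) can be sketched as follows. This is a generic illustration, not the authors' code; the plain k-means and function names are our own assumptions.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means used as the offline preprocessing step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

def approx_knn_classify(Xtr, ytr, centers, assign, x, k=3):
    """Approximate kNN: restrict the search to the cluster whose
    centroid is nearest to the query, then take a majority vote."""
    c = np.linalg.norm(centers - x, axis=1).argmin()
    idx = np.where(assign == c)[0]
    d = np.linalg.norm(Xtr[idx] - x, axis=1)
    nn = idx[d.argsort()[:min(k, len(idx))]]
    vals, cnt = np.unique(ytr[nn], return_counts=True)
    return vals[cnt.argmax()]
```

The search cost drops from O(n) distance computations per query to roughly O(k_clusters + n/k_clusters), at the price of occasionally missing a true neighbor that sits in an adjacent cluster.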
|
|
15:00-17:00, Paper TuPMP.137 | |
Pyramid Embedded Generative Adversarial Network for Automated Font Generation |
Sun, Donghui | Alibaba |
Zhang, Qing | Alibaba |
Yang, Jun | Alibaba |
Keywords: Applications of pattern recognition and machine learning, Deep learning, Transfer learning
Abstract: In this paper, we investigate the Chinese font synthesis problem and propose a Pyramid Embedded Generative Adversarial Network (PEGAN) to automatically generate Chinese character images. The PEGAN consists of one generator and one discriminator. The generator is built on an encoder-decoder structure with cascaded refinement connections and mirror skip connections. The cascaded refinement connections embed a multi-scale pyramid of the down-sampled original input into the encoder feature maps of different layers, while the mirror skip connections link multi-scale feature maps from the encoder to the corresponding feature maps in the decoder. By combining the generative adversarial loss, pixel-wise loss, category loss and perceptual loss, the generator and discriminator can be trained alternately to synthesize character images. To verify the effectiveness of the proposed PEGAN, we first build an evaluation set in which the characters are selected according to their stroke number and frequency of use, and then use both qualitative and quantitative metrics to measure the performance of our model compared with the baseline method. The experimental results demonstrate the effectiveness of the proposed model and show its potential to automatically extend small font banks into complete ones.
|
|
15:00-17:00, Paper TuPMP.138 | |
Density-Adaptive Kernel Based Re-Ranking for Person Re-Identification |
Guo, Ruo-Pei | Beijing Univ. of Posts and Telecommunications |
Li, Chun-Guang | Beijing Univ. of Posts and Telecommunications |
Li, Yonghua | School of Information and Communication Engineering |
Lin, Jiaru | Beijing Uni. of Posts and Telecommunications |
Keywords: Classification, Density estimation, Visual surveillance
Abstract: Person Re-Identification (ReID) refers to the task of verifying the identity of a pedestrian observed from non-overlapping surveillance camera views. Recently, it has been validated that re-ranking can bring extra performance improvements in person ReID. However, current re-ranking approaches either require feedback from users or suffer from burdensome computational cost. In this paper, we propose to exploit a density-adaptive kernel technique to perform efficient and effective re-ranking for person ReID. Specifically, we present two simple yet effective re-ranking methods, termed inverse Density-Adaptive Kernel based Re-ranking (inv-DAKR) and bidirectional Density-Adaptive Kernel based Re-ranking (bi-DAKR), which are based on a smooth kernel function with a density-adaptive parameter. Experiments on six benchmark datasets confirm that our proposals are effective and efficient.
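A generic sketch of what a density-adaptive kernel for re-ranking can look like: each gallery item gets its own bandwidth from the distance to its K-th neighbor, so scores adapt to local density. The abstract does not specify the inv-DAKR/bi-DAKR kernels, so the formula and parameter choices below are our own assumptions.

```python
import numpy as np

def density_adaptive_rerank(probe, gallery, knn=3):
    """Rank gallery items for one probe with a Gaussian kernel whose
    bandwidth sigma_i adapts to the local density around gallery item i
    (illustrative sketch only)."""
    G = np.asarray(gallery, float)
    p = np.asarray(probe, float)
    # pairwise gallery distances; local scale = distance to knn-th neighbour
    D = np.linalg.norm(G[:, None] - G[None], axis=2)
    sigma = np.sort(D, axis=1)[:, knn]          # column 0 is the self-distance
    d = np.linalg.norm(G - p, axis=1)
    score = np.exp(-(d ** 2) / (sigma ** 2 + 1e-12))
    return np.argsort(-score)                   # best match first
```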
|
|
15:00-17:00, Paper TuPMP.139 | |
Semantic Image Synthesis Via Conditional Cycle-Generative Adversarial Networks |
Liu, Xiyan | Inst. of Automation, Chinese Acad. of Sciences |
Meng, Gaofeng | Inst. of Automation, Chinese Acad. of Sciences |
Xiang, Shiming | Inst. Ofautomation, Chinese Acad. of Sciences |
Pan, Chunhong | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Image captioning, Applications of computer vision
Abstract: Traditional approaches for semantic image synthesis mainly focus on text descriptions while ignoring the related structures and attributes of the original images. As a result, some critical information, e.g., the style, backgrounds, object shapes and poses, is missing from the generated images. In this paper, we propose a novel framework called Conditional Cycle-Generative Adversarial Network (CCGAN) to address this issue. Our model can generate photo-realistic images conditioned on the given text descriptions while maintaining the attributes of the original images. The framework mainly consists of two coupled conditional adversarial networks, which are able to learn a desirable image mapping that keeps the structures and attributes of the images. We introduce a conditional cycle consistency loss to prevent contradiction between the two generators. This loss allows the generated images to retain most of the features of the original image, so as to improve the stability of network training. Moreover, benefiting from the mechanism of circular training, the proposed networks can learn the semantic information of the text much more accurately. Experiments on the Caltech-UCSD Birds dataset and the Oxford-102 flower dataset demonstrate that the proposed method significantly outperforms existing methods in terms of image detail reconstruction and semantic information expression.
|
|
15:00-17:00, Paper TuPMP.140 | |
Generalized Fisher Discriminant Analysis As a Dimensionality Reduction Technique |
Jiang, Yuechi | The Hong Kong Pol. Univ |
Leung, Frank Hung Fat | The Hong Kong Pol. Univ |
Keywords: Dimensionality reduction, Audio and acoustic processing and analysis
Abstract: Fisher Discriminant Analysis (FDA) has been widely used as a dimensionality reduction technique. Its applications range from face recognition to speaker recognition. In the past two decades, there have been many variations on the formulation of FDA. Different variations adopt different ways to combine the between-class scatter matrix and the within-class scatter matrix, which are the two basic components of FDA. In this paper, we propose the Generalized Fisher Discriminant Analysis (GFDA), which provides a general formulation for FDA. GFDA generalizes the standard FDA as well as many of its variants, such as Regularized Linear Discriminant Analysis (R-LDA), Regularized Kernel Discriminant Analysis (R-KDA), Inverse Fisher Discriminant Analysis (IFDA), and Regularized Fisher Discriminant Analysis (RFDA). GFDA can also degenerate to Principal Component Analysis (PCA). Four special types of GFDA are then applied as dimensionality reduction techniques for speaker recognition, in order to investigate the performance of different variants of FDA. GFDA provides a convenient way to compare different variants of FDA by simply changing some parameters, making it easier to explore the roles that the between-class and within-class scatter matrices play.
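As a concrete reference point for one of the variants GFDA unifies, here is a minimal R-LDA-style Fisher projection: build the between-class and within-class scatter matrices and solve the regularized eigenproblem. The GFDA parameterization itself is not reproduced; this only sketches the shared machinery.

```python
import numpy as np

def fda_projection(X, y, dim=1, reg=1e-3):
    """Fisher Discriminant Analysis with a regularised within-class
    scatter (R-LDA style): top eigenvectors of (Sw + reg*I)^-1 Sb."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))   # between-class scatter
    Sw = np.zeros((d, d))   # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
        Sw += (Xc - mc).T @ (Xc - mc)
    # regularisation keeps Sw invertible when samples are scarce
    M = np.linalg.solve(Sw + reg * np.eye(d), Sb)
    w, V = np.linalg.eig(M)
    order = np.argsort(-w.real)
    return V[:, order[:dim]].real
```

Swapping the regularization and the way Sb and Sw are combined yields the other variants the abstract lists, which is exactly the knob GFDA exposes.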
|
|
15:00-17:00, Paper TuPMP.141 | |
Class2Str: End to End Latent Hierarchy Learning |
Saha, Soham | International Inst. of Information Tech. Hyderabad |
Varma, Girish | IIIT Hyderabad |
Jawahar, C. V. | IIIT |
Keywords: Deep learning, Neural networks, Structured prediction
Abstract: Deep neural networks for image classification typically consist of a convolutional feature extractor followed by a fully connected classifier network. The predicted and ground truth labels are represented as one-hot vectors. Such a representation assumes that all classes are equally dissimilar. However, classes have visual similarities and often form a hierarchy. Learning this latent hierarchy explicitly in the architecture could provide invaluable insights. We propose an alternative architecture to the classifier network, called the Latent Hierarchy (LH) Classifier, and an end-to-end learned Class2Str mapping which discovers a latent hierarchy of the classes. We show that for some of the best performing architectures on the CIFAR and ImageNet datasets, replacing the classifier with the LH classifier and retraining recovers the accuracy with a fraction of the number of parameters in the classifier part. Compared to the previous work HD-CNN, which also learns a two-level hierarchy, we are able to learn a hierarchy with an arbitrary number of levels and obtain an accuracy improvement on the ImageNet classification task. We also verify that many visually similar classes are grouped together under the learnt hierarchy.
|
|
15:00-17:00, Paper TuPMP.142 | |
Adaptive Tiling: Applying Fixed-Size Systolic Arrays to Sparse Convolutional Neural Networks |
Kung, H. T. | Harvard Univ |
McDanel, Bradley | Harvard Univ |
Zhang, Sai Qian | Harvard Univ |
Keywords: Deep learning
Abstract: We introduce adaptive tiling, a method of partitioning layers in a sparse convolutional neural network (CNN) into blocks of filters and channels, called tiles, each implementable with a fixed-size systolic array. By allowing a tile to adapt its size so that it can cover a large sparse area, we minimize the total number of tiles, or equivalently, the number of systolic array calls required to perform CNN inference. The proposed scheme resolves a challenge of applying systolic array architectures, traditionally designed for dense matrices, to sparse CNNs. To validate the approach, we construct a highly sparse Lasso-Mobile network by pruning MobileNet trained with an l1 regularization penalty, and demonstrate that adaptive tiling can lead to a 2-3x reduction in systolic array calls, on Lasso-Mobile, for several benchmark datasets.
|
|
TuPMOT1 |
Ballroom C, 1st Floor |
TuPMOT1 Structural Pattern Recognition (Ballroom C, 1st Floor) |
Oral Session |
|
17:00-17:20, Paper TuPMOT1.1 | |
A Game-Theoretic Hyper-Graph Matching Algorithm |
Hou, Jian | Bohai Univ |
Pelillo, Marcello | Ca' Foscari Univ |
Keywords: Graph matching, Clustering
Abstract: Feature matching aims to establish correspondences between the features of two sets. Aside from the well-known graph matching, hyper-graph matching is receiving increasing interest due to its ability to encode more invariance information. Existing hyper-graph matching algorithms are usually based on maximizing the matching score between correspondences. In this paper we treat the candidate matches as pure strategies and formulate the hyper-graph matching problem as a non-cooperative multi-player clustering game. Specifically, we use the higher-order similarity as the payoff of players selecting the corresponding triplet of pure strategies, and find that the subset of consistent matches can be extracted by optimizing a polynomial function with higher-order replicator dynamics over the standard simplex. Using the Baum-Eagon inequality, we arrive at the equilibrium of the game and obtain a subset of consistent matches as the final matching result. Our approach is especially useful in dealing with the case where some features in the model image have no correspondences in the test image. In addition, with our approach each match is assigned a weight which reflects its relationship with other matches and can be used to enforce the one-to-one constraint. Experiments on both synthetic datasets and real images demonstrate the effectiveness of our approach.
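The selection mechanism behind replicator dynamics can be illustrated with the first-order (pairwise-payoff) version: mass on mutually compatible matches grows, incompatible matches die out. The paper actually uses a higher-order payoff over triplets of candidate matches, so this is only a structural sketch.

```python
import numpy as np

def replicator_dynamics(A, iters=200, tol=1e-9):
    """First-order replicator dynamics over the standard simplex.
    A is a nonnegative payoff matrix between candidate matches
    (zero diagonal); the returned vector concentrates on a subset
    of mutually compatible matches."""
    x = np.full(len(A), 1.0 / len(A))
    for _ in range(iters):
        Ax = A @ x
        fitness = x @ Ax
        if fitness == 0:
            break
        xn = x * Ax / fitness      # multiplicative update
        if np.abs(xn - x).sum() < tol:
            return xn
        x = xn
    return x
```

Matches whose support survives (large x_i) form the consistent subset; their weights can then be used, as the abstract notes, to enforce one-to-one constraints.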
|
|
17:20-17:40, Paper TuPMOT1.2 | |
Kernel-Weighted Graph Convolutional Network: A Deep Learning Approach for Traffic Forecasting |
Zhang, Qi | Inst. of Automation, Chinese Acad. of Sciences |
Jin, QiZhao | Inst. of Automation, Chinese Acad. of Sciences |
Chang, Jianlong | Inst. of Automation Chinese Acad. of Sciences |
Xiang, Shiming | Inst. Ofautomation, Chinese Acad. of Sciences |
Pan, Chunhong | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Deep learning, Applications of pattern recognition and machine learning, Data mining
Abstract: Traffic forecasting is of great significance and has many applications in Intelligent Traffic Systems (ITS). In spite of many thoughtful attempts in the past decades, this task still remains far from solved, due to the diversity, complexity and nonlinearity of traffic situations. Technically, it can be cast in the framework of regression on spatial-temporal data. Typically, one may consider employing a Convolutional Neural Network (CNN) to achieve this goal. Unfortunately, the traditional CNN is developed for grid data. By contrast, here we are faced with non-grid traffic data points observed spatially at locations of interest. To this end, this paper proposes a novel Kernel-Weighted Graph Convolutional Network (KW-GCN) for traffic forecasting, which simultaneously learns a group of convolutional kernels and their linear combination weights for each of the nodes in the graph. This yields a mechanism that is able to learn features locally and exploit the structure of the traffic road network globally. By introducing additional parameters, our KW-GCN relaxes the weight-sharing restriction of the classical CNN to better handle non-stationary traffic data. Furthermore, we show that the proposed linear weighting of kernels can be viewed as a low-rank decomposition of the well-known locally-connected networks, and thus it avoids over-fitting to some degree. We apply our approach to a real-world GPS dataset of about 30,000 taxis over seven months in Beijing. Experiments on both taxi-flow forecasting and road-speed forecasting demonstrate that our method significantly outperforms state-of-the-art ones.
|
|
17:40-18:00, Paper TuPMOT1.3 | |
Depth-Based Subgraph Convolutional Neural Networks |
Zhang, Zhihong | Xiamen Univ |
Zhou, Da | Xiamen Univ |
Xu, Chuanyu | Xiamen Univ |
Wang, Beizhan | School of Software in Xiamen Univ |
Wang, Dong | Xiamen Univ |
Ren, Guijun | Opera Solutions |
Hancock, Edwin | Univ. of York |
Bai, Lu | Central Univ. of Finance and Ec |
Cui, Lixin | School of Information, Central Univ. of Finance and Ec |
Keywords: Deep learning, Neural networks, Classification
Abstract: This paper proposes a new graph convolutional neural architecture based on a depth-based representation of graph structure, called the depth-based subgraph convolutional neural network (DS-CNN), which integrates both the global topological and local connectivity structures within a graph. Our idea is to decompose a graph into a family of K-layer expansion subgraphs rooted at each vertex, and then design a set of convolution filters over these subgraphs to capture local connectivity structural information. Specifically, we commence by establishing a family of K-layer expansion subgraphs for each vertex of the graph through graph-to-tree mapping procedures, which provide the global topological arrangement information contained within a graph. We then design a set of fixed-size convolution filters and integrate them with these subgraphs (depicted in Figure 1). The idea is to slide convolution filters over the entire subgraphs of a vertex to extract local features, analogous to the standard convolution operation on grid data. In particular, the convolution operation captures the local structural information within the graph and has the weight-sharing property among different positions of a subgraph; the pooling operation acts directly on the output of the preceding layer without any preprocessing scheme (e.g., clustering or other techniques). Experiments on three graph-structured datasets demonstrate that our DS-CNN model is able to outperform six state-of-the-art methods at the task of node classification.
|
|
18:00-18:20, Paper TuPMOT1.4 | |
A Deep Hybrid Graph Kernel through Deep Learning Networks |
Cui, Lixin | School of Information, Central Univ. of Finance and Ec |
Bai, Lu | Central Univ. of Finance and Ec |
Rossi, Luca | Aston Univ |
Wang, Yue | Central Univ. of Finance and Ec |
Yuhang, Jiao | Central Univ. of Finance and Ec |
Hancock, Edwin | Univ. of York |
Keywords: Support vector machine and kernel methods
Abstract: In this paper, we develop a new deep hybrid graph kernel based on the depth-based matching kernel and the Weisfeiler-Lehman subtree kernel, by jointly computing a basic deep kernel that captures the relationship between the combined kernels through deep learning networks. Specifically, for a set of graphs under investigation, we commence by computing two kernel matrices, one for each of the combined kernels. With the two kernel matrices in hand, for each graph we use the kernel values between the graph and each of the training graphs as the graph characterisation vector. This vector can be seen as a kernel-based similarity embedding of the graph. We use the embedding vectors of all graphs to train a deep autoencoder network, which is optimized using Stochastic Gradient Descent together with a Deep Belief Network for pretraining. The deep representation computed through the deep learning network captures the main relationship between the depth-based matching kernel and the Weisfeiler-Lehman subtree kernel. The resulting deep hybrid graph kernel is computed by summing the original kernels together with the dot-product kernel between their deep representations. We show that the deep hybrid graph kernel not only captures the joint information between the associated depth-based matching and Weisfeiler-Lehman subtree kernels, but also reflects the information content over all graphs under investigation. Experimental evaluations demonstrate the effectiveness of the proposed kernel.
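The final combination step can be written down directly: the hybrid kernel is the sum of the two base kernel matrices plus the dot-product kernel of the deep representations. Here `Z` stands in for the autoencoder codes, which are not reproduced; the embedding construction follows the abstract's description (each graph is characterised by its row of kernel values).

```python
import numpy as np

def kernel_embedding(K_train):
    """Each graph's characterisation vector is its kernel values against
    the training graphs, i.e. a row of the kernel matrix."""
    return np.asarray(K_train, float)

def deep_hybrid_kernel(K1, K2, Z):
    """Hybrid kernel = K1 + K2 + Z Z^T, where Z holds one deep
    representation per graph (stand-in for the autoencoder output)."""
    Z = np.asarray(Z, float)
    return np.asarray(K1, float) + np.asarray(K2, float) + Z @ Z.T
```

Since each summand is positive semi-definite when K1 and K2 are valid kernels, the sum is again a valid kernel and can be fed directly to an SVM.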
|
|
TuPMOT2 |
309B, 3rd Floor |
TuPMOT2 3D Vision (309B, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper TuPMOT2.1 | |
Variational Fusion of Light Field and Photometric Stereo for Precise 3D Sensing within a Multi-Line Scan Framework |
Antensteiner, Doris | Austrian Inst. of Tech |
Štolc, Svorad | AIT Austrian Inst. of Tech. GmbH |
Pock, Thomas | Graz Univ. of Tech |
Keywords: 3D reconstruction, 3D vision, Image processing and analysis
Abstract: Recent work has shown that depth reconstruction can be improved by combining depth and surface normal information. In this paper, we build on these findings and introduce novel variational methods for refined depth reconstruction in a multi-line scanner using light field and photometric stereo data. In this specific setup, the object moves on a conveyor belt in a defined direction under the camera, which simultaneously captures light field and photometric stereo data as the object is transported. We perform our experiments on virtual and real-world data and achieve significantly improved results over state-of-the-art methods in both depth and surface normal accuracy.
|
|
17:20-17:40, Paper TuPMOT2.2 | |
Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation |
Kimura, Ryosuke | Kagoshima Univ |
Sayo, Akihiko | Kyushu Univ |
Dayrit, Fabian Lorenzo | Nara Inst. of Science and Tech |
Nakashima, Yuta | Osaka Univ |
Kawasaki, Hiroshi | Kyushu Univ |
Blanco, Ambrosio | Microsoft Res. Asia |
Ikeuchi, Katsushi | The Univ. of Tokyo |
Keywords: Motion and tracking, Regression, Shape modeling and encoding
Abstract: Reconstruction of the shape and motion of humans from RGB-D is a challenging problem that has received much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, built upon accurate full-body scans of people in skin-tight clothes, to complete the parts invisible due to occlusion. Such a statistical model may still be fit to an RGB-D measurement with loose clothes, but it cannot describe the resulting deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower-dimensional embeddings of texture and deformation, referred to as eigen-texture and eigen-deformation, to reproduce views of even unobserved surfaces. Given a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are embedded using eigen-decomposition. Combined with neural network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how it works on real data.
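At its core, an eigen-texture/eigen-deformation embedding is an eigen-decomposition (PCA) of per-frame observations stacked as rows; a minimal sketch under that reading (the per-triangle details and regression network are not reproduced):

```python
import numpy as np

def eigen_embed(T, dim):
    """PCA-style embedding: T has one observation (e.g. a flattened
    texture or deformation) per row; keep the top `dim` components."""
    mean = T.mean(axis=0)
    # SVD of the centred data gives the principal directions in Vt
    U, s, Vt = np.linalg.svd(T - mean, full_matrices=False)
    basis = Vt[:dim]
    coeffs = (T - mean) @ basis.T
    return mean, basis, coeffs

def eigen_reconstruct(mean, basis, coeffs):
    """Rebuild observations from low-dimensional coefficients."""
    return mean + coeffs @ basis
```

Regressing `coeffs` from viewpoint parameters, as the abstract describes with a neural network, then lets one synthesize texture and deformation for unobserved views.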
|
|
17:40-18:00, Paper TuPMOT2.3 | |
Non-Rigid Reconstruction with a Single Moving RGB-D Camera |
Elanattil, Shafeeq | Queensland Univ. of Tech |
Moghadam, Peyman | CSIRO Data61 |
Sridha, Sridharan | Queensland Univ. of Tech |
Fookes, Clinton | Queensland Univ. of Tech |
Cox, Mark | CSIRO |
Keywords: 3D reconstruction, Motion and tracking, Scene understanding
Abstract: We present a novel non-rigid reconstruction method using a moving RGB-D camera. Current approaches use only the non-rigid part of the scene and completely ignore the rigid background. Non-rigid parts often lack sufficient geometric and photometric information for tracking large frame-to-frame motion. Our approach uses the camera pose estimated from the rigid background for foreground tracking. This enables robust foreground tracking in situations where large frame-to-frame motion occurs. Moreover, we propose a multi-scale deformation graph which improves non-rigid tracking without compromising the quality of the reconstruction. We also contribute a synthetic dataset, made publicly available for evaluating non-rigid reconstruction methods, which provides frame-by-frame ground truth geometry of the scene, the camera trajectory, and masks for background and foreground. Experimental results show that our approach is more robust in handling larger frame-to-frame motions and provides better reconstruction compared to state-of-the-art approaches.
|
|
18:00-18:20, Paper TuPMOT2.4 | |
3D Shape Segmentation Based on Viewpoint Entropy and Projective Fully Convolutional Networks Fusing Multi-View Features |
Shui, Panpan | Nanjing Univ |
Wang, Pengyu | Nanjing Univ |
Yu, Fenggen | Nanjing Univ |
Hu, Bingyang | Nanjing Univ |
Gan, Yuan | Nanjing Univ |
Liu, Kun | Nanjing Univ |
Zhang, Yan | Nanjing Univ |
Keywords: 3D vision, Learning-based vision, Classification
Abstract: This paper introduces an architecture for segmenting 3D shapes into labeled semantic parts. Our architecture combines a viewpoint selection method based on viewpoint entropy, multi-view image-based Fully Convolutional Networks (FCNs), and graph cuts optimization to yield coherent segmentation of 3D shapes. First, we iteratively select a fixed number of perspectives with the maximum viewpoint entropy from the existing viewpoints that cover the shape's triangles, automatically adjusting the distance between the viewpoint and the shape's center so that the projected shape fills the render window as completely as possible. Second, the image-based FCN is used for efficient view-based reasoning about 3D shape parts; in this process, global features generated by max view pooling are concatenated with each single view's features in the fully connected layer before upsampling. The multi-view FCN then outputs per-part confidence maps, which are fed into a projection layer that stores the mapping between every triangle of the shape and its projected pixel positions in the images rendered from the selected perspectives. The FCN outputs are projected back onto the 3D shape surfaces, and max view pooling is applied to the output of the projection layer so that every triangle of each shape has a unique probability for each label. Finally, the graph cuts algorithm is applied to obtain the final segmentation result.
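Viewpoint entropy, the selection criterion used in the first step, is commonly defined over the projected areas of the visible triangles: a view that sees the surface evenly scores higher. A minimal sketch assuming this common formulation (the paper's exact variant may differ):

```python
import numpy as np

def viewpoint_entropy(projected_areas):
    """Entropy of the distribution of projected triangle areas for one
    viewpoint; higher means the surface is seen more evenly."""
    a = np.asarray(projected_areas, float)
    a = a[a > 0]                 # invisible triangles contribute nothing
    p = a / a.sum()
    return float(-(p * np.log2(p)).sum())
```

Greedy selection, as in the abstract, would repeatedly pick the remaining viewpoint whose projected-area distribution maximizes this entropy until the shape's triangles are covered.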
|
|
TuPMOT3 |
311A, 3rd Floor |
TuPMOT3 Biometric Analysis and Synthesis (311A, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper TuPMOT3.1 | |
Robust ECG Biometrics Using Two-Stage Model |
Wu, Bo | Shandong Univ |
Yang, Gongping | Shandong Univ |
Yang, Lu | Shandong Univ. of Finance and Ec |
Yin, Yilong | Shandong Univ |
Keywords: Other biometrics
Abstract: ECG biometrics has achieved great success on high-quality ECG signals. However, applying ECG biometrics on mobile devices remains challenging due to the low quality of the signals. In this paper, we propose a robust two-stage model. In the first stage, we use a 1D CNN model to remove invalid heartbeats from the ECG recording, and then combine the raw signal with the hidden features of the 1D CNN as the feature representation of each heartbeat. In the second stage, we group a certain number of heartbeat representations into an input sequence. An attention-based bidirectional LSTM is used to aggregate the input sequence and generate discriminative identity features for recognition. We evaluate our method on two public datasets, and the results show that our two-stage model achieves state-of-the-art performance compared with other existing methods.
|
|
17:20-17:40, Paper TuPMOT3.2 | |
A Noise-Robust Self-Adaptive Multitarget Speaker Detection System |
Zheng, Siqi | Ping-An Tech |
Wang, Jianzong | Ping-An Tech |
Xiao, Jing | Ping-An Tech |
Hsu, Wei-Ning | Massachusetts Inst. of Tech |
Glass, James | Massachusetts Inst. of Tech |
Keywords: Speaker recognition
Abstract: We describe a multitarget speaker detection system that provides a robust way to classify the utterance of a speaker in noisy environments. The multitarget detection problem is known to be much more difficult than single-target speaker verification, especially when the target set is large and the data is corrupted by noise. In this work we aim to improve the performance of our multitarget speaker detection system in real-world settings, where complicated background noise and unpredictable speaker behavior are present. We make three major improvements that contribute to this goal. First, we introduce an effective noise-filtering method using a GMM-based voice activity detector followed by unsupervised bottom-up clustering. Second, we incorporate a Highway-LSTM network to estimate posterior distributions of senones, replacing the traditional GMM-UBM with senone posteriors. Finally, we apply a self-adaptive approach on the classifier back-end so that our PLDA parameters and S-normalization subsets can be updated online.
|
|
17:40-18:00, Paper TuPMOT3.3 | |
Global and Local Consistent Age Generative Adversarial Networks |
Li, Peipei | Inst. of Automation Chinese Acad. of Sciences |
Hu, Yibo | Inst. of Automation Chinese Acad. of Sciences |
Li, Qi | Inst. of Automation, Chinese Acad. of Sciences |
He, Ran | Inst. of Automation, Chinese Acad. of Sciences |
Sun, Zhenan | Inst. of Automation, Chinese Acad. of Sciences |
Keywords: Face recognition, Biometric systems and applications, Other biometrics
Abstract: Age progression/regression is a challenging task due to the complicated and non-linear transformations of the human aging process. Many studies have shown that both global and local facial features are essential for face representation, but previous GAN-based methods mainly focused on global features in age synthesis. To utilize both global and local facial information, we propose a Global and Local Consistent Age Generative Adversarial Network (GLCA-GAN). In our generator, a global network learns the whole facial structure and simulates the aging trend of the whole face, while three crucial facial patches are progressed or regressed by three local networks that imitate subtle changes of crucial facial subregions. To preserve most of the details in age-attribute-irrelevant areas, our generator learns the residual face. Moreover, we employ an identity preserving loss to better preserve identity information, as well as an age preserving loss to enhance the accuracy of age synthesis. A pixel loss is also adopted to preserve detailed facial information of the input face. Our proposed method is evaluated on three face aging datasets, i.e., the CACD, Morph and FG-NET datasets. Experimental results show the appealing performance of the proposed method in comparison with the state-of-the-art.
|
|
18:00-18:20, Paper TuPMOT3.4 | |
GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks |
Sindagi, Vishwanath | Rutgers Univ |
Patel, Vishal | Rutgers, the State Univ. of New Jersey |
Di, Xing | Rutgers, the State Univ. of New Jersey |
Keywords: Face recognition, Biometric systems and applications
Abstract: Facial landmarks constitute the most compressed representation of faces and are known to preserve information such as pose, gender and facial structure present in the faces. Several works exist that attempt to perform high-level face-related analysis tasks based on landmarks alone without the aid of face images. In contrast, in this work, an attempt is made to tackle the inverse problem of synthesizing faces from their respective landmarks. The primary aim of this work is to demonstrate that information preserved by landmarks (gender in particular) can be further accentuated by leveraging generative models to synthesize corresponding faces. Though the problem is particularly challenging due to its ill-posed nature, we believe that successful synthesis will enable several applications such as boosting performance of high-level face related tasks using landmark points and performing dataset augmentation. To this end, a novel face-synthesis method known as Gender Preserving Generative Adversarial Network (GP-GAN) that is guided by adversarial loss, perceptual loss and a gender preserving loss is presented. Further, we propose a novel generator sub-network UDeNet for GP-GAN that leverages advantages of U-Net and DenseNet architectures. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed method. Our code is available at: https://github.com/DetionDX/GP-GAN-Gender-Preserving-GAN-for-Synthesizing-Faces-from-Landmarks
|
|
TuPMOT4 |
311B, 3rd Floor |
TuPMOT4 Document Image Analysis (311B, 3rd Floor) |
Oral Session |
|
17:00-17:20, Paper TuPMOT4.1 | |
How Do Convolutional Neural Networks Learn Design? |
Jolly, Shailza | Univ. of Kaiserslautern |
Iwana, Brian Kenji | Kyushu Univ |
Kuroki, Ryohei | Kyushu Univ |
Uchida, Seiichi | Kyushu Univ |
Keywords: Applications of deep learning to document analysis, Document understanding, Document image processing
Abstract: In this paper, we aim to understand the design principles of book cover images, which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres, conveying important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand the visual cues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) to the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.
|
|
17:20-17:40, Paper TuPMOT4.2 | |
Document Images Watermarking for Security Issue Using Fully Convolutional Networks |
Cu, Vinh Loc | La Rochelle Univ |
Burie, Jean-Christophe | Univ. of La Rochelle |
Ogier, Jean-Marc | Univ. De La Rochelle |
Keywords: Applications of document analysis, Computational document forensics, Document analysis systems
Abstract: In the literature, watermarking schemes for document images in the spatial domain mainly focus on text content, so they need to be extended before they can be applied to general content. In this paper, we propose a blind, invisible watermarking approach for securing general grayscale documents. To detect stable regions for hiding secret data, we make full use of fully convolutional networks (FCN). The FCN designed for document structure segmentation is adapted to the problem of watermarking-region detection, wherein various types of segmented content regions share the same label. The segmented content regions then serve as watermarking regions. Next, a watermarking pattern is constructed to detect potential positions where the watermarking process is carried out. Lastly, the watermarking algorithm is developed by dividing the gray-level values of each watermarking pattern into two groups, which together carry one watermark bit. Experiments are performed on various document contents, and our approach obtains high performance in terms of imperceptibility, capacity and robustness against distortions caused by JPEG compression, geometric transformation and the print-and-scan process.
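The final step, carrying one watermark bit by splitting a region's gray levels into two groups, can be sketched as follows. The interleaved grouping and the shift `delta` are assumptions for illustration, not the paper's exact scheme:

```python
def embed_bit(pixels, bit, delta=2):
    """Split pixels into two interleaved groups and shift them apart so
    that mean(A) >= mean(B) encodes bit 1 and mean(A) < mean(B) encodes 0.
    Values are clamped to the valid 8-bit gray-level range."""
    shift = delta if bit == 1 else -delta
    return [max(0, min(255, p + (shift if i % 2 == 0 else -shift)))
            for i, p in enumerate(pixels)]

def extract_bit(pixels):
    """Blind extraction: compare the two group means, no original needed."""
    a = [p for i, p in enumerate(pixels) if i % 2 == 0]
    b = [p for i, p in enumerate(pixels) if i % 2 == 1]
    return 1 if sum(a) / len(a) >= sum(b) / len(b) else 0
```

Because extraction only compares group statistics, the scheme stays blind (no cover image required) and tolerates small uniform distortions.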
|
|
17:40-18:00, Paper TuPMOT4.3 | |
Aligning Text and Document Illustrations: Towards Visually Explainable Digital Humanities |
Baraldi, Lorenzo | Univ. of Modena and Reggio Emilia |
Cornia, Marcella | Univ. of Modena and Reggio Emilia |
Grana, Costantino | Univ. Degli Studi Di Modena E Reggio Emilia |
Cucchiara, Rita | Univ. Degli Studi Di Modena E Reggio Emilia |
Keywords: Applications of deep learning to document analysis, Historical document analysis, Vision and language
Abstract: While several approaches to bringing vision and language together are emerging, none of them has yet addressed the digital humanities domain, which is nevertheless a rich source of visual and textual data. To foster research in this direction, we investigate the learning of visual-semantic embeddings for historical document illustrations, devising both supervised and semi-supervised approaches. We exploit the joint visual-semantic embeddings to automatically align illustrations with textual elements, thus providing an automatic annotation of the visual content of a manuscript. Experiments are performed on the Borso d'Este Holy Bible, one of the most sophisticated illuminated manuscripts of the Renaissance, which we manually annotated by aligning every illustration with textual commentaries written by experts. Experimental results quantify the domain shift between ordinary visual-semantic datasets and the proposed one, validate the proposed strategies, and suggest future work along the same lines.
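Visual-semantic embeddings of this kind are commonly trained with a max-margin ranking loss that pulls matching illustration-text pairs together and pushes non-matching pairs apart. A minimal sketch under that assumption (the margin value is illustrative; whether the paper uses exactly this loss is not stated in the abstract):

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def ranking_loss(img_emb, pos_txt_emb, neg_txt_emb, margin=0.2):
    """Hinge loss: the matching illustration-text pair must score higher
    than a non-matching pair by at least `margin` in cosine similarity."""
    return max(0.0, margin
               - cosine(img_emb, pos_txt_emb)
               + cosine(img_emb, neg_txt_emb))
```

Once trained, alignment reduces to a nearest-neighbor search in the shared space: each illustration is annotated with the textual element whose embedding is most similar.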
|
|
18:00-18:20, Paper TuPMOT4.4 | |
Staff Line Removal Using Generative Adversarial Networks |
Konwer, Aishik | Inst. of Engineering & Management |
Bhunia, Ayan Kumar | Inst. of Engineering and Management, Kolkata |
Bhowmick, Abir | Inst. of Engineering & Management |
Bhunia, Ankan Kumar | Jadavpur Univ |
Banerjee, Prithaj | Inst. of Engineering & Management |
Roy, Partha Pratim | IIT |
Pal, Umapada | Indian Statistical Inst |
Keywords: Graphics recognition, Document image processing, Applications of deep learning to document analysis
Abstract: Staff line removal is a crucial pre-processing step in Optical Music Recognition. In this paper, we propose a novel approach for staff line removal based on Generative Adversarial Networks. We convert staff line images into patches and feed them into a U-Net used as the Generator, which is intended to produce staff-less images at its output. The Discriminator then performs binary classification, differentiating between the generated fake staff-less image and the real ground-truth staff-less image. For training, we use a loss function that is a weighted combination of L2 loss and adversarial loss: the L2 loss minimizes the difference between the real and fake staff-less images, while the adversarial loss helps to recover higher-quality textures in the generated images. Our architecture thus favors solutions closer to the ground truth, which is reflected in our results. For evaluation, we use the ICDAR/GREC 2013 staff removal database. Our method achieves superior performance in comparison to other conventional approaches on the same dataset.
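The training objective described above, a weighted combination of L2 and adversarial loss for the generator, can be sketched as follows. The weights and the non-saturating form of the adversarial term are illustrative assumptions, not the paper's exact values:

```python
import math

def staff_removal_gen_loss(fake, real, disc_prob_fake,
                           w_l2=100.0, w_adv=1.0):
    """Generator objective: mean squared error between the generated
    staff-less patch and its ground truth, plus an adversarial term
    rewarding outputs the discriminator scores as real.
    `disc_prob_fake` is the discriminator's probability that the
    generated patch is real; weights are illustrative."""
    l2 = sum((f - r) ** 2 for f, r in zip(fake, real)) / len(fake)
    adv = -math.log(max(disc_prob_fake, 1e-12))
    return w_l2 * l2 + w_adv * adv
```

A large L2 weight keeps the output pinned to the ground truth, while the adversarial term sharpens textures that a pure L2 objective would blur.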
|
| |