-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathabstracts.txt
More file actions
41 lines (18 loc) · 18.5 KB
/
abstracts.txt
File metadata and controls
41 lines (18 loc) · 18.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
While widely acknowledged as highly effective in computer vision, multi-label MRFs with non-convex priors are difficult to optimize. To tackle this, we introduce an algorithm that iteratively approximates the original energy with an appropriately weighted surrogate energy that is easier to minimize. Our algorithm guarantees that the original energy decreases at each iteration. In particular, we consider the scenario where the global minimizer of the weighted surrogate energy can be obtained by a multi-label graph cut algorithm, and show that our algorithm then lets us handle of large variety of non-convex priors. We demonstrate the benefits of our method over state-of-the-art MRF energy minimization techniques on stereo and inpainting problems.
Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable X_i is represented ℓ by nodes (where ℓ is the number of labels) arranged in a column. However, this method in general requires 2ℓ^2 edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. In this paper, we introduce a variant of the max-flow algorithm that requires much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer.
The fully connected conditional random field (CRF) with Gaussian pairwise potentials has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a linear programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization algorithm for dense CRFs. To this end, we develop a proximal minimization framework, where the dual of each proximal problem is optimized via block coordinate descent. We show that each block of variables can be efficiently optimized. Specifically, for one block, the problem decomposes into significantly smaller subproblems, each of which is defined over a single pixel. For the other block, the problem is optimized via conditional gradient descent. This has two advantages: 1) the conditional gradient can be computed in a time linear in the number of pixels and labels; and 2) the optimal step size can be computed analytically. Our experiments on standard datasets provide compelling evidence that our approach outperforms all existing baselines including the previous LP based approach for dense CRFs.
Dense conditional random fields (CRFs) have become a popular framework for modelling several problems in computer vision such as stereo correspondence and multi-class semantic segmentation. By modelling long-range interactions, dense CRFs provide a labelling that captures finer detail than their sparse counterparts. Currently, the state-of-the-art algorithm performs mean-field inference using a filter-based method but fails to provide a strong theoretical guarantee on the quality of the solution.
A question naturally arises as to whether it is possible to obtain a maximum a posteriori (MAP) estimate of a dense CRF using a principled method. Within this paper, we show that this is indeed possible. We will show that, by using a filter-based method, continuous relaxations of the MAP problem can be optimised efficiently using state-of-the-art algorithms. Specifically, we will solve a quadratic programming (QP) relaxation using the Frank-Wolfe algorithm and a linear programming (LP) relaxation by developing a proximal minimisation framework. By exploiting labelling consistency in the higher-order potentials and utilising the filter-based method, we are able to formulate the above algorithms such that each iteration has a complexity linear in the number of classes and random variables. The presented algorithms can be applied to any labelling problem using a dense CRF with sparse higher-order potentials. In this paper, we use semantic segmentation as an example application as it demonstrates the ability of the algorithm to scale to dense CRFs with large dimensions. We perform experiments on the Pascal dataset to indicate that the presented algorithms are able to attain lower energies than the mean-field inference method.
We study incremental learning for the classification task, a key component for life-long learning systems. For an incremental learning algorithm, the main challenges are to update the classifier whilst preserving previous knowledge. In addition to forgetting, a well-known issue while preserving knowledge, we observe that incremental learning algorithms also suffer from a crucial problem of intransigence, its inability to update knowledge. First, we introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of an incremental learning algorithm. Second, we present a generalization of EWC and Path Integral, with a theoretically grounded KL-divergence based perspective. We thoroughly analyse and compare the behaviour of different incremental learning algorithms on MNIST and CIFAR-100 datasets. We obtain superior results for our method in terms of accuracy, and provide better trade-off for forgetting and intransigence.
Pruning large neural networks while maintaining the performance is often highly desirable due to the reduced space and time complexity. In existing methods, pruning is incorporated within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization. Specifically, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task even before training. This eliminates the need for both pretraining as well as the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on image classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
Compressing large neural networks by quantizing the parameters, while maintaining the performance is often highly desirable due to the reduced memory and time complexity. In this work, we formulate neural network quantization as a discrete labelling problem and design an efficient approximate algorithm based on the popular mean-field method. To this end, we devise a projected stochastic gradient descent algorithm and show that it is, in fact, equivalent to a proximal version of the mean-field method. Thus, we provide an MRF optimization perspective to neural network quantization, which would enable research on modelling higher-order interactions among the network parameters to design better quantization schemes. Our experiments on standard image classification datasets with convolutional and residual architectures evidence that our algorithm obtains fully-quantized networks with accuracies very close to the floating-point reference networks.
We consider move-making algorithms for energy minimization of multi-label Markov Random Fields (MRFs). Since this is not a tractable problem in general, a commonly used heuristic is to minimize over subsets of labels and variables in an iterative procedure. Such methods include α-expansion, αβ-swap and range-moves. In each iteration, a small subset of variables are active in the optimization, which diminishes their effectiveness, and increases the required number of iterations. In this paper, we present a method in which optimization can be carried out over all labels, and most, or all variables at once. Experiments show substantial improvement with respect to previous move-making algorithms.
Deep generative modelling for human body analysis is an emerging problem with many interesting applications. However, the latent space learned by such models is typically not interpretable, resulting in less flexible models. In this work, we adopt a structured semi-supervised approach and present a deep generative model for human body analysis where the body pose and the visual appearance are disentangled in the latent space. Such a disentanglement allows independent manipulation of pose and appearance, and hence enables applications such as pose-transfer without being explicitly trained for such a task. In addition, our setting allows for semi-supervised pose estimation, relaxing the need for labelled data. We demonstrate the capabilities of our generative model on the Human3.6M and on the DeepFashion datasets.
In this work, we introduce a simple and flexible method for video object segmentation based on similarity learning. The proposed method can learn to perform dense label transfer from one image to the other. More specifically, the objective is to learn a similarity metric for dense pixel-wise correspondence between two images. This learned model can then be used in a label transfer framework to propagate object annotations from a reference frame to all the subsequent frames in a video. Unlike previous methods, our similarity learning approach works fairly well across various domains, even when no domain adaptation is involved. Using the proposed approach, we achieved the second place in the first DAVIS challenge for interactive video object segmentation, in both quality and speed tracks.
Despite the availability of many Markov Random Field (MRF) optimization algorithms, their widespread usage is currently limited due to imperfect MRF modelling arising from hand-crafted model parameters. In addition to differentiability, the two main aspects that enable learning these model parameters are the forward and backward propagation time of the MRF optimization algorithm and its parallelization capabilities. In this work, we introduce two fast and differentiable message passing algorithms, namely, Iterative Semi-Global Matching Revised (ISGMR) and Parallel Tree-Reweighted Message Passing (TRWP) which are greatly sped up on GPU by exploiting massive parallelism. Specifically, ISGMR is an iterative and revised version of the standard SGM for general second-order MRFs with improved optimization effectiveness, whereas TRWP is a highly parallelizable version of Sequential TRW (TRWS) for faster optimization. Our experiments on standard stereo benchmarks demonstrate that ISGMR achieves much lower energies than SGM and TRWP is two orders of magnitude faster than TRWS without losing effectiveness in optimization. Furthermore, our CUDA implementations are at least 7 and 650 times faster than PyTorch GPU implementations in the forward and backward propagation, respectively, enabling efficient end-to-end learning with message passing.
Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and then removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a saliency criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by noting connection sensitivity as a form of gradient, we formally characterize initialization conditions to ensure reliable connection sensitivity measurements, which in turn yields effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures.
Dot-product computations in convolutional layers of quantized neural networks can be efficiently performed using any scaled version of the quantized weight vectors by factorizing out the multiplicative factor and scaling the output correspondingly. However, common weight quantization strategies frequently disregard this opportunity. In this paper, we develop a quantization strategy focused on the structure of quantized representations as a whole, rather than on their individual values, to develop a training strategy capable of exploiting the aforementioned fact. Our approach, Angle Penalized neural network Scaled Quantization (APSQ) jointly introduces both the quantization and implicitly the scaling into the network optimization procedure. Moreover, by formulating the quantization as a regularization problem, our method allows for continuous gradient propagation, making it more stable than competing approaches based on gradient estimations. Specifically, scaled binary kernels on a Resnet-18 architecture allow to achieve near full-precision accuracy on ImageNet classification, obtaining 67.0% (-2.3%) with an equivalent memory footprint of 2 bits per weight while using only summations.
Quantizing large Neural Networks (NN) while maintaining the performance is highly desirable for resource-limited devices due to reduced memory and time complexity. It is usually formulated as a constrained optimization problem and optimized via a modified version of gradient descent. In this work, by interpreting the continuous parameters (unconstrained) as the dual of the quantized ones, we introduce a Mirror Descent (MD) framework for NN quantization. Specifically, we provide conditions on the projections (i.e., mapping from continuous to quantized ones) which would enable us to derive valid mirror maps and in turn the respective MD updates. Furthermore, we present a numerically stable implementation of MD that requires storing an additional set of auxiliary variables (unconstrained), and show that it is strikingly analogous to the Straight Through Estimator (STE) based method which is typically viewed as a “trick” to avoid vanishing gradients issue. Our experiments on CIFAR-10/100, TinyImageNet, and ImageNet classification datasets with VGG-16, ResNet-18, and MobileNetV2 architectures show that our MD variants obtain quantized networks with state-of-the-art performance.
Neural network quantization has become increasingly popular due to efficient memory consumption and faster computation resulting from bitwise operations on the quantized networks. Even though they exhibit excellent generalization capabilities, their robustness properties are not well-understood. In this work, we systematically study the robustness of quantized networks against gradient based adversarial attacks and demonstrate that these quantized models suffer from gradient vanishing issues and show a fake sense of security. By attributing gradient vanishing to poor forward-backward signal propagation in the trained network, we introduce a simple temperature scaling approach to mitigate this issue while preserving the decision boundary. Despite being a simple modification to existing gradient based adversarial attacks, experiments on CIFAR-10/100 datasets with VGG-16 and ResNet-18 networks demonstrate that our temperature scaled attacks obtain near-perfect success rate on quantized networks while outperforming original attacks on adversarially trained models as well as floating-point networks.
Even though 3D Convolutional Neural Networks (CNNs) are essential for most of the learning based applications involving dense 3D data, their applicability is limited due to excessive memory and computational requirements. Therefore, compressing such networks by pruning is highly desirable. However, pruning 3D CNNs is largely unexplored possibly because of the complex nature of typical pruning algorithms which embed pruning into an iterative optimization paradigm. In this work, we introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs at initialization to high sparsity levels. Specifically, the core idea is to obtain an importance score for each neuron based on their sensitivity to the loss function. This neuron importance is then weighted according to the neuron resource consumption, where the resource could be defined as FLOPs or memory. Our experiments with widely used 3D-UNets on 3D segmentation datasets (ShapeNet and BraTS’18) demonstrate that RANP can lead to roughly ~4x reduction in memory and ~30x reduction in FLOPs with negligible loss in performance compared to the unpruned network. This significantly reduces the computational resources required to train 3D CNNs, and we show that the pruned network obtained by our algorithm can be easily scaled up and even transferred to another dataset for training.
Weakly Supervised Object Localization (WSOL) methods have become increasingly popular since they only require image level labels as opposed to expensive bounding box annotations required by fully supervised algorithms. Typically, a WSOL model is first trained to predict class generic objectness scores on an off-the-shelf fully supervised source dataset and then it is progressively adapted to learn the objects in the weakly supervised target dataset. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer and propose to learn a classwise pairwise similarity function that directly compares two input proposals as well. The combined localization model and the estimated object annotations are jointly learned in an alternating optimization paradigm as is typically done in standard WSOL methods. In contrast to the existing work that learns pairwise similarities, our proposed approach optimizes a unified objective with convergence guarantee and it is computationally efficient for large-scale applications. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function. For instance, in the ILSVRC dataset, the Correct Localization (CorLoc) performance improves from 72.7% to 78.2% which is a new state-of-the-art for weakly supervised object localization task.