Articles About Machine Learning

Code Adam Gradient Descent Optimization From Scratch

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function. A limitation of gradient descent is that a single step size (learning rate) is used for all input variables. Extensions to gradient descent like AdaGrad and RMSProp update the algorithm to use a separate step size for each input variable but may result in a step size that rapidly decreases to very small values. The Adaptive […]
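As a worked illustration of the idea in the excerpt, here is a minimal NumPy sketch of the Adam update rule (the toy objective and all names are stand-ins, not taken from the article):

```python
import numpy as np

def adam(grad, x0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Minimize an objective, given its gradient function, with Adam."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)  # first moment (moving average of gradients)
    v = np.zeros_like(x)  # second moment (moving average of squared gradients)
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero init
        v_hat = v / (1.0 - beta2 ** t)
        x -= alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-variable step size
    return x

# toy usage: minimize f(x) = x0^2 + 10 * x1^2
x_min = adam(lambda x: np.array([2.0 * x[0], 20.0 * x[1]]), x0=[3.0, -2.0])
```

The division by sqrt(v_hat) is what gives each input variable its own effective step size, while the moving averages keep that step size from collapsing the way AdaGrad's accumulated sum can.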

Read more

Learning to Segment Rigid Motions from Two Frames

Appearance-based detectors achieve remarkable performance on common scenes, but tend to fail in scenarios that lack training data. Geometric motion segmentation algorithms, however, generalize to novel scenes, but have yet to achieve performance comparable to appearance-based ones, due to noisy motion estimates and degenerate motion configurations… To combine the best of both worlds, we propose a modular network, whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field. It takes […]
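For intuition about the egomotion field mentioned in the abstract, a sketch of the classical instantaneous-motion model may help; this is standard background, not the paper's network. Only the translational term depends on scene depth, which is part of why some independent object motions are recoverable against the egomotion field while others are degenerate:

```python
import numpy as np

def egomotion_flow(x, y, inv_depth, t, omega):
    """Optical flow induced purely by camera motion, in normalized image
    coordinates (classical instantaneous-motion model).
    x, y: pixel coordinate arrays; inv_depth: 1/Z per pixel;
    t = (tx, ty, tz): camera translation; omega = (wx, wy, wz): rotation."""
    tx, ty, tz = t
    wx, wy, wz = omega
    # translational part scales with inverse depth...
    u_t = (x * tz - tx) * inv_depth
    v_t = (y * tz - ty) * inv_depth
    # ...while the rotational part is independent of the scene geometry
    u_r = wx * x * y - wy * (1.0 + x ** 2) + wz * y
    v_r = wx * (1.0 + y ** 2) - wy * x * y - wz * x
    return u_t + u_r, v_t + v_r
```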

Read more

RepVGG: Making VGG-style ConvNets Great Again

We present a simple but powerful convolutional neural network architecture, which has a VGG-like inference-time body composed of nothing but a stack of 3×3 convolutions and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique, so the model is named RepVGG… On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our […]
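The re-parameterization exploits the linearity of convolution: a 1×1 branch and an identity branch can each be rewritten as a 3×3 convolution and summed into a single kernel. A minimal NumPy sketch, assuming batch-norm parameters have already been folded into each branch (the helper name is illustrative, not from the paper):

```python
import numpy as np

def fuse_branches(k3, b3, k1, b1, with_identity=True):
    """Fold a 3x3 branch, a 1x1 branch, and an identity branch into one
    3x3 convolution. k3: (C_out, C_in, 3, 3), k1: (C_out, C_in, 1, 1),
    biases: (C_out,). Assumes batch norm was already folded into each branch."""
    fused_k, fused_b = k3.copy(), b3 + b1
    # a 1x1 conv is a 3x3 conv whose only nonzero weight is the centre tap
    fused_k[:, :, 1, 1] += k1[:, :, 0, 0]
    if with_identity:
        c_out, c_in = k3.shape[:2]
        assert c_out == c_in, "identity branch needs matching channel counts"
        for i in range(c_out):
            fused_k[i, i, 1, 1] += 1.0  # identity as a centre-tap delta kernel
    return fused_k, fused_b
```

Because convolution is linear, conv(x, k3) + conv(x, k1) + x equals a single convolution with the fused kernel, so inference runs on the plain VGG-style stack.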

Read more

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example… The result is a sparsely-activated model — with outrageous numbers of parameters — but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability — we address these with the Switch Transformer. We simplify the MoE routing algorithm and design […]
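A minimal sketch of top-1 ("switch") routing, the core simplification over classical MoE routing; capacity limits and the load-balancing loss are omitted, and all names are illustrative:

```python
import numpy as np

def switch_layer(tokens, router_w, experts):
    """Top-1 routing: each token is sent to a single expert, and the
    expert output is scaled by the router probability so the router
    stays trainable. tokens: (n, d); router_w: (d, n_experts);
    experts: list of d -> d callables."""
    logits = tokens @ router_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)          # softmax gate
    choice = probs.argmax(axis=-1)                      # one expert per token
    out = np.empty_like(tokens)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():                                  # only routed tokens run
            out[mask] = probs[mask, e, None] * expert(tokens[mask])
    return out
```

Because each token touches exactly one expert, compute stays roughly constant as the number of experts, and hence the parameter count, grows.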

Read more

Analysis of skin lesion images with deep learning

Skin cancer is the most common cancer worldwide, with melanoma being the deadliest form. Dermoscopy is a skin imaging modality that has been shown to improve the diagnosis of skin cancer compared to unaided visual examination… We evaluate the current state of the art in the classification of dermoscopic images based on the ISIC-2019 Challenge for the classification of skin lesions and current literature. Various deep neural network architectures pre-trained on the ImageNet data set are adapted to a […]
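Adapting an ImageNet-pretrained backbone to dermoscopic images typically amounts to swapping the classifier head and fine-tuning. A hedged PyTorch sketch of that generic recipe (the class count is an assumption based on the ISIC-2019 label set; this is not necessarily the paper's exact setup):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and replace its classification head.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 8)  # assumed: 8 ISIC-2019 diagnoses

# A common recipe: freeze the backbone first, train only the new head,
# then optionally unfreeze and fine-tune end to end at a lower learning rate.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```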

Read more

Distributed Double Machine Learning with a Serverless Architecture

Serverless cloud computing is predicted to be the dominant and default architecture of cloud computing in the coming decade (Berkeley View on Serverless Computing, 2019). In this paper we explore serverless cloud computing for double machine learning… Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the enormous elasticity of serverless computing. It allows fast, on-demand estimation without additional cloud-maintenance effort. We provide a prototype, DoubleML-Serverless, written in Python, which implements […]
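For readers unfamiliar with cross-fitting: double machine learning splits the data into folds, fits the nuisance regressions out-of-fold, and estimates the target parameter from the residuals. A compact sketch for the partially linear model Y = theta * D + g(X) + eps (the learner choice and names are illustrative, not from the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(X, D, Y, n_folds=5):
    """Cross-fitted double ML for the partially linear model
    Y = theta * D + g(X) + eps. The nuisance regressions E[Y|X] and
    E[D|X] are fit out-of-fold; theta comes from regressing the Y
    residuals on the D residuals."""
    y_res, d_res = np.empty_like(Y), np.empty_like(D)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        y_res[test] = Y[test] - RandomForestRegressor().fit(X[train], Y[train]).predict(X[test])
        d_res[test] = D[test] - RandomForestRegressor().fit(X[train], D[train]).predict(X[test])
    return (d_res @ y_res) / (d_res @ d_res)  # theta_hat
```

Each fold's nuisance fits are independent of the others, which is why the estimator maps naturally onto many short-lived serverless invocations.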

Read more

Preconditioned training of normalizing flows for variational inference in inverse problems

Obtaining samples from the posterior distribution of inverse problems with expensive forward operators is challenging, especially when the unknowns involve the strongly heterogeneous Earth. To meet these challenges, we propose a preconditioning scheme involving a conditional normalizing flow (NF) capable of sampling from a low-fidelity posterior distribution directly… This conditional NF is used to speed up the training of the high-fidelity objective, which involves minimizing the Kullback-Leibler divergence between the predicted and the desired high-fidelity posterior density for indirect measurements […]
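For context, training a normalizing flow for variational inference typically means minimizing the reverse Kullback-Leibler divergence between the flow's pushforward density and the posterior, which by Bayes' rule reduces to an expectation over the latent base distribution. A sketch of that standard objective (notation assumed, not taken from the paper):

```latex
% Reverse-KL variational objective for a flow f_theta pushing
% z ~ N(0, I) forward to posterior samples x = f_theta(z):
\min_{\theta}\;
\mathbb{E}_{z \sim \mathcal{N}(0, I)}\Big[
  -\log p\big(y \mid f_{\theta}(z)\big)           % likelihood of the data y
  -\log p\big(f_{\theta}(z)\big)                  % prior on the unknowns
  -\log\big|\det \nabla_{z} f_{\theta}(z)\big|    % log-det from change of variables
\Big] + \mathrm{const.}
```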

Read more

Variational Embeddings for Community Detection and Node Representation

In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECoDeR for jointly learning Variational Embeddings for Community Detection and node Representation… VECoDeR assumes that every node can be a member of one or more communities. The node embeddings are learned in such a way that connected nodes are not only “closer” to each other but also share similar community […]
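One common way such joint models realize overlapping membership is to score each node embedding against learnable community centres and normalize; the sketch below is a hypothetical illustration of that pattern, not necessarily VECoDeR's exact formulation:

```python
import numpy as np

def soft_membership(z, mu):
    """Hypothetical soft community assignment: node i's probability of
    belonging to community k grows with the similarity between its
    embedding z[i] and the community centre mu[k].
    z: (n, d) node embeddings; mu: (K, d) community centres."""
    logits = z @ mu.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)       # (n, K), rows sum to 1
```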

Read more

Unchain the Search Space with Hierarchical Differentiable Architecture Search

Differentiable architecture search (DAS) has made great progress in searching for high-performance architectures with reduced computational cost. However, DAS-based methods mainly focus on searching for a repeatable cell structure, which is then stacked sequentially in multiple stages to form the networks… This configuration significantly reduces the search space, and ignores the importance of connections between the cells. To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell […]
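The cell-level search that DAS methods build on relaxes the discrete choice of operation on each edge into a softmax-weighted mixture, so architecture weights can be optimized by gradient descent alongside the network weights. A minimal sketch of that relaxation (illustrative, following the DARTS formulation rather than H-DAS specifics):

```python
import numpy as np

def mixed_op(x, ops, alpha):
    """Continuous relaxation of operation choice on one edge: the edge
    output is a softmax-weighted sum over all candidate operations, so
    the architecture weights alpha can be learned by gradient descent.
    ops: list of callables (e.g. conv, pooling, skip); alpha: (len(ops),)."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))
```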

Read more

Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation

Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired… Another common approach is to limit the ability of a particular adversary who seeks to maximize parity. Unfortunately, representations produced by adversarial approaches may still retain […]
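The adversarial approach the abstract refers to is often implemented with a gradient-reversal layer: the encoder is updated to make group membership hard to predict from the representation. A hedged PyTorch sketch of that baseline (dimensions and module names are placeholders; this is the common adversarial setup, not the paper's contrastive estimator):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward
    pass: the encoder learns to hurt the adversary's predictions."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
task_head = nn.Linear(8, 2)   # predicts the downstream label
adversary = nn.Linear(8, 2)   # tries to predict the protected attribute

def loss(x, y, a, lam=1.0):
    z = encoder(x)
    task = nn.functional.cross_entropy(task_head(z), y)
    adv = nn.functional.cross_entropy(adversary(GradReverse.apply(z, lam)), a)
    return task + adv  # one loss; the reversal flips the encoder's incentive
```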

Read more