Articles About Machine Learning

Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST)… Our major contribution lies in how these decoders interact with each other: one decoder can attend to different information sources from the other via a dual-attention mechanism. We propose two variants of these architectures […]

Read more

MARNet: Multi-Abstraction Refinement Network for 3D Point Cloud Analysis

Representation learning from 3D point clouds is challenging due to their inherent nature of permutation invariance and irregular distribution in space. Existing deep learning methods follow a hierarchical feature extraction paradigm in which high-level abstract features are derived from low-level features… However, they fail to exploit different granularity of information due to the limited interaction between these features. To this end, we propose Multi-Abstraction Refinement Network (MARNet) that ensures an effective exchange of information between multi-level features to gain local […]

Read more

Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs

This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge). The CK concept is to decompose research projects into reusable components that encapsulate research artifacts and provide unified application programming interfaces (APIs), command-line interfaces (CLIs), meta descriptions and common automation actions for related artifacts… The CK framework is used to organize and manage research projects as a database of such components. Inspired by the USB “plug and play” approach for hardware, CK also helps to […]

Read more

Diverse Image Captioning with Context-Object Split Latent Spaces

Diverse image captioning models aim to learn one-to-many mappings that are innate to cross-domain datasets, such as of images and texts. Current methods for this task are based on generative latent variable models, e.g. VAEs with structured latent spaces… Yet, the amount of multimodality captured by prior work is limited to that of the paired training data — the true diversity of the underlying generative process is not fully captured. To address this limitation, we leverage the contextual descriptions in […]

Read more

c-lasso — a Python package for constrained sparse and robust regression and classification

We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form: [ y = X beta + sigma epsilon qquad textrm{subject to} qquad Cbeta=0 ] Here, $X in mathbb{R}^{ntimes d}$is a given design matrix and the vector $y in mathbb{R}^{n}$ is a continuous or binary response vector… The matrix $C$ is a general constraint matrix. The vector $beta in […]

Read more

Image Inpainting with Learnable Feature Imputation

A regular convolution layer applying a filter in the same way over known and unknown areas causes visual artifacts in the inpainted image. Several studies address this issue with feature re-normalization on the output of the convolution… However, these models use a significant amount of learnable parameters for feature re-normalization, or assume a binary representation of the certainty of an output. We propose (layer-wise) feature imputation of the missing input values to a convolution. In contrast to learned feature re-normalization, […]

Read more

Unsupervised Monocular Depth Learning in Dynamic Scenes

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be constant for rigid moving objects… We show that this regularization […]

Read more

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

Panoptic segmentation is posed as a new popular test-bed for the state-of-the-art holistic scene understanding methods with the requirement of simultaneously segmenting both foreground things and background stuff. The state-of-the-art panoptic segmentation network exhibits high structural complexity in different network components, i.e. backbone, proposal-based foreground branch, segmentation-based background branch, and feature fusion module across branches, which heavily relies on expert knowledge and tedious trials… In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for […]

Read more

Revisiting Graph Neural Networks for Link Prediction

Graph neural networks (GNNs) have achieved great success in recent years. Three most common applications include node classification, link prediction, and graph classification… While there is rich literature on node classification and graph classification, GNNs for link prediction is relatively less studied and less understood. Two representative classes of methods exist: GAE and SEAL. GAE (Graph Autoencoder) first uses a GNN to learn node embeddings for all nodes, and then aggregates the embeddings of the source and target nodes as […]

Read more

Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction

In recent years, distantly-supervised relation extraction has achieved a certain success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KB) to sentences… However, these DS-generated datasets inevitably have wrong labels that result in incorrect evaluation scores during testing, which may mislead the researchers. To solve this problem, we build a new dataset NYTH, where we use the DS-generated data as training data and hire annotators to label […]

Read more
1 107 108 109 110 111 226