AttnGrounder: Talking to Cars with Attention

We propose Attention Grounder (AttnGrounder), a single-stage end-to-end trainable model for the task of visual grounding. Visual grounding aims to localize a specific object in an image based on a given natural language text query… Unlike previous methods that use the same text representation for every image region, we use a visual-text attention module that relates each word in the given query with every region in the corresponding image for constructing a region dependent text representation. Furthermore, for improving the […]

Read more

Manifold attack

Machine Learning in general and Deep Learning in particular has gained much interest in the recent decade and has shown significant performance improvements for many Computer Vision or Natural Language Processing tasks. In order to deal with databases which have just a small amount of training samples or to deal with models which have large amount of parameters, the regularization is indispensable… In this paper, we enforce the manifold preservation (manifold learning) from the original data into latent presentation by […]

Read more

Simple Simultaneous Ensemble Learning in Genetic Programming

Learning ensembles by bagging can substantially improve the generalization performance of low-bias high-variance estimators, including those evolved by Genetic Programming (GP). Yet, the best way to learn ensembles in GP remains to be determined… This work attempts to fill the gap between existing GP ensemble learning algorithms, which are often either simple but expensive, or efficient but complex. We propose a new algorithm that is both simple and efficient, named Simple Simultaneous Ensemble Genetic Programming (2SEGP). 2SEGP is obtained by […]

Read more

Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots… We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to […]

Read more

Completely Self-Supervised Crowd Counting via Distribution Matching

Dense crowd counting is a challenging task that demands millions of head annotations for training models. Though existing self-supervised approaches could learn good representations, they require some labeled data to map these features to the end task of density estimation… We mitigate this issue with the proposed paradigm of complete self-supervision, which does not need even a single labeled image. The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit […]

Read more

Multi-Referenced Training for Dialogue Response Generation

In open-domain dialogue response generation, a dialogue context can be continued with diverse responses, and the dialogue models should capture such one-to-many relations. In this work, we first analyze the training objective of dialogue models from the view of Kullback-Leibler divergence (KLD) and show that the gap between the real world probability distribution and the single-referenced data’s probability distribution prevents the model from learning the one-to-many relations efficiently… Then we explore approaches to multi-referenced training in two aspects. Data-wise, we […]

Read more

Issue #99 – Training Neural Machine Translation with Semantic Similarity

17 Sep20 Issue #99 – Training Neural Machine Translation with Semantic Similarity Author: Dr. Karin Sim, Machine Translation Scientist @ Iconic Introduction The standard way of training Neural Machine Translation (NMT) systems is by Maximum Likelihood Estimation (MLE), and although there have been experiments in the past to optimize systems directly in order to improve particular evaluation metrics, these were of limited success. Of course, using BLEU is not ideal due to the fact that it fails to account for […]

Read more

Self-Supervised Annotation of Seismic Images using Latent Space Factorization

Annotating seismic data is expensive, laborious and subjective due to the number of years required for seismic interpreters to attain proficiency in interpretation. In this paper, we develop a framework to automate annotating pixels of a seismic image to delineate geological structural elements given image-level labels assigned to each image… Our framework factorizes the latent space of a deep encoder-decoder network by projecting the latent space to learned sub-spaces. Using constraints in the pixel space, the seismic image is further […]

Read more

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

In this paper, we study the task of selecting optimal response given user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) have shown significant improvements in various natural language processing tasks… This and similar response selection tasks can also be solved using such language models by formulating them as dialog-response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in […]

Read more

Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Adversarial training, especially projected gradient descent (PGD), has been the most successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs are meaningful and interpretable by humans… However, the concept of interpretability is not mathematically well established, making it difficult to evaluate it quantitatively. We define interpretability as the alignment of the model gradient with the vector pointing toward the closest point of the support of the other class. We propose […]

Read more
1 902 903 904 905 906 912