TextCaps: a Dataset for Image Captioning with Reading Comprehension

Abstract Image descriptions can help visually impaired people to quickly understand the image content. While we made significant progress in automatically describing images and optical character recognition, current approaches are unable to include written text in their descriptions, although text is omnipresent in human environments and frequently critical to understand our surroundings. To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images. Our dataset challenges a […]

Read more

Spatially Aware Multimodal Transformers for TextVQA

August 23, 2020 By: Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal Abstract Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches are limited in their use of spatial relations and rely on fully-connected transformer-based architectures to implicitly learn the spatial structure of a scene. In contrast, […]

Read more

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

Abstract Prior work in visual dialog has focused on training deep neural models on VisDial in isolation. Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. We adapt the recently proposed ViLBERT model for multi-turn visually-grounded conversations. Our model is pretrained on the Conceptual Captions and Visual Question Answering datasets, and finetuned on VisDial. Our best single model outperforms prior published work by > 1% absolute on NDCG and MRR. Next, we […]

Read more

Information Extraction of Clinical Trial Eligibility Criteria

Abstract Clinical trials predicate subject eligibility on a diversity of criteria ranging from patient demographics to food allergies. Trials post their requirements as semantically complex, unstructured free-text. Formalizing trial criteria to a computer-interpretable syntax would facilitate eligibility determination. In this paper, we investigate an information extraction (IE) approach for grounding criteria from trials in ClinicalTrials.gov to a shared knowledge base. We frame the problem as a novel knowledge base population task, and implement a solution combining machine learning and context […]

Read more

Unsupervised Quality Estimation for Neural Machine Translation

August 31, 2020 By: Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco (Paco) Guzman, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia Abstract Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation and time for training. As an alternative, we devise an unsupervised approach […]

Read more

Lemotif: An Affective Visual Journal Using Deep Neural Networks

Abstract We present Lemotif, an integrated natural language processing and image generation system that uses machine learning to (1) parse a text-based input journal entry describing the user’s day for salient themes and emotions and (2) visualize the detected themes and emotions in creative and appealing image motifs. Synthesizing approaches from artificial intelligence and psychology, Lemotif acts as an affective visual journal, encouraging users to regularly write and reflect on their daily experiences through visual reinforcement. By making patterns in […]

Read more

Weak-Attention Suppression For Transformer Based Speech Recognition

Abstract Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR). However, adjacent acoustic units (i.e., frames) are highly correlated, and long-distance dependencies between them are weak, unlike text units. It suggests that ASR will likely benefit from sparse and localized attention. In this paper, we propose Weak-Attention Suppression (WAS), a method that dynamically induces sparsity in attention probabilities. We demonstrate that WAS leads to consistent Word Error Rate (WER) improvement […]

Read more

How to Develop LARS Regression Models in Python

Regression is a modeling task that involves predicting a numeric value given an input. Linear regression is the standard algorithm for regression that assumes a linear relationship between inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models that have smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression. Lasso Regression is a popular type of regularized linear regression that […]

Read more

Machine Translation Weekly 55: Social Polarization Seen through Word Embeddings

This week, I am going to have a closer look at a paper that creatively uses methods for bilingual word embeddings for social media analysis. The paper’s preprint was uploaded last week on arXiv. The title is “We Don’t Speak the Same Language: Interpreting Polarization through Machine Translation,” and most of the authors CMU in Pittsburgh. The paper’s central assumption is that the polarization of different opinion groups, especially in the USA, went so far that some words have totally […]

Read more
1 782 783 784 785 786 942