Speller100: Zero-shot spelling correction at scale for 100-plus languages

At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages. In search we’ve found about 15% of queries submitted by customers have misspellings. When queries are misspelled, we match the wrong set of documents and trigger incorrect answers, which can produce a suboptimal results page for our customers. Therefore, spelling correction is the very first component in […]

Read more

Summarising Historical Text in Modern Languages

de №11 Story Die Arbeiten im hiesigen Arsenal haben schon seit langer Zeit nachgelassen, und seitdem die Perser so sehr von den Russen geschlagen worden sind, hört man überhaupt nichts mehr von Kriegsrüstungen in den türkischen Provinzen. Die Pforte hatte nicht geglaubt, daß Rußland eine so starke Macht nach den Ufern des kaspischen Meeres abschicken, und daß der Krieg mit den Persern sobald eine so entscheidende Wendung nehmen würde. Alle kriegerischen Nachrichten, die wir jetzt aus den türkischen Provinzen erhalten, […]

Read more

Spark NLP: Natural Language Understanding at Scale

Abstract Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January […]

Read more

Attention Can Reflect Syntactic Structure (If You Let It)

Abstract Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English — a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We […]

Read more

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

Frans Hendrik van Eemeren, Rob Grootendorst, Sally Jackson, Scott Jacobs, et al. 1993. Reconstructing argumentative discourse. University of Alabama Press. Rob Abbott, Marilyn Walker, Pranav Anand, Jean E Fox Tree, Robeson Bowmani, and Joseph King. 2011. How can you say such things?!?: Recognizing disagreement in informal political argument. In Proceedings of the Workshop on Languages in Social Media, pages 2–11. Association for Computational Linguistics. Marilyn A Walker, Jean E Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. 2012b.    

Read more

Syntactic Nuclei in Dependency Parsing – A Multilingual Exploration

In the previous sections, we have shown how syntactic nuclei can be identified in the UD annotation and how transition-based parsers can be made sensitive to these structures in their internal representations through the use of nucleus composition. We now proceed to a set of experiments investigating the impact of nucleus composition on a diverse selection of languages. 5.1 Experimental Settings We use UUParser (de Lhoneux et al., 2017, Smith    

Read more

Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Figure 1 shows a high-level outline of our experimental design, which aims to establish whether injecting structure derived from a variety of syntacto-semantic formalisms into neural language model representations can lead to better correspondence with human brain activation data. We utilize fMRI recordings of human subjects reading a set of texts. Representations of these texts are then derived from the activations of the language models. Following Gauthier and Levy (

Read more

Transition-based Graph Decoder for Neural Machine Translation

Abstract While a number of works showed gains from incorporating source-side symbolic syntactic and semantic structure into neural machine translation (NMT), much fewer works addressed the decoding of such structure. We propose a general Transformer-based approach for tree and graph decoding based on generating a sequence of transitions, inspired by a similar approach that uses RNNs by Dyer et al. (2016). Experiments with using the proposed decoder with Universal Dependencies syntax on English-German, German-English and English-Russian show improved performance over […]

Read more

NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for Reliable Intelligence Identification on Social network

In Our model, we generate representations of post message in three methods: tokenized syllables-level text through Bert4News, tokenized word-level text through PhoBERT and tokenized syllables-level text through XLM. We simply concatenate both this three representations with the corresponding post metadata features. This can be considered as a naive model but are proved that can improve performance of systems (Tu et al. (2017), Thanh et al. (

Read more
1 679 680 681 682 683 912