Hugging Face – Issue 7 – Feb 9th 2021

News: New Year, New Website! Our vision for the future of machine learning is one step closer to reality thanks to the 1,000+ researchers & open-source contributors, thousands of companies & the fantastic Hugging Face team! Last month, we announced the launch of the latest version of huggingface.co and we couldn’t be more proud. 🔥 Play live with >10 billion parameter models for tasks including translation, NER, zero-shot classification, and


Introduction to Hugging Face’s Transformers v4.3.0 and its First Automatic Speech Recognition Model – Wav2Vec2

Overview: Hugging Face has released Transformers v4.3.0, which introduces the first Automatic Speech Recognition model to the library: Wav2Vec2. Using one hour of labeled data, Wav2Vec2 outperforms the previous state of the art on the 100-hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data, Wav2Vec2 achieves 4.8/8.2 WER. Understand the Wav2Vec2 implementation using the transformers library for audio-to-text generation. Introduction: Transformers has been […]
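The WER figures quoted above are word error rates: the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A minimal sketch of that metric in plain Python follows; the function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the article or from the transformers library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not part of any library.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here `wer` is 2 errors over 6 reference words. Published WER numbers are usually reported as percentages over a whole test set rather than a single sentence.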


Function Optimization With SciPy

Optimization involves finding the inputs to an objective function that result in the minimum or maximum output of the function. The open-source Python library for scientific computing called SciPy provides a suite of optimization algorithms. Many of these algorithms are used as building blocks in other algorithms, most notably machine learning algorithms in the scikit-learn library. These optimization algorithms can also be used directly, in a standalone manner, to optimize a function. Most notably, algorithms for local search and algorithms […]
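As a minimal sketch of using one of these algorithms directly, the snippet below minimizes a simple two-variable function with `scipy.optimize.minimize`. The objective function, the starting point and the choice of the L-BFGS-B method are assumptions made for illustration; the excerpt does not say which functions or methods the full article uses.

```python
import numpy as np
from scipy.optimize import minimize


def objective(x):
    # Hypothetical objective: a convex bowl whose minimum is at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


# Arbitrary starting point for the local search (an assumption).
x0 = np.array([5.0, 5.0])

# L-BFGS-B is one of several local-search methods that minimize() accepts.
# No gradient is supplied, so SciPy approximates it numerically.
result = minimize(objective, x0, method="L-BFGS-B")

print("converged:", result.success)
print("solution:", result.x)
print("objective value:", result.fun)
```

The returned `OptimizeResult` object carries the solution in `result.x`, the objective value in `result.fun`, and a `result.success` flag that is worth checking before trusting the answer.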


Speller100: Zero-shot spelling correction at scale for 100-plus languages

At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages. In search we’ve found about 15% of queries submitted by customers have misspellings. When queries are misspelled, we match the wrong set of documents and trigger incorrect answers, which can produce a suboptimal results page for our customers. Therefore, spelling correction is the very first component in […]


Summarising Historical Text in Modern Languages

de №11 Story Die Arbeiten im hiesigen Arsenal haben schon seit langer Zeit nachgelassen, und seitdem die Perser so sehr von den Russen geschlagen worden sind, hört man überhaupt nichts mehr von Kriegsrüstungen in den türkischen Provinzen. Die Pforte hatte nicht geglaubt, daß Rußland eine so starke Macht nach den Ufern des kaspischen Meeres abschicken, und daß der Krieg mit den Persern sobald eine so entscheidende Wendung nehmen würde. Alle kriegerischen Nachrichten, die wir jetzt aus den türkischen Provinzen erhalten, […]


Spark NLP: Natural Language Understanding at Scale

Abstract: Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January […]


Attention Can Reflect Syntactic Structure (If You Let It)

Abstract Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English — a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We […]


“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

Frans Hendrik van Eemeren, Rob Grootendorst, Sally Jackson, Scott Jacobs, et al. 1993. Reconstructing argumentative discourse. University of Alabama Press. Rob Abbott, Marilyn Walker, Pranav Anand, Jean E Fox Tree, Robeson Bowmani, and Joseph King. 2011. How can you say such things?!?: Recognizing disagreement in informal political argument. In Proceedings of the Workshop on Languages in Social Media, pages 2–11. Association for Computational Linguistics. Marilyn A Walker, Jean E Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. 2012b.    


Syntactic Nuclei in Dependency Parsing – A Multilingual Exploration

In the previous sections, we have shown how syntactic nuclei can be identified in the UD annotation and how transition-based parsers can be made sensitive to these structures in their internal representations through the use of nucleus composition. We now proceed to a set of experiments investigating the impact of nucleus composition on a diverse selection of languages. 5.1 Experimental Settings: We use UUParser (de Lhoneux et al., 2017, Smith


Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Figure 1 shows a high-level outline of our experimental design, which aims to establish whether injecting structure derived from a variety of syntacto-semantic formalisms into neural language model representations can lead to better correspondence with human brain activation data. We utilize fMRI recordings of human subjects reading a set of texts. Representations of these texts are then derived from the activations of the language models. Following Gauthier and Levy (
