How to Upload Files with Python’s requests Library

Introduction Python is supported by many libraries which simplify data transfer over HTTP. The requests library is one of the most popular Python packages as it’s heavily used in web scraping. It’s also popular for interacting with servers! The library makes it easy to upload data in a popular format like JSON, but also makes it easy to upload files as well. In this tutorial, we will take a look at how to upload files using Python’s requests library. The […]

Read more

Seaborn Violin Plot – Tutorial and Examples

Introduction Seaborn is one of the most widely used data visualization libraries in Python, as an extension to Matplotlib. It offers a simple, intuitive, yet highly customizable API for data visualization. In this tutorial, we’ll take a look at how to plot a Violin Plot in Seaborn. Violin plots are used to visualize data distributions, displaying the range, median, and distribution of the data. Violin plots show the same summary statistics as box plots, but they also include Kernel Density […]

Read more

Multilingualism in Natural Language Processing: Targeting Low Resource Indian Languages

Introduction A language is a systematic form of communication that can take a variety of forms. There are approximately 7,000 languages believed to be spoken across the globe. Despite this diversity, the majority of the world’s population speaks only a fraction of these languages. In Spite of such a rich diversity Languages are still evolving across time much like the society we live in. While the English language is uniform, having the distinct status of being the official language of […]

Read more

Machine Translation Weekly 63: Maximum Aposteriori vs. Minimum Bayes Risk decoding

This week I will have a look at the best paper from this year’s COLING that brings an interesting view on inference in NMT models. The title of the paper is “Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation” and its authors are from the University of Amsterdam. NMT models learn the conditional probability of the next word in a target sentence given the source sentence and the previous words in the target […]

Read more

What Is Meta-Learning in Machine Learning?

Meta-learning in machine learning refers to learning algorithms that learn from other learning algorithms. Most commonly, this means the use of machine learning algorithms that learn how to best combine the predictions from other machine learning algorithms in the field of ensemble learning. Nevertheless, meta-learning might also refer to the manual process of model selecting and algorithm tuning performed by a practitioner on a machine learning project that modern automl algorithms seek to automate. It also refers to learning across […]

Read more

Spelling Correction in Python with TextBlob

Introduction Spelling mistakes are common, and most people are used to software indicating if a mistake was made. From autocorrect on our phones, to red underlining in text editors, spell checking is an essential feature for many different products. The first program to implement spell checking was written in 1971 for the DEC PDP-10. Called SPELL, it was capable of performing only simple comparisons of words and detecting one or two letter differences. As hardware and software advanced, so have […]

Read more

Jump Search in Python

Introduction Finding the right data we need is an age-old problem before computers. As developers, we create many search algorithms to retrieve data efficiently. Search algorithms can be divided into two broad categories: sequential and interval searches. Sequential searches check each element in a data structure. Interval searches check various points of the data (called intervals), reducing the time it takes to find an item, given a sorted dataset. In this article, you will cover Jump Search in Python – […]

Read more

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

Abstract Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. In this paper, we exploit the signals embedded in URLs to label web documents at scale with an average precision of 94.5% across different language pairs. We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other. We release a new web dataset consisting of over 392 […]

Read more

Dense Passage Retrieval for Open-Domain Question Answering

November 16, 2020 By: Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih Abstract Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder […]

Read more

Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions

Abstract A grammatical gender system divides a lexicon into a small number of relatively fixed grammatical categories. How similar are these gender systems across languages? To quantify the similarity, we define gender systems extensionally, thereby reducing the problem of comparisons between languages’ gender systems to cluster evaluation. We borrow a rich inventory of statistical tools for cluster evaluation from the field of community detection (Driver and Kroeber, 1932; Cattell, 1945), that enable us to craft novel information-theoretic metrics for measuring […]

Read more
1 698 699 700 701 702 919