The State of Multilingual AI

Models that allow interaction via natural language have become ubiquitous. Research models such as BERT and T5 have become much more accessible, while the latest generation of language and multi-modal models is demonstrating increasingly powerful capabilities. At the same time, a wave of NLP startups has started to put this technology to practical use. While such language technology may be hugely impactful, recent models have mostly focused on English and a handful of other languages with large amounts of resources. […]


K-Means Clustering: A Centroid-based Algorithm

K-means clustering is a centroid-based unsupervised machine learning algorithm. Unsupervised learning uses machine learning algorithms to analyze unlabelled data and find hidden patterns without human intervention. As the name suggests, K-means is a cluster-based algorithm. Clustering is a technique where we can group together a set […]


Combining Embedding and Keyword Based Search for Improved Performance

TL;DR: Ensembling keyword and embedding models for search is one of the quickest and easiest ways to improve search performance over the standard embedding-based search paradigm. There is a large amount of evidence in the machine learning literature that this helps with in-domain performance, out-of-domain generalization, and multilingual transfer. The reason seems to be that sparse and dense representations of text capture complementary linguistic qualities of their […]
