A Simple Guide to Metrics for Calculating String Similarity
Introduction
One application of Natural Language Processing is auto-correction and spell checking. Most of us have seen this in action: when we type a misspelled word into the Google search engine, it automatically corrects the typo and suggests the right word in its place. How does the engine do that? How does it know which word we meant to write? That is what this article covers: the methods available for measuring string similarity, and the implementation of each method in Python.
Table of Contents
- String Similarity
- Hamming Distance
- Normalized Hamming Distance
- Levenshtein Distance
- Matrix Method for Levenshtein Distance
- Summary
String Similarity
A search engine can autocorrect spelling by checking the similarity between strings. In general, we measure how similar data points or groups are by calculating the distance between them: the smaller the distance, the more similar they are. The same holds for textual data, where we check the similarity between strings by calculating