Build Your Own NLP-Based Search Engine Using BM25
Introduction
Ever wondered how search engines like Google and Yahoo work? Or how they can seemingly scan the entire internet and return relevant results with a summary like "About 5,43,00,000 results (0.004 seconds)"? They rely on two concepts: crawling and indexing.
- Crawling: Automated bots look for pages that are new or updated and store key information from them, such as the URL, title, and keywords, to be used later.
- Indexing: The data captured during crawling is analyzed to determine what each page is about. Key content, images, and video files on the page are used in this process. The resulting information is indexed and stored so it can be returned later in response to a search query.
Hence, whenever we ask a search engine to find something, it does not scan the length and breadth of the internet. It only looks through the URLs that were indexed in the second step above.
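To make the idea concrete, here is a minimal sketch of an inverted index, one common way to implement the indexing step. It is only an illustration: the `pages` dictionary stands in for crawled data, and the URLs, the `tokenize`, `build_index`, and `search` helpers are all hypothetical names made up for this example. Real search engines use far more elaborate pipelines.

```python
from collections import defaultdict

# Hypothetical "crawled" data: URL -> text captured by a crawler.
pages = {
    "https://example.com/a": "BM25 is a ranking function used by search engines",
    "https://example.com/b": "Crawling bots discover new and updated pages",
    "https://example.com/c": "Search engines index pages for fast lookup",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return URLs containing every query term, using index lookups only."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's URLs, then intersect with the rest.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(pages)
print(sorted(search(index, "search engines")))
```

Note that `search` never touches the page text itself. It only looks up terms in the prebuilt index, which is why a query can be answered without rescanning every page. This sketch only finds matching pages and does not rank them.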
Well, today we would work on