Indexing in Natural Language Processing for Information Retrieval
This article was published as a part of the Data Science Blogathon
Overview
- This blog covers GREP (Global Regular Expression Print) and its drawbacks
- Then we move on to Document Term Matrix and Inverted Matrix
- Finally, we end with dynamic and distributed indexing
Global Regular Expression Print
When we are dealing with a small amount of data, the grep command is an efficient choice. It searches one or more files and prints the lines that match a pattern.
For example:
grep pat check.txt
This command prints every line of the file check.txt that contains the text string "pat".
Because grep matches the pattern anywhere in a line, not only as a whole word, lines containing strings such as "pat", "patty", "pattern", or "patties" are all printed to the terminal.
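The matching behaviour described above can be sketched in Python. This is only a minimal illustration of line-by-line pattern search, not how grep itself is implemented; the function name grep_lines and the sample lines are made up for this example.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for the regular expression `pattern`."""
    regex = re.compile(pattern)
    # search() finds the pattern anywhere in the line, so "pat" also
    # matches inside longer words such as "patty" or "pattern".
    return [line for line in lines if regex.search(line)]

# Hypothetical contents standing in for a small text file.
sample_lines = [
    "pat went to the market",
    "she made a burger patty",
    "no match on this line",
    "find the pattern in the text",
]

matches = grep_lines("pat", sample_lines)
for line in matches:
    print(line)

# Word boundaries restrict the match to the whole word "pat".
whole_word_matches = grep_lines(r"\bpat\b", sample_lines)
```

Note that every line has to be scanned for every query, and the result is an unordered list of matching lines with no notion of which one is most relevant.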
Drawbacks of the grep command:
- It cannot rank results, so it is unable to return the document in which the query words appear the greatest number of times.
- This command is in general very slow when working with a