iNLTK: Natural Language Toolkit for Indic Languages

We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Paraphrase Generation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic languages. Using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results… On these datasets, we also show that by using pre-trained models and paraphrases from iNLTK, we can achieve more than 95% of the previous best performance by using less […]