Python for NLP: Working with the Gensim Library (Part 2)
This is the 11th article in my series on Python for NLP and the second one on the Gensim library. In the previous article, I gave a brief introduction to Python’s Gensim library: I explained how to create dictionaries that map words to their corresponding numeric IDs, and how to create a bag-of-words corpus from those dictionaries. In this article, we will study how to perform topic modeling using the Gensim library.
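As a quick refresher on those two ideas, here is a minimal plain-Python sketch of a word-to-ID mapping and a bag-of-words corpus. It is only an illustration of the concept, not Gensim's implementation: the toy documents and the names word_to_id and to_bag_of_words are made up for this example, and Gensim's own classes (corpora.Dictionary and its doc2bow method) may assign IDs in a different order.

```python
from collections import Counter

# Toy tokenized documents (hypothetical data, for illustration only)
documents = [
    ["topic", "modeling", "with", "gensim"],
    ["gensim", "topic", "modeling", "beats", "manual", "topic", "modeling"],
]

# Give each unique word an integer ID in order of first appearance.
# Assumption: this ordering is arbitrary and need not match Gensim's.
word_to_id = {}
for doc in documents:
    for word in doc:
        if word not in word_to_id:
            word_to_id[word] = len(word_to_id)


def to_bag_of_words(doc):
    """Return sorted (word_id, count) pairs for one tokenized document."""
    counts = Counter(word_to_id[word] for word in doc)
    return sorted(counts.items())


# The corpus is one bag-of-words vector per document
corpus = [to_bag_of_words(doc) for doc in documents]

print(word_to_id)
print(corpus)
```

Each document ends up as a list of (ID, count) pairs, which is the same general shape of input that topic models work with.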
I explained how to do topic modeling using Python’s Scikit-Learn library in an earlier article, where I showed how Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) can be used for topic modeling.
In this article, we will use the Gensim library for topic modeling. The approaches we will use are LDA and LSI (Latent Semantic Indexing).
Installing Required Libraries
We will perform topic modeling on the text obtained from Wikipedia articles. To scrape Wikipedia articles, we will use the Wikipedia API. To download the Wikipedia API library, execute the following command:
$ pip install