How to make your own Wiki from Wikipedia using Python

Here is a short blog post I was asked to write about making a personal Wiki from Wikipedia. It walks through the basic steps of text processing, so I hope it will be useful for data scientists. It also requires some knowledge of setting up MediaWiki on a web server, and some (not very advanced) knowledge of the Python programming language. If you know Python and the basic ideas of data science, it takes only a few days to create this Wiki from Wikipedia articles. Here are the steps:

(1) Install MediaWiki with basic extensions and insert some templates from Wikipedia (~2 h).

(2) Go to https://dumps.wikimedia.org/ and download the Wikipedia dump file (with the extension *.bz2). A BitTorrent program is recommended since the file is large (~17 GB).

(3) Create a Python script that reads this file and writes out only the articles that belong to certain categories. Read it wisely: you cannot load the entire file into the computer's memory (my old computer had only 8 GB of RAM), so use a technique that processes the file piece by piece. In data science, this step is called data skimming. I wanted all categories related to data science and science. My script creates a TXT file with
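
Here is a minimal sketch of how the reading part of step (3) could look. It is not my original script, only one possible way to do it with the Python standard library: bz2.open decompresses the dump on the fly and xml.etree.ElementTree.iterparse hands over one page at a time, so the whole file never sits in memory. The function names (iter_pages, categories, select_pages, skim_dump), the file names in the usage note and the layout of the output TXT file are all made up for illustration. The sketch also assumes the usual dump layout, where each <page> has a <title> and a <text> element, and that categories appear in the article text as [[Category:Name]].

```python
import bz2
import re
import xml.etree.ElementTree as ET

# Matches [[Category:Name]] and [[Category:Name|sort key]] in wikitext.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)", re.IGNORECASE)


def local_name(tag):
    """Drop the XML namespace prefix that the dump puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each page, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # discard finished pages so memory use stays flat


def categories(wikitext):
    """Return the lower-cased category names found in the wikitext."""
    return {name.strip().lower() for name in CATEGORY_RE.findall(wikitext)}


def select_pages(stream, wanted):
    """Yield only the pages that are in at least one wanted category."""
    wanted = {name.lower() for name in wanted}
    for title, text in iter_pages(stream):
        if categories(text) & wanted:
            yield title, text


def skim_dump(dump_path, out_path, wanted):
    """Write the selected pages to a text file (the layout is an assumption)."""
    with bz2.open(dump_path, "rb") as dump, \
            open(out_path, "w", encoding="utf-8") as out:
        for title, text in select_pages(dump, wanted):
            out.write(f"{title}\n{text}\n\n")
```

With a dump saved as, say, dump.xml.bz2, a call such as skim_dump("dump.xml.bz2", "selected.txt", {"Data science", "Science"}) would write the matching articles to selected.txt. This only matches exact category names, so collecting every category "related to" a topic would need a longer list of names or a looser match.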
