SpaCy3Urdu
Project setup
Run the command to set up the assets (the dataset from UD).
It uses the project.yml file to download the data from the UD GitHub repository.
Download vectors
Download the fastText vectors for Urdu:
wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ur.300.vec.gz
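Once decompressed, the .vec.gz file is plain text: the first line gives the number of words and the vector dimension (300 for cc.ur.300), and each following line is a word followed by its space-separated values. A minimal stdlib sketch for peeking at the file without loading all of it; read_vec_header and iter_vectors are illustrative names, not part of spaCy or fastText.

```python
from __future__ import annotations

import gzip
import os
import tempfile
from typing import Iterator


def read_vec_header(path: str) -> tuple[int, int]:
    """Return (number_of_words, dimensions) from the first line of a .vec.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        n_words, dims = f.readline().split()
    return int(n_words), int(dims)


def iter_vectors(path: str, limit: int) -> Iterator[tuple[str, list[float]]]:
    """Yield at most `limit` (word, vector) pairs, skipping the header line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        f.readline()  # header: "<n_words> <dims>"
        for i, line in enumerate(f):
            if i >= limit:
                break
            word, *values = line.rstrip().split(" ")
            yield word, [float(v) for v in values]


if __name__ == "__main__":
    # Toy two-word file standing in for cc.ur.300.vec.gz (made-up values, 3 dims).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.vec.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write("2 3\nword_a 0.1 0.2 0.3\nword_b 0.4 0.5 0.6\n")
        print(read_vec_header(path))              # (2, 3)
        print(list(iter_vectors(path, limit=1)))  # [('word_a', [0.1, 0.2, 0.3])]
```

On the real file, read_vec_header("cc.ur.300.vec.gz") shows how many vectors are available before deciding how many to keep.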
Truncate these vectors to reduce the model size. I’m currently keeping the first 100,000 vectors for training the model.
mkdir vectors
python -m spacy init vectors ur cc.ur.300.vec.gz ./vectors --truncate 100000 --name ur_model.vectors
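The --truncate 100000 option makes spaCy read only the first 100,000 rows of the vectors file; since fastText's .vec files list words in descending frequency, these are the most frequent words. The sketch below is not spaCy's implementation. It only illustrates what keeping the first N rows means, by streaming them into a smaller file with a corrected header, and estimates the size of the resulting table assuming 4-byte float32 values. truncate_vec and table_size_bytes are hypothetical helper names.

```python
from __future__ import annotations

import gzip
import os
import tempfile


def truncate_vec(src: str, dst: str, n: int) -> int:
    """Copy the first `n` vectors of a .vec.gz file to `dst`, fixing up the header.

    Assumes the header's word count is accurate; streams line by line so the
    full table never has to be held in memory.
    """
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        total, dims = (int(x) for x in fin.readline().split())
        keep = min(n, total)
        fout.write(f"{keep} {dims}\n")
        written = 0
        for line in fin:
            if written >= keep:
                break
            fout.write(line)
            written += 1
    return written


def table_size_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Size of a dense vector table (float32 by default), excluding the word strings."""
    return n_vectors * dims * bytes_per_value


if __name__ == "__main__":
    print(table_size_bytes(100_000, 300))  # 120000000, i.e. about 120 MB
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "in.vec.gz"), os.path.join(tmp, "out.vec.gz")
        with gzip.open(src, "wt", encoding="utf-8") as f:
            f.write("3 2\na 1 2\nb 3 4\nc 5 6\n")
        print(truncate_vec(src, dst, 2))  # 2
```

For the 100,000 × 300 table used here, that is 100,000 × 300 × 4 = 120,000,000 bytes, roughly 120 MB before the vocabulary strings are added.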
Train the model
Now run the command to train the tagger and parser for Urdu.
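During training, spaCy evaluates the tagger and parser on the development data and reports tagging accuracy plus unlabeled and labeled attachment scores (UAS: correct head; LAS: correct head and relation). As a rough illustration of what those numbers mean, the sketch below computes them for one aligned sentence. It is a simplified token-level version, not spaCy's Scorer, which may differ in details such as punctuation handling; the keys mirror spaCy's tag_acc, dep_uas and dep_las, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tag_and_attachment_scores(
    gold_tags: Sequence[str], pred_tags: Sequence[str],
    gold_heads: Sequence[int], pred_heads: Sequence[int],
    gold_deps: Sequence[str], pred_deps: Sequence[str],
) -> dict[str, float]:
    """Token-level tagging accuracy, UAS and LAS over one aligned token sequence."""
    n = len(gold_tags)
    lengths = {len(pred_tags), len(gold_heads), len(pred_heads), len(gold_deps), len(pred_deps)}
    if n == 0 or lengths != {n}:
        raise ValueError("all sequences must be non-empty and the same length")
    tag_ok = sum(g == p for g, p in zip(gold_tags, pred_tags))
    # UAS: the predicted head index is correct.
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    # LAS: the head AND the relation label are both correct.
    label_ok = sum(h and g == p for h, g, p in zip(head_ok, gold_deps, pred_deps))
    return {"tag_acc": tag_ok / n, "dep_uas": sum(head_ok) / n, "dep_las": label_ok / n}


if __name__ == "__main__":
    # Made-up predictions for a three-token sentence.
    scores = tag_and_attachment_scores(
        ["PRON", "NOUN", "AUX"], ["PRON", "NOUN", "VERB"],
        [2, 0, 2], [2, 0, 1],
        ["nsubj", "root", "cop"], ["obj", "root", "cop"],
    )
    print(scores)
```

In this toy case the first token has the right head but the wrong label and the third has the wrong head, so tag_acc and dep_uas are 2/3 while dep_las is 1/3.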