Detectron2 for Document Layout Analysis
This repo contains the training configurations, code, and models trained on the PubLayNet dataset using the Detectron2 implementation.
PubLayNet is a very large dataset for document layout analysis (document segmentation). It can be used to train semantic segmentation and object detection models.
NOTE
- Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
- Trained on a total of 191,832 images
- Models are evaluated on dev.zip (~11,000 images)
- A COCO-pretrained backbone is used, but the models themselves are trained from scratch on the PubLayNet dataset
- Trained using an NVIDIA GTX 1080 Ti (11 GB)
- Trained on Windows 10
Steps to test the pretrained models locally (or jump to the next section for Docker deployment)
from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']
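The order of `thing_classes` matters because the model reports each detection as an integer class index rather than a name. The sketch below shows that lookup in plain Python, without Detectron2. It assumes the predictions have already been pulled out of the model output as plain lists (in Detectron2 these come from the `pred_classes` and `scores` fields of the output `instances`). The helper name `label_predictions`, the threshold, and the sample values are all hypothetical and only for illustration.

```python
# Class names in the same order as the thing_classes list registered above.
THING_CLASSES = ["text", "title", "list", "table", "figure"]


def label_predictions(pred_classes, scores, score_threshold=0.5):
    """Map predicted class indices to names, dropping low-confidence detections.

    pred_classes: iterable of integer class indices (0-based).
    scores: iterable of confidence scores, aligned with pred_classes.
    score_threshold: hypothetical cutoff; detections below it are skipped.
    """
    labeled = []
    for class_id, score in zip(pred_classes, scores):
        if score < score_threshold:
            continue
        labeled.append((THING_CLASSES[class_id], score))
    return labeled


if __name__ == "__main__":
    # Made-up values standing in for a model's output (not real results).
    print(label_predictions([0, 3, 4], [0.98, 0.91, 0.32]))
```

If the order of the list does not match the category order used during training, the printed names will be wrong even though the predictions themselves are unchanged.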
- Then run below command for prediction on single image (change