Detectron2 for Document Layout Analysis


This repo contains the training configurations, code, and models trained on the PubLayNet dataset using the Detectron2 implementation.
PubLayNet is a very large dataset (over 360,000 document images) for document layout analysis (document segmentation). It can be used to train semantic segmentation and object detection models.

NOTE

Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
Trained on a total of 191,832 images
Models are evaluated on dev.zip (~11,000 images)
Backbone weights pretrained on the COCO dataset are used for initialization; the models are otherwise trained from scratch on the PubLayNet dataset
Trained on a single NVIDIA GTX 1080 Ti (11 GB)
Trained on Windows 10

Steps to test the pretrained models locally (or jump to the next section for Docker deployment)

from detectron2.data import MetadataCatalog
# Register the five PubLayNet layout classes under the "dla_val" dataset name
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']
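
The order of the thing_classes list matters, because Detectron2 reports each predicted class as an integer index into it. The snippet below is a minimal sketch of that mapping in plain Python (no Detectron2 import needed). The helper name class_ids_to_labels and the sample indices are hypothetical and are not part of this repo.

```python
# Class order copied from the MetadataCatalog line above.
THING_CLASSES = ['text', 'title', 'list', 'table', 'figure']


def class_ids_to_labels(class_ids, thing_classes=THING_CLASSES):
    """Map integer class indices to their layout label names.

    Hypothetical helper for illustration only.
    """
    labels = []
    for class_id in class_ids:
        # Reject indices that do not correspond to a registered class.
        if not 0 <= class_id < len(thing_classes):
            raise ValueError(
                f"class id {class_id} is outside 0..{len(thing_classes) - 1}"
            )
        labels.append(thing_classes[class_id])
    return labels


# Made-up indices, standing in for a list of predicted class ids.
print(class_ids_to_labels([0, 3, 4]))  # ['text', 'table', 'figure']
```

With a real model, the indices would typically come from the predictor output, for example outputs["instances"].pred_classes.tolist(), assuming the standard Detectron2 instance output format.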

Then run the command below for prediction on a single image (change
