Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
ABINet
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
The official code of ABINet (CVPR 2021, Oral).
ABINet uses a vision model and an explicit language model to recognize text in the wild, which are trained in end-to-end way. The language model (BCN) achieves bidirectional language representation in simulating cloze test, additionally utilizing iterative correction strategy.
Runtime Environment
-
We provide a pre-built docker image using the Dockerfile from
docker/Dockerfile
-
Running in Docker
$ [email protected]:FangShancheng/ABINet.git $ docker run --gpus all --rm -ti --ipc=host -v $(pwd)/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
-
(Untested) Or using the dependencies
pip install -r requirements.txt
Datasets
-
Training datasets