pdf_sprinkles: sprinkles text in your PDFs
pdf_sprinkles remotely OCRs a PDF with Google Cloud Document AI and returns the result as a PDF with searchable text. It runs on the command line or as a web server. The server version can be deployed to App Engine easily.
pdf_sprinkles has only been tested with English-language text, but it should work for most European languages that the Document AI API supports today. It currently does not work with RTL languages or CJK scripts.
Installation
pdf_sprinkles is experimental, so it’s not packaged yet. To install:
- Set up Google Cloud Document AI, following the quickstart.
- Clone this repository and cd to it.
- Create a