Calling NDLOCR-Lite as a Python Library Instead of CLI

Tue, 31 Mar 2026 00:00:00 +0900

NDLOCR-Lite is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals using a combination of layout recognition (DEIM) and character recognition (PARSeq).

It can be used via the ndlocr-lite CLI command, but when integrating it into a batch processing pipeline or handling recognition results programmatically, calling it from Python as a library is more practical.

However, NDLOCR-Lite does not expose a public library API. This article documents a workaround: calling the internal entry point that the CLI itself uses, directly from Python.
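
The general pattern of driving a CLI entry point from Python can be sketched as follows. This is only an illustration of the technique, not NDLOCR-Lite's actual code: `fake_cli_main`, its `--input` and `--output` flags, and the `call_cli_entry_point` helper are all hypothetical stand-ins, and the sketch assumes the entry point parses `sys.argv` with `argparse`.

```python
import argparse
import contextlib
import io
import sys


def fake_cli_main() -> None:
    """Hypothetical stand-in for a CLI entry point (NOT part of NDLOCR-Lite)."""
    parser = argparse.ArgumentParser(prog="fake-ocr")
    parser.add_argument("--input", required=True)   # hypothetical flag
    parser.add_argument("--output", required=True)  # hypothetical flag
    args = parser.parse_args()  # reads sys.argv[1:], like a typical CLI
    print(f"would process {args.input} -> {args.output}")


def call_cli_entry_point(entry_point, argv):
    """Run entry_point as if the CLI were invoked with argv.

    Returns (exit_code, captured_stdout). sys.argv is always restored.
    """
    saved_argv = sys.argv
    buffer = io.StringIO()
    exit_code = 0
    try:
        sys.argv = [saved_argv[0], *argv]
        with contextlib.redirect_stdout(buffer):
            try:
                entry_point()
            except SystemExit as exc:
                # CLIs often call sys.exit(); convert that into a return value.
                if exc.code is None:
                    exit_code = 0
                elif isinstance(exc.code, int):
                    exit_code = exc.code
                else:
                    exit_code = 1
    finally:
        sys.argv = saved_argv
    return exit_code, buffer.getvalue()


code, out = call_cli_entry_point(
    fake_cli_main, ["--input", "page.jpg", "--output", "out"]
)
```

Catching `SystemExit` matters because a CLI entry point typically terminates the process on argument errors, which would otherwise take down a batch pipeline along with it.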

Japanese OCR on Digital Archive Systems Tech Blog
