
Calling NDLOCR-Lite as a Python Library Instead of the CLI

NDLOCR-Lite is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals by combining layout recognition (DEIM) with character recognition (PARSeq). It can be run through the ndlocr-lite CLI command, but when you integrate it into a batch processing pipeline or need to handle recognition results programmatically, calling it from Python as a library is more practical. NDLOCR-Lite does not expose a public library API, however, so this article documents a workaround: calling the internal entry point that the CLI itself uses. ...