NDLOCR-Lite is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals using a combination of layout recognition (DEIM) and character recognition (PARSeq).
It can be used via the ndlocr-lite CLI command, but when integrating it into a batch processing pipeline or handling recognition results programmatically, calling it from Python as a library is more practical.
However, NDLOCR-Lite does not expose a public library API, so the approach described here calls the internal entry point used by the CLI directly. This article documents how to do that.
## Setup
Python 3.10 or later is required. On macOS, the system Python may be 3.9, so install 3.12 via Homebrew or similar.
```shell
brew install python@3.12
```
Installing with pip outside a virtual environment triggers an `externally-managed-environment` error (PEP 668), so use venv.
```shell
python3.12 -m venv venv
source venv/bin/activate
pip install git+https://github.com/ndl-lab/ndlocr-lite.git
```
The first install downloads ONNX models (~160MB) along with the package.
## CLI Structure
The CLI entry point (the `ndlocr-lite` command) internally calls `ocr.main()`. `main()` parses arguments with argparse and passes the resulting `Namespace` to `ocr.process(args)`.
```
ndlocr-lite command
  → ocr.main()
      → builds argparse.Namespace
          → calls ocr.process(args)
```
This means you can call `ocr.process()` directly by constructing the `argparse.Namespace` yourself.
## Calling from a Script
```python
import argparse
import os
from pathlib import Path

import ocr


def run(image_path: str, output_dir: str, viz: bool = False):
    # Model and config paths are resolved relative to the installed package
    base_dir = Path(ocr.__file__).parent
    args = argparse.Namespace(
        sourcedir=None,
        sourceimg=image_path,
        output=output_dir,
        viz=viz,
        det_weights=str(base_dir / "model" / "deim-s-1024x1024.onnx"),
        det_classes=str(base_dir / "config" / "ndl.yaml"),
        det_score_threshold=0.2,
        det_conf_threshold=0.25,
        det_iou_threshold=0.2,
        simple_mode=False,
        rec_weights30=str(base_dir / "model" / "parseq-ndl-16x256-30-tiny-192epoch-tegaki3.onnx"),
        rec_weights50=str(base_dir / "model" / "parseq-ndl-16x384-50-tiny-146epoch-tegaki2.onnx"),
        rec_weights=str(base_dir / "model" / "parseq-ndl-16x768-100-tiny-165epoch-tegaki2.onnx"),
        rec_classes=str(base_dir / "config" / "NDLmoji.yaml"),
        device="cpu",
    )
    ocr.process(args)


if __name__ == "__main__":
    os.makedirs("output", exist_ok=True)
    run("input.jpg", "output")
```
Model and config file paths are resolved relative to `ocr.__file__` (the package install location), so the script works regardless of where the venv is created.
## Output Files
For an input image named `input.jpg`, the following files are generated in the output directory.
| File | Contents |
|---|---|
| `input.txt` | Recognized text (one line per detected text region) |
| `input.xml` | Layout-structured XML (bounding boxes, class info, recognized text) |
| `input.json` | JSON format (bounding box coordinates, text, confidence, vertical-writing flag) |
Setting `viz=True` also outputs a visualization image, `viz_input.jpg`, with blue boxes overlaid on the detected regions.
The JSON output has the following structure.
```json
{
  "contents": [
    {
      "boundingBox": [x1, y1, x2, y2],
      "text": "recognized text",
      "confidence": 0.95,
      "isVertical": true
    }
  ],
  "imginfo": {
    "img_width": 1024,
    "img_height": 768,
    "img_path": "input.jpg"
  }
}
```
For batch processing pipelines, reading the JSON output is the most convenient way to pass results to downstream programs.
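As a sketch of that pattern, a downstream reader might look like the following. The field names follow the JSON structure shown above; `load_results` is an illustrative helper, not part of NDLOCR-Lite:

```python
import json
from pathlib import Path


def load_results(json_path: str) -> list[tuple]:
    """Read an NDLOCR-Lite JSON result and return (text, confidence, isVertical) tuples."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [(c["text"], c["confidence"], c["isVertical"]) for c in data["contents"]]


# e.g. join all recognized lines into a single string:
# results = load_results("output/input.json")
# full_text = "\n".join(text for text, _, _ in results)
```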
## Customizing Arguments
The main arguments are listed below.
| Argument | Default | Description |
|---|---|---|
| `sourceimg` | — | Path to a single image |
| `sourcedir` | — | Path to an image directory (processes all images inside) |
| `output` | — | Output directory |
| `viz` | `False` | Output a visualization image |
| `device` | `"cpu"` | Set to `"cuda"` for GPU inference (requires onnxruntime-gpu) |
| `simple_mode` | `False` | Use a single recognition model (normally switches between 3 models based on character count) |
| `det_score_threshold` | `0.2` | Detection score threshold |
When using `sourcedir`, images in the directory (jpg, png, tiff, jp2, bmp) are processed in batch. Set `sourceimg` to `None` in that case.
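One way to switch the single-image `Namespace` from the earlier script into directory mode is to copy it and swap the two source fields. A minimal sketch (`make_batch_args` is an illustrative helper, not part of NDLOCR-Lite):

```python
import argparse


def make_batch_args(base_args: argparse.Namespace, image_dir: str) -> argparse.Namespace:
    """Copy single-image args and switch them to directory mode.

    All model paths and thresholds carry over unchanged; only the
    sourceimg/sourcedir pair is swapped, as described above.
    (Illustrative helper, not part of NDLOCR-Lite.)
    """
    batch = argparse.Namespace(**vars(base_args))
    batch.sourceimg = None       # must be None in directory mode
    batch.sourcedir = image_dir  # jpg/png/tiff/jp2/bmp inside are processed
    return batch
```

The resulting `Namespace` would then be passed to `ocr.process()` exactly as in the single-image example.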
## Notes
- `ocr.process()` is not a public API. This approach depends on internal implementation details, and the argument structure may change in future versions of NDLOCR-Lite.
- The text output (`.txt`) reverses line order when more than 50% of detected lines are vertical (tategaki), to approximate natural reading order.
- Character recognition automatically switches between three models for 30, 50, and 100 characters based on the predicted character count. Setting `simple_mode=True` forces use of a single model.
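The 50% threshold mentioned in the notes above can be reproduced from the `isVertical` flags in the JSON output, for example (illustrative helper, not part of NDLOCR-Lite):

```python
def vertical_ratio(contents: list[dict]) -> float:
    """Fraction of detected lines flagged isVertical in a JSON result's contents list."""
    if not contents:
        return 0.0
    return sum(1 for c in contents if c["isVertical"]) / len(contents)


# The .txt output reverses line order when this ratio exceeds 0.5.
```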