NDLOCR-Lite is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals using a combination of layout recognition (DEIM) and character recognition (PARSeq).

It can be used via the ndlocr-lite CLI command, but when integrating it into a batch processing pipeline or handling recognition results programmatically, calling it from Python as a library is more practical.

However, NDLOCR-Lite does not expose a public library API. The approach described here calls the internal entry point used by the CLI directly. This article documents how to do that.

Setup

Python 3.10 or later is required. On macOS, the system Python may be 3.9, so install 3.12 via Homebrew or similar.

brew install python@3.12

On recent Python installs, pip refuses to install packages outside a virtual environment and raises an externally-managed-environment error (PEP 668), so use venv.

python3.12 -m venv venv
source venv/bin/activate
pip install git+https://github.com/ndl-lab/ndlocr-lite.git

The first install downloads ONNX models (~160MB) along with the package.

CLI Structure

The CLI entry point (ndlocr-lite command) internally calls ocr.main(). main() parses arguments with argparse and passes the resulting Namespace to ocr.process(args).

ndlocr-lite command
  → ocr.main()
    → builds argparse.Namespace
    → calls ocr.process(args)

This means you can call ocr.process() directly by constructing the argparse.Namespace yourself.

Calling from a Script

import argparse
import os
from pathlib import Path

import ocr

def run(image_path: str, output_dir: str, viz: bool = False):
    # Resolve bundled model/config paths relative to the installed package,
    # so the script works regardless of where the venv lives.
    base_dir = Path(ocr.__file__).parent

    # Mirror the Namespace that ocr.main() builds from the CLI arguments.
    args = argparse.Namespace(
        sourcedir=None,
        sourceimg=image_path,
        output=output_dir,
        viz=viz,
        det_weights=str(base_dir / "model" / "deim-s-1024x1024.onnx"),
        det_classes=str(base_dir / "config" / "ndl.yaml"),
        det_score_threshold=0.2,
        det_conf_threshold=0.25,
        det_iou_threshold=0.2,
        simple_mode=False,
        rec_weights30=str(base_dir / "model" / "parseq-ndl-16x256-30-tiny-192epoch-tegaki3.onnx"),
        rec_weights50=str(base_dir / "model" / "parseq-ndl-16x384-50-tiny-146epoch-tegaki2.onnx"),
        rec_weights=str(base_dir / "model" / "parseq-ndl-16x768-100-tiny-165epoch-tegaki2.onnx"),
        rec_classes=str(base_dir / "config" / "NDLmoji.yaml"),
        device="cpu",
    )

    ocr.process(args)

if __name__ == "__main__":
    os.makedirs("output", exist_ok=True)
    run("input.jpg", "output")

Model and config file paths are resolved relative to ocr.__file__ (the package install location), so the script works regardless of where the venv is created.

Output Files

For an input image named input.jpg, the following files are generated in the output directory.

File         Contents
input.txt    Recognized text (one line per detected text region)
input.xml    Layout-structured XML (bounding boxes, class info, recognized text)
input.json   JSON (bounding box coordinates, text, confidence, vertical-writing flag)

Setting viz=True also outputs a visualization image viz_input.jpg with blue boxes overlaid on detected regions.

The JSON output has the following structure.

{
  "contents": [
    {
      "boundingBox": [x1, y1, x2, y2],
      "text": "recognized text",
      "confidence": 0.95,
      "isVertical": true
    }
  ],
  "imginfo": {
    "img_width": 1024,
    "img_height": 768,
    "img_path": "input.jpg"
  }
}

For batch processing pipelines, reading the JSON output is the most convenient way to pass results to downstream programs.
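As a sketch of such a downstream step, the snippet below parses the JSON structure shown above and filters regions by confidence. The function name and the min_confidence parameter are illustrative, not part of NDLOCR-Lite; the field names follow the example output and should be checked against your installed version.

```python
import json
from pathlib import Path


def load_regions(json_path: str, min_confidence: float = 0.0) -> list[dict]:
    """Read an NDLOCR-Lite JSON result and return its text regions.

    Assumes the structure shown above: a top-level "contents" list of
    dicts with "boundingBox", "text", "confidence", and "isVertical".
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [
        region for region in data["contents"]
        if region.get("confidence", 0.0) >= min_confidence
    ]
```

For example, load_regions("output/input.json", min_confidence=0.5) keeps only reasonably confident regions before passing them downstream.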

Customizing Arguments

The main arguments are listed below.

Argument             Default  Description
sourceimg                     Path to a single image
sourcedir                     Path to an image directory (processes all images inside)
output                        Output directory
viz                  False    Output a visualization image
device               "cpu"    Set to "cuda" for GPU inference (requires onnxruntime-gpu)
simple_mode          False    Use a single recognition model (normally switches between 3 models based on character count)
det_score_threshold  0.2      Detection score threshold

When using sourcedir, images in the directory (jpg, png, tiff, jp2, bmp) are processed in batch. Set sourceimg to None in that case.
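If you prefer to drive the loop yourself instead of using sourcedir (for example, to interleave OCR with other pipeline steps), a minimal sketch of gathering files with the same extensions might look like this. The function name is illustrative; the extension list mirrors the one stated above.

```python
from pathlib import Path

# Extensions NDLOCR-Lite scans for in sourcedir mode, per the table above.
IMAGE_EXTS = {".jpg", ".png", ".tiff", ".jp2", ".bmp"}


def collect_images(srcdir: str) -> list[Path]:
    """Return the image files in srcdir, sorted for deterministic order."""
    return sorted(
        p for p in Path(srcdir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

Each returned path can then be handed to the run() function from the earlier script, e.g. run(str(path), "output").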

Notes

  • ocr.process() is not a public API. This approach depends on internal implementation details, and the argument structure may change in future versions of NDLOCR-Lite.
  • The text output (.txt) reverses line order when more than 50% of detected lines are vertical (tate-gaki), to approximate natural reading order.
  • Character recognition automatically switches between three models for 30, 50, and 100 characters based on predicted character count. Setting simple_mode=True forces use of a single model.
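The line-reordering heuristic from the second note can be sketched as a pure function. This is an illustration of the described behavior (reverse when more than 50% of lines are vertical), not the library's actual implementation:

```python
def order_for_reading(lines: list[str], vertical_flags: list[bool]) -> list[str]:
    """Reverse line order when a majority of lines are vertical,
    approximating right-to-left tate-gaki reading order."""
    if not lines:
        return []
    if sum(vertical_flags) / len(vertical_flags) > 0.5:
        return list(reversed(lines))
    return lines
```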