NDLOCR-Lite is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals using a combination of layout recognition (DEIM) and character recognition (PARSeq).

It can be used via the ndlocr-lite CLI command, but when integrating it into a batch processing pipeline or handling recognition results programmatically, calling it from Python as a library is more practical.

However, NDLOCR-Lite does not expose a public library API. The approach described here calls the internal entry point used by the CLI directly. This article documents how to do that.

Setup

Python 3.10 or later is required. On macOS, the system Python may be 3.9, so install 3.12 via Homebrew or similar.

brew install python@3.12

On recent Python installs, pip refuses to install packages outside a virtual environment and raises an externally-managed-environment error (PEP 668), so use venv.

python3.12 -m venv venv
source venv/bin/activate
pip install git+https://github.com/ndl-lab/ndlocr-lite.git

The first install downloads ONNX models (~160MB) along with the package.

CLI Structure

The CLI entry point (ndlocr-lite command) internally calls ocr.main(). main() parses arguments with argparse and passes the resulting Namespace to ocr.process(args).

ndlocr-lite command
  → ocr.main()
    → builds argparse.Namespace
    → calls ocr.process(args)

This means you can call ocr.process() directly by constructing the argparse.Namespace yourself.

Calling from a Script

import argparse
import os
from pathlib import Path

import ocr

def run(image_path: str, output_dir: str, viz: bool = False):
    # Resolve bundled model/config paths relative to the installed package,
    # so the script works regardless of where the venv lives.
    base_dir = Path(ocr.__file__).parent

    # Mirror the Namespace that ocr.main() builds from the CLI arguments.
    args = argparse.Namespace(
        sourcedir=None,
        sourceimg=image_path,
        output=output_dir,
        viz=viz,
        det_weights=str(base_dir / "model" / "deim-s-1024x1024.onnx"),
        det_classes=str(base_dir / "config" / "ndl.yaml"),
        det_score_threshold=0.2,
        det_conf_threshold=0.25,
        det_iou_threshold=0.2,
        simple_mode=False,
        rec_weights30=str(base_dir / "model" / "parseq-ndl-16x256-30-tiny-192epoch-tegaki3.onnx"),
        rec_weights50=str(base_dir / "model" / "parseq-ndl-16x384-50-tiny-146epoch-tegaki2.onnx"),
        rec_weights=str(base_dir / "model" / "parseq-ndl-16x768-100-tiny-165epoch-tegaki2.onnx"),
        rec_classes=str(base_dir / "config" / "NDLmoji.yaml"),
        device="cpu",
    )

    ocr.process(args)

if __name__ == "__main__":
    os.makedirs("output", exist_ok=True)
    run("input.jpg", "output")

Model and config file paths are resolved relative to ocr.__file__ (the package install location), so the script works regardless of where the venv is created.

Output Files

For an input image named input.jpg, the following files are generated in the output directory.

File         Contents
input.txt    Recognized text (one line per detected text region)
input.xml    Layout-structured XML (bounding boxes, class info, recognized text)
input.json   JSON (bounding box coordinates, text, confidence, vertical-writing flag)

Setting viz=True also outputs a visualization image viz_input.jpg with blue boxes overlaid on detected regions.

The JSON output has the following structure.

{
  "contents": [
    {
      "boundingBox": [x1, y1, x2, y2],
      "text": "recognized text",
      "confidence": 0.95,
      "isVertical": true
    }
  ],
  "imginfo": {
    "img_width": 1024,
    "img_height": 768,
    "img_path": "input.jpg"
  }
}

For batch processing pipelines, reading the JSON output is the most convenient way to pass results to downstream programs.
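As a sketch of such a downstream step, the snippet below parses the JSON structure shown above and filters regions by confidence. The function name and the min_confidence parameter are illustrative, not part of NDLOCR-Lite; the field names follow the example output and should be checked against your installed version.

```python
import json
from pathlib import Path


def load_regions(json_path: str, min_confidence: float = 0.0) -> list[dict]:
    """Read an NDLOCR-Lite JSON result and return its text regions.

    Assumes the structure shown above: a top-level "contents" list of
    dicts with "boundingBox", "text", "confidence", and "isVertical".
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [
        region for region in data["contents"]
        if region.get("confidence", 0.0) >= min_confidence
    ]
```

For example, load_regions("output/input.json", min_confidence=0.5) keeps only reasonably confident regions before passing them downstream.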

Customizing Arguments

The main arguments are listed below.

Argument             Default  Description
sourceimg                     Path to a single image
sourcedir                     Path to an image directory (processes all images inside)
output                        Output directory
viz                  False    Output a visualization image
device               "cpu"    Set to "cuda" for GPU inference (requires onnxruntime-gpu)
simple_mode          False    Use a single recognition model (normally switches between 3 models based on character count)
det_score_threshold  0.2      Detection score threshold

When using sourcedir, images in the directory (jpg, png, tiff, jp2, bmp) are processed in batch. Set sourceimg to None in that case.
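If you prefer to drive the loop yourself instead of using sourcedir (for example, to interleave OCR with other pipeline steps), a minimal sketch of gathering files with the same extensions might look like this. The function name is illustrative; the extension list mirrors the one stated above.

```python
from pathlib import Path

# Extensions NDLOCR-Lite scans for in sourcedir mode, per the table above.
IMAGE_EXTS = {".jpg", ".png", ".tiff", ".jp2", ".bmp"}


def collect_images(srcdir: str) -> list[Path]:
    """Return the image files in srcdir, sorted for deterministic order."""
    return sorted(
        p for p in Path(srcdir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

Each returned path can then be handed to the run() function from the earlier script, e.g. run(str(path), "output").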

Notes

  • ocr.process() is not a public API. This approach depends on internal implementation details, and the argument structure may change in future versions of NDLOCR-Lite.
  • The text output (.txt) reverses line order when more than 50% of detected lines are vertical (tate-gaki), to approximate natural reading order.
  • Character recognition automatically switches between three models for 30, 50, and 100 characters based on predicted character count. Setting simple_mode=True forces use of a single model.
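The line-reordering heuristic from the second note can be sketched as a pure function. This is an illustration of the described behavior (reverse when more than 50% of lines are vertical), not the library's actual implementation:

```python
def order_for_reading(lines: list[str], vertical_flags: list[bool]) -> list[str]:
    """Reverse line order when a majority of lines are vertical,
    approximating right-to-left tate-gaki reading order."""
    if not lines:
        return []
    if sum(vertical_flags) / len(vertical_flags) > 0.5:
        return list(reversed(lines))
    return lines
```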