<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Japanese OCR on Digital Archive Systems Tech Blog</title><link>https://tech.ldas.jp/en/tags/japanese-ocr/</link><description>Recent content in Japanese OCR on Digital Archive Systems Tech Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 31 Mar 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://tech.ldas.jp/en/tags/japanese-ocr/index.xml" rel="self" type="application/rss+xml"/><item><title>Calling NDLOCR-Lite as a Python Library Instead of CLI</title><link>https://tech.ldas.jp/en/posts/ndlocr-lite-python-integration/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0900</pubDate><guid>https://tech.ldas.jp/en/posts/ndlocr-lite-python-integration/</guid><description>&lt;p>&lt;a href="https://github.com/ndl-lab/ndlocr-lite">NDLOCR-Lite&lt;/a> is a Japanese OCR engine published by the National Diet Library of Japan. It extracts text from digitized images of books and periodicals using a combination of layout recognition (DEIM) and character recognition (PARSeq).&lt;/p>
&lt;p>It is available through the &lt;code>ndlocr-lite&lt;/code> CLI command, but when you integrate it into a batch-processing pipeline or handle recognition results programmatically, calling it from Python as a library is more practical.&lt;/p>
&lt;p>However, NDLOCR-Lite does not expose a public library API. This article documents a workaround: calling the internal entry point that the CLI itself uses directly from Python.&lt;/p></description></item></channel></rss>