Overview
This memo describes running NDL Classical Japanese OCR in an Amazon EC2 CPU environment. The advantage is that it runs without preparing an expensive GPU environment, but note that inference takes roughly 30 seconds to 1 minute per image.
The following article was referenced when building this environment.
https://qiita.com/relu/items/e882e23a9bd07243211b
Instance
Select Ubuntu from Quick Start.

For the instance type, I recommend t2.medium or larger; errors occurred with smaller instances.
Server Configuration
Log in via SSH and execute the following.
sudo apt-get update && sudo apt-get upgrade -y
sudo apt -y install build-essential
sudo apt -y install libgl1-mesa-dev libglib2.0-0
sudo apt -y install unzip
sudo apt install -y python3-pip
sudo apt install -y python3.10-venv
python3 -m venv app
source app/bin/activate
pip install --upgrade pip
git clone https://github.com/ndl-lab/ndlkotenocr_cli.git
cd ndlkotenocr_cli
vi requirements.txt
Open requirements.txt and remove the version specification for scikit-image. Also add torch and torchvision.
click
lmdb==1.2.1
natsort==7.1.1
nltk==3.6.6
numpy==1.22.4
albumentations==1.2.1
opencv-python==4.6.0.66
protobuf==3.19.6
pyyaml
scikit-image # scikit-image==0.16.2
scipy==1.7.3
lightgbm==3.3.2
transformers==4.19.1
pandas==1.3.5
mmcls==0.23.1
mmdet==2.25.0
datasets==2.2.1
jiwer==2.3.0
wheel
torch
torchvision
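If you prefer to script the edit above rather than use vi, a minimal stdlib-only sketch could look like the following (the `patch_requirements` helper is my own name, not part of the repository):

```python
def patch_requirements(text: str) -> str:
    """Relax the scikit-image pin and append torch and torchvision."""
    lines = []
    for line in text.splitlines():
        if line.startswith('scikit-image=='):
            # Drop the version pin, keeping the original as a trailing comment.
            line = f'scikit-image # {line}'
        lines.append(line)
    lines += ['torch', 'torchvision']
    return '\n'.join(lines) + '\n'

sample = 'click\nscikit-image==0.16.2\nscipy==1.7.3'
print(patch_requirements(sample))
```

To apply it in place, read `requirements.txt`, pass its contents through `patch_requirements`, and write the result back.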
Next, install the dependencies.
pip install -r requirements.txt
Install mmcv-full and download pre-trained models. Prepare and execute a file like the following.
import os
import torch

print(torch.__version__)
# torch.__version__ looks like '1.13.1+cu117'; CPU-only builds may have no '+' suffix.
if '+' in torch.__version__:
    torch_ver, cuda_ver = torch.__version__.split('+')
else:
    torch_ver, cuda_ver = torch.__version__, 'cpu'
PROJECT_DIR = os.getcwd()
# Install the prebuilt mmcv-full wheel matching the installed torch build.
os.system(f'pip install mmcv-full==1.5.3 -f https://download.openmmlab.com/mmcv/dist/{cuda_ver}/torch{torch_ver}/index.html --no-cache-dir')
# Download and unpack the text recognition and layout models.
os.system(f'wget https://lab.ndl.go.jp/dataset/ndlkotensekiocr/trocr/models.zip -P {PROJECT_DIR}/src/text_kotenseki_recognition/')
os.system(f'cd {PROJECT_DIR}/src/text_kotenseki_recognition/ && unzip -o models.zip')
os.system(f'wget https://lab.ndl.go.jp/dataset/ndlkotensekiocr/layoutmodel/models.zip -P {PROJECT_DIR}/src/ndl_kotenseki_layout/')
os.system(f'cd {PROJECT_DIR}/src/ndl_kotenseki_layout/ && unzip -o models.zip')
os.system("wget https://dl.ndl.go.jp/api/iiif/2585098/R0000003/full/full/0/default.jpg -O example.jpg")
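After the script finishes, it may be worth verifying that the unzipped model files are where config.yml expects them. A small sketch (the `missing_paths` helper is hypothetical; the paths are the ones referenced by config.yml):

```python
import os

def missing_paths(base_dir, relative_paths):
    """Return the subset of relative_paths that do not exist under base_dir."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(base_dir, p))]

# Paths referenced by config.yml; adjust if your layout differs.
EXPECTED = [
    'src/ndl_kotenseki_layout/models/ndl_kotenseki_layout_v1.pth',
    'src/text_kotenseki_recognition/models/kotenseki-trocr-honkoku-v3',
    'src/kotenseki_reading_order/models/kotenseki_reading_order_model.joblib',
]

missing = missing_paths(os.getcwd(), EXPECTED)
print('Missing:', missing) if missing else print('All model files found.')
```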
To run on CPU, change both device entries in config.yml to 'cpu'.
ndl_kotenseki_layout:
  config_path: 'src/ndl_kotenseki_layout/models/ndl_kotenseki_layout_config.py'
  checkpoint_path: 'src/ndl_kotenseki_layout/models/ndl_kotenseki_layout_v1.pth'
  device: 'cpu' # Edited here.
  'score_thr': 0.3
text_kotenseki_recognition:
  saved_preprocessor_model: 'src/text_kotenseki_recognition/models/trocr-base-preprocessor'
  saved_tokenize_model: 'src/text_kotenseki_recognition/models/decoder-roberta-v3'
  saved_ocr_model: 'src/text_kotenseki_recognition/models/kotenseki-trocr-honkoku-v3'
  accept_empty: True
  batch_size: 100
  device: 'cpu' # Edited here.
kotenseki_reading_order:
  checkpoint_path: 'src/kotenseki_reading_order/models/kotenseki_reading_order_model.joblib'
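The device edit can also be scripted. A stdlib-only sketch that rewrites every `device:` line while preserving indentation and trailing comments (the `set_cpu_device` helper is my own; a YAML-aware edit with pyyaml would work too):

```python
import re

def set_cpu_device(config_text: str) -> str:
    """Replace the value of every 'device:' line with 'cpu'."""
    return re.sub(r"^(\s*device:\s*)\S+", r"\g<1>'cpu'", config_text, flags=re.M)

sample = "ndl_kotenseki_layout:\n  device: 'cuda:0'\n  'score_thr': 0.3\n"
print(set_cpu_device(sample))
```

To apply it, read `config.yml`, run its contents through `set_cpu_device`, and write the result back.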
Execution
Prepare the folder and images for processing.
mkdir -p input/img
mv example.jpg input/img
mkdir -p tmp/output
Execute.
python main.py infer input tmp/output
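Since each page takes 30 seconds to a minute on CPU, it can help to time the run when estimating throughput for a larger batch. A generic sketch (the `timed_run` helper is hypothetical; the main.py command line is the one shown above):

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return (return_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return result.returncode, time.perf_counter() - start

# For the OCR run above:
#   timed_run([sys.executable, 'main.py', 'infer', 'input', 'tmp/output'])
code, elapsed = timed_run([sys.executable, '-c', 'pass'])
print(f'exit={code} elapsed={elapsed:.2f}s')
```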
Summary
I hope this serves as a helpful reference for environments where inference time is not a critical concern.