Overview

This is a memo on running the NDL Classical Japanese OCR on a CPU-only Amazon EC2 instance. The advantage is that no expensive GPU environment is needed; the trade-off is that processing takes roughly 30 seconds to 1 minute per image.

The following article was referenced when building this environment.

https://qiita.com/relu/items/e882e23a9bd07243211b

Instance

Select Ubuntu from Quick Start.

For the instance type, I recommend t2.medium or larger; errors occurred on smaller instances (most likely due to insufficient memory).
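
Since the failures on small instances appeared to be memory-related (t2.medium provides 4 GiB of RAM), a quick sanity check can help before installing anything. A minimal sketch, Linux-only and with a helper name of my own choosing:

```python
# Minimal sketch: report total system memory by parsing /proc/meminfo
# (Linux-only). t2.medium provides 4 GiB; smaller instances gave errors.
def mem_total_gib(meminfo_text):
    for line in meminfo_text.splitlines():
        if line.startswith('MemTotal:'):
            kib = int(line.split()[1])  # value is reported in KiB
            return kib / (1024 ** 2)
    return 0.0

with open('/proc/meminfo') as f:
    print(f'MemTotal: {mem_total_gib(f.read()):.1f} GiB')
```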

Server Configuration

Log in via SSH and execute the following.

sudo apt-get update && sudo apt-get upgrade -y

sudo apt -y install build-essential
sudo apt -y install libgl1-mesa-dev libglib2.0-0
sudo apt -y install unzip


sudo apt install -y python3-pip
sudo apt install -y python3.10-venv

python3 -m venv app
source app/bin/activate
pip install --upgrade pip

git clone https://github.com/ndl-lab/ndlkotenocr_cli.git
cd ndlkotenocr_cli
vi requirements.txt

Open requirements.txt and remove the version specification for scikit-image. Also add torch and torchvision.

click
lmdb==1.2.1
natsort==7.1.1
nltk==3.6.6
numpy==1.22.4
albumentations==1.2.1
opencv-python==4.6.0.66
protobuf==3.19.6
pyyaml
scikit-image # scikit-image==0.16.2
scipy==1.7.3
lightgbm==3.3.2
transformers==4.19.1
pandas==1.3.5
mmcls==0.23.1
mmdet==2.25.0
datasets==2.2.1
jiwer==2.3.0
wheel
torch
torchvision
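
The two edits above can also be applied programmatically instead of by hand in vi. A sketch (the helper name `patch_requirements` is mine) that mirrors the listing above:

```python
# Sketch: unpin scikit-image and append torch/torchvision, mirroring the
# manual edit to requirements.txt described above.
def patch_requirements(text):
    out = []
    for line in text.splitlines():
        if line.startswith('scikit-image=='):
            # keep the old pin as a trailing comment, as in the listing above
            line = f'scikit-image  # {line}'
        out.append(line)
    out += ['torch', 'torchvision']
    return '\n'.join(out) + '\n'

original = 'click\nscikit-image==0.16.2\nscipy==1.7.3\n'
print(patch_requirements(original))
```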

Continue executing the following.

pip install -r requirements.txt

Next, install mmcv-full and download the pre-trained models. Save the following as a Python script and execute it.

import os
import torch

print(torch.__version__)
# CPU wheels report a version like "1.12.1+cpu"; plain PyPI wheels have no
# "+" suffix, so fall back to "cpu" when building the mmcv-full index URL.
torch_ver, _, cuda_ver = torch.__version__.partition('+')
cuda_ver = cuda_ver or 'cpu'
PROJECT_DIR = os.getcwd()

# Install a prebuilt mmcv-full wheel matching this torch build.
os.system(f'pip install mmcv-full==1.5.3 -f https://download.openmmlab.com/mmcv/dist/{cuda_ver}/torch{torch_ver}/index.html --no-cache-dir')

# Download and unpack the pre-trained recognition and layout models.
os.system(f'wget https://lab.ndl.go.jp/dataset/ndlkotensekiocr/trocr/models.zip -P {PROJECT_DIR}/src/text_kotenseki_recognition/')
os.system(f'cd {PROJECT_DIR}/src/text_kotenseki_recognition/ && unzip -o models.zip')
os.system(f'wget https://lab.ndl.go.jp/dataset/ndlkotensekiocr/layoutmodel/models.zip -P {PROJECT_DIR}/src/ndl_kotenseki_layout/')
os.system(f'cd {PROJECT_DIR}/src/ndl_kotenseki_layout/ && unzip -o models.zip')

# Fetch a sample page image from the NDL Digital Collections for testing.
os.system("wget https://dl.ndl.go.jp/api/iiif/2585098/R0000003/full/full/0/default.jpg -O example.jpg")

To run on CPU, change the two device entries in config.yml to 'cpu'.

ndl_kotenseki_layout:
  config_path: 'src/ndl_kotenseki_layout/models/ndl_kotenseki_layout_config.py'
  checkpoint_path: 'src/ndl_kotenseki_layout/models/ndl_kotenseki_layout_v1.pth'
  device: 'cpu' # Edited here.
  'score_thr': 0.3
text_kotenseki_recognition:
  saved_preprocessor_model: 'src/text_kotenseki_recognition/models/trocr-base-preprocessor'
  saved_tokenize_model: 'src/text_kotenseki_recognition/models/decoder-roberta-v3'
  saved_ocr_model: 'src/text_kotenseki_recognition/models/kotenseki-trocr-honkoku-v3'
  accept_empty: True
  batch_size: 100
  device: 'cpu' # Edited here.
kotenseki_reading_order:
  checkpoint_path: 'src/kotenseki_reading_order/models/kotenseki_reading_order_model.joblib'
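
Missing one of the two device entries is an easy mistake before a long CPU run. A small sketch that scans a config for every device line so a leftover GPU setting is easy to spot (plain string scanning, no YAML dependency; the sample string stands in for config.yml):

```python
# Sketch: list all "device:" entries in a config so a leftover GPU
# setting is easy to spot. The sample mirrors the edited config.yml.
def list_devices(text):
    return [line.strip() for line in text.splitlines()
            if line.strip().startswith('device:')]

sample = """\
ndl_kotenseki_layout:
  device: 'cpu'
text_kotenseki_recognition:
  device: 'cpu'
"""
print(list_devices(sample))
```

In practice, run it against the real file with `list_devices(open('config.yml').read())`.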

Execution

Prepare the input/output folders and the image to be processed.

mkdir -p input/img
mv example.jpg input/img
mkdir -p tmp/output

Execute.

python main.py infer input tmp/output
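
When the run finishes, the results are written under tmp/output. A sketch for listing whatever was produced (I am not asserting the exact file layout here):

```python
# Sketch: walk the output directory and list the generated files.
import os

def list_outputs(root):
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

for path in list_outputs('tmp/output'):
    print(path)
```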

Summary

I hope this serves as a helpful reference for environments where inference time is not a critical concern.