Update History

2024-05-22

  • Added the section “Adding the Docker Command User to the docker Group”.

Overview

mdx is a data platform for industry-academia-government collaboration co-created by universities and research institutions.

https://mdx.jp/

In this article, we will run NDL Classical Japanese OCR using an mdx virtual machine.

https://github.com/ndl-lab/ndlkotenocr_cli

Project Application

For the project type, we selected “Trial”.

With the “Trial” type, one GPU pack was allocated.

Creating a Virtual Machine

Deployment

We selected “01_Ubuntu-2204-server-gpu (Recommended)”.

On the pre-deployment screen, we configured the settings as follows: pack type set to “GPU Pack” and pack count set to 1.

For the public key, we created a key pair on the local PC as follows:

mkdir -p ~/.ssh/mdx
ssh-keygen -t rsa -f ~/.ssh/mdx/id_rsa

Then we pasted the contents of the generated ~/.ssh/mdx/id_rsa.pub into the public key field.

After that, we waited for the virtual machine deployment to complete.

Network Configuration for SSH Connection

We were able to proceed by following this video:

https://youtu.be/p7OqcnXBQt8?si=E5JtC-xnrc5ZQYo_

First, note the IPv4 address of Service Network 1 of the launched virtual machine.

Next, we added “DNAT” from the network tags. The “Source Global IPv4 Address” was auto-filled, and we entered the Service Network IPv4 address noted earlier in the “Destination Private IP Address” field.

Next, we added an “ACL”, configuring it as shown in the video.

To allow access only from specific IP addresses, we configured it as follows:

Alternatively, the following configuration appears to allow SSH connections from any address, although unrestricted access carries security risks:

Testing the Connection

Connect to the Source Global IPv4 Address that was added via DNAT, replacing the placeholder below with that address. At the first login, you will be prompted to change the password.

ssh mdxuser@<Source Global IPv4 Address> -i ~/.ssh/mdx/id_rsa

Connecting with VS Code

Although not required, we used the VS Code extension “Remote Explorer” for the subsequent operations.
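VS Code's SSH remote features read hosts from ~/.ssh/config, so an entry along the following lines may save retyping the address and key path. This is only a sketch: the function name ssh_host_entry, the alias mdx-gpu, and the address 203.0.113.10 (a documentation-only address) are placeholders introduced here, not values from this setup. Substitute the Source Global IPv4 Address from the DNAT rule.

```shell
# Print an ssh_config Host entry for the mdx virtual machine.
# The alias and address are supplied by the caller; nothing here is mdx-specific
# except the user name "mdxuser" used in this article.
ssh_host_entry() {
  alias_name=$1
  address=$2
  key_path=$3
  printf 'Host %s\n' "$alias_name"
  printf '    HostName %s\n' "$address"
  printf '    User mdxuser\n'
  printf '    IdentityFile %s\n' "$key_path"
}

# Example with a hypothetical alias and a placeholder address:
# ssh_host_entry mdx-gpu 203.0.113.10 '~/.ssh/mdx/id_rsa' >> ~/.ssh/config
```

After adding the entry, the same host can be reached from a terminal with ssh followed by the chosen alias.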

Working Inside the Virtual Machine

Verifying the GPU

sudo su
root@ubuntu-2204:/home/mdxuser# nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:03:00.0 Off |                    0 |
| N/A   25C    P0              45W / 400W |      4MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
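When the full table is more than needed, nvidia-smi can also print selected fields as CSV, which is easier to read in scripts and logs. The wrapper below is a small sketch; the function name gpu_summary is introduced here for illustration, and the choice of fields is arbitrary.

```shell
# Print one CSV line per GPU (name, total memory, driver version).
# Returns 1 with a message on stderr if nvidia-smi is not on PATH.
gpu_summary() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found on PATH" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}

# Example:
# gpu_summary
```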

Installing Docker

We installed Docker by following the official instructions:

https://docs.docker.com/engine/install/ubuntu/

for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt-get remove $pkg; done
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo docker run hello-world

If “Hello from Docker!” is displayed, the installation was successful.
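Before moving on, it can help to confirm that the host-side tools used in the rest of this article are actually on PATH. The helper below is a minimal sketch; the name missing_commands is made up for this example.

```shell
# Print each named command that cannot be found on PATH.
# Succeeds (status 0) only when nothing is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example: tools used on the host in the following sections.
# missing_commands git docker || echo "install the tools listed above first"
```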

NVIDIA Docker Runtime

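There may be better methods, but we installed the NVIDIA Container Toolkit, which lets Docker containers use the GPU, by running the following. Note that apt-key is deprecated on Ubuntu 22.04 and prints a warning.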

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
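One way to confirm that containers can now see the GPU is to run nvidia-smi inside a throwaway container. The sketch below assumes the stock ubuntu image is acceptable as a test image, and uses sudo because the user has not yet been added to the docker group at this point; the function name check_gpu_in_container is introduced here for illustration.

```shell
# Run nvidia-smi inside a temporary container with all GPUs attached.
# Assumption: the public "ubuntu" image is used purely as a test image.
check_gpu_in_container() {
  sudo docker run --rm --gpus all ubuntu nvidia-smi
}

# Example:
# check_gpu_in_container
```

If the setup worked, the container should print the same kind of table that nvidia-smi shows on the host.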

Adding the Docker Command User to the docker Group

Add the current user to the docker group so that docker commands can be run without sudo:

sudo usermod -aG docker $USER

Reboot the system so that the group change takes effect:

sudo reboot
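After logging in again, the group membership can be checked before running docker without sudo. This is a small sketch; in_group is a helper name invented for this example.

```shell
# Succeed if the current user belongs to the named group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example:
# in_group docker && echo "docker can be run without sudo"
```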

Installing NDL Classical Japanese OCR

From here, we proceed with setting up NDL Classical Japanese OCR.

The following step takes some time:

git clone https://github.com/ndl-lab/ndlkotenocr_cli
cd ndlkotenocr_cli
sh ./docker/dockerbuild.sh

Before launching the container, we edited ./docker/run_docker.sh so that the docker run command mounts a directory from the host machine (the -v option below):

docker run --gpus all -d -v /home/mdxuser/tmpdir:/root/tmpdir --rm --name kotenocr_cli_runner -i kotenocr-cli-py37:latest

Then launch the container:

sh ./docker/run_docker.sh
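Because the container is started in the background with -d, it may be worth confirming that it is actually running before entering it. The following is a minimal sketch; container_running is a hypothetical helper name.

```shell
# Succeed if a running container has exactly the given name.
container_running() {
  docker ps --filter "name=$1" --format '{{.Names}}' | grep -qxF -- "$1"
}

# Example:
# container_running kotenocr_cli_runner || echo "container is not running"
```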

Enter the container:

docker exec -it kotenocr_cli_runner bash

Running Inference

Downloading Images

Inside the container, create a directory and download one page image of “The Tale of Genji” (held by the National Diet Library):

mkdir -p /root/tmpdir/input/2585098/img
wget https://dl.ndl.go.jp/api/iiif/2585098/R0000003/full/full/0/default.jpg -O /root/tmpdir/input/2585098/img/0001.jpg
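To fetch several pages, the same URL can be generated in a loop. The sketch below assumes that the other pages of an item follow the same pattern as the URL above (R followed by a seven-digit page number), which is an inference from that single example and not something confirmed here. The function names iiif_url and download_pages, and naming files after the page number, are choices made for this illustration.

```shell
# Build the image URL for one page, following the pattern of the URL above.
# Assumption: page identifiers are "R" plus a zero-padded seven-digit number.
iiif_url() {
  printf 'https://dl.ndl.go.jp/api/iiif/%s/R%07d/full/full/0/default.jpg\n' "$1" "$2"
}

# Download pages first..last of an item into out_dir as 0003.jpg, 0004.jpg, ...
download_pages() {
  item_id=$1
  first=$2
  last=$3
  out_dir=$4
  mkdir -p "$out_dir"
  page=$first
  while [ "$page" -le "$last" ]; do
    wget -q "$(iiif_url "$item_id" "$page")" -O "$(printf '%s/%04d.jpg' "$out_dir" "$page")"
    page=$((page + 1))
  done
}

# Example:
# download_pages 2585098 3 5 /root/tmpdir/input/2585098/img
```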

Running OCR

Run OCR on the downloaded image.

First, create the output folder:

mkdir -p /root/tmpdir/output

Run the OCR:

python main.py infer /root/tmpdir/input/2585098 /root/tmpdir/output/2585098 -s s
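When several items have been downloaded, the same command can be repeated for each item directory. This is a sketch under two assumptions: every item is laid out like the one above (an img folder inside a directory named after the item), and the current directory is the one that contains main.py. The function name ocr_all is introduced here for illustration.

```shell
# Run inference once per item directory found under input_root,
# writing each item's results to a directory of the same name under output_root.
ocr_all() {
  input_root=$1
  output_root=$2
  for item_dir in "$input_root"/*/; do
    [ -d "$item_dir" ] || continue
    item=$(basename "$item_dir")
    python main.py infer "$input_root/$item" "$output_root/$item" -s s
  done
}

# Example:
# ocr_all /root/tmpdir/input /root/tmpdir/output
```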

Because /root/tmpdir is mounted from the host, the recognition results also appear in the /home/mdxuser/tmpdir/output folder on the host machine.
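A quick way to see what the run produced is to count the output files by extension. The helper below is a generic sketch that makes no assumption about which file types the OCR writes; count_by_ext is a name made up for this example.

```shell
# Print "<extension> <count>" for the files under a directory, sorted by extension.
# Files without an extension are ignored.
count_by_ext() {
  find "$1" -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | awk '{print $2, $1}'
}

# Example, run on the host:
# count_by_ext /home/mdxuser/tmpdir/output
```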

Other Notes

Stopping the Container

docker stop kotenocr_cli_runner

Shutting Down the Virtual Machine

# -h halts the machine; use -r instead to reboot
sudo shutdown -h now

Summary

Thanks to mdx and NDL Lab, it has become easy to set up an environment for research that uses machine learning. We are grateful to everyone involved.