Training YOLOv11 Classification (Kuzushiji Recognition) Using mdx.jp

Overview

We had the opportunity to train a YOLOv11 classification model (for kuzushiji/classical Japanese character recognition) using mdx.jp, so this article serves as a reference.

Dataset

We target the following “Kuzushiji Dataset”:

http://codh.rois.ac.jp/char-shape/book/

Creating the Dataset

We format the dataset to match the YOLO format. First, we merge the data, which is separated by book title, into a flat structure.

#| export
class Classification:

    def create_dataset(self, input_file_path, output_dir):
        # "../data/*/characters/*/*.jpg"
        files = glob(input_file_path)

        # output_dir = "../data/dataset"

        for file in tqdm(files):
            cls = file.split("/")[-2]

            output_file = f"{output_dir}/{cls}/{file.split('/')[-1]}"

            if os.path.exists(output_file):
                continue

            # print(f"Copying {file} to {output_file}")

            os.makedirs(f"{output_dir}/{cls}", exist_ok=True)

            shutil.copy(file, output_file)

Next, we split the dataset using the following script:

def split(self, input_dir, output_dir, train_ratio = 0.7, val_ratio = 0.15):
        if os.path.exists(output_dir):
             shutil.rmtree(output_dir)

        # Get class directories
        classes = [d for d in os.listdir(input_dir) if os.path.isdir(os.path.join(input_dir, d))]

        # Split and save data
        for cls in tqdm(classes):
            class_dir = os.path.join(input_dir, cls)
            files = [os.path.join(class_dir, f) for f in os.listdir(class_dir) if os.path.isfile(os.path.join(class_dir, f))]

            # Shuffle and split
            random.shuffle(files)
            train_end = int(len(files) * train_ratio)
            val_end = int(len(files) * (train_ratio + val_ratio))

            train_files = files[:train_end]
            val_files = files[train_end:val_end]
            test_files = files[val_end:]

            # Create save directories
            for split, split_files in zip(["train", "val", "test"], [train_files, val_files, test_files]):
                split_dir = os.path.join(output_dir, split, cls)
                os.makedirs(split_dir, exist_ok=True)

                # Copy files
                for file in split_files:
                    shutil.copy(file, os.path.join(split_dir, os.path.basename(file)))

        print("Data splitting completed.")

As a result, a dataset of 1,086,326 images was created.

Using Ultralytics HUB (Failed)

First, we considered using Ultralytics HUB and attempted to upload the dataset, but an error occurred as shown below. We were unable to determine whether the cause was an issue with how the dataset was created.

Uploading to mdx.jp Object Storage

The dataset is uploaded to object storage using the following command:

s3cmd sync data/split_dataset_full.zip s3://satoru196/dataset/

Reference: Initial Setup

The official manual is available at:

https://docs.mdx.jp/ja/index.html#object-storage-usage-example

sudo apt install s3cmd

s3cmd --configure
...
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: (enter your access key)
Secret Key: (enter your secret key)
Default Region [US]: us-east-1

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: s3ds.mdx.jp

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: s3ds.mdx.jp

Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]:
HTTP Proxy server name:

Test access with supplied credentials? [Y/n]
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)  == If you see this, it's OK!

Now verifying that encryption works...
Not configured. Never mind.
Save settings? [y/N] y

Upload:

s3cmd put (filename) s3://(bucket-name)

Download:

s3cmd get s3://(bucket-name)/(object-key)

Operations on mdx.jp

We use 1 GPU pack on mdx.jp.

Download the dataset from object storage using the following commands:

s3cmd get s3://satoru196/dataset/split_dataset_full.zip
unzip split_dataset_full.zip

Then execute training using the following script:

your_training_script.py

from ultralytics import YOLO

# Load the YOLOv8 classification model
model = YOLO('yolo11x-cls.pt')  # Nano-size classification model

# Dataset and training configuration
model.train(
    data='/home/mdxuser/yolo/split_dataset_full',  # Dataset path
    epochs=10,                 # Number of epochs
    # imgsz=224,                 # Input image size
    batch=256                   # Batch size (optional)
)

We use tmux so that training continues even if the SSH connection drops:

tmux new -s my_training_session
python your_training_script.py

After execution, the following checks are performed:

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x-cls.pt to 'yolo11x-cls.pt'...
100%|██████████| 56.9M/56.9M [00:0147.4MB/s]
Ultralytics 8.3.27 Python-3.10.12 torch-2.5.1+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40339MiB)
engine/trainer: task=classify, mode=train, model=yolo11x-cls.pt, data=/home/mdxuser/yolo/split_dataset_full, epochs=10, time=None, patience=100, batch=16, imgsz=224, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/classify/train
train: /home/mdxuser/yolo/split_dataset_full/train... found 758321 images in 3538 classes: ERROR requires 4328 classes, not 3538
val: /home/mdxuser/yolo/split_dataset_full/val... found 162622 images in 2901 classes: ERROR requires 4328 classes, not 2901
test: /home/mdxuser/yolo/split_dataset_full/test... found 165383 images in 4328 classes
Overriding model.yaml nc=80 with nc=4328

                   from  n    params  module                                       arguments
  0                  -1  1      2784  ultralytics.nn.modules.conv.Conv             [3, 96, 3, 2]
  1                  -1  1    166272  ultralytics.nn.modules.conv.Conv             [96, 192, 3, 2]
  2                  -1  2    389760  ultralytics.nn.modules.block.C3k2            [192, 384, 2, True, 0.25]
  3                  -1  1   1327872  ultralytics.nn.modules.conv.Conv             [384, 384, 3, 2]
  4                  -1  2   1553664  ultralytics.nn.modules.block.C3k2            [384, 768, 2, True, 0.25]
  5                  -1  1   5309952  ultralytics.nn.modules.conv.Conv             [768, 768, 3, 2]
  6                  -1  2   5022720  ultralytics.nn.modules.block.C3k2            [768, 768, 2, True]
  7                  -1  1   5309952  ultralytics.nn.modules.conv.Conv             [768, 768, 3, 2]
  8                  -1  2   5022720  ultralytics.nn.modules.block.C3k2            [768, 768, 2, True]
  9                  -1  2   3264768  ultralytics.nn.modules.block.C2PSA           [768, 768, 2]
 10                  -1  1   6529768  ultralytics.nn.modules.head.Classify         [768, 4328]
YOLO11x-cls summary: 309 layers, 33,900,232 parameters, 33,900,232 gradients, 115.4 GFLOPs
Transferred 492/494 items from pretrained weights
Ultralytics HUB: View model at https://hub.ultralytics.com/models/VMzOVApOedDA0LLYqdho
AMP: running Automatic Mixed Precision (AMP) checks...

Training progress can be monitored by visiting the URL shown at “Ultralytics HUB: View model at”.

Batch Size

Initially, training was run without specifying a batch size. The default appears to be 16, but GPU memory usage was low as shown below:

Epoch    GPU_mem       loss  Instances       Size
       1/10      1.94G      5.908         16        224:  16%|          | 7471/47386 [08:1615.33it/s]

To take advantage of the A100’s 40 GB memory, we changed the batch size to 256. As a result, execution time per epoch was reduced:

Epoch    GPU_mem       loss  Instances       Size
       1/10      22.7G      8.017        256        224:  12%|          | 349/2962 [02:442.14it/s]

Reference: tmux

List current tmux sessions:

tmux ls

Reattach to a specific session:

tmux attach -t 0

Detach from a tmux session:

Press Ctrl + b, then press d.

Training Results

The trained model is available from the following Hugging Face Space:

https://huggingface.co/spaces/nakamura196/yolov11x-cls-codh-char

While there is room for improvement in accuracy, there were cases where correct results were returned:

The character images above were sourced from:

https://mojiportal.nabunken.go.jp/

Summary

We hope this serves as a useful reference for training with YOLO, mdx.jp, and related tools.

Overview#

Dataset#

Creating the Dataset#

Using Ultralytics HUB (Failed)#

Uploading to mdx.jp Object Storage#

Reference: Initial Setup#

Operations on mdx.jp#

Batch Size#

Reference: tmux#

Training Results#

Summary#