Notebook: Open in Google Colab / GitHub

TL;DR

  • Collected 617 bibliographic records from the National Diet Library Search API (SRU endpoint)
  • Fine-tuned llm-jp-3-1.8b with LoRA, training only 0.67% of all parameters
  • Accuracy before fine-tuning: 22.0% → after fine-tuning: 78.0% (+56 points)
  • LoRA teaches the model how to perform a task, not just memorize facts

What is NDC?

The Nippon Decimal Classification (NDC) is the standard book classification system used across Japanese libraries. Every book is assigned a numeric code, where the first digit indicates one of ten broad categories:

NDC | Category
0 | General works (encyclopedias, information science)
1 | Philosophy & religion
2 | History & geography
3 | Social sciences (law, economics, education)
4 | Natural sciences (math, physics, medicine)
5 | Technology & engineering
6 | Industry (agriculture, commerce, transport)
7 | Arts & sports
8 | Language
9 | Literature

Assigning NDC codes during cataloging requires subject analysis expertise. An AI that can estimate the broad category from a title alone would be useful as a first-pass screening tool, supporting librarians in the classification workflow.

LoRA in a Nutshell

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method for large language models. Instead of updating all 1.8 billion parameters, LoRA freezes the original model and inserts small adapter matrices into the Attention layers:

Base model (1.8B parameters)  →  Frozen (unchanged)
LoRA adapters (~9M parameters)  →  Only these are trained

In this project, only about 0.67% of the total parameters (12,582,912 / 1,880,197,120) are trainable. This keeps GPU memory usage low while still achieving task-specific performance. Task adaptation often lies in a low-rank subspace, so updating a small fraction of the parameters can be sufficient.
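The mechanics can be sketched in a few lines of NumPy. This is illustrative only: the dimensions assume a hidden size of 2048, and the `lora_forward` helper is a toy stand-in for what PEFT does inside each attention projection.

```python
import numpy as np

# Toy sketch of a LoRA update on one weight matrix.
d, r, alpha = 2048, 32, 32

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                     # trainable, initialized to zero

def lora_forward(x):
    # Base path plus the low-rank update, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d))
# B starts at zero, so at initialization the adapter is a no-op
assert np.allclose(lora_forward(x), x @ W.T)

print(d * d)          # full matrix:  4,194,304 parameters
print(d * r + r * d)  # LoRA adapter:   131,072 parameters
```

Only A and B receive gradients; W stays frozen, which is where the memory savings come from.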

Step 1. Data Collection from the NDL Search API

The National Diet Library Search provides a free SRU (Search/Retrieve via URL) API. We fetch up to 80 books per NDC category (roughly 800 records before filtering). After filtering by title length (3–80 characters), the per-category counts are as follows:

NDC | Category | Records
0 | General works | 65
1 | Philosophy | 67
2 | History | 73
3 | Social sciences | 59
4 | Natural sciences | 52
5 | Technology | 63
6 | Industry | 65
7 | Arts & sports | 57
8 | Language | 67
9 | Literature | 49

Note that the dataset contains some noise — the API returns records with very short or ambiguous titles that are difficult to classify even for humans.

NDC_NAMES = {
    "0": "General works", "1": "Philosophy", "2": "History",
    "3": "Social sciences", "4": "Natural sciences", "5": "Technology",
    "6": "Industry", "7": "Arts & sports", "8": "Language", "9": "Literature",
}

import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def fetch_ndl_books(ndc_digit, count=80, start=1):
    """Fetch bibliographic records for a given NDC digit from the NDL SRU API."""
    base_url = "https://ndlsearch.ndl.go.jp/api/sru"
    params = urllib.parse.urlencode({
        "operation": "searchRetrieve",
        "query": f'ndc="{ndc_digit}*"',
        "maximumRecords": count,
        "startRecord": start,
        "recordSchema": "dcndl",
    })
    with urllib.request.urlopen(f"{base_url}?{params}") as resp:
        root = ET.fromstring(resp.read())
    books = []
    # Match <dc:title> elements by local name so the response namespaces
    # don't need to be pinned; filter to titles of 3-80 characters
    # (a production version would also dedupe titles within each record)
    for elem in root.iter():
        if elem.tag.endswith("}title") and elem.text:
            title = elem.text.strip()
            if 3 <= len(title) <= 80:
                books.append({"title": title, "ndc": ndc_digit})
    return books

# Collect from all 10 categories
all_books = []
for digit in "0123456789":
    books = fetch_ndl_books(digit, count=80)
    all_books.extend(books)
    time.sleep(1)  # Rate limiting

The data is shuffled and split: 5 samples per category (50 total) are reserved for testing, and the rest (567) are used for training.
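The split code isn't shown above; one way to do a per-category hold-out looks like this (the `split_by_category` helper is hypothetical, not the notebook's actual function):

```python
import random
from collections import Counter

def split_by_category(books, test_per_category=5, seed=42):
    """Shuffle, then hold out a fixed number of test samples per NDC category."""
    books = books[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(books)
    held = Counter()
    train, test = [], []
    for book in books:
        digit = book["ndc"][0]            # first digit of the NDC code
        if held[digit] < test_per_category:
            held[digit] += 1
            test.append(book)
        else:
            train.append(book)
    return train, test
```

With the 617 collected records and `test_per_category=5`, this yields the 50-test / 567-train split described above.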

Step 2. Prompt Design

The model is given a classification task: look at a book title and output the first digit of its NDC code.

以下の本のタイトルから、NDC(日本十進分類法)の1桁目を答えてください。

【NDC一覧】
0: 総記
1: 哲学
2: 歴史
3: 社会科学
4: 自然科学
5: 技術・工学
6: 産業
7: 芸術・スポーツ
8: 言語
9: 文学

【タイトル】吾輩は猫である
【NDC】

The prompt is in Japanese (matching the model’s training language). For training samples, the correct answer digit is appended after 【NDC】. At inference time, the model generates freely and the first digit (0-9) in the output is taken as the prediction.
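A prompt-building helper along these lines would produce the template above; this is a reconstruction consistent with the `make_prompt(book, include_answer=False)` call used in the demo section, not the notebook's exact code:

```python
NDC_LEGEND = """0: 総記
1: 哲学
2: 歴史
3: 社会科学
4: 自然科学
5: 技術・工学
6: 産業
7: 芸術・スポーツ
8: 言語
9: 文学"""

def make_prompt(book, include_answer=True):
    """Build the classification prompt; append the gold digit for training."""
    prompt = (
        "以下の本のタイトルから、NDC(日本十進分類法)の1桁目を答えてください。\n\n"
        f"【NDC一覧】\n{NDC_LEGEND}\n\n"
        f"【タイトル】{book['title']}\n"
        "【NDC】"
    )
    if include_answer:
        prompt += book["ndc"][0]  # training target: the first NDC digit
    return prompt
```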

Step 3. Model and LoRA Configuration

All experiments were run on Google Colab with a Tesla T4 GPU. The base model is llm-jp/llm-jp-3-1.8b, a Japanese-focused causal language model (~1.88 billion parameters).

MODEL_NAME = "llm-jp/llm-jp-3-1.8b"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto",
)

The LoRA configuration:

lora_config = LoraConfig(
    r=32,                # Rank of the adapter matrices
    lora_alpha=32,       # Scaling factor
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # All attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
Total parameters:     1,880,197,120
Trainable (LoRA):        12,582,912 (0.67%)
→ Only 1/149 of all parameters are trained

With r=32, each adapter decomposes a weight update into a pair of rank-32 matrices instead of a full-size one. Applying LoRA to all four attention projections (Q, K, V, O) gives the model enough flexibility to learn the classification mapping. The resulting trainable parameters are roughly 0.67% of the total.
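The trainable-parameter count can be reproduced by hand. The hidden size of 2048 and the 24 transformer layers are assumptions about llm-jp-3-1.8b made here so the arithmetic is explicit; each adapted projection contributes two matrices, A (r × d) and B (d × r):

```python
# Reproduce the trainable-parameter count arithmetically
# (assumes hidden size 2048 and 24 layers for llm-jp-3-1.8b)
hidden, layers, r = 2048, 24, 32
modules = 4                               # q_proj, k_proj, v_proj, o_proj

per_module = r * hidden + hidden * r      # A and B matrices
trainable = layers * modules * per_module
print(trainable)                          # 12,582,912
print(f"{trainable / 1_880_197_120:.2%}") # 0.67%
```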

Step 4. Training

Training uses TRL’s SFTTrainer for supervised fine-tuning:

training_args = SFTConfig(
    output_dir="./lora_ndc_output",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # Effective batch size: 16
    learning_rate=5e-4,
    num_train_epochs=5,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
    dataset_text_field="text",
    report_to="none",
)

trainer = SFTTrainer(
    model=model, train_dataset=train_dataset, args=training_args,
)
trainer.train()

The training runs for 5 epochs (180 steps total) over 567 samples with an effective batch size of 16. With bf16 precision and LoRA, the entire training completes in a few minutes on a Tesla T4 GPU.
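The 180-step figure follows directly from the dataset size and batch settings (assuming the final partial batch of each epoch counts as a step):

```python
import math

# Sanity-check the reported step count
samples, effective_batch, epochs = 567, 4 * 4, 5
steps_per_epoch = math.ceil(samples / effective_batch)   # 36
print(steps_per_epoch * epochs)                          # 180
```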

Reading the Training Loss

During training, the Training Loss indicates how wrong the model’s predictions are — lower is better.

Step | Training Loss
10 | 0.8592
20 | 0.5131
30 | 0.4557
40 | 0.4147
50 | 0.3825
60 | 0.3772
70 | 0.3523
80 | 0.3150
90 | 0.3094
100 | 0.2815
110 | 0.2545
120 | 0.2198
130 | 0.2311
140 | 0.2286
150 | 0.1983
160 | 0.1681
170 | 0.1775
180 | 0.1766

The loss decreased from 0.86 to 0.18 over 5 epochs (180 steps). This is a cross-entropy loss, so Loss = -log(probability of correct token):

Loss | Correct Prediction Probability | Interpretation
2.30 | ~10% | Random guessing (10 classes)
0.86 | ~42% | Step 10
0.46 | ~63% | Step 30
0.18 | ~84% | Final step
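These probabilities come from inverting the cross-entropy relation, p(correct token) = exp(-loss):

```python
import math

# Convert average loss values back into correct-token probabilities
for loss in (2.30, 0.86, 0.46, 0.18):
    print(f"loss {loss:.2f} -> p = {math.exp(-loss):.0%}")
```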

Note: This loss is averaged over the entire prompt (including the boilerplate NDC legend and title text). The template portions are memorized quickly and contribute low loss, while the actual NDC digit prediction — the part we care about — has higher loss than the average suggests. So the final test accuracy can’t be read directly from the loss number; you need to evaluate on held-out test data.

What LoRA Actually Teaches — Behavior, Not Knowledge

LoRA is not cramming NDC classification expertise into the model. It teaches a behavioral skill: given this input format, produce this output format. This is the same pattern seen across LoRA use cases:

 | Legal Exam Example | NDC Classification
What’s taught | Answer format (a/b/c/d for multiple choice) | Answer format (0-9 for NDC)
What’s NOT taught | Legal knowledge | NDC classification expertise
Result | Accuracy improves | Accuracy improves

LoRA efficiently teaches the model to leverage its pre-existing Japanese vocabulary knowledge (e.g., “programming” → technology, “poetry collection” → literature) in a task-appropriate output format. The domain knowledge was already latent in the base model’s 1.8 billion parameters.

Step 5. Results

Before vs. After

  • Before LoRA: 11/50 correct = 22.0% accuracy (above the 10% random baseline, but unstable)
  • After LoRA: 39/50 correct = 78.0% accuracy (+56 points)

This was achieved by training only 0.67% of the model’s parameters.

Output Format: Before vs. After

A notable difference is in the output format itself:

  • Before: The model tends to output full NDC codes like “910.2”, “010”, “010.3”, or “369.3”; it doesn’t understand that the task requires a single digit
  • After: Outputs are consistently single digits (“1”, “7”, “9”), showing that the model has learned the expected task format

Here is a sample from the 50-item test set comparing predictions before and after LoRA:

Title | Answer | Before | After
嗚呼孝子元政上人 | 1 (Philosophy) | 9 (Literature) | 1 (Philosophy)
「アーカイブ中核拠点形成モデル事業」(撮影所等に) | 7 (Arts) | 9 (Literature) | 7 (Arts)
アーカーシャ年代記より | 1 (Philosophy) | 0 (General) | 1 (Philosophy)
ああ言えばこう食う 往復エッセイ | 9 (Literature) | 9 (Literature) | 9 (Literature)
あゝ愛宕丘の灯:追憶の四十有余年 | 3 (Social sci.) | 9 (Literature) | 3 (Social sci.)
アーク溶接作業における粉じん対策に関する調査研究報告 | 4 (Natural sci.) | 3 (Social sci.) | 4 (Natural sci.)
アーキテクチャとプログラミングの基礎 | 4 (Natural sci.) | 0 (General) | 5 (Technology)
ああアメリカ:傷だらけの巨象 | 3 (Social sci.) | 9 (Literature) | 3 (Social sci.)

Before training, the model defaults to “9 (Literature)” or “0 (General)” for most inputs, showing no real classification ability. After LoRA, most predictions are correct. Cases like “アーキテクチャとプログラミングの基礎” (Fundamentals of Architecture and Programming) remain misclassified — distinguishing Technology (5) from Natural Sciences (4) based on title alone is inherently difficult.

Per-Category Analysis

Performance varies across categories. Categories with distinctive vocabulary in their titles (e.g., “Philosophy” with characteristic terms) tend to score higher. “General works” (NDC 0), which encompasses a broad range of topics, is harder to classify from title alone.

Step 6. Interactive Demo

The trained model can classify arbitrary book titles:

def predict_ndc(model, title):
    """Predict NDC category from a book title."""
    book = {"title": title, "ndc": "?", "ndc_name": "?"}
    prompt = make_prompt(book, include_answer=False)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs, max_new_tokens=5,
            do_sample=False, pad_token_id=tokenizer.pad_token_id,
        )
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).strip()

    predicted = "?"
    for ch in generated:
        if ch in "0123456789":
            predicted = ch
            break
    return predicted, NDC_NAMES.get(predicted, "Unknown")

Example predictions:

Title | Predicted NDC | Category
吾輩は猫である (I Am a Cat) | 8 | Language
相対性理論入門 (Intro to Relativity) | 4 | Natural sciences
日本経済の構造改革 (Structural Reform of Japan’s Economy) | 3 | Social sciences
フランス料理の基本技法 (French Cooking Techniques) | 7 | Arts & sports
はじめてのPython入門 (Intro to Python) | 4 | Natural sciences
万葉集を読む (Reading the Man’yoshu) | 9 | Literature
西洋美術史 (History of Western Art) | 7 | Arts & sports
憲法判例百選 (100 Constitutional Law Cases) | 3 | Social sciences
英語の語源辞典 (English Etymology Dictionary) | 8 | Language
鉄道の歴史と未来 (History and Future of Railways) | 6 | Industry

Most predictions are reasonable. Some are incorrect — “I Am a Cat” (a classic novel) is predicted as Language (8) instead of Literature (9), and “Intro to Python” gets Natural Sciences (4) instead of General Works (0) — but on the whole the model shows a reasonable title-to-category mapping from 567 training examples.

Practical Considerations

Applications

  • Library cataloging support: Auto-classify incoming books as a first pass, reducing manual effort for librarians
  • Bookstore/publisher categorization: Automatic shelf assignment for inventory management
  • Bibliographic data enrichment: Fill in missing classification codes in incomplete records

Limitations

  • Top-level classification only: Real-world use requires 3-digit NDC codes (e.g., 913 = Japanese novels, 490 = Medicine). This is achievable with more training data.
  • Title-only input: Adding author names, publisher, and table of contents would improve accuracy substantially.
  • Data bias: Books available through the API skew toward recent publications.

Future Directions

  • Extend to 3-digit NDC classification for practical utility
  • Incorporate additional metadata (author, publisher) into the prompt
  • Combine with RAG (Retrieval-Augmented Generation) to reference similar books’ classifications during inference

Acknowledgments

I would like to thank Toru Aoike of the National Diet Library for introducing me to LoRA.