Building an NDC Book Classifier with LoRA: Fine-Tuning a Japanese LLM on Library Data
Notebook: Open in Google Colab / GitHub

TL;DR

- Collected 617 bibliographic records from the National Diet Library Search API (SRU endpoint)
- Fine-tuned llm-jp-3-1.8b with LoRA, training only 0.67% of all parameters
- Accuracy before fine-tuning: 22.0% → after fine-tuning: 78.0% (+56 points)
- LoRA teaches the model how to perform a task rather than just memorize facts

What is NDC?

The Nippon Decimal Classification (NDC) is the standard book classification system used across Japanese libraries. Every book is assigned a numeric code whose first digit indicates one of ten broad categories: ...
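To make the "first digit" rule concrete, here is a minimal sketch of mapping an NDC code to its broad category. The helper name `ndc_main_class` and the English labels are illustrative assumptions, not taken from the notebook; the labels are a common English rendering of the ten NDC main classes.

```python
# English labels for the ten NDC main classes.
# Assumption: this wording is illustrative and not copied from the article.
NDC_MAIN_CLASSES = {
    "0": "General works",
    "1": "Philosophy",
    "2": "History",
    "3": "Social sciences",
    "4": "Natural sciences",
    "5": "Technology",
    "6": "Industry",
    "7": "Arts",
    "8": "Language",
    "9": "Literature",
}


def ndc_main_class(code: str) -> str:
    """Return the broad category for an NDC code such as '913.6'.

    Hypothetical helper: only the first character of the code is inspected.
    """
    code = code.strip()
    if not code or code[0] not in NDC_MAIN_CLASSES:
        raise ValueError(f"not an NDC code: {code!r}")
    return NDC_MAIN_CLASSES[code[0]]


print(ndc_main_class("913.6"))   # Literature
print(ndc_main_class("007.64"))  # General works
```

Collapsing each code to its first digit turns classification into a ten-way labeling problem, which is one plausible way to frame the task described here.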
