Building an NDC Book Classifier with LoRA: Fine-Tuning a Japanese LLM on Library Data

Notebook: Google Colab / GitHub

TL;DR
- Collected 617 bibliographic records from the National Diet Library Search API (SRU endpoint)
- Fine-tuned llm-jp-3-1.8b with LoRA, training only 0.67% of all parameters
- Accuracy before fine-tuning: 22.0% → after fine-tuning: 78.0% (+56 points)
- LoRA teaches the model how to perform a task, not just memorize facts

What is NDC?
The Nippon Decimal Classification (NDC) is the standard book classification system used in Japanese libraries. Every book is assigned a numeric code whose first digit indicates one of ten broad categories: ...
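Why can LoRA train such a small share of the weights? A minimal sketch of the parameter arithmetic: for a frozen weight matrix W0 of shape (d_out, d_in), LoRA learns a low-rank update B·A with B of shape (d_out, r) and A of shape (r, d_in), so each adapted matrix adds only r·(d_in + d_out) trainable parameters. The layer count, hidden size, rank, choice of adapted matrices, and the 1.8B base size below are hypothetical placeholders, not the article's actual configuration, so the printed percentage is not expected to match the 0.67% reported above.

```python
from typing import Iterable, Tuple


def lora_param_count(shapes: Iterable[Tuple[int, int]], rank: int) -> int:
    """Trainable parameters added by LoRA adapters.

    A frozen weight W0 with shape (d_out, d_in) gets a low-rank update
    delta_W = B @ A, where B is (d_out, rank) and A is (rank, d_in).
    Only A and B are trained, so each adapted matrix adds
    rank * (d_in + d_out) parameters.
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def trainable_fraction(base_params: int, lora_params: int) -> float:
    """Share of all parameters (frozen base + adapters) that are trainable."""
    return lora_params / (base_params + lora_params)


if __name__ == "__main__":
    # Hypothetical configuration, NOT the article's actual settings:
    # 24 layers, hidden size 2048, LoRA on two square projections per layer.
    num_layers, hidden, rank = 24, 2048, 8
    shapes = [(hidden, hidden)] * 2 * num_layers
    added = lora_param_count(shapes, rank)
    frac = trainable_fraction(1_800_000_000, added)
    print(f"LoRA params: {added:,} ({frac:.2%} of all parameters)")
```

The fraction here is computed against frozen base plus adapter parameters, which is one common convention; raising the rank or adapting more matrices per layer increases it linearly.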

Voyant Tools: A Browser-Based Text Analysis Platform for Digital Humanities

TL;DR
Voyant Tools is a browser-based text analysis platform. Paste or upload text to instantly generate word clouds and run KWIC (Key Word In Context) analysis, co-occurrence analysis, topic modeling, TF-IDF calculations, and more. It supports Japanese morphological analysis and is widely used as a standard text mining tool in the Digital Humanities.

What is Voyant Tools?
Voyant Tools is an open-source text analysis environment developed by Stéfan Sinclair and Geoffrey Rockwell. With more than 20 years of history since its first version in 2003, it is one of the most widely used text analysis tools in the DH field. ...
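To make the KWIC view concrete, here is a minimal sketch of the idea (not Voyant's own implementation): for every occurrence of a keyword, show a fixed number of tokens to its left and right. The function name `kwic` and the whitespace tokenization are assumptions for illustration; Japanese text has no spaces between words, so it would first need morphological analysis to produce tokens.

```python
from typing import List, Tuple


def kwic(tokens: List[str], keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence.

    Matching is case-insensitive; `window` is the number of tokens kept
    on each side of the keyword.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Whitespace tokenization is an assumption that only suits space-delimited text.
    text = "the cat sat on the mat and the dog sat on the rug"
    for left, kw, right in kwic(text.split(), "sat", window=2):
        # Right-align the left context so keywords line up in one column.
        print(f"{left:>20} [{kw}] {right}")
```

Aligning every hit on the keyword column is what makes KWIC useful for reading: recurring left or right neighbors become visible at a glance.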