Overview

I recently needed to convert data managed in tabular form into a vertically written Microsoft Word document, so here are my notes.

Before conversion:

Research Project Title | Project Number | Direct Costs
Development of Digital Archive System Construction Methods Considering Sustainability and Reusability | 21K18014 | 2600000

After conversion:

The implementation fills in a prescribed Word template with python-docx and uses the “Kanjize” library to convert between numbers and kanji numerals.

Creating Microsoft Word Files with python-docx

First, create a Microsoft Word template file. Keep the prescribed layout, and put a placeholder of the form {variable_name}, for example {research_title}, wherever a value should be filled in.

Next, prepare a JSON file (or similar) that maps each variable name used in the template to its value.

{
    "direct_cost": "二百六十万",
    "indirect_cost": "七十八万",
    "period_end": "二〇二三",
    "period_start": "二〇二一",
    "principal_investigator": "中村 覚",
    "project_number": "二一K一八〇一四",
    "research_category": "若手研究",
    "research_title": "持続性と利活用性を考慮したデジタルアーカイブシステム構築手法の開発"
}

Running the following Python script then produces a Microsoft Word file in the prescribed format.

import docx
import json

item = {
    "direct_cost": "二百六十万",
    "indirect_cost": "七十八万",
    "period_end": "二〇二三",
    "period_start": "二〇二一",
    "principal_investigator": "中村 覚",
    "project_number": "二一K一八〇一四",
    "research_category": "若手研究",
    "research_title": "持続性と利活用性を考慮したデジタルアーカイブシステム構築手法の開発"
}
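# Open the template that contains the {variable_name} placeholders.
doc = docx.Document("template.docx")

# doc.paragraphs covers body paragraphs only; text inside tables is not visited.
for para in doc.paragraphs:
    text = para.text
    for key in item:
        target = "{" + key + "}"
        if target in text:
            text = text.replace(target, item[key])

    # Assigning para.text collapses the paragraph into a single run,
    # so only write it back when a placeholder was actually replaced.
    if text != para.text:
        para.text = text

opath = "output.docx"
doc.save(opath)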
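
The script above embeds the mapping directly as a dict. If the mapping is kept as JSON instead, the substitution step can be pulled out into a small helper. The following is only a minimal sketch: the function name fill_placeholders, the sample JSON string, and the sample line of text are assumptions made for illustration, and it works on plain strings so that it runs without a Word file.

```python
import json


def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace every {key} in text with values[key]; unknown placeholders stay as-is."""
    for key, value in values.items():
        text = text.replace("{" + key + "}", value)
    return text


# Hypothetical JSON content; in practice it could be read from a file with json.load().
raw = '{"research_category": "若手研究", "period_start": "二〇二一", "period_end": "二〇二三"}'
values = json.loads(raw)

# Hypothetical template line using the same {variable_name} convention.
line = "{research_category} {period_start}年度から{period_end}年度"
print(fill_placeholders(line, values))
```

With python-docx, the same helper could be applied to para.text for each paragraph of the template.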

Converting Between Numbers and Kanji Numerals

To create the input JSON file above, it was necessary to convert numbers to kanji numerals.

Example: 2600000 -> 二百六十万 (two million six hundred thousand)
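
To make the arithmetic behind this example concrete, here is a small hand-written sketch of place-value conversion. It is not how Kanjize is implemented, and the names to_kanji_number and group_to_kanji are hypothetical. It assumes non-negative integers below 10^16 and the convention of omitting a leading 一 before 十, 百, and 千, which other styles may not follow.

```python
DIGITS = "〇一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]  # units inside one four-digit group
LARGE_UNITS = ["", "万", "億", "兆"]  # one unit per four-digit group


def group_to_kanji(n: int) -> str:
    """Convert 0 <= n <= 9999, omitting a leading 一 before 十, 百, and 千."""
    out = ""
    for power in range(3, -1, -1):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            out += SMALL_UNITS[power]
        else:
            out += DIGITS[digit] + SMALL_UNITS[power]
    return out


def to_kanji_number(n: int) -> str:
    """Split n into four-digit groups and attach 万, 億, 兆 to each non-zero group."""
    if n == 0:
        return "〇"
    parts = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            parts.append(group_to_kanji(group) + LARGE_UNITS[index])
        index += 1
    return "".join(reversed(parts))


print(to_kanji_number(2600000))  # 二百六十万
print(to_kanji_number(780000))   # 七十八万
```

Here 2600000 splits into the groups 260 and 0000, so only the upper group is written, followed by 万.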

The following library was used for this conversion:

https://github.com/nagataaaas/Kanjize

By running the following, data managed as numbers can be converted to kanji numerals:

from kanjize import int2kanji
print(int2kanji(2600000))
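
The JSON shown earlier also contains values written digit by digit, such as 二〇二三 and 二一K一八〇一四, rather than in the place-value notation used for 2600000. These notes do not say how those values were produced. One possible approach, using only the standard library, is sketched below; the name to_kanji_digits is hypothetical.

```python
# Map ASCII digits to kanji digits; every other character passes through unchanged.
KANJI_DIGITS = str.maketrans("0123456789", "〇一二三四五六七八九")


def to_kanji_digits(value: str) -> str:
    """Convert each ASCII digit in value to its kanji digit, position by position."""
    return value.translate(KANJI_DIGITS)


print(to_kanji_digits("21K18014"))  # 二一K一八〇一四
print(to_kanji_digits(str(2023)))   # 二〇二三
```

Because letters are left untouched, the same function handles both the years and the project number.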

Summary

I hope this serves as a useful reference when creating Microsoft Word files.