Overview

Notes on programs for formatting XML strings in Python.

Program 1

I referenced the following.

https://hawk-tech-blog.com/python-learn-prettyprint-xml/

I added processing to remove unnecessary blank lines.

from xml.dom import minidom
import re
def prettify(rough_string):
    reparsed = minidom.parseString(rough_string)
    pretty = re.sub(r"[\t ]+\n", "", reparsed.toprettyxml(indent="\t"))  # Remove unnecessary line breaks after indentation
    pretty = pretty.replace(">\n\n\t<", ">\n\t<")  # Remove unnecessary blank lines
    pretty = re.sub(r"\n\s*\n", "\n", pretty)  # Replace consecutive line breaks (including blank lines) with a single line break
    return pretty

Program 2

I referenced the following.

https://qiita.com/hrys1152/items/a87b4ca3c74ec4997f66

When processing TEI/XML, I recommend registering the namespace.

import xml.etree.ElementTree as ET

# Register namespace
ET.register_namespace('', "http://www.tei-c.org/ns/1.0")

tree = ET.ElementTree(ET.fromstring(xml_string))
ET.indent(tree, space='  ')
tree.write('output.xml', encoding='UTF-8', xml_declaration=True)

Summary

I hope this serves as a useful reference.