Overview
Notes on programs for formatting XML strings in Python.
Program 1
I referenced the following.
https://hawk-tech-blog.com/python-learn-prettyprint-xml/
I added processing to remove unnecessary blank lines.
from xml.dom import minidom
import re
def prettify(rough_string):
reparsed = minidom.parseString(rough_string)
pretty = re.sub(r"[\t ]+\n", "", reparsed.toprettyxml(indent="\t")) # Remove unnecessary line breaks after indentation
pretty = pretty.replace(">\n\n\t<", ">\n\t<") # Remove unnecessary blank lines
pretty = re.sub(r"\n\s*\n", "\n", pretty) # Replace consecutive line breaks (including blank lines) with a single line break
return pretty
Program 2
I referenced the following.
https://qiita.com/hrys1152/items/a87b4ca3c74ec4997f66
When processing TEI/XML, I recommend registering the namespace.
import xml.etree.ElementTree as ET
# Register namespace
ET.register_namespace('', "http://www.tei-c.org/ns/1.0")
tree = ET.ElementTree(ET.fromstring(xml_string))
ET.indent(tree, space=' ')
tree.write('output.xml', encoding='UTF-8', xml_declaration=True)
Summary
I hope this serves as a useful reference.