In Archivematica, metadata schemas other than Dublin Core (DC) can be embedded in the AIP’s METS.xml. This guide uses source-metadata.csv to include non-DC metadata such as EAD and MODS in a Transfer and verifies via the API whether they are correctly stored in the AIP.

Table of Contents

  1. Background and Purpose
  2. How source-metadata.csv Works
  3. XML Validation Feature
  4. Test 1: MODS-Only Metadata Registration
  5. Test 2: Simultaneous EAD + MODS Registration
  6. Storage Format of Non-DC Metadata in METS.xml
  7. Test 3: Metadata Update via Reingest
  8. Summary

Background and Purpose

In a standard Archivematica Transfer, Dublin Core metadata described in metadata/metadata.csv is stored in METS.xml as <dmdSec>. However, in actual digital archive operations, there are use cases requiring metadata schemas other than DC:

  • EAD (Encoded Archival Description): A widely used standard for hierarchical archival description
  • MODS (Metadata Object Description Schema): A schema used for detailed description of library materials
  • LIDO: A description standard for museum and art gallery materials
  • MARC21: A catalog data format for libraries

Archivematica can associate arbitrary XML metadata with a Transfer through a CSV file called source-metadata.csv and store it in the AIP’s METS.xml as <dmdSec>. This guide verifies this feature via the API.

How source-metadata.csv Works

CSV Format

source-metadata.csv is a CSV file placed in the Transfer’s metadata/ directory, consisting of three columns.

filename,metadata,type
objects,ead.xml,EAD
objects,mods.xml,MODS
objects/dir/file.pdf,file_metadata.xml,CustomType

Column     Description
filename   Relative path to the file or directory targeted by the metadata (starting with objects/)
metadata   Path to the XML metadata file (relative to the metadata/ directory)
type       Metadata type identifier. Used in the OTHERMDTYPE attribute of METS.xml

Transfer Directory Structure

my-transfer/
├── objects/
│   └── test-document.txt      <- Digital object to be preserved
└── metadata/
    ├── source-metadata.csv    <- Mapping definition between metadata and objects
    ├── ead.xml                <- EAD metadata
    └── mods.xml               <- MODS metadata
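
As a minimal sketch of the layout above, the following Python builds such a package in a scratch directory. The function name build_transfer and the placeholder file contents are illustrative assumptions and not part of Archivematica.

```python
import csv
from pathlib import Path


def build_transfer(root, rows, xml_files):
    """Create objects/ and metadata/ under root and write source-metadata.csv.

    rows: iterable of (filename, metadata, type) tuples.
    xml_files: dict mapping a metadata file name to its XML text.
    """
    root = Path(root)
    (root / "objects").mkdir(parents=True, exist_ok=True)
    metadata_dir = root / "metadata"
    metadata_dir.mkdir(exist_ok=True)

    # Placeholder digital object; a real transfer holds the files to preserve.
    (root / "objects" / "test-document.txt").write_text("test\n", encoding="utf-8")

    # XML metadata files live next to source-metadata.csv in metadata/.
    for name, text in xml_files.items():
        (metadata_dir / name).write_text(text, encoding="utf-8")

    csv_path = metadata_dir / "source-metadata.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "metadata", "type"])
        writer.writerows(rows)
    return root
```

Calling build_transfer("my-transfer", [("objects", "ead.xml", "EAD"), ("objects", "mods.xml", "MODS")], {...}) would produce the tree shown above, with real EAD and MODS documents supplied in place of the placeholders.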

Associating Multiple Metadata with a Single File

In source-metadata.csv, metadata of different types can be associated with the same filename by using multiple rows.

filename,metadata,type
objects,ead.xml,EAD
objects,mods.xml,MODS

In this example, both EAD and MODS are associated with the objects directory (all files underneath), and each is stored as an independent <dmdSec> in METS.xml.

Processing Flow

  1. source-metadata.csv is read during Transfer
  2. The XML files specified in the metadata column are parsed
  3. If XML Validation is enabled, validation against the schema is performed
  4. XML that passes validation is embedded in METS.xml as <dmdSec>

XML Validation Feature

Overview

Archivematica can validate the XML metadata specified in source-metadata.csv against schemas. This feature is disabled by default and is considered experimental.

To enable it, set the following MCP Client environment variables.

ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_METADATA_XML_VALIDATION_ENABLED=true
METADATA_XML_VALIDATION_SETTINGS_FILE=/path/to/xml_validation.py

Validation Configuration File

The validation configuration is written as a Python file. Below is a configuration example used in Archivematica’s test environment.

from pathlib import Path

__DIR = Path(__file__).parents[0] / "schemas"

XML_VALIDATION = {
    "http://www.openarchives.org/OAI/2.0/oai_dc/": (__DIR / "oai_dc.xsd").as_posix(),
    "http://www.lido-schema.org": (__DIR / "lido-v1.1.xsd").as_posix(),
    "http://www.loc.gov/MARC21/slim": (__DIR / "MARC21slim.xsd").as_posix(),
    "http://www.loc.gov/mods/v3": (__DIR / "mods.xsd").as_posix(),
    "http://slubarchiv.slub-dresden.de/rights1": (__DIR / "rights1.xsd").as_posix(),
    "alto": (__DIR / "alto-v2.0.xsd").as_posix(),
    "metadata": None,
    "bag-info": None,
}
XML_VALIDATION_FAIL_ON_ERROR = False

The keys of the XML_VALIDATION dictionary are matched against XML documents in the following order:

  1. Value of the xsi:noNamespaceSchemaLocation attribute
  2. Last value of the xsi:schemaLocation attribute
  3. Namespace URI of the root element
  4. Local name of the root element

If the value for a matching key is None, validation is skipped but the metadata is still stored in <dmdSec>. If none of the candidate keys exists in the dictionary, the metadata is skipped with only a log message and is not stored in <dmdSec> either.
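
The matching order and the three outcomes described above can be sketched in Python as follows. This is an illustration written for this guide, not Archivematica's implementation. The function names candidate_keys and lookup are assumptions, and the sketch only inspects the root element.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def candidate_keys(xml_text):
    """Return lookup keys for an XML document in the order listed above."""
    root = ET.fromstring(xml_text)
    keys = []

    # 1. xsi:noNamespaceSchemaLocation
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns:
        keys.append(no_ns.strip())

    # 2. Last whitespace-separated value of xsi:schemaLocation
    schema_loc = root.get(f"{{{XSI}}}schemaLocation")
    if schema_loc and schema_loc.split():
        keys.append(schema_loc.split()[-1])

    # 3. Namespace URI of the root element, then 4. its local name
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
        keys.append(namespace)
    else:
        local = root.tag
    keys.append(local)
    return keys


def lookup(xml_text, xml_validation):
    """Classify a document as 'validate', 'store_without_validation' or 'skip'."""
    for key in candidate_keys(xml_text):
        if key in xml_validation:
            schema = xml_validation[key]
            if schema is None:
                return ("store_without_validation", None)
            return ("validate", schema)
    # No key matched: the metadata would not reach <dmdSec>.
    return ("skip", None)
```

For an EAD document whose root element is in the urn:isbn:1-931666-22-9 namespace and carries no schema location attributes, candidate_keys yields the namespace URI followed by the local name ead.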

Supported Schemas in the Test Environment

Namespace / Key                                Metadata Type            Validation
http://www.openarchives.org/OAI/2.0/oai_dc/    Dublin Core (OAI-PMH)    XSD
http://www.lido-schema.org                     LIDO                     XSD
http://www.loc.gov/MARC21/slim                 MARC21                   XSD
http://www.loc.gov/mods/v3                     MODS                     XSD
http://slubarchiv.slub-dresden.de/rights1      SLUB Rights              XSD
alto                                           ALTO (OCR)               XSD
metadata                                       General metadata         Skipped
bag-info                                       BagIt information        Skipped

Note: EAD (urn:isbn:1-931666-22-9) is not included in the test environment’s default configuration. If using EAD, you must add an entry to the configuration file.

Test 1: MODS-Only Metadata Registration

[Screenshot: Administration > Processing configuration screen, which manages the processing profiles (default / automated / backlog) used when submitting Transfers]

Test Environment

  • Archivematica 1.19 (Docker environment)
  • Dashboard: http://127.0.0.1:62080
  • Storage Service: http://127.0.0.1:62081
  • XML Validation: Enabled (test environment default settings)

Creating the Transfer Package

Create a Transfer package with the following structure.

metadata-test/
├── objects/
│   └── test-document.txt
└── metadata/
    ├── source-metadata.csv
    ├── ead.xml
    └── mods.xml

source-metadata.csv:

filename,metadata,type
objects,ead.xml,EAD
objects,mods.xml,MODS

mods.xml:

<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
  <titleInfo>
    <title>Test Document for Metadata Validation</title>
  </titleInfo>
  <name type="corporate">
    <namePart>Nakamura Test Organization</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <language>
    <languageTerm type="code" authority="iso639-2b">jpn</languageTerm>
  </language>
  <abstract>Test document for verifying MODS metadata registration.</abstract>
</mods>

Running the Transfer via API

curl -X POST http://127.0.0.1:62080/api/v2beta/package/ \
  -H "Authorization: ApiKey test:test" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "metadata-test",
    "type": "standard",
    "path": "",
    "processing_config": "automated",
    "auto_approve": true
  }'
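
The same request can be prepared from Python. The sketch below only builds the request object and does not send it. It assumes that the path field expects a base64-encoded string of the form "<location UUID>:<relative path>", so check that against the API documentation for your version. The function name and all argument values are hypothetical.

```python
import base64
import json
import urllib.request


def build_package_request(dashboard_url, user, api_key, name, location_uuid, relative_path):
    """Build (but do not send) a POST request for /api/v2beta/package/."""
    # Assumption: "path" is base64 of "<location_uuid>:<relative_path>".
    raw_path = f"{location_uuid}:{relative_path}".encode("utf-8")
    payload = {
        "name": name,
        "type": "standard",
        "path": base64.b64encode(raw_path).decode("ascii"),
        "processing_config": "automated",
        "auto_approve": True,
    }
    return urllib.request.Request(
        f"{dashboard_url}/api/v2beta/package/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {user}:{api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the Transfer against a running Dashboard.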

Results

[Screenshot: Transfer tab, showing that Transfer processing for metadata-test and metadata-test2 is complete, with each Microservice status displayed]

Transfer -> Ingest completed and the AIP was created successfully. Inspecting the <dmdSec> elements in METS.xml shows that only MODS was stored as a dmdSec.

<mets:dmdSec ID="dmdSec_2" CREATED="2026-02-17T01:51:41" STATUS="original">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS">
    <mets:xmlData>
      <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">

      </mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

Why EAD was not stored: The following error was recorded in the MCP Client log.

XML validation schema not found for keys: ['urn:isbn:1-931666-22-9', 'ead']

Because the EAD namespace (urn:isbn:1-931666-22-9) was not registered in the XML Validation configuration file, it was skipped during validation and not stored in <dmdSec>.
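
A check like this can be scripted once the METS.xml has been downloaded. The sketch below uses only the Python standard library to list each dmdSec with its STATUS, MDTYPE, and OTHERMDTYPE. The function name list_dmdsecs is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def list_dmdsecs(mets_xml):
    """Return one dict per top-level <mets:dmdSec> in a METS document string."""
    root = ET.fromstring(mets_xml)
    rows = []
    for dmd in root.findall("mets:dmdSec", METS_NS):
        wrap = dmd.find("mets:mdWrap", METS_NS)
        rows.append({
            "id": dmd.get("ID"),
            "status": dmd.get("STATUS"),
            # mdWrap carries the metadata type; OTHERMDTYPE is set when MDTYPE="OTHER".
            "mdtype": wrap.get("MDTYPE") if wrap is not None else None,
            "othermdtype": wrap.get("OTHERMDTYPE") if wrap is not None else None,
        })
    return rows
```

If EAD had been stored, the result would contain a row with othermdtype equal to "EAD"; in Test 1 only the MODS row appears among the MDTYPE="OTHER" entries.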

Test 2: Simultaneous EAD + MODS Registration

Modifying the XML Validation Configuration

To handle EAD, the EAD namespace was added to the configuration file.

XML_VALIDATION = {
    # ... existing settings ...
    "urn:isbn:1-931666-22-9": None,  # EAD: Store in dmdSec without validation
}

By specifying None, XSD schema validation is skipped, and only storage in <dmdSec> of METS.xml is performed.

Running the Transfer via API

A new Transfer package (metadata-test2) was submitted via API using the same structure as before.

Results: METS.xml dmdSec

[Screenshot: Archival Storage list, showing that the AIPs for metadata-test (Test 1) and metadata-test2 (Test 2) are stored successfully]

This time, 3 dmdSec entries were generated.

dmdSec_1: PREMIS:OBJECT (standard)

<mets:dmdSec ID="dmdSec_1" CREATED="2026-02-17T01:57:54" STATUS="original">
  <mets:mdWrap MDTYPE="PREMIS:OBJECT">
    <mets:xmlData>
      <premis:object xsi:type="premis:intellectualEntity">
        <premis:objectIdentifier>
          <premis:objectIdentifierValue>7e26ac5e-ef3b-4f17-8717-f5239bbe355f</premis:objectIdentifierValue>
        </premis:objectIdentifier>
      </premis:object>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

dmdSec_2: EAD

<mets:dmdSec ID="dmdSec_2" CREATED="2026-02-17T01:57:54" STATUS="original">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="EAD">
    <mets:xmlData>
      <ead xmlns="urn:isbn:1-931666-22-9">
        <eadheader>
          <eadid>metadata-test-002</eadid>
          <filedesc>
            <titlestmt>
              <titleproper>Test Collection for EAD Metadata Validation</titleproper>
            </titlestmt>
          </filedesc>
        </eadheader>
        <archdesc level="collection">
          <did>
            <unittitle>Test Collection for EAD Metadata Validation</unittitle>
            <unitdate type="inclusive" normal="2024/2025">2024-2025</unitdate>
            <unitid>META-TEST-002</unitid>
          </did>
        </archdesc>
      </ead>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

dmdSec_3: MODS

<mets:dmdSec ID="dmdSec_3" CREATED="2026-02-17T01:57:54" STATUS="original">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS">
    <mets:xmlData>
      <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
        <titleInfo>
          <title>Test Document for MODS Metadata Validation</title>
        </titleInfo>

      </mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

Association in structMap

In the structMap, both EAD and MODS are associated with the objects directory as DMDID="dmdSec_2 dmdSec_3".

<mets:structMap TYPE="physical" ID="structMap_1" LABEL="Archivematica default">
  <mets:div TYPE="Directory" LABEL="metadata-test2-..." DMDID="dmdSec_1">
    <mets:div TYPE="Directory" LABEL="objects" DMDID="dmdSec_2 dmdSec_3">

      <mets:div TYPE="Item" LABEL="test-document.txt">
        <mets:fptr FILEID="file-..."/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
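
To confirm the association without reading the XML by eye, the DMDID attributes can be collected per structMap div. This is a small standard-library sketch. The function name dmdids_by_label is an assumption, and it keys results by LABEL, which is sufficient for a small test AIP but would collide if two divs shared a label.

```python
import xml.etree.ElementTree as ET

METS_DIV = "{http://www.loc.gov/METS/}div"


def dmdids_by_label(mets_xml):
    """Map each structMap div LABEL to the list of dmdSec IDs in its DMDID."""
    root = ET.fromstring(mets_xml)
    result = {}
    for div in root.iter(METS_DIV):
        dmdid = div.get("DMDID")
        if dmdid:
            # DMDID is a whitespace-separated list of dmdSec identifiers.
            result[div.get("LABEL")] = dmdid.split()
    return result
```

For the structMap above, the entry for "objects" would list dmdSec_2 and dmdSec_3, while "test-document.txt" has no entry because its div carries no DMDID.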

Storage Format of Non-DC Metadata in METS.xml

MDTYPE Attribute Handling

The value specified in the type column of source-metadata.csv is stored in METS.xml as follows.

<mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="EAD">

Archivematica’s source code (archivematicaCreateMETSMetadataXML.py) shows that metadata from source-metadata.csv is always stored as MDTYPE="OTHER", and the type column value is set in the OTHERMDTYPE attribute.

fsentry.add_dmdsec(
    tree.getroot(),
    "OTHER",
    othermdtype=xml_type,
    status="update" if "REIN" in sip_type else "original",
)

This differs from Dublin Core (stored as MDTYPE="DC" via metadata.csv).

STATUS Attribute

  • Initial Ingest: STATUS="original"
  • Re-ingest: STATUS="update"

During re-ingest, existing dmdSec entries with the same type are treated as superseded, and new dmdSec entries are added with STATUS="update".
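
The status transition can be modeled with a few lines of Python. This is a simplified model of the behavior described here, not Archivematica's code. The function name apply_reingest and the dict layout are assumptions, and the "-superseded" suffix follows the status value observed on re-ingest in this guide.

```python
def apply_reingest(dmdsecs, new_type):
    """Return a new dmdSec list after adding metadata of new_type on re-ingest.

    dmdsecs: list of dicts with "id", "status" and "othermdtype" keys.
    """
    updated = []
    for entry in dmdsecs:
        entry = dict(entry)  # copy so the caller's list is left untouched
        same_type = entry.get("othermdtype") == new_type
        if same_type and not entry["status"].endswith("-superseded"):
            entry["status"] += "-superseded"
        updated.append(entry)

    # The newly supplied metadata is appended with STATUS="update".
    updated.append({
        "id": f"dmdSec_{len(updated) + 1}",
        "status": "update",
        "othermdtype": new_type,
    })
    return updated
```

Applying it to an AIP holding EAD and MODS with new_type="MODS" leaves the EAD entry as original, marks the old MODS entry as original-superseded, and appends a new entry with status update.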

XML File Storage Location

XML files referenced in source-metadata.csv are also stored as files within the AIP.

data/objects/metadata/transfers/-/ead.xml
data/objects/metadata/transfers/-/mods.xml
data/objects/metadata/transfers/-/source-metadata.csv

Test 3: Metadata Update via Reingest

[Screenshot: AIP detail screen, showing the UUID, size, storage location, and METS file download link for metadata-test2]

Purpose of the Test

Perform a Metadata re-ingest on the AIP created in Test 2 (containing EAD + MODS) and verify the following:

  • When existing MODS metadata is updated, is the old metadata retained as superseded and the new metadata added as update?
  • Are XML files added during re-ingest stored within the AIP?

Starting the Reingest

[Screenshot: Re-ingest tab on the AIP detail screen, where the Reingest type (Metadata / Partial / Full) and Processing config are selected. In this test, Metadata re-ingest is executed via the API instead.]

Start the Metadata re-ingest using the Storage Service API.

curl -X POST "http://127.0.0.1:62081/api/v2/file//reingest/" \
  -H "Authorization: ApiKey test:test" \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline": "",
    "reingest_type": "metadata"
  }'

Response:

{
  "error": false,
  "message": "Package 7e26ac5e-... sent to pipeline ... for re-ingest",
  "reingest_uuid": "7e26ac5e-...",
  "status_code": 202
}
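
When scripting this step, the response body can be checked before continuing. The sketch below relies only on the fields visible in the response above (error, message, reingest_uuid). The function name parse_reingest_response is an assumption.

```python
import json


def parse_reingest_response(body):
    """Return the reingest UUID from a Storage Service reingest response body.

    Raises RuntimeError when the response reports an error.
    """
    data = json.loads(body)
    if data.get("error"):
        raise RuntimeError(data.get("message", "re-ingest request failed"))
    return data["reingest_uuid"]
```

The returned UUID identifies the package that will appear in the Ingest tab awaiting approval.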

Adding Metadata

When reingest starts, the AIP is extracted and submitted to Archivematica’s Ingest workflow. Before approving the “Approve AIP reingest” decision, place the new metadata files in the extracted AIP’s data/objects/metadata/ directory.

Important: During reingest, only objects/metadata/source-metadata.csv (root level) is processed. CSVs under objects/metadata/transfers/ are only read during the initial Ingest.

data/objects/metadata/
├── source-metadata.csv    <- Mapping file for reingest (newly created)
├── mods-updated.xml       <- Updated MODS (newly created)
├── dc-reingest.xml        <- Newly added DC metadata (newly created)
└── transfers/             <- Initial Ingest metadata (existing, do not modify)
    └── metadata-test2-/
        ├── ead.xml
        ├── mods.xml
        └── source-metadata.csv

source-metadata.csv (for Reingest):

filename,metadata,type
objects,dc-reingest.xml,DC-CUSTOM
objects,mods-updated.xml,MODS

Specifying the same value as an existing dmdSec in the type column (MODS) causes the existing dmdSec to become superseded and a new dmdSec to be added as update. Specifying a new type value (DC-CUSTOM) adds a new dmdSec.

Approving and Processing the Reingest

# Approve the reingest
curl -X POST "http://127.0.0.1:62080/api/ingest/reingest/approve/" \
  -H "Authorization: ApiKey test:test" \
  -d "uuid="

After approval, the Ingest workflow proceeds. Even with Metadata re-ingest, decision points such as Normalize and Transcribe must be passed (select manually if the automated processing config is not applied).

Results: METS.xml After Reingest

The METS.xml after reingest contained 4 dmdSec entries.

dmdSec     STATUS                TYPE            Description
dmdSec_1   original              PREMIS:OBJECT   SIP identification (unchanged)
dmdSec_2   original              OTHER(EAD)      Initial Ingest EAD (unchanged)
dmdSec_3   original-superseded   OTHER(MODS)     Initial Ingest MODS (changed to superseded)
dmdSec_4   update                OTHER(MODS)     Updated MODS added via Reingest

dmdSec_3 (old MODS -> superseded):

<mets:dmdSec ID="dmdSec_3" CREATED="2026-02-17T01:57:54" STATUS="original-superseded">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS">
    <mets:xmlData>
      <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
        <titleInfo>
          <title>Test Document for MODS Metadata Validation</title>
        </titleInfo>

      </mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

dmdSec_4 (new MODS -> update):

<mets:dmdSec ID="dmdSec_4" CREATED="2026-02-17T02:15:22" STATUS="update">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="MODS">
    <mets:xmlData>
      <mods xmlns="http://www.loc.gov/mods/v3" version="3.6">
        <titleInfo>
          <title>Test Document - MODS Updated via Reingest</title>
        </titleInfo>
        <originInfo>
          <dateCreated encoding="w3cdtf">2025-01-15</dateCreated>
          <dateModified encoding="w3cdtf">2025-06-01</dateModified>
        </originInfo>
        <note>This MODS record was updated via metadata reingest.</note>
      </mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

Changes in structMap

After reingest, the structMap references all dmdSec entries including superseded ones for the objects directory.

<mets:div TYPE="Directory" LABEL="objects" DMDID="dmdSec_2 dmdSec_3 dmdSec_4">

Additionally, metadata files added during reingest are also stored as files within the AIP.

<mets:div TYPE="Directory" LABEL="metadata">

  <mets:div TYPE="Directory" LABEL="transfers">...</mets:div>

  <mets:div TYPE="Item" LABEL="dc-reingest.xml">...</mets:div>
  <mets:div TYPE="Item" LABEL="mods-updated.xml">...</mets:div>
  <mets:div TYPE="Item" LABEL="source-metadata.csv">...</mets:div>
</mets:div>

Why DC-CUSTOM Was Not Included in dmdSec

dc-reingest.xml (in Dublin Core Terms format) was stored as a file within the AIP but was not stored in dmdSec. The following error was recorded in the MCP Client log.

XML validation schema not found for keys: ['http://purl.org/dc/terms/', 'dcterms']

The Dublin Core registered in the XML Validation configuration was only the OAI-PMH format (http://www.openarchives.org/OAI/2.0/oai_dc/), and the DC Terms format (http://purl.org/dc/terms/) was not registered. Since XML Validation registration rules are based on namespace URIs, even the same Dublin Core requires separate registration if the namespace differs.
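
Because such omissions only surface in the logs, a pre-flight check before submitting or approving can help. The sketch below reads source-metadata.csv and reports every referenced XML file whose root namespace and root local name are both missing from a set of registered keys. It is a simplified check that ignores the schema location attributes, and the function name unregistered_metadata is an assumption.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def unregistered_metadata(metadata_dir, registered_keys):
    """Return (file, namespace, local name) for metadata not covered by registered_keys."""
    metadata_dir = Path(metadata_dir)
    problems = []
    csv_path = metadata_dir / "source-metadata.csv"
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            root = ET.parse(metadata_dir / row["metadata"]).getroot()
            # ElementTree encodes namespaced tags as "{namespace}local".
            if root.tag.startswith("{"):
                namespace, local = root.tag[1:].split("}", 1)
            else:
                namespace, local = "", root.tag
            if namespace not in registered_keys and local not in registered_keys:
                problems.append((row["metadata"], namespace, local))
    return problems
```

Run against the reingest metadata directory with the keys of XML_VALIDATION, a non-empty result flags files that would not reach <dmdSec>.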

Summary

Summary of Test Results

Test Item                                              Result
MODS metadata storage in dmdSec                        Success
EAD metadata storage in dmdSec                         Success (after adding XML Validation configuration)
Associating multiple metadata with the same object     Success (simultaneous EAD + MODS storage)
dmdSec association in structMap                        Correct (DMDID="dmdSec_2 dmdSec_3")
MODS metadata update via Reingest                      Success (old: original-superseded, new: update)
structMap update after Reingest                        Correct (DMDID="dmdSec_2 dmdSec_3 dmdSec_4")
Storage of files added via Reingest within AIP         Success
Storage of unregistered namespace DC Terms             Failed (not registered in XML Validation configuration)

Important Notes

  1. XML Validation Configuration: In environments where XML Validation is enabled, the namespaces of the metadata schemas you use must be registered in the validation configuration file. If unregistered, metadata is silently skipped. Errors are only recorded in logs, and the Ingest / Re-ingest process itself continues.

  2. MDTYPE Handling: Metadata via source-metadata.csv is always stored as MDTYPE="OTHER". It is not stored as METS-standard MDTYPE values like MDTYPE="MODS" or MDTYPE="EAD".

  3. source-metadata.csv Location During Reingest: During initial Ingest, metadata/transfers/<transfer-name>/source-metadata.csv is used, but during Reingest, only metadata/source-metadata.csv (root level) is processed.

  4. Metadata Versioning via the type Column: The type column in source-metadata.csv functions as an identifier for metadata updates during Reingest. Using the same type value causes the existing dmdSec to become superseded, while using a new type value adds a new dmdSec.

  5. Namespace-Based Registration: Even for the same metadata standard (e.g., Dublin Core), if the namespace URI used differs, each must be registered separately in the XML Validation configuration.
