Overview
I tried out the workflow for processing Dataverse data with Archivematica, and these are my notes.
Background
Archivematica supports datasets exported from Dataverse as a dedicated transfer type.
https://www.archivematica.org/en/docs/archivematica-1.17/user-manual/transfer/dataverse/
I learned about this feature in the following lecture and decided to try it out.
https://www.kulib.kyoto-u.ac.jp/bulletin/1402322
Dataverse
I used the Demo Dataverse that was also used in the following article.
I uploaded the following data.
https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/IHQZL3
From this page, download both the image file itself and the metadata as JSON. To get the JSON, open the Metadata tab and choose JSON under Export Metadata.
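If you prefer to script the download instead of using the web UI, the Dataverse native API exposes both pieces. The sketch below only builds the URLs. It assumes that the dataverse_json exporter produces the same JSON as the UI's Export Metadata > JSON, and that the file is public so the access endpoint needs no API token. The helper names export_url and datafile_url are my own, hypothetical ones.

```python
from urllib.parse import urlencode

# Assumption: the public Demo Dataverse installation used in this post.
BASE_URL = "https://demo.dataverse.org"


def export_url(base_url: str, persistent_id: str, exporter: str = "dataverse_json") -> str:
    """Build a GET URL for the native metadata export endpoint."""
    # urlencode percent-encodes ":" and "/" in the DOI, which the server decodes.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def datafile_url(base_url: str, file_id: int) -> str:
    """Build a GET URL for the data access endpoint of a single (public) file."""
    return f"{base_url}/api/access/datafile/{file_id}"
```

For example, urllib.request.urlretrieve(export_url(BASE_URL, "doi:10.70122/FK2/IHQZL3"), "dataset.json") would save the metadata, and the image's dataFile.id (2514724 in the JSON excerpt below) can be passed to datafile_url in the same way.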

Below is an excerpt of the JSON file. metadataBlocks holds the descriptive metadata, and files holds information about the image file.
{
"metadataBlocks": {
"citation": {
"displayName": "Citation Metadata",
"name": "citation",
"fields": [
{
"typeName": "title",
"multiple": false,
"typeClass": "primitive",
"value": "nakamura196"
},
{
"typeName": "author",
"multiple": true,
"typeClass": "compound",
"value": [
{
"authorName": {
"typeName": "authorName",
"multiple": false,
"typeClass": "primitive",
"value": "Nakamura, Satoru"
},
"authorAffiliation": {
"typeName": "authorAffiliation",
"multiple": false,
"typeClass": "primitive",
"value": "https://ror.org/057zh3y96",
"expandedvalue": {
"scheme": "http://www.grid.ac/ontology/",
"termName": "The University of Tokyo",
"@type": "https://schema.org/Organization"
}
}
}
]
},
{
"typeName": "datasetContact",
"multiple": true,
"typeClass": "compound",
"value": [
{
"datasetContactName": {
"typeName": "datasetContactName",
"multiple": false,
"typeClass": "primitive",
"value": "Nakamura, Satoru"
},
"datasetContactEmail": {
"typeName": "datasetContactEmail",
"multiple": false,
"typeClass": "primitive",
"value": "na.kamura.1263@gmail.com"
}
}
]
},
{
"typeName": "dsDescription",
"multiple": true,
"typeClass": "compound",
"value": [
{
"dsDescriptionValue": {
"typeName": "dsDescriptionValue",
"multiple": false,
"typeClass": "primitive",
"value": "My First Dataset"
}
}
]
},
{
"typeName": "subject",
"multiple": true,
"typeClass": "controlledVocabulary",
"value": [
"Arts and Humanities"
]
},
{
"typeName": "depositor",
"multiple": false,
"typeClass": "primitive",
"value": "Nakamura, Satoru"
},
{
"typeName": "dateOfDeposit",
"multiple": false,
"typeClass": "primitive",
"value": "2025-01-19"
}
]
}
},
"files": [
{
"label": "nakamura196.jpg",
"restricted": false,
"version": 1,
"datasetVersionId": 281093,
"dataFile": {
"id": 2514724,
"persistentId": "doi:10.70122/FK2/IHQZL3/B7JVQS",
"pidURL": "https://doi.org/10.70122/FK2/IHQZL3/B7JVQS",
"filename": "nakamura196.jpg",
"contentType": "image/jpeg",
"friendlyType": "JPEG Image",
"filesize": 53656,
"storageIdentifier": "s3://demo-dataverse-org:1948154820d-63733533ea7c",
"rootDataFileId": -1,
"md5": "72f08a8b07bacbe3b5cf021910fd26dc",
"checksum": {
"type": "MD5",
"value": "72f08a8b07bacbe3b5cf021910fd26dc"
},
"tabularData": false,
"creationDate": "2025-01-19",
"publicationDate": "2025-01-19",
"fileAccessRequest": true
}
}
]
}
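To check what Archivematica will receive, the excerpt can be read with a few lines of Python. This is a sketch under my own assumptions: summarize, version_block and load_dataset are hypothetical helpers, and since a full dataverse_json export nests metadataBlocks and files under datasetVersion (as far as I know), the code accepts either shape.

```python
import json
from typing import Any


def version_block(doc: dict[str, Any]) -> dict[str, Any]:
    # Full dataverse_json exports nest metadataBlocks/files under "datasetVersion";
    # the excerpt above has them at the top level, so accept both shapes.
    return doc.get("datasetVersion", doc)


def summarize(doc: dict[str, Any]) -> dict[str, Any]:
    """Pull the title, author names and file checksums out of a Dataverse JSON export."""
    version = version_block(doc)
    # Index citation fields by typeName, e.g. "title", "author", "subject".
    fields = {f["typeName"]: f["value"] for f in version["metadataBlocks"]["citation"]["fields"]}
    authors = [a["authorName"]["value"] for a in fields.get("author", [])]
    files = [
        (
            f["dataFile"]["filename"],
            f["dataFile"]["checksum"]["type"],
            f["dataFile"]["checksum"]["value"],
        )
        for f in version.get("files", [])
    ]
    return {"title": fields.get("title"), "authors": authors, "files": files}


def load_dataset(path: str) -> dict[str, Any]:
    """Read a dataset.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling summarize(load_dataset("metadata/dataset.json")) on the file above gives the title, the author list and one (filename, checksum type, checksum value) tuple per file, which is a quick way to confirm the export is complete before building the transfer.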
Data Preparation
Sample Dataverse transfers are available at the following location.
https://github.com/artefactual/archivematica-sampledata/tree/master/SampleTransfers/Dataverse
Following that layout, save the JSON file downloaded from Dataverse as dataset.json in the metadata folder. The resulting folder structure looks like the following.

Following the article below, I placed this data in mdx.jp object storage connected to GakuNin RDM, and processed it with an Archivematica instance connected to the same object storage.
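The folder can also be assembled locally with a short script before uploading it to storage. This is a minimal sketch, assuming the data files sit at the top level of the transfer folder next to metadata/dataset.json (check the sample transfers for the exact layout your Archivematica version expects). build_transfer, verify_checksums and file_digest_hex are hypothetical helpers; the checksum check reuses the checksum recorded for each file in dataset.json.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest_hex(path: Path, checksum_type: str) -> str:
    # Dataverse labels checksums "MD5", "SHA-1", "SHA-256", "SHA-512";
    # hashlib names them "md5", "sha1", "sha256", "sha512".
    h = hashlib.new(checksum_type.lower().replace("-", ""))
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_transfer(transfer_dir: Path, dataset_json: Path, data_files: list[Path]) -> None:
    """Assumed layout: data files at the top level, metadata/dataset.json beside them."""
    metadata_dir = transfer_dir / "metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(dataset_json, metadata_dir / "dataset.json")
    for src in data_files:
        shutil.copy2(src, transfer_dir / src.name)


def verify_checksums(transfer_dir: Path) -> dict[str, bool]:
    """Compare each listed file against the checksum recorded in dataset.json."""
    doc = json.loads((transfer_dir / "metadata" / "dataset.json").read_text(encoding="utf-8"))
    version = doc.get("datasetVersion", doc)
    results: dict[str, bool] = {}
    for entry in version.get("files", []):
        data_file = entry["dataFile"]
        target = transfer_dir / data_file["filename"]
        checksum = data_file["checksum"]
        # A missing file counts as a mismatch.
        results[data_file["filename"]] = target.is_file() and (
            file_digest_hex(target, checksum["type"]) == checksum["value"]
        )
    return results
```

Running verify_checksums on the finished folder catches a truncated or wrong image before the transfer starts, rather than after Archivematica has processed it.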
Processing in Archivematica
Set the Transfer type to “Dataverse”, select the folder created earlier, and start processing.
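The same step can in principle be started through the Archivematica dashboard REST API instead of the web UI. The sketch below only builds the request and does not send it. It assumes, based on my reading of the Archivematica API documentation, the /api/transfer/start_transfer/ endpoint, the "dataverse" type string, paths given as base64("<location UUID>:<path>") and an "ApiKey user:key" Authorization header; please verify these against the docs for your version. The location UUID would be that of the transfer source location pointing at the object storage, and the helper names are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def encode_path(location_uuid: str, relative_path: str) -> str:
    """Encode a transfer source path as base64("<location UUID>:<relative path>")."""
    raw = f"{location_uuid}:{relative_path}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def start_transfer_request(
    dashboard_url: str,
    user: str,
    api_key: str,
    name: str,
    location_uuid: str,
    relative_path: str,
) -> Request:
    """Build (but do not send) a start_transfer POST for a Dataverse transfer."""
    body = urlencode(
        {
            "name": name,
            "type": "dataverse",  # assumed API value of the "Dataverse" transfer type
            "paths[]": encode_path(location_uuid, relative_path),
        }
    ).encode("ascii")
    return Request(
        f"{dashboard_url.rstrip('/')}/api/transfer/start_transfer/",
        data=body,
        headers={"Authorization": f"ApiKey {user}:{api_key}"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. Depending on the processing configuration, the started transfer may still need to be approved in the dashboard, just as when it is started from the UI.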

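As a result, the following METS file was created. dmdSec_1 appears twice; I could not tell whether this comes from how I registered the data or from a bug. Because the METS schema declares ID attributes as xsd:ID, which must be unique within a document, the duplicate also makes the file invalid against the schema. Apart from that, the contents of dataset.json were mapped to DDI (Codebook 2.5) in dmdSec_1, and dmdSec_2 references dataset.json itself.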
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mets="http://www.loc.gov/METS/" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version1121/mets.xsd">
<mets:metsHdr CREATEDATE="2025-01-21T07:28:13" />
<mets:dmdSec ID="dmdSec_1" CREATED="2025-01-21T07:27:57" STATUS="original">
<mets:mdWrap MDTYPE="DDI">
<mets:xmlData>
<ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<ddi:stdyDscr>
<ddi:citation>
<ddi:titlStmt>
<ddi:titl>nakamura196</ddi:titl>
<ddi:IDNo agency="doi">https://doi.org/10.70122/FK2/IHQZL3</ddi:IDNo>
</ddi:titlStmt>
<ddi:rspStmt>
<ddi:AuthEnty affiliation="https://ror.org/057zh3y96">Nakamura, Satoru</ddi:AuthEnty>
</ddi:rspStmt>
<ddi:distStmt>
<ddi:distrbtr>Demo Dataverse</ddi:distrbtr>
</ddi:distStmt>
<ddi:verStmt>
<ddi:version date="2025-01-20T01:30:23Z" type="RELEASED">1.0</ddi:version>
</ddi:verStmt>
</ddi:citation>
<ddi:dataAccs>
<ddi:useStmt>
<ddi:restrctn />
</ddi:useStmt>
</ddi:dataAccs>
</ddi:stdyDscr>
</ddi:codebook>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:dmdSec ID="dmdSec_1" CREATED="2025-01-21T07:27:57" STATUS="original">
<mets:mdWrap MDTYPE="DDI">
<mets:xmlData>
<ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<ddi:stdyDscr>
<ddi:citation>
<ddi:titlStmt>
<ddi:titl>nakamura196</ddi:titl>
<ddi:IDNo agency="doi">https://doi.org/10.70122/FK2/IHQZL3</ddi:IDNo>
</ddi:titlStmt>
<ddi:rspStmt>
<ddi:AuthEnty affiliation="https://ror.org/057zh3y96">Nakamura, Satoru</ddi:AuthEnty>
</ddi:rspStmt>
<ddi:distStmt>
<ddi:distrbtr>Demo Dataverse</ddi:distrbtr>
</ddi:distStmt>
<ddi:verStmt>
<ddi:version date="2025-01-20T01:30:23Z" type="RELEASED">1.0</ddi:version>
</ddi:verStmt>
</ddi:citation>
<ddi:dataAccs>
<ddi:useStmt>
<ddi:restrctn />
</ddi:useStmt>
</ddi:dataAccs>
</ddi:stdyDscr>
</ddi:codebook>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:dmdSec ID="dmdSec_2" CREATED="2025-01-21T07:27:57" STATUS="original">
<mets:mdRef LABEL="dataset.json" xlink:href="metadata/dataset.json" MDTYPE="OTHER" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" />
</mets:dmdSec>
<mets:amdSec ID="amdSec_1">
<mets:techMD ID="techMD_1">...</mets:techMD>
...
</mets:amdSec>
...
</mets:mets>
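A duplicate like dmdSec_1 is easy to miss by eye, so a small check over the generated METS can help when testing other datasets. This is a sketch using only the standard library; dmdsec_ids and duplicate_ids are hypothetical helper names, and the check simply counts ID attribute values across the whole document.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

METS_NS = "http://www.loc.gov/METS/"


def dmdsec_ids(mets_xml: str | bytes) -> list[str]:
    """List the ID of every mets:dmdSec in document order."""
    root = ET.fromstring(mets_xml)
    return [el.get("ID", "") for el in root.iter(f"{{{METS_NS}}}dmdSec")]


def duplicate_ids(mets_xml: str | bytes) -> list[str]:
    """Return ID attribute values that occur more than once anywhere in the document."""
    root = ET.fromstring(mets_xml)
    counts = Counter(el.get("ID") for el in root.iter() if el.get("ID") is not None)
    return sorted(value for value, n in counts.items() if n > 1)
```

Reading the generated METS file in binary mode and passing its bytes to duplicate_ids should report dmdSec_1 for the output shown above.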
Summary
This feature looks very useful for the long-term preservation of research data. I hope these notes serve as a helpful reference for anyone connecting Dataverse and Archivematica.