This article was written in collaboration with generative AI. While facts have been cross-checked against official documentation where possible, errors may remain. Please verify primary sources before making important decisions.
This is a follow-up to Automating researchmap KAKENHI-Achievement Linking with Playwright. The previous article focused on linking; this one focuses on registering achievements themselves.
While drafting, I went through fact-checking and found that my initial assumption (“the only official write path is the Web UI”) was incorrect. This article reflects the corrected picture and explains where a custom Playwright script still adds value.
Write Paths to researchmap
1. Write API (researchmap.v2 API)
Official specification: researchmap.v2 API design document (V4.7)
A write API is published that supports add / update / delete operations across all 19 achievement types. Authentication uses OAuth 2.0 (JWT Bearer Flow), with tokens issued at https://api.researchmap.jp/oauth2/token.
Use of the API requires an application, and the documented operation looks like this:
- The available application form is the Institutional version
- Section 2(3)(4) of the WebAPI terms of service states that applications “judged to be from an individual rather than from a university or research institution (except where there is a reasonable basis)” may not be approved
- A fixed IP address is required for use, and applications are renewed annually
The terms include a “reasonable basis” exception, but operation centers on institutional applications. For an individual researcher who only wants to register their own achievements, the CSV/JSON import or Web UI workflows discussed below are the practical options.
2. Official CSV / JSON / JSONL Import (available to individual users)
From Settings > “Researcher / Achievement Import”, the logged-in researcher can bulk-register their own achievements.
- Supported formats: JSON / CSV / JSONL / ZIP
- Covers all 19 achievement types
- 10 MB per file
- Specifications: v2CSV field definitions, API design document (the same internal format)
So even individual users can effectively perform bulk writes if they can write JSONL.
A few notes:
- Attachments such as PDF files are out of scope for import. The API design document marks fields under dataset as “not updatable” for every achievement type that has attachments (presentations, published_papers, misc, works, etc.). Fields like access_url are output-only
- ZIP upload is defined as a way to bundle multiple JSON/CSV files; bundling PDFs is not part of the spec
- The file upload step on the import screen is performed manually by the user
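The JSONL itself is plain enough to generate with a few lines of Python. A minimal sketch (the row shape follows the API design document; the output filename is arbitrary):

```python
import json

# One achievement per line: "insert" names the type, "merge" carries the fields.
rows = [
    {
        "insert": {"type": "presentations"},
        "merge": {
            "display": "disclosed",
            "presentation_title": {"ja": "Example talk title"},
            "publication_date": "2026-03-19",
        },
    },
]

with open("presentations.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```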
3. Manual Web UI
For one-off additions and edits, this is the fastest path. It becomes tedious at the scale of dozens of entries, and there is no version history.
4. Browser Automation with Playwright (this article)
Where the official CSV import covers what’s needed, that path is fine. The cases I wanted to automate end-to-end were:
- Once the JSONL is written, complete everything (including file upload) without further manual steps
- Attach PDFs as presentation materials — outside the scope of official import
- Apply pinpoint updates to existing entries — for example, replacing “ongoing” with a specific end month, kept as a JSONL record
The script described below addresses these cases.
JSONL Format
researchmap/
├── migrations/
│ ├── awards/
│ ├── books_etc/
│ ├── committee_memberships/
│ ├── misc/
│ ├── presentations/
│ ├── profile/ # contains research_experience
│ ├── published_papers/
│ ├── research_projects/ # config for the linking script
│ └── works/
└── scripts/
├── link_research_project.py # KAKENHI linking (previous article)
└── register.py # the generic registration script
A minimal new-entry example (presentations):
{
"insert": {"type": "presentations"},
"merge": {
"display": "disclosed",
"presentation_title": {"ja": "NDL古典籍OCR-Liteを活用したiOSアプリの開発"},
"presenters": {"ja": [{"name": "中村覚"}]},
"event": {"ja": "次世代システム開発研究室 オンライン勉強会(2026年3月)"},
"publication_date": "2026-03-19",
"from_event_date": "2026-03-19",
"to_event_date": "2026-03-19",
"invited": false,
"languages": ["jpn"],
"presentation_type": "oral_presentation",
"is_international_presentation": false
}
}
For updates, specify the existing entry ID under insert.id. To close an “ongoing” affiliation at 2026-03, for example:
{
"insert": {"type": "research_experience", "id": "35987576"},
"merge": {"to_date": "2026-03"}
}
Only the fields you write under merge are updated; everything else is left as it was.
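In code, the row semantics look roughly like this loader (a sketch; the real register.py may parse differently):

```python
import json
from pathlib import Path

def load_rows(path):
    """Yield (achievement_type, entry_id, merge_fields) for each JSONL line.
    entry_id is None for a new registration; a string id means "update"."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        ins = row["insert"]
        yield ins["type"], ins.get("id"), row.get("merge", {})
```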
Compatibility with the Official Format
The JSONL above is compatible with the input format of the researchmap.v2 API.
The API design document (p. 25, “Parameters (POST BODY)”) defines the structure of placing insert / update / delete at the top level, with sibling keys merge / similar_merge / force / doc. Page 27 notes that specifying id under insert updates an existing entry. Field names such as presentation_title.ja, presenters.ja[].name, and from_event_date match the samples on pp. 101–105.
The same JSONL therefore works in either of:
- Manual upload via researchmap’s “Researcher / Achievement Import” screen
- Auto-fill via the custom Playwright script
Keeping the JSONL in the official-compatible shape leaves the option open to switch between the official import and Playwright as needed.
Commands
# Register all rows in the JSONL
python3 scripts/register.py migrations/presentations/0009_xxx.jsonl
# Process a specific row only
python3 scripts/register.py migrations/awards/0001_jsda_2026_awards.jsonl --line 1
# Attach a PDF as the presentation material
python3 scripts/register.py migrations/presentations/0012_icadl2025.jsonl \
--pdf /path/to/slides.pdf
# Attach a PDF to an existing entry (no new registration)
python3 scripts/register.py --type presentations --id 53536588 --pdf slides.pdf
# Dry-run (do not submit)
python3 scripts/register.py <jsonl> --dry-run
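The interface above can be sketched with argparse (flag names are taken from the examples; the dest names and help text are my assumptions, not the script's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        description="Register researchmap achievements from JSONL")
    p.add_argument("jsonl", nargs="?", help="migration JSONL file")
    p.add_argument("--line", type=int, help="process only this row (1-based)")
    p.add_argument("--pdf", help="PDF to attach as presentation material")
    p.add_argument("--type", dest="ach_type", help="achievement type for --id mode")
    p.add_argument("--id", dest="entry_id", help="existing entry id (attach-only mode)")
    p.add_argument("--dry-run", action="store_true", help="fill forms but do not submit")
    p.add_argument("--headless", action="store_true", help="run without a visible browser")
    return p
```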
Generic Handling of 8 Achievement Types
Across types, researchmap forms (built on CakePHP + NetCommons) share a common shape:
- Form id: {Type}IndexAddDetailForm (e.g. PresentationsIndexAddDetailForm)
- Input names: data[{Type}Index][_source][{field}]
- Common fields: display, see_also, dataset (for types that support attachments)
The script captures this with a small per-type registry:
TYPE_META = {
"presentations": {
"form_id": "PresentationsIndexAddDetailForm",
"index_camel": "PresentationsIndex",
"title_field": "presentation_title",
"supports_dataset": True,
},
"awards": {
"form_id": "AwardsIndexAddDetailForm",
"index_camel": "AwardsIndex",
"title_field": "award_name",
"supports_dataset": False,
},
# ... six more types
}
FILLERS = {
"presentations": fill_presentations,
"awards": fill_awards,
# ... per-type input logic
}
Adding a new type means adding one entry to TYPE_META and one filler function.
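The naming convention makes input-name construction a one-liner. As a sketch (nested values like presentation_title.ja need per-type handling, which is what the filler functions are for):

```python
def input_name(index_camel: str, field: str) -> str:
    """Build the CakePHP-style input name for a field, e.g.
    data[PresentationsIndex][_source][presentation_title]."""
    return f"data[{index_camel}][_source][{field}]"

# Flat scalar fields can be mapped declaratively; anything nested
# (dicts/lists) is left for the type-specific filler to handle.
def flat_fields(index_camel: str, merge: dict) -> list:
    return [(input_name(index_camel, k), v)
            for k, v in merge.items() if not isinstance(v, (dict, list))]
```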
A Detail That Took Some Time: CakePHP Hidden Fields
When I first wrote the script for updating the end month on a research_experience entry, the form would submit successfully but to_date would come back as null.
Inspecting the DOM revealed that two elements share the same name attribute:
INPUT type=hidden name="...[to_date][year]" value="" (visible: False)
INPUT type=number name="...[to_date][year]" value="" (visible: True)
These hidden duplicates are generated by CakePHP (likely related to AntiArrayAttack-style protections). When you query by name with document.querySelector, the hidden one comes back first. My JS-based value setter was writing into the hidden input and leaving the visible field empty.
The fix is to enumerate with querySelectorAll and pick the last non-hidden element:
const els = Array.from(document.querySelectorAll(`[name="${n}"]`));
const real = els.filter(e => (e.type || '').toLowerCase() !== 'hidden');
const target = real.length > 0 ? real[real.length - 1] : els[els.length - 1];
target.value = v;
target.dispatchEvent(new Event('input', {bubbles: true}));
target.dispatchEvent(new Event('change', {bubbles: true}));
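From the Python side, the snippet is handed to page.evaluate with the name/value pair as its argument. A sketch of the wrapper:

```python
# The JS above, as a string Playwright can evaluate in the page context.
SET_VALUE_JS = """([n, v]) => {
  const els = Array.from(document.querySelectorAll(`[name="${n}"]`));
  const real = els.filter(e => (e.type || '').toLowerCase() !== 'hidden');
  const target = real.length > 0 ? real[real.length - 1] : els[els.length - 1];
  target.value = v;
  target.dispatchEvent(new Event('input', {bubbles: true}));
  target.dispatchEvent(new Event('change', {bubbles: true}));
}"""

async def set_field(page, name: str, value) -> None:
    # page.evaluate forwards its second argument into the arrow function.
    await page.evaluate(SET_VALUE_JS, [name, value])
```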
I use to_date: "9999" as a marker for “ongoing”, so that when 9999 appears the script ticks the is_current checkbox and skips the date inputs.
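The sentinel handling is simple; as a sketch (the is_current checkbox itself is ticked by the filler, whose input name varies by type):

```python
ONGOING = "9999"

def split_to_date(to_date: str):
    """Map a JSONL to_date value to (is_current, year, month).
    "9999" means ongoing; "2026-03" means a concrete end month."""
    if to_date == ONGOING:
        return True, None, None
    year, _, month = to_date.partition("-")
    return False, year, month or None
```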
Attaching Presentation PDFs
This is the part that the official import does not cover. In the form, expanding the “Details” section reveals a dataset file input, where setting dataset_type to published (presentation material) lets you upload a PDF.
file_input = page.locator(
f'input[type="file"][name="{src}[dataset][dataset_name]"]').first
await file_input.set_input_files(pdf_abs)
await select_option(page, f"{src}[dataset][dataset_type]", "published")
The --id mode is convenient for retroactively attaching slides to presentations that were registered manually before.
Helper for Finding Unregistered Work
To check for missing achievements, I keep a small script that pulls from the OpenAlex API by ORCID and compares against my registered DOIs and titles on researchmap:
import requests

ORCID = "0000-0001-8245-7925"
r = requests.get(
    f"https://api.openalex.org/works?filter=author.orcid:{ORCID}&per-page=200")
ORCID-filtered queries can still pick up entries from people with similar identifiers, so the final review remains manual. Even so, it is more efficient than a fully manual check.
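The comparison step boils down to a set difference over normalized DOIs; roughly (a sketch: registered_dois would come from the researchmap side, and title matching is omitted here):

```python
def normalize_doi(doi: str) -> str:
    # OpenAlex returns DOIs as full URLs, while registered entries may
    # store the bare identifier; strip prefixes and lowercase to compare.
    return doi.lower().removeprefix("https://doi.org/").removeprefix("doi:")

def missing_works(openalex_works: list, registered_dois: set) -> list:
    """Return OpenAlex works whose DOI is not among the registered ones."""
    registered = {normalize_doi(d) for d in registered_dois}
    return [w for w in openalex_works
            if w.get("doi") and normalize_doi(w["doi"]) not in registered]
```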
On Headless Mode
Running the browser in the background (headless) would be the natural next step, but a straightforward switch to headless does not work at the moment.
Setting headless=True returns 403 Forbidden already at the login page. From a few quick checks, the rule appears to be simple: requests whose User-Agent string contains HeadlessChrome or python-requests get a 403. Plain curl requests, or requests with a normal browser UA, return 200.
For reference, what can be inferred from response headers:
- IP is 160.74.72.121 (JSTNET, hosted by JST)
- A cookie like TS012e09a0=... suggests an F5 BIG-IP ASM-style protection layer
- The front-end is nginx
There are known techniques to get a headless browser through (replacing the User-Agent with a real Chrome string, hiding the webdriver flag with playwright-stealth and similar, or using the system Chrome instead of the bundled Chromium). Each of these works in a way that runs against the operator’s apparent intent of suppressing automated access.
The researchmap terms of service Section 4(10) asks users to refrain from “downloading information in large volumes by mechanical or comparable means.” robots.txt also disallows crawling under the researcher pages.
To respect the operator’s intent and the terms, the script does not adopt UA-spoofing or flag-suppression tricks.
On the other hand, storage_state-based cookie persistence is the same mechanism a normal browser uses to remember you are logged in. It is not impersonation:
- First run (headed): log in normally → save cookies to scripts/.session.json
- Subsequent runs (headed): load the saved session and skip the login page entirely
ctx_kwargs = {"viewport": {"width": 1400, "height": 1000}}
if SESSION_PATH.exists():
ctx_kwargs["storage_state"] = str(SESSION_PATH)
context = await browser.new_context(**ctx_kwargs)
# ... after a successful login
await context.storage_state(path=str(SESSION_PATH))
In practice this saves about 10 seconds per run. The browser is still visible, but I am free to do other work while it runs, so the operational overhead is small.
The script keeps a --headless flag as well. Given the current behavior described above, it mostly stays as an internal switch.
Summary of Choices for an Individual User
| Path | Individual use | File handover | PDF attach | Resilience to UI changes |
|---|---|---|---|---|
| Write API v2 | △ (institution-centric) | ◎ | △ (dataset out of scope) | ◎ |
| Official CSV/JSON import | ◎ | manual UI | out of scope | ○ |
| Playwright automation | ◎ | fully automatic | ◎ | △ |
| Manual Web UI | ◎ | manual UI | ◎ | ─ |
The Playwright automation depends on form ids (e.g. PresentationsIndexAddDetailForm) and field naming conventions (data[XxxIndex][_source][...]). If the registration UI is restructured, selectors may break and the script may stop working. CSV/JSON import and the API are tied to input schemas, so they tend to be less affected by visual redesigns.
If the JSONL is kept in the official-compatible form, even when the Playwright side stops working, the same files can be uploaded to the official import screen and work can continue. Keeping migrations/ in a Git repository also leaves a natural log of what was registered when.
The scripts here are for personal workflow improvement. Replacing SLUG = "nakamura.satoru" and the JSONL files under migrations/ with your own should let you do the same. In keeping with the spirit of the terms of service, I use this only for registering my own achievements on my own account.