How Calil's Cross-Library Search Works: Staged Result Retrieval with search and polling

This article was written with the assistance of generative AI. Factual claims have been checked against official documentation where possible, but errors may remain. Please verify with primary sources before making important decisions.

Calil is a service that searches the holdings of many libraries at once. Results come back quickly even though a single query fans out to many libraries and databases. I had a vague sense that the cross-search ran asynchronously behind the scenes, but I did not know how it was actually implemented.

While inspecting Hirogari Search, a literature-discovery cross-search provided by the Osaka Prefectural Library, with the browser developer tools, I saw two requests fire in succession: search and polling. This article is a learning record of what those two requests do, confirmed by reading the source code that Calil publishes.

⚠️ This article does not explain how to use a public API.

The endpoints observed here (unitrac-*.calil.jp) are instances of "Unitrad API," a business API that each library operates under a contract with Calil. At the time of writing, it is not an API published for third-party use. The terms of use are not public either, so, to respect what those terms presumably intend, I limited my checks to a small number of manual requests against a service I use myself. The proper way to use cross-search as an API is the documented Library API (free, with an application-key request) or a direct inquiry to Calil.

The client-side code I read is CALIL/unitrad-ui (MIT License, Copyright (c) CALIL Inc.), which Calil publishes as open-source software (OSS). I was able to trace the behavior because the source is public.

/search returns only a job acknowledgement

When you search in Hirogari Search, this request fires first.

GET /search?free=<keyword>&region=<region ID>

The response contains almost no search results. Instead, it looks like this.

{
  "uuid": "unitrac-tokyo-1-xxxxxxxx-...",
  "version": 1,
  "running": true,
  "books": [],
  "remains": [ ... ],
  "errors": []
}

As far as I can tell, /search only returns an acknowledgement that the search was accepted. uuid is the identifier of the search job (a UUID, universally unique identifier), and running: true is a flag meaning "still collecting." The holdings data itself is not included here.

I had vaguely assumed that a cross-search waits for responses from all target libraries before returning results. At least in this implementation that is not the case: the server only registers the job and responds immediately.
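As a minimal sketch of this first step, the snippet below builds a /search URL from the two parameters seen above and checks whether a response is only an acknowledgement. The base URL `https://unitrad.example.invalid`, the region ID `osaka`, and the helper names `buildSearchUrl` and `isAcknowledgement` are placeholders of my own, not part of unitrad-ui, and no request is actually sent.

```javascript
// Hypothetical base URL; the real endpoints are contract-only and are not used here.
const BASE_URL = "https://unitrad.example.invalid";

// Build the /search URL from the two query parameters observed in the request.
function buildSearchUrl(free, region) {
  const url = new URL("/search", BASE_URL);
  url.searchParams.set("free", free);
  url.searchParams.set("region", region);
  return url.toString();
}

// True when the response only acknowledges the job: it has a uuid,
// is still running, and carries no books yet.
function isAcknowledgement(res) {
  return typeof res.uuid === "string" && res.running === true && res.books.length === 0;
}

// Sample acknowledgement shaped like the response shown above (values are made up).
const ack = { uuid: "job-0001", version: 1, running: true, books: [], remains: [], errors: [] };

console.log(buildSearchUrl("library", "osaka"));
console.log(isAcknowledgement(ack));
```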

Receiving diffs via /polling

Results are retrieved with a separate request. Holding the uuid, the client repeatedly sends the following request.

GET /polling?uuid=<uuid>&version=<N>&diff=1&timeout=10

Each parameter has a role.

Parameter	Role
`uuid`	Which search job's results are wanted
`version`	A cursor indicating how far results have been received
`diff=1`	Return only the difference, not the full list
`timeout=10`	The server may hold the connection for up to 10 seconds

Because of timeout=10, the server holds the connection for up to 10 seconds until new results appear, and returns the moment something appears. This reduces empty requests while still delivering results at short intervals. This technique is called long polling.
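To make the idea of holding a request concrete, here is a small in-memory sketch of the server side of long polling. It is not Unitrad's implementation, which is not public. The `Job` class and its `publish` and `poll` methods are hypothetical names, and answering `null` on timeout is an assumption.

```javascript
// In-memory stand-in for one search job on the server side (hypothetical).
class Job {
  constructor() {
    this.version = 1;
    this.waiters = [];
  }

  // Called when a library answers: bump the version and wake every held request.
  publish() {
    this.version += 1;
    const waiters = this.waiters;
    this.waiters = [];
    waiters.forEach((wake) => wake());
  }

  // Long poll: answer at once if the client is behind, otherwise hold the
  // request until publish() runs or timeoutMs passes (then answer null).
  poll(clientVersion, timeoutMs) {
    if (clientVersion < this.version) {
      return Promise.resolve({ version: this.version });
    }
    return new Promise((resolve) => {
      const wake = () => {
        clearTimeout(timer);
        resolve({ version: this.version });
      };
      const timer = setTimeout(() => {
        // Timed out: stop waiting and report that nothing new arrived.
        this.waiters = this.waiters.filter((w) => w !== wake);
        resolve(null);
      }, timeoutMs);
      this.waiters.push(wake);
    });
  }
}

// Usage: the request is held, then answered as soon as new data is published.
const job = new Job();
job.poll(1, 50).then((res) => console.log("held request answered:", res));
setTimeout(() => job.publish(), 10);
```

The point of the sketch is that the client never needs a short fixed polling loop: the waiting happens on the server, and the response is released the moment something changes.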

The response is in a diff format.

{
  "version": 5,
  "running": true,
  "books_diff": {
    "insert": [ { "title": "...", "author": "...", "holdings": [ ... ] } ],
    "update": [ { "_idx": 12, "holdings": [ ... ] } ]
  }
}

insert holds newly found books, and update holds additions to books already displayed (the _idx-th one). The client merges these diffs into its local list and repeats /polling until running becomes false. Results from faster-responding libraries appear first, and results from slower libraries flow in afterward.
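The merge step can be sketched as a pure function. This is a minimal illustration, not the unitrad-ui code: the name `applyDiff` is mine, and I assume that each `update` entry overwrites the listed fields of the book at position `_idx`.

```javascript
// Merge one /polling response into the local state and return the new state.
// Assumption: each update entry overwrites the listed fields of the book at _idx.
function applyDiff(state, res) {
  const books = state.books.slice();
  const diff = res.books_diff || {};
  // Newly found books are appended to the end of the list.
  for (const book of diff.insert || []) {
    books.push(book);
  }
  // Updates patch an existing book in place, addressed by its list index.
  for (const { _idx, ...fields } of diff.update || []) {
    books[_idx] = { ...books[_idx], ...fields };
  }
  return { ...state, version: res.version, running: res.running, books };
}

// Usage with made-up data: one insert, then an update to the same book.
let state = { uuid: "job-0001", version: 1, running: true, books: [] };
state = applyDiff(state, {
  version: 2,
  running: true,
  books_diff: { insert: [{ title: "Book A", holdings: [1] }], update: [] },
});
state = applyDiff(state, {
  version: 3,
  running: false,
  books_diff: { insert: [], update: [{ _idx: 0, holdings: [1, 2] }] },
});
console.log(state.books);
```

Because `update` addresses books by index, the client must apply diffs in order and must never reorder the underlying list; any sorting has to happen on a copy.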

Reading the source to confirm

Everything so far was observed through the browser developer tools. To confirm it more precisely, I cloned unitrad-ui and read the client implementation. The core is the api class in src/js/api.js. The key parts, extracted and abridged, look like this (MIT License, CALIL Inc.).

search(query) {
  // Start the job via /search. On failure, retry after 1000ms.
  _request('search').query(stripQuery(query)).end((err, res) => {
    if (!err) this.receive(res.body);
    else setTimeout(() => this.search(query), 1000);
  });
}

polling() {
  // Long-poll with version as cursor, diff=1 + timeout=10
  _request('polling')
    .query({ uuid: this.data.uuid, version: this.data.version, diff: 1, timeout: 10 })
    .end((err, res) => {
      if (res.body === null) setTimeout(() => this.polling(), 100); // nothing yet
      else this.receive(res.body);
    });
}

receive(data) {
  if (data.books_diff) {
    // Append insert, patch update in place by _idx
    Array.prototype.push.apply(this.data.books, data.books_diff.insert);
    // ... (overwriting version/running etc., merging update) ...
  } else {
    this.data = data; // first time (the full /search response)
  }
  this.callback(this.data);
  if (data.running === true) {
    if (data.version === 1 && this.data.books.length === 0) {
      setTimeout(() => this.polling(), 20);   // initial burst
    } else {
      setTimeout(() => this.polling(), 500);  // after results appear
    }
  }
}

A few things became visible that observation alone did not reveal.

The polling interval is not fixed

This logic sits at the end of receive(). Polling repeats while running is true, but the interval depends on the situation.

- Right after the search (version is 1 and the count is zero), it re-polls after 20ms
- Once results start appearing, it re-polls after 500ms

The interval is short until the first results appear, then switches to a longer one once they do. The perceived speed seems to be supported in part by this initial interval.
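The branching can be isolated into one small function. The 20ms and 500ms values come from the excerpt above; the function name `nextPollDelay` and the constant names are my own.

```javascript
const INITIAL_DELAY_MS = 20;  // before any result has arrived
const STEADY_DELAY_MS = 500;  // once results have started to appear

// Returns the delay before the next /polling call, or null when the job is done.
function nextPollDelay(data, bookCount) {
  if (data.running !== true) return null;
  return data.version === 1 && bookCount === 0 ? INITIAL_DELAY_MS : STEADY_DELAY_MS;
}

console.log(nextPollDelay({ running: true, version: 1 }, 0));   // 20
console.log(nextPollDelay({ running: true, version: 4 }, 12));  // 500
console.log(nextPollDelay({ running: false, version: 9 }, 30)); // null
```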

Error handling

If /search fails, it retries after 1000ms; if the /polling response is null (no results yet), it resends after 100ms. Because a cross-search targets many libraries and databases, some responses may fail, so retries are built into the client side.
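The retry on /search can be sketched without any network access by injecting both the request and the timer. The names `searchWithRetry`, `flakyRequest`, and `runNow` are hypothetical; only the 1000ms delay comes from the excerpt.

```javascript
// request(callback) is any function that calls back with (err, body).
// schedule defaults to setTimeout and is injectable so the logic can be tested.
function searchWithRetry(request, onData, schedule = setTimeout) {
  request((err, body) => {
    if (err) {
      // On failure, try the same request again after 1000ms.
      schedule(() => searchWithRetry(request, onData, schedule), 1000);
    } else {
      onData(body);
    }
  });
}

// Usage: a fake request that fails once and then succeeds.
let calls = 0;
const flakyRequest = (cb) => {
  calls += 1;
  if (calls === 1) cb(new Error("temporary failure"));
  else cb(null, { uuid: "job-0001", running: true });
};

// A fake scheduler that records the delay and runs the callback immediately.
const delays = [];
const runNow = (fn, ms) => {
  delays.push(ms);
  fn();
};

let received = null;
searchWithRetry(flakyRequest, (body) => { received = body; }, runNow);
console.log(calls, delays, received.uuid);
```

As in the excerpt, this sketch has no upper bound on retries. A client that must not retry forever would need an attempt counter on top of it.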

Showing estimated holdings first

The type definitions (flow/declare.js) show that a book record has the following two fields.

holdings:           Array<number>   // IDs of confirmed holding libraries
estimated_holdings: Array<number>   // IDs of estimated holding libraries

The second field, estimated_holdings, lists libraries that are estimated to hold the book. holdingsFromBook() in sort.js, which computes the number of holding libraries, combines the confirmed list (holdings) with the estimated list (estimated_holdings) to produce the count.

let _holdings = book.holdings.concat();
if (book.estimated_holdings) {
  _holdings = [...new Set(_holdings.concat(book.estimated_holdings))];
}
return countHoldings(_holdings, includes);

While waiting for slower libraries, it appears to display the library count including estimated holdings. Since the screen fills up even before collection is complete, this seems to be one reason the search feels fast.
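A self-contained example with made-up library IDs shows what the union does. The helper name `holdingLibraryIds` is mine, and the filtering done by `countHoldings` in the original is left out.

```javascript
// Union of confirmed and estimated library IDs, without duplicates.
function holdingLibraryIds(book) {
  const estimated = book.estimated_holdings || [];
  return [...new Set([...book.holdings, ...estimated])];
}

// Library 101 is confirmed; 101, 102 and 103 are estimated (IDs are made up).
const book = { holdings: [101], estimated_holdings: [101, 102, 103] };
console.log(holdingLibraryIds(book).length); // 3
```

Library 101 appears in both lists but is counted once, so the displayed count is 3 even though only one holding is confirmed so far.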

Sorting and filtering, as far as I checked, are also completed on the client side (sort.js). It includes normalizePubdate(), which normalizes Japanese era years into the Western calendar (parsing "Reiwa," "Heisei," "first year," and so on), and normalizeIsbn(), which aligns ISBNs (International Standard Book Number, a book identifier) of differing digit counts. The server streams raw data, and the browser handles the formatting.
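To illustrate what this kind of normalization involves, here is a minimal sketch. It is not the code of normalizePubdate() or normalizeIsbn(): the function names `eraToWesternYear` and `toIsbn13` are my own, the input is assumed to look like "令和元年" or "平成30年" with half-width digits, and only three eras are covered.

```javascript
// Western year just before each era starts, so that era year N maps to base + N.
const ERA_BASE = { "令和": 2018, "平成": 1988, "昭和": 1925 };

// Convert an era year such as "令和元年" to a Western year.
// Falls back to the first four-digit number, or null if none is found.
function eraToWesternYear(text) {
  const era = text.match(/(令和|平成|昭和)(元|\d+)年/);
  if (era) {
    const n = era[2] === "元" ? 1 : Number(era[2]); // 元年 means "first year"
    return ERA_BASE[era[1]] + n;
  }
  const western = text.match(/\d{4}/);
  return western ? Number(western[0]) : null;
}

// Align ISBNs to 13 digits: ISBN-10 gets the 978 prefix and a new check digit.
function toIsbn13(isbn) {
  const digits = isbn.replace(/[^0-9Xx]/g, "");
  if (digits.length === 13) return digits;
  if (digits.length !== 10) return null;
  const body = "978" + digits.slice(0, 9); // drop the old ISBN-10 check digit
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(body[i]) * (i % 2 === 0 ? 1 : 3); // weights alternate 1, 3
  }
  return body + String((10 - (sum % 10)) % 10);
}

console.log(eraToWesternYear("令和元年")); // 2019
console.log(eraToWesternYear("平成30年")); // 2018
console.log(toIsbn13("0-306-40615-2"));   // 9780306406157
```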

Refining accuracy only for opened books

view/book.jsx has a method called doDeepSearch(), which starts a second search for a single book. doUpdate(), which receives the result of that search, carries a comment that reads "high-precision experiment."

doDeepSearch() {
  if (this.props.opened && !this.api) {
    this.api = new api({ isbn: this.props.book.isbn, region: this.props.region },
                       this.doUpdate.bind(this));
  }
}

When a book's details are opened, the client runs the cross-search once more, keyed by that book's ISBN. Because the list search casts a wide net with free-text terms, the holdings of individual books can remain estimated. So, at the point where the user signals "I want to see this book," the client appears to search again with the ISBN specified and replace the estimated holdings with confirmed ones. The second search is delayed by one second with setTimeout, presumably as a grace period so that a misclick does not trigger a search immediately.

In short, rather than confirming everything at the list stage, the client displays estimates first and re-searches only the books the user opens.
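The delayed, once-only start can be sketched as follows. This is my own illustration, not the book.jsx code: the class name `DeepSearchTrigger` and the injectable `timers` object are hypothetical, and cancelling the pending search when the book is closed is an assumption about how a misclick might be handled.

```javascript
// Starts a follow-up search delayMs after a book is opened, at most once,
// and cancels the pending start if the book is closed first (assumption).
class DeepSearchTrigger {
  constructor(startSearch, delayMs = 1000, timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  }) {
    this.startSearch = startSearch;
    this.delayMs = delayMs;
    this.timers = timers;
    this.pending = null;
    this.started = false;
  }

  open() {
    if (this.started || this.pending !== null) return;
    this.pending = this.timers.set(() => {
      this.pending = null;
      this.started = true;
      this.startSearch();
    }, this.delayMs);
  }

  close() {
    if (this.pending !== null) {
      this.timers.clear(this.pending);
      this.pending = null;
    }
  }
}

// Usage with fake timers so the example runs instantly.
const queue = new Map();
let nextId = 1;
const fakeTimers = {
  set: (fn) => { const id = nextId++; queue.set(id, fn); return id; },
  clear: (id) => queue.delete(id),
};
const flush = () => {
  const fns = [...queue.values()];
  queue.clear();
  fns.forEach((fn) => fn());
};

let searches = 0;
const trigger = new DeepSearchTrigger(() => { searches += 1; }, 1000, fakeTimers);
trigger.open(); trigger.close(); flush(); // opened and closed at once: nothing starts
trigger.open(); flush();                  // stays open: one search starts
trigger.open(); flush();                  // already searched: no repeat
console.log(searches);
```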

How the work is divided

After reading the source, here is how I organized the division of work.

Layer	Responsibility
Server (Unitrad)	Collects from each library and streams what it can as diffs, even incomplete or estimated
`api.js`	Receives diffs via `search` and `polling`, merges them into a single dataset
`sort.js`	Computes the count from confirmed plus estimated holdings, normalizes era years and ISBNs, sorts
`book.jsx`	Displays with estimates first, replaces with confirmed via an ISBN re-search only for opened books

As far as I checked, the reasons the cross-search feels fast can be organized into these three points.

- It does not wait for all target libraries to finish; it receives diffs in sequence via search and polling
- estimated_holdings (estimated holdings) lets it display the library count before collection is complete
- Sorting and filtering are completed on the client side

Accuracy, on the other hand, is backed up afterward by the ISBN re-search (deep search) for opened books. I understood it as a design that splits roles between the server's responsiveness and the client's follow-up refinement.

Takeaways

This cross-library search includes several implementation details aimed at responsiveness. Here are some points that seem generally applicable.

- A heavy process can return just a job acknowledgement first, rather than making the caller wait for results
- The combination of long polling, a version cursor, and diffs is an option for near-real-time result delivery
- The polling interval need not be fixed; even just shortening the initial interval can change the perceived speed
- Showing estimated values first and replacing them with confirmed ones later gives speed up front and recovers accuracy afterward

A blog post by Calil explains that the slowness of cross-search stems not from the library systems themselves but from the design of the cross-search system. After reading the source, that explanation became easier to grasp.

Scope of this article

To repeat: this article is not a guide to a public API. It is a learning record of reading published source code (unitrad-ui) and confirming, with a small number of manual requests, the behavior of a service I use myself. What is published under the MIT License is the client-side UI code; that does not mean the backend API is free to use. I also did not probe by enumerating region IDs. Reading the published source was enough to grasp the mechanism.

References

Calil / Library API
CALIL/unitrad-ui — GitHub (MIT License)
Launching "Calil Unitrad API" — Calil's blog (Japanese)
