Introduction
FromThePage is a web platform for crowdsourced transcription of historical documents. It helps project owners manage the work of turning handwritten manuscripts and printed materials into text with the help of volunteers.
Adopted by libraries, museums, and archives worldwide, including the Library of Congress and the Smithsonian Institution, FromThePage has become a widely used tool for document digitization in Digital Humanities (DH).
Key Features of FromThePage
Crowdsourced Transcription
The defining feature of FromThePage is the ability to conduct large-scale transcription projects through crowdsourcing.
- Transcription Workflow: Staged workflow of transcription, review, and approval
- Rich Text Editor: Support for Markdown and Wiki markup
- Version Control: Complete edit history retention
- Volunteer Management: Contribution tracking, badge system
IIIF Support
FromThePage supports IIIF (International Image Interoperability Framework) for both importing page images and publishing transcribed text.
```
# Example IIIF manifest import
https://example.org/iiif/manifest/12345
```
- IIIF Manifest Import: Directly import images from external IIIF-compatible repositories
- Mirador Integration: Seamless integration with the Mirador IIIF image viewer
- IIIF Content Search API: Make transcribed text searchable through the IIIF Content Search API
- IIIF Manifest Export: Output manifests containing transcribed text
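To make the manifest import step more concrete, here is a minimal Python sketch of how a client could read page labels and image URLs out of an IIIF manifest. It assumes the IIIF Presentation 2.x layout (`sequences` → `canvases` → `images`); the sample manifest, its URLs, and the `list_pages` helper are hypothetical and are not taken from FromThePage itself.

```python
import json

# Hypothetical, heavily trimmed IIIF Presentation 2.x manifest. A real one
# would be fetched from a URL such as the example manifest address above.
SAMPLE_MANIFEST = json.loads("""
{
  "@id": "https://example.org/iiif/manifest/12345",
  "@type": "sc:Manifest",
  "label": "Sample letter",
  "sequences": [{
    "canvases": [
      {"@id": "https://example.org/iiif/canvas/1", "label": "Page 1",
       "images": [{"resource": {"@id": "https://example.org/iiif/img1/full/full/0/default.jpg"}}]},
      {"@id": "https://example.org/iiif/canvas/2", "label": "Page 2",
       "images": [{"resource": {"@id": "https://example.org/iiif/img2/full/full/0/default.jpg"}}]}
    ]
  }]
}
""")


def list_pages(manifest):
    """Return (label, image URL) pairs for every canvas in a v2 manifest."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            images = canvas.get("images", [])
            # A canvas without images yields None instead of raising.
            url = images[0]["resource"]["@id"] if images else None
            pages.append((canvas.get("label"), url))
    return pages


for label, url in list_pages(SAMPLE_MANIFEST):
    print(label, url)
```

Manifests that follow Presentation 3.0 use `items` instead of `sequences` and `canvases`, so this sketch would need a second code path for them.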
Structured Data Input
Beyond simple transcription, FromThePage supports structured data entry.
| Feature | Description |
|---|---|
| Field-based Transcription | Collect structured data with custom forms |
| Metadata Entry | Record document metadata by field |
| Spreadsheet Transcription | Specialized for tabular data transcription |
| Markup Transcription | Transcription with TEI-XML and other markup |
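As an illustration of what field-based transcription produces, the following Python sketch turns a few transcribed form rows into CSV text. The field names (`date`, `name`, `amount`) and the `rows_to_csv` helper are assumptions made for this example; they do not reflect FromThePage's actual form schema or export layout.

```python
import csv
import io

# Hypothetical form fields for a ledger page; not FromThePage's actual schema.
FIELDS = ["date", "name", "amount"]


def rows_to_csv(rows, fields=FIELDS):
    """Serialize field-based transcription rows to CSV text, blank-filling gaps."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Fields the volunteer could not read are written as empty cells.
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


rows = [
    {"date": "1863-07-04", "name": "J. Smith", "amount": "12.50"},
    {"date": "1863-07-05", "name": "A. Jones"},  # amount illegible, left blank
]
print(rows_to_csv(rows))
```

Keeping one column per form field is what makes this kind of transcription easy to load into a spreadsheet or database afterwards.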
OCR Integration
In addition to manual transcription, FromThePage includes OCR (Optical Character Recognition) integration.
- OCR Pre-processing: Use OCR output as initial text for transcription
- OCR Correction Workflow: Volunteers correct OCR results
- HTR Support: Integration with Handwritten Text Recognition
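The OCR correction workflow can be pictured as comparing the machine output with the volunteer's corrected text. The sketch below uses Python's standard `difflib` module to list the words that were replaced and to compute a rough similarity score. It is only an illustration of the idea; the function names and sample sentences are hypothetical, and nothing here describes how FromThePage stores or compares versions internally.

```python
import difflib


def correction_ratio(ocr_text, corrected_text):
    """Similarity between OCR output and the corrected text (1.0 = identical)."""
    return difflib.SequenceMatcher(None, ocr_text, corrected_text).ratio()


def changed_words(ocr_text, corrected_text):
    """List (ocr_words, corrected_words) pairs where words were replaced."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, fixed_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


ocr = "Tlie army marched at dawn tbis morning"
fixed = "The army marched at dawn this morning"
print(changed_words(ocr, fixed))
print(round(correction_ratio(ocr, fixed), 2))
```

A low similarity score on a page suggests the OCR output was a poor starting point, in which case transcribing from scratch may be faster than correcting.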
Adoption Examples
Library of Congress
The Library of Congress uses FromThePage for its By the People (formerly Transcribe) project, where volunteers transcribe tens of thousands of historical documents including presidential letters, Civil War records, and women’s suffrage movement documents.
Smithsonian Institution
The Smithsonian Transcription Center runs museum collection transcription projects powered by FromThePage. Natural history specimen labels, diaries, and field notes are among the materials being transcribed.
Other Institutions
- Huntington Library: Medieval manuscript transcription
- Texas State Library: Historical legal document transcription
- Various university libraries: Special collections digitization
Technical Features
Export Formats
Transcription results can be exported in various formats.
- TEI-XML: Standard markup format for humanities texts
- Plain Text: Simple text output
- HTML: For web publishing
- CSV: Tabular output for structured data
- IIIF Manifest: For display in IIIF-compatible viewers
- ALTO XML: Text with page layout information
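To show what "text with page layout information" means in practice, here is a small Python sketch that reads the text lines out of an ALTO XML document with the standard library. The sample document is a hand-written, heavily trimmed fragment using the ALTO v4 namespace; a real export may use a different ALTO version and will carry more attributes, so treat the namespace and the `alto_lines` helper as assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (ALTO v4); check the root element of a real export.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical, trimmed ALTO fragment: each String carries a word and its position.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Dear" HPOS="100" VPOS="200"/>
      <SP/>
      <String CONTENT="Sir," HPOS="180" VPOS="200"/>
    </TextLine>
    <TextLine>
      <String CONTENT="I" HPOS="100" VPOS="260"/>
      <String CONTENT="write" HPOS="130" VPOS="260"/>
      <String CONTENT="in" HPOS="220" VPOS="260"/>
      <String CONTENT="haste." HPOS="260" VPOS="260"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>
"""


def alto_lines(xml_text):
    """Return the text of each TextLine, joining its String elements with spaces."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


for text in alto_lines(SAMPLE_ALTO):
    print(text)
```

The `HPOS` and `VPOS` attributes are ignored here, but they are what lets a viewer highlight a word on the page image.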
API
```bash
# FromThePage API usage example
curl -H "Accept: application/json" \
  https://fromthepage.com/api/v1/collections
```
The RESTful API enables programmatic retrieval of transcription data.
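The same request can be sketched in Python with the standard library. The base URL and the `collections` path are copied from the curl example above. Everything else is an assumption: the helper names are hypothetical, and the source does not say whether authentication is required or what shape the JSON response takes, so the code only builds the request and leaves the network call to an optional helper.

```python
import json
import urllib.request

BASE_URL = "https://fromthepage.com/api/v1"  # taken from the curl example above


def build_request(path):
    """Build a GET request that asks for JSON, mirroring the curl call."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request("collections")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch_json(request)` would send the request. Inspect the returned structure before relying on any particular key, since the response format is not documented here.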
Pricing Plans
FromThePage offers multiple plans, including a free tier.
| Plan | Features |
|---|---|
| Free | Basic transcription features, public projects |
| Institutional | Private projects, customization, support |
| Enterprise | Large-scale projects, SLA, dedicated environment |
For small-scale projects and individual research, the basic transcription features and public projects of the free plan are usually sufficient.
Conclusion
FromThePage is a platform built for crowdsourced transcription of historical documents. With IIIF integration, structured data input, OCR integration, and a range of export formats, it covers the main needs of transcription-based DH research.
As demonstrated by its adoption at major libraries and museums worldwide, it is a reliable tool capable of managing large-scale transcription projects. With a free plan available, we recommend starting with a small-scale project to experience its capabilities.