TL;DR
Transkribus is an AI-based Handwritten Text Recognition (HTR) platform. Supporting over 100 languages, it can recognize not only printed text but also handwriting. Its custom model training feature allows you to optimize recognition accuracy for specific handwriting styles and scripts. It has become an essential tool for DH researchers working on historical document transcription.
What is Transkribus?
Transkribus originated as a project at the University of Innsbruck, Austria, and is currently managed by READ-COOP SCE (a European cooperative). Its development has been supported by funding from the EU’s Horizon 2020 programme and other sources.
Key features include:
- HTR (Handwritten Text Recognition): Deep learning-based handwriting recognition engine
- 100+ languages: Supports diverse writing systems including Latin, Cyrillic, Arabic, and Hebrew scripts
- Custom model training: Train recognition models on your own data for high-accuracy recognition specialized to specific documents
- Layout analysis: Automatic detection of text regions, lines, and paragraphs within pages
- Collaborative work: Supports team-based collaboration for efficient large-scale transcription projects
Key Features
Text Recognition (HTR/OCR)
Text recognition is the core functionality of Transkribus. Pre-trained general models let you start recognizing text immediately, without training anything yourself. The available public models cover various periods and languages, including medieval Latin manuscripts, early modern German Kurrent script, and English cursive handwriting.
Custom Model Training
Custom model training is one of the most powerful features. With approximately 50 pages of Ground Truth (page images paired with verified, correct transcriptions), you can train a model specialized for a specific hand or script. Trained models can also be shared with other users.
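Whether a trained model is good enough is usually judged by comparing its output against held-out Ground Truth, and the character error rate (CER) is a common metric for this. The following is a minimal sketch of that calculation, not Transkribus's own implementation; the function names `levenshtein` and `character_error_rate` and the sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference (Ground Truth)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a 24-character line.
ground_truth = "In the yeare of our Lord"
model_output = "In the yeere of our Lord"
cer = character_error_rate(ground_truth, model_output)
print(f"CER: {cer:.2%}")
```

Computed per line and averaged over a validation set, a figure like this gives a rough sense of how much manual correction a model's output will still need.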
Layout Analysis
Layout analysis automatically examines the layout of each document image and detects text regions (TextRegion), text lines (TextLine), and baselines. It handles complex layouts including multi-column text, tables, and marginal notes.
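The region, line, and baseline hierarchy described above is the same structure that PAGE XML uses. As a rough sketch of how such a file can be read, the example below walks a small hand-written PAGE XML sample with Python's standard library. The sample document, its coordinates, and the helper names (`local_name`, `parse_points`, `extract_lines`) are assumptions for illustration; a real export will contain more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PAGE XML vocabulary (not a real export).
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="100,100 1900,100 1900,200 100,200"/>
        <Baseline points="100,180 1900,185"/>
        <TextEquiv><Unicode>First line of text</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="100,220 1900,220 1900,320 100,320"/>
        <Baseline points="100,300 1900,305"/>
        <TextEquiv><Unicode>Second line of text</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> list:
    """Turn 'x1,y1 x2,y2' into [(x1, y1), (x2, y2)]."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def extract_lines(xml_text: str) -> list:
    """Return one dict per TextLine with its region id, baseline, and text."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        for line in region:
            if local_name(line.tag) != "TextLine":
                continue
            baseline, text = None, ""
            for child in line:
                name = local_name(child.tag)
                if name == "Baseline":
                    baseline = parse_points(child.get("points", ""))
                elif name == "TextEquiv":
                    for sub in child:
                        if local_name(sub.tag) == "Unicode" and sub.text:
                            text = sub.text
            lines.append({
                "region": region.get("id"),
                "line": line.get("id"),
                "baseline": baseline,
                "text": text,
            })
    return lines


page_lines = extract_lines(SAMPLE_PAGE_XML)
for entry in page_lines:
    print(entry["line"], entry["baseline"], entry["text"])
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, since the namespace URI carries the schema version and can differ between files.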
Transkribus Lite
Transkribus Lite is a browser-based interface that requires no installation. It provides basic HTR functionality and layout analysis, making it suitable for quick trials.
How to Use
Basic Workflow
1. Create an account: Register on the Transkribus website
2. Upload documents: Upload image files (JPEG, PNG, TIFF) or PDFs
3. Layout analysis: Run automatic layout analysis to detect text regions and lines
4. Select a model: Choose an appropriate HTR model (searchable from the public model list)
5. Run text recognition: Execute HTR with the selected model
6. Review and correct: Check the recognition results against the original images and fix any errors
7. Export: Export in formats such as TEI-XML, PAGE XML, ALTO XML, or plain text
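To give a sense of what a line-oriented TEI-XML export looks like, here is a minimal sketch that wraps a list of corrected lines in a small TEI document, marking each line start with `<lb/>`. This is not the Transkribus exporter and does not reproduce its output; the function name `lines_to_tei`, the header wording, and the sample lines are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def _tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def lines_to_tei(title: str, lines: list) -> str:
    """Build a minimal TEI document with one <lb/> per transcribed line."""
    # Serialize TEI as the default namespace instead of an 'ns0:' prefix.
    ET.register_namespace("", TEI_NS)

    root = ET.Element(_tei("TEI"))
    header = ET.SubElement(root, _tei("teiHeader"))
    file_desc = ET.SubElement(header, _tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, _tei("titleStmt"))
    ET.SubElement(title_stmt, _tei("title")).text = title
    publication = ET.SubElement(file_desc, _tei("publicationStmt"))
    ET.SubElement(publication, _tei("p")).text = "Unpublished working transcription"
    source = ET.SubElement(file_desc, _tei("sourceDesc"))
    ET.SubElement(source, _tei("p")).text = "Transcribed from page images"

    text = ET.SubElement(root, _tei("text"))
    body = ET.SubElement(text, _tei("body"))
    paragraph = ET.SubElement(body, _tei("p"))
    for number, line in enumerate(lines, start=1):
        # <lb/> is an empty milestone; the line's text follows it as tail text.
        lb = ET.SubElement(paragraph, _tei("lb"), {"n": str(number)})
        lb.tail = line
    return ET.tostring(root, encoding="unicode")


# Hypothetical corrected lines from a single page.
tei_xml = lines_to_tei("Sample letter", ["First line of text", "Second line of text"])
print(tei_xml)
```

Keeping line breaks as `<lb/>` milestones preserves the link between the transcription and the physical lines on the page, which matters later when the text is checked against the images again.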
Pricing
Transkribus uses a credit-based, pay-per-use model. A free tier (500 credits per month at the time of writing; allowances change, so check the current pricing page) allows small-scale use at no cost. Subscription plans are available for large-scale projects.
Practical Applications in DH Research
Historical Document Transcription
Transcribe handwritten documents from archives to build full-text searchable digital archives. Typical candidates include Edo-period historical documents and Meiji-era handwritten government records.
Large-Scale Corpus Building
By combining custom model training with automated recognition, you can transcribe thousands of pages efficiently and build corpora for text mining and linguistic analysis.
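Once pages are exported as plain text, a first corpus-level view is often just a token frequency table. The sketch below shows one way to build it with the standard library; the function names and the two sample pages are illustrative assumptions, and in practice the strings would be read from the exported files.

```python
import re
from collections import Counter
from typing import Iterable

# Simple word pattern. It suits space-delimited scripts; languages written
# without spaces (for example Japanese) need a proper morphological tokenizer.
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def build_frequency_table(pages: Iterable[str]) -> Counter:
    """Count tokens across all transcribed pages."""
    counts = Counter()
    for page in pages:
        counts.update(tokenize(page))
    return counts


# Hypothetical page transcriptions standing in for exported text files.
sample_pages = [
    "The mill and the river",
    "the river Rose",
]
frequencies = build_frequency_table(sample_pages)
for token, count in frequencies.most_common(3):
    print(token, count)
```

Because HTR output still contains recognition errors, rare tokens in a table like this are worth a second look before they feed into any analysis.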
Comparative Document Studies
Transcribe different manuscript copies of the same text to analyze textual variations in stemmatological studies. TEI-XML export facilitates the creation of critical editions.
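As a small illustration of how transcribed witnesses can be compared, the sketch below aligns two versions of a line word by word and lists where they differ. It uses Python's `difflib` as a stand-in for a dedicated collation tool; the function name `collate` and the two sample readings are assumptions for illustration.

```python
from difflib import SequenceMatcher


def collate(witness_a: str, witness_b: str) -> list:
    """Return (operation, reading_a, reading_b) for every point of variation.

    The operation is 'replace', 'delete', or 'insert', as reported by
    difflib when aligning the two witnesses word by word.
    """
    tokens_a = witness_a.split()
    tokens_b = witness_b.split()
    # autojunk=False turns off difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    variants = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            continue
        variants.append((
            tag,
            " ".join(tokens_a[a_start:a_end]),
            " ".join(tokens_b[b_start:b_end]),
        ))
    return variants


# Hypothetical readings of the same passage in two manuscript copies.
manuscript_a = "in the beginning was the word"
manuscript_b = "in the begynnyng was that word"
for operation, reading_a, reading_b in collate(manuscript_a, manuscript_b):
    print(f"{operation}: '{reading_a}' vs '{reading_b}'")
```

A word-level alignment like this is only a starting point: spelling normalization and a purpose-built collation tool are usually needed before variants are recorded in a critical apparatus.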
Citizen Science Projects
Use Transkribus’s collaborative features to run crowdsourced transcription projects with volunteers. Its quality control features help maintain transcription quality even when many contributors with different levels of experience take part.
Comparison with Other Tools
| Feature | Transkribus | Google Cloud Vision | Tesseract OCR |
|---|---|---|---|
| HTR (handwriting) | High accuracy | Basic | Not supported |
| Custom models | Yes | Not for OCR | Trainable |
| Historical documents | Specialized | General | General |
| Layout analysis | Advanced | Basic | Basic |
| Pricing | Pay-per-use | Pay-per-use | Free (OSS) |
| Output formats | TEI/ALTO/PAGE | JSON | Text/hOCR |
Conclusion
Transkribus is one of the most established platforms for historical document transcription. Its AI-based HTR engine and custom model training capabilities handle handwritten documents across various periods and languages. In DH research, transcription is the starting point for many analyses, and Transkribus provides a solid foundation for that work.