Overview
We performed OCR processing on Japanese vertical-writing manuscript paper using two OCR services provided by Microsoft Azure (Azure OpenAI GPT-4 Vision and Azure Document Intelligence), and conducted a detailed comparative evaluation of the results.
Test Image
- Image Source: Canva template (400-character manuscript paper)
- URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
- Image Characteristics:
- 20x20 grid, 400-character manuscript paper
- Vertical writing layout
- Light grid lines (cells)
- Distinction between title and body sections

Ground Truth
原稿のタイトル
佐藤ちあき
原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。
1. Recognition Results by Azure OpenAI GPT-4.1
Recognized Text
原稿のタイトル
佐藤 ちあき
原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。
Evaluation
GPT-4.1 demonstrated the following characteristics with vertical-writing manuscript paper:
- Correctly recognized the order of title and author name
- Accurately recognized the beginning of the body text
- Recognized descriptions related to manuscript paper cells
- Perfectly understood the vertical writing reading order (right to left)
- Maintained text continuity
Differences from Ground Truth
- “Sato Chiaki” had a full-width space added between family name and given name
- This is a reasonable interpretation since a space appears to exist in the image
- All other text was a perfect match
Accuracy Rating: 99%
2. Recognition Results by Azure Document Intelligence
Visualization of Recognized Regions

Evaluation
Document Intelligence demonstrated the following characteristics:
- Character recognition ability - Individual characters were accurately recognized (“Sato”, “Chiaki”, “manuscript”, etc.)
- Text fragmentation - Each cell was processed as an independent element, losing continuity
- Vertical reading order issues - Unable to properly handle the right-to-left flow of vertical writing
- Post-processing required - Some reconstruction is possible using coordinate information
- Detailed coordinate information - Precise position information for each character was perfectly captured
Accuracy Rating: Character recognition accuracy is approximately 80%, but with challenges in understanding vertical layout
Comparative Analysis
Performance Comparison Table
| Evaluation Item | Azure OpenAI GPT-4.1 | Document Intelligence |
|---|---|---|
| Character Recognition Accuracy | 5/5 (99%) | 4/5 (80%) |
| Vertical Writing Support | 5/5 Perfect | 2/5 Post-processing required |
| Context Understanding | 5/5 Excellent | 2/5 Limited |
| Reading Order Comprehension | 5/5 Perfect | 2/5 Reconstruction required |
| Manuscript Paper Handling | 5/5 Optimal | 3/5 Possible with adjustments |
| Coordinate Information | N/A | 5/5 Detailed retrieval possible |
| Processing Speed | 3/5 ~7 sec/image | 5/5 ~3 sec/image |
| Cost | 2/5 Expensive | 4/5 Affordable |
Visual Comparison
GPT-4.1 Recognition Pattern
- Understands the entire image and interprets it as a document
- Correctly grasps the vertical writing structure
- Extracts only text while ignoring the grid cells
Document Intelligence Recognition Pattern
- Processes each cell as an individual text block
- Recognizes vertical columns as “lines” (designed for horizontal writing)
- Reconstruction is possible by leveraging coordinate information
Conclusion
Key Findings
Overwhelming Superiority of GPT-4.1
- For Japanese vertical-writing documents, GPT-4.1 achieves near-perfect recognition
- Correctly understands document structure, reading order, and context
Characteristics of Document Intelligence
- No direct support for vertical Japanese; post-processing required
- High character detection accuracy but challenges in layout understanding
- Advanced processing possible by leveraging coordinate information
- High performance for horizontal-writing documents
Practical Recommendations
When to Choose Azure OpenAI GPT-4
- Digitization of Japanese vertical-writing documents
- OCR of historical documents and old manuscripts
- Processing manuscript paper
- When high-accuracy text extraction is required
When to Choose Document Intelligence
- When character position identification is important
- Processing horizontal-writing documents
- When cost is a priority for large-volume processing
- When processing speed is the top priority
- When advanced customization through post-processing is possible
Technical Considerations
This experiment clearly revealed the differences between the LLM-based vision model (GPT-4) and the traditional OCR engine (Document Intelligence) approaches:
- GPT-4: “Understands” images and performs intelligent processing considering context. Flexibly handles diverse layouts including vertical writing
- Document Intelligence: Specializes in high-precision character detection and coordinate extraction. Capable of advanced processing when combined with programmable post-processing
Both services have distinct strengths, and it is important to choose the right one based on the use case. For special layouts like Japanese vertical writing, GPT-4 currently has the advantage, though Document Intelligence can also handle it through post-processing using coordinate information.
Future Outlook
- Hoping for improvement in Document Intelligence’s support for Japanese vertical writing
- Speed improvements and cost reduction for GPT-4
- Possibility of hybrid approaches (GPT-4 for character recognition, DI for coordinate retrieval)