Azure OpenAI GPT-4 vs Document Intelligence: Comparative Evaluation of Japanese Vertical Text OCR

Overview

We performed OCR processing on Japanese vertical-writing manuscript paper using two OCR services provided by Microsoft Azure (Azure OpenAI GPT-4 Vision and Azure Document Intelligence), and conducted a detailed comparative evaluation of the results.

Test Image

Image Source: Canva template (400-character manuscript paper)
URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
Image Characteristics:
- 20x20 grid, 400-character manuscript paper
- Vertical writing layout
- Light grid lines (cells)
- Distinction between title and body sections

Ground Truth

原稿のタイトル

佐藤ちあき

原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。

1. Recognition Results by Azure OpenAI GPT-4.1

Recognized Text

原稿のタイトル
佐藤　ちあき
原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。

Evaluation

GPT-4.1 demonstrated the following characteristics with vertical-writing manuscript paper:

Correctly recognized the order of title and author name
Accurately recognized the beginning of the body text
Recognized descriptions related to manuscript paper cells
Perfectly understood the vertical writing reading order (right to left)
Maintained text continuity

Differences from Ground Truth

“Sato Chiaki” had a full-width space added between family name and given name
- This is a reasonable interpretation since a space appears to exist in the image
All other text was a perfect match

Accuracy Rating: 99%

2. Recognition Results by Azure Document Intelligence

Visualization of Recognized Regions

Evaluation

Document Intelligence demonstrated the following characteristics:

Character recognition ability - Individual characters were accurately recognized (“Sato”, “Chiaki”, “manuscript”, etc.)
Text fragmentation - Each cell was processed as an independent element, losing continuity
Vertical reading order issues - Unable to properly handle the right-to-left flow of vertical writing
Post-processing required - Some reconstruction is possible using coordinate information
Detailed coordinate information - Precise position information for each character was perfectly captured

Accuracy Rating: Character recognition accuracy is approximately 80%, but with challenges in understanding vertical layout

Comparative Analysis

Performance Comparison Table

Evaluation Item	Azure OpenAI GPT-4.1	Document Intelligence
Character Recognition Accuracy	5/5 (99%)	4/5 (80%)
Vertical Writing Support	5/5 Perfect	2/5 Post-processing required
Context Understanding	5/5 Excellent	2/5 Limited
Reading Order Comprehension	5/5 Perfect	2/5 Reconstruction required
Manuscript Paper Handling	5/5 Optimal	3/5 Possible with adjustments
Coordinate Information	N/A	5/5 Detailed retrieval possible
Processing Speed	3/5 ~7 sec/image	5/5 ~3 sec/image
Cost	2/5 Expensive	4/5 Affordable

Visual Comparison

GPT-4.1 Recognition Pattern

Understands the entire image and interprets it as a document
Correctly grasps the vertical writing structure
Extracts only text while ignoring the grid cells

Document Intelligence Recognition Pattern

Processes each cell as an individual text block
Recognizes vertical columns as “lines” (designed for horizontal writing)
Reconstruction is possible by leveraging coordinate information

Conclusion

Key Findings

Overwhelming Superiority of GPT-4.1
- For Japanese vertical-writing documents, GPT-4.1 achieves near-perfect recognition
- Correctly understands document structure, reading order, and context
Characteristics of Document Intelligence
- No direct support for vertical Japanese; post-processing required
- High character detection accuracy but challenges in layout understanding
- Advanced processing possible by leveraging coordinate information
- High performance for horizontal-writing documents

Practical Recommendations

When to Choose Azure OpenAI GPT-4

Digitization of Japanese vertical-writing documents
OCR of historical documents and old manuscripts
Processing manuscript paper
When high-accuracy text extraction is required

When to Choose Document Intelligence

When character position identification is important
Processing horizontal-writing documents
When cost is a priority for large-volume processing
When processing speed is the top priority
When advanced customization through post-processing is possible

Technical Considerations

This experiment clearly revealed the differences between the LLM-based vision model (GPT-4) and the traditional OCR engine (Document Intelligence) approaches:

GPT-4: “Understands” images and performs intelligent processing considering context. Flexibly handles diverse layouts including vertical writing
Document Intelligence: Specializes in high-precision character detection and coordinate extraction. Capable of advanced processing when combined with programmable post-processing

Both services have distinct strengths, and it is important to choose the right one based on the use case. For special layouts like Japanese vertical writing, GPT-4 currently has the advantage, though Document Intelligence can also handle it through post-processing using coordinate information.

Future Outlook

Hoping for improvement in Document Intelligence’s support for Japanese vertical writing
Speed improvements and cost reduction for GPT-4
Possibility of hybrid approaches (GPT-4 for character recognition, DI for coordinate retrieval)

Overview#

Test Image#

Ground Truth#

1. Recognition Results by Azure OpenAI GPT-4.1#

Recognized Text#

Evaluation#

Differences from Ground Truth#

2. Recognition Results by Azure Document Intelligence#

Visualization of Recognized Regions#

Evaluation#

Comparative Analysis#

Performance Comparison Table#

Visual Comparison#

GPT-4.1 Recognition Pattern#

Document Intelligence Recognition Pattern#

Conclusion#

Key Findings#

Practical Recommendations#

When to Choose Azure OpenAI GPT-4#

When to Choose Document Intelligence#

Technical Considerations#

Future Outlook#

Overview

Test Image

Ground Truth

1. Recognition Results by Azure OpenAI GPT-4.1

Recognized Text

Evaluation

Differences from Ground Truth

2. Recognition Results by Azure Document Intelligence

Visualization of Recognized Regions

Evaluation

Comparative Analysis

Performance Comparison Table

Visual Comparison

GPT-4.1 Recognition Pattern

Document Intelligence Recognition Pattern

Conclusion

Key Findings

Practical Recommendations

When to Choose Azure OpenAI GPT-4

When to Choose Document Intelligence

Technical Considerations

Future Outlook