Overview

We performed OCR processing on Japanese vertical-writing manuscript paper using two OCR services provided by Microsoft Azure (Azure OpenAI GPT-4 Vision and Azure Document Intelligence), and conducted a detailed comparative evaluation of the results.

Test Image

  • Image Source: Canva template (400-character manuscript paper)
  • URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
  • Image Characteristics:
    • 20x20 grid, 400-character manuscript paper
    • Vertical writing layout
    • Light grid lines (cells)
    • Distinction between title and body sections

Ground Truth

原稿のタイトル

佐藤ちあき

原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。

1. Recognition Results by Azure OpenAI GPT-4.1

Recognized Text

原稿のタイトル
佐藤 ちあき
原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。
このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。

Evaluation

GPT-4.1 demonstrated the following characteristics with vertical-writing manuscript paper:

  • Correctly recognized the order of title and author name
  • Accurately recognized the beginning of the body text
  • Recognized descriptions related to manuscript paper cells
  • Perfectly understood the vertical writing reading order (right to left)
  • Maintained text continuity

Differences from Ground Truth

  • “Sato Chiaki” had a full-width space added between family name and given name
    • This is a reasonable interpretation since a space appears to exist in the image
  • All other text was a perfect match

Accuracy Rating: 99%

2. Recognition Results by Azure Document Intelligence

Visualization of Recognized Regions

Evaluation

Document Intelligence demonstrated the following characteristics:

  • Character recognition ability - Individual characters were accurately recognized (“Sato”, “Chiaki”, “manuscript”, etc.)
  • Text fragmentation - Each cell was processed as an independent element, losing continuity
  • Vertical reading order issues - Unable to properly handle the right-to-left flow of vertical writing
  • Post-processing required - Some reconstruction is possible using coordinate information
  • Detailed coordinate information - Precise position information for each character was perfectly captured

Accuracy Rating: Character recognition accuracy is approximately 80%, but with challenges in understanding vertical layout

Comparative Analysis

Performance Comparison Table

Evaluation ItemAzure OpenAI GPT-4.1Document Intelligence
Character Recognition Accuracy5/5 (99%)4/5 (80%)
Vertical Writing Support5/5 Perfect2/5 Post-processing required
Context Understanding5/5 Excellent2/5 Limited
Reading Order Comprehension5/5 Perfect2/5 Reconstruction required
Manuscript Paper Handling5/5 Optimal3/5 Possible with adjustments
Coordinate InformationN/A5/5 Detailed retrieval possible
Processing Speed3/5 ~7 sec/image5/5 ~3 sec/image
Cost2/5 Expensive4/5 Affordable

Visual Comparison

GPT-4.1 Recognition Pattern

  • Understands the entire image and interprets it as a document
  • Correctly grasps the vertical writing structure
  • Extracts only text while ignoring the grid cells

Document Intelligence Recognition Pattern

  • Processes each cell as an individual text block
  • Recognizes vertical columns as “lines” (designed for horizontal writing)
  • Reconstruction is possible by leveraging coordinate information

Conclusion

Key Findings

  1. Overwhelming Superiority of GPT-4.1

    • For Japanese vertical-writing documents, GPT-4.1 achieves near-perfect recognition
    • Correctly understands document structure, reading order, and context
  2. Characteristics of Document Intelligence

    • No direct support for vertical Japanese; post-processing required
    • High character detection accuracy but challenges in layout understanding
    • Advanced processing possible by leveraging coordinate information
    • High performance for horizontal-writing documents

Practical Recommendations

When to Choose Azure OpenAI GPT-4

  • Digitization of Japanese vertical-writing documents
  • OCR of historical documents and old manuscripts
  • Processing manuscript paper
  • When high-accuracy text extraction is required

When to Choose Document Intelligence

  • When character position identification is important
  • Processing horizontal-writing documents
  • When cost is a priority for large-volume processing
  • When processing speed is the top priority
  • When advanced customization through post-processing is possible

Technical Considerations

This experiment clearly revealed the differences between the LLM-based vision model (GPT-4) and the traditional OCR engine (Document Intelligence) approaches:

  • GPT-4: “Understands” images and performs intelligent processing considering context. Flexibly handles diverse layouts including vertical writing
  • Document Intelligence: Specializes in high-precision character detection and coordinate extraction. Capable of advanced processing when combined with programmable post-processing

Both services have distinct strengths, and it is important to choose the right one based on the use case. For special layouts like Japanese vertical writing, GPT-4 currently has the advantage, though Document Intelligence can also handle it through post-processing using coordinate information.

Future Outlook

  • Hoping for improvement in Document Intelligence’s support for Japanese vertical writing
  • Speed improvements and cost reduction for GPT-4
  • Possibility of hybrid approaches (GPT-4 for character recognition, DI for coordinate retrieval)