Overview
I had the opportunity to add a transparent (invisible but searchable) text layer to a PDF using the Google Cloud Vision API, so this is a personal note for future reference.
Below is an example of searching the resulting PDF for the word "simple".

Background
This time, we target PDFs consisting of a single page.
Procedure
Creating the Image
Create an image to be used as the OCR target.
With the default settings the rendered image was blurry, so I rendered the page at 2x zoom. In the later steps, the OCR coordinates are divided by the same zoom factor so that they line up with the page.
Install the following packages.
PyMuPDF
Pillow
tqdm
import io
import json

import fitz  # PyMuPDF
from PIL import Image
from tqdm import tqdm

# Input and output PDF files
input_pdf_path = "./input.pdf"  # single-page PDF
output_pdf_path = "./output.pdf"

# Open the input PDF and load its only page
pdf_document = fitz.open(input_pdf_path)
page = pdf_document[0]  # select the first page

# Render the page as an image to use as the OCR input
# pix = page.get_pixmap()  # default resolution (72 DPI) came out blurry
zoom = 2.0  # zoom factor to raise the resolution (2x = 144 DPI)
mat = fitz.Matrix(zoom, zoom)
pix = page.get_pixmap(matrix=mat)
img = Image.open(io.BytesIO(pix.tobytes("png")))
img.save("./image.png")
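The relationship between the zoom factor and the coordinates used later can be sketched as follows. PDF page coordinates are measured in points (72 per inch), so a zoom of 2.0 corresponds to 144 DPI, and a pixel position in the rendered image maps back to the page by dividing by the zoom. The helper names `zoom_to_dpi` and `pixel_to_point` are hypothetical and introduced only for illustration. The sketch assumes an unrotated page.

```python
# Illustrative helpers (hypothetical names, not part of PyMuPDF).

POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def zoom_to_dpi(zoom):
    """Effective DPI of an image rendered with fitz.Matrix(zoom, zoom)."""
    return POINTS_PER_INCH * zoom


def pixel_to_point(x_px, y_px, zoom):
    """Map a pixel position in the rendered image back to page coordinates.

    Assumption: the page is not rotated, so a plain division is enough.
    """
    return x_px / zoom, y_px / zoom


print(zoom_to_dpi(2.0))               # 144.0
print(pixel_to_point(141, 152, 2.0))  # (70.5, 76.0)
```

The same division by `zoom` is what the transparent-text script applies to every bounding box.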
Google Cloud Vision API
Run the Google Cloud Vision API on the output image. The response (truncated here) looks like this:

{
  "textAnnotations": [
    {
      "boundingPoly": {
        "vertices": [
          { "x": 141, "y": 152 },
          { "x": 1082, "y": 152 },
          { "x": 1082, "y": 1410 },
          { "x": 141, "y": 1410 }
        ]
      },
      "description": "Sample PDF...",
      "locale": "la"
    },
    {
      "boundingPoly": {
        "vertices": [
          { "x": 141, "y": 159 },
          { "x": 363, "y": 156 },
          { "x": 364, "y": 216 },
          { "x": 142, "y": 219 }
        ]
      },
      "description": "Sample"
    },
    {
      "boundingPoly": {
        "vertices": [
          { "x": 382, "y": 156 },
          { "x": 506, "y": 154 },
          { "x": 507, "y": 213 },
          { "x": 383, "y": 215 }
        ]
      },
      "description": "PDF"
    },
    ...
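This note does not record how the API was called, so the following is only a minimal sketch of one possible route: building the request body for the REST method `images:annotate` (endpoint `https://vision.googleapis.com/v1/images:annotate`) with the `TEXT_DETECTION` feature. The function names are hypothetical, and authentication and the actual HTTP POST are left out. The REST response wraps each result in a `responses` list, so I am assuming the JSON above corresponds to one element of that list.

```python
import base64
import json


def build_annotate_request(image_bytes):
    """Build a request body for the Vision REST method images:annotate."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_first_response(api_result):
    """Return the first per-image result, which holds textAnnotations."""
    return api_result["responses"][0]


# Placeholder bytes stand in for the contents of ./image.png
body = build_annotate_request(b"placeholder image bytes")
print(json.dumps(body, indent=2))
```

In practice the bytes would come from reading ./image.png in binary mode, and the dictionary returned by `extract_first_response` is what would be written to the JSON file.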
Save the JSON response to a file such as ./google_ocr.json.
Then load the OCR results as follows.
json_path = "./google_ocr.json"

# Load the OCR text data from the JSON file
with open(json_path, "r", encoding="utf-8") as f:
    response = json.load(f)
texts = response["textAnnotations"]
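Each annotation carries four vertices that may be slightly skewed, so an axis-aligned box has to be derived from them before it can be placed on the page. A minimal sketch follows, with `bbox_from_vertices` as a hypothetical helper name. It defaults a missing `x` or `y` to 0, on the assumption that the JSON can omit a coordinate whose value is 0.

```python
def bbox_from_vertices(vertices, zoom=1.0):
    """Return (x_min, y_min, x_max, y_max), scaled from pixels by 1/zoom."""
    # Assumption: a coordinate missing from a vertex means 0
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs) / zoom, min(ys) / zoom, max(xs) / zoom, max(ys) / zoom)


# Vertices of the word "Sample" from the response shown earlier
sample = [
    {"x": 141, "y": 159},
    {"x": 363, "y": 156},
    {"x": 364, "y": 216},
    {"x": 142, "y": 219},
]
print(bbox_from_vertices(sample, zoom=2.0))  # (70.5, 78.0, 182.0, 109.5)
```

The resulting tuple can be passed directly to `fitz.Rect`.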
Creating the Transparent Text
Apply the results to the PDF using the following script. The key point is that the font size has to be reduced step by step until the text fits inside its bounding box.
# Get the page size (not used below, but handy for sanity checks)
rect = page.rect

# Add the OCR text as transparent text
if texts:
    for text in tqdm(texts[1:]):  # texts[0] is the full-page text, so skip it
        vertices = text["boundingPoly"]["vertices"]
        # A coordinate equal to 0 may be missing from the JSON, so default to 0
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        # Convert image pixels back to page coordinates by dividing by zoom
        x_min, x_max = min(xs) / zoom, max(xs) / zoom
        y_min, y_max = min(ys) / zoom, max(ys) / zoom

        # Define the bounding box
        bbox_rect = fitz.Rect(x_min, y_min, x_max, y_max)
        content = text["description"]

        # Initial font size
        fontsize = 10
        fits = False

        # Shrink the font size until the text fits in the box
        while fontsize > 0:
            # A negative return value means the text did not fit and nothing was written
            res = page.insert_textbox(bbox_rect, content, fontsize=fontsize, color=(0, 0, 0, 0), render_mode=3, align=1)
            if res >= 0:
                fits = True
                break
            fontsize -= 1  # reduce the font size

        if not fits:
            print(f"'{content}' could not fit in the rectangle.")

# Save the modified PDF
pdf_document.save(output_pdf_path)
pdf_document.close()
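The shrink-until-it-fits loop can also be understood as simple arithmetic, sketched below. This is a rough approximation and not how `insert_textbox` decides internally. It assumes a single line of text whose width scales linearly with the font size. The function name, the `width_at_1pt` input, and the `line_height_factor` of 1.2 are all assumptions made for illustration.

```python
def largest_fitting_fontsize(box_width, box_height, width_at_1pt,
                             line_height_factor=1.2, max_size=10):
    """Largest integer font size (<= max_size) whose single line fits the box.

    width_at_1pt: text width at font size 1 (assumed to scale linearly).
    line_height_factor: assumed line height as a multiple of the font size.
    Returns 0 when even size 1 does not fit.
    """
    for size in range(max_size, 0, -1):
        fits_width = width_at_1pt * size <= box_width
        fits_height = line_height_factor * size <= box_height
        if fits_width and fits_height:
            return size
    return 0


# The "Sample" word box from the response, divided by zoom = 2, is
# 111.5 x 31.5 pt. The width of 3.2 pt at size 1 is a made-up value.
print(largest_fitting_fontsize(111.5, 31.5, 3.2))  # 10
```

For the built-in fonts, PyMuPDF's `fitz.get_text_length` could supply a measured value for `width_at_1pt`. The script above avoids the estimate entirely by relying on the return value of `insert_textbox`.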
Results
The input was the sample PDF available at the following link.
https://pdfobject.com/pdf/sample.pdf
As a result, I was able to create a transparent text PDF as shown below.

Summary
I hope this article serves as a useful reference when OCR is needed for specific pages only.