This post documents the development of the Android version of KotenOCR, a classical Japanese text OCR app. We initially built it with Flutter, then migrated to native Kotlin. Along the way, we gained insights about framework selection when using AI coding tools (Claude Code).

Background

KotenOCR recognizes kuzushiji (classical Japanese cursive script) using ONNX Runtime models entirely on-device. The iOS version was already built in Swift, and we needed to add Android support.

We chose Flutter initially, expecting to share code between iOS and Android.

Challenges Encountered with Flutter

1. Image Processing Speed

We benchmarked converting a 6642x4990 pixel (~33 megapixel) image to an RGBA byte array:

Implementation                          Time
Dart (image package getPixel())         14,549ms
Kotlin (BitmapFactory + getPixels)      ~500ms

A roughly 30x difference in this case. The gap, however, came from calling getPixel() per pixel, which allocates a Pixel object on each invocation: 33 million object allocations for this image.
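The Kotlin side avoids this by pulling all pixels with one getPixels() call and repacking them in a single bulk pass. A minimal sketch of that repacking step (the function name is ours; the input is the ARGB_8888 int array Bitmap.getPixels() produces):

```kotlin
// Convert ARGB_8888 pixel ints (0xAARRGGBB, as filled in by
// Bitmap.getPixels) into a tightly packed RGBA byte array in
// one pass, with no per-pixel object allocation.
fun argbToRgba(pixels: IntArray): ByteArray {
    val out = ByteArray(pixels.size * 4)
    for (i in pixels.indices) {
        val p = pixels[i]
        out[i * 4]     = ((p shr 16) and 0xFF).toByte() // R
        out[i * 4 + 1] = ((p shr 8) and 0xFF).toByte()  // G
        out[i * 4 + 2] = (p and 0xFF).toByte()          // B
        out[i * 4 + 3] = ((p shr 24) and 0xFF).toByte() // A
    }
    return out
}
```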

What we learned later: The image package v4.x provides image.getBytes(order: ChannelOrder.rgba) for direct buffer access, eliminating the per-pixel loop entirely. dart:ffi was another option for zero-copy native function calls. We built a Kotlin bridge before exploring these alternatives.

2. Parallel Inference with flutter_onnxruntime

The iOS version uses Swift’s withThrowingTaskGroup for 4-way parallel recognition. We attempted the same in Flutter:

await Future.wait(batch); // 4 recognition tasks "in parallel"

Result: no speedup (0.98x).

Investigation revealed that the flutter_onnxruntime plugin’s method channel handler runs on the platform (main) thread without using makeBackgroundTaskQueue() (available since Flutter 3.0), so native calls are serialized.

Alternative approaches we did not try: an FFI-based ONNX package (bypassing method channels), Dart Isolates with BackgroundIsolateBinaryMessenger, or forking the plugin to add background task queues.
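For reference, the fork approach would register the plugin's MethodChannel with a background task queue so its handler leaves the platform thread. A sketch of the registration glue, assuming the Flutter Android embedding; the channel name and handler body are placeholders:

```kotlin
import io.flutter.embedding.engine.plugins.FlutterPlugin
import io.flutter.plugin.common.MethodChannel
import io.flutter.plugin.common.StandardMethodCodec

class OcrPlugin : FlutterPlugin {
    private var channel: MethodChannel? = null

    override fun onAttachedToEngine(binding: FlutterPlugin.FlutterPluginBinding) {
        val messenger = binding.binaryMessenger
        // Handlers registered with this queue run off the platform thread,
        // so concurrent native calls are no longer serialized on the UI.
        val taskQueue = messenger.makeBackgroundTaskQueue()
        channel = MethodChannel(
            messenger, "koten_ocr", StandardMethodCodec.INSTANCE, taskQueue
        ).also { ch ->
            ch.setMethodCallHandler { call, result ->
                // Dispatch to the native OCR pipeline here.
                result.notImplemented()
            }
        }
    }

    override fun onDetachedFromEngine(binding: FlutterPlugin.FlutterPluginBinding) {
        channel?.setMethodCallHandler(null)
    }
}
```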

3. Parallel Recognition via Kotlin Coroutines

We built a Kotlin bridge using Dispatchers.Default.limitedParallelism(4) with async/awaitAll, achieving a 5.46x speedup on the recognition step.

// One coroutine per detected box, capped at `concurrency` parallel inferences
val dispatcher = Dispatchers.Default.limitedParallelism(concurrency)
return runBlocking {
    boxes.map { box ->
        async(dispatcher) {
            // Crop + preprocess, then run a single recognition inference
            val tensor = cropAndPreprocess(rgbaPixels, imageWidth, imageHeight, box)
            recognizeSingle(ortEnv, sess, tensor)
        }
    }.awaitAll()
}

4. Large Data Transfer via Method Channels

Returning a ~12MB Float32 tensor through the method channel caused the app to freeze. While there is no hard size limit on method channels, StandardMethodCodec serialization on the platform thread blocks the UI for large payloads.

Workaround: Write the tensor to a file and return only the path. Other options included BasicMessageChannel with BinaryCodec or dart:ffi for direct memory sharing.
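A minimal sketch of the file-based workaround on the Kotlin side, assuming the tensor arrives as a FloatArray; the function name and the cacheDir parameter (context.cacheDir on Android) are illustrative:

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Write a Float32 tensor to a temp file and return its path.
// The Dart side reads the file (e.g. with File.readAsBytes and
// Float32List.view) instead of receiving ~12MB through the
// method channel on the platform thread.
fun writeTensorToFile(tensor: FloatArray, cacheDir: File): String {
    val file = File.createTempFile("tensor_", ".bin", cacheDir)
    val buffer = ByteBuffer.allocate(tensor.size * 4)
        .order(ByteOrder.LITTLE_ENDIAN) // match Dart's Float32List layout
    buffer.asFloatBuffer().put(tensor)
    file.writeBytes(buffer.array())
    return file.absolutePath
}
```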

Overall Benchmark Results

Pixel 9a (Android 16), 6642x4990 image, 15 detected regions:

Step                               Dart-only   Kotlin-optimized   Speedup
Image conversion + preprocessing   14,549ms    5,885ms            2.5x
Detection (RTMDet inference)       13,010ms    10,630ms           1.2x
Recognition (PARSeq x15)           45,675ms    10,266ms           4.4x
Total                              ~73s        ~27s               2.7x

5. Platform-Specific UI Issues

  • Gesture conflicts in cropping — Android’s edge-swipe back gesture interfered with crop handle dragging
  • Status bar overlap — SafeArea behavior required adjustment
  • Adaptive Icons — Android-specific icon format configuration needed

Architecture in Retrospect

After optimization, the Flutter version’s architecture looked like this:

  • Image decoding → Kotlin (BitmapFactory)
  • Preprocessing (resize + normalization) → Kotlin (ByteBuffer)
  • ONNX inference → Kotlin (OrtSession direct calls)
  • Parallel recognition → Kotlin (Coroutines)
  • UI → Flutter (Dart)

In hindsight, we should have explored Dart-side optimizations (direct buffer access, FFI-based ONNX packages, Isolate parallelism) more thoroughly. The Kotlin bridge was a reliable solution, but Flutter-native approaches may have been sufficient.

Observations on AI-Assisted Development

This project was developed using Claude Code. Some observations on framework selection in that context:

Flutter’s General Advantages

  • Single language (Dart) — Lower learning curve
  • Hot Reload — Fast UI iteration
  • Cross-platform — Reduced effort for iOS/Android simultaneous development

What Changes (and What Doesn’t) with AI Tools

  • Language learning cost: AI handles multiple languages equally well, reducing the “single language” advantage
  • Hot Reload: AI generates code but cannot see the screen, so Hot Reload remains useful for human visual verification
  • Cross-platform: even if AI can write two native versions, maintenance costs remain (dual bug fixes, feature parity, two CI/CD pipelines)
  • Performance: native code eliminates bridge layers in some cases, though this depends on the app’s nature

For compute-intensive apps like this one (ONNX inference, large-scale pixel processing), native development had clear benefits. For UI-centric or typical CRUD apps, Flutter’s cross-platform advantages likely hold even with AI-assisted development.

Value of Flutter for Prototyping

The Flutter version provided knowledge that directly informed the Kotlin implementation:

  1. Rapid validation of the OCR pipeline design
  2. Confirmation of UI state transitions
  3. Discovery of ONNX Runtime input/output specifics (Int64 type issues)
  4. Identification of performance bottleneck locations

Technical Notes

flutter_onnxruntime Tips

  1. Use Int64List.fromList() explicitly when passing Int64 values to OrtValue.fromList — plain List<int> becomes Int32
  2. Large binary data over method channels blocks the UI — consider BinaryCodec, file I/O, or dart:ffi
  3. Future.wait does not parallelize inference with flutter_onnxruntime’s current implementation — FFI-based packages or plugin forks may help
  4. Use inSampleSize to subsample large images before passing to native code
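For tip 4, inSampleSize should stay a power of two (BitmapFactory rounds other values down). A small helper to pick it, with illustrative names; the result would be assigned to BitmapFactory.Options.inSampleSize before decoding:

```kotlin
// Smallest power-of-two sample size such that the decoded image's
// longest side is at most maxDim. A sample size of N decodes every
// Nth pixel in each dimension, shrinking memory use by ~N^2.
fun computeInSampleSize(width: Int, height: Int, maxDim: Int): Int {
    val longest = maxOf(width, height)
    var sampleSize = 1
    while (longest / sampleSize > maxDim) {
        sampleSize *= 2
    }
    return sampleSize
}
```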

Parallel Inference with Kotlin + ONNX Runtime

// Limit intra-op threads per session to reduce contention
opts.setIntraOpNumThreads(1)

// Run 4 inference tasks in parallel (async requires a coroutine scope)
val dispatcher = Dispatchers.Default.limitedParallelism(4)
val results = runBlocking {
    boxes.map { box ->
        async(dispatcher) { recognize(box) }
    }.awaitAll()
}

DEIM Model Int64 Input

The DEIM detection model requires Int64 type for its second input (orig_target_sizes). Type mismatch causes inference errors:

// Kotlin: explicit Long type
val sizeValues = longArrayOf(inputHeight.toLong(), inputWidth.toLong())
// Dart: explicit Int64List
final sizeTensor = await OrtValue.fromList(
  Int64List.fromList([inputHeight, inputWidth]), [1, 2]);

Addendum: Re-verification After API Fix

We fact-checked the technical claims in this article and made an important discovery.

Replacing getPixel() with getBytes()

The getPixel() API creates a Pixel object per call. The same package offers getBytes(order: ChannelOrder.rgba), which returns a view of the internal buffer directly:

Implementation              Time
Flutter (getPixel loop)     73,000ms
Flutter + Kotlin bridge     27,000ms
Flutter (getBytes fix)      3,967ms

A single-line fix reduced 73 seconds to 4 seconds, faster even than the Kotlin bridge optimization (27s).

The root cause was not Dart or Flutter’s limitations, but how we used the image package API. getPixel() involved 33 million object allocations; getBytes() returns a buffer view at near-zero cost.

Final Comparison with Kotlin Native

We also implemented and benchmarked a Kotlin Native version on the same image (6642x4990, Pixel 9a):

Implementation              Time       Notes
Flutter (old: getPixel)     73,000ms   API misuse
Flutter + Kotlin bridge     27,000ms   Unnecessary optimization
Flutter (getBytes fix)      3,967ms    Single-line fix
Kotlin Native               2,523ms    Bitmap API + coroutines

Kotlin Native was 2.5s vs Flutter’s 4s — about 1.6x faster. The difference is approximately 1.5 seconds, modest compared to the 73s → 4s improvement.

Lessons Learned

  1. Accurately identifying the root cause of performance issues matters. We concluded “Dart is slow” and built a Kotlin bridge, when the real cause was API usage
  2. Explore optimizations within the same language/framework first. dart:ffi, direct buffer access, and Isolates were available options within Dart/Flutter
  3. Repeated benchmarking and verification helps avoid assumption-based decisions

Going Forward

We decided to implement the KotenOCR Android version in Kotlin + Jetpack Compose. Beyond the 1.6x speed advantage, native UI gesture handling and platform integration were additional factors.

The Flutter development was not wasted — it served as valuable prototyping for pipeline design, bottleneck identification, and learning the importance of choosing the right API.


This development was carried out using Claude Code (Anthropic). The technical claims in this article were independently verified, and initially assertive statements have been revised accordingly.