This post documents the development of the Android version of KotenOCR, a classical Japanese text OCR app. We initially built it with Flutter, then migrated to native Kotlin. Along the way, we gained insights about framework selection when using AI coding tools (Claude Code).

Background

KotenOCR recognizes kuzushiji (classical Japanese cursive script) using ONNX Runtime models entirely on-device. The iOS version was already built in Swift, and we needed to add Android support.

We chose Flutter initially, expecting to share code between iOS and Android.

Challenges Encountered with Flutter

1. Image Processing Speed

We benchmarked converting a 6642x4990 pixel (~33 megapixel) image to an RGBA byte array:

Implementation                          Time
Dart (image package getPixel())         14,549ms
Kotlin (BitmapFactory + getPixels)      ~500ms

A roughly 30x difference in this case. The gap, however, came from calling getPixel() per pixel, which allocates a Pixel object on each invocation: 33 million object allocations for this image.
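The Kotlin side avoids this by pulling all pixels with one getPixels() call and repacking them in a single bulk pass. A minimal sketch of that repacking step (the function name is ours; the input is the ARGB_8888 int array Bitmap.getPixels() produces):

```kotlin
// Convert ARGB_8888 pixel ints (0xAARRGGBB, as filled in by
// Bitmap.getPixels) into a tightly packed RGBA byte array in
// one pass, with no per-pixel object allocation.
fun argbToRgba(pixels: IntArray): ByteArray {
    val out = ByteArray(pixels.size * 4)
    for (i in pixels.indices) {
        val p = pixels[i]
        out[i * 4]     = ((p shr 16) and 0xFF).toByte() // R
        out[i * 4 + 1] = ((p shr 8) and 0xFF).toByte()  // G
        out[i * 4 + 2] = (p and 0xFF).toByte()          // B
        out[i * 4 + 3] = ((p shr 24) and 0xFF).toByte() // A
    }
    return out
}
```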

What we learned later: The image package v4.x provides image.getBytes(order: ChannelOrder.rgba) for direct buffer access, eliminating the per-pixel loop entirely. dart:ffi was another option for zero-copy native function calls. We built a Kotlin bridge before exploring these alternatives.

2. Parallel Inference with flutter_onnxruntime

The iOS version uses Swift’s withThrowingTaskGroup for 4-way parallel recognition. We attempted the same in Flutter:

await Future.wait(batch); // 4 recognition tasks "in parallel"

Result: no speedup (0.98x).

Investigation revealed that the flutter_onnxruntime plugin’s method channel handler runs on the platform (main) thread without using makeBackgroundTaskQueue() (available since Flutter 3.0), so native calls are serialized.

Alternative approaches we did not try: an FFI-based ONNX package (bypassing method channels), Dart Isolates with BackgroundIsolateBinaryMessenger, or forking the plugin to add background task queues.
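For reference, the fork approach would register the plugin's MethodChannel with a background task queue so its handler leaves the platform thread. A sketch of the registration glue, assuming the Flutter Android embedding; the channel name and handler body are placeholders:

```kotlin
import io.flutter.embedding.engine.plugins.FlutterPlugin
import io.flutter.plugin.common.MethodChannel
import io.flutter.plugin.common.StandardMethodCodec

class OcrPlugin : FlutterPlugin {
    private var channel: MethodChannel? = null

    override fun onAttachedToEngine(binding: FlutterPlugin.FlutterPluginBinding) {
        val messenger = binding.binaryMessenger
        // Handlers registered with this queue run off the platform thread,
        // so concurrent native calls are no longer serialized on the UI.
        val taskQueue = messenger.makeBackgroundTaskQueue()
        channel = MethodChannel(
            messenger, "koten_ocr", StandardMethodCodec.INSTANCE, taskQueue
        ).also { ch ->
            ch.setMethodCallHandler { call, result ->
                // Dispatch to the native OCR pipeline here.
                result.notImplemented()
            }
        }
    }

    override fun onDetachedFromEngine(binding: FlutterPlugin.FlutterPluginBinding) {
        channel?.setMethodCallHandler(null)
    }
}
```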

3. Parallel Recognition via Kotlin Coroutines

We built a Kotlin bridge using Dispatchers.Default.limitedParallelism(4) with async/awaitAll, achieving a 5.46x speedup on the recognition step.

// One coroutine per detected box, capped at `concurrency` parallel inferences
val dispatcher = Dispatchers.Default.limitedParallelism(concurrency)
return runBlocking {
    boxes.map { box ->
        async(dispatcher) {
            // Crop + preprocess, then run a single recognition inference
            val tensor = cropAndPreprocess(rgbaPixels, imageWidth, imageHeight, box)
            recognizeSingle(ortEnv, sess, tensor)
        }
    }.awaitAll()
}

4. Large Data Transfer via Method Channels

Returning a ~12MB Float32 tensor through the method channel caused the app to freeze. While there is no hard size limit on method channels, StandardMethodCodec serialization on the platform thread blocks the UI for large payloads.

Workaround: Write the tensor to a file and return only the path. Other options included BasicMessageChannel with BinaryCodec or dart:ffi for direct memory sharing.
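A minimal sketch of the file-based workaround on the Kotlin side, assuming the tensor arrives as a FloatArray; the function name and the cacheDir parameter (context.cacheDir on Android) are illustrative:

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Write a Float32 tensor to a temp file and return its path.
// The Dart side reads the file (e.g. with File.readAsBytes and
// Float32List.view) instead of receiving ~12MB through the
// method channel on the platform thread.
fun writeTensorToFile(tensor: FloatArray, cacheDir: File): String {
    val file = File.createTempFile("tensor_", ".bin", cacheDir)
    val buffer = ByteBuffer.allocate(tensor.size * 4)
        .order(ByteOrder.LITTLE_ENDIAN) // match Dart's Float32List layout
    buffer.asFloatBuffer().put(tensor)
    file.writeBytes(buffer.array())
    return file.absolutePath
}
```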

Overall Benchmark Results

Pixel 9a (Android 16), 6642x4990 image, 15 detected regions:

Step                               Dart-only   Kotlin-optimized   Speedup
Image conversion + preprocessing   14,549ms    5,885ms            2.5x
Detection (RTMDet inference)       13,010ms    10,630ms           1.2x
Recognition (PARSeq x15)           45,675ms    10,266ms           4.4x
Total                              ~73s        ~27s               2.7x

5. Platform-Specific UI Issues

  • Gesture conflicts in cropping — Android’s edge-swipe back gesture interfered with crop handle dragging
  • Status bar overlap — SafeArea behavior required adjustment
  • Adaptive Icons — Android-specific icon format configuration needed

Architecture in Retrospect

After optimization, the Flutter version’s architecture looked like this:

  • Image decoding → Kotlin (BitmapFactory)
  • Preprocessing (resize + normalization) → Kotlin (ByteBuffer)
  • ONNX inference → Kotlin (OrtSession direct calls)
  • Parallel recognition → Kotlin (Coroutines)
  • UI → Flutter (Dart)

In hindsight, we should have explored Dart-side optimizations (direct buffer access, FFI-based ONNX packages, Isolate parallelism) more thoroughly. The Kotlin bridge was a reliable solution, but Flutter-native approaches may have been sufficient.

Observations on AI-Assisted Development

This project was developed using Claude Code. Some observations on framework selection in that context:

Flutter’s General Advantages

  • Single language (Dart) — Lower learning curve
  • Hot Reload — Fast UI iteration
  • Cross-platform — Reduced effort for iOS/Android simultaneous development

What Changes (and What Doesn’t) with AI Tools

  • Language learning cost: AI handles multiple languages equally well, reducing the “single language” advantage
  • Hot Reload: AI generates code but cannot see the screen, so Hot Reload remains useful for human visual verification
  • Cross-platform: even if AI can write two native versions, maintenance costs remain (dual bug fixes, feature parity, two CI/CD pipelines)
  • Performance: native code eliminates bridge layers in some cases, though this depends on the app’s nature

For compute-intensive apps like this one (ONNX inference, large-scale pixel processing), native development had clear benefits. For UI-centric or typical CRUD apps, Flutter’s cross-platform advantages likely hold even with AI-assisted development.

Value of Flutter for Prototyping

The Flutter version provided knowledge that directly informed the Kotlin implementation:

  1. Rapid validation of the OCR pipeline design
  2. Confirmation of UI state transitions
  3. Discovery of ONNX Runtime input/output specifics (Int64 type issues)
  4. Identification of performance bottleneck locations

Technical Notes

flutter_onnxruntime Tips

  1. Use Int64List.fromList() explicitly when passing Int64 values to OrtValue.fromList — plain List<int> becomes Int32
  2. Large binary data over method channels blocks the UI — consider BinaryCodec, file I/O, or dart:ffi
  3. Future.wait does not parallelize inference with flutter_onnxruntime’s current implementation — FFI-based packages or plugin forks may help
  4. Use inSampleSize to subsample large images before passing to native code
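For tip 4, inSampleSize should stay a power of two (BitmapFactory rounds other values down). A small helper to pick it, with illustrative names; the result would be assigned to BitmapFactory.Options.inSampleSize before decoding:

```kotlin
// Smallest power-of-two sample size such that the decoded image's
// longest side is at most maxDim. A sample size of N decodes every
// Nth pixel in each dimension, shrinking memory use by ~N^2.
fun computeInSampleSize(width: Int, height: Int, maxDim: Int): Int {
    val longest = maxOf(width, height)
    var sampleSize = 1
    while (longest / sampleSize > maxDim) {
        sampleSize *= 2
    }
    return sampleSize
}
```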

Parallel Inference with Kotlin + ONNX Runtime

// Limit intra-op threads per session to reduce contention
opts.setIntraOpNumThreads(1)

// Run 4 inference tasks in parallel (async requires a coroutine scope)
val dispatcher = Dispatchers.Default.limitedParallelism(4)
val results = runBlocking {
    boxes.map { box ->
        async(dispatcher) { recognize(box) }
    }.awaitAll()
}

DEIM Model Int64 Input

The DEIM detection model requires Int64 type for its second input (orig_target_sizes). Type mismatch causes inference errors:

// Kotlin: explicit Long type
val sizeValues = longArrayOf(inputHeight.toLong(), inputWidth.toLong())
// Dart: explicit Int64List
final sizeTensor = await OrtValue.fromList(
  Int64List.fromList([inputHeight, inputWidth]), [1, 2]);

Addendum: Re-verification After API Fix

We fact-checked the technical claims in this article and made an important discovery.

Replacing getPixel() with getBytes()

The getPixel() API creates a Pixel object per call. The same package offers getBytes(order: ChannelOrder.rgba), which returns a view of the internal buffer directly:

Implementation              Time
Flutter (getPixel loop)     73,000ms
Flutter + Kotlin bridge     27,000ms
Flutter (getBytes fix)      3,967ms

A single-line fix reduced 73 seconds to 4 seconds, faster even than the Kotlin bridge optimization (27s).

The root cause was not Dart or Flutter’s limitations, but how we used the image package API. getPixel() involved 33 million object allocations; getBytes() returns a buffer view at near-zero cost.

Final Comparison with Kotlin Native

We also implemented and benchmarked a Kotlin Native version on the same image (6642x4990, Pixel 9a):

Implementation              Time       Notes
Flutter (old: getPixel)     73,000ms   API misuse
Flutter + Kotlin bridge     27,000ms   Unnecessary optimization
Flutter (getBytes fix)      3,967ms    Single-line fix
Kotlin Native               2,523ms    Bitmap API + coroutines

Kotlin Native was 2.5s vs Flutter’s 4s — about 1.6x faster. The difference is approximately 1.5 seconds, modest compared to the 73s → 4s improvement.

Lessons Learned

  1. Accurately identifying the root cause of performance issues matters. We concluded “Dart is slow” and built a Kotlin bridge, when the real cause was API usage
  2. Explore optimizations within the same language/framework first. dart:ffi, direct buffer access, and Isolates were available options within Dart/Flutter
  3. Repeated benchmarking and verification helps avoid assumption-based decisions

Going Forward

We decided to implement the KotenOCR Android version in Kotlin + Jetpack Compose. Beyond the 1.6x speed advantage, native UI gesture handling and platform integration were additional factors.

The Flutter development was not wasted — it served as valuable prototyping for pipeline design, bottleneck identification, and learning the importance of choosing the right API.


This development was carried out using Claude Code (Anthropic). The technical claims in this article were independently verified, and initially assertive statements have been revised accordingly.