VoiceFlow.

System-level voice-to-text that turns speech into polished writing in any app.

year · 2025
role · designed, built, shipped
tags · AI, Desktop, Systems
read · ~1 min
live

§01 · architecture

How it moves the data.

§02 · problem

Voice input on desktop is broken. System dictation produces transcripts the writer disowns. Per-app integrations make the universal case impossible. Power users who think faster than they type have no good option.

§03 · approach

A system-level desktop app. Hold a hotkey, speak, release — polished text appears at your cursor in any application. Two-stage AI pipeline: Whisper for raw transcription, then GPToss for intelligent cleanup that removes filler words while preserving intent.

§04 · decisions

What was chosen.
What was rejected.

d/01

chosen · Rust native key listener for global hotkey

rejected · Electron globalShortcut

Electron's globalShortcut becomes unreliable when the app loses focus. Rust intercepts at the OS layer (IOKit on macOS, Win32 API on Windows), capturing 100% of hotkey presses regardless of which app is in front.
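
Hold-to-talk needs both the press and the release of the hotkey, so whatever the native listener reports has to be reduced to "start" and "stop" actions. The following is a minimal sketch of that reduction, not the app's actual code. The `HotkeyEvent` shape, the `createHoldToTalk` helper, and the `F13` key are hypothetical; the write-up does not describe how the Rust listener passes events to the rest of the app.

```typescript
// Hypothetical event shape; the real listener's message format is an assumption.
type HotkeyEvent = { kind: "down" | "up"; key: string };
type Action = "start-recording" | "stop-recording" | null;

// Returns a handler that turns raw key events into recording actions.
function createHoldToTalk(hotkey: string): (event: HotkeyEvent) => Action {
  let held = false;
  return (event: HotkeyEvent): Action => {
    if (event.key !== hotkey) return null; // some other key
    if (event.kind === "down") {
      if (held) return null; // OS key-repeat while the key is held
      held = true;
      return "start-recording";
    }
    if (!held) return null; // stray key-up with no matching key-down
    held = false;
    return "stop-recording";
  };
}

const onKey = createHoldToTalk("F13");
console.log(onKey({ kind: "down", key: "F13" })); // logs "start-recording"
```

Ignoring repeated key-down events matters because most operating systems resend key-down while a key stays pressed, which would otherwise restart the recording.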

d/02

chosen · Two-stage pipeline (Whisper → GPToss)

rejected · Single end-to-end model

Whisper is excellent at transcription but outputs verbatim speech, fillers and all. GPToss handles contextual cleanup — it knows when 'like' is a filler vs. meaningful. Separating concerns lets each stage be tuned independently.
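
The separation of concerns can be sketched as two interchangeable async functions composed into one pipeline. This is an illustration under assumptions, not the shipped implementation: `Transcriber`, `Cleaner`, and `createPipeline` are hypothetical names, and the two stand-in stages below only mimic what the speech and language models would return.

```typescript
// Each stage is a plain async function, so either one can be swapped or tuned alone.
type Transcriber = (audio: Uint8Array) => Promise<string>;
type Cleaner = (rawTranscript: string) => Promise<string>;

function createPipeline(
  transcribe: Transcriber,
  clean: Cleaner
): (audio: Uint8Array) => Promise<string> {
  return async (audio: Uint8Array): Promise<string> => {
    const raw = await transcribe(audio);
    if (raw.trim() === "") return ""; // nothing was said; skip the cleanup call
    return clean(raw);
  };
}

// Stand-in stages for illustration only; real stages would call the two models.
const fakeTranscribe: Transcriber = async () => "um the build is failing";
const fakeClean: Cleaner = async (raw) => raw.replace(/^um\s+/, "");

const dictate = createPipeline(fakeTranscribe, fakeClean);
dictate(new Uint8Array(0)).then((text) => console.log(text));
```

Because the pipeline only depends on the two function types, replacing the transcription stage (for example with a local model) would not touch the cleanup stage.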

d/03

chosen · Clipboard-based text injection

rejected · Programmatic text-field insertion

Programmatic insertion behaves differently in every app. The clipboard approach (copy the text, then simulate Cmd+V on macOS or Ctrl+V on Windows and Linux) works in any text field that accepts paste. The original clipboard contents are saved and restored in under 50ms.
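
The save, inject, restore sequence can be sketched as follows. This is a simplified illustration that handles plain text only; the `TextClipboard` interface, `injectText`, and the `simulatePaste` callback are hypothetical names introduced here, and a real implementation would back them with the OS clipboard and a key simulator.

```typescript
// Hypothetical interface; a real app would back this with the OS clipboard.
interface TextClipboard {
  read(): string;
  write(text: string): void;
}

async function injectText(
  text: string,
  clipboard: TextClipboard,
  simulatePaste: () => Promise<void>
): Promise<void> {
  const saved = clipboard.read(); // 1. remember what the user had copied
  clipboard.write(text); // 2. stage the dictated text
  try {
    await simulatePaste(); // 3. send the paste shortcut to the focused field
  } finally {
    clipboard.write(saved); // 4. restore, even if the paste failed
  }
}
```

Putting the restore in `finally` means a failed paste never leaves the dictated text sitting in the user's clipboard. Non-text clipboard contents such as images would need their own save and restore path, which this sketch does not cover.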

§05 · tradeoffs

What this costs.

t/01
Electron adds ~150–200MB memory overhead vs. a native app. The cost of one codebase running on macOS, Windows, and Linux is paid once in RAM.
t/02
API-based Whisper adds ~500ms latency vs. a local whisper.cpp model. The latency buys consistently better accuracy on technical vocabulary and accents — non-negotiable for the writing use case.
t/03
Clipboard injection briefly overwrites the user's clipboard. Mitigated with an atomic save-inject-restore sequence that completes in under 50ms, well below human perception.

§06 · impact

What this returned.

3–4×

typing-speed improvement

<1s

end-to-end latency

platforms supported

§07 · stack

Electron · Rust · Whisper-1 · TypeScript · React · Node.js · GPToss-120b

last edited · 2026-05-11
