Voice typing for Claude Code on Mac

Halopen captures your prompt verbatim, with every constraint, every modifier, and every file path, at the cursor in milliseconds.

Jesse Meria · 17 min read

THE WORKFLOW

What I do every morning

Halopen is a native macOS dictation app that lands voice-typed prompts at the cursor in Claude Code — running in iTerm2, Apple Terminal, Warp, Ghostty, Alacritty, or kitty — through the macOS Accessibility API. Hold the function key, speak the migration spec or refactor prompt, release; the verbatim text appears at Claude Code’s prompt input. The wedge: spoken prompts to AI agents are denser than typed ones because the cost of every additional constraint clause drops to zero.

I’ve been building Halopen — a native Mac dictation tool — and using Halopen to write the prompts that ship Halopen. Voice into Claude Code, recursive but it works. Three weeks in, the thing I didn’t see coming is this: spoken prompts to an AI agent are better than typed ones, not just faster. The cost of full articulation drops, so you articulate fully. And the agent reads what you actually meant.

The morning routine is short. I open iTerm2, run claude in the project directory, and read CONTEXT_STATE.md so I know where I left off the night before. Coffee in one hand, fn key under the other. The first prompt is usually a migration, a refactor, or a feature spec — something with enough constraint shape that typing it would take five minutes and dictating it takes forty seconds. If you haven’t met the app yet, here’s what Halopen is, in two minutes.

Yesterday morning the first prompt was a Supabase migration. I had a meria schema that needed a customers table — id, email, name, timestamps, row-level security, and an updated_at trigger. I held the fn key and started talking. Halfway through the word meria the live partial showed Maria — phonetically reasonable, semantically wrong. So I spelled it out mid-sentence. Here’s what landed in the Claude Code prompt input:

The mid-prompt spelling moment is the one I want to dwell on. A moment after I started saying meria, I glanced at the live partial and saw Maria. I said “M-E-R-I-A” out loud, kept going, and the corrected text replaced the misread before anything reached Claude. That self-correction loop is the feature you don’t appreciate until you’re three weeks deep: once you trust it, you stop slowing down for proper nouns and project-specific identifiers. You just spell it out and keep moving. The page on Halopen for Claude Code goes deeper on the cursor-context biasing that catches camelCase function names too. Halopen works in any terminal where Claude Code runs (iTerm2, Apple Terminal, Warp, Ghostty, Alacritty, kitty) because it works everywhere a cursor goes on a Mac.
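The spell-it-out correction can be pictured with a small sketch. This is not Halopen's implementation; it assumes the spelled run arrives in the transcript as hyphen-joined letters, that it corrects the word immediately before it, and that the corrected word is lowercased. The function name is hypothetical.

```python
import re

# A spelled-out run such as "M-E-R-I-A": single letters joined by hyphens.
SPELLED_RUN = re.compile(r"[A-Za-z](?:-[A-Za-z])+")


def apply_spelled_corrections(transcript: str) -> str:
    """Replace the word before each spelled-out run with the spelled word.

    Assumptions (not taken from the app): the run corrects the word
    immediately before it, and the result is lowercased.
    """
    out: list[str] = []
    for word in transcript.split():
        core = word.rstrip(".,;:!?")
        trailing = word[len(core):]
        if out and SPELLED_RUN.fullmatch(core):
            # Join the letters and overwrite the misread word.
            out[-1] = core.replace("-", "").lower() + trailing
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(apply_spelled_corrections(
        "Create a customers table in the Maria M-E-R-I-A schema"
    ))
```

A real engine would resolve the correction at recognition time rather than by post-processing text, so treat this as a picture of the behavior, not of the mechanism.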

The migration above ran in about forty seconds end to end: twelve seconds of speaking, a few hundred milliseconds for transcription, one glance to verify, return. The first prompt sets the tempo for the morning. The thing I didn’t see coming — the deeper insight that took three weeks to notice — is in the next section.

THE THESIS

Why spoken prompts beat typed prompts

The second-order effect is the part I didn’t see coming. I expected voice to be faster: speech runs at roughly 150 words per minute and sustained typing for most engineers tops out at 60-80, so on the clock a voice prompt is about twice as fast, and more than that for slower typists. That part I’d guessed. What I hadn’t guessed is what voice does to the shape of the prompt.

There’s a hidden cost in typing that nobody quotes. Call it the articulation cost: the friction between knowing the full constraint and writing the full constraint. Engineers know what they want — which table, which index, which test fixture, which architectural choice. But typing the long version takes four to six times longer than typing the abbreviated version. So they type the abbreviated version. The agent fills in the gaps, and the gaps don’t always match what the engineer would have written if writing were free. Half the back-and-forth in an AI-coding session is the engineer paying down articulation debt — re-prompting with the constraint they didn’t bother to type the first time.

Voice removes that debt at the source. The articulation cost drops enough that you stop weighing whether the third clause is worth typing. You just say the third clause. The prompt that reaches the agent is the prompt the engineer would have written if typing were free. That’s a different prompt than the one they would have typed.

Here’s what I mean. Yesterday afternoon I was investigating a slow Postgres query in our edge-function logs — a 1.2-second p95 on a single endpoint that should have been under 50ms. Cursor was in the Claude Code prompt input, my notes were open in another window, and I held the fn key:

That prompt would have taken me two and a half minutes to type. I’d have typed about a third of it — something like “add an index on orders(customer_id, created_at) for the orders-by-customer query, plus a perf test” — and Claude would have written a single-column index, or a two-column index with the wrong sort direction, or a perf test that didn’t actually pin the planner to the new index. Then I’d have paid down the articulation debt over three follow-up prompts.

Spoken, the whole instruction took thirty seconds. Claude got it right on the first pass. The prompt I dictated wasn’t a faster version of the prompt I would have typed — it was a better prompt. Higher-resolution. More specific. Constrained on both sides.

Looking back at my prompt history three weeks in, the pattern was clear: same kind of refactors and migrations as the year before, but the prompts were noticeably longer on average and Claude was getting them right on the first try far more often than before. The variable that changed wasn’t my skill. It was the input modality. Voice removed the cost of full articulation; I started articulating fully; the agent started reading what I actually meant.

The effect compounds with constraint count. Every typed clause costs real time and attention, so clauses get dropped as the prompt grows. A spoken clause costs a few seconds, so it stays in.

THE WEDGE

The verbatim wedge applied to AI coding

The thesis above only holds if the dictation tool actually preserves what the speaker said. Halopen does — verbatim is the contract, every word, every phrasing, every constraint clause. The voice in your prompt arrives at the cursor as you spoke it. The articulation-cost gain compounds, the agent reads what you actually meant, and the workflow holds under technical load.

The canonical Halopen demo is a kitchen-prep rant — contraction, bilingual code-switch, intensifier, all preserved. The engine that catches a kitchen rant verbatim catches your prompt verbatim. Same contract, different register: calm engineer instead of charged-up cook, and the engine doesn’t care which.

In an AI-coding context, “what you said” means the technical specifics. Variable names land as the names you used, not approximations. File paths land as the paths you spelled, not nearest-match guesses. CamelCase function names, snake_case identifiers, kebab-case CLI flags — they arrive as the technical idioms you spoke, not as English noun phrases that the agent then has to back-translate. When you say “use the existing apiClient utility, not raw fetch”, that exact constraint lands in the prompt — including the negative half. When you say “refactor the verify-token logic into a separate Limiter struct”, the proper-noun struct name arrives intact, capitalization and all. When you say “don’t change the query itself, only add the index and the test”, the guardrail clause lands in full.

The reason this matters more for AI coding than for any other dictation use case is that the agent is going to act on the text. A typed Slack message lives and dies as text — if a word lands slightly off, the human reader corrects it in their head. An agent prompt becomes code, migrations, deletions, deploys. A misread proper noun becomes a wrong file edit. A softened constraint becomes a missing guardrail. A paraphrased preference becomes a broken architectural choice. The verbatim contract is the difference between an agent that builds what you described and an agent that builds something adjacent to what you described and you find out at code review.

Verbatim is the default because verbatim is the contract that lets an agent act on what you said. The longer case for that choice lives in the verbatim manifesto, and the broader engineering case across every Mac surface lives on the best Mac dictation app.

UNDER THE HOOD

How Halopen + Claude Code works under the hood

Halopen holds the verbatim contract under AI-coding load because every layer runs on the first-party APIs Apple ships for exactly this kind of work. The pipeline is native Swift end to end, which is why the agent process gets the full machine when it needs it.

Audio capture is AVAudioEngine with a microphone tap. The moment you press the function key — captured by a CGEventTap at session level — Halopen opens the audio stream and starts buffering. The recording pill appears in the corner. The microphone is hot for exactly the seconds you hold the key, not a millisecond longer. Hold to talk, release to stop. That’s the whole behavioral surface.
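The hold-to-talk contract described above reduces to a very small state machine. The sketch below is a toy model in Python, not the app's Swift code: the class and method names are hypothetical, and the audio frames stand in for whatever an AVAudioEngine tap would deliver.

```python
from dataclasses import dataclass, field


@dataclass
class HoldToTalkSession:
    """Toy model: audio frames are kept only while the key is held."""

    recording: bool = False
    frames: list[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Pressing the key starts a fresh buffer.
        self.frames = []
        self.recording = True

    def on_audio_frame(self, frame: bytes) -> None:
        # Frames that arrive while the key is up are dropped.
        if self.recording:
            self.frames.append(frame)

    def key_up(self) -> bytes:
        # Releasing the key stops capture and hands back the audio.
        self.recording = False
        return b"".join(self.frames)


if __name__ == "__main__":
    session = HoldToTalkSession()
    session.on_audio_frame(b"ignored")  # key not held yet
    session.key_down()
    session.on_audio_frame(b"ab")
    session.on_audio_frame(b"cd")
    print(session.key_up())
```

The point of the shape is that there is no third state: nothing is buffered unless the key is down.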

Transcription runs as two engines in parallel, not in series. Apple’s SFSpeechRecognizer produces a live partial inside the recording pill while you’re still speaking — that’s the preview you glance at to catch a misread of meria or a camelCase function name before the prompt reaches the agent. At the same time, the audio streams to our edge function, which calls gpt-4o-transcribe for the higher-fidelity final pass. Whichever engine returns last gets stitched into the final transcript; if the cloud call fails, the live partial is what lands. Silent fallback. The user sees a slightly less polished transcript on the rare bad-network frame, never an error.
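A minimal sketch of the silent-fallback idea, under stated assumptions: it reads the paragraph above as "prefer the cloud pass when it succeeds, otherwise land the on-device partial", and it models both engines as plain async callables. The function names, the timeout, and the stand-in engines are all hypothetical; the real pipeline is Swift and is not shown in this post.

```python
import asyncio
from typing import Awaitable, Callable

Engine = Callable[[], Awaitable[str]]


async def final_transcript(
    live_partial: Engine,
    cloud_final: Engine,
    cloud_timeout_s: float = 5.0,  # assumed value, not from the app
) -> str:
    """Run both engines concurrently; fall back to the partial on any cloud failure."""
    partial_task = asyncio.create_task(live_partial())
    try:
        return await asyncio.wait_for(cloud_final(), timeout=cloud_timeout_s)
    except Exception:
        # Network error or timeout: no error surfaces, the partial lands instead.
        return await partial_task
    finally:
        if not partial_task.done():
            partial_task.cancel()


# Stand-in engines for illustration only.
async def demo_partial() -> str:
    return "add an index on orders"


async def demo_cloud_ok() -> str:
    return "Add an index on orders."


async def demo_cloud_down() -> str:
    raise ConnectionError("offline")


if __name__ == "__main__":
    print(asyncio.run(final_transcript(demo_partial, demo_cloud_ok)))
    print(asyncio.run(final_transcript(demo_partial, demo_cloud_down)))
```

Starting the partial before awaiting the cloud call is what makes the fallback free: by the time the cloud path fails, the local text is usually already there.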

The vocabulary biasing is the part that earns the post’s title. Before each transcription call, Halopen reads the text adjacent to your cursor, your active app, and your personal vocabulary dictionary, then feeds that bias directly into the speech engine — so the right token shapes win at recognition time. In Claude Code with snake_case identifiers in the buffer above, the engine prefers those token shapes over English noun-phrase alternatives. In a Markdown editor, it biases toward prose. The same speech becomes the right text for the surface it lands on.
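One way to picture the first half of that step, gathering bias terms from the text near the cursor, is the sketch below. It is an illustration under assumptions, not Halopen's code: the function names, the "technical token" heuristic, and the limit of 50 terms are all made up for the example. How the terms are then handed to a speech engine is out of scope here; Apple's speech request type does expose a `contextualStrings` property for this kind of hint, but whether Halopen uses it is not stated in this post.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def looks_technical(token: str) -> bool:
    """True for snake_case tokens or tokens with an inner capital (camelCase)."""
    has_underscore = "_" in token.strip("_")
    has_inner_capital = any(c.isupper() for c in token[1:]) and not token.isupper()
    return has_underscore or has_inner_capital


def collect_bias_terms(
    buffer_text: str, personal_vocab: list[str], limit: int = 50
) -> list[str]:
    """Personal vocabulary first, then identifiers seen near the cursor, deduplicated."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for term in personal_vocab:
        seen.setdefault(term, None)
    for token in IDENTIFIER.findall(buffer_text):
        if looks_technical(token):
            seen.setdefault(token, None)
    return list(seen)[:limit]


if __name__ == "__main__":
    nearby = "select created_at, customer_id from orders; const rows = useUserData(id)"
    print(collect_bias_terms(nearby, ["meria"]))
```

Plain English words in the buffer are deliberately left out, since the recognizer already favors them; the list only needs the tokens it would otherwise get wrong.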

Text injection uses Apple’s accessibility APIs for native text fields and a single inserted run for terminals — which keeps shell autocomplete and tab-completion behaving exactly the way they did before you held the key.

Privacy is enforced by the architecture, not by a settings toggle: audio leaves your Mac only while you hold the key, only to our transcription edge function, only for the seconds you’re holding. Audio is not retained after transcription. There’s no screen-capture entitlement, no ambient listening loop, no background process eavesdropping for a trigger phrase. Every cloud call appears in a local audit log you can read yourself. The app idles in tens of megabytes of RAM and sleeps until you press the key.

The architecture is what makes the same hotkey work in any agent surface — terminal, editor, chat — without per-tool plumbing.

BEYOND CLAUDE CODE

Beyond Claude Code — Cursor, Aider, Continue, ChatGPT

The same workflow carries cleanly across every AI-coding surface I touch on the Mac. Hold the function key in any text input; the prompt lands verbatim; the agent acts. Halopen is system-wide via the macOS Accessibility API, not a per-tool plugin — there’s nothing to install on the agent side, nothing to configure per editor, no version-compatibility matrix to maintain. Claude Code is where I spend most of my mornings, but the muscle moves with me to whichever surface the day’s work asks for.

Cursor

Cursor’s Cmd+K modal, chat panel, and Composer are all standard Mac text inputs — Halopen lands the prompt the same way it lands one anywhere else. A recent one, dictated into Cmd+K on a SwiftUI view:

Add accessibilityLabel and accessibilityHint to every Button in this view.
The label is the visible text of the button. The hint describes what
happens after the tap, not the gesture. Don't change any of the layout.

Three sentences, twenty seconds spoken, an inline edit Cursor applied cleanly on the first try. The full per-tool walkthrough — including the Composer multi-file workflow — lives on Halopen + Cursor.

Aider

Aider is the AI pair programmer that runs in your terminal — aider opens a session bound to your repo, you describe an edit, it proposes a diff, you accept and it commits. Pure CLI, no UI to fight. The Aider prompt is a standard terminal text field, so Halopen works in iTerm2, Apple Terminal, Warp, Ghostty, Alacritty, and kitty without per-app setup. Where Aider asks you to be explicit about which files are in chat scope, voice makes the slash-command idiom comfortable to dictate:

/add src/cli/commands/init.py and src/cli/commands/build.py

Then a multi-file refactor instruction in the same hold: “add a --quiet flag to both commands that suppresses all stdout except errors, and route every existing print call through a shared logger helper in src/cli/utils/log.py.” The slash, the file paths, the flag name, and the helper path all land as the literal tokens Aider expects. Per-tool walkthrough on Halopen + Aider.
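For a sense of what that dictated instruction asks for, here is a hedged sketch of the shape such a change could take. It is not Aider's output and not code from any real repository: the helper functions, the `run_build` command, and the choice to send errors to stderr are assumptions made for illustration, and everything is collapsed into one block rather than split across src/cli/utils/log.py and the two command files.

```python
import argparse
import sys

# --- what a shared helper like src/cli/utils/log.py might contain ---
_quiet = False


def set_quiet(quiet: bool) -> None:
    """Turn informational output on or off for the whole process."""
    global _quiet
    _quiet = quiet


def info(message: str) -> None:
    """Replacement for bare print calls; silenced by --quiet."""
    if not _quiet:
        print(message)


def error(message: str) -> None:
    """Errors are always shown (sent to stderr in this sketch)."""
    print(message, file=sys.stderr)


# --- what a command such as build.py might look like afterwards ---
def build_parser(prog: str) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "--quiet", action="store_true", help="suppress all output except errors"
    )
    return parser


def run_build(argv: list[str]) -> int:
    args = build_parser("build").parse_args(argv)
    set_quiet(args.quiet)
    info("building...")  # previously a bare print call
    return 0


if __name__ == "__main__":
    raise SystemExit(run_build(sys.argv[1:]))
```

Every clause of the spoken prompt maps to something concrete here: the flag name, the shared helper, and the rule that errors still get through.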

Continue

Continue lives inside the editor I already keep open — a VS Code extension and a JetBrains plugin with a chat panel (Cmd+L) and an inline edit surface (Cmd+I). The cursor lands in the panel; Halopen drops the prompt there. Where Cursor wins for AI-first editing and Aider wins for CLI-driven refactors, Continue wins for the moments I want the AI close to the file I’m already in without changing tools. A recent inline-edit prompt on a long Express handler:

Extract everything inside the try block of this handler into a separate
async function called processOrderRequest. It takes the request body
as input and returns the response payload or throws a typed error.

The function name, the typed-error contract, the input/output shape — all preserved. Continue runs the edit, I review the diff, accept. The same hold-to-talk pattern works in JetBrains. Per-tool walkthrough on Halopen + Continue.

ChatGPT

ChatGPT is the one surface in this section that isn’t an agentic coder — it’s a chat partner. Same dictation muscle, different agent shape. I use it when I want a thinking-out-loud session before I touch code: a SQL migration to draft, a tradeoff to talk through, an architecture sketch to pressure-test. The Mac app is Apple-native; chat.openai.com runs in any browser. Halopen lands the prompt in either. A recent one, drafting a migration:

Write a Postgres migration that adds a soft-delete pattern to the orders
table. Add deleted_at as a nullable timestamptz, a partial index on
(customer_id) where deleted_at is null, and a check constraint preventing
deleted_at from being set in the future. Idempotent — wrap in a single
transaction. PostgreSQL 15 syntax.

Long, well-formed, every constraint stated. Typing that prompt would have taken two minutes; speaking it took thirty seconds and the answer came back complete on the first turn. Per-tool walkthrough on Halopen + ChatGPT.

The four tools above are the surfaces I actually use; the same pattern works in VS Code, Xcode, and JetBrains directly when I’m writing code by hand. The persona overview at Halopen for developers maps the rest of the surface area — terminals, editors, chat clients, browser inputs — and is the right next read if you want the full ecosystem view.

THE BIGGER SHIFT

What this changes for developers

Two shifts are stacked on top of each other right now, and they compound. The first is that AI agents have moved the hard part of programming upstream — from typing the code to specifying the change. The second is that voice has become an accurate-enough input modality that specifying changes by speaking them is no longer an accessibility workaround or a power-user novelty. It’s the right shape for the work, and the two shifts together rewrite what a productive AI-coding morning looks like.

The implications are quiet but durable. Mornings stop being about typing speed and start being about how clearly you can think through a change before you ask for it. Long, multi-clause prompts stop feeling expensive, so you write them — which means agents start producing the result you actually wanted instead of an adjacent one you have to correct over three follow-ups. The keyboard stops being a constraint on prompt quality. The bottleneck moves from input speed to thought clarity, which is a much better place for the bottleneck to live; thought clarity is a skill, and skills compound.

There’s a quieter shift underneath that one. When the cost of articulation drops, the negative constraints — “don’t change the route signatures”, “don’t touch the get-order endpoint”, “keep the public API unchanged” — stop being the first thing the engineer leaves out. Those are the constraints that protect the codebase from the agent’s well-meaning over-edits. Voice puts them back in.

I think two years from now, voice-as-prompt-input is going to be a default in every serious AI-coding workflow on the Mac, the way syntax highlighting is a default in every editor. Not because anyone advertised it; because the alternative — typing every multi-clause prompt — feels slow once you’ve stopped doing it. The transition is already happening in pockets. The tools that survive it are the ones that hold the verbatim contract under technical load. Halopen is built for the version of this future where you trust the dictation enough to stop reading every transcription before you press send, because the transcription is what you said.

QUESTIONS

Questions worth answering

Does Halopen work with Claude Code?

Yes. Claude Code runs in any Mac terminal and Halopen lands text in any terminal via the macOS Accessibility API. Nothing to install on the Claude Code side. The full per-tool walkthrough lives on Halopen for Claude Code.

Will Halopen handle code symbols, file paths, and CLI flags correctly?

Yes. Halopen biases the transcription engine with the text adjacent to your cursor and your active app context, which prefers technical tokens in a terminal or editor. For unusual identifiers, the live preview catches the misread and you spell out the correction.

How accurate is voice typing for technical prompts?

Cursor-context biasing reads the text around your cursor and your active app before each call, so architectural language, library names, and project-specific identifiers usually land on the first pass. Where they don't, the live preview shows the misread and you spell out the correction in the same hold — the corrected token replaces the misread before anything reaches the agent.

Is voice typing actually faster than typing for AI-coding prompts?

Yes, and the absolute gap widens with prompt length. Speech runs ~150 wpm; sustained typing tops out at 60-80. A four-clause prompt that takes thirty seconds to dictate would take roughly a minute or more to type, and the spoken version usually carries more constraint detail.
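The arithmetic behind those rates is easy to check. The sketch below takes the quoted figures at face value (150 wpm spoken, 60-80 wpm typed) and counts only raw entry time; the 100-word prompt is an arbitrary example, and thinking time, corrections, and transcription latency are ignored.

```python
def seconds_to_enter(words: int, words_per_minute: float) -> float:
    """Raw entry time in seconds for a prompt of the given length."""
    return words * 60 / words_per_minute


def compare(
    words: int,
    speech_wpm: float = 150,
    typing_wpm_range: tuple[float, float] = (60, 80),
) -> tuple[float, float, float]:
    """Return (spoken, typed at the fast rate, typed at the slow rate) in seconds."""
    slow, fast = typing_wpm_range
    return (
        seconds_to_enter(words, speech_wpm),
        seconds_to_enter(words, fast),
        seconds_to_enter(words, slow),
    )


if __name__ == "__main__":
    spoken, typed_fast, typed_slow = compare(100)
    print(f"spoken: {spoken:.0f}s, typed: {typed_fast:.0f}-{typed_slow:.0f}s")
```

The ratio between the two stays fixed as the prompt grows; what grows is the number of seconds saved per prompt.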

Why are spoken prompts better than typed ones, not just faster?

Voice removes the cost of full articulation. Typed prompts get shortened because the long version is laborious; the agent fills gaps. Spoken, the third clause and the negative constraint cost nothing. The prompt that reaches the agent is the one you'd have written if writing were free.

Does Halopen work in iTerm2, Apple Terminal, Warp, Ghostty, and Alacritty?

Yes — and kitty too. Halopen delivers text into terminals via CGEvent keystrokes, which every macOS terminal accepts as standard input. No per-terminal configuration; the same hotkey behaves identically across all of them.

Does Halopen work in Cursor, Aider, and Continue alongside Claude Code?

Yes. The same hold-to-talk hotkey lands prompts in Cursor's Cmd+K and Composer, in Aider's terminal session, and in Continue's chat and inline-edit panels. System-wide; nothing installed on the agent side. See sections above for per-tool examples.

Can I dictate camelCase function names and snake_case identifiers?

Yes. Cursor-context biasing prefers the casing of code already in your buffer, so useUserData and created_at land in the right idiom. The live preview catches misreads on unusual identifiers; spell them out in flight.

Does Halopen send my code or my voice to anyone?

Audio leaves your Mac only while you hold the key, only to Halopen's transcription edge function, only for the seconds you're holding. Audio is not retained after transcription. Halopen does not capture your screen. Every cloud call appears in a local audit log.

What hotkey does Halopen use, and can I change it?

The function key (fn) by default — cleanest hand-position because it rarely owns any other shortcut. You can switch to Right Option, Control + Option, or any custom modifier-and-key combination in Settings. Halopen detects conflicts and warns before assigning.

Does Halopen work on Apple Silicon and Intel Macs?

Yes. Halopen ships as a Universal binary that runs natively on Apple Silicon (M1, M2, M3, M4) and Intel Macs. macOS 14 (Sonoma) or later required. Developer ID notarized; first launch goes through standard Gatekeeper.

How much does Halopen cost?

Free is 8,000 words a month, forever — no credit card, no trial timer. Pro is $19/month or $179/year for unlimited words and every feature. The free tier is enough to feel the workflow on real prompts before you decide.

Can I dictate multi-paragraph prompts?

Yes. A single hold runs up to ten minutes — long enough for a paragraph, a long thought, or a multi-step prompt. The transcript arrives in one block at the cursor on release. Pauses become sentence breaks; longer pauses become paragraph breaks. For longer drafting, release between thoughts and hold again; each take continues exactly where the last one stopped.
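The pause-to-punctuation rule can be sketched as follows. This is an illustration only: the 0.6-second and 2.0-second thresholds, the segment format, and the function names are assumptions, not values taken from Halopen.

```python
def _end_sentence(text: str) -> str:
    """Add a period unless the text already ends a sentence."""
    return text if text[-1] in ".!?" else text + "."


def assemble_transcript(
    segments: list[tuple[str, float]],
    sentence_pause_s: float = 0.6,   # assumed threshold
    paragraph_pause_s: float = 2.0,  # assumed threshold
) -> str:
    """Join (text, pause_after_seconds) segments into sentences and paragraphs."""
    parts: list[str] = []
    start_of_sentence = True
    for text, pause_after in segments:
        text = text.strip()
        if not text:
            continue
        if start_of_sentence:
            text = text[0].upper() + text[1:]
        if pause_after >= paragraph_pause_s:
            # Long pause: close the sentence and start a new paragraph.
            parts.append(_end_sentence(text) + "\n\n")
            start_of_sentence = True
        elif pause_after >= sentence_pause_s:
            # Short pause: close the sentence, stay in the paragraph.
            parts.append(_end_sentence(text) + " ")
            start_of_sentence = True
        else:
            parts.append(text + " ")
            start_of_sentence = False
    return "".join(parts).strip()


if __name__ == "__main__":
    print(assemble_transcript([
        ("add the index", 0.8),
        ("keep the query", 0.1),
        ("unchanged", 2.5),
        ("then write the test", 0.0),
    ]))
```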

Does Halopen interfere with terminal autocomplete or shell completion?

No. Halopen delivers text as a single inserted run rather than a per-character keystroke storm, so completion engines see the prompt land at once and don't trigger mid-word. Tab-completion and shell history behave the same as before.

Does Halopen work in VS Code, Xcode, and JetBrains alongside terminal-based AI tools?

Yes. Halopen is editor-agnostic — the same hotkey works in VS Code, Xcode, IntelliJ, PyCharm, WebStorm, Sublime, Nova, BBEdit, and any other Mac editor. The integrated terminals inside those IDEs accept Halopen text the same way iTerm2 does.

Will Halopen still capture my words clearly if I have an accent or dictate fast?

Yes. The engine adapts per utterance, handles mid-sentence code-switching across most major languages, and tolerates fast speech in normal conversational range without accuracy loss. Accent calibration is automatic — there's no training step, no per-user voice model to build.

ONE MORE THING

Hold the function key. Speak the next prompt.

The first prompt that drafted this post was dictated into Claude Code while I held the function key. Three weeks ago I would have typed a third of it and corrected the result over four follow-ups. Now I speak the long version, the agent reads what I actually meant, and the morning moves at the pace of thought instead of the pace of typing. Download Halopen free, open Claude Code, hold the function key. Speak the next prompt.
