Voice typing for Cursor on Mac

Halopen lands your prompt verbatim in Cmd+K, Chat, and Composer, with every constraint and every file path intact, in milliseconds.

Jesse Meria · 18 min read

THE WORKFLOW

What I do every morning in Cursor

Halopen is a native macOS dictation app that lands voice-typed prompts in Cursor’s three prompt surfaces — Cmd+K inline edit, Cmd+L Chat, Cmd+I Composer (including Agent mode) — and in Cursor’s integrated terminal. Hold the function key, speak the prompt at the shape the surface wants (one constraint for Cmd+K, four-part context for Cmd+L, multi-clause spec for Cmd+I), release; the verbatim text appears at the input cursor. No Cursor extension required — system-wide via the macOS Accessibility API.

Cursor was built to remove every friction between intent and shipped code. The remaining friction is the keyboard. Cmd+K wants a one-line constraint, Cmd+L wants a paragraph of context, Cmd+I wants the full multi-file spec — three different prompt shapes, three times the typing cost. Voice collapses all three. I hold the function key, speak the prompt at the shape the surface wants, release; Halopen lands the verbatim text in the Cursor input I had open. The loop runs at the speed Cursor was designed for instead of the speed I can type.

I use Cursor for the front-end of Halopen — the marketing site, the SwiftUI views, the Astro pages you’re reading right now — and Halopen is the dictation layer I open Cursor with. It’s a strange recursion the first time you notice it: the app I built to dictate prompts is the app I dictate prompts about, into the editor I’m building it inside. Three weeks of this and the recursion stopped feeling strange. It just feels like the morning.

The morning is short. I open Cursor on the Halopen website project, glance at CONTEXT_STATE.md to see where I left off, and read the first file the night-before-me wanted me to look at. Coffee in one hand, fn key under the other. The first move is almost always Cmd+K on a selection — a SwiftUI Button I want a hint added to, an .astro block with the wrong heading hierarchy, a CSS rule that needs a one-line constraint applied. I select the lines, hit Cmd+K, hold the function key, and talk.

That Cmd+K take is one move. The morning is dozens of them, stitched into the larger arc of the day’s work. Some takes are Cmd+L: I open the chat panel and think out loud about a tradeoff before I touch any code, because dictating a paragraph of context costs so little that I do it instead of holding the question in my head. Some takes are Cmd+I: I describe a multi-file refactor with file paths, a negative constraint, and a test the agent should write first, all in one continuous hold. Three different prompt shapes, all within the same hour, one hotkey. The shape of what I say changes per surface. The way I deliver it doesn’t.

What I didn’t see coming is what this does to the shape of the prompts themselves. When typing the third clause is free, I include the third clause. When typing the negative constraint is free, the negative constraint stops being the thing I leave out. When typing the file path is free, I spell the file path instead of saying “that file.” The prompt that lands in Cmd+K, Cmd+L, and Cmd+I is the prompt I’d have written if writing were free — and it’s a different prompt than the one I would have typed. Cursor reads what I actually meant, not the abbreviated version of what I meant that I would have typed under articulation cost. The full case for that — the cost-of-articulation thesis, how it applies to AI coding generally — is in the Claude Code companion. What this post argues is the next move: Cursor isn’t one prompt input. It’s three. And voice is the only modality that matches the shape of all three at once.

The next section walks through the three surfaces one at a time — what each one wants, what voice gives each one, and what a real dictated prompt looks like at each shape.

THE THREE SURFACES

The three voice-shaped surfaces of Cursor

Cursor’s keyboard surface area is small on purpose. Three shortcuts — Cmd+K, Cmd+L, Cmd+I — cover almost every interaction with the model that lives behind the editor. Cmd+K runs an inline edit on the selection. Cmd+L opens Chat for the question that doesn’t fit into a single edit. Cmd+I opens Composer for the multi-file work where you need to talk through architecture before code moves. Three keys, three different conversations with the same model.

The thing nobody quite says out loud is that each of those three surfaces wants a different shape of prompt. Cmd+K rewards a single tight constraint named in five words. Cmd+L rewards a paragraph with the question, the failed hypothesis, and the constraint you don’t want to violate. Cmd+I rewards a multi-clause spec with file paths, negative constraints, and the test path that should still pass. Three different prompt shapes. Three different articulation costs if you’re typing them. And one hotkey if you’re not.

Below is what each surface actually feels like with the function key under your hand.

Cmd+K — Inline edit on selection

Cmd+K is the surface I use most. Highlight the lines, press Cmd+K, and the modal opens with the cursor in its single text field. Hold the function key, speak one constraint, release. The modal closes. The diff is in the file.

The prompt shape that wins on Cmd+K is one constraint, named in the smallest number of words that still leaves no ambiguity. “Make this a typed tuple.” “Switch this to optional chaining.” “Add an early return when the input array is empty.” The interaction is so fast that any prompt longer than eight words feels like the modal is in the way. Cursor built Cmd+K for the moment between recognizing a small change and writing it; voice keeps that moment small. The cost of typing even ten words into a modal you’ve opened forty times today is the friction that makes you skip the small fix and move on.

Cmd+L — Chat for the question that doesn’t fit into an edit

Cmd+L opens Cursor’s Chat panel. It’s the surface for the question whose answer isn’t a one-line edit. Why is this hook re-rendering? What’s the right way to model this state? Which of these three patterns is going to scale? Chat has more room than Cmd+K — multi-turn conversation, context from open files, the model thinking through the tradeoff with you instead of jumping to a fix.

The prompt shape that wins on Cmd+L is a full paragraph. The question, the hypothesis you’ve already ruled out, the constraint you don’t want to violate, and the shape of answer you’re hoping for — all in one breath. Typed, that paragraph compresses to a keyword fragment because the third sentence costs as much as the first two combined. Spoken, it costs nothing extra to include the third sentence, the negative constraint, the “don’t suggest a refactor of the parent component, I just need the smallest change to this hook.” You get the paragraph that maps to the surface.

Cmd+I — Composer for the multi-file spec

Cmd+I opens Composer — Cursor’s surface for work that touches more than one file. Refactors, feature work, multi-step migrations. Composer can read across the workspace, propose edits in several files at once, and run as Agent if you want it to take a longer-running task and come back when it’s done. The prompt input is wider than Cmd+L’s, and the input invites length the way a blank document does.

The prompt shape that wins on Cmd+I is the multi-clause spec. The change you want, the file paths it touches, the architectural constraint you don’t want violated, the test that should still pass after, and any negative — “don’t change the public API,” “don’t add a config flag,” “don’t touch the existing migration files.” That spec is the prompt that almost nobody writes by hand because the friction of typing every clause is too high. The three-clause version of a Composer prompt produces a three-clause Composer answer; the seven-clause version produces a seven-clause answer that lands. Voice removes the friction; the seven-clause version becomes the version you ship.

The same hotkey works in Cursor’s integrated terminal — Halopen is editor-agnostic, and the terminal inside Cursor accepts dictated text the same way iTerm2 or Warp does. The terminal-shaped sister tool walkthrough lives on Halopen for Claude Code.

VOICE PROMPT PATTERNS

How to write voice prompts that win in Cursor

The section above gives you the shapes; this section gives you the patterns. Three prompt patterns, one per surface. Each one is the same intent in two phrasings: the shorter version that lands when you’re typing under articulation cost, and the fuller version that lands when you’re speaking and the third clause is free. The fuller version is not the better engineer’s prompt; it’s the same engineer’s prompt when articulation stops being expensive.

One-line constraint shape for Cmd+K

Cmd+K is the smallest surface and the place where prompt shape matters most. The modal is narrow, the fix is small, and the model has the entire selection plus the surrounding file already in context. The prompt’s job is to name the change so unambiguously that the inline edit lands on the first try.

The pattern that wins is verb + object + qualifier. The verb names the change (“convert”, “extract”, “rename”, “switch”, “add”). The object names what’s changing. The qualifier names the constraint that prevents the wrong-direction edit. Most typed Cmd+K prompts skip the qualifier. Most spoken ones include it.

The typed version asks for one change. The spoken version asks for the same change with the error-handling pattern named, the return type specified, and the negative constraint that protects every existing call site. Cmd+K’s edit modal applies the change inside the selection — the spoken version makes sure the change inside the selection doesn’t break the rest of the file.

The same shape works on every Cmd+K I run. The verb-object-qualifier triple takes seven seconds to dictate and the qualifier is the half that decides whether the inline edit is mergeable or needs a round of cleanup. The cost of articulation drops; the qualifier comes back; the inline edit lands clean.
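To make the verb + object + qualifier triple concrete, here is a minimal sketch in TypeScript. The type and function names (`InlineEditPrompt`, `renderInlinePrompt`, `hasQualifier`) are hypothetical and exist only for illustration; nothing here is a Cursor or Halopen API. The sample prompt reuses the early-return example from earlier in the post, with a qualifier of the kind this section describes.

```typescript
// Hypothetical model of the verb + object + qualifier pattern.
// None of these names come from Cursor or Halopen.
type EditVerb = "convert" | "extract" | "rename" | "switch" | "add";

interface InlineEditPrompt {
  verb: EditVerb;      // names the change
  target: string;      // names what is changing
  qualifier?: string;  // the constraint that prevents the wrong-direction edit
}

function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Render the prompt as the sentence you would speak into Cmd+K.
function renderInlinePrompt(p: InlineEditPrompt): string {
  const head = capitalize(p.verb) + " " + p.target;
  return p.qualifier ? head + ", " + p.qualifier + "." : head + ".";
}

// True when the prompt carries the clause that typed prompts tend to drop.
function hasQualifier(p: InlineEditPrompt): boolean {
  return p.qualifier !== undefined && p.qualifier.trim().length > 0;
}

// The typed fragment: verb and object only.
const typedInline: InlineEditPrompt = {
  verb: "add",
  target: "an early return",
};

// The spoken version: same intent, with the qualifier restored.
const spokenInline: InlineEditPrompt = {
  verb: "add",
  target: "an early return when the input array is empty",
  qualifier: "without changing the function signature",
};
```

Calling `renderInlinePrompt(spokenInline)` gives the full sentence to dictate, and `hasQualifier` is a quick check for whether a prompt still has the half that decides if the edit is mergeable.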

Thinking-out-loud shape for Cmd+L

Cmd+L is Chat. The prompt’s job here is different — not to name a single change, but to bring Cursor’s model up to the point where it can answer the question that brought you to the panel. The pattern that wins is what I’m seeing + what I’ve already ruled out + what I’m asking for + what I don’t want.

That four-part structure is the pattern of how an engineer thinks about a hard question. It’s also the pattern of how engineers talk about hard questions to each other on a video call. It is not the pattern of how engineers type prompts into a chat input, because typing the second part — what you’ve already ruled out — feels like overhead when the chat is right there to ask follow-ups. So you skip it, the model proposes the thing you already ruled out, and the second turn of the conversation is the conversation that should have been the first.

The typed version gets you a generic answer about React re-render causes — closures, missing memoization, prop instability. You then have to type back “no, the props are stable” and “no, I can’t move state up” before the conversation reaches the answer. The spoken version skips both of those turns. Cursor’s Chat answers the actual question first.

The four-part structure looks long on the page but takes about thirty seconds to dictate. The first turn becomes the only turn that matters. The rest of the conversation is the model expanding on the answer, not the model and you negotiating toward the question.
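The four-part structure can be sketched as a small data shape. This is an illustration only: `ChatPrompt`, `renderChatPrompt`, and `missingParts` are hypothetical names, and the sample wording is assembled from the re-render example above rather than taken from a real session.

```typescript
// Hypothetical model of the four-part Chat prompt; names are illustrative.
interface ChatPrompt {
  observed: string;    // what I'm seeing
  ruledOut: string[];  // what I've already ruled out
  ask: string;         // what I'm asking for
  avoid: string[];     // what I don't want
}

// Join the four parts into the paragraph you would dictate in one hold.
function renderChatPrompt(p: ChatPrompt): string {
  const sentences: string[] = [];
  if (p.observed.length > 0) sentences.push(p.observed);
  for (const r of p.ruledOut) sentences.push("I've already ruled out " + r + ".");
  sentences.push(p.ask);
  for (const a of p.avoid) sentences.push("Don't " + a + ".");
  return sentences.join(" ");
}

// Name the parts a prompt leaves empty, so the gap is visible before sending.
function missingParts(p: ChatPrompt): string[] {
  const gaps: string[] = [];
  if (p.observed.trim().length === 0) gaps.push("observed");
  if (p.ruledOut.length === 0) gaps.push("ruledOut");
  if (p.ask.trim().length === 0) gaps.push("ask");
  if (p.avoid.length === 0) gaps.push("avoid");
  return gaps;
}

// The typed fragment: only the question survives.
const typedChat: ChatPrompt = {
  observed: "",
  ruledOut: [],
  ask: "Why is this hook re-rendering?",
  avoid: [],
};

// The spoken version: all four parts in one breath.
const spokenChat: ChatPrompt = {
  observed: "This hook keeps re-rendering.",
  ruledOut: ["unstable props"],
  ask: "What is the smallest change to this hook that stops it?",
  avoid: ["suggest a refactor of the parent component", "move state up"],
};
```

Running `missingParts(typedChat)` names the three parts the typed fragment dropped, which are the same three the model would otherwise have to ask about or guess.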

Multi-clause spec shape for Cmd+I

Cmd+I is Composer, and Composer is at its best with prompts that look like a small specification document. The pattern that wins is change + scope + architectural constraint + test contract + negative constraints. Five clauses, sometimes more. Each one stops one class of misread.

That spec shape is the prompt that nobody writes by hand. Typing five constraint clauses into Composer’s input takes three to four minutes if you mean it. Most engineers type a one-sentence version, accept Composer’s first-draft diff, then spend the saved time reviewing the diff and re-prompting around the constraints they didn’t bother to type. The total time is roughly the same; the diff quality is worse; the negative constraints — “don’t change the public API”, “don’t add a config flag”, “don’t touch the migration files” — are usually the ones that get left out.

The typed version produces a Composer diff that’s plausible — but it might inline the drawing logic as a free function instead of a value type, change the public WaveformRendererView API to expose the new abstraction, and quietly refactor the audio-capture path while it’s in there. Three rounds of correction. The spoken version pins the architectural shape (value type, single method), pins the public API (unchanged), pins the test contract (the new snapshot path), and pins the no-go zone (audio-capture untouched). Composer’s first-draft diff is the diff that ships.
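As a rough sketch, the five clauses can be written down as a structure and checked for gaps before the prompt goes to Composer. Everything named here is an assumption for illustration: `ComposerSpec`, `missingClauses`, and `renderSpec` are hypothetical, and the two file names are placeholders, since the post names the `WaveformRendererView` type but not its files.

```typescript
// Hypothetical model of a five-clause Composer spec; field names are illustrative.
interface ComposerSpec {
  change: string;        // the change you want
  scope: string[];       // file paths the change may touch
  architecture: string;  // the architectural constraint
  testContract: string;  // the test that must pass afterwards
  negatives: string[];   // what the agent must leave alone
}

type ClauseName = "change" | "scope" | "architecture" | "testContract" | "negatives";

// List the clauses a draft leaves empty, in spec order.
function missingClauses(draft: Partial<ComposerSpec>): ClauseName[] {
  const gaps: ClauseName[] = [];
  if (!draft.change) gaps.push("change");
  if (!draft.scope || draft.scope.length === 0) gaps.push("scope");
  if (!draft.architecture) gaps.push("architecture");
  if (!draft.testContract) gaps.push("testContract");
  if (!draft.negatives || draft.negatives.length === 0) gaps.push("negatives");
  return gaps;
}

// Join the clauses into the paragraph you would dictate in one hold.
function renderSpec(spec: ComposerSpec): string {
  const clauses: string[] = [
    spec.change,
    "Only touch " + spec.scope.join(" and ") + ".",
    spec.architecture,
    spec.testContract,
  ];
  return clauses.concat(spec.negatives).join(" ");
}

// The typed one-sentence version.
const typedDraft: Partial<ComposerSpec> = {
  change: "Extract the waveform drawing logic out of WaveformRendererView.",
};

// The spoken version. File names are placeholders, not real paths.
const spokenSpec: ComposerSpec = {
  change: "Extract the waveform drawing logic out of WaveformRendererView.",
  scope: ["WaveformRendererView.swift", "WaveformPath.swift"],
  architecture: "Make the new abstraction a value type with a single method.",
  testContract: "The new snapshot test must pass.",
  negatives: [
    "Don't change the public WaveformRendererView API.",
    "Don't touch the audio-capture path.",
  ],
};
```

`missingClauses(typedDraft)` reports four open clauses, each one a class of misread that the agent is free to fill in on its own.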

The pattern across all three surfaces is the same. AI-coding tools were built on the assumption that engineers would write detailed, multi-clause prompts. Most of the time, we don’t — because the typing tax compresses every prompt into a keyword fragment, and the agent then fills the gap with what it guesses we meant. Voice removes the typing tax. The prompts get longer; the constraints come back; the agents start producing the result on the first pass.

The other thing that compounds: every clause you successfully include in a prompt is one round of follow-up correction you don’t have to do later. The math on a Composer morning isn’t speech speed versus typing speed. It’s the spoken first-pass spec versus the typed fragment plus the three correction turns you end up doing anyway. The spoken version is faster because the second, third, and fourth turns don’t need to happen.

That’s the insight that makes voice into Cursor different from voice into anything else. You’re not dictating prose that a human reader will gracefully interpret. You’re dictating a specification that an agent will execute on. The constraint quality determines the diff quality. Voice removes the cost of constraint quality. The diff shape changes.

[Image: Halopen captured the spoken sentence 'That sounds amazing! Hell yeah, let's do it all!' with the exclamation points intact.]
Halopen wrote this when it was said with energy. Punctuation, register, and emphasis come through without cleanup; the prompt that lands is the prompt as said.

UNDER THE HOOD

Why Halopen handles this

The reason all of this works in Cursor — every surface, every prompt shape, the same hotkey behaving identically in Cmd+K, Cmd+L, Cmd+I, and the integrated terminal — is that Halopen doesn’t know it’s Cursor. The architecture lives one layer below. Halopen is a native Swift menu-bar app that sits on macOS’s accessibility APIs and lands text wherever your cursor is, in any input field, in any app. Cursor is one of those apps. The integrated terminal inside Cursor is one of those input fields. The Composer panel is another. The Chat panel is another. None of them require a Cursor extension, a Cursor-specific configuration, or a version-compatibility matrix that tracks which Cursor build broke which surface — because the code path that lands the prompt doesn’t care which app the cursor is in.

We built Halopen on the Apple-shipped primitives for exactly this reason. Audio capture is AVAudioEngine. Hotkey detection is a session-level CGEventTap. Live partial transcription runs through Apple’s SFSpeechRecognizer so you see the words appearing inside the recording pill as you speak them. The higher-fidelity final pass streams to our edge function for gpt-4o-transcribe, and whichever engine returns last gets stitched into the transcript. What this means for Cursor specifically: the same biasing engine that prefers created_at in a snake_case Postgres buffer prefers accessibilityLabel in a SwiftUI buffer, prefers useEffect in a TSX component, and prefers OrdersRepository in a Composer prompt that mentions the file. Cursor-context biasing reads the buffer Cursor has open and shapes the transcription around it before the prompt ever leaves your machine. The full per-primitive walkthrough — live-preview self-correction, silent fallback when the cloud call drops, the cursor-context biasing internals — is in the engineering walkthrough in our Claude Code piece.
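To give a feel for what context biasing means, here is a deliberately simplified sketch. It is not Halopen's engine: a real biasing pass most likely works on the recognizer's hypotheses, while this version does a plain post-hoc replacement on the finished transcript. The function names (`identifierWords`, `bufferIdentifiers`, `biasTranscript`) are hypothetical.

```typescript
// Split an identifier such as created_at, useEffect, or OrdersRepository
// into lowercase words.
function identifierWords(id: string): string[] {
  return id
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // camelCase or PascalCase boundary
    .split(/[_\s]+/)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());
}

// Collect the multi-word identifiers that appear in the open buffer.
function bufferIdentifiers(buffer: string): string[] {
  const found = buffer.match(/[A-Za-z_][A-Za-z0-9_]*/g) || [];
  const unique: string[] = [];
  for (const id of found) {
    if (identifierWords(id).length > 1 && unique.indexOf(id) === -1) {
      unique.push(id);
    }
  }
  return unique;
}

// Replace spoken word runs ("created at") with the matching identifier
// from the buffer ("created_at"). Simplification: plain string replacement.
function biasTranscript(transcript: string, buffer: string): string {
  let result = transcript;
  for (const id of bufferIdentifiers(buffer)) {
    const spoken = identifierWords(id).join(" ");
    const pattern = new RegExp("\\b" + spoken + "\\b", "gi");
    result = result.replace(pattern, id);
  }
  return result;
}
```

With a Postgres buffer containing `created_at`, `biasTranscript("sort by created at descending", buffer)` lands the snake_case identifier; with a TSX buffer containing `useEffect`, the words "use effect" land as the hook name.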

The Cursor-specific implications fall out of that architecture.

One: it works in every Cursor surface without an extension. Cmd+K, Cmd+L, Cmd+I, the integrated terminal, the global command palette, the rename dialog, the search-and-replace input. If your cursor is in it, Halopen lands text in it.

Two: there is no version-compatibility matrix. Cursor ships fast; we don’t have to ship a matching update every time a panel changes or a new surface appears, because the surface is just another text input as far as macOS is concerned.

Three: the audit log is local, on your machine, and readable. Every cloud call Halopen makes appears in it, with timestamp and byte count. We ship the verbatim contract by default (what you say is what lands at the cursor, no smoothing pass), and the audit log is how you verify that the bytes leaving your machine are only the ones you held the function key to send.

Privacy isn’t a settings toggle. It’s the architecture: no screen-capture entitlement, no ambient listening loop, no background process listening for a trigger phrase. The app idles in tens of megabytes of RAM and sleeps until you press the key, which is why the agent process gets the full machine when Composer is doing real work. The broader case across every surface a Mac developer touches (terminals, IDEs, browser inputs, chat clients) lives on the best Mac dictation app page.
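The audit-log check described above could be sketched like this. The entry shape is an assumption: the post says each cloud call is logged with a timestamp and byte count but does not document the format, so `AuditEntry`, `HoldWindow`, and the grace period are all hypothetical.

```typescript
// Hypothetical shapes: the real audit log format is not documented in this post.
interface AuditEntry {
  atMs: number;   // when the cloud call happened, in epoch milliseconds
  bytes: number;  // bytes sent in that call
}

interface HoldWindow {
  startMs: number; // function key pressed
  endMs: number;   // function key released
}

// Sum the bytes across all logged cloud calls.
function totalBytes(entries: AuditEntry[]): number {
  return entries.reduce((sum, e) => sum + e.bytes, 0);
}

// Return every logged call that does not fall inside a hold window.
// graceMs allows for an upload that finishes shortly after release
// (an assumption, not a documented behavior).
function callsOutsideHolds(
  entries: AuditEntry[],
  holdWindows: HoldWindow[],
  graceMs: number
): AuditEntry[] {
  return entries.filter(
    (e) =>
      !holdWindows.some(
        (h) => e.atMs >= h.startMs && e.atMs <= h.endMs + graceMs
      )
  );
}

// Sample data, invented for illustration.
const sampleHolds: HoldWindow[] = [{ startMs: 1000, endMs: 9000 }];
const sampleLog: AuditEntry[] = [
  { atMs: 2000, bytes: 32000 },
  { atMs: 9500, bytes: 16000 },
  { atMs: 60000, bytes: 512 },
];
```

An empty result from `callsOutsideHolds` is what the verbatim contract promises; in the sample data the third entry is the one a reader would want to investigate.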

That’s the engineering substance. What it changes for a Cursor user — voice and AI compounding inside the editor where the loop is shortest — is the next section.

THE BIGGER SHIFT

What this changes for Cursor users

Two shifts are stacked on top of each other right now, and they compound. Cursor put the AI inside the editor: the model reads your repo, edits across files, completes the line you’re about to type, and works through a multi-step task as an agent while you watch. Voice put articulation back in the prompt: every constraint clause, every file path, every negative instruction, all of it free to say. Either shift on its own would be a real change. Together they collapse the loop between thinking and shipped code more than either does alone.

The way it shows up on a Tuesday afternoon is quiet. A cross-file refactor would have been a Cmd+I prompt I typed in eighty words, because eighty words is what felt worth typing. Now it’s two hundred words because saying the third constraint cost nothing, and Composer comes back with the diff I actually wanted on the first pass instead of the adjacent one I would have corrected over four follow-ups. The morning stops being about typing speed and starts being about how clearly I can see the change before I describe it. The bottleneck moves from the keyboard to the thought, which is a much better place for the bottleneck to live; thought clarity is a skill, and skills compound.

The negative constraints come back too. Don’t change the function signatures. Keep the public API stable. Don’t touch the orders endpoint while you’re in there. Those are the clauses an engineer leaves out first under typing cost — and they’re the clauses that protect the codebase from Composer’s well-meaning over-edits. Voice puts them back in.

The shift is already underway in the Cursor power-user slice — the engineers who fire fifteen Composer prompts a day, the indie builders shipping with Cmd+I as the primary surface, the teams who’ve turned the Cmd+L panel into a standing pair-programming partner. Once dictation lands at the cursor verbatim, the typed version of those prompts feels like writing a memo when you could be talking to a colleague. The compound that matters is voice + AI + the editor that was built to be used at speech speed; Cursor is the only editor where all three meet today, and the gap between Cursor + voice and Cursor alone widens with every Composer prompt that ships at full constraint resolution.

QUESTIONS

Frequently asked questions

Does Halopen work in Cursor's Cmd+K inline edit?

Yes. As far as macOS is concerned, Cmd+K's inline-edit modal is just another text input; Halopen lands the prompt at the modal cursor on release. The pattern is verb + object + qualifier: speak the change, the object, and the constraint that prevents the wrong-direction edit, then let go.

Does Halopen work in Cursor's Cmd+L Chat panel?

Yes. Cmd+L opens Cursor's Chat (the AI Pane); as far as macOS is concerned, its input is just another text input. Multi-turn conversations work the same way: hold for each turn, the transcript lands at the cursor, you send.

Does Halopen work in Cursor's Cmd+I Composer (and Agent mode)?

Yes, in both. Cmd+I opens Composer; Agent mode is a Composer setting that lets it run autonomously across files. Halopen lands the multi-clause spec at the cursor on release. A single hold runs up to ten minutes.

Will Halopen handle code symbols, file paths, and CLI flags correctly when dictating into Cursor?

Yes. Cursor-context biasing prefers technical tokens around your cursor. For unusual identifiers, the live preview shows the misread and you spell it out in flight — the corrected token replaces the misread before the prompt leaves your machine.

Can I dictate camelCase function names, snake_case identifiers, and PascalCase types into Cursor?

Yes. Cursor-context biasing reads the casing of the code already in your buffer, so useUserData, created_at, and OrdersRepository each land in the right idiom. For unusual identifiers the live preview catches the misread; spell them out and the corrected token wins.
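A minimal sketch of the casing idea: look at the identifiers already in the buffer, pick the dominant idiom, and render the spoken words in it. This is an illustration under simple assumptions (regex-based detection, three casings only), not a description of Halopen's internals; `detectCasing`, `dominantCasing`, and `renderIdentifier` are hypothetical names.

```typescript
type Casing = "camel" | "snake" | "pascal";

// Classify a single identifier, or return null for single words and other styles.
function detectCasing(id: string): Casing | null {
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(id)) return "snake";
  if (/^[A-Z][a-z0-9]+([A-Z][a-z0-9]+)+$/.test(id)) return "pascal";
  if (/^[a-z][a-z0-9]*([A-Z][a-z0-9]+)+$/.test(id)) return "camel";
  return null;
}

// Pick the casing that appears most often in the buffer, or null if none does.
function dominantCasing(buffer: string): Casing | null {
  const counts = { camel: 0, snake: 0, pascal: 0 };
  const tokens = buffer.match(/[A-Za-z_][A-Za-z0-9_]*/g) || [];
  for (const t of tokens) {
    const c = detectCasing(t);
    if (c !== null) counts[c] += 1;
  }
  let best: Casing | null = null;
  const order: Casing[] = ["camel", "snake", "pascal"];
  for (const c of order) {
    if (counts[c] > 0 && (best === null || counts[c] > counts[best])) best = c;
  }
  return best;
}

// Render spoken words in the requested casing.
function renderIdentifier(words: string[], casing: Casing): string {
  const lower = words.map((w) => w.toLowerCase());
  if (lower.length === 0) return "";
  if (casing === "snake") return lower.join("_");
  const capped = lower.map((w) => w.charAt(0).toUpperCase() + w.slice(1));
  if (casing === "pascal") return capped.join("");
  return lower[0] + capped.slice(1).join("");
}
```

For a SQL buffer full of `created_at` and `updated_at`, `dominantCasing` returns `"snake"`, so the spoken words "created at" would be rendered as `created_at` rather than `createdAt`.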

How accurate is voice typing for Cursor's prompt inputs?

High enough that I trust it without re-reading every transcription. Cursor-context biasing handles the technical vocabulary; the live partial catches rare misreads before the prompt reaches the model.

Is voice typing actually faster than typing for Cursor prompts?

Yes — and the multiplier widens with surface depth. Cmd+K saves seconds. Cmd+L saves a sentence per turn because spoken paragraphs include the prior-check clause typed prompts skip. Cmd+I saves minutes — a five-clause Composer spec that takes thirty seconds to dictate is the kind of prompt most engineers compress to one sentence when typing, then pay back over three correction turns.

Why are spoken Cursor prompts better than typed ones, not just faster?

Cursor's three surfaces each reward a different prompt shape — Cmd+K wants one constraint, Cmd+L wants four-part context, Cmd+I wants a five-clause spec. Typing converges all three toward the shortest possible version because every clause costs articulation effort. Spoken, the cost flattens; the prompt arrives at the surface in the shape the surface was built for.

Does Halopen work in Cursor's integrated terminal?

Yes. Cursor's integrated terminal is a standard Mac terminal surface; Halopen lands text there the same way it does in iTerm2, Apple Terminal, Warp, Ghostty, Alacritty, and kitty.

Do I need to install a Cursor extension to use Halopen?

No. Halopen is system-wide via the macOS Accessibility API — it works in every Cursor surface without anything installed on the Cursor side. No extension to update on each Cursor release, no version-compatibility matrix.

Will Halopen slow Cursor down?

No. Halopen idles in the menu bar at near-zero CPU; the audio engine spins up only while the function key is down. Cursor's Composer agent runs uncontested on the rest of the machine. Memory footprint stays in the low tens of megabytes whether the agent is reading a fifty-file repo or rewriting a single function.

Does Halopen send my code or my voice to anyone when I dictate into Cursor?

Halopen never reads your codebase. Audio streams to Halopen's transcription edge function only while the function key is held (typically two to forty seconds per prompt) and is dropped after the transcript returns. The text Halopen lands at your cursor is the only artifact that comes back to your machine. The audit log on disk shows every cloud call by timestamp and byte count, so you can verify.

What hotkey does Halopen use, and can I change it without conflicting with Cursor's shortcuts?

The function key (fn) by default. The Cursor shortcuts this workflow uses are all Cmd-based, and fn on its own isn't bound to anything in Cursor, so there's no conflict. You can switch Halopen to Right Option, Control + Option, or any custom combination in Settings.

Does Halopen work in Cursor on Apple Silicon and Intel Macs?

Yes — Universal binary, Apple Silicon and Intel, macOS Sonoma or later. Cursor's own Apple Silicon optimization compounds with Halopen's: the dictation pipeline runs on the Neural Engine where available, leaving the performance cores free for Cursor's local indexing and Composer's agent process.

How much does Halopen cost?

Free covers 8,000 words a month with no credit card and no trial timer — enough to feel the three-surfaces workflow on real Cursor prompts. Pro is $19/month or $179 a year for unlimited dictation. A Cursor power user firing dozens of Composer prompts a day blows through the free ceiling in the first week; Pro removes it for the price of one Cursor subscription tier.
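The pricing arithmetic is short enough to check directly. The plan figures come from the answer above; the usage inputs in the example (fifteen Composer prompts a day, two hundred words each) are borrowed from elsewhere in this post and treated as a uniform daily load, which is an assumption. The function names are hypothetical.

```typescript
// Plan figures as stated in this post.
const FREE_WORDS_PER_MONTH = 8000;
const PRO_MONTHLY_USD = 19;
const PRO_YEARLY_USD = 179;

// What the yearly plan saves compared with twelve monthly payments.
function yearlySavingsUsd(): number {
  return PRO_MONTHLY_USD * 12 - PRO_YEARLY_USD;
}

// How many days of dictation fit under the free ceiling,
// assuming every prompt has the same length (a simplification).
function daysUntilFreeCeiling(promptsPerDay: number, wordsPerPrompt: number): number {
  return FREE_WORDS_PER_MONTH / (promptsPerDay * wordsPerPrompt);
}

// Fifteen prompts a day at two hundred words each.
const heavyUserDays = daysUntilFreeCeiling(15, 200);
```

Twelve monthly payments come to $228, so the yearly plan saves $49. At 3,000 dictated words a day, the 8,000-word ceiling is reached during the third day, consistent with the "first week" estimate above.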

Can I dictate multi-paragraph Composer prompts for multi-file refactors?

Yes. The hold-to-talk window is generous — long enough for a multi-clause Composer spec with file paths, architectural constraints, and the test contract that should still pass. The natural rhythm of explaining a refactor out loud — pause, qualify, pause again — comes through as sentence and paragraph breaks Composer reads correctly. For specs that exceed a single take, release between thoughts and hold again; the next take continues from the cursor.

Open Cursor. Hold the function key. Speak the next prompt.

Cursor exposes three voice-shaped surfaces; voice is the input modality that matches the shape of all three. The first prompt of tomorrow morning will fit Cmd+K’s shape — one constraint on a selection, ten seconds spoken, an inline edit Cursor applies cleanly on the first pass. The morning is dozens of prompts like that, stitched into the larger arc of the day’s work. The keyboard stops being the bottleneck the moment you stop holding it.

