Does on-device mode require a paid plan?

On-device mode is included with the free download today. Which pricing tier it will sit in is being decided alongside the launch; see Halopen's pricing page for the current answer.

How big is the model download?

About 626 MB, downloaded the first time you turn on on-device mode. Halopen stores it under ~/Library/Application Support/Halopen/Models/. Later launches use the cached file, so there is no re-download.
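If you want to confirm the download landed, a short Python script can total what's in that folder. This is a sketch, not a Halopen tool: only the folder path comes from this FAQ. The file names inside it aren't documented, so the script simply sums every file under the directory and reports decimal megabytes.

```python
from pathlib import Path

# Cache location stated in this FAQ. File names inside it are not documented,
# so this totals every file under the folder rather than looking for one name.
MODELS_DIR = Path.home() / "Library" / "Application Support" / "Halopen" / "Models"


def cached_model_bytes(models_dir: Path = MODELS_DIR) -> int:
    """Return the total size in bytes of all files under models_dir (0 if absent)."""
    if not models_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in models_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    size = cached_model_bytes()
    if size == 0:
        print("No cached model found; it would be downloaded on first use.")
    else:
        # Decimal megabytes (1 MB = 1,000,000 bytes).
        print(f"Cached model data: {size / 1_000_000:.0f} MB")
```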

Does on-device mode work on Intel Macs?

No. The on-device path runs against Apple's Neural Engine, which is only present on M1, M2, M3, and M4 Macs. Intel Macs continue to use Cloud mode without change.

Is on-device mode as accurate as Cloud?

Close, with one structural difference: Cloud uses a state-of-the-art commercial transcription model; on-device uses Whisper Large v3, the best open-source model we could fit into a ~626 MB download. Cloud also runs vocabulary biasing against your personal dictionary today; on-device does not, because of a WhisperKit library bug we're working around. We'll restore on-device biasing when it's fixed upstream. For everyday English dictation without uncommon proper nouns, the two are comparable. For multilingual code-switching or content with brand names and technical terms, Cloud is still ahead.

How do I turn on on-device mode?

Open Halopen Settings (⌘,), pick the Transcription section, switch the mode to On your Mac, and accept the one-time model download. Switching modes takes effect on your next dictation.

Why Halopen runs on your Mac, not in the cloud

Halopen’s On-your-Mac mode runs Whisper Large v3 locally on Apple’s Neural Engine — on every M1, M2, M3, and M4 Mac — so audio is mel-spectrogrammed, encoded, and decoded without leaving the device. The model is a 626 MB one-time download cached at ~/Library/Application Support/Halopen/Models/. Halopen’s audit log surfaces every cloud call the app makes; in On-your-Mac mode, dictations produce no Transcription entry because no network call happens.

A dictation app that sends your voice to a server has a contract with you. The contract is short: we’ll capture it, transcribe it, give you the text, and discard the audio. Most reputable services hold that contract.

But a contract you can verify is worth more than a contract you have to trust.

That’s what on-device mode is. It’s not a different feature so much as a different ground truth. When Halopen’s transcription runs on your Mac, the question “did the audio leave the device” has a mechanical answer. You can read the audit log. You can pull your Ethernet cable. You can run Little Snitch. The answer doesn’t depend on us being honest. It depends on whether the network call ever happened, and you can see for yourself that it didn’t.

This post is about why that mode exists in Halopen now, what it actually is, what it costs you, and what it costs us.

The category problem

For most of the last two years, nearly every voice typing tool you’ve heard of has been built around a single architectural assumption: the transcription happens in a data center.

There are good reasons for this. The best speech-to-text models are large: multi-gigabyte, multi-billion-parameter neural networks. Running them on a laptop in 2023 meant pegging the CPU, draining the battery, and waiting three seconds for the first word to appear. Putting them on fast GPUs in a server farm and shipping audio over the wire gave users a faster, cheaper experience. Every major commercial player picked that path.

The catch is that the architectural choice is also the privacy posture. If your transcription runs on someone else’s machine, your audio is on someone else’s machine — for as long as the transcription takes, at least. The promise that the audio is discarded after transcription is exactly that: a promise. It’s not something you can audit from your own computer.

For most use cases this is fine. People send much more sensitive material to cloud services every day. Your bank statements, your emails, your calendar — all of it. Voice dictation is not categorically different from those things. But “categorically not different from email” still isn’t the same as better. There’s a class of dictation — the things you’d write down before typing, the password you’re about to read out loud, the conversation you’re documenting from memory, the IDE you’re working in under NDA — where the difference between “we promise we deleted it” and “the audio physically never reached us” is the entire game.

What changed

Apple Silicon changed it. The Neural Engine inside every M-series chip is now fast enough to run open-source speech models, Whisper Large v3 in particular, at usable speed. On a Mac with an M4 Max chip we measured on-device transcription returning in under a second for typical short utterances, often faster than the Cloud round-trip when network latency is high. We have not yet measured the M1, M2, and M3 range ourselves; we expect the same shape, slower in absolute terms. Accuracy on clean English dictation is comparable to Cloud; for multilingual code-switching, Cloud is still ahead, which is why we ship both.

So it’s now possible — for the first time in the history of consumer voice typing — to ship a dictation app where the transcription happens entirely on your laptop and the response time is still good enough that you’d want to use it. That’s the change. Halopen is among the first apps to take advantage of it.

What on-device mode is, mechanically

When you switch Halopen’s transcription mode to On your Mac, three things happen:

The app downloads a ~626 MB model file the first time. The file lives under ~/Library/Application Support/Halopen/Models/. After that it’s cached locally, so later launches skip the download.
The model loads into your Mac’s Neural Engine. This takes about four seconds once per app launch; after that, transcription is fast.
Every fn-key hold from that point forward routes through the on-device path. Your audio is captured by the same AVAudioEngine that’s always running. It’s mel-spectrogrammed locally, encoded locally, decoded locally, and the text lands at your cursor — without a single network packet leaving your machine for the purpose of transcription.
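For readers curious about the “mel-spectrogrammed” step, the arithmetic below shows what the encoder receives for one window of audio. The constants come from OpenAI’s reference Whisper implementation, where Large v3 uses 128 mel bins. That Halopen’s WhisperKit build uses exactly the same front end is an assumption, and the helper names are illustrative.

```python
# Front-end constants from OpenAI's reference Whisper code (whisper/audio.py).
# Assumption: Halopen's WhisperKit build feeds the model the same shapes.
SAMPLE_RATE = 16_000    # audio is resampled to 16 kHz mono
N_FFT = 400             # 25 ms STFT window
HOP_LENGTH = 160        # 10 ms between spectrogram frames
CHUNK_SECONDS = 30      # the model always sees a 30-second window
N_MELS_LARGE_V3 = 128   # Large v3 uses 128 mel bins; earlier sizes use 80


def mel_input_shape(n_mels: int = N_MELS_LARGE_V3) -> tuple[int, int]:
    """(mel bins, frames) of the log-mel spectrogram for one 30 s window."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples after padding
    n_frames = n_samples // HOP_LENGTH        # 3,000 frames
    return (n_mels, n_frames)


def encoder_positions(n_frames: int) -> int:
    """The encoder's second convolution has stride 2, halving the time axis."""
    return n_frames // 2


def padding_samples(utterance_seconds: float) -> int:
    """Zero samples appended so a short utterance fills the 30 s window."""
    used = round(utterance_seconds * SAMPLE_RATE)
    return max(0, SAMPLE_RATE * CHUNK_SECONDS - used)


if __name__ == "__main__":
    bins, frames = mel_input_shape()
    print(bins, frames, encoder_positions(frames))  # 128 3000 1500
    print(padding_samples(2.0))                     # 448000
```

In the reference implementation, a two-second dictation is zero-padded to the full 30-second window, so the encoder input has the same shape however briefly you hold the key; only the decoding length varies.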

You can verify this is true. Halopen’s audit log records every cloud call the app makes. In Cloud mode, every dictation produces a log entry with the destination URL, the response time, and the bytes transferred. In on-device mode, your dictations produce no audit-log entries at all, because there’s nothing to log. The contract is a mechanical fact, not a promise.
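The check described above amounts to a filter over log entries. The sketch below assumes a hypothetical in-memory form of the audit log, with the three fields this post mentions plus a category name. Halopen’s actual storage format, field names, and export options are not described here, and the example URLs are placeholders.

```python
from dataclasses import dataclass


# Hypothetical shape of one audit-log entry. The fields mirror what this post
# says a Cloud-mode entry records; the real Halopen format may differ.
@dataclass(frozen=True)
class AuditEntry:
    category: str           # e.g. "Transcription", "License check", "Update check", "Sign-in"
    url: str                # destination URL of the cloud call
    response_ms: int        # response time in milliseconds
    bytes_transferred: int  # payload size of the call


def transcription_calls(entries: list[AuditEntry]) -> list[AuditEntry]:
    """Keep only entries for cloud transcription requests."""
    return [e for e in entries if e.category == "Transcription"]


def audio_left_device(entries: list[AuditEntry]) -> bool:
    """True if any transcription call was logged in the inspected window."""
    return bool(transcription_calls(entries))


# Example: an on-device session where only a license check went out.
on_device_session = [
    AuditEntry("License check", "https://example.com/license", 120, 2_048),
]
cloud_session = on_device_session + [
    AuditEntry("Transcription", "https://example.com/transcribe", 850, 96_000),
]

print(audio_left_device(on_device_session))  # False
print(audio_left_device(cloud_session))      # True
```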

There are still cloud calls the app makes — license checks, software updates, sign-in. These all appear in the audit log, marked clearly, and you can inspect each one. But your audio never leaves.

What it costs you

On-device mode isn’t free in every dimension. Five honest tradeoffs:

Disk space. The model file is real and stays on your drive. About 600 MB after decompression. If you run low on disk space or change your mind, you can remove the model from Settings → Model → Remove, and Halopen reverts to Cloud-only behavior. The model is downloaded again on demand if you ever switch back.

Apple Silicon. Intel Macs cannot run on-device mode at usable speed. Halopen detects your chip at launch and disables the toggle on Intel; Intel users continue to use Cloud mode without change. If you’re shopping for a new Mac and on-device transcription matters to you, any M-series chip should be enough. Newer chips such as the M4 and M4 Max should be notably faster; we have measured the M4 Max ourselves and expect the M1 to be slower but still usable.
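Halopen’s own chip check isn’t described here, and a native Mac app would typically do it in Swift. To illustrate the same gate from a script, the Python sketch below checks for an M-series Mac, including the case where an Intel build of Python runs under Rosetta 2, and then offers the modes this section describes. `available_modes` is a hypothetical helper, not Halopen code.

```python
import platform
import subprocess


def is_apple_silicon_mac() -> bool:
    """Best-effort check for an M-series Mac from Python."""
    if platform.system() != "Darwin":
        return False
    if platform.machine() == "arm64":
        return True
    # An x86_64 Python running under Rosetta 2 on Apple Silicon reports
    # "x86_64"; sysctl.proc_translated is "1" in that case. On Intel Macs
    # the key doesn't exist, so stdout is empty and this returns False.
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False
    return out.stdout.strip() == "1"


def available_modes(apple_silicon: bool) -> list[str]:
    """Hypothetical gate mirroring the text: Intel Macs only get Cloud."""
    return ["Cloud", "On your Mac"] if apple_silicon else ["Cloud"]


if __name__ == "__main__":
    print(available_modes(is_apple_silicon_mac()))
```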

Accuracy on a narrow set of utterances. For most everyday English dictation, Halopen’s on-device transcription is comparable to Cloud. Where Cloud has an edge today: (1) Vocabulary biasing against your personal dictionary. Cloud passes your saved terms to OpenAI’s transcription engine before the audio is processed, so proper nouns, brand names, and technical jargon land on the first pass. On-device cannot do this in v1.5.0 because of a bug in the WhisperKit library’s prompt-token handling for Whisper Large v3; we’ll restore on-device biasing when WhisperKit ships a fix. (2) Multilingual code-switching, such as mid-sentence Spanish-English or Hindi-English transitions. (3) Unusual acoustic conditions. If you frequently dictate proper nouns or brand names, or code-switch between languages, Cloud is still our recommendation. The Halopen team is working on closing these gaps; they aren’t closed in the v1.5.0 ship.
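To make the Cloud-side biasing concrete: OpenAI’s transcription endpoint accepts an optional `prompt` text field, and a personal dictionary can be flattened into one. The sketch below illustrates that idea under assumptions and is not Halopen’s code. `build_bias_prompt` is hypothetical, and the 800-character default is an arbitrary stand-in for the model’s real prompt limit, which is counted in tokens rather than characters.

```python
def build_bias_prompt(terms: list[str], max_chars: int = 800) -> str:
    """Join unique dictionary terms into a comma-separated prompt within a budget.

    Hypothetical helper: deduplicates case-insensitively (keeping the first
    spelling), skips blanks, and stops before a term would exceed max_chars.
    """
    seen: set[str] = set()
    kept: list[str] = []
    length = 0
    for term in terms:
        t = term.strip()
        if not t or t.lower() in seen:
            continue
        extra = len(t) + (2 if kept else 0)  # ", " separator after the first term
        if length + extra > max_chars:
            break
        seen.add(t.lower())
        kept.append(t)
        length += extra
    return ", ".join(kept)


if __name__ == "__main__":
    print(build_bias_prompt(["Halopen", "WhisperKit", "halopen", "  "]))
```

Dropping whole terms once the budget is reached, rather than cutting mid-word, keeps every term that does get sent intact.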

Battery. Running the Neural Engine has a power cost. We have not published a measurement yet. Until we do, the honest recommendation for users who dictate heavily on a MacBook Air running on battery is to stay on Cloud, or to switch back to Cloud when the battery is low. We’ll publish numbers as soon as we have them.

One-time download time. The ~626 MB model download takes 30-60 seconds on a fast home connection, a few minutes on cellular tethering. Plan accordingly.
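The range above follows from simple arithmetic: megabytes times eight gives megabits, divided by the link speed in megabits per second. Here is a minimal Python sketch, assuming decimal megabytes and an ideal link. Real downloads add protocol overhead, so treat the results as lower bounds; the link speeds in the example are illustrative, not measurements.

```python
def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: megabytes -> megabits, divided by link speed in Mbit/s.

    Ignores protocol overhead and server-side limits, so real downloads are slower.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    return size_mb * 8 / link_mbps


if __name__ == "__main__":
    for mbps in (100, 20):  # illustrative home broadband vs. tethering speeds
        print(f"{mbps} Mbit/s: {download_seconds(626, mbps):.0f} s")
```

At 100 Mbit/s the ~626 MB model takes about 50 seconds; at 20 Mbit/s, a bit over four minutes.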

What it costs us

It’s worth being honest about the business reality.

A cloud-only dictation product has an attractive economic shape. The infrastructure scales with usage, the user is locked into the network call, and the model improvements ship server-side without users having to update an app. A cloud-only product is also straightforwardly subscription-friendly — every dictation is a billable event that the company has visibility into.

An on-device product has none of those properties. Halopen has zero visibility into your on-device dictations. The model isn’t easy to update — we’d have to push a new app version. The infrastructure shifts from us to you (your disk, your CPU, your battery). And we still have to maintain the cloud path for users who prefer it and for the modes the local model doesn’t yet match.

We’re shipping on-device mode anyway because it’s the right thing for users who want it. We think a meaningful fraction of professional dictation users — lawyers, clinicians, engineers under NDA, journalists working on sensitive sources, founders dictating M&A correspondence, anyone who handles regulated data — will use it as their default, and the rest will use it occasionally for sensitive moments. We don’t need everyone to use it. We just want it available for the people for whom it matters.

This is, structurally, the kind of feature an indie team can ship and a venture-funded competitor can’t. A team optimizing for revenue per user needs every dictation to be metered server-side. A team optimizing for the right product for its users is happy when the metering disappears. That’s not a strategic critique of the category leaders. It’s a structural one. They built their cloud architecture before this was possible. Rewriting it now means undoing what made them successful.

We didn’t have that architecture to undo. So we built both paths from the start.

How to use it

Open Halopen. Press ⌘, to open Settings, or pick Settings from the menu bar. Pick the Transcription section. Switch the mode to On your Mac. Halopen will offer to download the model; accept it, wait a minute or so, and the toggle becomes live.

From that point your dictations happen on your Mac. The audit log will show no transcription entries; that’s the point. Your menu bar pill, your hotkey, your text injection: all unchanged. The product feels the same. The difference is what isn’t happening: your audio isn’t being sent anywhere.

Switching back to Cloud is the same gesture in reverse: pick Cloud in the Transcription section. The toggle is immediate; no download involved.

You can flip between the two modes for as long as it serves you. There’s no penalty either way. Use Cloud at the cafe when you’re sending email; use On your Mac when you’re documenting something you wouldn’t say out loud in a public space. The product treats both as first-class.

The privacy posture, all together

Halopen has always made these promises:

We never store your audio. In Cloud mode, audio is sent to a transcription service while you hold the key, transcribed, and discarded. We hold the contract.
We never capture your screen. Halopen does not have Screen Recording permission; it does not need it.
Halopen does not phone home unless it has a transactional reason — sign-in, license check, transcription, update check. Every cloud call appears in the audit log.

On-device mode adds one more line to that list, and it’s the one that’s mechanically verifiable rather than a promise to trust:

When the transcription mode is On your Mac, your audio never leaves your device. You can verify this from the audit log. You can verify it with Little Snitch. You can verify it by pulling your Ethernet cable. The dictation will work anyway.

That’s the thing we wanted to be able to say. That’s why on-device mode is in Halopen.

