Most of the AI conversation in healthcare is happening at 30,000 feet: about platforms, contracts, and enterprise rollouts. Meanwhile, a quieter shift is underway on individual clinicians' laptops. Models small enough to run entirely on-device, without ever sending a byte of audio or text to a cloud server, are becoming genuinely useful. For physicians worried about HIPAA, two-party consent laws, and the general unease of typing patient details into a browser tab owned by someone else, this matters more than the marketing cycle suggests.
In Offcall's second AI Residency webinar, Dr. Graham Walker and Dr. Michael Hobbs spent a meaningful chunk of time on local models, not as a theoretical concept but as something both of them use every day. The technical detail in that segment got compressed in the overview article. It deserves its own walkthrough.
This session is part of Offcall's AI Residency series. The previous session covered AI fundamentals. Sessions 3 and 4 cover cutting through the hype and vibe coding for clinicians.
The mental model most clinicians have for AI is a query that travels from their keyboard to a data center and back. That is how ChatGPT, Claude, and most ambient scribes work. As Walker explained during the session, the speed feels instant precisely because it is being processed across enormous infrastructure: "It's being all outsourced to, you know, millions of computers."
A local model inverts that. The model itself, a compressed version of the same family of large language models, sits on your hard drive. Your microphone audio, your typed prompts, your dictation, all of it stays on your machine.
The trade-offs are real: a model compressed enough to fit on a laptop is less capable than its full-size cloud sibling, and its speed is bounded by your own hardware rather than a data center's.
The trade-offs in your favor are also real, and for clinicians, they are substantial: privacy, no internet dependency, no per-token costs, and no third-party logging.
Walker's daily-use example was Spokenly, a free Mac dictation app he uses "easily 50 times a day." It runs Nvidia's Parakeet model locally and, in his side-by-side experience, beats Apple's built-in dictation on both speed and accuracy. Hobbs uses Whisper Flow for the same purpose.
Here is the part that matters clinically. Because the audio never leaves the machine, you can dictate things you would never feel comfortable speaking into a cloud-connected tool. Walker put it bluntly: "You can talk about your finances or say your bank password or whatever, because it's not going to the internet."
Translate that to clinical work and the implication is straightforward. Drafting a note, working through a differential out loud, dictating a letter to a referring physician, none of that requires a cloud round-trip if your dictation engine is local. The transcript becomes text on your screen, and what you do with it next is a decision you control.
The intimidating part of local models is the ecosystem around them. During the session, Hobbs and Walker named three resources clinicians should know.
For models specifically, Hobbs recommended starting with Google's Gemma family or Alibaba's Qwen family. Both are free, both come in multiple size variants, and both are recent enough to be genuinely capable.
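To make "running a model locally" concrete, here is a minimal sketch of querying one from Python. It assumes a local runner such as Ollama or LM Studio is already serving a model on your machine; both expose an OpenAI-compatible endpoint on localhost (Ollama's default port is 11434). The model name `gemma3` is an assumption standing in for whichever Gemma or Qwen variant you have pulled; the prompt never leaves your machine.

```python
import json
import urllib.request
import urllib.error

# Assumed: an Ollama-style local server on its default port.
# Nothing in this request travels beyond localhost.
LOCAL_URL = "http://localhost:11434/v1/chat/completions"

def ask_local_model(prompt: str, model: str = "gemma3"):
    """Send a prompt to a locally hosted model.

    Returns the model's reply as a string, or None if no local
    server is running (so the sketch degrades gracefully).
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        LOCAL_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError):
        return None  # no local model server found

reply = ask_local_model("Summarize this note in one sentence.")
print(reply if reply is not None else "No local model server running.")
```

The point of the sketch is architectural, not the specific library: the only network hop is to 127.0.0.1, which is what makes the privacy properties Walker and Hobbs described possible.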
To make the point that local models are not toys, Walker described an experiment from a few weeks earlier when he lost his voice before recording a podcast episode. The team used a free local text-to-speech model (Qwen 3.5 TTS) and eight seconds of his recorded voice to generate something "freakishly lifelike."
That is the level of capability now sitting on consumer hardware. Whether you find it exciting or unsettling depends on the application, but the underlying point is that "local" no longer means "underpowered."
One audience question during the webinar pointed at exactly the right issue. In California, a two-party consent state, Sutter Health is reportedly facing a lawsuit over alleged use of an ambient AI scribe without patient consent. The question raised in the chat: would a local model solve that?
Walker was careful not to give legal advice, and rightly so. But the architectural answer is interesting. A fully local ambient scribe, one where audio is captured, transcribed, and structured entirely on the device, with no cloud transmission at any point, sidesteps a meaningful chunk of the data-handling concerns that plague cloud-based scribes. It does not eliminate the consent question (recording is recording). But it shrinks the surface area of who has the data and where it lives.
Hobbs is already tinkering on this himself: "I've been working on building an AI scribe just to tinker with on my own and kind of refining that."
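The fully local scribe architecture described above can be sketched as three on-device stages. Everything below is a hypothetical stand-in, not Hobbs's implementation: each stub marks where a real local model (a Whisper or Parakeet build for transcription, a Gemma- or Qwen-class model for note drafting) would slot in. The structural point is that no stage opens a network connection.

```python
# Hypothetical three-stage pipeline for a fully local ambient scribe.
# Each function is a placeholder for an on-device component; the data
# flow (audio -> transcript -> structured note) never touches a server.

def capture_audio() -> bytes:
    # Stand-in for on-device microphone capture.
    return b"<raw audio bytes>"

def transcribe_locally(audio: bytes) -> str:
    # Stand-in for a local speech-to-text model
    # (e.g. a Whisper or Parakeet build running on this machine).
    return "Patient reports three days of cough, no fever."

def structure_note(transcript: str) -> dict:
    # Stand-in for a local LLM that drafts a structured note
    # from the raw transcript.
    return {"subjective": transcript, "assessment": "", "plan": ""}

note = structure_note(transcribe_locally(capture_audio()))
print(note["subjective"])
```

Swapping real models into the stubs changes the quality of the output, not the shape of the pipeline, and the shape is what shrinks the data-handling surface area.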
For clinicians who want to experiment, the lowest-risk path starts with local dictation: install a tool like Spokenly or Whisper Flow, use it for everyday non-clinical dictation until you trust it, then try a small local model like Gemma or Qwen for drafting tasks that never need to touch the cloud.
The clinicians who will get the most out of the next two years of AI development are not the ones chasing every new SaaS launch. They are the ones who understand the difference between a model that runs in someone else's data center and a model that runs on their own machine, and who know when each is the right call.
For the full discussion, including the live tool demos, watch the complete webinar.