The most quietly subversive moment in Offcall's second AI Residency webinar was not the polished vendor demo or the head-to-head comparison of frontier models. It was Dr. Michael Hobbs walking through, in real time, how he builds his own clinical decision support tools inside Claude, and how any clinician watching could do the same, that afternoon, without writing a line of code.
The overview article mentioned that Hobbs had a custom-built pediatric tool. What it did not capture is the method: the iterative loop of prompt, test, critique, refine, and save that turns a blank chat window into a personalized, reusable specialty consultant. That loop is genuinely teachable. Here it is.
This session is part of Offcall's AI Residency series. The previous session covered AI fundamentals. Sessions 3 and 4 cover cutting through the hype and vibe coding for clinicians.
The starting point is simpler than most clinicians expect. Hobbs's opening move was not a sophisticated prompt engineered over weeks. It was, essentially, a conversation. He told Claude that he was a pediatrician with some emergency medicine shifts coming up, that he wanted a clinical decision helper for that context, and that he wanted the output formatted in a way that worked for him.
Hobbs's framing of the right mindset was the most important part:
"I have no expectation that this is going to be perfect right out of the gate. And so we just see what it comes up with."
That expectation calibration matters. The clinicians who get stuck on AI tools are usually the ones expecting a finished product on the first try. The ones who get value are the ones treating the first output as a draft to argue with.
This is the move that changes the workflow entirely.
After Claude produced a starting prompt for the clinical decision tool, Hobbs did not sit down and manually test it against a dozen cases. He asked the model to do it. "Hey, test that for me." Claude generated a sample clinical case, ran it through the prompt, and showed what the output would look like.
This is functionally equivalent to having the model grade its own work. But because you can see both the test case and the response, you can grade it. The clinician's job becomes critique, not generation.
The critique loop in Hobbs's demo went like this: the model drafts a prompt, generates a sample case, and shows what the output would look like; the clinician states what is wrong or missing; the model revises the prompt; the revised prompt gets tested again.
That is the entire loop. State a preference. The prompt evolves.
Once the prompt produces output the clinician actually wants, the next move is to make it permanent. Inside Claude (and inside ChatGPT, with similar mechanics), Projects let you save a system prompt as a persistent context that applies to every conversation in that workspace.
The mechanics, as Hobbs walked through them, are straightforward: create a new project, paste the refined prompt in as the project's instructions, and give it a recognizable name.
The result is a tool that lives in your account indefinitely. Every new chat inside that project starts with the prompt already loaded. No re-pasting. No reconstructing context.
Walker described the same pattern from his own workflow: "I'll first say, hey, give me a summary of this paper. And then it'll do that, and I'll then ask the model, hey, can you take my prompt and make it like so much better, and you'll see it will give you a prompt that is way more thoughtful than you were."
That last move is worth highlighting. The model can improve its own instructions. A two-line prompt becomes a thirty-line prompt with explicit rules about uncertainty, formatting, sourcing, and tone, and the clinician did not have to write any of it.
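The "make my prompt better" move can be sketched in a few lines. This is an illustrative helper, not the exact wording Walker uses; the template text and function name are assumptions:

```python
# Sketch of the "improve my prompt" move described above. The template
# wording is illustrative, not a quote from the webinar.
IMPROVE_TEMPLATE = (
    "Take the following prompt and rewrite it to be much more thoughtful. "
    "Add explicit rules about uncertainty, formatting, sourcing, and tone.\n\n"
    "PROMPT:\n{prompt}"
)

def build_improvement_request(rough_prompt: str) -> str:
    """Wrap a clinician's rough prompt in a meta-prompt that asks the
    model to expand it into a fuller set of instructions."""
    return IMPROVE_TEMPLATE.format(prompt=rough_prompt)

request = build_improvement_request(
    "Summarize this paper for a practicing pediatrician. Keep it short."
)
```

The same pattern works typed directly into the chat window; the point is that the expansion step is itself delegated to the model.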
The single most important set of instructions to include, the ones that separate a useful tool from a dangerous one, is about uncertainty.
Hobbs was direct about this:
"Lots of people asking about hallucinations in the chat. This is where you can give it instructions. Don't fabricate things. If you don't know, say so."
A custom prompt should include explicit language along the lines of: do not fabricate facts, citations, or references; if you do not know, say so explicitly; distinguish established guidance from your own inference.
These instructions do not eliminate hallucinations entirely. They reduce the rate substantially and, more importantly, they shift the failure mode from confident wrongness toward visible uncertainty, which is the failure mode a clinician can actually catch.
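One way to keep those guardrails consistent is to append them to every saved prompt programmatically. A minimal sketch, assuming illustrative wording (this is not Hobbs's actual prompt text):

```python
# Illustrative uncertainty guardrails to fold into a saved system prompt.
# The exact rules and phrasing here are assumptions for demonstration.
GUARDRAILS = [
    "Do not fabricate facts, citations, or drug information.",
    "If you do not know, say so explicitly instead of guessing.",
    "Distinguish established guidance from your own inference.",
]

def with_guardrails(base_prompt: str) -> str:
    """Append uncertainty instructions to a clinical system prompt."""
    rules = "\n".join(f"- {rule}" for rule in GUARDRAILS)
    return base_prompt.rstrip() + "\n\nRules:\n" + rules

prompt = with_guardrails("You are a clinical decision helper for pediatrics.")
```

The design point is that the guardrails live in one place, so every tool built on top of them inherits the same failure mode: visible uncertainty rather than confident wrongness.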
The most sophisticated version of Hobbs's setup, the one that drew genuine reactions from the audience, was not a single specialty tool but a multi-consultant panel. He had configured Claude to simulate a panel of subspecialists who debate the case and arrive at a consensus.
"I actually even created a subspecialty panel who can argue with each other about what the diagnosis is. And this is one of my favorite tools."
The mechanism is the same as everything else in this workflow. The system prompt instructs Claude to adopt the perspectives of, say, a hematologist, an infectious disease specialist, and a general pediatrician, to surface what each would consider, where they would disagree, and what the strongest synthesis looks like.
This is not magic. It is structured prompting. And it produces a different kind of clinical aid than a single-voice response: one that surfaces alternative interpretations the clinician might otherwise miss.
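The panel itself is just a system prompt with more structure. A minimal sketch of how such a prompt could be assembled; the specialist roster and wording are illustrative, not Hobbs's actual configuration:

```python
# Sketch of the subspecialty-panel idea: a system prompt instructing the
# model to role-play several specialists who debate a case. The roster
# and phrasing are illustrative assumptions.
def panel_prompt(specialists: list[str]) -> str:
    """Build a system prompt that simulates a debating specialist panel."""
    roles = ", ".join(specialists)
    return (
        f"Simulate a panel of the following specialists: {roles}. "
        "For each case, have each specialist state what they would "
        "consider, surface where the panel disagrees, and end with the "
        "strongest consensus synthesis."
    )

prompt = panel_prompt(
    ["a hematologist", "an infectious disease specialist",
     "a general pediatrician"]
)
```

Pasted into a project's instructions, the same text works without any code at all; the function form just makes the structure of the trick explicit.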
None of this changes the rules about protected health information. A custom Claude project is still a third-party tool. Pasting unmodified clinical notes into it is the same risk it was before you saved it as a project.
The right way to use these tools clinically is with cases stripped of identifiers: chief complaint, age range, relevant history, lab values, and the clinical reasoning question, with no names, MRNs, dates of birth, or other direct identifiers. The tool's value is in the reasoning support. The patient's identity is irrelevant to that reasoning and should stay out.
The dominant narrative around clinical AI is enterprise: hospitals select platforms, IT departments roll them out, clinicians use what they are given. That narrative is partially true, and probably becoming more so.
But it misses the parallel track that Hobbs's demo represents. The tools to build personalized, specialty-specific clinical aids are now sitting inside a $20-a-month Claude or ChatGPT subscription. The skill required is not coding. It is the same skill clinicians use every day: stating clearly what you want, recognizing when you didn't get it, and iterating until you do.
The clinicians who develop that skill in the next year will have a meaningful advantage over the ones who wait to be assigned a tool. Not because the custom tool will replace clinical judgment, but because building it forces a clinician to articulate exactly how they reason, and that articulation, on its own, is worth the afternoon.
To see Dr. Hobbs build the tool live, including the iteration loop and the subspecialty panel demo, watch the complete webinar here: