Reading the Chart with a Public LLM, Safely

A patient chart with PHI fields masked before being sent to a public LLM, then re-identified on return, with an audit log

Sector: Healthcare & pharma · multi-state network

Engagement: Privacy-preserving LLM access to EMR data

Timeline: 5 months to production

Scope: 2.3M records processed · 12 downstream use cases

Platform: PHI detection & masking · tokenization · public LLM · audit

Status: In production with a full audit trail

0 identifiable records exposed

64% less time on chart review

$2.1M saved vs a private build, yr 1

2.3M records processed safely

Everyone wanted the frontier model. No one could send it the chart. This multi-state network saw what large public LLMs could do with clinical text: summarize a sprawling chart, surface what matters, draft the first version of a dozen documents. The catch was non-negotiable. You cannot put protected health information into a public API and stay compliant, full stop.

Building a private model for every use case was expensive and slow. The network wanted frontier capability without frontier risk, and without a multi-million-dollar infrastructure project to get there.

The challenge

Could a public LLM read the chart and do real work without ever seeing a piece of information that could identify a patient? The answer had to satisfy not just the engineers but the compliance team, on demand, in writing.

The approach

We separated the medicine from the identity. Before anything reaches the model, a detection layer finds every piece of PHI in the record and replaces it with a consistent token. The masked text, still clinically complete, goes to the public LLM. When the answer comes back, the same tokens are swapped for the real values inside the network. The model does the reasoning; it never sees a name, a number, or a date that points to a person.
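The round trip described above can be sketched in a few lines. This is a minimal illustration, not the network's actual pipeline: the regex patterns, the `[KIND_n]` token format, and the function names `mask`, `unmask`, and `ask_public_llm` are all assumptions made for the example, and a real detection layer would need to cover far more than three identifier types.

```python
import re
from typing import Callable

# Illustrative patterns only (assumed for this sketch). A production detector
# would also need names, addresses, and identifiers buried in free text.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected identifier with a consistent token.

    Returns the masked text and a token -> original value mapping.
    The mapping is the part that must stay inside the network.
    """
    value_to_token: dict[str, str] = {}
    counters: dict[str, int] = {}

    def substitute(kind: str) -> Callable[[re.Match], str]:
        def _sub(match: re.Match) -> str:
            value = match.group(0)
            # Reuse the token if this exact value was already seen.
            if value not in value_to_token:
                counters[kind] = counters.get(kind, 0) + 1
                value_to_token[value] = f"[{kind}_{counters[kind]}]"
            return value_to_token[value]
        return _sub

    for kind, pattern in PHI_PATTERNS.items():
        text = pattern.sub(substitute(kind), text)

    token_to_value = {token: value for value, token in value_to_token.items()}
    return text, token_to_value


def unmask(text: str, token_to_value: dict[str, str]) -> str:
    """Swap tokens back for the real values after the answer returns."""
    for token, value in token_to_value.items():
        text = text.replace(token, value)
    return text


def ask_public_llm(note: str, llm: Callable[[str], str]) -> str:
    """Mask, call the model with masked text only, then re-identify."""
    masked, mapping = mask(note)
    answer = llm(masked)  # `llm` stands in for any public model client
    return unmask(answer, mapping)
```

With these patterns, a note that mentions the same date twice gets the same `[DATE_1]` token both times, which is what lets the model keep reasoning about one consistent event without seeing the value.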

01. PHI detection that doesn't miss

A layer tuned to catch identifiers in messy clinical text, from the obvious fields to the ones buried in a free-text note.

02. Consistent tokenization

Each identifier becomes a stable placeholder, so the model can still reason about the same patient without ever knowing who that is.

03. Re-identification inside the walls

The mapping between tokens and real values never leaves the network. Answers are restored to full fidelity only after they return.

04. An audit trail for every request

Every call is logged and reviewable, so compliance can prove, on demand, that no identifiable data was ever transmitted.
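One way to make each call reviewable is to write a small record per request. The sketch below is an assumption about what such a record might hold, not the network's actual log schema: the field names and the helpers `audit_record` and `to_log_line` are hypothetical. It stores a hash of the outbound payload rather than the payload itself, and runs a last-line check that none of the identifiers detected in the source record appear in the text being sent.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(request_id: str, masked_text: str,
                 known_phi_values: list[str], use_case: str) -> dict:
    """Build one audit entry for an outbound call (hypothetical schema)."""
    # Any detected identifier still present in the outbound text is a leak.
    leaked = [value for value in known_phi_values if value in masked_text]
    return {
        "request_id": request_id,
        "use_case": use_case,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # A hash lets reviewers match a payload later without storing it here.
        "payload_sha256": hashlib.sha256(masked_text.encode("utf-8")).hexdigest(),
        "identifiers_masked": len(known_phi_values),
        # The leaked values themselves are deliberately not written to the log.
        "outbound_clean": not leaked,
    }


def to_log_line(record: dict) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(record, sort_keys=True)
```

In this sketch a caller would refuse to send any request whose record has `outbound_clean` set to false. Note that the check only covers identifiers the detector already found, so it guards against masking bugs, not against detection misses.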

The model read the medicine. It never met the patient.

FIG.02: Detect, tokenize, send masked text to the public LLM, then re-identify on return, with a full audit trail. Frontier capability on the chart, with no identifiable data ever leaving your control.

The outcome

The network got the capability it wanted at a fraction of the cost and time of a private build, saving $2.1M in year one and reaching production in 5 months. Chart review that used to eat hours now takes 64% less time, 2.3M records were processed safely, and compliance had a clean answer to the only question that mattered: nothing identifiable ever left.

You don't have to choose between the best model and the patient's privacy. Mask the identity, keep the medicine.

The same pipeline now sits in front of a dozen use cases, each inheriting the same guarantee. As stronger public models arrive, the network can adopt them without touching its privacy posture.

Want frontier AI on sensitive data?

Start with a focused 90-minute AI readiness assessment. It's a candid read on what you can do safely with public models, and what will actually reach production.

Take the assessment →

Frontier LLM capability on the chart, without exposing a single patient
