Agetech
Fortune 500 insurer (innovation team) · 2026 · UX & service design lead

The prototype that proved the brief wrong

A working AI service prototype that replaced a sequential intake interview with eight task-shaped tools — and made a funded beta defensible.

The case in 7 moves

  1. Making the concept testable
  2. Designing what the AI asks
  3. Why v1 failed seven caregivers
  4. Rebuilding v2 around tasks
  5. What the executive proved
  6. What we got wrong
  7. The model that stuck

12 caregivers tested
v2 model · production specification
Beta · greenlit and funded

An innovation team had spent eight months developing an AI caregiving service. The concept had legs. But a concept is not a service, and positive reactions to a concept are not signal from real users. The prototype exposed a design assumption so fundamental that fixing it required rebuilding the interaction model from scratch, mid-study, after seven sessions. The rebuilt version converted the executive holding the budget. The test of a prototype isn't always whether it proves the concept works. Sometimes the test is whether it finds the thing that doesn't.

Chapter 01

Coming in to make the concept testable

The team had spent eight months developing a concept for an AI-powered caregiving benefit for working family caregivers. Foundational research existed. Archetypes had been built. Concept testing had shown positive reactions. The question: was there enough signal to justify building the actual technology, running a two-month beta, committing real resources?

Static mockups can't tell you whether caregivers will trust an AI, engage with its tone, or find its questions useful rather than invasive. The prototype had to behave like the service.

I came in to design and build the prototype, test it with real caregivers, and synthesize what we learned.

Service blueprint — full caregiver arc
The service blueprint — designed to run on a custom GPT, not a wireframe. The prototype had to behave like the real thing.

Chapter 03

Why v1 failed seven caregivers, before being rebuilt

The original plan was twelve caregivers, one version of the prototype. Several sessions in, it was clear that plan wasn't going to work. The v1 system moved through six sequential phases: welcome, questionnaire, more questionnaire, forecast, guidance, return. By the time participants reached the forecast, they had given a great deal and received nothing.

One participant said it before the forecast even appeared: "I just gave you all that information." The disappointment was already in the room.

A second problem was structural. The GPT presented a numbered list of screening questions. The conversation continued. A second numbered list appeared further down. A participant answered "1" — responding to the second list — but the GPT interpreted it against the first and surfaced a condition that didn't apply. The conversational format had a vulnerability that sequential design couldn't fix.
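The misread is easy to reproduce in miniature. The sketch below is an illustration only, not the prototype's logic (the prototype was a custom GPT, and nothing in the study suggests it contained code like this). The names `NumberedList`, `resolve_first`, and `resolve_latest`, along with the option labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NumberedList:
    """A numbered list of options shown to the participant (hypothetical)."""
    label: str
    options: list[str]


def resolve_first(history: list[NumberedList], answer: str) -> str:
    """Resolve a bare number against the FIRST list shown (the observed misread)."""
    return history[0].options[int(answer) - 1]


def resolve_latest(history: list[NumberedList], answer: str) -> str:
    """Resolve a bare number against the most recent list shown."""
    return history[-1].options[int(answer) - 1]


# Placeholder labels; the real screening questions are not reproduced here.
history = [
    NumberedList("screening", ["condition A", "condition B"]),
    NumberedList("follow-up", ["option X", "option Y"]),
]

print(resolve_first(history, "1"))   # condition A (the wrong list)
print(resolve_latest(history, "1"))  # option X (what the participant meant)
```

In a free-form conversation nothing enforces either rule, so a bare "1" stays ambiguous whenever two numbered lists are on screen. Reordering the phases would not remove that ambiguity.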

The last cohort 1 session pointed at something deeper. A participant's mother was in the final stage of dementia. The system kept moving through the questionnaire: what kind of help was her mother getting, who else was involved, what had changed recently. Each question took her further from what she needed in that moment.

We aborted the session. v1 had no graceful path for a user already most of the way through a caregiving journey. The design had assumed a user looking forward.

Questionnaire tiers — three-tier architecture
v1 architecture — six sequential phases, value deferred to the end. The structure that failed seven caregivers before being redesigned.

Chapter 05

What the executive's session actually proved

The second cohort ran on v2. The difference was immediate. Participants leaned in. In almost every session, we had to prompt them to wrap up because they kept asking the AI more questions. Nobody asked what they were getting in exchange for sharing personal information. The value exchange resolved itself.

After the study, the executive with budget authority asked to try the rebuilt prototype. He used his own family's experience from several years earlier. Within minutes he stopped evaluating and started using it — moving through modes, asking for a forecast, asking follow-up questions about doctors, about situations beyond memory loss. We had to tell him we needed to wrap up. A recruited participant engaging deeply is signal. The person controlling the budget engaging with it as a real user is different evidence.

He also exposed the forecast's biggest remaining content problem: his family's situation had involved ambiguous early symptoms, and the forecast projected a five-year trajectory of progressive deterioration without acknowledging that ambiguity. I flagged it as a content problem the next phase has to solve at the model level.

Executive session — v2 in use
Cohort 2 sessions ran on v2: same study, redesigned prototype.

"This would have freaked us out at the time."

— Executive, on the forecast's missing acknowledgment of ambiguity

The reframe

What we got wrong about what caregivers needed

We built v1 assuming an AI caregiving service should work the way a thorough intake interview works: gather everything first, then deliver value at the end. Caregivers are not patients about to receive care; they are people already managing a crisis, carrying a daily cognitive and logistical load, perpetually short on time and emotional reserves. An onboarding that asks them to give before it gives back doesn't just underperform — it replicates the dynamic they're already exhausted by.

v2 fixed this at the model level, not the flow level. Re-sequencing v1 would have produced a less frustrating version of the same wrong thing.
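One way to picture the difference between re-sequencing and changing the model is the minimal sketch below. It is illustrative only. The v1 phase names come from this case study, but `VALUE_PHASES`, the `MODES` table, the mode names `forecast` and `doctor_questions`, and both functions are hypothetical stand-ins, not the production specification.

```python
# v1 phase names are from the case study; everything else is a hypothetical stand-in.
V1_PHASES = ["welcome", "questionnaire", "more questionnaire",
             "forecast", "guidance", "return"]
VALUE_PHASES = {"forecast", "guidance"}  # assumption: phases that give something back


def steps_before_value_v1(phases: list[str]) -> int:
    """Count the phases a caregiver sits through before the first output."""
    for i, phase in enumerate(phases):
        if phase in VALUE_PHASES:
            return i
    return len(phases)


# v2 sketch: each mode is entered directly and returns an artifact first.
MODES = {
    "forecast": lambda: "draft forecast",
    "doctor_questions": lambda: "questions to bring to the next appointment",
}


def run_mode_v2(name: str) -> str:
    """Dispatch straight to the requested mode; no intake gate in front of it."""
    return MODES[name]()


print(steps_before_value_v1(V1_PHASES))  # 3
print(run_mode_v2("forecast"))           # draft forecast
```

Reordering `V1_PHASES` can shrink the count, but the caregiver still walks a fixed path chosen by the system. In the modal sketch the caregiver picks the task, and the first thing returned is an artifact.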

What stays behind

How the v2 model became the production spec

Three things came out of this work that weren't in the original brief. The interaction model from v2 (modal, artifact-first, value before intake) is now a design principle for the service going forward. The mode definitions and example outputs are serving as the experience specification for the production AI system the client is building. And when the client suggested skipping user testing on the next version, the argument for testing had already been made.

The prototype made the beta defensible. A two-month beta required proof that caregivers would engage, that the interaction model would hold, and that the service concept was more than positive reactions to an idea. That proof came from building the real thing, discovering it was wrong, and rebuilding it before the study ended.

v2 mode definitions — handoff to production team
The v2 interaction model is now the experience spec for the production AI build.