What Apple's AI Health Coach Gets Right, and What It Will Probably Get Wrong
By Dr David Bell, Specialist Anaesthetist (Retired), Software Engineer, and Founder of Align AI Fitness, NSW, Australia
Apple's Health+ project was supposed to be the moment AI fitness coaching went mainstream: an AI health coach built into your Apple Watch and iPhone, trained on input from Apple's on-staff physicians, capable of analysing your sleep, heart rate, activity, and nutrition to deliver genuinely personalised guidance. It was ambitious and clinically informed, and, according to Bloomberg, it was recently scaled back after leadership changes, with Eddy Cue reportedly concluding it was not yet compelling enough to challenge Whoop and Oura.
As someone building exactly this kind of platform, I find the whole saga instructive. Apple identified the right problem. They had the right data. And they still could not get it across the line. That tells you something important about how hard clinical-grade AI coaching actually is.
What Apple Got Right
Credit where it is due: the original Health+ vision addressed the single biggest failure in consumer fitness technology. Most wearables collect enormous volumes of physiological data and then do almost nothing useful with it. Your Apple Watch knows your resting heart rate, your HRV, your sleep stages, your VO2 max estimate, and your daily movement patterns. It tells you to stand up and close your rings. The gap between data collection and actionable coaching is vast.
Health+ aimed to close that gap. The reported plan included an AI coach that would explain why your energy felt low on a given day, suggest lighter workouts after poor sleep, and flag signs of overtraining before you ran yourself into the ground. It would integrate meal logging with nutritional feedback, and even use the iPhone's rear camera to analyse workout form in real time.
These are the right problems to solve. If you track HRV and sleep but never adjust your training based on what the data shows, you are essentially wearing an expensive pedometer.
Where the Difficulty Lies
Having spent the past few years building AI coaching systems, I can tell you the engineering challenge is not in collecting the data or even in running inference on it. The hard part is bridging the gap between what a model can predict and what constitutes safe, useful advice for an individual human body.
Consider a seemingly simple scenario. A user's HRV drops 15% overnight. Their sleep score is below average. They have a high-intensity session scheduled. What should the AI recommend?
A naive system says: "You seem tired. Consider a rest day." A slightly better system cross-references the pattern with historical data and says: "Your HRV has been trending down for three days. Swap today's intervals for zone 2 work."
But a clinically informed system needs to consider whether that HRV drop is from overtraining, alcohol, dehydration, onset of illness, medication changes, menstrual cycle phase, or ambient temperature. Each of those has a different optimal response. Getting this wrong is not just unhelpful; it can reinforce poor training habits or, worse, mask early warning signs of genuine health problems.
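The difference between the three tiers of system is easier to see in code. Here is a minimal sketch of the branching a confounder-aware coach performs before recommending anything; all field names, thresholds, and messages are illustrative assumptions, not Apple's (or anyone's) actual logic:

```python
from dataclasses import dataclass

@dataclass
class DailyContext:
    """Inputs a confounder-aware coach would weigh (illustrative fields only)."""
    hrv_drop_pct: float               # overnight HRV decline vs. personal baseline
    sleep_score: float                # 0-100, device-reported
    reported_alcohol: bool = False
    reported_illness_symptoms: bool = False
    days_hrv_declining: int = 1

def recommend(ctx: DailyContext) -> str:
    # Naive rule: any big HRV drop -> rest day. A confounder-aware system
    # first branches on *why* the drop likely happened, because each cause
    # has a different optimal response.
    if ctx.reported_illness_symptoms:
        return "Skip training; monitor symptoms and reassess tomorrow."
    if ctx.reported_alcohol:
        # Alcohol suppresses HRV acutely; one low reading is not overtraining.
        return "Hydrate; keep today's session but cap intensity at zone 2."
    if ctx.hrv_drop_pct >= 15 and ctx.days_hrv_declining >= 3:
        return "Multi-day HRV decline: swap intervals for zone 2 recovery work."
    if ctx.hrv_drop_pct >= 15 and ctx.sleep_score < 60:
        return "Single poor night: proceed, but reduce volume by roughly 30%."
    return "No red flags: train as planned."
```

Even this toy version makes the point: the same 15% HRV drop routes to four different recommendations depending on context the sensor alone cannot supply.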
This is the core challenge Apple reportedly struggled with. Training an AI on physician knowledge does not make it a physician. Pattern recognition at scale is powerful, but human physiology is not a recommendation engine problem. It is a clinical reasoning problem, and those are fundamentally different.
The Data Quality Problem Nobody Talks About
There is a second, more technical issue that rarely gets coverage. Wearable sensor data is noisy. Heart rate monitors skip beats. Wrist-based optical sensors lose accuracy during movement. Sleep staging algorithms disagree with polysomnography (the clinical gold standard) roughly 20 to 30 percent of the time, depending on the device and the study.
When you build coaching logic on top of noisy inputs, errors compound. A misclassified sleep stage leads to a wrong recovery score, which triggers an inappropriate training recommendation. The user follows the advice, gets a poor workout, and loses trust in the system. I have seen this failure mode repeatedly in early versions of our own platform, and solving it required building explicit uncertainty handling into every layer of the pipeline, something most consumer apps skip entirely.
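What "uncertainty handling at every layer" means in practice is that each input carries its own reliability estimate, and the combined confidence gates whether a recommendation is issued at all. A minimal sketch, with invented thresholds and a deliberately simple fusion rule:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    value: float       # normalised recovery component, 0-1
    confidence: float  # 0-1, estimated sensor/algorithm reliability

def fuse(readings: list[Reading], min_confidence: float = 0.6):
    """Confidence-weighted fusion that abstains when inputs are too noisy.

    Returns (score, overall_confidence); score is None when the system
    should say "I don't know" instead of issuing advice.
    """
    total_weight = sum(r.confidence for r in readings)
    if total_weight == 0:
        return None, 0.0
    # Low-confidence readings (e.g. a dubious sleep-stage classification)
    # contribute less to the fused score instead of silently corrupting it.
    score = sum(r.value * r.confidence for r in readings) / total_weight
    overall = total_weight / len(readings)   # mean confidence of the inputs
    if overall < min_confidence:
        return None, overall                 # abstain rather than guess
    return score, overall
```

The design choice that matters here is the abstention path: a misclassified sleep stage degrades the fused score gracefully rather than propagating unchecked into a training recommendation.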
Apple has best-in-class hardware, but even the Apple Watch Ultra's sensors produce data that requires careful probabilistic interpretation before it can drive coaching decisions. The gap between "sensor reading" and "clinical insight" is where most AI fitness products fall apart.
What a Good AI Coach Actually Needs
From my experience building Align AI Fitness, there are three things an AI health coach needs to get right before it is worth using:
Contextual awareness beyond the sensor data. The system needs to know about your training history, your goals, your injury history, your life context. A 15% HRV drop means something different for a 25-year-old competitive cyclist than for a 55-year-old returning to exercise after a cardiac event. Most AI coaches treat all users as the same archetype with slightly different numbers.
Transparent reasoning. When I was practising anaesthesia, I would never administer a drug without being able to explain why, what I expected it to do, and what I would do if the patient responded unexpectedly. AI coaches should operate the same way. "Take a rest day" is not coaching. "Your HRV has declined 12% over three sessions while your training load increased 20%, which suggests incomplete recovery; here is a modified session that maintains your weekly volume at lower intensity" is coaching.
Graceful failure modes. The system needs to know what it does not know. If the data is ambiguous, the right answer is not to guess confidently. It is to say so, and to escalate when appropriate. This is where Apple's physician-trained approach had real potential, but also where the liability and regulatory questions become genuinely complex.
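The second and third requirements, transparent reasoning and graceful failure, can be sketched together. The function below either states its evidence alongside its recommendation or admits it lacks the data to advise; the signature, thresholds, and wording are hypothetical illustrations of the principle, not a real product's API:

```python
def coach_message(hrv_change_pct: float, load_change_pct: float,
                  data_complete: bool) -> str:
    """Emit a recommendation that shows its reasoning, or abstain honestly."""
    if not data_complete:
        # Graceful failure: say what the system does not know,
        # rather than guessing confidently from ambiguous inputs.
        return ("Not enough reliable data to adjust today's plan. "
                "Log how you feel, or sync your device and try again.")
    if hrv_change_pct <= -10 and load_change_pct >= 15:
        # Transparent reasoning: cite the evidence and the adjustment,
        # not just the directive.
        return (f"Your HRV declined {abs(hrv_change_pct):.0f}% while your "
                f"training load increased {load_change_pct:.0f}%, which "
                "suggests incomplete recovery. Here is a modified session "
                "that maintains your weekly volume at lower intensity.")
    return "Recovery markers look stable. Train as planned."
```

"Take a rest day" and the second branch above contain the same directive; only one of them earns the user's trust when it is occasionally wrong.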
Why This Matters Beyond Apple
Apple shelving Health+ does not mean AI coaching is a dead end. It means the bar is higher than most people assumed. Nike is building its own fitness LLM. Google is pushing Gemini deeper into the Fitbit ecosystem. The AI fitness app market is projected to exceed $24 billion by the end of this year. The race is very much on.
But the companies that will win are the ones that treat this as a clinical engineering problem, not a consumer feature. The hard work is not in the interface or the marketing. It is in the probabilistic reasoning, the uncertainty handling, the contextual awareness, and the willingness to say "I don't have enough information to advise you on this."
That is what I am trying to build. And that, I suspect, is what Apple discovered is harder than it looks.
Dr David Bell is a specialist anaesthetist (retired), software engineer, and founder of Align AI Fitness, based in NSW, Australia.