Which Fitness Wearable Actually Improves Your Training?
By Dr David Bell, Specialist Anaesthetist (Retired), Software Engineer, and Founder of Align AI Fitness, NSW, Australia
The wearable fitness market is splitting into two distinct camps. On one side you have the general smartwatch (Apple Watch, Samsung Galaxy Watch), designed primarily as a smartphone companion with fitness features bolted on. On the other you have dedicated training and recovery monitors (Garmin, Whoop, Polar), built from the ground up around physiological data. The question worth asking is not which one has the best screen or the longest battery life. It is which one actually changes how you train.
Having spent years on both sides of this, as a clinician interpreting physiological data and as a software engineer building AI fitness systems, I have a particular interest in what these devices are actually measuring, how they derive their numbers, and whether any of it meaningfully improves outcomes. The answer is more nuanced than the marketing suggests.
How These Devices Actually Measure Your Body
Before comparing outputs, it is worth understanding what is going on under the hood. Mainstream wearables rely on one or both of two sensing methods for heart rate and related metrics.
Optical heart rate (PPG): Photoplethysmography uses LEDs (usually green, sometimes red/infrared) to shine light into the skin and measure changes in blood volume as the heart beats. This is how Apple Watch, Garmin, Whoop, and other wrist-worn devices measure your heart rate continuously. The accuracy is reasonable at rest and during steady-state exercise, but degrades significantly during high-intensity intervals, strength training, and any movement that introduces motion artefact. The wrist is a mechanically noisy site.
Electrical heart rate (ECG): A single-lead electrocardiogram measures the actual electrical signal of the heart. Apple Watch has this capability, though it requires deliberate activation (touching the crown) rather than continuous monitoring. Garmin's newer devices have a similar on-demand ECG feature. Single-lead ECG is useful for detecting atrial fibrillation but cannot replace a clinical 12-lead ECG for diagnostic purposes.
This distinction matters because every derived metric (VO2max estimates, HRV, training load, recovery scores) is built on top of the raw heart rate signal. Garbage in, garbage out.
VO2max: How It Is Calculated and How Much to Trust It
VO2max, the maximum rate at which your body can consume oxygen during intense exercise, is one of the most clinically meaningful fitness markers available. It correlates strongly with all-cause mortality and cardiovascular disease risk. So how does your watch estimate it?
Garmin uses the Firstbeat Analytics algorithm, which analyses the relationship between heart rate and speed during running (or power output during cycling). The algorithm models oxygen consumption from pace and heart rate data collected during normal training runs. A 2019 validation study found Garmin's VO2max estimates had a mean absolute percentage error of approximately 5% against laboratory testing, which is reasonable for a non-invasive field estimate. Accuracy improves with more training data and with calibration runs at different intensities.
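For readers unfamiliar with the error metric quoted above, here is a minimal sketch of how a mean absolute percentage error is computed. The function name and the three pairs of values are hypothetical, invented purely for illustration; they are not data from any validation study.

```python
def mean_absolute_percentage_error(lab_values, watch_values):
    """Return the MAPE (in percent) of watch estimates against lab measurements."""
    if len(lab_values) != len(watch_values) or not lab_values:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each error is expressed as a fraction of the lab (reference) value.
    errors = [abs(w - l) / l for l, w in zip(lab_values, watch_values)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical VO2max values in mL/kg/min (illustrative only).
lab = [40.0, 50.0, 60.0]
watch = [42.0, 47.5, 63.0]
mape = mean_absolute_percentage_error(lab, watch)
print(f"MAPE: {mape:.1f}%")
```

Each invented estimate here is off by 5% of its lab value, so the average error comes out at about 5%. Note that a 5% error is a larger absolute miss for a fitter athlete: roughly 2 mL/kg/min at a VO2max of 40, but 3 mL/kg/min at 60.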
Apple Watch uses a similar algorithmic approach for its Cardio Fitness score (which is its VO2max estimate), derived from outdoor walk and run data with heart rate. Apple's own validation data, published in peer-reviewed literature in 2021, showed estimates within one VO2max unit of lab values in many participants. The estimates are less accurate than Garmin's in the high-performance range, partly because the Apple Watch is less frequently worn during structured training.
Whoop does not estimate VO2max. This is actually a principled decision rather than a gap. Whoop's philosophy is recovery monitoring rather than fitness assessment, and VO2max estimation requires data from submaximal to near-maximal efforts that Whoop's passive monitoring approach does not reliably capture.
The practical implication: if tracking fitness progression matters to you, Garmin gives you the most validated VO2max estimate available outside a laboratory. Apple Watch is reasonable for general trend tracking. Whoop is not the right tool for this metric.
HRV: The Most Misunderstood Number on Your Wrist
Heart rate variability (HRV) is the variation in time between successive heartbeats. It is a proxy for autonomic nervous system tone, and by extension, a marker of recovery status, training adaptation, and sometimes illness. The concept is solid. The implementation in consumer wearables requires careful scrutiny.
True HRV (measured in clinical and research settings) is derived from continuous ECG with precise millisecond-level RR interval detection. The gold standard measurement is typically a 5-minute supine ECG recording taken under controlled conditions.
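To make "variation between successive heartbeats" concrete, the sketch below computes RMSSD (root mean square of successive differences), one widely used time-domain HRV statistic. The article does not specify which statistic any device reports, so the choice of RMSSD is an assumption, and the RR intervals are hypothetical values in milliseconds.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each beat-to-beat interval and the one before it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (illustrative only).
rr = [800, 810, 790, 805, 795]
print(f"RMSSD: {rmssd(rr):.1f} ms")
```

Because the statistic is built from differences of a few milliseconds, small timing errors in beat detection translate directly into error in the result. That is why the precision of the underlying signal matters so much.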
Garmin measures overnight HRV continuously using its optical sensor. The values are averaged across the night and presented as a 7-day rolling baseline. This approach reduces the noise inherent in single-reading PPG-based HRV, and the overnight average correlates reasonably well with morning ECG-based measurements in research settings. Accuracy is best during slow-wave sleep when heart rate is stable and motion artefact is absent.
Whoop takes a different approach, sampling what it calls HRV in a brief window of sleep (typically the last slow-wave sleep period before waking). Strictly speaking, Whoop measures pulse rate variability (PRV) from its optical sensor, not true HRV from ECG. PRV and HRV track each other reasonably well but are not identical, particularly at higher heart rates. Whoop's recovery score is derived primarily from this PRV measurement combined with sleep duration and resting heart rate. The sleep data is itself an estimate: a 2021 validation study against polysomnography showed Whoop's sleep staging accuracy was roughly 70%, comparable to other consumer wearables but well below clinical polysomnography.
Apple Watch added a sleep tracking HRV feature in watchOS 9. It takes spot measurements during sleep rather than continuous monitoring. The values are less stable than Garmin's overnight average because they represent shorter sampling windows.
The practical implication: HRV is genuinely useful for detecting non-functional overreaching and acute illness, but only when tracked as a trend against your own baseline. Single readings are almost meaningless. Garmin's overnight rolling average is the most practically useful implementation for regular training monitoring. Whoop provides a more polished recovery narrative but should be understood as PRV-based rather than true HRV.
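Tracking against your own baseline can be sketched as follows. This is an illustration of the general idea only, not any vendor's algorithm: the function name, the 7-night window, and the one-standard-deviation threshold are all assumptions chosen for the example, and the nightly values are hypothetical.

```python
from statistics import mean, stdev


def is_below_baseline(nightly_hrv_ms, baseline_nights=7, z_threshold=-1.0):
    """Compare the latest night with the mean of the preceding baseline nights.

    Returns None when there is too little history, otherwise True when the
    latest value sits more than |z_threshold| standard deviations below the
    personal baseline.
    """
    if len(nightly_hrv_ms) < baseline_nights + 1:
        return None  # not enough history to form a baseline
    baseline = nightly_hrv_ms[-(baseline_nights + 1):-1]
    latest = nightly_hrv_ms[-1]
    spread = stdev(baseline)
    if spread == 0:
        # Perfectly flat baseline: fall back to a direct comparison.
        return latest < mean(baseline)
    z_score = (latest - mean(baseline)) / spread
    return z_score < z_threshold


# Hypothetical overnight averages in ms; the last night drops sharply.
history = [60, 62, 58, 61, 59, 63, 60, 45]
print(is_below_baseline(history))
```

The point of the design is that the same reading of 45 ms would be unremarkable for someone whose baseline sits near 45 and a clear warning sign for someone whose baseline sits near 60.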
Where Each Device Actually Changes Behaviour
The measure of a training tool is not its feature list but its effect on outcomes. Based on the evidence and my own observations building AI fitness systems, here is an honest assessment of where each platform actually changes training behaviour.
Garmin is strongest for pacing and training load management. Its Performance Condition metric, which adjusts VO2max estimates in real time based on HR-pace relationships during a run, is genuinely useful for modulating effort when conditions are suboptimal (heat, fatigue, illness). Its Training Status feature (categorising training as productive, maintaining, detraining, or overreaching) uses a legitimate model of acute and chronic training load. Cyclists using Garmin with a power meter get the most value, as power is a direct measurement of work rather than a physiological estimate.
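The idea of weighing acute against chronic training load can be illustrated with a simple ratio of recent to longer-term average load. This is a generic sketch and not Garmin's proprietary model: the 7-day and 28-day windows, the function name, and the arbitrary load units are all assumptions for the example.

```python
def acute_chronic_ratio(daily_load, acute_days=7, chronic_days=28):
    """Ratio of mean daily load over the acute window to the chronic window.

    daily_load is a list of per-day training load values (arbitrary units),
    oldest first. Returns None if the chronic average is zero.
    """
    if len(daily_load) < chronic_days:
        raise ValueError("need at least chronic_days of history")
    acute = sum(daily_load[-acute_days:]) / acute_days
    chronic = sum(daily_load[-chronic_days:]) / chronic_days
    if chronic == 0:
        return None  # no training in the chronic window; ratio undefined
    return acute / chronic


# Hypothetical history: three steady weeks, then a 50% jump in the last week.
loads = [100] * 21 + [150] * 7
print(f"Acute:chronic ratio: {acute_chronic_ratio(loads):.2f}")
```

A ratio near 1 means the last week looks like the last month. A ratio well above 1 means recent load has risen sharply relative to what the body has been conditioned to, and a ratio well below 1 suggests a taper or detraining.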
Apple Watch is strongest for habit formation and consistency. The move/exercise/stand ring system is a remarkably effective behavioural nudge. For general population health goals, closing those rings reliably likely matters more than any other metric. The ECG feature adds genuine clinical value for detecting paroxysmal atrial fibrillation, which is often asymptomatic and carries significant stroke risk.
Whoop is strongest for recovery monitoring and sleep behaviour change. Its sleep coaching features and the framing of training through a recovery lens have demonstrably changed how some athletes approach rest. The main limitation is that Whoop provides no workout GPS, no pacing, and no real-time coaching. It is a passive monitor, not a training tool.
A Framework for Choosing
Rather than recommending a single device, the more useful question is: what do you most need to change?
If your main problem is inconsistency (you do not exercise regularly), the Apple Watch, with its ring system, is probably the best behavioural intervention available in wearable form.
If your main problem is overtraining or poor recovery (you train hard but do not improve, or you are frequently ill or injured), Whoop or Garmin's recovery features are worth the investment. The overnight HRV and sleep data can surface patterns you would not otherwise detect.
If your main problem is suboptimal training intensity (you do not push hard enough, or you pace incorrectly), Garmin combined with a chest strap HR monitor (more accurate than optical during high-intensity work) and ideally a power meter for cycling will give you the most precise feedback.
If you are managing a health condition alongside training, particularly cardiovascular risk, the Apple Watch's ECG and fall detection features add clinical value that the training-focused devices lack.
What None of Them Can Do
No consumer wearable currently provides accurate real-time lactate threshold detection (which requires blood testing), continuous glucose monitoring integrated with training load (though this is coming), or clinically valid VO2max measurement (which requires a metabolic cart). The best devices are getting closer, but it is important to understand which numbers they are estimating and which they are measuring directly.
The other limitation worth naming: data without interpretation has limited value. A daily HRV readout tells you very little unless you understand what it means in the context of your recent training, sleep, stress, and health. This is precisely the problem AI coaching platforms are being built to solve. Whether the current generation of AI coaches (including Apple's reported Health+ service) can provide genuine clinical-grade personalisation remains to be seen. My view, having built in this space, is that the data quality from consumer wearables is now sufficient for meaningful personalisation, but the interpretation models are still catching up.
Choose the device that fits the gap in your training. Then use the data it gives you.
Dr David Bell is a specialist anaesthetist (retired), software engineer, and founder of Align AI Fitness, based in NSW, Australia.