Your wearable tells you three different sleep stories

Last Tuesday night you slept 7 hours 12 minutes.

Open your Garmin in the morning. Sleep score 84. Solid night.

Open your Apple Watch. 38 minutes deep, 1h 22m REM, the rest light and awake. Total in bed 7h 38m.

Open your Whoop. Recovery 47%. Yellow. Sleep performance below your monthly average.

Three apps. One night. Three different sleep stories. None of them are technically wrong. None of them agree.

This is the part of wearable ownership that almost nobody talks about, because admitting it makes the whole category sound less reliable than advertised.

Why They Disagree

Every wearable has its own definition of deep sleep.

Garmin uses a movement-and-heart-rate model with a specific threshold for what counts as deep. Apple’s algorithm has been retuned twice since launch and pulls different signal weights. Whoop has more aggressive boundaries for stage classification and is biased toward classifying borderline minutes as light rather than deep.

So when you see 38 minutes of deep sleep on the Apple Watch and 52 minutes on the Garmin for the same night, neither device is lying. They are answering different questions with the same input.

There is also the total problem. Some devices count time in bed. Some count time asleep. Some count time from sleep onset to final wake. The differences are usually 20 to 40 minutes, but they compound when you compare a sleep score that is calculated from a total with one calculated from a stage breakdown.
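To make the three definitions concrete, here is a minimal sketch that computes all of them from one stage timeline. The Segment type, the stage names, and the timeline itself are invented for illustration; the segments were chosen so the totals line up with the 7h 12m asleep and 7h 38m in bed from the opening example. No vendor is known to store a night this way.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    stage: str    # "awake", "light", "deep", or "rem" (hypothetical labels)
    minutes: int


def totals(night: list[Segment]) -> dict[str, int]:
    """Return three common 'total' definitions for one night, in minutes.

    Assumes the night contains at least one non-awake segment.
    """
    asleep_idx = [i for i, s in enumerate(night) if s.stage != "awake"]
    first, last = asleep_idx[0], asleep_idx[-1]
    return {
        # Everything between getting into bed and getting out of it.
        "in_bed": sum(s.minutes for s in night),
        # Only the minutes classified as a sleep stage.
        "asleep": sum(night[i].minutes for i in asleep_idx),
        # Sleep onset to final wake: mid-night awakenings still count.
        "onset_to_wake": sum(s.minutes for s in night[first:last + 1]),
    }


# Invented timeline for illustration only.
night = [
    Segment("awake", 12),
    Segment("light", 150),
    Segment("deep", 38),
    Segment("awake", 6),
    Segment("rem", 82),
    Segment("light", 162),
    Segment("awake", 8),
]

print(totals(night))
# {'in_bed': 458, 'asleep': 432, 'onset_to_wake': 438}
```

Same timeline, three defensible totals: 7h 38m, 7h 12m, and 7h 18m. A score built on one of them will not match a score built on another.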

And then the normalisation. Whoop compares your night to your own 30-day baseline. Garmin compares it to a population average for your demographic. Apple does not explicitly score the night at all; it just shows you the breakdown and lets you decide. Three different anchor points.
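The effect of the anchor point can be sketched in a few lines. This uses a plain z-score as a stand-in; it is not Whoop's or Garmin's actual formula, and every number below is invented. The only point is that one reading can land below one reference and above another.

```python
from statistics import mean, stdev


def z_score(value: float, reference: list[float]) -> float:
    """How many standard deviations `value` sits from the reference mean."""
    return (value - mean(reference)) / stdev(reference)


tonight_asleep_min = 432  # 7h 12m

# Hypothetical personal baseline: an athlete who usually sleeps close to 8 hours.
# (Shortened to 7 nights here; a 30-day window works the same way.)
personal_baseline = [470, 465, 480, 455, 475, 460, 485]

# Hypothetical population sample for the same demographic.
population_sample = [400, 420, 390, 430, 410, 405, 415]

vs_self = z_score(tonight_asleep_min, personal_baseline)
vs_population = z_score(tonight_asleep_min, population_sample)

print(f"vs own baseline: {vs_self:+.2f}")        # negative: a short night for this athlete
print(f"vs population:   {vs_population:+.2f}")  # positive: a long night for the average person
```

The night did not change between the two lines of output. Only the comparison did.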

The score is not the night. The score is the night, filtered through that vendor’s particular set of assumptions about what a good night looks like, for someone they think looks like you.

Why You Stop Noticing

Athletes stop checking after a while.

You wake up. You feel tired. You check the apps. They all say something slightly different. You pick whichever one most matches how you feel and you trust that one for the day.

This is a sensible defence mechanism. It is also exactly the wrong thing to be doing.

The whole point of having data is to surface things you would not have noticed on feel. If you only trust the score that confirms your felt sense, you have spent two hundred dollars on a watch to tell you what you already knew.

The interesting question is not which app got the night right.

The interesting question is which signal is causing them to disagree.

Which Signal Is the Swing Factor

If Garmin shows 84 and Whoop shows 47% on the same night, the disagreement is almost always about one or two underlying numbers.

Usually it is one of these:

Heart rate during sleep. Your average sleep HR drives recovery scoring more than any other single signal. If one device reads your sleeping HR five beats higher than another, that device thinks you are stressed and the other does not. Look at the raw HR-during-sleep chart in each app. The one with the higher number is the one giving you the worse score. Decide which device’s optical sensor you actually trust on your wrist.

HRV. Same story. Whoop is famously HRV-heavy in its recovery score; if your HRV came in below your baseline, your recovery score will be yellow even with full sleep duration. Garmin weights HRV less. Apple does not score it into a single number at all. The result is that one bad HRV reading can move Whoop’s number significantly while leaving Garmin’s untouched.

Total time asleep. Some devices are aggressive about marking the wake at the end. If you lie in bed for 25 minutes scrolling your phone before getting up, one device counts that as light sleep and one counts it as awake. Twenty-five minutes is enough to swing a score.

You are looking for the underlying number that is different across the apps, not for the surface score.
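One rough way to do that comparison by hand is to line up the three raw numbers from each app and see which one differs most in relative terms. The sketch below assumes you have copied the values out yourself; the device names, signal keys, and readings are all hypothetical, and relative spread is a crude measure chosen for simplicity, not how any vendor or P247 ranks signals.

```python
def swing_factor(readings: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Return the signal with the largest relative spread across devices.

    `readings` maps device name -> {signal name: value}. Only signals
    reported by every device are compared. Spread is (max - min) / mean.
    """
    shared = sorted(set.intersection(*(set(r) for r in readings.values())))
    spreads = {}
    for signal in shared:
        values = [r[signal] for r in readings.values()]
        spreads[signal] = (max(values) - min(values)) / (sum(values) / len(values))
    top = max(spreads, key=spreads.get)
    return top, spreads[top]


# Invented readings for one night, for illustration only.
readings = {
    "device_a": {"sleep_hr_bpm": 52, "hrv_ms": 61, "asleep_min": 432},
    "device_b": {"sleep_hr_bpm": 57, "hrv_ms": 58, "asleep_min": 440},
}

signal, spread = swing_factor(readings)
print(signal, round(spread, 3))
# sleep_hr_bpm 0.092
```

Here the two devices roughly agree on HRV and time asleep but sit five beats apart on sleep HR, so that is the number to go and look at. A more careful version would also account for how heavily each app weights each signal.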

What We Did

Inside P247 we had our own version of this problem.

The same night was being narrated in three places on the dashboard. The sleep coaching list. The sleep detail screen. The recovery score breakdown. Each one was re-deriving which sleep signals to surface from the raw data, and each one made slightly different choices about wording, ordering, and which signals to suppress.

A low-daylight signal would appear in the coaching block but not in the detail view. An aggressive sleep efficiency number would show up in one screen with a warning tone and in another with a neutral one. The same night, three different stories. The same problem the wrist-worn industry has, replicated inside one app.

We fixed it by building a single canonical list of sleep signals. The list is computed once per night. Every screen renders from it. A signal can still be hidden on one surface for layout reasons. It cannot be worded differently or appear differently across surfaces, because they are all pointing at the same list.
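The shape of that fix can be sketched as follows. This is an illustration of the pattern, not P247's actual code: the SleepSignal type, the thresholds, the field names, and the surface names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepSignal:
    key: str
    label: str   # wording is fixed here, once
    tone: str    # "warning" or "neutral", also fixed here


def compute_signals(night: dict[str, float]) -> tuple[SleepSignal, ...]:
    """Derive the canonical list once per night. Thresholds are invented."""
    signals = []
    if night["daylight_min"] < 30:
        signals.append(SleepSignal("low_daylight", "Low daylight exposure", "warning"))
    if night["sleep_efficiency"] < 0.85:
        signals.append(SleepSignal("low_efficiency", "Sleep efficiency below target", "warning"))
    return tuple(signals)


def render(signals: tuple[SleepSignal, ...], hidden: frozenset[str] = frozenset()) -> list[str]:
    """A surface may hide a signal for layout reasons, but cannot reword or re-tone it."""
    return [f"[{s.tone}] {s.label}" for s in signals if s.key not in hidden]


# Hypothetical night and surfaces.
signals = compute_signals({"daylight_min": 12, "sleep_efficiency": 0.81})
coaching_list = render(signals)
detail_screen = render(signals, hidden=frozenset({"low_daylight"}))

print(coaching_list)
print(detail_screen)
```

Because the label and tone live on the frozen signal rather than in each screen, the only freedom a surface has is to leave something out. Whatever it does show is guaranteed to match every other surface.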

We cannot fix this across vendors. Garmin and Whoop are not going to agree on the deep-sleep boundary. But we can pull all of an athlete’s wearable data together and tell them which underlying signal is the swing factor when the apps disagree.

What to Actually Do

When two of your apps tell you different things about the same night, the right move is not to pick the one you trust.

The right move is to figure out which underlying signal is causing them to disagree.

Sleep HR. HRV. Time asleep. One of those three is moving differently between the devices, and that is the thing worth thinking about. Either one of your devices is reading it inaccurately, in which case you should know that, or both are reading it accurately and the disagreement is about how each app weights the signal, in which case the signal itself is the story.

The score is a summary.

The signal underneath the summary is the data.

P247 reads across all your wearable data and tells you what the swing factor was each night. One narrative for the night, not three. Built so that when the apps disagree, the question gets clearer instead of muddier.

X Thread

1/ Garmin says 84. Apple shows 38m deep, 1h 22m REM. Whoop says recovery 47%. Same night. Three different stories.

2/ None of them are wrong. They are answering different questions with the same input. Different stage boundaries. Different baselines. Different normalisation.

3/ The score is not the night. The score is the night filtered through that vendor’s assumptions about what good looks like for someone they think looks like you.

4/ The interesting question is not which app got it right. It is which underlying signal - sleep HR, HRV, time asleep - is causing them to disagree.

5/ P247 reads across your wearable data and tells you the swing factor each night. One narrative, not three.