The May forecast was locked on April 30, before the month began. May has now closed and the actuals are in. This is how the model did, where it missed, and what the system has already done about it.
The May forecast undershot badly. The model called 7,863 units / $57,203; actual was 14,531 units / $104,616. The forecast captured only 54% of units and 55% of revenue, leaving actuals 6,667 units and $47,413 above the median (p50) forecast. Actual did land inside the model's p10 to p90 band (1,552 to 24,527 units), but that band is so wide it was never decision-useful.
Units forecast (p50): 7,863 (vs actual 14,531; 54% capture)
Revenue forecast (p50): $57,203 (vs actual $104,616; 55% capture)
Units miss: +6,667 (actual ran 185% of forecast)
Forecast band (units): 1.6k–24.5k (actual at roughly p65–p70; band too wide)
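The headline figures follow from three simple ratios. A minimal Python sketch, using only the rounded numbers reported above; the function name `capture_and_miss` is illustrative and not part of the forecasting system:

```python
def capture_and_miss(forecast: float, actual: float) -> dict:
    """Return the capture ratio, the signed miss, and actual as a multiple of forecast."""
    return {
        "capture": forecast / actual,  # share of actual that the forecast covered
        "miss": actual - forecast,     # positive means the forecast was too low
        "ratio": actual / forecast,    # actual as a multiple of forecast
    }


units = capture_and_miss(7_863, 14_531)
revenue = capture_and_miss(57_203, 104_616)

print(f"units: capture {units['capture']:.0%}, actual ran {units['ratio']:.0%} of forecast")
print(f"revenue: capture {revenue['capture']:.0%}, miss ${revenue['miss']:,.0f}")
# Note: the rounded unit inputs give a miss of 6,668; the report's 6,667
# presumably reflects an unrounded forecast value (an assumption).
```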
Weekly pace: every week ran over forecast

| Week | Forecast | Actual | Ratio |
|---|---|---|---|
| 05/01–05/07 | 2,010 | 3,208 | 1.60x |
| 05/08–05/14 | 1,913 | 3,455 | 1.81x |
| 05/15–05/21 | 1,510 | 3,765 | 2.49x |
| 05/22–05/28 | 1,830 | 3,124 | 1.71x |
| 05/29–05/31 | 601 | 979 | 1.63x |
The miss was not a one-week spike. The under-bias is steady across the whole month, with the mid-month week (05/15 to 05/21) the worst at 2.49x.
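The weekly ratios can be reproduced directly from the forecast and actual columns. The sketch below is only a check on the reported numbers, with the data typed in by hand:

```python
# (week label, forecast units, actual units) as reported in the weekly pace table
weeks = [
    ("05/01-05/07", 2010, 3208),
    ("05/08-05/14", 1913, 3455),
    ("05/15-05/21", 1510, 3765),
    ("05/22-05/28", 1830, 3124),
    ("05/29-05/31", 601, 979),
]

# Ratio of actual to forecast; anything above 1.0 is an under-forecast week
ratios = {label: actual / forecast for label, forecast, actual in weeks}

worst_week = max(ratios, key=ratios.get)
every_week_over = all(r > 1.0 for r in ratios.values())

for label, r in ratios.items():
    print(f"{label}: {r:.2f}x")
print("worst week:", worst_week, "| every week over forecast:", every_week_over)
```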
Where the 6,667-unit miss came from
Under-forecast on tracked items: ~5,081 units
168 ASINs that the model forecast and that sold. It captured only 61% of their actual (7,852 forecast vs 12,933 actual). This is the bulk of the miss, and it is concentrated in the top sellers.
Cold-start blind spots: 1,598 units
52 ASINs that sold but had a forecast of exactly zero. These are new SKUs with no relaunch history, including the RAVE-205xx line. That is $13,746 of revenue the model never saw, and about 24% of the miss.
Phantom over-forecast: 12 units
57 ASINs had a forecast but did not sell, totaling just 12 units. Negligible. The model is not wasting forecast on dead inventory. The bias runs in one direction only: under.
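The three buckets above amount to classifying each ASIN by whether it had a forecast and whether it sold. A minimal sketch of that classification follows. The bucket names and functions are illustrative assumptions, and the sample rows are a handful of ASINs from the top-15 table, not the full catalog:

```python
from collections import defaultdict


def bucket(forecast: float, actual: float) -> str:
    """Classify one ASIN by forecast and sales presence."""
    if forecast > 0 and actual > 0:
        return "tracked"
    if forecast == 0 and actual > 0:
        return "cold_start"
    if forecast > 0 and actual == 0:
        return "phantom"
    return "inactive"


def decompose(rows):
    """Sum the signed miss (actual - forecast) per bucket."""
    totals = defaultdict(float)
    for _asin, forecast, actual in rows:
        totals[bucket(forecast, actual)] += actual - forecast
    return dict(totals)


# (ASIN, forecast units, actual units) for four rows of the top-15 table
sample = [
    ("B0C37Y1K1D", 1282, 3765),
    ("B0BF45FMDX", 1611, 2284),
    ("B0GKFL2TH9", 0, 1151),
    ("B0GKFLQNQS", 0, 140),
]
sample_miss = decompose(sample)
print(sample_miss)

# Reconcile the reported bucket totals; phantom over-forecast offsets the miss
reported = {"tracked": 5081, "cold_start": 1598, "phantom": -12}
total_miss = sum(reported.values())
print(total_miss, f"cold-start share {reported['cold_start'] / total_miss:.0%}")
```

The reported bucket totals sum to the 6,667-unit headline miss, which is a useful sanity check that no ASIN is counted twice.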
Top 15 ASINs by actual units

| ASIN | SKU | Actual | Forecast | Actual rev | Read |
|---|---|---|---|---|---|
| B0C37Y1K1D | MOM-10845-175 | 3,765 | 1,282 | $22,495 | under 66% |
| B0BF45FMDX | MOM-10263-075 | 2,284 | 1,611 | $15,308 | under 29% |
| B0GKFL2TH9 | RAVE-20523-150 | 1,151 | 0 | $9,159 | blind spot |
| B0756T5VV1 | MOM-10721-100 | 534 | 506 | $3,110 | on target |
| B07DWRRYG9 | MOM-10564-350 | 260 | 412 | $2,537 | over |
| B079T9PWXR | MOM-10073-210 | 254 | 150 | $1,951 | under |
| B07J35Z66B | RAVE-20058-080 | 246 | 329 | $1,713 | over |
| B0FQBBHS31 | MOM-10248-150 | 215 | 178 | $1,287 | on target |
| B0BF4MLNFB | MOM-10276-075 | 212 | 65 | $1,906 | under 69% |
| B077XPQKDV | MOM-10790-100 | 197 | 76 | $1,548 | under 61% |
| B0756V9SCL | MOM-10719-100 | 179 | 135 | $1,045 | under |
| B0C37Y7BL9 | MOM-10855-175 | 178 | 63 | $1,149 | under 65% |
| B01MV02GXC | MOM-020717-10408-225 | 147 | 81 | $1,321 | under |
| B0DC1CL2BS | MOM-10976-75 | 147 | 69 | $1,133 | under |
| B0GKFLQNQS | RAVE-20522-150 | 140 | 0 | $1,377 | blind spot |
One ASIN, B0C37Y1K1D (MOM-10845-175), drove the single biggest dollar miss: 3,765 sold against 1,282 forecast. Two RAVE items forecast to zero (B0GKFL2TH9 and B0GKFLQNQS) account for $10,536 of unforecast revenue between them.
Stock and ad spend were not in the model
The forecast was pure demand. It did not know about inventory or our own ad decisions, and that contaminated the comparison. The tote line (all RAVE SKUs) is the clearest example: 11 of 15 totes were out of stock the entire month (available 0 on all 21 May snapshots), and the one fast mover, B0GKFL2TH9, sold 1,151 units while we cut its ad spend roughly 82% (from $853 the week of Apr 27 to $154 the week of May 25) to protect low stock. So its recorded demand was deliberately suppressed.
Across the whole catalog, 81% of SKU-snapshots in May were out of stock (18,935 of 23,316). Those zero-sales days were teaching the model zero demand, and the accuracy score was penalizing the model for sales it was never allowed to make.
The reassuring part: stockouts did not cause the 46% May miss. On the active forecast set, out-of-stock days were only 128 of 3,195 scored days and were mostly under-bias anyway. The big miss was on in-stock items selling more than predicted. If anything, true May demand was higher than 14,531, because the out-of-stock totes had demand we could not fill.
Shipped today: the system is now stock-aware. (1) Scoring tags every day with its stock status (out of stock, low, ok, unknown) and reports a stock-fair accuracy that excludes days the SKU could not sell its forecast. (2) Training now uncensors out-of-stock days, imputing the trailing in-stock run-rate instead of a false zero, so the model learns real demand. On today's data this lifts the forward 245-day forecast by 3.4% (419 out-of-stock days across 39 SKUs). Both go live on the next nightly run. The uncensoring step intentionally leaves ad-cut days and days where stock was available untouched, so we never fabricate demand.
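The two stock-aware changes can be illustrated with a short Python sketch. This is not the shipped code: the 14-day window, the use of a simple trailing mean as the "run-rate", the `out_of_stock` tag string, and the choice of a weighted absolute percentage error are all assumptions made for illustration.

```python
from typing import Optional


def uncensor(units: list, in_stock: list, window: int = 14) -> list:
    """Replace out-of-stock days with the mean of the last `window` in-stock days.

    In-stock days are returned unchanged, even when sales were low.
    Out-of-stock days with no earlier in-stock history stay None (unknown).
    """
    history = []  # observed in-stock sales only; imputed values are never fed back
    result = []
    for sold, available in zip(units, in_stock):
        if available:
            history.append(sold)
            result.append(sold)
        elif history:
            recent = history[-window:]
            result.append(sum(recent) / len(recent))
        else:
            result.append(None)
    return result


def stock_fair_wape(forecast: list, actual: list, status: list) -> Optional[float]:
    """Weighted absolute percentage error over days not tagged out of stock."""
    kept = [(f, a) for f, a, s in zip(forecast, actual, status) if s != "out_of_stock"]
    total_actual = sum(a for _, a in kept)
    if total_actual == 0:
        return None  # nothing sellable was sold, so the error is undefined
    return sum(abs(a - f) for f, a in kept) / total_actual


daily_units = [4, 6, 5, 0, 0, 2]
daily_stock = [True, True, True, False, False, True]
print(uncensor(daily_units, daily_stock, window=3))
# [4, 6, 5, 5.0, 5.0, 2]
```

Keeping imputed values out of `history` means a long stockout cannot drift the run-rate; the imputation always anchors on sales that were actually observed.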
My read Woz
The baseline is the Chronos-Bolt-mini model, which validated on an April holdout at 32.6% weighted error with a known 13 to 15% portfolio under-bias. May came in at 46% under, roughly three times the validated bias. The reason is straightforward: April was not a seasonal peak and May is. These are party and event products (koozies, yard signs, suspenders, bow ties), and May stacks graduation, wedding season, Mother's Day, and Memorial Day on top of each other. The per-ASIN May seasonal multipliers were set too low.
There are two distinct failure modes, and they need different fixes:
1. Systematic under-bias on tracked items. The model knew these SKUs and still under-called them, worst mid-month. This is a seasonal-multiplier problem, and it is self-healing (see below).
2. Cold-start blind spots. Brand-new SKUs with no relaunch history forecast to exactly zero. The RAVE line is the example. This will not self-heal, because there is no history to learn from. It needs a structural fix.
The quantile band is directionally fine (actual landed in-band), but it is too wide to act on: p90 was about three times p50. A band that says "somewhere between 1,600 and 24,500 units" does not help anyone cut a PO.
Good news, already verified: the system self-calibrates on the 1st of each month. The June 1 run scored the May residuals and raised the May multiplier for 179 of 199 ASINs, average 0.894 up to 1.099 (a +23% lift). So the directional under-bias on tracked items is genuinely correcting itself. The blind-spot problem is the part that still needs a code change.
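The +23% figure is the relative change in the average multiplier. The report does not state the rule the monthly calibration uses, so the update function below is purely hypothetical: a damped multiplicative correction, shown only to make the idea of "scoring residuals and raising the multiplier" concrete.

```python
# Relative lift in the average May multiplier, from the reported averages
old_avg, new_avg = 0.894, 1.099
lift = new_avg / old_avg - 1
print(f"average multiplier lift: {lift:+.0%}")


def damped_update(multiplier: float, forecast: float, actual: float,
                  damping: float = 0.5) -> float:
    """Hypothetical recalibration: move the multiplier part of the way toward
    the observed actual/forecast ratio.

    damping=0 leaves the multiplier unchanged; damping=1 applies the full ratio.
    ASINs with no forecast or no sales are left alone, since their ratio is
    undefined or zero.
    """
    if forecast <= 0 or actual <= 0:
        return multiplier
    return multiplier * (actual / forecast) ** damping


# Illustrative numbers only: an item that sold 4x its forecast, half-damped
print(damped_update(1.0, forecast=100, actual=400, damping=0.5))
```

A damped rule like this corrects a persistent bias over a few cycles without overreacting to a single unusual month, which is one plausible reason a 1.85x actual-to-forecast ratio would translate into a smaller multiplier change.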
What I recommend
1. Add a cold-start floor. Brand-new SKUs with no history should inherit a category or sibling-ASIN level forecast instead of zero. This is the one failure the monthly calibration cannot fix on its own. It cost $13,746 in unforecast revenue in May alone.
2. Tighten the quantile band. A p10 to p90 spread of 1.6k to 24.5k is not actionable. Calibrate the band width per ASIN so it tracks real observed variance, not model uncertainty.
3. Validate on a seasonal peak month, not just April. The 13 to 15% validated bias gave false confidence. Hold out a known peak month so the reported accuracy reflects the months that matter for purchasing.
4. Re-score next month to confirm the +23% lift landed. The June calibration raised the May priors. The proof is whether the June and next-May forecasts come in tighter. I will track it.
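One way the cold-start floor in recommendation 1 could work is sketched below in Python. Everything about the mechanism is an assumption: the function name, grouping siblings by product line, and using the median sibling forecast as the floor. The example rows come from the top-15 table, and treating RAVE-20058-080 as a sibling of the two new RAVE items is itself an illustrative choice.

```python
from statistics import median


def with_cold_start_floor(forecasts: dict, has_history: dict, line_of: dict) -> dict:
    """Give ASINs without history at least the median forecast of their siblings.

    forecasts:   ASIN -> model forecast (units)
    has_history: ASIN -> True if the model had sales history to learn from
    line_of:     ASIN -> product line used to find siblings (hypothetical grouping)
    """
    # Collect forecasts of established ASINs per product line
    by_line = {}
    for asin, value in forecasts.items():
        if has_history[asin]:
            by_line.setdefault(line_of[asin], []).append(value)
    floors = {line: median(values) for line, values in by_line.items()}

    adjusted = {}
    for asin, value in forecasts.items():
        if has_history[asin]:
            adjusted[asin] = value  # established items are never changed
        else:
            # Lines with no established sibling fall back to the raw forecast
            adjusted[asin] = max(value, floors.get(line_of[asin], 0.0))
    return adjusted


forecasts = {"B07J35Z66B": 329, "B0GKFL2TH9": 0, "B0GKFLQNQS": 0}
has_history = {"B07J35Z66B": True, "B0GKFL2TH9": False, "B0GKFLQNQS": False}
line_of = {asin: "RAVE" for asin in forecasts}

print(with_cold_start_floor(forecasts, has_history, line_of))
```

A sibling median is a blunt floor, and a fast mover like B0GKFL2TH9 would still be under-called by it. The point of the sketch is only that a non-zero starting forecast gives the monthly calibration a value it can scale, which a forecast of exactly zero never does.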