
Why your betting model lies to you in-sample

Every model looks profitable in backtest. The gap between backtest ROI and live ROI is where the whole industry loses money — and it's almost always the same three bugs.

The gap

You build a model. Backtest says +14% ROI over 4 seasons. You stake it live and after 300 bets you’re down 3%. This is not bad luck. It’s one of three things, in descending order of frequency:

  1. Lookahead leakage in your features.
  2. Closing-line bias in your training labels.
  3. Survivorship bias in your dataset.

1. Lookahead leakage

The classic form: you compute a team’s “form” as last-10-game rolling average, but your data source updated the stat after the game finished. When you train on that row, your model sees the future.

Subtler forms:

  • Injury reports that the public didn’t have at tip-off but your scraper picked up from a post-game recap.
  • Implied probability from a closing line you accidentally joined on instead of the opener.
  • Referee assignments published 48h in advance but backfilled into rows dated a week earlier.

The fix: every feature must carry a timestamp, and every training row must assert feature_timestamp < game_start - T_known_to_public, where T_known_to_public is the margin by which the information was publicly available before the game. If you can't prove that, the feature is contaminated.
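A minimal sketch of that assertion in pandas; the column names feature_timestamp and game_start, and the public_lead parameter, are illustrative rather than from any particular data source:

    import pandas as pd

    def assert_no_lookahead(rows: pd.DataFrame, public_lead: pd.Timedelta) -> pd.DataFrame:
        # Fail loudly on any row whose feature was not provably public before
        # tip-off. Assumes tz-aware datetime columns 'feature_timestamp' and
        # 'game_start'; public_lead is the margin T_known_to_public by which
        # the feature must predate game start.
        cutoff = rows["game_start"] - public_lead
        contaminated = rows["feature_timestamp"] >= cutoff
        if contaminated.any():
            raise ValueError(
                f"{contaminated.sum()} rows have features timestamped after "
                f"game_start - {public_lead}; the backtest is contaminated."
            )
        return rows

Run it at every feature join, not once at the end: the join that leaks is the one you forgot about.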

2. Closing-line bias

The sharpest available price is the closing line; beating it consistently is what closing-line value (CLV) measures. The close already incorporates most of your model's signal, which is exactly why it's sharp. If you backtest against opening prices, you assume you can always bet at the open. Live, you'll take the best price available at bet time, which is usually somewhere between open and close.

Calibrate against the price you'll actually get. For most syndicate operations that's roughly 30–60 minutes before tip-off, at a line 2–5% worse than the opener.
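One cheap way to model that in a backtest is to haircut every price toward the close before computing ROI. A sketch, assuming decimal odds; the flat 3% haircut is our stand-in for the 2–5% range above:

    def haircut_odds(decimal_odds: float, slippage: float = 0.03) -> float:
        # Shrink the profit component (odds - 1), not the stake, to mimic
        # betting at a price ~3% worse than the opener. Decimal odds of
        # 2.10 become roughly 2.067.
        return 1.0 + (decimal_odds - 1.0) * (1.0 - slippage)

If the backtest edge doesn't survive the haircut, it won't survive live.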

3. Survivorship bias

Datasets from Kaggle, football-data.co.uk, and most paid aggregators have a subtle filter: games that were played to completion. Abandoned matches, postponements that shifted odds, and injury replacements mid-game are silently dropped. Your model never learns to handle them because they don’t exist in training.

In live operation they’re 0.5–2% of your bet universe. That’s enough to eat a thin edge.
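You can at least measure the hole before it eats you: reconcile the league's full fixture list against your training set and count what's missing. A sketch; the two sets of fixture IDs are placeholders for whatever your pipeline produces:

    def survivorship_gap(scheduled: set[str], in_dataset: set[str]) -> float:
        # Fraction of scheduled fixtures that never made it into training
        # data. Anything above ~0.5% deserves a manual look: abandoned
        # matches and postponements cluster there.
        missing = scheduled - in_dataset
        return len(missing) / len(scheduled) if scheduled else 0.0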

The test that saves you

Before you go live, run walk-forward validation:

  • Train on 2020–2022.
  • Predict every game in 2023, in chronological order, refitting only on data available at prediction time.
  • Compare the walk-forward ROI to the IID cross-validation ROI.

If walk-forward ROI is 30%+ lower than cross-validation ROI, you have leakage. Find it before you stake real money.
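A minimal walk-forward harness, assuming a scikit-learn-style classifier and a DataFrame sorted by date; the column names and the refit-per-date cadence are assumptions, not a prescription:

    import pandas as pd

    def walk_forward(df: pd.DataFrame, model, feature_cols: list[str],
                     label_col: str, start: str) -> pd.Series:
        # Predict each game using only rows strictly before its date.
        # Refitting at every unique date is slow but leak-proof; batch
        # refits weekly if runtime matters.
        preds = {}
        for date in df.loc[df["date"] >= start, "date"].unique():
            train = df[df["date"] < date]  # strictly past data only
            model.fit(train[feature_cols], train[label_col])
            day = df[df["date"] == date]
            probs = model.predict_proba(day[feature_cols])[:, 1]
            preds.update(zip(day.index, probs))
        return pd.Series(preds).sort_index()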

The ugliest graph in sports modeling is the one where in-sample Sharpe is 2.1 and out-of-sample Sharpe is 0.3. That graph is your portfolio.

What we do

On every Edge engagement, before the model touches real capital it passes:

  • Walk-forward ROI within 15% of IID cross-val ROI.
  • CLV-adjusted expected value positive across three independent sportsbooks.
  • Simulated bankroll path: 1,000 Monte Carlo replays of the live season; we require 95% of paths to avoid a 20% peak-to-trough drawdown.

Three gates. Miss any one and the model doesn’t ship.
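The third gate is mechanical to implement. A sketch of the drawdown check with numpy, resampling per-bet returns with replacement; flat staking and the resampling scheme are simplifications for illustration:

    import numpy as np

    def drawdown_gate(bet_returns: np.ndarray, n_paths: int = 1000,
                      max_dd: float = 0.20, pass_rate: float = 0.95,
                      seed: int = 0) -> bool:
        # bet_returns: per-bet profit as a fraction of bankroll. Returns
        # True if >= pass_rate of resampled season paths never lose
        # max_dd of bankroll peak-to-trough.
        rng = np.random.default_rng(seed)
        ok = 0
        for _ in range(n_paths):
            path = np.cumprod(1.0 + rng.choice(bet_returns, size=bet_returns.size))
            peak = np.maximum.accumulate(path)
            if ((peak - path) / peak).max() < max_dd:
                ok += 1
        return ok / n_paths >= pass_rate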