How Rithmm's MLB Backtesting Actually Works

Published on May 14, 2026 by Sean Ramsey

A Question Worth Answering Properly

A question came up in the Rithmm community recently that deserves a real answer. The app shows the MLB models running against roughly 1,450 games, but the backtesting documentation references 3+ years of data. Those two things seem to conflict. Here's what's actually going on.

The Full Dataset Is About 3,000 Games

The total backtest pool for MLB is approximately 3,000 games. That pool gets split into two separate groups before any analysis begins. The first group is used exclusively to calibrate the recommendation windows, meaning the ranges the models use to identify where there's value in a given matchup. The second group is used exclusively to calculate the ROI and win rate numbers you see reported in the app.

The 1,450 figure is one side of that split, roughly half of the pool. It's the set used for performance measurement, not the full dataset.
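To make the split concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Rithmm's actual pipeline: the pool size is the approximate figure quoted above, and the function name, the random shuffle, and the even cut are all hypothetical choices.

```python
import random

POOL_SIZE = 3000  # approximate pool size quoted above; the real count may differ


def split_pool(game_ids, seed=0):
    """Shuffle the pool and cut it into two disjoint groups (hypothetical helper)."""
    ids = list(game_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


calibration, measurement = split_pool(range(POOL_SIZE))

# No game may appear in both groups.
assert not set(calibration) & set(measurement)
print(len(calibration), len(measurement))
```

Only the second group would feed the win rate and ROI figures, which is why the game count shown next to those figures is about half of the pool.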

Why the Split Matters

If you use the same data to both build a strategy and measure how well it performed, the results will almost always look good, because the strategy was tuned on those exact games. That's not a meaningful performance signal; it's circular reasoning dressed up as a track record.

Keeping the calibration set and the measurement set completely separate ensures that the ROI numbers reflect performance on games the models were never optimized against. That distinction is what makes the reported numbers trustworthy.
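A small simulation shows the effect. The sketch below is a toy setup, not Rithmm's models: every game gets a made-up "signal" that carries no information about the outcome, bets are flat one-unit wagers at even money (a simplification that ignores the sportsbook's margin), and the candidate windows are ten arbitrary buckets. The window is chosen on the calibration half, then scored on both halves.

```python
import random


def roi(games, window):
    """ROI of flat 1-unit, even-money bets on games whose signal falls in the window."""
    lo, hi = window
    profits = [1.0 if won else -1.0 for signal, won in games if lo <= signal < hi]
    return sum(profits) / len(profits) if profits else 0.0


rng = random.Random(7)

# Pure-noise games: the signal is unrelated to whether the bet wins.
games = [(rng.random(), rng.random() < 0.5) for _ in range(3000)]
calibration, measurement = games[:1500], games[1500:]

# Hypothetical candidate windows: ten equal-width buckets of the signal.
windows = [(i / 10, (i + 1) / 10) for i in range(10)]

# "Calibrate" by picking the window that looked best on the calibration half.
best = max(windows, key=lambda w: roi(calibration, w))

in_sample = roi(calibration, best)        # scored on the games used to choose it
out_of_sample = roi(measurement, best)    # scored on games it never saw

print(f"window={best} in-sample ROI={in_sample:+.3f} out-of-sample ROI={out_of_sample:+.3f}")
```

By construction, the in-sample number is the best of the ten candidates even though no real edge exists. The out-of-sample number has no such built-in advantage, which is why it is the honest one to report.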

Which Games Are in the Sample?

The backtest doesn't pull strictly from the most recent seasons. Rithmm samples across the last several years so that recent seasons are represented in both halves of the split. The goal is a dataset that reflects how the game plays today while still having enough historical depth to build reliable patterns. For most sports Rithmm covers, the backtest spans roughly three years. For MLB specifically, given the high volume of games per season, meaningful depth is reached with fewer full seasons.
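One way to keep every season represented in both halves is to split within each season rather than across the whole pool at once. The sketch below assumes that approach for illustration; the season labels, the 1,000 games per season, and the helper name are hypothetical, and the source does not say how Rithmm performs the sampling.

```python
import random
from collections import defaultdict


def split_by_season(games, seed=0):
    """Split each season's games in half so both groups cover every season."""
    by_season = defaultdict(list)
    for game_id, season in games:
        by_season[season].append(game_id)

    rng = random.Random(seed)
    calibration, measurement = [], []
    for season in sorted(by_season):
        ids = by_season[season]
        rng.shuffle(ids)
        mid = len(ids) // 2
        calibration += ids[:mid]
        measurement += ids[mid:]
    return calibration, measurement


# Hypothetical pool: 1,000 sampled games from each of three seasons.
games = [(f"{season}-{n}", season) for season in (2023, 2024, 2025) for n in range(1000)]
calibration, measurement = split_by_season(games)

# Game ids start with the season, so we can check coverage in each half.
seasons_in = lambda ids: {game_id.split("-")[0] for game_id in ids}
print(sorted(seasons_in(calibration)), sorted(seasons_in(measurement)))
```

A single shuffle over the whole pool would usually land close to the same balance, but splitting per season guarantees it.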

What This Means in Practice

When you see a pattern with a reported win rate and ROI in the app, those figures were measured on games that had no role in shaping the recommendation windows. The methodology is designed to give you an accurate picture of historical performance, not an optimistic one.
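For reference, here is how win rate and ROI could be computed over a set of measured bets. This is a generic sketch assuming flat one-unit stakes and American odds; the function names and the four sample bets are made up, and the source does not specify the staking or odds conventions behind the app's numbers.

```python
def profit_per_unit(american_odds, won):
    """Profit on a 1-unit stake at the given American odds."""
    if not won:
        return -1.0
    if american_odds > 0:
        return american_odds / 100   # e.g. +150 returns 1.5 units of profit
    return 100 / -american_odds      # e.g. -110 returns about 0.909 units of profit


def summarize(bets):
    """Return (win_rate, roi) for a list of (american_odds, won) bets."""
    n = len(bets)
    wins = sum(1 for _, won in bets if won)
    profit = sum(profit_per_unit(odds, won) for odds, won in bets)
    return wins / n, profit / n


# Made-up measurement-set bets, purely for illustration.
bets = [(-110, True), (-110, False), (150, True), (-120, False)]
win_rate, roi = summarize(bets)
print(f"win rate={win_rate:.1%} ROI={roi:.1%}")  # win rate=50.0% ROI=10.2%
```

Note that the two numbers can diverge: a 50% win rate is profitable here only because one of the wins came at plus-money odds.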
