Method
How these rankings are computed.
The engine implements the iterative maximum-likelihood method Roy Bethel published in 2005, built on the Bradley–Terry–Ford model for paired comparisons. A team's strength $s_t$ is the positive number that, together with every opponent's strength, best explains the observed wins and losses. Concretely, the probability that team A beats team B on a neutral site is
$$P(A \text{ beats } B) = \frac{s_A}{s_A + s_B}$$
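As a quick worked example with made-up strengths (not values from the site), take $s_A = 3$ and $s_B = 1$:

$$P(A \text{ beats } B) = \frac{3}{3 + 1} = 0.75, \qquad P(B \text{ beats } A) = \frac{1}{3 + 1} = 0.25$$

Multiplying both strengths by the same constant leaves these probabilities unchanged, so only ratios of strengths carry information. That is why a normalization step is needed to pin down the scale.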
Strengths are estimated by iteration. At each step, every team's strength is recomputed from its wins and the current strength estimates of its opponents; when the estimates stop changing (to within 1e-10), the algorithm terminates. Real-team strengths are then normalized so their geometric mean is 1.
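The iteration described above can be sketched as follows. This is a minimal illustration that assumes the classic Zermelo/Ford fixed-point update for the Bradley–Terry model. The page does not spell out the engine's exact update rule, and the function name `fit_strengths` and the `(winner, loser)` input format are hypothetical. The sketch leaves out the prior, so it requires every team to have at least one win and one loss.

```python
import math
from collections import defaultdict

def fit_strengths(games, tol=1e-10, max_iter=10_000):
    """Estimate strengths from a list of (winner, loser) pairs.

    Hypothetical sketch: no prior is applied, so every team needs
    at least one win and one loss for the estimate to stay finite.
    """
    wins = defaultdict(int)
    opponents = defaultdict(list)  # one entry per game played
    for winner, loser in games:
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    for team, opps in opponents.items():
        if wins[team] == 0 or wins[team] == len(opps):
            raise ValueError(f"{team} is winless or undefeated; a prior is needed")

    strengths = {team: 1.0 for team in opponents}
    for _ in range(max_iter):
        # Recompute every strength from wins and current opponent estimates.
        updated = {}
        for team, opps in opponents.items():
            denom = sum(1.0 / (strengths[team] + strengths[o]) for o in opps)
            updated[team] = wins[team] / denom

        # Rescale so the geometric mean of the strengths is 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        scale = math.exp(-log_mean)
        updated = {team: s * scale for team, s in updated.items()}

        # Stop once no estimate moved by more than the tolerance.
        delta = max(abs(updated[t] - strengths[t]) for t in updated)
        strengths = updated
        if delta < tol:
            break
    return strengths

ratings = fit_strengths([("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")])
```

For this four-game schedule the sketch settles at roughly 1.52 for A, 1.00 for C, and 0.66 for B. Rescaling on every sweep rather than once at the end is a design choice of the sketch, not something the page specifies.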
The Bayesian prior
Undefeated teams drive the pure maximum-likelihood estimate to infinity; winless teams drive it to zero. The engine applies a small Bayesian prior — one fictional win and one fictional loss for every team, against an anchor of strength 1 — which bounds both extremes without materially moving the interior of the ranking. Bethel discusses this adjustment in §8 of the paper.
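To see how the fictional games bound the extremes, here is a minimal sketch under the assumption that the anchor is a fixed opponent of strength 1 whose own rating is never updated. The names `ANCHOR` and `fit_with_prior` are hypothetical, and the final geometric-mean normalization is left out to keep the focus on the prior.

```python
from collections import defaultdict

ANCHOR = "__anchor__"  # hypothetical label for the fictional opponent

def fit_with_prior(games, tol=1e-10, max_iter=10_000):
    """Fit strengths from (winner, loser) pairs, adding one fictional win
    and one fictional loss per team against an anchor of strength 1."""
    wins = defaultdict(int)
    opponents = defaultdict(list)
    for winner, loser in games:
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    for team in list(opponents):
        wins[team] += 1                      # fictional win over the anchor
        opponents[team] += [ANCHOR, ANCHOR]  # the fictional win and loss

    strengths = {team: 1.0 for team in opponents}
    strengths[ANCHOR] = 1.0  # held fixed; never updated below
    for _ in range(max_iter):
        delta = 0.0
        for team, opps in opponents.items():
            denom = sum(1.0 / (strengths[team] + strengths[o]) for o in opps)
            new = wins[team] / denom
            delta = max(delta, abs(new - strengths[team]))
            strengths[team] = new
        if delta < tol:
            break

    del strengths[ANCHOR]
    return strengths

# A is undefeated and B is winless, yet both estimates stay finite and positive.
ratings = fit_with_prior([("A", "B"), ("A", "B")])
```

Without the two fictional games, A's estimate in this example would grow without bound and B's would shrink toward zero.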
Connectivity
The paper's only unsolved issue is graph connectivity: if two teams have no chain of opponents linking them, their relative strength is not defined by the data. The site flags this rather than hides it. On the Compare page, any pair of teams that never played directly shows the shortest chain of opponents linking them. When no such chain exists, that is, when the two teams sit in different connected components of the schedule graph, the comparison is refused rather than guessed.
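A chain lookup of this kind can be sketched with a breadth-first search over the schedule graph. This is an illustration only: the function name `shortest_chain` is hypothetical, and it assumes that any game, regardless of who won, links the two teams.

```python
from collections import defaultdict, deque

def shortest_chain(games, start, goal):
    """Return the shortest list of teams linking start to goal,
    or None when they sit in different components."""
    neighbours = defaultdict(set)
    for a, b in games:
        neighbours[a].add(b)
        neighbours[b].add(a)

    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        team = queue.popleft()
        if team == goal:
            # Walk the back-pointers to rebuild the chain.
            chain = []
            while team is not None:
                chain.append(team)
                team = previous[team]
            return chain[::-1]
        for opp in neighbours[team]:
            if opp not in previous:
                previous[opp] = team
                queue.append(opp)
    return None  # no chain: refuse the comparison

games = [("A", "B"), ("B", "C"), ("D", "E")]
print(shortest_chain(games, "A", "C"))  # ['A', 'B', 'C']
print(shortest_chain(games, "A", "D"))  # None
```

Breadth-first search visits teams in order of distance from the start, so the first chain it finds is a shortest one.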
Per-game contribution
For teams where it has been computed, the team page shows how each individual game moved that team's strength. The method is leave-one-out: the engine refits strengths with that one game removed from the graph and reports the difference from the full-schedule strength. A win against a strong opponent contributes more than a win against a weak one; a loss to a weak opponent hurts more than a loss to a strong one.
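The leave-one-out idea can be sketched as below. The `fit` function is a compact, hypothetical stand-in for the rating engine (a fixed number of sweeps, with the one-win, one-loss prior against an anchor of strength 1), and `game_contributions` is likewise an illustrative name. Normalization is omitted.

```python
from collections import defaultdict

def fit(games, sweeps=2000):
    """Compact stand-in for the rating engine (hypothetical)."""
    wins, opps = defaultdict(int), defaultdict(list)
    for winner, loser in games:
        wins[winner] += 1
        opps[winner].append(loser)
        opps[loser].append(winner)
    s = {team: 1.0 for team in opps}
    for _ in range(sweeps):
        for t in s:
            # Real games plus two fictional games against an anchor of strength 1.
            denom = sum(1.0 / (s[t] + s[o]) for o in opps[t]) + 2.0 / (s[t] + 1.0)
            s[t] = (wins[t] + 1) / denom  # +1 for the fictional win
    return s

def game_contributions(games, team):
    """For each game involving team, report the full-schedule strength
    minus the strength refit with that one game removed."""
    full = fit(games)[team]
    out = []
    for i, game in enumerate(games):
        if team in game:
            rest = games[:i] + games[i + 1:]
            # A team left with no games falls back to the prior-only strength 1.
            without = fit(rest).get(team, 1.0)
            out.append((game, full - without))
    return out

games = [("S", "W"), ("S", "W"), ("S", "W"), ("A", "S"), ("A", "W")]
for game, moved in game_contributions(games, "A"):
    print(game, round(moved, 3))
```

In this toy schedule, A's win over the strong team S shows a larger contribution than its win over the weak team W. Each removed game costs a full refit, which may be why the page says contributions are available only for teams where they have been computed.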
What this engine does not do
- It does not adjust for home-field advantage. The paper flags this as future work in §8; we respect that boundary.
- It does not weight recent games more heavily than early-season games.
- It does not cap margin of victory or use score differential at all. Like winning percentage, the model sees only who won and who lost, and Bethel's method is a winning-percentage-compatible estimator.
- It does not attempt to match any other ranking system's output. The validation target is predictive accuracy on held-out games, not agreement with a different method.
Validated on 2026 games
Trained on games before April 1, 2026 and scored on all games from April 1 onward (n ≈ 1,000), this engine picks the winner 76% of the time and has the lowest log-loss of any method tested — including naive winning-percentage baselines and a classical RPI-as-probability proxy. Full methodology in the findings writeup.
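The two metrics quoted above can be computed along these lines. This is a sketch with hypothetical names (`win_probability`, `score_predictions`). It assumes natural-log loss averaged over games and counts a pick as correct only when the eventual winner was given more than a 50% chance. The findings writeup may define either detail differently.

```python
import math

def win_probability(strengths, a, b):
    """Model probability that team a beats team b."""
    return strengths[a] / (strengths[a] + strengths[b])

def score_predictions(strengths, held_out):
    """held_out is a list of (winner, loser) pairs the model never saw.
    Returns (accuracy, mean log-loss)."""
    correct = 0
    total_loss = 0.0
    for winner, loser in held_out:
        p = win_probability(strengths, winner, loser)
        correct += p > 0.5          # the favourite won
        total_loss += -math.log(p)  # penalty grows as p shrinks
    n = len(held_out)
    return correct / n, total_loss / n

# Made-up strengths and results, purely for illustration.
strengths = {"A": 3.0, "B": 1.0}
held_out = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "B")]
accuracy, log_loss = score_predictions(strengths, held_out)
```

Accuracy rewards only picking the right side, while log-loss also penalizes overconfident probabilities, so the two can rank methods differently.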