RoboNiner is a multi-layer career projection system covering MLB players and all levels of the minor leagues. It generates Y+0 through Y+18 (or until career-end) projections for every hitter and pitcher in the affiliated minor league system.
Design Goals
Understanding the major public systems clarifies RoboNiner's design choices.
PECOTA (Baseball Prospectus)
Method: Comparables-based. Finds 25 historical players most similar to the target via Mahalanobis distance (age, level, skill rates, body type, handedness), then blends their actual career trajectories. Produces percentile bands (10th/90th).
ZiPS (Dan Szymborski, FanGraphs)
Method: Marcel-based (3-year weighted average with regression), modified by Tango/Lichtman aging research. Comps inform aging curves rather than trajectories directly.
Steamer (Jared Cross, FanGraphs)
Method: Component-based regression with aging. Separately models AVG, OBP, SLG, K%, BB%, BABIP using independent regression weights. Consistently the most accurate single-year system in public evaluations.
THE BAT X (Derek Carty)
Method: Neural network / ML approach on Marcel foundation. Most aggressive Statcast integration of any public system.
OOPSY (Jordan Rosenblum, FanGraphs)
Method: Similar to ZiPS with validated young-for-level adjustments. Key insight: a player 2+ years younger than the average at their level has demonstrated elite talent relative to peers.
Projections are built sequentially. Each layer adds information the prior layer lacks.
Layer 1: Marcel+ Foundation
Base: Tom Tango's Marcel system (5/4/3 year weights) extended with stat-specific regression. K% and BB% stabilize quickly (high year-to-year correlation) and need little regression; BABIP is regressed heavily (noisy, environment-sensitive):
| Stat | r² (yr-to-yr) | Regression PA |
|---|---|---|
| K% | 0.82 | 50 PA |
| BB% | 0.73 | 100 PA |
| HR/PA | 0.70 | 150 PA |
| AVG | 0.46 | 270 PA |
| BABIP | 0.44 | 280 PA |
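The Marcel+ step can be sketched in Python as follows. This is an illustrative sketch, not RoboNiner's actual code: the function name `marcel_rate`, the input layout, the rescaling of the 5/4/3 weights so the most recent season counts at face value, and the choice to regress by blending in `regression_pa` plate appearances of league-average performance are all assumptions.

```python
def marcel_rate(seasons, league_rate, regression_pa, weights=(5, 4, 3)):
    """Weighted multi-year rate, regressed toward the league rate.

    seasons: list of (events, pa) tuples, most recent season first.
    regression_pa: PA of league-average performance blended in (assumed form).
    """
    top = weights[0]  # rescale so the newest season has weight 1.0 (assumption)
    events = sum(w / top * ev for w, (ev, _) in zip(weights, seasons))
    pa = sum(w / top * n for w, (_, n) in zip(weights, seasons))
    # Adding league-average PA shrinks small samples toward the league rate.
    return (events + league_rate * regression_pa) / (pa + regression_pa)


# K% example: 100 strikeouts in 500 PA, league K% of 22%, 50 PA of regression.
print(round(marcel_rate([(100, 500)], 0.22, 50), 4))
```

With only 50 PA of regression, a 500 PA sample of K% barely moves; the same sample of BABIP (280 PA of regression) would be pulled much closer to the league rate.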
Young-for-level bonus (from OOPSY research): a player 2+ years younger than the league average at their level has demonstrated elite talent relative to peers. Regression shrinkage is reduced up to 80% for players 3+ years young for their level:
    ageRelativeBonus = 1.0 + min(0.80, yearsYoung × 0.28)
    averageAges: ROK=19.0, A=21.5, A+=22.5, AA=23.5, AAA=26.0, MLB=28.5
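A minimal Python sketch of the young-for-level bonus, using the average ages listed above. The function name and the floor at zero for players who are older than the level average are assumptions; the source does not say how older players are handled.

```python
AVERAGE_AGES = {"ROK": 19.0, "A": 21.5, "A+": 22.5, "AA": 23.5, "AAA": 26.0, "MLB": 28.5}


def age_relative_bonus(age, level):
    """Return 1.0 for an average-aged player, up to 1.80 for very young ones."""
    # Assumption: players older than the level average get no bonus (floor at 0).
    years_young = max(0.0, AVERAGE_AGES[level] - age)
    return 1.0 + min(0.80, years_young * 0.28)


print(age_relative_bonus(20.5, "AA"))  # three years young: the 0.80 cap binds
print(age_relative_bonus(25.0, "AA"))  # older than average: no bonus
```

Because 3 × 0.28 exceeds 0.80, the cap is reached slightly before a player is a full three years young for the level.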
Minor league regression multiplier: minor league stats are noisier than MLB. Regression is amplified by level:
| Level | Hitter mult | Pitcher mult |
|---|---|---|
| MLB | 1.0× | 1.0× |
| AAA | 2.0× | 2.5× |
| AA | 3.5× | 4.0× |
| A+ | 4.5× | 6.0× |
| A | 5.5× | 8.0× |
| Rookie | 7.0× | 10.0× |
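One way to read the two tables together is that the level multiplier scales the base regression PA. That reading is an assumption, as are the function and dictionary names in this sketch; the numbers come from the tables above.

```python
BASE_REGRESSION_PA = {"K%": 50, "BB%": 100, "HR/PA": 150, "AVG": 270, "BABIP": 280}

# (hitter multiplier, pitcher multiplier) by level
LEVEL_MULT = {
    "MLB": (1.0, 1.0),
    "AAA": (2.0, 2.5),
    "AA": (3.5, 4.0),
    "A+": (4.5, 6.0),
    "A": (5.5, 8.0),
    "Rookie": (7.0, 10.0),
}


def effective_regression_pa(stat, level, is_pitcher=False):
    """Base regression PA scaled by the level's noise multiplier (assumed usage)."""
    hitter_mult, pitcher_mult = LEVEL_MULT[level]
    mult = pitcher_mult if is_pitcher else hitter_mult
    return BASE_REGRESSION_PA[stat] * mult


print(effective_regression_pa("K%", "AA"))  # 50 PA at MLB becomes 175 PA at AA
```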
Layer 2: Component Rate Decomposition
Rather than aging AVG/OBP/SLG directly, we decompose to rate components and age each independently. Hitters: K_rate, BB_rate, HR_rate, BABIP, 2B_rate, 3B_rate, SB_rate, HBP_rate. Pitchers: K_rate, BB_rate, HR_rate, H_rate, LOB%.
MLEs convert minor league stats to their MLB equivalent. A .310 AVG in AAA is worth roughly .297 in MLB. A .310 AVG in Low-A is worth roughly .240.
How MLEs Are Computed
For each year/level, we collect players who appeared at that level and MLB within the same season (called-up players), then compute the ratio of their rate stats at each level.
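The ratio computation can be sketched as below. The record layout, the pooling of all called-up players before dividing, and the direction of the ratio (MLB rate over minor league rate) are assumptions made for illustration.

```python
def empirical_mle_factor(players):
    """Pooled MLB rate divided by pooled minor league rate.

    players: list of dicts with keys minors_events, minors_pa,
    mlb_events, mlb_pa, all from the same season (hypothetical layout).
    """
    minors_events = sum(p["minors_events"] for p in players)
    minors_pa = sum(p["minors_pa"] for p in players)
    mlb_events = sum(p["mlb_events"] for p in players)
    mlb_pa = sum(p["mlb_pa"] for p in players)
    minors_rate = minors_events / minors_pa
    mlb_rate = mlb_events / mlb_pa
    return mlb_rate / minors_rate


sample = [
    {"minors_events": 10, "minors_pa": 200, "mlb_events": 4, "mlb_pa": 100},
    {"minors_events": 5, "minors_pa": 100, "mlb_events": 2, "mlb_pa": 100},
]
print(round(empirical_mle_factor(sample), 3))
```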
⚠ The Selection Bias Problem
Called-up players are not a random sample of a level: they are promoted because they are performing well, so ratios computed from them overstate how the typical player at that level would translate.
| Level | Empirical HR_rate | Population Reality |
|---|---|---|
| AAA | 1.50× | ~0.92× |
| AA | 1.42× | ~0.78× |
| A+ | 1.42× | ~0.68× |
Population-Level Caps
We apply maximum multiplier caps reflecting what the average player at each level would translate to:
| Level | AVG cap | SLG cap | HR_rate cap |
|---|---|---|---|
| AAA | 0.960 | 0.940 | 0.920 |
| AA | 0.930 | 0.880 | 0.780 |
| A+ | 0.910 | 0.820 | 0.680 |
| A | 0.880 | 0.760 | 0.550 |
| Rookie | 0.840 | 0.680 | 0.420 |
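Applying the caps amounts to taking the smaller of the empirical factor and the population-level cap. A short Python sketch, with the table above transcribed into a dictionary; the function name is hypothetical.

```python
MLE_CAPS = {
    "AAA": {"AVG": 0.960, "SLG": 0.940, "HR_rate": 0.920},
    "AA": {"AVG": 0.930, "SLG": 0.880, "HR_rate": 0.780},
    "A+": {"AVG": 0.910, "SLG": 0.820, "HR_rate": 0.680},
    "A": {"AVG": 0.880, "SLG": 0.760, "HR_rate": 0.550},
    "Rookie": {"AVG": 0.840, "SLG": 0.680, "HR_rate": 0.420},
}


def capped_mle_factor(level, stat, empirical_factor):
    """Never let a selection-biased empirical factor exceed the population cap."""
    return min(empirical_factor, MLE_CAPS[level][stat])


print(capped_mle_factor("AAA", "HR_rate", 1.50))  # biased 1.50x is cut to 0.92
```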
Park factors are applied at the component level, not just summary stats. Coors Field boosts HR differently from BABIP.
Computation Method
MLB + AAA: PF = (home_stat / home_PA) / (away_stat / away_PA) × 100 using homeAndAway splits from the MLB Stats API. Three-year weighted smoothing (5/4/3) reduces season noise.
AA/A+/A/ROK: Single-regression approximation: PF = (team_stat_rate / league_avg_rate) × 100. Less accurate but adequate at the aggregate level.
Application: parkAdjusted = observed / (1 + (PF - 100) / 200). The /200 denominator (not /100) reflects that players play ~50% of games at home.
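The three formulas above translate directly into Python. The function names are ours, and the sketch assumes the yearly park factors are passed most recent first.

```python
def park_factor(home_stat, home_pa, away_stat, away_pa):
    """Home rate over away rate, scaled so 100 is neutral."""
    return (home_stat / home_pa) / (away_stat / away_pa) * 100


def smoothed_park_factor(yearly_pfs, weights=(5, 4, 3)):
    """Weighted average of up to three seasons, most recent first."""
    pairs = list(zip(weights, yearly_pfs))
    return sum(w * pf for w, pf in pairs) / sum(w for w, _ in pairs)


def park_adjust(observed, pf):
    """Halve the park effect because only about half of games are at home."""
    return observed / (1 + (pf - 100) / 200)


pf = smoothed_park_factor([112, 110, 108])
print(round(pf, 1), round(park_adjust(0.040, pf), 4))
```

A HR PF of 112 therefore deflates a hitter's observed HR rate by about 6%, not 12%.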
Notable Parks (Current Season)
| Park | HR PF | Notes |
|---|---|---|
| Coors Field (COL) | 112+ | Highest elevation, enormous outfield |
| Great American (CIN) | 109 | Short RF porch |
| Petco Park (SD) | 91 | Marine layer, deep CF |
| Oracle Park (SF) | 88 | Pitcher's park, McCovey Cove |
Beyond overall level factors, we compute per-league adjustments within each level. The PCL (Pacific Coast League, AAA) is historically hitter-friendly; the IL (International League) runs closer to MLB environments.
getMLEFactor(levelKey, leagueId) first attempts a league-specific factor, then falls back to the level default. A pitcher who spent two years in the hitter-friendly PCL won't have his ERA unfairly inflated when translated to MLB.
| League | Level | Character |
|---|---|---|
| Pacific Coast League | AAA | Hitter-friendly (desert parks, altitude) |
| International League | AAA | Pitcher-friendly, closer to MLB run environment |
| Texas League | AA | Hitter-friendly (heat, wind, small parks) |
| Eastern League | AA | More pitcher-friendly |
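The league-then-level fallback can be sketched as a two-step dictionary lookup. This is a Python illustration of the lookup order only: the factor values and the string league identifiers below are placeholders we made up, not RoboNiner's actual factors or IDs.

```python
# Placeholder values for illustration only (not real factors).
LEVEL_DEFAULTS = {"AAA": 0.92, "AA": 0.78}
LEAGUE_FACTORS = {
    ("AAA", "PCL"): 0.88,  # hitter-friendly league: discount more
    ("AAA", "IL"): 0.94,
}


def get_mle_factor(level_key, league_id=None):
    """Prefer a league-specific factor; otherwise use the level default."""
    return LEAGUE_FACTORS.get((level_key, league_id), LEVEL_DEFAULTS[level_key])


print(get_mle_factor("AAA", "PCL"))  # league-specific
print(get_mle_factor("AA", "TEX"))   # no league entry, falls back to AA default
```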
Step 1: Find Historical Comparables
For each projection target, find 25 players from the Lahman database (1955–2025) most similar at the target's current age. Similarity measured via Mahalanobis distance:
| Dimension | Weight |
|---|---|
| Age at entry | High |
| K_rate, BB_rate | Medium |
| ISO (isolated power) | Medium |
| BABIP | Medium |
| Position group | High |
| Body type (height/weight) | Medium |
| Modern era bonus (2000+) | Low |
Step 2: Extract Comp Aging Curves
For each comp, trace year-over-year stat changes weighted by: sample size (PA/BF) and comp distance (1 / (0.5 + distance)). Result: per-stat, per-age aging deltas for K_rate, BB_rate, BABIP, HR_rate, ISO, SB_rate.
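A sketch of the weighting in Python. It assumes the sample-size weight and the distance weight are multiplied together, which the text does not state explicitly; the function names and tuple layout are hypothetical.

```python
def comp_weight(pa, distance):
    """Sample size times closeness, using 1 / (0.5 + distance)."""
    return pa * (1.0 / (0.5 + distance))


def weighted_aging_delta(observations):
    """Weighted mean of year-over-year deltas for one stat at one age.

    observations: list of (delta, pa, distance) tuples, one per comp.
    """
    total_weight = sum(comp_weight(pa, d) for _, pa, d in observations)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(delta * comp_weight(pa, d) for delta, pa, d in observations)
    return weighted_sum / total_weight


# A close comp (distance 0.5) counts twice as much as a far one (distance 1.5).
print(round(weighted_aging_delta([(0.010, 500, 0.5), (-0.010, 500, 1.5)]), 5))
```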
Step 3: Blend Comp vs. Baseline
    if compConfidence >= 0.5:
        blend = min(0.80, compConfidence)
        delta = comp_delta × blend + baseline_delta × (1 - blend)
    else:
        delta = baseline_delta  # fall back to research-backed fixed curves
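The same rule as a runnable Python function; only the function name is new.

```python
def blended_delta(comp_delta, baseline_delta, comp_confidence):
    """Blend comp-driven and baseline aging deltas by comp confidence."""
    if comp_confidence >= 0.5:
        # Comps never get more than 80% of the weight.
        blend = min(0.80, comp_confidence)
        return comp_delta * blend + baseline_delta * (1 - blend)
    # Low confidence: use the fixed baseline curve only.
    return baseline_delta


print(round(blended_delta(0.010, 0.000, 0.9), 4))  # capped at an 80% comp share
print(blended_delta(0.010, 0.002, 0.3))            # baseline only
```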
Career endpoints from attrition: At each age, we track what fraction of the comp pool was still active. When fewer than 20% of comps are still playing AND OPS < .550 (or ERA > 6.50), the player retires.
Baseline Aging Parameters (Fallback)
Sources: Tango/Lichtman (The Book, 2006), FanGraphs aging studies, OOPSY young-for-level research.
| Stat | Peak Age | Notes |
|---|---|---|
| K_rate | 28–29 | Rapid improvement 22–24, stabilizes by 28 |
| BB_rate | 26–28 | Discipline keeps improving through late 20s |
| HR_rate | 27–28 | Bat speed peak, then decline |
| BABIP | 25–27 | Sprint speed peaks at 23–24; BABIP lags slightly |
| SB_rate | 23–24 | Linear speed decline from early 20s |
Position-Specific Aging Multipliers
| Position | Mult | Rationale |
|---|---|---|
| C | 1.25× | Highest physical wear |
| SS | 1.05× | Significant defensive wear |
| CF | 1.02× | Speed-dependent |
| LF/RF | 0.95× | Less physical |
| 1B | 0.90× | Least position wear |
| DH | 0.85× | No defensive wear |
Multipliers ramp from 1.0 at age 22 to full value at 27 (development phase shouldn't apply a wear penalty).
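A Python sketch of the ramp, assuming it is linear between ages 22 and 27; the source says only that the multiplier ramps. The function name is hypothetical and the dictionary transcribes the table above.

```python
POSITION_MULT = {
    "C": 1.25, "SS": 1.05, "CF": 1.02,
    "LF": 0.95, "RF": 0.95, "1B": 0.90, "DH": 0.85,
}


def position_aging_multiplier(position, age):
    """1.0 at age 22 or younger, the full table value at 27 or older."""
    full = POSITION_MULT[position]
    # Linear interpolation between 22 and 27 is an assumption.
    progress = min(1.0, max(0.0, (age - 22) / (27 - 22)))
    return 1.0 + (full - 1.0) * progress


print(position_aging_multiplier("C", 24.5))  # halfway up the ramp
```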
Velocity Aging (Pitchers)
Pitchers begin losing ~0.25 mph/yr of fastball velocity after age 25. Each mph lost reduces K_rate:
    veloLost = (age - max(startAge, 25)) × 0.25   // mph
    kRateLoss = veloLost × 0.0008                 // K/BF reduction per mph
This compounds with comp-driven curves: the curves capture the empirical K decline; velocity aging explains the mechanism.
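The velocity formula in Python. The clamp at zero for pitchers younger than 25 is our addition; without it the formula as written would return a negative loss. The function name is hypothetical.

```python
def velocity_k_loss(age, start_age):
    """Return (mph lost, K/BF lost) relative to the projection start age."""
    # Assumption: no velocity change is applied before age 25.
    years_declining = max(0.0, age - max(start_age, 25))
    velo_lost = years_declining * 0.25   # mph
    k_rate_loss = velo_lost * 0.0008     # K/BF reduction per mph lost
    return velo_lost, k_rate_loss


print(velocity_k_loss(30, 22))  # five declining seasons from age 25
```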
Statcast calibrates the Marcel+ foundation with objective physical measurements more predictive than rate stats alone.
Hitter Inputs
| Field | Weight | Impact |
|---|---|---|
| xwOBA | 40% | Holistic calibration: best single predictor of offensive value |
| barrel_pct | 20% | ISO / HR_rate: optimal EV + launch angle |
| hard_hit_pct | 15% | ISO supplement: 95+ mph EV, independent signal |
| avg_ev | 25% | BABIP: each mph above 88.5 ≈ +0.003 BABIP |
| max_ev | 15% | HR ceiling confirmation |
| sprint_speed | 40% | SB_rate + BABIP: each ft/s above 27.0 ≈ +2.5 SB attempts/yr |
| ba_diff (BA−xBA) | 25% | BABIP luck correction: positive = lucky, negative = unlucky |
| whiff_rate | — | Direct K predictor (when available) |
| chase_rate | — | K + BB predictor (zone control) |
Pitcher Inputs
| Field | Weight | Impact |
|---|---|---|
| xwOBA (against) | — | Holistic H_rate + HR_rate calibration |
| xISO_against | 35% | HR predictor: quality of extra-base contact allowed |
| barrel_pct (against) | 20% | HR_rate: barrels allowed strongly predict HR allowed |
| whiff_rate | 45% | K_rate: implied K = whiff_rate × 0.88 |
| chase_rate | — | K + BB (zone control predictor) |
| xera | 35% | ERA anchor from expected contact quality |
| fb_velo | — | Multi-year velocity aging baseline |
ratio = min(1.20, max(0.85, xwOBA / projectedwOBA)), applied only when the ratio deviates from 1.0 by more than 2%.
OAA (Outs Above Average)
Multi-year Marcel-weighted OAA from Baseball Savant (3-year, 5/4/3 weights), regressed toward 0 using position-specific reliability:
| Position | OAA r² | Notes |
|---|---|---|
| SS, CF | 0.62 | Range is the primary skill |
| 2B | 0.58 | |
| 3B | 0.55 | |
| RF, LF | 0.52 | |
| C | 0.42 | Limited range opportunities |
| 1B | 0.38 | Very few range plays |
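A Python sketch of the OAA projection. It assumes "regressed toward 0" means multiplying the weighted average by the position's reliability figure from the table above; RoboNiner's actual shrinkage formula may differ. The function name is hypothetical.

```python
OAA_RELIABILITY = {
    "SS": 0.62, "CF": 0.62, "2B": 0.58, "3B": 0.55,
    "RF": 0.52, "LF": 0.52, "C": 0.42, "1B": 0.38,
}


def projected_oaa(yearly_oaa, position, weights=(5, 4, 3)):
    """5/4/3 weighted OAA (most recent first), shrunk toward zero."""
    pairs = list(zip(weights, yearly_oaa))
    if not pairs:
        return 0.0
    weighted = sum(w * oaa for w, oaa in pairs) / sum(w for w, _ in pairs)
    # Assumption: reliability acts as a simple shrinkage multiplier.
    return weighted * OAA_RELIABILITY[position]


print(round(projected_oaa([10, 6, 2], "SS"), 2))
```

The same three seasons project to a smaller number at first base than at shortstop, because the first base figure is treated as less reliable.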
Catcher Framing
Framing runs from Baseball Savant runs_extra_strikes, regressed 50% toward 0. Age curve: improving through 28, peak 28–30, steep decline after 30 (~1.2 runs/yr).
Position Transitions
As players age, premium defenders move to easier positions. Interpolated over the transition window, not as a cliff:
| From | To | Ages |
|---|---|---|
| SS | 3B → 1B | 28–32 to 3B, 36–39 to 1B |
| 2B | 1B | 31–35 |
| CF | LF | 30–34 |
| 3B | 1B | 34–38 |
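Interpolating over a transition window, rather than switching positions at a single age, might look like this in Python. The linear shape and the function name are assumptions; the windows come from the table above.

```python
def transition_share(age, window_start, window_end):
    """Fraction of playing time at the new position: 0 before, 1 after."""
    if age <= window_start:
        return 0.0
    if age >= window_end:
        return 1.0
    # Linear interpolation inside the window (assumed shape).
    return (age - window_start) / (window_end - window_start)


# CF -> LF over ages 30-34: at 32 the player splits time evenly.
print(transition_share(32, 30, 34))
```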
Sprint Speed Modifier
speedRetention = max(0.70, min(1.15, 0.70 + (sprint_speed / 27.0) × 0.30))
Elite speedsters (30+ ft/s) retain defensive value longer. Slow players (24 ft/s) lose range value faster.
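The speed retention formula as a Python function, with the league-average anchor of 27.0 ft/s taken from the formula above. Only the function name is new.

```python
def speed_retention(sprint_speed):
    """Defensive-aging retention factor, clamped to [0.70, 1.15]."""
    raw = 0.70 + (sprint_speed / 27.0) * 0.30
    return max(0.70, min(1.15, raw))


for speed in (24.0, 27.0, 30.0):
    print(speed, round(speed_retention(speed), 3))
```

A 27.0 ft/s runner gets exactly 1.0; a 30 ft/s runner gets about 1.033 and a 24 ft/s runner about 0.967.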
Hitter PA Arc
| Age Range | Typical Level | PA |
|---|---|---|
| 18–20 | ROK / A | 300–400 |
| 21–24 | A+ / AA | 350–500 |
| 25–26 | AAA / MLB debut | 500–530 |
| 27–32 | MLB prime | 550–575 |
| 33+ | MLB veteran | declining ~15–20 PA/yr |
Comp PT blending (when ≥5 comps available):
    blendedPA = formulaPA × 0.40 + compAvgPA × 0.60
    blendedPA = blendedPA × compAttrition.pctActive   // adjusts for career-end probability
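The playing-time blend in Python. Falling back to the formula value alone when fewer than five comps are available is an assumption based on the "≥5 comps" condition; the function and parameter names are hypothetical.

```python
def blended_pa(formula_pa, comp_avg_pa, pct_active, n_comps):
    """Blend formula PA with comp PA, then discount by comp attrition."""
    if n_comps < 5:
        # Assumption: too few comps, so use the age/level formula only.
        return formula_pa
    blended = formula_pa * 0.40 + comp_avg_pa * 0.60
    # pct_active is the share of comps still playing at this age.
    return blended * pct_active


print(round(blended_pa(550, 500, 0.9, 25)))
```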
Career Endpoints
Career ends when:
1. Fewer than 20% of comparables still active at that age AND OPS < .550 (hitters) or ERA > 6.50 (pitchers)
2. Hard ceiling: age 48
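The two conditions can be combined into a single check, sketched here in Python. Treating age 48 itself as past the ceiling, and passing either `ops` or `era` depending on player type, are assumptions of this sketch.

```python
def career_over(age, pct_comps_active, ops=None, era=None):
    """True when either career-end condition is met."""
    if age >= 48:  # hard ceiling (assumed inclusive)
        return True
    if pct_comps_active >= 0.20:
        return False  # enough comps still active at this age
    if ops is not None:
        return ops < 0.550  # hitters
    if era is not None:
        return era > 6.50   # pitchers
    return False


print(career_over(39, 0.12, ops=0.530))  # few comps left and poor OPS
print(career_over(39, 0.12, ops=0.700))  # few comps left but still productive
```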
WAR = (wRAA + BsR + Fld + Frm + Pos + Rep) / RPW
    // wOBA (FanGraphs linear weights)
    wOBA = (BB×0.690 + HBP×0.720 + 1B×0.880 + 2B×1.245 + 3B×1.575 + HR×2.015) / PA
    // wRAA (value above average)
    wRAA = (wOBA - lgwOBA) / wOBAscale × PA
    // BsR (baserunning)
    netSB = SB - (CS × 2)
    wSB = netSB × 0.2   // ~0.2 runs per net steal
    // Replacement level
    hitter: 20.0 replacement runs per 600 PA
    pitcher: 18.0 replacement runs per 200 IP
    // Runs per win
    RPW = 10.0   // FanGraphs scale
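A compact Python sketch of the hitter side of this calculation. It assumes the run components are summed and then converted to wins by dividing by RPW, and that `lg_woba` and `woba_scale` are supplied by the caller; the function names and the sample stat line are ours.

```python
def woba(bb, hbp, singles, doubles, triples, hr, pa):
    """Weighted on-base average using the linear weights listed above."""
    numerator = (bb * 0.690 + hbp * 0.720 + singles * 0.880
                 + doubles * 1.245 + triples * 1.575 + hr * 2.015)
    return numerator / pa


def wraa(woba_value, lg_woba, woba_scale, pa):
    """Batting runs above average."""
    return (woba_value - lg_woba) / woba_scale * pa


def wsb(sb, cs):
    """Stolen-base runs: about 0.2 runs per net steal, CS counted double."""
    return (sb - cs * 2) * 0.2


def hitter_war(wraa_runs, bsr, fld, frm, pos, pa, rpw=10.0):
    """Sum run components plus replacement runs, then convert to wins."""
    rep = 20.0 * pa / 600  # 20 replacement runs per 600 PA
    return (wraa_runs + bsr + fld + frm + pos + rep) / rpw


# Made-up stat line for illustration.
w = woba(bb=60, hbp=5, singles=100, doubles=30, triples=3, hr=25, pa=600)
print(round(w, 3), wsb(20, 5))
```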
| Feature | ZiPS | Steamer | PECOTA | RoboNiner |
|---|---|---|---|---|
| Per-stat regression weights | ✓ | ✓ | ✓ | ✓ |
| Comp-driven aging (individual curves) | Partial | — | ✓ | ✓ |
| MLE selection-bias correction | — | — | — | ✓ |
| Per-league MLE factors | Partial | — | — | ✓ |
| Component-level park factors | Partial | Partial | — | ✓ |
| avg_ev as BABIP signal | — | — | — | ✓ |
| xISO_against (pitcher HR predictor) | — | — | — | ✓ |
| Sprint speed → defensive aging | — | — | — | ✓ |
| OAA + framing multi-year Marcel | Partial | — | — | ✓ |
| Position transition modeling | Partial | — | — | ✓ |
| Young-for-level regression discount | Partial | — | Partial | ✓ |
| Full career arc projections | Some | — | ✓ | ✓ |
| Full minor league coverage (A thru MLB) | ✓ | Partial | Partial | ✓ |
| Percentile forecasts | — | — | ✓ | ✓ |
Key Innovations
xISO = xSLG − xBA measures the quality of extra-base contact allowed, not just the rate. A pitcher who allows loud doubles even when getting outs is more HR-prone than one whose extra-base hits are weak.