Understanding the xG Models
Expected goals, or xG, is one of the core ideas in modern hockey analysis. The goal of xG is simple: estimate the probability that a shot becomes a goal.
That sounds straightforward, but the real value of xG is not the probability assigned to one shot. The real value is what happens when you add those probabilities up across shifts, periods, games, players, teams, and seasons.
In this app, xG appears across multiple surfaces, and the app exposes several xG model choices because different event definitions are useful for different analytical purposes.
What xG is trying to measure
At a high level, xG asks:
If we saw this exact shot or shot-like event many times, how often would it become a goal?
That lets you separate chance quality from actual finishing results.
A team may score four goals on 1.8 expected goals because they finished well, got a few bounces, or faced weak goaltending. Another team may score once on 3.1 expected goals because they generated better chances than the final score suggests.
That distinction is critical if you care about repeatability.
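The summing idea above takes only a few lines to sketch. The per-shot probabilities here are invented for illustration, not output from the app's models:

```python
# Each shot carries a model-estimated goal probability (values invented here).
shot_probs = [0.03, 0.08, 0.21, 0.05, 0.33, 0.11, 0.04]

# Team xG is simply the sum of per-shot probabilities.
team_xg = sum(shot_probs)

# Comparing against actual goals separates chance quality from finishing.
actual_goals = 3
finishing_diff = actual_goals - team_xg

print(round(team_xg, 2))         # total expected goals
print(round(finishing_diff, 2))  # goals above/below expectation
```

A positive difference over one game says little; a positive difference sustained over a large sample starts to look like skill.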
The three xG model options in the app
The app exposes three xG model views:
- xG_S
- xG_F
- xG_F2
These are not three completely unrelated philosophies. They are three related model variants.
xG_S
This is the shot-based model. It is trained on shot events and uses contextual features like:
- venue
- shot type
- score state
- rink context
- strength state
- box or ice location
- last event context
This is the cleanest traditional shot-model view in the app. If you want a direct shot-quality lens, this is the natural place to start. It is probably also the best model for goaltender analysis, depending on whether, and to what extent, you believe goaltenders can influence shot misses.
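A single input event for this kind of shot model might look like the record below. The field names and values are illustrative stand-ins, not the app's actual schema:

```python
# Hypothetical feature record for one shot event (names are illustrative only).
shot_event = {
    "venue": "home",          # home/away context
    "shot_type": "wrist",     # wrist, slap, snap, backhand, ...
    "score_state": "+1",      # shooting team's goal differential
    "rink": "ARENA_X",        # rink context, e.g. recording tendencies
    "strength": "5v5",        # strength state
    "box_location": "slot",   # coarse ice-location bucket
    "last_event": "pass",     # what happened immediately before the shot
}

# A shot-based model like xG_S maps a record like this to one goal probability.
print(sorted(shot_event))
```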
xG_F
This is a Fenwick-based model. Fenwick excludes blocked shots and focuses on unblocked attempts, which often gives a broader attacking-process view than shots on goal alone.
This model includes the same core contextual structure as the shot model, but also adds event-sequence context such as the last event. That can help the model capture how a chance developed, not just where it ended.
This is the preferred model for describing results.
xG_F2
This is also Fenwick-based, but with a slightly different feature set than xG_F. It uses venue, shot type, score state, rink context, strength state, and box location, while leaving out the last-event input.
This is the preferred model for predictive analysis. Rebound shots have a much larger chance of becoming goals, but they are also quite random, so excluding rebound effects increases the predictiveness of the model.
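The relationship between the two Fenwick variants can be sketched as set arithmetic over feature names. The names here are illustrative, not the app's real column names:

```python
# Illustrative feature names for the xG_F model (not the app's actual columns).
XG_F_FEATURES = {
    "venue", "shot_type", "score_state", "rink",
    "strength", "box_location", "last_event",
}

# xG_F2 drops the last-event input to reduce rebound-driven noise.
XG_F2_FEATURES = XG_F_FEATURES - {"last_event"}

print(sorted(XG_F_FEATURES - XG_F2_FEATURES))
```

The only difference is the sequence-context input, which is exactly the part that captures rebounds.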
How the models are trained
The xG scripts train gradient-boosted tree classifiers using rolling multi-season windows. In plain English, that means the model is trained on recent historical seasons rather than treating all hockey history as equally relevant forever.
That matters because the league changes:
- shot habits change
- team systems evolve
- tracking and event recording can drift
- scoring environments move over time
Using rolling windows is a practical way to keep the models current without overreacting to only one season.
The model inputs are categorical and contextual features that are converted into machine-readable form, then fed into XGBoost classifiers that estimate goal probability.
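The two preprocessing steps described above, rolling-window season selection and categorical encoding, can be sketched with standard-library Python. The windowing rule and the encoder are simplified stand-ins for whatever the actual scripts do:

```python
def rolling_window(seasons, current, width=3):
    """Keep only the `width` most recent completed seasons (simplified rule)."""
    return [s for s in seasons if current - width <= s < current]

def one_hot(value, categories):
    """Convert one categorical value into a machine-readable 0/1 vector."""
    return [1 if value == c else 0 for c in categories]

# Train on recent history rather than all of hockey history.
train_seasons = rolling_window([2018, 2019, 2020, 2021, 2022, 2023], current=2024)

# Encode a categorical input; vectors like this feed the XGBoost classifier.
SHOT_TYPES = ["wrist", "slap", "snap", "backhand"]
encoded = one_hot("snap", SHOT_TYPES)

print(train_seasons)
print(encoded)
```

In practice the real pipeline would use a library encoder and an `XGBClassifier`, but the shape of the transformation is the same: recent seasons in, numeric feature vectors out.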
How to interpret xG in the app
A few practical rules help:
- xG is about chance quality, not certainty.
- Over one game, xG is descriptive, not definitive.
- Over larger samples, xG becomes more informative.
- xG is stronger when combined with usage and context, not read in isolation.
- Differences between xG and goals can reveal finishing, goaltending, luck, or short-term variance.
If a player consistently beats xG over many seasons, that may reflect real finishing talent. If a team beats xG for two weeks, that might just be a hot streak. The app is built to help you tell those apart.
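The finishing-versus-variance comparison in this section reduces to goals minus xG. A small sketch with invented numbers:

```python
# Invented per-player season totals: (goals scored, expected goals).
players = {
    "Player A": (42, 31.5),  # beats xG by a wide margin
    "Player B": (18, 20.2),  # scores roughly in line with chance quality
}

# Goals above expected: positive = outscoring chances, negative = underscoring.
gax = {name: round(g - xg, 1) for name, (g, xg) in players.items()}

print(gax)
```

Whether a gap like Player A's reflects talent or variance depends on how many seasons it persists, which is exactly the distinction the app is built to surface.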