Rangers Analytics Primer or: How I Learned to Stop Worrying and Ruin My Day Job Productivity

An introduction to #fancystats, a glossary of terms, and other ramblings. Let's go down the rabbit hole together, shall we?

So basically here is how it all went down. The good people at Blueshirt Banter confronted me inside of this internet thing. They told me it isn't all in my head. There's something more going on. That there is something wrong with the world. Something I can't put my finger on, but that's there, like a splinter in my mind. That there is a world that has been pulled over our eyes to blind us from the truth. That only if we fight these entities, this construct that they have built for us, can we free ourselves of the shackles of unqualified, unverified nonsense that was once called analysis. That there is more out there; people just waiting to be unplugged from the system. That I could help them fight their battle, and win it. They made me an offer.

So I took the Red Pill.

Yes folks, I am your self-fashioned Neo. And I have been asked to take you all on a journey this season, deep into the world of Analytics. Or #fancystats, if the name suits you.

After this wild "summer of analytics," rather suddenly, analytics are everywhere. I want to arm you with the basics so that you may, if you choose, begin to debate, argue about, denounce, and ultimately hate me for them. With the growth in popularity of non-traditional numbers behind the games you view, there is no denying the fact that some fans want to either add to their knowledge-base about what they are seeing, or simply give context and statistical backing to the conclusions they make while watching the team play.

As an aside, I don't particularly like calling this stuff "advanced stats," nor do I care for the #fancystats moniker. It creates this air of inaccessibility which, if you actually get into this stuff, shouldn't be there. A lot of the underlying numbers really aren't all that advanced or fancy.

I prefer "analytics" or "metrics," but it's tough to change the lexicon, or resulting attitudes, overnight. Basically, it's about having the puck, and trying to score with it. Pretty simple.

Why Should I Care?!

You might care because possession stats correlate closely to winning games, and Stanley Cups.

You might care because possession stats have been shown to have more predictive value of a team's future performance than wins.

You might care because possession stats predicted the crash-and-burn of the 2013-14 Toronto Maple Leafs well before it happened. And then the Leafs hired a stat-savvy assistant GM and an analytics department as a result.

Maybe you just care because some of this stuff, like Corsi, gives you some water cooler fodder. If used right, it can help give some idea of overall performance. It's one more tool to help explain the game you love.

Some will give absolutely zero cares about this stuff. Which is totally fine. Ultimately, nobody should tell you how to enjoy your fandom. But whatever your interest or disinterest in the numbers, they are certainly relevant to what you are watching. And they are becoming increasingly unavoidable for even the most casual or numbers-averse fan. Even if some of the best statistical thinkers have been snapped up by NHL teams with an eye on closing off public data and idea sharing.

It isn't expected that everyone will develop a taste for this stuff. However, if you do find yourself face-to-face with a Corsi-wielding hockey hipster, I hope that this introductory guide, and the tracking that others and I will be taking on all season, will help you to at least dabble when needed or desired. Hey, maybe you will start really looking at this stuff, weighing the numbers against each other, and fending off those who pretend to know what's going on. Maybe you're the next great NHL stats guy, and you didn't even know it!

Ok, ok. Let's not get carried away.

Basic Introduction

This article is not meant to be an in-depth and all-inclusive discussion of underlying numbers and their value alone and together. Instead, it's just an introductory road map and glossary to get you acquainted with some of the most prominent terms and ideas you will hear about throughout the season. The basics you will need to skate by. I'll also provide a FAQ which will be continuously updated.

Throughout the season, we will go in-depth with a number of different analytics features, which will all be archived at http://www.blueshirtbanter.com/analytics in addition to their normal feature position on the Blueshirt Banter homepage.

Rangers By The Numbers:

These articles will feature game-by-game stat-packs along with analysis utilizing conventional and "advanced" metrics. The goal is to provide readers with a greater understanding of the who, what, where and why of Rangers performance. These won't always follow an exact format. Some will be more in depth than others, and they may combine several games or give a cross-section of a particular trend of play. RBTN will feature some of the things below, which I'll be tracking and manipulating all season:

Possession and Shooting Stats: I will be tracking the full battery of possession metrics...all the hits. Corsi, Fenwick, With or Without Yous (WOWY), shot charts and more. A glossary of common stats appears below. These will be housed within RBTN on a fairly regular basis, and I will also from time to time post a season update that shows season totals to date, or analyze a particular trend or topic of interest.

Zone Tracking: I will be manually tracking zone entries and exits for the Rangers and their opponents. This will be utilized to get a better understanding of individual and team performance and tendencies based on their movement and decision-making in each zone. For a more in-depth understanding of zone tracking, check out Corey Sznajder's (@shutdownline on the twitter) work from last season here.

Loose Puck Battle Project:
To date, one area of the game that has been woefully under-analyzed is the value of winning loose-puck battles (and how they are won or what is done with the puck after winning them). You hear coaches talk about the 50/50 battles and winning loose-puck battles all the time, but what is their actual value? It is a standard part of the discussion between players and coaches on a daily basis, and valued greatly within the game. It's also what many in the game, including eye-test proponents and more traditional scouts or analysts, often refer to as "compete level." This season, I will be tracking loose-puck battles (primarily 50/50s between opposing players) and trying to quantify their value. Just like zone tracking, this is manual labor. It is arduous and time-consuming. Please visit my post on this project for more details (to be linked soon, once I, uh, write it).

Setup Pass Tracking: This statistic was originally introduced by Rob Vollman (@robvollmannhl) as part of his seminal work in Hockey Abstract. It is best explained by him as follows:

Setup passes are defined as those passes which result in shots on goal, and can help determine which players are most adept at setting up shots in a way that's independent of shooting percentages.

Unfortunately this statistic is not recorded by the NHL (or anyone else). It is therefore available only as an estimate based on the player's (primary) assists and the average shooting percentage of his linemates. To some people that's a tremendous turn-off, even among those who think the concept is excellent. If you are of that mind, think of it only as a handy re-presentation of primary assists and on-ice shooting percentage.

I will be manually tracking actual setup passes for the Rangers and their opponents this year. In this way, I can turn the representation into a reality and we can see just how great certain passers are. Zuuuuuc!

Goaltender Tracking: I will provide game-by-game info on performance, and rolling tracking of performance, utilizing traditional statistical measures along with some of the new measures like EVGSAA/60, which I previously introduced on another rival blog here.

Goal-Scoring Play Tracking: I'll be manually tracking goal-scoring plays from first controlled touch to goal for the Rangers, and possibly their opponents (time permitting). The purpose is to get a better understanding of how individual and team offensive performance and systems work to create goals. As the season progresses, we may see patterns. Maybe we won't. But I'll post many of the more interesting plays and provide takeaways for you all to ponder. I would be remiss not to point out that this project was inspired entirely by @kid_ish, and his better half @_wordgirl. Kid Ish is a wonderful hockey mind covering the Ducks for Anaheim Calling. @_wordgirl does design that I only hope to emulate. Gonna have to up my MS Paint skillz!

Their project and more on why goal-tracking could be significant can be found here. My charts only hope to be as nice as this one - courtesy of @kid_ish w/ design by @_wordgirl:

Goal Tracking Example

Terms You Need To Know (#fancystats CliffsNotes):

Here's a short and entertaining crash course on advanced stats from Russian Machine Never Breaks. A glossary of terms after the jump:

Corsi:

Corsi measures all shot attempts (on net, missed, blocked) and can be applied to either a team or an individual player on the ice. An individual's measure of Corsi takes into account the shot attempts of all 10 players on the ice. While it does not measure possession in and of itself, it is considered a proxy because a team must have the puck to shoot the puck (or, conversely, allow the other team to have it in their D zone in order to give up shots).

Corsi = Shot Attempts FOR - Shot Attempts AGAINST

In a single game it is often represented as a +/- (e.g., D. Sedin was a +3 Corsi). For the season it is often displayed as a percentage (CF% or Corsi%). If you see a number above 50%, it means that while the player is on the ice, his team is generating more shot attempts than it is allowing. You should care because there is a correlation between players who have higher Corsi and their ability to outscore the competition when on the ice.

It is important to consider the context that shapes Corsi. That is why it is valuable to calculate Corsi by situation, such as Even Strength (EV Corsi% or 5v5 Corsi%). 75% of all shots come at even strength, and isolating it, at the least, removes special teams, which tend to skew shot attempt numbers for obvious reasons. Click here for a solid read on the uses and limitations of Corsi from our friends over at Arctic Ice Hockey.
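For anyone who likes to see the arithmetic, here is a minimal sketch of the Corsi formula and Corsi% described above. The function names and the sample shot counts are made up for illustration; they are not taken from any stats site or real game.

```python
def shot_attempts(on_goal, missed, blocked):
    """Corsi counts every attempt: shots on net, missed shots, and blocked shots."""
    return on_goal + missed + blocked


def corsi_diff(attempts_for, attempts_against):
    """Corsi as a +/-: shot attempts FOR minus shot attempts AGAINST."""
    return attempts_for - attempts_against


def corsi_pct(attempts_for, attempts_against):
    """Corsi% (CF%): share of all on-ice shot attempts taken by the player's team."""
    total = attempts_for + attempts_against
    if total == 0:
        return None  # no attempts either way, so the percentage is undefined
    return 100.0 * attempts_for / total


# Hypothetical single-game totals while one player was on the ice.
cf = shot_attempts(on_goal=10, missed=5, blocked=4)  # 19 attempts for
ca = shot_attempts(on_goal=8, missed=4, blocked=4)   # 16 attempts against

print(corsi_diff(cf, ca))           # 3
print(round(corsi_pct(cf, ca), 1))  # 54.3
```

In this made-up game the player is a +3 Corsi, and anything above 50% in the last line means his team out-attempted the opposition while he was out there.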

Fenwick:

Fenwick is the same thing as Corsi, but without blocked shots. The logic is that a blocked shot is not really a scoring chance, as the shot never has a chance of getting to the net. It is thus possible that blocking the shot is a skill, and not just a random event. Fenwick is commonly displayed as a percentage (FF% or Fenwick%), and a similar assumption can be made for a player falling above or below 50%.

Fenwick = (SOG FOR + Missed SH FOR) - (SOG AGAINST + Missed SH AGAINST)

Because Fenwick reduces the number of total events in the calculation, it isn't quite as useful in smaller sample sizes such as individual games or partial seasons. In other words, we want the most data possible, and Corsi gives us more data. For more on Fenwick vs Corsi, this is a good starting point.
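The Fenwick formula above can be sketched the same way. Again, the function names and numbers are hypothetical; the only thing that matters is that blocked shots are left out of the count.

```python
def fenwick_attempts(on_goal, missed):
    """Fenwick counts unblocked attempts only: shots on goal plus missed shots."""
    return on_goal + missed


def fenwick_diff(attempts_for, attempts_against):
    """Fenwick as a +/-: unblocked attempts FOR minus unblocked attempts AGAINST."""
    return attempts_for - attempts_against


def fenwick_pct(attempts_for, attempts_against):
    """Fenwick% (FF%): share of unblocked on-ice attempts taken by the player's team."""
    total = attempts_for + attempts_against
    if total == 0:
        return None  # undefined when there were no unblocked attempts
    return 100.0 * attempts_for / total


# Hypothetical single-game totals while one player was on the ice.
ff = fenwick_attempts(on_goal=10, missed=5)  # 15 unblocked attempts for
fa = fenwick_attempts(on_goal=8, missed=4)   # 12 unblocked attempts against

print(fenwick_diff(ff, fa))           # 3
print(round(fenwick_pct(ff, fa), 1))  # 55.6
```

Notice how few events are left once blocks come out (27 here). That shrinking sample is exactly why Fenwick gets noisier over a single game.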

Relative Corsi

Measures the difference in even strength Corsi between a player's on-ice performance and his team's performance when he's on the bench. If the player is +5.5 Corsi Rel, it generally means that his team's Corsi% is 5.5 percentage points better with him on the ice than it is without him.
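As a rough sketch of that on-ice vs. on-the-bench comparison, here is Corsi Rel computed from percentages. The function names and the season totals are invented for the example, and sites may calculate their relative numbers a little differently.

```python
def corsi_pct(attempts_for, attempts_against):
    """Share of shot attempts belonging to the player's team, as a percentage."""
    return 100.0 * attempts_for / (attempts_for + attempts_against)


def corsi_rel(on_for, on_against, off_for, off_against):
    """On-ice Corsi% minus the team's Corsi% while the player is on the bench."""
    return corsi_pct(on_for, on_against) - corsi_pct(off_for, off_against)


# Hypothetical even-strength totals for one player.
rel = corsi_rel(on_for=110, on_against=90,     # 55.0% with him on the ice
                off_for=495, off_against=505)  # 49.5% with him on the bench

print(round(rel, 1))  # 5.5
```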

"Luck Stats" - On-Ice Sh% For, On-Ice Save %, and PDO:

On-ice shooting percentage shows what percentage of his team's shots are going into the net when a player is on the ice. Not just his shots, but those of everyone on his team. On-ice save percentage shows what percentage of the opposition's shots aren't going in your net while he is on the ice. PDO is simply those two things added together (note: PDO stands for nothing at all; like Corsi and Fenwick, it's just named after someone).

The reason these are called "luck" stats is because the elements of each aren't totally in the individual's control or necessarily affected by the player's talent or skill. And because sometimes the puck will bounce for you, or against you. So it isn't all luck, but it does incorporate it. Don't fall into the trap of misconstruing PDO one way or the other and missing out on its true value.

The stat is often represented as a percentage or a four-digit number (100% or 1000). It is best measured at 5v5 to reduce variables.

PDO = On-Ice Sh% + On-Ice Sv%

The league averages a tick under 8% even strength sh%. And therefore it averages a tick over 92% even strength sv%. This means that if a player has a PDO above 1000 (or 100%), he might be getting a bit "lucky." Or if below, a bit "unlucky." However, it is important to note that it's really all about whether the PDO is substantially lower/higher than the individual player is accustomed to experiencing over time. Some players are always above 1000. So if in one season such a player is suddenly at 980, we can assume that something has gone awry while he has been on the ice. It could be the goaltending behind him. It could be the sh% of himself and his teammates. For a great article on percentages and shot quality, click here.
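Here is a small sketch of the PDO arithmetic, assuming "shots" means shots on goal and everything is counted at 5v5 while the player is on the ice. The function names and the goal and shot totals are hypothetical.

```python
def on_ice_sh_pct(goals_for, shots_for):
    """Percentage of the team's shots on goal that went in with the player on the ice."""
    return 100.0 * goals_for / shots_for


def on_ice_sv_pct(goals_against, shots_against):
    """Percentage of the opposition's shots on goal that were stopped with him on the ice."""
    return 100.0 * (shots_against - goals_against) / shots_against


def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO = On-Ice Sh% + On-Ice Sv%, expressed here on the 100% scale."""
    return (on_ice_sh_pct(goals_for, shots_for)
            + on_ice_sv_pct(goals_against, shots_against))


# A made-up player sitting right at the rough league averages (8% and 92%).
average = pdo(goals_for=8, shots_for=100, goals_against=8, shots_against=100)
print(average)  # 100.0

# A made-up player for whom the bounces are going well.
hot = pdo(goals_for=10, shots_for=100, goals_against=6, shots_against=100)
print(round(hot * 10))  # 1040 on the four-digit scale
```

The second player's 1040 is only a flag, not a verdict. As noted above, what matters is how far it sits from what he usually posts.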

Score Effect:

Basically, score effect means that a team will play differently depending on how far ahead/behind they are in a game. Sometimes (often), a team that is up by several goals will sit back and play with the puck less (whether rightly or wrongly is a different discussion).

For this reason, Corsi is often broken down to Corsi Tied or Corsi Even (when game is tied), Corsi Ahead, Corsi Behind, and Corsi Close (game is +/- 1 goal or tied during the first two periods, and tied in the third). This helps eliminate, or illuminate, score effect biases in the numbers. It is especially worthwhile early in the season, when the sample size is still small and more affected by such a bias.
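To make those buckets concrete, here is a minimal sketch that sorts shot attempts by score situation, using the Corsi Close definition given above. The event format (period, goal differential from the team's point of view, and whether the attempt was for or against) is an assumption made for this example, not how any particular site stores its data.

```python
def score_state(goal_diff):
    """Label the situation from the team's point of view: ahead, behind, or tied."""
    if goal_diff > 0:
        return "ahead"
    if goal_diff < 0:
        return "behind"
    return "tied"


def is_close(goal_diff, period):
    """Close = within one goal in the first two periods, or tied from the third on."""
    if period <= 2:
        return abs(goal_diff) <= 1
    return goal_diff == 0


def corsi_close(events):
    """Shot attempts for minus against, counting only close-game events."""
    total = 0
    for period, goal_diff, is_for in events:
        if is_close(goal_diff, period):
            total += 1 if is_for else -1
    return total


# Hypothetical shot attempts: (period, goal differential, attempt was FOR the team?)
events = [
    (1, 0, True),    # tied in the first: close
    (1, 1, False),   # up one in the first: close
    (2, 2, False),   # up two in the second: not close
    (2, -1, True),   # down one in the second: close
    (3, 1, False),   # up one in the third: not close
    (3, 0, True),    # tied in the third: close
]

print(corsi_close(events))  # 2
print(score_state(-1))      # behind
```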

Zone Starts:

This is another context stat. Offensive Zone Starts (OZS% or sometimes simply ZS%) shows what percentage of the shifts a player is deployed to start are in the offensive zone. Defensive Zone Starts (DZS%) shows what percentage of shifts start in the defensive zone. Basically, if a player gets less offensive zone deployment, we should expect that it will negatively impact his offensive production.

Quality of Teammates:

Again...context. QualTeam (QoT) is the quality of teammates the player is deployed with on-ice, usually based on their on/off ice +/-. You can also do QoT Corsi%, QoT EV TOI, and other ways to measure the teammates on-ice with the specified player. If a player is deployed with weaker possession players, his performance will likely suffer (or conversely, he may raise the performance of those around him). Whereas, if he is consistently playing with strong teammates, his own Corsi may be stronger than it would be without those teammates.

Quality of Competition:

Same thing, but QualComp (QoC) measures the quality of the players the individual faces. If he is consistently deployed against stronger possession-driving or goal producing players, it makes sense that his own possession or GA stats may suffer.

With or Without Yous - WOWYs:

These are my favorite comparative stats, using the stuff above. David Johnson's (@hockeyanalysis) site stats.hockeyanalysis.com is an outstanding resource for comparing how players perform with certain teammates on-ice, vs. without them. You can also do so on a game-by-game basis at hockeystats.ca.

GF/60 and GA/60:

Goals for/against while a player is on the ice per 60 minutes of ice time. Even strength and non-empty-net situations only.
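The per-60 conversion is just a rate calculation. Here is a quick sketch with hypothetical season totals; the function name is my own.

```python
def per_60(events, toi_minutes):
    """Scale a raw on-ice count to a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    return events * 60.0 / toi_minutes


# Made-up even-strength, non-empty-net totals for one player.
toi = 900  # minutes of ice time
gf60 = per_60(30, toi)  # 30 goals for while on the ice
ga60 = per_60(24, toi)  # 24 goals against while on the ice

print(round(gf60, 2))  # 2.0
print(round(ga60, 2))  # 1.6
```

Putting everyone on the same 60-minute scale is what lets you compare a first-liner with a guy who only sees ten minutes a night.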

Context Is Everything

Most of the stats we calculate are based on information that is tracked and provided to us by the NHL's official scoring reports. Sean McIndoe astutely points out one of the major issues with this in his excellent Grantland piece on the "Analytics Awakening" -

The NHL uses multiple people at every game to input data in real time, and there’s a degree of between-periods quality control. But hockey turns out to be an enormously difficult game to track. Unlike baseball or football with their frequent breaks, hockey can go long stretches without a pause in the action. In the time it takes a tracker to look down and press a button on an iPad, something else could happen and get missed. And a lot of what’s being tracked is subjective, leading to significant rink bias that skews the results even further.

Though the data is input in "real-time," it is not actually real-time in the sense that it is manually tracked. In a really fast game. So stuff gets missed. I'm looking at you, hits, giveaways, and takeaways. Also, much of what is tracked is what I consider to be "result-based" stats. For example: Nash takes a shot, shot is recorded. I want to know more about how he got into the zone, who set up the shot, etc. We simply don't have the resources, outside of manually tracking every game.

The fact is, we are at the tip of the proverbial iceberg in statistically analyzing the sport. While sports like soccer and basketball benefit from technologically advanced tracking and motion-capture systems, such as SportVU, hockey is still in many respects operating in the dark ages.

True analysis still requires that the analyst watches the game and does some form of analysis beyond the numbers. In order to further make sense of them. Or qualify/disqualify them. This is the only way to develop greater context. But it is important to know that while the numbers are objective, they can sometimes be given to misinterpretation, error, or confirmation bias. There is subjectivity. And there will be subjectivity in my own analysis of what the numbers convey.

But the point is to have every resource at your disposal, if you want it. So that is what we will be trying to provide this season.

Advanced Stats Resources (in no particular order):

nhl.com - the old stand-by, which has a growing range of player and team stats

war-on-ice.com - Wonderful and growing set of statistics and resources, including live game tracking and the Hextally shot tracking system

stats.hockeyanalysis.com - the preeminent advanced stats site on the web. Includes WOWYs and several proprietary stats

puckalytics.com - new project by David Johnson of hockeyanalysis.com

hockeystats.ca - live game tracking

naturalstattrick.com - game tracking

progressivehockey.com - new stats site, featuring several goaltending stats

behindthenet.ca - one of the original stat sites, and still a valuable resource

hockey-reference.com - a growing database of traditional and advanced stats

nicetimeonice.com - TOI tracker and other resources

somekindofninja.com - player usage tracker and shot tracker

sportingcharts.com - a variety of charts and graphs, and shot heat maps

This Sounds Like a Massive Undertaking. Why Bother With Any Of This?

Ha. Yea. Weeeeee! Every manually tracked game will likely take me several additional hours of rewinding and fast-forwarding to get through. Sometimes, I'll do it in close to real-time. In those instances, you may see a game stat-pack up within hours of the game ending. Other times, I'll have things to do or simply want to, you know, enjoy the game. If so, it may take me a day or two or three to put together. Or I may wait until the end of the week and put several games into one package. It will be a fluid process.

The reason I am doing all this is because (1) It is interesting to me, and (2) what we currently have at our disposal is woefully incomplete. I think it's worthwhile. I think it's informative. I love the Rangers and I love analyzing hockey. Given my background playing, coaching and thinking about the game...I am just really into it all.

This is a hobby of mine. I enjoy it. I know that some of the voices out there have a keen eye on getting scooped up by a mainstream media outlet or an NHL team. I think that if people are doing this stuff for the wrong reasons, they are just as likely to make colossal errors in judgment as they are to uncover the next big thing.

The true purpose of tracking, calculating, analyzing and tinkering is to, hopefully, learn more about what is going on. If this past summer is any indication, teams agree that there is some utility in it and are using it as another resource to help inform decision-making. If we can get a leg up on understanding something, why not inform ourselves? Hell, we might end up knowing as much or more than what the organization does. And that's a pretty neat thing. At least I think so.

Want more information on a particular topic of interest? See something you still don't understand? Need another stat defined which doesn't appear here? Contact Nick Mercadante at @nmercad on twitter or by email at nick.mercadante [at] gmail [dot] com.