So this might be pretty obvious, but I’m going to spend 2 minutes posting this so we’re all on the same page.
Chris Tamas of Illinois reminds his team of the only goal: “three sets, by two points.” That may hold true in a practical sense, but in a statistical sense, beating a team by 15 points is not the same as beating them by 2. When I first went down the rabbit hole of “predicting the future,” I think I got stuck using the same coaching lens as Chris: that winning is technically a binary outcome. You either do it or you don’t.
But when you get into modeling and try to weight some variables over others by their ability to predict the future, the granularity that differentials provide becomes critical. We can always say after the fact that the model predicted you would win by 4 points and therefore classify the result as a W. But we want to build the model itself on the granularity of reality, because knowing that you beat a team by more points might help you home in on why you were able to win by so much. And that’s really the goal of the model: explaining the variance in the data in order to predict the future.
The model & the data we show above is just a simple linear regression. The equation for the line is as simple as what you learned in your first algebra class: y = mx + b. In this case, y is the difference in set score, x is the difference in attack efficiency, m is the slope of the line (which we can see is positive, meaning a larger difference in attack eff means a larger butt kicking of your opponent), and finally b is a constant, the intercept. What this actually looks like is y = 28.5 * attack eff diff + ~0. Since b is basically 0, you can roughly estimate the set score difference as 28.5 * difference in attack eff. In the example, 28.5 * 0.400 = 11.4, or about an 11 point difference.
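If it helps to see that arithmetic as code, here is a minimal Python sketch. The slope of 28.5 and the roughly zero intercept come straight from the line above; the function name is just something I made up for illustration.

```python
# Coefficients quoted in the post for the fitted line y = m * x + b.
SLOPE = 28.5      # m: set score points per 1.000 of attack efficiency difference
INTERCEPT = 0.0   # b: reported as roughly zero, so treated as exactly zero here


def predict_set_score_diff(attack_eff_diff: float) -> float:
    """Return the predicted set score difference for a given attack eff difference.

    Hypothetical helper name; it only evaluates y = m * x + b.
    """
    return SLOPE * attack_eff_diff + INTERCEPT


# The example from the post: out-hitting your opponent by 0.400.
print(round(predict_set_score_diff(0.400), 1))  # 11.4
```

A negative attack eff difference gives a negative set score difference, which is just the same set seen from the losing side of the net.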
Understanding set score differentials will be important in the coming posts, as it will be the “outcome variable” that we are trying to predict given a set of data. The goal is still to predict the future: taking information you know before the first serve and predicting the set score. The key mystery we’re trying to solve is: what is x? What is the variable that has the most explanatory power? We know that attack efficiency difference has an R² of 80%, meaning it explains 80% of the variance in set score differential. But this is after the fact. The attack efficiency differential comes from the same set whose result we are calculating, so we only know it once the set is over.
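For anyone who wants to see where a number like R² comes from, here is a rough Python sketch of fitting a one-variable line by ordinary least squares and measuring how much of the variance it explains. The five data points are made up purely for illustration (they are not real match data and will not reproduce the 28.5 slope or the 80%), and the function names are my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m * x + b for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Share of the variance in ys explained by the line y = m * x + b."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - ss_res / ss_tot


# Made-up sets, for illustration only: x is attack eff diff, y is set score diff.
attack_eff_diff = [-0.300, -0.100, 0.050, 0.200, 0.400]
set_score_diff = [-9, -2, 3, 4, 12]

m, b = fit_line(attack_eff_diff, set_score_diff)
print("slope:", round(m, 2), "intercept:", round(b, 2))
print("R^2:", round(r_squared(attack_eff_diff, set_score_diff, m, b), 3))
```

Swapping in a different x (something you actually know before the first serve) and comparing the R² values is one simple way to ask which variable has the most explanatory power.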
The goal is to tell you who is going to win – not to explain why the winner has won.
We’ll see if we can accomplish this in upcoming posts.