So this might be pretty obvious, but I’m going to spend 2 minutes posting this so we’re all on the same page.

Chris Tamas of Illinois reminds his team of the only goal: “three sets, by two points.” That may hold true in a practical sense, but in a statistical sense, beating a team by 15 points is not the same as beating them by 2. When I first went down the rabbit hole of “*predicting the future*,” I think I got stuck using the same coaching lens as Chris: that winning is technically a binary outcome. You either do it or you don’t.

But when you get into modeling and try to give some variables more weight than others based on their ability to predict the future, the granularity that differentials provide becomes critical. We can always say after the fact that the model predicted you would win by 4 points, and therefore classify the result as a W. But we want the model itself to incorporate the granularity of reality, because beating a team by more points might help you home in on *why* you were able to win by so much. And that’s really the goal of the model: explaining the variance in the data in order to predict the future.

The model & the data we show above is just a simple linear regression. The equation for the line is as simple as what you’ve learned in your first algebra class: y = mx + b. In this case, *y* is the difference in set score, *x* is the difference in attack efficiency, *m* is the slope of the line (which we can see is positive, meaning a larger difference in attack eff means a larger butt kicking of your opponent), and finally *b* is a constant. What this actually looks like is y = 28.5 × attack eff diff + ~0. Since *b* is basically 0, you could roughly estimate that the set score difference will be 28.5 × difference in attack eff. In the example, 28.5 × 0.400 = ~11 point difference.
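To make the arithmetic concrete, here is a minimal Python sketch of that line. The slope of 28.5 comes from the fit described above, and the intercept is assumed to be exactly 0 for simplicity. The function names are made up for illustration, and the win/loss step just shows how a predicted differential can be collapsed into a W or L after the fact.

```python
# Slope quoted in the post; the intercept is assumed to be exactly 0.
SLOPE = 28.5
INTERCEPT = 0.0


def predict_set_score_diff(attack_eff_diff: float) -> float:
    """Estimate the set score differential from the attack efficiency differential."""
    return SLOPE * attack_eff_diff + INTERCEPT


def classify_result(score_diff: float) -> str:
    """Collapse a predicted point differential into a win/loss label."""
    return "W" if score_diff > 0 else "L"


predicted = predict_set_score_diff(0.400)
print(f"{predicted:.1f}")          # 11.4
print(classify_result(predicted))  # W
```

The differential carries more information than the label: an 11-point win and a 2-point win both come out as a W once you classify them.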

Understanding set score differentials will be important in the coming posts, as it will be the “outcome variable” that we are trying to predict given a set of data. The goal is still to predict the future – taking information you know before the first serve and predicting the set score. The key mystery we’re trying to solve is: what is *x*? What is the variable that has the most explanatory power? We know that attack efficiency difference has an R² of 80%, meaning it explains 80% of the variance in set score differential. But this is *after the fact*. We already know what the attack efficiency differential is from the set when we calculate the result.
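If you want to see where a slope and an R² come from, here is a rough sketch that fits y = mx + b by ordinary least squares and computes R². The data is synthetic and made up purely for illustration (it is not real match data). The noise level of 4 points and the helper names are assumptions chosen so the numbers land in the same ballpark as the ones quoted here.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(x, y, m, b):
    """Share of the variance in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic sets, for illustration only: attack efficiency differentials
# paired with noisy set score differentials built around a slope of 28.5.
random.seed(0)
attack_eff_diff = [random.uniform(-0.5, 0.5) for _ in range(200)]
set_score_diff = [28.5 * d + random.gauss(0.0, 4.0) for d in attack_eff_diff]

m, b = fit_line(attack_eff_diff, set_score_diff)
r2 = r_squared(attack_eff_diff, set_score_diff, m, b)
print(f"slope={m:.1f}, intercept={b:.2f}, R^2={r2:.2f}")
```

Note that the *x* values here come from the same sets as the *y* values, which is exactly the after-the-fact problem: the fit explains sets that have already been played.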

**The goal is to tell you who is going to win – not to explain why the winner has won.**

We’ll see if we can accomplish this in upcoming posts.