Step 3: Attack Efficiency

The short answer is no: attack efficiency doesn't help us predict the future either. (However, it does help explain the past.) Let's follow the same steps we used for PS% and see why.

1. Find the attack efficiency average for each team
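
As a minimal sketch (assuming a dataframe of attack attempts, here called attacks, with hypothetical columns team, kills, errors, and attempts), the team averages might look like this:

library(dplyr)
# Attack efficiency = (kills - errors) / attempts, averaged per team
team_eff <- attacks %>%
  group_by(team) %>%
  summarize(attack_eff = mean((kills - errors) / attempts))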

2. Grab all the different sets that were played + the two teams and who won the set

3. Merge these two dataframes and create a column for the attack efficiency difference. Again we use the difference because we care about the matchup between the two teams specifically.
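
A sketch of steps 2 and 3 together, assuming a hypothetical sets dataframe (one row per set, with team, opponent, and won_the_set) and the team_eff table from step 1:

library(dplyr)
# Join each team's historical average onto both sides of the matchup
predicted <- sets %>%
  left_join(team_eff, by = "team") %>%
  left_join(team_eff, by = c("opponent" = "team"), suffix = c("_team", "_opp")) %>%
  mutate(attackeff_diff = attack_eff_team - attack_eff_opp)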

3a. If we only used the data from one team to predict the winner of the set, our pseudo-R2 would drop to 0.084 (accounting for 8.4% of the variance…not super helpful).
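
For reference, that one-team version would be something like this sketch (attack_eff_team being the hypothetical column from the merge above):

# One team's average only, ignoring the opponent
glm(won_the_set ~ attack_eff_team, data = predicted, family = binomial)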

4. We have our independent (attackeff_diff) and dependent (won_the_set) variables ready to go; time to plug them into a logistic model.

4a. If you need some clarification on what all this stuff is, there's more detail in this post. I'll likely just cover the SparkNotes version here and moving forward.
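
The fit itself is one line; a minimal sketch, assuming predicted is the merged dataframe from step 3:

# Logistic regression: set outcome as a function of attack efficiency difference
fit <- glm(won_the_set ~ attackeff_diff, data = predicted, family = binomial)
summary(fit)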

library(ggplot2)  # for ggplot()
library(popbio)   # for logi.hist.plot()
# Fitted logistic curve over the raw win/loss outcomes
ggplot(predicted, aes(x = attackeff_diff, y = won_the_set)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)
# Same curve with histograms of the wins/losses overlaid
logi.hist.plot(predicted$attackeff_diff, predicted$won_the_set,
               boxp = FALSE, type = "hist", col = "gray")

5. So we build our model, which you can see in the blue/red curves. It's basically the same as the PS% model: if your historical attack efficiency is higher than your opponent's, we expect you to win the set.

6. Unfortunately, also like the PS% model, when it comes to actually making predictions on data where the model doesn’t already know the outcome…it’s not very useful. We see this in two places.

6a. First: the confusion matrix called by the "table" function right above the summary. 634 TN (true negatives: our model predicted a loss and the actual result was a loss) + 596 TP (true positives: our model predicted a win and the actual result was a win). This gives us an accuracy of 70%. That doesn't sound too bad, but again, would you bet money on this game?
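
A sketch of how that confusion matrix and accuracy come out of the model (the 0.5 cutoff is my assumption):

# Classify a set as a predicted win when the fitted probability exceeds 0.5
pred_class <- ifelse(predict(fit, type = "response") > 0.5, 1, 0)
table(pred_class, predicted$won_the_set)   # confusion matrix
mean(pred_class == predicted$won_the_set)  # accuracy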

6b. Second: the pseudo-R2 value of 0.187 (McFadden). This suggests that our model only accounts for about 19% of the variance in the outcome of the set. Still pretty bad. Better than randomly guessing…but still pretty bad.
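
McFadden's pseudo-R2 compares the model's log-likelihood to that of an intercept-only model. A sketch using the pscl package (my choice here, not necessarily what the original analysis used):

library(pscl)
pR2(fit)["McFadden"]
# Equivalently, by hand:
null_fit <- glm(won_the_set ~ 1, data = predicted, family = binomial)
1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))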

7. So why isn't this model any better? For PS%, the issue was the variability of the data: teams PS within a pretty wide range. Let's see if the same explanation holds for attack efficiency.

8. It looks similar to the wide range we saw when looking at PS%. So while the average for these teams hovers just below 0.300, in any given set a team can range from hitting negative to hitting north of 0.700. Again, the data seems to be too noisy to serve as a great predictor of the future.
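
A sketch of the kind of plot behind that claim, assuming a hypothetical set_effs dataframe with one attack efficiency value per team per set:

library(ggplot2)
# One box per team: the set-to-set spread is what makes prediction hard
ggplot(set_effs, aes(x = team, y = attack_eff)) +
  geom_boxplot() +
  geom_hline(yintercept = 0.300, linetype = "dashed")  # rough average from above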

9. Well, crap. Attack efficiency is one of those metrics that coaches cling to. They've done some loose correlations that say teams that hit better win more. Yes, this is true. But I can't just go tell my guys to hit better…so what do I actually tell them to do? How do I use statistics to help reinforce some behaviors while discouraging others?

10. In the next post, we take a quick sojourn into the world of passer ratings: what they are, why they suck, and how fixing them opens our eyes to a logical evaluation of performance, and maybe, just maybe, helps us inch closer to predicting the future.

Step 4.
