For the RMD version: https://rpubs.com/chadgordon09/step3
The quick answer is no: attack efficiency doesn't help us predict the future either (though it does help explain the past). Let's follow the same steps we used for PS% and see why.
1. Find the attack efficiency average for each team
2. Grab every set that was played, along with the two teams and which one won the set
3. Merge these two dataframes and create a column for the attack efficiency difference (AtkEff_diff). Again, we use the difference because we care specifically about the matchup between the two teams.
4. We have our independent (AtkEff_diff) and dependent (won_the_set) variables ready to go; time to plug them into a logistic model.
4a. If you need some clarification on what all this is, there's more detail in this post. I'll likely just cover the SparkNotes version here and going forward.
5. So we build our model, which you can see as the red curve. It's basically the same as the PS% model: if your historical attack efficiency is higher than your opponent's, we expect you to win the set. That said, there's again a lot of overlap. For example, you could have historically had a lower attack efficiency than your opponent and still have a 40% chance to win.
6. Unfortunately, also like the PS% model, when it comes to actually making predictions on data where the model doesn’t already know the outcome…it’s not very useful. We see this in two places.
6a. First, the confusion matrix (cm) produced by the table function: 1542 TN (true negatives: our model predicted a set loss and the actual result was a loss) + 1586 TP (true positives: our model predicted a set win and the actual result was a win). This gives us an accuracy of 32.2% (accuracy = (TP + TN) / Total). This is still pretty bad; you'd again have a better chance of success by picking the winner blindly.
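6b. Second, the pseudo-R² value of 0.151 (McFadden, right above the confusion matrix output). Loosely read, this suggests our model accounts for only about 15% of the variation in set outcomes (McFadden's R² compares the model's log-likelihood to that of an intercept-only model, so it isn't a literal share of variance). Still pretty bad...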
7. So why isn't this model any better? For PS%, the issue was the variability of the data: a team's PS% swings across a pretty wide range from set to set. Let's see if a similar explanation holds for attack efficiency.
8. This looks similar to the wide range we saw with PS%. While the average for these teams hovers just below 0.300, in any given set it ranges from negative hitting to north of 0.700. Again, the data seems too noisy to serve as a great predictor of the future.
9. Well, crap. Attack efficiency is one of those metrics coaches cling to. They've run some loose correlations showing that teams that hit better win more. Yes, this is true. But I can't just go tell my guys to hit better… so what do I actually tell them to do? How do I use statistics to help reinforce some behaviors while discouraging others?
10. In the next post, we take a quick sojourn into the world of passer ratings: what they are, why they suck, and how fixing them might unlock a method for logically evaluating performance, and maybe, just maybe, help us inch closer to predicting the future.
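The original analysis is in R (see the RMD link at the top). For readers who don't use R, here is a minimal Python sketch of steps 1 to 3 above; it is not a port of that R code. It assumes the standard hitting-percentage definition of attack efficiency, (kills - errors) / attempts, assumes a team's "average" is the mean of its per-set efficiencies, and uses a hypothetical tuple layout for the input. The function names `team_averages` and `build_model_rows` are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def attack_eff(kills: int, errors: int, attempts: int) -> float:
    """Attack efficiency (hitting %): (kills - attack errors) / total attempts."""
    return (kills - errors) / attempts


def team_averages(set_lines):
    """Step 1: average per-set attack efficiency for each team.

    set_lines: iterable of (team, kills, errors, attempts) tuples, one per
    team per set (hypothetical input layout).
    """
    per_team = defaultdict(list)
    for team, kills, errors, attempts in set_lines:
        if attempts > 0:  # skip sets where a team recorded no attacks
            per_team[team].append(attack_eff(kills, errors, attempts))
    return {team: mean(effs) for team, effs in per_team.items()}


def build_model_rows(sets, averages):
    """Steps 2-3: one row per set with the predictor and the outcome.

    sets: iterable of (team, opponent, winner) tuples (hypothetical layout).
    AtkEff_diff = team's average minus opponent's average;
    won_the_set = 1 if `team` won the set, else 0.
    """
    rows = []
    for team, opponent, winner in sets:
        rows.append({
            "AtkEff_diff": averages[team] - averages[opponent],
            "won_the_set": int(winner == team),
        })
    return rows


if __name__ == "__main__":
    lines = [("A", 15, 5, 40), ("A", 13, 3, 40), ("B", 10, 6, 40), ("B", 12, 4, 40)]
    avgs = team_averages(lines)
    print(avgs)
    print(build_model_rows([("A", "B", "A"), ("B", "A", "A")], avgs))
```

Using the difference (rather than each team's raw average) keeps one predictor per set, which is what lets the next step fit a single-variable logistic curve.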
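Steps 4 to 6 (fit a logistic model, then score it with a confusion matrix, accuracy, and McFadden's pseudo-R²) can also be sketched without a statistics library. This minimal Python sketch assumes a single predictor (AtkEff_diff) and a 0.5 probability cutoff for calling a predicted set win. It fits by Newton-Raphson, which targets the same maximum-likelihood estimate that R's `glm(..., family = binomial)` computes via IRLS, but the helpers here (`fit_logistic`, `confusion`, `mcfadden_r2`) are hypothetical, not the author's code. McFadden's R² is conventionally reported on the data the model was fit to, while the confusion matrix is most telling on sets the model hasn't seen.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Fit P(won_the_set = 1) = sigmoid(b0 + b1 * AtkEff_diff) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # Fisher information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(info) @ gradient, 2x2 inverse by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_lik(ps, ys):
    """Bernoulli log-likelihood of outcomes ys under probabilities ps."""
    eps = 1e-12  # guard against log(0)
    return sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
               for p, y in zip(ps, ys))


def confusion(xs, ys, b0, b1, threshold=0.5):
    """Confusion-matrix counts and accuracy = (TP + TN) / total."""
    preds = [int(sigmoid(b0 + b1 * x) >= threshold) for x in xs]
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 1)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(ys)}


def mcfadden_r2(xs, ys, b0, b1):
    """McFadden pseudo-R^2: 1 - LL(model) / LL(intercept-only model)."""
    ps = [sigmoid(b0 + b1 * x) for x in xs]
    base_rate = sum(ys) / len(ys)
    ll_null = log_lik([base_rate] * len(ys), ys)
    return 1.0 - log_lik(ps, ys) / ll_null
```

In use, you would call `fit_logistic` on a training subset of the rows from steps 1 to 3, report `mcfadden_r2` on that same subset, and run `confusion` on the held-out sets.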
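Step 8's point about noise comes from looking at one team's set-by-set attack efficiency instead of its season average. A minimal Python sketch of that check, assuming per-set (kills, errors, attempts) tuples for a single team (a hypothetical layout) and the same hitting-percentage definition as above; `set_level_spread` is an illustrative name. A large standard deviation, with a min/max range running from negative values up past 0.700, is the kind of spread step 8 describes.

```python
from statistics import mean, stdev


def attack_eff(kills: int, errors: int, attempts: int) -> float:
    """Attack efficiency: (kills - attack errors) / total attempts."""
    return (kills - errors) / attempts


def set_level_spread(set_lines):
    """Summarize how much one team's attack efficiency swings from set to set.

    set_lines: list of (kills, errors, attempts) tuples for a single team,
    one per set (hypothetical layout).
    """
    effs = [attack_eff(k, e, a) for k, e, a in set_lines if a > 0]
    return {
        "mean": mean(effs),
        "stdev": stdev(effs),  # sample standard deviation; needs at least 2 sets
        "min": min(effs),
        "max": max(effs),
    }
```

Comparing `stdev` against the gaps between team averages gives a rough sense of why a season average struggles to call any single set.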