To be fair, predicting the past is easier. It’s already happened. We’re just looking for which metrics help explain it. To its credit, the difference in attack efficiency between two teams in a single set is wildly indicative of who won that set. So yes, we do care about attack efficiency, definitely don’t ignore it…but due to set-by-set variation, even amongst the top teams, we cannot reliably use this metric as a means of predicting the future.
Here’s how we explain the past, using attack efficiency:
These first few steps gather team1 and team2’s attack efficiencies that actually occurred in every unique set they played in. (a, b, c)
From there, we need to know who actually won each of those sets, so that’s what “d” is getting for us. We then merge c & d into a data.frame which has each team’s attack eff & whether they won or lost the set in the same row. We create attack eff diff & then correlate that w/ whether they won or lost, as shown below: 0.73 correlation – pretty strong.
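For anyone who wants to follow along without the original data, here is a minimal sketch of those steps. It is written in plain Python rather than with a data.frame, and everything in it is an assumption for illustration: the six sets are made-up numbers, and `attack_eff` and `pearson` are hypothetical helper names. The only thing taken as given is the usual definition of attack efficiency, (kills - errors) / attempts.

```python
from math import sqrt

def attack_eff(kills, errors, attempts):
    """Usual attack efficiency: (kills - errors) / attempts."""
    return (kills - errors) / attempts

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Made-up sets, one row per set, already "merged":
# (team1 kills, errors, attempts), (team2 kills, errors, attempts), team1_won
sets = [
    ((15, 3, 30), (10, 6, 32), 1),
    ((9, 7, 28), (14, 4, 31), 0),
    ((12, 5, 29), (11, 5, 30), 1),
    ((8, 8, 27), (13, 3, 26), 0),
    ((14, 4, 33), (12, 6, 30), 1),
    ((11, 6, 30), (12, 5, 28), 0),
]

# Attack efficiency differential from team1's point of view, per set
diffs = [attack_eff(*t1) - attack_eff(*t2) for t1, t2, _ in sets]
wins = [won for _, _, won in sets]

# Correlate the differential with the 0/1 win indicator
r = pearson(diffs, wins)
print(f"correlation between attack eff diff and winning: {r:.2f}")
```

Because the win column is just 0s and 1s, this is the same thing as a point-biserial correlation. The toy numbers will not reproduce the 0.73 from the real data.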
In terms of an actual model, because we are looking to classify the outcome as a win or a loss (a binary outcome), we want to use logistic regression here. We build the model and find that the pseudo-R² is 0.64. Loosely speaking, that means about 64% of the variation in a team winning vs. losing a set can be attributed to the differential in attack efficiency between the two teams. (A pseudo-R² is not a literal share of variance the way R² is in linear regression, so treat that as a rough reading.)
This is shown visually in the chart below, where the red logistic curve is our model. The histogram at the top represents all the attack eff differentials at which the team won the set, and the histogram at the bottom represents all the differentials at which the team lost the set.
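The logistic model described above can be sketched roughly as follows. This is not the original code: it fits a one-predictor logistic regression by plain gradient ascent in stdlib Python, on made-up differentials, and it assumes the pseudo-R² in question is McFadden's (1 minus the ratio of the model's log-likelihood to that of an intercept-only model). The post doesn't say which pseudo-R² was used, so that choice is a guess, and `fit_logistic` and friends are hypothetical names.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the model p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * log(p) + (1 - y) * log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.5, steps=20000):
    """Fit intercept b0 and slope b1 by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(e * x for e, x in zip(resid, xs)) / n
    return b0, b1

# Made-up attack eff differentials and set outcomes (1 = won, 0 = lost).
# Wins and losses overlap a little so the fit stays finite.
diffs = [-0.35, -0.25, -0.15, -0.10, -0.05, 0.02, 0.05, 0.12, 0.20, 0.30]
wins = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(diffs, wins)

# Intercept-only ("null") model: predict the overall win rate for every set
p_bar = sum(wins) / len(wins)
ll_null = sum(y * log(p_bar) + (1 - y) * log(1 - p_bar) for y in wins)
ll_model = log_likelihood(b0, b1, diffs, wins)

# McFadden's pseudo-R^2 (assumed definition, see the note above)
pseudo_r2 = 1 - ll_model / ll_null

print(f"intercept={b0:.2f}, slope={b1:.2f}, pseudo-R2={pseudo_r2:.2f}")
print(f"P(win) at +0.10 differential: {sigmoid(b0 + b1 * 0.10):.2f}")
```

The red curve in the chart is just `sigmoid(b0 + b1 * x)` drawn across the range of differentials. In practice you would reach for a library routine rather than hand-rolled gradient ascent; it is spelled out here only to show what the model is doing.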
So that’s really all I have to say about that. Don’t ignore attack efficiency – it’s clearly an important indicator when you look back and reflect on why you’ve won or lost, but don’t make the mistake of thinking your team’s attack efficiency going into your next set has the same ability to predict the future as it does to explain the past.