***Steve Aronson is an assistant coach for girls’ high school and club volleyball in Massachusetts. He is passionate about volleyball and now content to coach, analyze, and watch. With a background in statistics, data science, and machine learning, he strives to raise the analytics bar in volleyball and help others along this journey.***

If we look at a team’s performance metrics over a season, we will see quite a bit of variation (see Figure 1 below). Intuitively, we attribute some of this variation to the strength of the opponent, but can we quantify this effect? That is the goal of this post.

## Assigning opponent strength

I am going to use the Pablo Ratings for the NCAA Women’s D1 2019 season. The Pablo system uses match results to assign points to each team; points accumulate through the season, and the Pablo Rankings are determined by these accumulated totals. For opponent strength, I will use the Pablo points, comparing a team’s points to its opponent’s by taking the difference:

*PointDiff(team, match) = Point(team) − Point(opponent)*

With this definition, *PointDiff* is > 0 when you are rated higher (stronger) than your competition and vice versa. Creighton’s *PointDiff* for the 2019 season is shown in Figure 2. Since most of the values are > 0, Creighton was primarily playing lower-rated teams. The horizontal black line shows the mean *PointDiff* between Creighton and their opponents – as you can see, Creighton averaged just shy of a +1000 *PointDiff* in their matchups.
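As a concrete sketch, the per-match *PointDiff* can be computed from a match log. The team names and Pablo point values below are invented for illustration, not the actual 2019 ratings:

```python
import pandas as pd

# Hypothetical match log: each row is one match from the team's perspective.
matches = pd.DataFrame({
    "team":        ["Creighton", "Creighton", "Creighton"],
    "opponent":    ["TeamA", "TeamB", "TeamC"],
    "team_points": [5800, 5800, 5800],   # illustrative Pablo points
    "opp_points":  [4700, 5100, 6200],
})

# PointDiff(team, match) = Point(team) - Point(opponent)
matches["point_diff"] = matches["team_points"] - matches["opp_points"]

print(matches["point_diff"].tolist())   # [1100, 700, -400]
print(matches["point_diff"].mean())     # season-average PointDiff
```

A positive mean here would correspond to the horizontal black line in Figure 2 sitting above zero.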

## How Well Does PointDiff Correlate to Performance?

Now that we have a metric that represents relative opponent strength, let’s see which performance metrics correlate with it. With this data set, we will look at:

- Match Kill%
- Match Attack Error%
- Match Hitting Efficiency
- Aces Per Set
- Earned Points Per Set (PTS/S)
- Receive Errors Per Set (RErr/S)

Performance definitely improves (increases in Kill%, Efficiency, Aces, and PTS/S, and decreases in errors) as PointDiff increases. Let’s build a regression model for each metric and evaluate the fits. I will make a few assumptions for this model:

- Each team has an intrinsic (average) performance for each metric
  - These mean values will be calculated and subtracted out before running the regression
- All teams have the same relative response to opponent strength
  - Each metric will have one set of regression coefficients for all teams
  - This improves confidence in the regression by using all the data
- Fit each metric separately, but using the same model structure
  - Use a second-order fit to capture the curvature seen in some of the plots in Figure 3
  - Future work will include separate fits for each team to look at team-to-team variations
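The demeaning step in the first assumption can be sketched with a toy frame; the team labels and hitting-efficiency values below are invented:

```python
import pandas as pd

# Two hypothetical teams, three matches each.
df = pd.DataFrame({
    "team":    ["A", "A", "A", "B", "B", "B"],
    "hit_eff": [0.30, 0.24, 0.27, 0.18, 0.22, 0.20],
})

# Subtract each team's season mean so the pooled regression models only
# the response to opponent strength, not overall team quality.
df["hit_eff_centered"] = (
    df["hit_eff"] - df.groupby("team")["hit_eff"].transform("mean")
)

print(df["hit_eff_centered"].round(3).tolist())
```

After centering, every team's values average to zero, which is what lets all teams share one set of regression coefficients.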

## Model Explanation

Using hitting efficiency (HE), the regression model looks like:

*HE(team,i) = HE(team) + he1 ∗ PointDiff(team,i) + he2 ∗ PointDiff(team,i)^2 + ϵ(team,i)*

| Parameter | Description |
|---|---|
| HE(team,i) | Team hitting efficiency for match i |
| HE(team) | Team inherent (average) hitting efficiency |
| he1, he2 | Model coefficients to fit |
| PointDiff(team,i) | Difference in rating between team and opponent for match i |
| ϵ(team,i) | Variability of team for match i |

A similar equation will be used for the other metrics, replacing *HE* and *he* with the appropriate variables.
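A minimal sketch of the second-order fit using `numpy.polyfit`, run on synthetic data generated from known coefficients so the recovery can be checked (all values here are invented, not the post’s fitted numbers):

```python
import numpy as np

# Synthetic centered hitting efficiency vs. PointDiff, built from
# known (invented) coefficients plus match-to-match noise.
rng = np.random.default_rng(0)
point_diff = rng.uniform(-2000, 2000, 300)
he1_true, he2_true = 5e-5, -1e-8
he_centered = (he1_true * point_diff
               + he2_true * point_diff**2
               + rng.normal(0, 0.05, point_diff.size))

# Second-order fit: HE_centered ~ he1*PointDiff + he2*PointDiff^2.
# polyfit returns coefficients highest degree first.
he2_fit, he1_fit, intercept = np.polyfit(point_diff, he_centered, 2)
print(he1_fit, he2_fit)
```

With enough matches pooled across teams, the estimated coefficients land close to the true values despite the per-match noise, which is the point of fitting one shared model for all teams.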

## Regression results

Let’s see how our regressions represent the data. In Figure 4, each plot contains:

- Individual metrics with the mean removed (subtracted)
- Regression fit curves
- Coefficient of determination (R²)

The R² values tell us how much performance variation can be attributed to *PointDiff*.

The fits look reasonable, but the R² values are small (5%–20% of total variation). Opponent strength accounts for some variation, but not the majority.
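To see what an R² in this range looks like, here is a toy quadratic fit on synthetic data where the *PointDiff*-like signal explains roughly a fifth of the variance (all numbers below are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2000, 2000, 500)           # PointDiff-like predictor
y = 5e-5 * x + rng.normal(0, 0.12, x.size)  # weak signal, lots of noise

# Fit the same second-order model, then compute
# R^2 = 1 - SS_residual / SS_total.
coef = np.polyfit(x, y, 2)
pred = np.polyval(coef, x)
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(round(r2, 2))
```

Even with a genuine trend, most of the scatter remains unexplained, which is exactly the picture in Figure 4.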

## What can we do with these correlations?

Looking at an individual team across a season, we can account for some of the performance variation due to the opponent’s strength. The remaining variation we attribute to the team itself. By comparing the variances (I will use standard deviations) of these two components, we get a sense of which contribution has the greater impact.

Reminder of our hitting efficiency model:

*HE(team,i) = HE(team) + he1 ∗ PointDiff(team,i) + he2 ∗ PointDiff(team,i)^2 + ϵ(team,i)*

Let’s break this out into three components:

- Base – your team’s average performance = *HE(team)*
- Opp – contribution due to your opponent = *he1 ∗ PointDiff(team,i) + he2 ∗ PointDiff(team,i)^2*
- Team – your team’s variation = *ϵ(team,i)*

For each match, we can calculate these 3 components and plot this over a season. Then we calculate the standard deviation of each component to weigh its importance.

Note: standard deviations do not add linearly; assuming the two components are uncorrelated, they combine by sum of squares: *σ(hit eff) = sqrt(σ(opp)^2 + σ(team)^2)*
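A quick numeric check of the sum-of-squares rule, using two independent simulated components (the scales are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Independent opponent and team components with different spreads.
opp  = rng.normal(0, 0.03, 10_000)
team = rng.normal(0, 0.06, 10_000)
total = opp + team

# Standard deviations combine by sum of squares, not linearly.
combined = np.sqrt(opp.std() ** 2 + team.std() ** 2)
print(round(total.std(), 4), round(combined, 4))
```

The two printed values agree closely; a linear sum (0.03 + 0.06) would overstate the total spread.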

## Team Season Residual Plot

Continuing to use Creighton as an example, recall the season hitting efficiency plot. In addition to plotting match hitting efficiencies (blue circles) and the team average (black line), I will add residuals (difference between each point and the average). This residual is split into two parts:

- Opponent contribution (purple arrows) – portion that the model predicts based on the opponent strength
- Team contribution (green arrows) – remaining portion that is attributed to factors within the team
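The split into these two arrow components can be sketched as follows; the coefficients, team mean, and match values are all invented for illustration:

```python
import numpy as np

# Assumed fitted coefficients and season mean (illustrative only).
he1, he2 = 5e-5, -1e-8
team_mean = 0.250

# One short season of matches (made-up values).
point_diff = np.array([1100.0, 700.0, -400.0, 1500.0])
hit_eff    = np.array([0.310, 0.270, 0.215, 0.330])

residual  = hit_eff - team_mean                      # actual minus base
opp_part  = he1 * point_diff + he2 * point_diff**2   # model-predicted portion
team_part = residual - opp_part                      # remainder -> the team

print(np.round(opp_part, 4).tolist())
print(np.round(team_part, 4).tolist())
```

By construction the two parts sum back to the residual, so their standard deviations can then be compared match-over-match as in Figure 5.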

On the right are histograms and statistics (means and standard deviations) for:

- Actuals
- Opponent contribution
- Team contribution

Looking at the standard deviations, the opponent contribution is still a small portion of the total variation.

## Distribution of Variances Across All Teams

Now that we have a method to separate the contributions and calculate the standard deviation of each factor, let’s plot the standard deviations for all teams and see how team vs. opponent variations compare. We will do this for all of the statistics.

Each histogram in the plot below contains team level standard deviations – from the team average – as calculated above. I keep the same color scheme where:

- Blue is the standard deviation of the actual metric.
- Purple is standard deviation of the opponent contribution (per the regression model).
- Green is the standard deviation of the remaining contribution which is labeled team contribution.

## Conclusion

Looking at Figure 6:

- The opponent strength contribution to team performance (purple histograms) exists and is measurable
- Opponent strength accounts for only a small percentage of a team’s performance variation. The larger remaining variation (green histograms) comes from internal factors – ones that are not visible in a box score.
- The opponent effect is most pronounced in Kill%, Hitting Efficiency, and Earned Points.

A viable method to quantify the impact of opponent strength on a team’s performance has been developed. The effect is real but small – smaller than I expected. This leads to the next question: what other measurable factors impact performance variation? Can we quantify factors that allow coaches to help manage the peaks and valleys of their team’s performance?