Point Scoring% in the Big Ten

[Figure: distributions of point scoring % per set for each Big Ten team]

Not a sexy topic, but I just figured out how to do these ‘joy division’ charts in R so I’m kinda pumped to share.

What you see is a histogram of each team’s point scoring % in every individual set they played (only against the teams you see listed, so Purdue v. OSU but not Purdue v. Rutgers).

They’re ordered in ascending fashion by their average PS% in these sets. Something that interested me was the shape of the top vs. middle teams. Nebraska and Minnesota seem pretty consistent set to set in how they PS – yet as you work down the chart, you’ll notice some teams flatten out or even have multiple peaks. The latter is especially comical because teams in the middle of the Big Ten could often be described as “dangerous” – sometimes they’re red hot and other times they’re pretty self-destructive. Multiple peaks would certainly play into this narrative, and I’d be interested to see whether other metrics show these same patterns, specifically amongst the middle teams in the conference.

And to answer the question nobody asked, yes, Nebraska had a single set where they point scored at 0% (OSU set4) and one where they PS’d at 73% (PSU set5) – that’s why those outliers give the Nebraska chart wings.
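In case anyone wants to recreate the chart style, a sketch along these lines with the ggridges package gets you most of the way there (the set_results data frame and its columns here are just stand-ins for illustration):

```r
# Minimal sketch of the ridgeline ("joy division") chart.
# Assumes a hypothetical data frame `set_results` with one row per team-set:
#   team   - team name
#   ps_pct - that team's point scoring % in that set
library(dplyr)
library(ggplot2)
library(ggridges)

set_results %>%
  mutate(team = forcats::fct_reorder(team, ps_pct, .fun = mean)) %>%  # order teams by average PS%
  ggplot(aes(x = ps_pct, y = team)) +
  geom_density_ridges(stat = "binline", bins = 20, scale = 1.5, alpha = 0.8) +  # histogram-style ridgelines
  labs(x = "Point Scoring %", y = NULL)
```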


Quick thoughts: serving

Was just messing around with some numbers this afternoon and wanted to share.

I looked at a few things related to serving, specifically serve error%, point score%, and serve output efficiency. I ran correlations among these stats, as well as between each of them and winning the set overall.

As with my last post, I'm only using data from the top 9 in the Big Ten from 2016 so the calculated efficiencies are based on these matches alone.

Serve error% and winning the set came out to -0.150, pretty weak – and a disappointment to parents and fans everywhere who'd like nothing more than for you to quit missing your damn serves.

Winning the set and serve output eff (like pass rating but using the actual efficiencies off each possible serve outcome) clocked in at 0.323.

And serve error% and serve output eff correlated at -0.546, the highest result I found. This seems to reiterate that terminal contacts skew performance ratings. So quit missing your damn serve! But at the same time, missed serves alone are unlikely to be the thing that loses you a set.

Point score% and serve output eff came in at 0.474, which makes a lot of sense – it would be interesting to see if serve output eff is the largest factor in whether you point score or not.

Finally, because everyone likes service errors, I did SE% and point score% which resulted in -0.220. Again, pretty mild – suggesting that while the association is negative, as we'd expect, teams can still point score well even if they're missing some serves.
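If you want to poke at the same questions yourself, the checks are one-liners once you have a set-level table; the sets data frame and its column names below are stand-ins, not my actual file:

```r
# Rough sketch of the correlation checks, assuming a hypothetical data frame
# `sets` with one row per team-set:
#   serve_err_pct    - service errors / total serves in the set
#   ps_pct           - point score % in the set
#   serve_output_eff - average value of the team's serve outcomes
#   won_set          - 1 if the team won the set, 0 otherwise
cor(sets$serve_err_pct, sets$won_set)           # reported above as -0.150
cor(sets$serve_output_eff, sets$won_set)        #  0.323
cor(sets$serve_err_pct, sets$serve_output_eff)  # -0.546
cor(sets$ps_pct, sets$serve_output_eff)         #  0.474
cor(sets$serve_err_pct, sets$ps_pct)            # -0.220
```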

Anyway, just wanted to jot these numbers down before they get lost in a notebook somewhere.

Actionable – which rotation to start?

[Figure: simulated points won % by starting rotation for Nebraska]

Explaining the past and predicting the future are two wildly different concepts. Outcome bias runs rampant in articles and among ESPN talking heads – rationalizing previous behaviors based on results seems to be what these folks are paid to do. After last night’s Game 3 loss to Boston, the commentators criticized LeBron’s play – but had the Cavs pulled it off, they’d be praising him for allowing Love/Kyrie to get their shots and build confidence moving deeper into the postseason. This is why explaining the past isn’t always helpful to coaches: unless the variables are the same, the situation is inherently different from the past.

Our goal shouldn’t be to rationalize successes/failures but to make the most informed decision and understand and live with the fact that you’re playing the percentages. To that extent, we should focus on things we have control over – things that are actionable.

I started writing down a list of things that are actionable in volleyball; the first thing you get to do in the match is decide which rotation you want to start in, given the knowledge that you will be serving or receiving.

I took the 2016 Big Ten conference matches and calculated Point Score and Sideout percentages for each team, in each rotation (points won / all points in PS or SO phase). Next, I used these numbers to build 12 unique sequences, with the first option being serving in Ro1 all the way to starting by receiving in Ro6 – and everything in between.
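For illustration, pulling those per-rotation percentages out of point-by-point data might look something like this (the points data frame and columns are hypothetical):

```r
# Sketch of the per-rotation PS% and SO% calculation, assuming a hypothetical
# point-by-point data frame `points` with columns:
#   team      - team name
#   rotation  - rotation (1-6) the team was in when the rally was played
#   phase     - "PS" if the team was serving, "SO" if it was receiving
#   won_point - 1 if the team won the rally, 0 otherwise
library(dplyr)
library(tidyr)

rot_pct <- points %>%
  group_by(team, rotation, phase) %>%
  summarise(pct = mean(won_point), .groups = "drop") %>%
  pivot_wider(names_from = phase, values_from = pct)   # one row per team/rotation with PS and SO columns
```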

I built a script in R that uses the probability of winning the point (given the PS or SO data) to determine an outcome (won point, lost point) and then takes the appropriate next step (win while serving? serve again using the same PS%. Win while receiving? Rotate and use the PS% of the next rotation). This continues until the team reaches 25 points; the points won are then divided by the total number of points played to get the points won %. The simulation ran 10,000 times and the average points won % was calculated.
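Here’s a rough sketch of that logic rather than the original script, assuming ps and so are length-6 vectors of a team’s PS% and SO% by rotation:

```r
# Approximate sketch of the set simulation (not the original script).
# `ps` and `so` are assumed to be length-6 vectors of PS% and SO% in rotations 1-6.
simulate_set <- function(ps, so, start_rot, start_serving, target = 25) {
  rot <- start_rot
  serving <- start_serving
  pts_won <- 0
  pts_played <- 0
  while (pts_won < target) {
    p <- if (serving) ps[rot] else so[rot]    # win probability for this rally
    won <- runif(1) < p
    pts_played <- pts_played + 1
    if (won) {
      pts_won <- pts_won + 1
      if (!serving) {                         # sideout: rotate, then serve next rally
        rot <- rot %% 6 + 1
        serving <- TRUE
      }                                       # point score: keep serving, same rotation
    } else {
      serving <- FALSE                        # lost rally: receive next, same rotation
    }
  }
  pts_won / pts_played                        # share of total points won
}

# Average over 10,000 simulated sets, e.g. receiving first in Ro1:
mean(replicate(10000, simulate_set(ps, so, start_rot = 1, start_serving = FALSE)))
```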

[Figure: simulated points won % by starting rotation for Rutgers]

What you see on the y-axis is this points won %: across all phases of the game, what percentage of total points the team won when they started in a specific rotation on serve or receive. On the x-axis, the numbers 1-12 represent the 12 potential starting rotation combinations. 1 is serving in Ro1, 2 is receiving in Ro1 – all the way to 11, serving in Ro6, and 12, receiving in Ro6.

[Figure: simulated points won % by starting rotation for Maryland]

What you may notice is that for some teams, serving first is actually better. I couldn’t believe this at first, so I re-ran the simulation. I honestly still don’t get it. One noticeable thing is that for the lower teams in the league (Maryland/Rutgers), the PS and SO numbers are closer together than they are for Michigan State and Nebraska.

[Figure: simulated points won % by starting rotation for Michigan State]

At the end of the day, however, if you look at the y-axis, the difference for Michigan State between starting in serve receive in Ro4 vs. Ro6 is only about a third of a percentage point over the set. This was pretty common amongst all the teams, as you can see.

So there you have it, the first real actionable choice a coach can make in the match – where do we start. A follow up question could naturally be, well what about matching up against strong/weak rotations of your opponent?? Fair point. When I went back and looked at where teams in the Big/Pac started in a set, they actually varied more than I thought. Wisconsin spins the dial all the time to get what they feel are favorable matchups, so if you’re playing them, you wouldn’t dare presume to know where they’re starting. Without this knowledge, you can’t line your own team up appropriately. So unless you want to really roll the dice, stick to what works best for your own team, regardless of your opponent.

Recap. Simulated a single set using PS and SO data from Big Ten 2016 conference matches. Repeated the experiment 10,000 times and took the average percentage of points won in the set. Odd numbers are Ro1 through 6 when you start serving, even numbers are receiving. Teams with PS and SO percentages close together may benefit from serving? Better teams, keep receiving if you can. Since you’re likely to cycle through your lineup about 2.5 times each set on average, does the extra half a percent matter? At 40-50 points played each set, that’s not even 1 full point extra. I’d say no.

p.s. when coaches try to get the “right” matchups, do they ever go back on Sunday and look at the data for how that choice played out? did our blocker really slow down that hitter? did our different serve receive pattern handle their high PS’ing server? trick question. outcome bias. if you used data and made the best decision, then it doesn’t matter.

Offense/Defense by Shot Type

[Figure: attack efficiency by shot type for each team]

So what you see above is how well each team in the Big Ten attacks when they use a variety of shots. Originally I was interested in looking at how well teams both used and defended against the tip, so that’s why they’re ordered in this manner.

As you might expect when looking at full swings, teams like Minnesota, Wisconsin, and PSU really excel. The former two via speed and range while the latter team goes slower and hits harder. But overall, the order of teams in this category isn’t shocking.

Something I wasn’t expecting was how poorly teams score on off-speed rolls. It would be interesting to look at the percentage of these that are off the block for tools rather than just being slow, poor attempts at swings that go awry.

And leading the pack in tipping efficiency, the Fighting Illini. Not necessarily sure what to make of it, but there it is. If you’re playing Illinois, you should probably have your tip coverage figured out. If you’re playing Maryland, don’t worry about it.

[Figure: opponents' attack efficiency by shot type against each team]

Here’s the viz I was even more intrigued by – how well teams defend against each speed of attack. Nebraska separates themselves in this field. They handle both hard-driven attacks and tips at the best rate, and roll shots at the third-best (behind PSU and Minnesota). Moral of the story. Don’t tip on Nebraska? People who have played them recently understand the scrappiness of their backrow, spearheaded by JWO, who departed the program this spring for a summer in Anaheim.

One interesting takeaway from this is the mediocrity of Minnesota’s tip coverage. Personally, I expected a defense built with the understanding that the off blocker dedicates herself to rolling under to cover the tip to perform better than average.

You could of course expand on this concept to turn it into something more actionable. How does Ohio State handle the tip on the Go when you run Go-Slide at them? Or how successful is Wisconsin at tipping on the Slide when they throw Go-3-Slide at you and force your middle to front the quick? Or even from a personnel basis, what’s the average dig quality when you tip against their setter versus their opposite? Having these ideas in your back pocket might be a get-out-of-jail-free card if you find yourself stuck in a tough rotation.
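For reference, the grouping behind both charts is simple; here’s a sketch with a made-up swing-level data frame, using plain (kills minus errors) over attempts, so the exact numbers in the charts may be weighted a bit differently:

```r
# Sketch of the shot-type groupings, assuming a hypothetical data frame
# `attacks` with one row per swing:
#   off_team, def_team - attacking and defending teams
#   shot_type          - "hard", "roll", or "tip"
#   kill, error        - 1/0 outcome flags for the swing
library(dplyr)

# Offense: how efficiently each team attacks with each shot type
offense <- attacks %>%
  group_by(off_team, shot_type) %>%
  summarise(eff = (sum(kill) - sum(error)) / n(), .groups = "drop")

# Defense: how efficiently opponents attack against each team, by shot type
defense <- attacks %>%
  group_by(def_team, shot_type) %>%
  summarise(opp_eff = (sum(kill) - sum(error)) / n(), .groups = "drop")
```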

Thesis. Is. Done.

[Image: Anchorman jump]

Thesis was officially “deposited” yesterday afternoon and boom, it’s all done.

Chad Gordon Thesis-FINAL

I’ll copy and paste the abstract below and you can decide on your own if you’re intrigued enough to read it cover to cover. If I’m honest, I’ve only read it front to back once as so much of the work was just piecing it all together – spending hours with the formatting and citations and all that jazz.

MODELING COLLEGIATE STUDENT-ATHLETE SPORT PERFORMANCE VIA SELF-REPORT MEASURES

ABSTRACT

Purpose: Optimizing athlete performance is the central focus of players, coaches, and support staffs alike. For years, monitoring the stressors encumbering athletes has focused on the injury-risk dimension and has failed to look at sport-specific performance, the ultimate end goal. Self-report wellness measures have shown great promise in this realm and were implemented to track a wider range of metrics, including subjective performance. This study focused on mapping a combination of variables to each athlete’s performance data to better understand the key indicators of our outcome variable on an individual basis. A secondary aim of this study was to uncover trends amongst the team in which certain variables behaved similarly in their relationships with performance.

Methods: Female collegiate volleyball student-athletes (N=16) completed daily wellness monitoring via an online questionnaire. Data from the fall competitive season was collected via Qualtrics© and later regression analysis was performed using R.

Results: Performance of the regression models ranged from an explained variance (i.e., adjusted R²) of 0.23 to 0.90 (i.e., 23-90%) indicating poor to strong results, dependent on the specific athlete as expected. Match-specific players averaged an explained variance in performance (adjusted R²) of 0.66 (66%) while practice-specific players averaged 0.44 (44%). Sleep duration appeared in half of all athlete models though with both positive and negative coefficients. RPE-based training load metrics, daily locus of control, and physical fatigue appeared at the next highest frequencies, respectively, though again the coefficients were not uniformly positive or negative for every athlete. Heart rate variability (HRV) was projected to play a prominent, positive role in athlete performance yet only appeared in two of the regression equations.

Conclusions: As expected, the regression models were quite varied across the athletes. The approach worked better for match-specific players, with nearly two-thirds of the variance in match performance explained by the models on average. This study supports the adoption of a wider range of stressor metrics with specific emphasis on adding a locus of control dimension to monitoring systems. An expanded list of questions may be required to better encapsulate and map second order markers of athlete performance and this work provides additional rationale for tracking stressors outside of the sport-specific context as well as deeper use of cost-effective monitoring tools such as self-report measures to model performance in collegiate student-athletes.

Attackers’ Trends + Visualizing Development

[Figure: attacker output efficiency trends over the 2016 season]

Here’s how four of the key outsides on the top teams in the Big Ten looked from the start of conference play until their respective seasons ended. Output Efficiencies are calculated using data from both the Big Ten and Pac-12 2016 seasons and look at not only the kills/errors/attempts, but also the value of non-terminal swings. In this case OutputEff differentiates between a perfect dig by the opponent and a perfect cover by the attacking team – or a poor dig versus a great block touch by the opponent – etc. In this sense it’s better than traditional “true efficiency” in that it’s not just about how well your opponent attacks back after you attack – it also appropriately weights different block touch, dig, and cover qualities according to their league-average value.
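As a rough illustration of the idea (not the exact weights behind these charts), assigning every swing a value and averaging by player and match might look like this; the data frame, outcome codes, and lookup vector are all hypothetical:

```r
# Rough illustration of the OutputEff idea (not the exact weights used here).
# Assumes a hypothetical data frame `swings` with one row per attack:
#   player, match_date - attacker and match date
#   result             - "kill", "error", or a non-terminal outcome code such as
#                        "perfect_dig", "poor_dig", "block_touch", "cover", ...
# and `outcome_value`, a named numeric vector giving each non-terminal outcome
# its league-average point value estimated from the Big Ten / Pac-12 data.
library(dplyr)

attacker_trends <- swings %>%
  mutate(value = case_when(
    result == "kill"  ~  1,
    result == "error" ~ -1,
    TRUE              ~ outcome_value[result]   # league-average value of that outcome
  )) %>%
  group_by(player, match_date) %>%
  summarise(output_eff = mean(value), .groups = "drop")
```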

What you see above are the trends of these outsides over the course of the season. Foecke continuously improves as the season goes on, as does Haggerty for Wisconsin. Frantti is interesting in that she actually declines up until early November, then turns it on as PSU approaches tournament time. Classic Penn State. If Wilhite hadn’t hit over .600 early in the season, she wouldn’t look like she’s trending down – but you have to keep in mind that her average (just north of .300) kinda blows people out of the water when you look at her consistency.

Personally, while I think this type of stuff is mildly interesting and you can definitely spin a story out of it, it’s not actionable in the sense that it’s going to help a coach make a better decision. However, this same principle could and probably should be applied on an in-season basis to look deeper at the development of players and specific skills. For example, high ball attacking:

[Figure: high ball attacking output efficiency over time]

You could build something like this for every day in practice. If your goal is to pass better, cool, let’s take your data from practice and graph it for the week and see if whatever changes we’re trying to implement have had their desired effect. Or let’s see if the team is improving as a whole as we make these specific changes:

[Figure: Minnesota average passing output efficiency by match date]

*The asterisk on 10/29 is because VolleyMetrics coded both MN matches from that week on the same day, so the date on the file for both says 10/29. That’s why we use Avg. Output Eff.
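A chart like that can be built from a contact-level practice file in a few lines; this sketch uses invented names for the data frame and columns:

```r
# Sketch of a daily skill-trend chart, assuming a hypothetical data frame
# `practice_touches` with one row per contact taken in practice:
#   date  - practice (or match) date
#   skill - e.g. "serve receive"
#   value - the output-efficiency value assigned to that contact
library(dplyr)
library(ggplot2)

practice_touches %>%
  filter(skill == "serve receive") %>%
  group_by(date) %>%
  summarise(avg_output_eff = mean(value), .groups = "drop") %>%
  ggplot(aes(x = date, y = avg_output_eff)) +
  geom_line() +
  geom_point() +
  labs(x = NULL, y = "Avg. Output Eff")
```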

Anyway, there are thousands of ways to implement something like this – and then turn it into something digestible and actionable for the coaching staff.

Which type of serve is best?

Bears. Beets. Battlestar Galactica.

[Figure: distribution of per-match serving performances by serve type]

What you see above is the distribution of serving performances per player per match, broken down by type of serve. This chart is built using Big Ten and Pac-12 conference matches; serving performances with fewer than 5 serves in a match were excluded. 1st ball point score efficiency is the serving team’s wins minus losses when defending the reception + attack of their opponent. It’s basically FBSO eff from the standpoint of the serving team, which is why most of the efficiencies are negative, as the serving team is more likely to lose on that first attack after they serve.
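For what it’s worth, the per-player, per-match numbers behind a chart like this could be built roughly as follows; the data frame, columns, and outcome coding are assumptions for illustration:

```r
# Sketch of the 1st ball point score efficiency calculation, assuming a
# hypothetical data frame `serves` with one row per serve:
#   server, match_id - server and match
#   serve_type       - "jump float", "jump serve", "standing float (close)", ...
#   fb_result        - +1 if the serving team won the first exchange (ace, or
#                      stopping the opponent's first attack), -1 if it lost it
#                      (including service errors), 0 if the rally continued
library(dplyr)

serve_perf <- serves %>%
  group_by(server, match_id, serve_type) %>%
  summarise(n_serves = n(),
            fbps_eff = mean(fb_result),
            .groups = "drop") %>%
  filter(n_serves >= 5)   # drop performances with fewer than 5 serves
```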

You’ll see from the viz that the natural midpoint for all types of serves is around -.250. So the argument then becomes, well if you’re going to average about the same result regardless of what serve you hit, what does it matter? What matters here is the deviation from the mean. If you look at jump floats, it looks like the classic bell-shaped normal distribution graph and if you searched for specific players, you could see how their performances shake out relative to the average of the two leagues. If a player consistently fell below this average, maybe it’s time to develop a new serve or dive deeper into her poor performance.

Jump serving, as you might expect, definitely has a good percentage of players with performances above the mean. However, there’s also a wider distribution in general, and because of this (likely due to increased service error when jump serving) many performances fall far short of league averages. The takeaway here is that while it can be beneficial, the larger standard deviation means you might only want to be jump serving if you need to take a chance against a stronger team.

Standing floats are interesting. Close and far just indicate where the server starts, relative to the endline. Molly Haggerty with Wisconsin hits a “far” standing float while Kathryn Plummer out of Stanford hits her standing float just inches from the endline. Not only is the average for standing floats farther from the endline a little higher (-.243) than standing floats from close to the endline (-.257) but as you can see from the chart, these far away floats are more narrowly distributed, indicating more consistent performance.

While jump floats have the highest average (-.229) and jump serves (-.264) may provide the appropriate risk-reward for some servers, it may actually be these standing float serves from long distance that provide a great alternative if you have a player lacking a nicely developed, above-average serve.

False. Black bear.