Step 14 – Computer Vision Intro

Computer vision already powers self-driving cars, the gaming world, and numerous other products. Within sport, this technology has already made its way into basketball, football, baseball, soccer, etc. – and has further fueled their respective analytics movements.

One of my favorite articles that encapsulates what this work allows for is from Brian Macdonald. Published in the Harvard Data Science Review, it’s called Recreating the Game, and a single visual from the article closely summarizes what we’re trying to do. You can find his article here:

Now for a quick backstory. I’ve been playing pick-up beach volleyball for the last 10 years or so at West Valley College here in the Bay Area. There’s a group of guys who have been coming out for decades, and last month one of the regulars, a guy named Andrew Tao, was chatting about his week at work. He casually mentioned he works in computer vision at NVIDIA – and my spidey-senses went off. I shot him a note, sharing the above article and asking how hard it would be to do something like this in volleyball… “yeah this is definitely possible. It’s just a bit of engineering work to figure out the easiest way to get reasonable results” – and so we’re off and running!

First and foremost, let me be clear, Andrew is the smart one with the technical chops making all of this happen – I am doing some grunt work where I can be useful. I just happen to have a blog that is read by like…tens of people so I am posting what is overwhelmingly his work. So here we go…


Can we use modern Computer Vision techniques to detect, track, and identify player locations through a match?

If so, could we use that tracking data to inform better playing and coaching strategies? Could it better inform how we quantify player performance?

For example: assess defender quality by how much court a player can cover; determine setter quality by tracking the ball and seeing who keeps attackers in rhythm; see which defensive formations have worked best against specific attackers – and the list goes on and on…

Detection + Tracking

We’re using a state-of-the-art person detector / tracker, which is doing a pretty good job of both detecting and tracking players, as you can see from the GIF.

We are using ByteTrack + FairMOT.

Transform to Top-Down View

To start, we manually annotate the 4 corners of the court. Then we use cv2.getPerspectiveTransform() to compute the mapping from the camera view to the top-down view, and apply it with cv2.perspectiveTransform() to move the detection-box coordinates onto the court.
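For the curious, the math behind that OpenCV call is small enough to sketch in plain NumPy. The corner coordinates below are made up for illustration (a beach court is 8 × 16 m), and `perspective_transform` / `to_top_down` are hypothetical helper names, but the solve is the same one cv2.getPerspectiveTransform performs:

```python
import numpy as np

def perspective_transform(src_pts, dst_pts):
    """Solve for the 3x3 homography H mapping src_pts -> dst_pts.
    Same math as cv2.getPerspectiveTransform: 4 point pairs give
    8 linear equations in the 8 unknowns of H (with H[2,2] = 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_top_down(H, points):
    """Apply H to (x, y) points, dividing out the projective scale."""
    pts = np.hstack([np.asarray(points, float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical annotated pixel corners -> court coordinates in metres.
court_img = [(410, 620), (1510, 620), (1710, 1040), (210, 1040)]
court_map = [(0, 0), (8, 0), (8, 16), (0, 16)]
H = perspective_transform(court_img, court_map)
```

Once H is computed, mapping any on-ground point (say, the bottom of a detection box) to court coordinates is a single matrix multiply plus the perspective divide.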

Naturally, this works well when players are on the ground (the bottom of the detection box sits on the court plane), but it will fail when players jump. This will be part of the ‘Next Steps’ section.

Player Identification

One issue with tracking players is when they disappear from view. Huddles after the point, overlapping blockers at the net pre-serve, and players exiting the field of view all make the problem of re-identification tricky.

Our current strategy is to train a jersey number detector using an off-the-shelf RetinaNet trained on the Street View House Numbers (SVHN) dataset:

As of early December, the results are encouraging! It’s detecting digit locations pretty well but still misclassifies numbers quite a bit; we should be able to supervise away the false positives fairly easily. We’re in the process of adding more in-domain (volleyball-specific) labeled data, which should go a long way toward fixing both issues.

As you can see below, there are some initial challenges with player tracking, specifically the re-id problem. You’ll notice the bounding-box colors change for many players throughout the sequence as the tracker decides it’s seeing a brand new player. This is the problem we’re trying to solve by reading the jersey numbers themselves, so the computer can connect what it saw a few seconds ago with what it’s currently seeing and produce a single track per player as they move around the court.

Another challenge is jitter in the player tracking. If you look carefully, you’ll see the bounding boxes switch from a player’s feet to their kneepads in some frames. This in turn changes the “height” of the bbox, so it looks like the player is rapidly bouncing while standing pretty still. We’ll likely smooth this out with some quick filtering / moving-average stuff, but it makes the initial data pretty noisy. And as you can probably tell, we aren’t capturing the volleyball in any way at this point.
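To give a flavor of the kind of smoothing we have in mind (this is a hypothetical helper, not the project’s actual code), a short running mean over recent box heights already damps the feet-to-kneepads flicker:

```python
from collections import deque

class BoxSmoother:
    """Running mean over the last `window` box heights, to damp
    the frame-to-frame jitter described above."""
    def __init__(self, window=5):
        self.heights = deque(maxlen=window)  # drops oldest automatically

    def smooth(self, height):
        self.heights.append(height)
        return sum(self.heights) / len(self.heights)

smoother = BoxSmoother(window=3)
# Feed in jittery per-frame heights; get back a steadier estimate.
steady = [smoother.smooth(h) for h in [180, 150, 186, 150]]
```

The trade-off is a little lag behind real movement, which is fine here since we mostly care about where a player is standing, not their frame-exact silhouette.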

Next Steps

  • Improve jersey number detector using labeled data
  • Stitch broken player tracks together – likely using jersey numbers to guide this
    • Example: you pick up Player 1 using the detector, you keep track all rally, then the player goes into the huddle and you lose track. Using jersey numbers, you can re-id this person as Player 1 and stitch together the player’s paths.
  • Glue the pieces together and visualize the results. It will take work to fix cases where the tracker gets confused about a player’s ID
  • How do we correct player location when someone jumps?
  • How do we detect the ball? We’ll eventually train a ball detector
  • Automatic court detection
  • Jersey detection confidence metric – we need to know when the jersey number cannot be seen
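The stitching idea in particular is simple enough to sketch. Assuming each track fragment has already been tagged with a jersey number (the fragment format and function name here are made up for illustration, not the project’s actual pipeline):

```python
def stitch_tracks(fragments):
    """Merge broken track fragments that share a jersey number.

    `fragments` is a list of (jersey_number, [(frame, x, y), ...]) pieces
    as the tracker produces them. Fragments tagged with the same number
    are concatenated and re-sorted by frame, giving one path per player.
    """
    merged = {}
    for number, path in fragments:
        merged.setdefault(number, []).extend(path)
    return {number: sorted(path) for number, path in merged.items()}

# Player 12's track breaks at the huddle (frames 2-9 missing) and
# resumes as a "new" fragment; stitching rejoins the two pieces.
fragments = [
    (12, [(0, 1.0, 1.0), (1, 2.0, 2.0)]),
    (7,  [(0, 5.0, 5.0)]),
    (12, [(10, 3.0, 3.0)]),
]
paths = stitch_tracks(fragments)
```

The real problem, of course, is producing those jersey-number tags reliably in the first place, which is why the detector confidence metric above matters: we only want to stitch on frames where the number was actually visible.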


Loosely related, but along the same lines: there’s a “multi-person pose estimator” called AlphaPose that is pretty neat. It’s not directly what we’re trying to do, but I can see the application within the sport-science realm. Monitoring training load, flagging landing postures that may contribute to injury risk, maybe analyzing attack / reception / setting mechanics – those are all things you could use this data for. I’m not sure yet, but I think it’s cool and worth sharing.


So there you have it. Andrew has done some pretty incredible work in the last couple months and I’m super excited to see how things continue to shape up in the future. As things progress, we’ll try and post major updates, and eventually do some analysis on the data that results from all this hard work!