Untangling the NFL Pt. 3: Defensive Value

Readers of my last column might remember that I watch quite a bit of Survivor. But what you might not know is that it’s far from the only competition reality series that I watch. In fact, before tuning into Thursday Night Football last night, I was listening to a YouTube recap of RuPaul’s Drag Race All Stars Season 3. My wife told me, “You are the only man in the world who could talk for hours about Ben Delacreme and Drake Maye alike.” I think she’s right. 

Thankfully for the rest of you, I dedicate most of my writing time on this website to talking about pro football. So in the greatest transition of all time, let me remind you that over the last two weeks, I’ve researched the history of “value” in football analytics, meaning the various efforts to create a single metric for assessing player performance, and contributed to that history myself on the offensive end. The results, if you remember, ranged from somewhat accurate to total garbage.

In a future column, I’m going to revisit my initial methodology for offense and deeply interrogate its limitations to find a better approach. For today’s piece though, I’m going to delve into the other side of the ball and try to figure out defensive value.

Conceptual Architecture & Data Prep

When I created Total Offensive Metric (TOM) last week, I took nflWAR’s approach of breaking down offense into multiple categories: “Air,” “Rush,” and “Receive,” also adding a fourth one in “Block” to account for offensive linemen and blocking plays. I decided to do something similar for defensive players, once again using end-of-season premium statistics sourced from Pro Football Focus.

This time, however, I organized the data around three key domains of defensive responsibility: rushing the passer (PassRush), stopping the run (RunDefense), and disrupting receivers (Coverage). As before, each dataset contained player-level records for the 2024 regular season, merged on player, team, and position. Together, these make up Total Defensive Metric (TDM), our defensive counterpart to TOM.

PassRush: Player, Team, Position, PassRushSnaps, Sacks, Hits, Hurries, Pressures, WinRate, PRP, PFF_PassRushGrade, PFF_DefenseGrade

RunDefense: Player, Team, Position, RunDefenseSnaps, Stops, Tackles, MissedTackles, MissedTackleRate, StopPercent, ForcedFumbles, PFF_RunDefenseGrade, PFF_DefenseGrade

Coverage: Player, Team, Position, CoverageSnaps, Targets, ReceptionsAllowed, YardsAllowed, TDsAllowed, PasserRatingAllowed, ForcedIncompletions, PBUs, INTs, PFF_CoverageGrade, PFF_DefenseGrade

EDITOR’S NOTE: In this edition, I accounted for players who played for multiple teams by treating each team stint as a separate player instance and then combining those instances at the very end of the process, when generating the individual leaderboard. This is a change from last week, when I ignored this small group of players.
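A minimal sketch of how per-team stints could be folded back into one player row at the end. The column does not say how the instances were combined, so the snap-weighted average here is an assumption, and the player names, teams, and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical stints: (player, team, snaps, score). Not real data.
stints = [
    ("Player Z", "Team X", 300, 0.50),
    ("Player Z", "Team Y", 100, -0.30),
    ("Player W", "Team X", 400, 0.20),
]


def combine_stints(rows):
    """Merge a player's team stints into one row.

    ASSUMPTION: stints are combined with a snap-weighted average, so a
    longer stint counts for more. The column does not specify the method.
    """
    weighted = defaultdict(float)
    snaps = defaultdict(int)
    for player, _team, n, score in rows:
        weighted[player] += n * score
        snaps[player] += n
    return {p: {"snaps": snaps[p], "score": weighted[p] / snaps[p]} for p in snaps}


combined = combine_stints(stints)
for player, row in combined.items():
    print(player, row["snaps"], round(row["score"], 3))
```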

As I did with my features on the offensive side of the ball, I normalized each metric and then adjusted it by snaps in order to compute weighted domain scores in each of the aforementioned categories. To ensure no outliers (like offensive players suddenly lining up on defense) were included, I filtered out players with fewer than 75 pass-rush snaps, 150 coverage snaps, 100 run-defense snaps, and anyone below 200 total snaps.
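The filter-normalize-weight step could be sketched roughly as follows for the pass-rush domain. Several things here are assumptions rather than the column's confirmed method: that the 75-snap floor applies within the pass-rush dataset, that "normalized" means z-scores, and that the snap adjustment scales each score by the player's share of the maximum snap count. The records are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical pass-rush records. Not real data.
players = [
    {"player": "A", "snaps": 400, "pressures": 60, "win_rate": 0.18},
    {"player": "B", "snaps": 300, "pressures": 30, "win_rate": 0.12},
    {"player": "C", "snaps": 90, "pressures": 6, "win_rate": 0.08},
    {"player": "D", "snaps": 40, "pressures": 9, "win_rate": 0.30},  # under the floor
]

MIN_PASS_RUSH_SNAPS = 75  # threshold quoted in the column


def zscores(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def pass_rush_scores(rows, metrics=("pressures_per_snap", "win_rate")):
    # 1. Drop small samples (e.g. an offensive player lining up on defense).
    kept = [dict(r) for r in rows if r["snaps"] >= MIN_PASS_RUSH_SNAPS]
    # 2. Turn the counting stat into a per-snap rate.
    for r in kept:
        r["pressures_per_snap"] = r["pressures"] / r["snaps"]
    # 3. Z-score each metric across the qualifying players.
    z_by_metric = [zscores([r[m] for r in kept]) for m in metrics]
    max_snaps = max(r["snaps"] for r in kept)
    scores = {}
    for i, r in enumerate(kept):
        raw = mean(z[i] for z in z_by_metric)
        # 4. ASSUMPTION: weight by share of the maximum snap count.
        scores[r["player"]] = raw * r["snaps"] / max_snaps
    return scores


scores = pass_rush_scores(players)
print(scores)
```

Note that player D posts the best raw win rate but never enters the pool, which is the point of the snap floor.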

When I examined the weighted domain scores as a whole, I found interesting results for each of the three domains. PassRush had the largest mean (0.125) but was heavily skewed to the positive side, while Coverage was mostly flattened around zero with a much more negative skew (-0.129). RunDefense (0.016), by contrast, was the most stable.

Ridge Regression vs. Teamwide Metrics

After compiling player-level scores for our three defensive domains (normalized by snaps and aggregated to reflect role-specific contributions), the next step was to explore how these domains actually factor into team success. Similar to what I did with offensive composite scores and offensive DVOA, I decided to examine the link between our composite defensive scores and defensive DVOA.

First, I had to roll every player’s weighted domain values up to the team level and compare them with FTN Fantasy’s 2024 team-wide defensive DVOA. Once again, I added sub-scores for pass defense DVOA and rush defense DVOA, making sure to flip the sign of the DVOA values, since negative DVOA indicates a better defense.
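A rough sketch of the team roll-up and the sign flip. The column does not say whether player values were summed or averaged per team, so the plain mean here is an assumption, and the teams and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (team, player domain score). Not real data.
player_rows = [
    ("Team X", 0.40),
    ("Team X", -0.20),
    ("Team Y", -0.10),
    ("Team Y", -0.40),
]

# Hypothetical defensive DVOA in percent; negative means a better defense.
defense_dvoa = {"Team X": -12.0, "Team Y": 15.0}


def team_scores(rows):
    """ASSUMPTION: a team's score is the plain mean of its players' scores."""
    by_team = defaultdict(list)
    for team, score in rows:
        by_team[team].append(score)
    return {team: sum(vals) / len(vals) for team, vals in by_team.items()}


def invert_dvoa(dvoa):
    """Flip the sign so that higher always means better, matching the scores."""
    return {team: -value for team, value in dvoa.items()}


teams = team_scores(player_rows)
target = invert_dvoa(defense_dvoa)
for team in teams:
    print(team, round(teams[team], 3), target[team])
```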

Unsurprisingly, I went with ridge regression again because pass defense and run defense are interdependent. Ridge’s penalty shrinks the coefficients of correlated predictors, which minimizes the chances of accidental coefficient inflation.
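For readers who want to see the mechanics, here is a small closed-form ridge fit in plain Python, solving (XᵀX + αI)β = Xᵀy on centered data. It is a sketch only: the penalty strength `alpha`, the helper names, and the five team rows are all hypothetical, and the column does not say which library or settings were actually used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, alpha=1.0):
    """Return (intercept, coefs) minimizing squared error + alpha * sum(coef^2).

    The intercept is left unpenalized by centering X and y first.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [
        [sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(p)]
    coefs = solve(gram, xty)
    intercept = y_mean - sum(c * mu for c, mu in zip(coefs, col_means))
    return intercept, coefs


# Hypothetical team rows: [PassRush, RunDefense, Coverage] composite scores.
X = [
    [0.30, 0.10, 0.05],
    [0.10, 0.20, -0.10],
    [-0.20, -0.10, 0.00],
    [0.00, -0.30, 0.10],
    [-0.25, 0.05, -0.05],
]
# Hypothetical sign-flipped defensive DVOA (higher = better defense).
y = [9.0, 6.0, -7.0, -5.0, -4.0]

intercept, coefs = ridge_fit(X, y, alpha=0.1)
print(intercept, coefs)
```

Raising `alpha` pulls all three coefficients toward zero, which is the trade ridge makes: a little bias in exchange for coefficients that do not swing wildly when the inputs move together.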

Phase          R-squared   MAE
Pass Defense   0.299       6.96
Rush Defense   0.511       5.15
Overall        0.466       4.68

Although it’s far from fully explanatory, our model still explains roughly 47 percent of the variance in overall defensive DVOA from team-wide composite scores (though it explains less in pass defense, at about 30 percent). When examining the ridge coefficients for each defensive domain against overall defensive DVOA, I found that PassRushScore (3.054) leads the way, with RunDefenseScore (2.676) not far behind and the much noisier CoverageScore (1.306) last.
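As a reminder of what the two fit statistics measure, here is a short sketch of R-squared and mean absolute error. The actual and predicted values are hypothetical and exist only to show the arithmetic.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sign-flipped DVOA values and model predictions. Not real data.
actual = [10.0, -5.0, 0.0, 15.0]
predicted = [8.0, -2.0, 1.0, 11.0]

print(r_squared(actual, predicted))            # 0.88
print(mean_absolute_error(actual, predicted))  # 2.5
```

Read this way, an MAE of 4.68 means the model's prediction of overall defensive DVOA is off by about 4.7 DVOA points on average.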

Does any of this mean that pass rush and run defense matter at least twice as much as coverage in NFL defenses? Of course not. What this model essentially says is that pass rush and run defense tend to be more consistent predictors of team-level defensive outcomes. Basically, it’s easier for this model to detect and quantify the impact of trench play, which is, ironically, the opposite of offense.

Ridge Weighting & Team-Wide Calibration

With the ridge coefficients in hand, I moved back to applying them to individual player outputs, giving each defender composite scores in our domains and making sure to normalize the scores for playing time and positional context. The final result was an early version of TDM, the defensive version of TOM.

These domain scores were then multiplied by their corresponding ridge coefficients to help us restructure defensive performance from the bottom up. In other words, I scaled each player’s contributions by how predictive their particular domain performances were of team performance. Once this was complete, I took the average of these ridge-weighted player values for each team to create an aggregate that could be compared directly with defensive DVOA.
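The weighting step can be illustrated with the three coefficients reported above. Summing the weighted domains into one player value is an assumption (the column only says the scores were multiplied by the coefficients), and both players below are hypothetical.

```python
# Ridge coefficients reported in the column for overall defensive DVOA.
RIDGE_WEIGHTS = {"PassRush": 3.054, "RunDefense": 2.676, "Coverage": 1.306}


def ridge_weighted_value(domain_scores):
    """ASSUMPTION: player value is the sum of coefficient * domain score.

    Domains a player did not qualify for count as zero.
    """
    return sum(w * domain_scores.get(d, 0.0) for d, w in RIDGE_WEIGHTS.items())


# Hypothetical standardized domain scores. Not real players.
edge_rusher = {"PassRush": 1.0, "RunDefense": 0.5}
cornerback = {"Coverage": 1.0, "RunDefense": 0.2}

edge_value = ridge_weighted_value(edge_rusher)
corner_value = ridge_weighted_value(cornerback)
print(round(edge_value, 4), round(corner_value, 4))
```

Under these weights, equal standardized scores do not translate into equal value: a one-unit coverage score is worth less than half of a one-unit pass-rush score.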

On a team-wide basis, our ridge-calibrated TDM correlated strongly with overall defensive DVOA at 0.73. Similarly encouraging were the correlations for pass defense (0.65) and run defense (0.74). Having now closed the loop between player production and team impact, I moved on to disentangling domain interactions and positional effects even further. After all, the goal is to create a single metric to encapsulate defensive value.
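The correlations quoted here are presumably Pearson coefficients, though the column does not name the statistic. A minimal sketch of that calculation, with hypothetical team values:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical team-level TDM and sign-flipped defensive DVOA. Not real data.
team_tdm = [1.2, 0.4, -0.3, -0.9, 0.1]
team_dvoa = [10.0, 2.0, -1.0, -8.0, 3.0]

print(round(pearson_r(team_tdm, team_dvoa), 3))
```

For scale, a correlation of 0.73 squares to about 0.53, so in a simple one-variable fit roughly half of the team-to-team variation in defensive DVOA would line up with TDM.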

The Top 25 Leaderboard

Imagine my exhaustion throughout this process. I was sick of the word “ridge” at this point; I had even forgotten to make myself dinner. But I felt fairly confident that I had taken an even more rigorous approach in my work than the prior week, and I fully expected my Top 25 defensive leaderboard to be good. Maybe it wouldn’t be entirely accurate, but I would learn something about football that I didn’t know before.

In hindsight, I shouldn’t have been too surprised at seeing an entire top 25 dominated by edge rushers and interior defensive linemen. Even still, I felt immensely frustrated. Surely cornerbacks, including the reigning Defensive Player of the Year Pat Surtain II, are not a myth, and strong linebackers matter. In fact, where are the safeties? Is Kyle Hamilton secretly overrated?

I tried my best to tinker everywhere I could. I tried mild positional weights. I experimented with including and excluding different features per player. But ultimately, I felt like I was going in circles. I still kind of do. What’s next for me? Is this just completely doomed?

Next Steps

My original plan in writing this column was to have a rough “Universal Value” score by Week 4. It’s the end of Week 3 right now, and what I have is a very scuffed offensive metric (TOM) stand-in and an even more lopsided attempt at calculating defensive value (TDM). Although it’s been an interesting process, in many ways I feel like I’m back at square one. With that said though, I think I’ve learned a few lessons outside of “calculating value in football is hard.” 

Firstly, I think the “domain” approach I’ve taken in my last two weeks was not the right one. In my follow-up, I think I’m going to try to limit my features and composite scores on a positional basis rather than a snap-based or team-wide one. I also think that my attempt to correlate individual features with team-wide impact falls largely outside the realm of what I should really be examining, which is properly assessing the value of individual contributions. And finally, I am going to try to look at 2021-2024 data rather than just one season (which is too noisy).

Was this whole process a total waste of time? I don’t think so. Sometimes to go forward, you have to go backward. For next week’s column, I’m going to revisit my attempts to capture offensive value, see exactly where I went wrong, and hopefully build a better metric.

Appendix

Project GitHub

Pro Football Focus

FTN Fantasy

Published by EdwinBudding

Anokh Palakurthi is a writer from Boston who is currently pursuing his master’s degree in business analytics at Brandeis University. In addition to writing weekly columns about Super Smash Bros. Melee tournaments, he also loves writing about the NFL, NBA, movies, and music.
