We Looked at Distance
In the previous analysis of distance, we found a sharp increase in driving distance after the introduction of the Pro V1 golf ball. In the plot below, the red line marks the Pro V1's introduction in October 2000.
Now Enter Accuracy
But what has happened to accuracy off the tee over the same period? Again using data derived from ShotLink and scraped from the PGA Tour's public-facing website with Python, we collected tournament-week data for every player who made the cut since 1980. This dataset adds another piece to the puzzle of an eventual model that can help us identify the features most important to a modern Tour player's success. (Note: the year 2005 is missing from this dataset because of problems scraping that season. All calculations impute the missing 2005 values from the 2004 and 2006 data.)
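The note above does not spell out exactly how 2005 is imputed. A minimal sketch, assuming the simplest reading (each season-level value for 2005 is the midpoint of the 2004 and 2006 values), could look like this; the function names are hypothetical.

```python
def impute_between(yearly, missing_year):
    """Estimate a missing season as the midpoint of the seasons on either side.

    `yearly` maps season -> value (e.g. average fairway accuracy).
    Linear interpolation between the neighboring years is an assumption here,
    not necessarily the method used in the original analysis.
    """
    before = yearly[missing_year - 1]
    after = yearly[missing_year + 1]
    return (before + after) / 2


def fill_missing_year(yearly, missing_year):
    """Return a copy of `yearly`, sorted by season, with the gap filled in."""
    filled = dict(yearly)
    filled[missing_year] = impute_between(yearly, missing_year)
    return dict(sorted(filled.items()))


# Placeholder numbers for illustration only, not actual Tour averages.
print(fill_missing_year({2004: 60.0, 2006: 62.0}, 2005))
```

Interpolating at the season level keeps the yearly trend lines continuous; imputing individual player-weeks would need a different approach, since 2005 tournaments have no direct counterparts in other years.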
Leading up to 2000, technology helped Tour players find the fairway off the tee. In fact, the trend from 1980 to 1999 is one of steadily increasing accuracy off the tee. The sharp decrease began as soon as golf balls started flying farther.
The narrative of distance over accuracy becomes apparent when we view distance and accuracy off the tee together. On average, Tour players got longer at the cost of accuracy. The relationship holds, and is arguably even stronger, for the players who won that week: Tour winners show an even steeper drop in the percentage of fairways hit off the tee, while driving it farther than the average Tour player.
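The winners-versus-field comparison boils down to averaging two columns over two groups for each season. A minimal sketch, assuming each observation is a dict with the hypothetical keys "year", "distance", "accuracy" and "won":

```python
from statistics import mean


def averages(rows):
    """Mean driving distance (yards) and fairway accuracy (%) over a list of rows."""
    return (mean(r["distance"] for r in rows),
            mean(r["accuracy"] for r in rows))


def winners_vs_field(rows, year):
    """Compare the whole field's averages with the week's winners for one season.

    Field names are hypothetical; "won" is True only for the tournament winner.
    """
    season = [r for r in rows if r["year"] == year]
    winners = [r for r in season if r["won"]]
    return {"field": averages(season), "winners": averages(winners)}
```

Running `winners_vs_field` once per season and plotting both series against the year reproduces the kind of side-by-side trend described above. Here "field" includes the winners themselves, matching a comparison against the average Tour player.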
Let's now look at the correlation between driving distance and accuracy. Taking each player who made the cut in a Tour-sanctioned event since 1980, approximately 112,619 observations, we can plot distance against accuracy. Each blue dot represents one player's tournament week, and the histograms along the top and far-right axes show the distributions of distance and accuracy, respectively. More importantly, this combined scatterplot shows the relationship between distance and accuracy. The Pearson correlation coefficient measures the linear co-movement of these two variables. Simply put, how closely do they move together? When distance goes up, how consistently does accuracy go down?
And for the stats nerds out there, the equation for your enjoyment.
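In its standard sample form, writing x_i for the driving distance and y_i for the fairway accuracy of observation i, with x̄ and ȳ their means over all n observations, the Pearson coefficient is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator captures how distance and accuracy deviate from their means together; the denominator scales that co-movement so r always falls between -1 and 1.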
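The following scatterplot highlights an average correlation coefficient of -0.27 between 1980 and 2020. The negative sign confirms that longer drivers tend to hit fewer fairways, and the magnitude points to a weak-to-moderate relationship. Note that a correlation coefficient has no units, so it cannot be read directly as percentage points of accuracy per yard: a value of -0.27 means that a player whose driving distance is one standard deviation above the mean is, on average, 0.27 standard deviations less accurate. Converting that into yards and percentage points requires the regression slope, which equals the coefficient multiplied by the ratio of the standard deviation of accuracy to the standard deviation of distance.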
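Because a correlation coefficient is unitless, turning it into "percentage points of accuracy per yard" takes one more step: multiplying r by the ratio of the two standard deviations gives the least-squares slope. A minimal sketch using only the Python standard library (the function names are illustrative, not taken from the author's repository):

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope_from_r(r, xs, ys):
    """Least-squares slope of y on x, recovered as r * (sd of y / sd of x)."""
    return r * stdev(ys) / stdev(xs)


# Toy data: distance in yards, accuracy in percent (not real Tour numbers).
distance = [250, 260, 270, 280]
accuracy = [75, 70, 65, 60]
r = pearson_r(distance, accuracy)
slope = slope_from_r(r, distance, accuracy)
print(r, slope, slope * 10)  # slope * 10 = change in accuracy per 10 yards
```

Two datasets can share the same r yet have very different slopes if their spreads differ, which is why the slope, not r alone, is what supports a statement like "N percentage points per 10 yards".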
Now, this is all on average, and it would be a stretch to assume that 1980 looks like 2020. Running the numbers for 1980 gives a Pearson coefficient of -0.24, while 2020 gives -0.33. It would be interesting to see how these coefficients change over time.
In the scatterplot below, a Pearson coefficient was calculated for each year, and these coefficients were then plotted over time. A linear trend line shows that, despite fluctuations from year to year, the overall story is that players have been giving up accuracy as they get longer.
For example, the coefficient moved from -0.24 in 1980 to -0.33 in 2020, so the link between gaining distance and losing accuracy off the tee has grown stronger. This makes sense when we think about it for a moment. Players hit it longer, and a 5-degree miss with the driver ends up farther offline at 300 yards than at 250 yards. This is simple geometry: the farther a ball travels along a line angled away from the target line, the farther it finishes from that target line.
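The geometry can be checked with one line of trigonometry: a ball that travels d yards in a straight line at an angle θ off the target line finishes d·sin(θ) yards to the side. A minimal sketch, assuming a perfectly straight ball flight with no curvature or roll:

```python
import math


def lateral_miss(carry_yards, miss_degrees):
    """Yards offline after travelling carry_yards in a straight line
    at miss_degrees away from the target line (no curve or roll modeled)."""
    return carry_yards * math.sin(math.radians(miss_degrees))


for d in (250, 300):
    print(f"{d} yards at 5 degrees: {lateral_miss(d, 5):.1f} yards offline")
```

The same 5-degree miss leaves the ball roughly 21.8 yards offline at 250 yards and 26.1 yards offline at 300 yards, so every extra 50 yards of distance adds more than 4 yards of lateral error.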
While none of these findings is earth-shattering, my hope is that by continuing to explore these PGA Tour statistics, I can make a modest contribution to the golf analytics community.
As always, the code used in this analysis is available in the author's GitHub repository: https://github.com/nbeaudoin/PGA-Tour-Analytics. The author can be found on LinkedIn at https://www.linkedin.com/in/nicholas-beaudoin-805ba738/