Statistical prediction shows Bradman retired too early

1.1 The decision to retire for sports persons

The decision to retire is a personal one (albeit sometimes influenced by the actions and attitudes of selectors), and while no single age can be identified as the “correct” age to retire, ageing clearly has implications for performance. Moreover, the decision to play on is not without dangers. The older player is often subject to greater scrutiny by the press and the selectors, is given less leeway in terms of performance, and has every failure given great attention. Within test cricket, Michael Hussey falls into this category, with any minor form slump being heralded as “the end of the road”. In Hussey’s case, however, he started test cricket at the age of 30 and so was subject to fewer matches, less touring etc. than some younger players who debuted at an earlier age. This may have helped him to play for longer.

Sir Donald Bradman is regarded as the greatest test batsman in cricket history, retaining that title despite rigorous comparison with other, more recent, test cricketers.[1] Bradman retired in 1948, aged almost 40, at the conclusion of an Ashes series against England, with a test average of 99.94, tantalisingly close to a fabled test batting average of 100.
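The near miss comes down to simple arithmetic. A batting average is runs scored divided by the number of dismissals (a not-out innings adds runs but no dismissal). The sketch below assumes Bradman’s widely published career figures of 6,996 Test runs and 70 dismissals; these numbers are not taken from the TASTATS series used in this article.

```python
# Batting average = runs scored / times dismissed (not-outs add runs but no dismissal).
# Assumed career figures: 6996 Test runs, 70 dismissals (80 innings, 10 not out).


def batting_average(runs: int, dismissals: int) -> float:
    """Return the conventional batting average: runs per dismissal."""
    return runs / dismissals


career_runs = 6996
career_dismissals = 70

final_average = batting_average(career_runs, career_dismissals)
print(f"Average on retirement: {final_average:.2f}")

# Before the final innings (a duck) he had the same runs but one fewer dismissal.
average_before_last = batting_average(career_runs, career_dismissals - 1)
print(f"Average before the final innings: {average_before_last:.2f}")

# Runs needed in the final innings, if dismissed, to finish on exactly 100.
target = 100
runs_needed = target * career_dismissals - career_runs
print(f"Runs needed in the final innings: {runs_needed}")
```

Because the final innings was a duck, it added a dismissal but no runs, which is what pulled the average below 100.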

1.2 What if Bradman had played on?

Bradman’s decision to play on until almost 40, considered quite old at the time, was driven by his desire to captain an Australian cricket team on a tour of England, something he had been denied by the cessation of test cricket during World War II. His last innings was a duck: he was bowled by Eric Hollies on the second ball he faced.[2] He had needed only 4 runs in that innings to finish with an average of 100. He then retired, but what if he had played on? Would he have reached that average? Did Bradman retire too soon? We will never know, but some conclusions can be drawn by examining a statistical prediction model of Bradman’s test batting, built with the TASTATS cricket database developed by Ric Finlay.[3] Some caveats apply. In essence, the methodology is to manipulate the TASTATS database to obtain a time series of test batting averages for a selected player, from their first test match to their last (or most recent, in the case of players still playing), and then:

1. Graph the data and fit a trend line chosen, for each player, from several possible functional forms, including:

  • Linear
  • Logarithmic
  • Polynomial (of various orders)
  • Exponential
  • Power[4]
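As an illustration only, building a running average and fitting the candidate trend forms might look like this in Python with NumPy. It is a sketch, not the author’s Excel workflow: the innings data are invented, the function names are hypothetical, and the exponential and power forms are fitted as straight lines on log-transformed values, which is how we understand Excel to fit them. R-squared is computed on the original scale for every form, so the exponential and power values may differ from what Excel reports.

```python
import numpy as np


def running_average(scores, not_outs):
    """Cumulative batting average (runs / dismissals) after each innings.

    Innings before the first dismissal have no defined average and return NaN.
    """
    runs = np.cumsum(np.asarray(scores, dtype=float))
    dismissals = np.cumsum(~np.asarray(not_outs, dtype=bool))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(dismissals > 0, runs / dismissals, np.nan)


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def candidate_trend_r2(x, y, poly_order=3):
    """Fit each candidate trend form to (x, y); return R-squared on the original scale."""
    lx, ly = np.log(x), np.log(y)  # log-based forms need strictly positive x and y
    fitted = {
        "linear": np.polyval(np.polyfit(x, y, 1), x),
        "logarithmic": np.polyval(np.polyfit(lx, y, 1), lx),  # y = a + b ln x
        f"polynomial({poly_order})": np.polyval(np.polyfit(x, y, poly_order), x),
        "exponential": np.exp(np.polyval(np.polyfit(x, ly, 1), x)),  # ln y = ln a + b x
        "power": np.exp(np.polyval(np.polyfit(lx, ly, 1), lx)),  # ln y = ln a + b ln x
    }
    return {name: r_squared(y, y_hat) for name, y_hat in fitted.items()}


# Hypothetical innings for an invented player (NOT real data); innings 6 is a not-out.
scores = [12, 45, 3, 88, 150, 27, 64, 0, 101, 75, 33, 120]
not_outs = [False] * 12
not_outs[5] = True

avg = running_average(scores, not_outs)
x = np.arange(1, len(avg) + 1, dtype=float)  # innings number 1..n
mask = ~np.isnan(avg)                        # drop innings before the first dismissal
r2 = candidate_trend_r2(x[mask], avg[mask])
for name, value in r2.items():
    print(f"{name:>14}: R^2 = {value:.3f}")
```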

In theory, a polynomial trend line should provide the best fit because:

“A polynomial trend-line is a curved line that is used when data fluctuates. It is useful, for example, for analysing gains and losses over a large data set.” This would appear to suit variations in test scores (although averages are more stable than individual scores).

Yet even here there is discretion in choosing the order of the polynomial, and this requires visual inspection of the raw data. As well, in several cases an alternative form, such as logarithmic, was found to give the best fit or to be better suited to the data.[5]
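One reason the order cannot be chosen by R-squared alone is that, for nested least-squares fits, adding a higher-order term can never reduce R-squared. The sketch below demonstrates this on a synthetic series (a logarithmic trend plus random noise, not cricket data); the helper name is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


# Synthetic series (not cricket data): a logarithmic trend plus noise.
rng = np.random.default_rng(0)
x = np.arange(1, 81, dtype=float)
y = 60 + 10 * np.log(x) + rng.normal(0, 5, x.size)

r2_by_order = {}
for order in range(1, 6):
    # Polynomial.fit maps x onto [-1, 1] internally, keeping higher orders well conditioned.
    p = Polynomial.fit(x, y, order)
    r2_by_order[order] = r_squared(y, p(x))
    print(f"order {order}: R^2 = {r2_by_order[order]:.4f}")
```

R-squared keeps creeping up with the order even though the underlying curve is logarithmic, which is why judgement (or a penalised measure such as adjusted R-squared) is needed.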

In general, a trend line is most reliable when its R-squared value is at or near 1. However, this statistic cannot be regarded as definitive, as the Excel package used does not provide an adjusted R-squared, which takes account of the sample size and the number of predictor variables. Fortunately, in the cases studied here the sample size is large and the number of explanatory variables is low (a single variable, although a polynomial of order k enters it as k terms), so R-squared and adjusted R-squared should be approximately equal and there is no major problem. Finally, the size of R-squared is not independent of the form of trend line chosen. For example, within the polynomial form, the order of the polynomial (2, 3 or 4) will affect the size of R-squared, and a higher order can never lower it.
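The adjusted statistic is easy to compute by hand: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of predictor terms excluding the intercept (p = k for a polynomial of order k). The sketch below applies this to the R-squared values reported in the conclusion (91% for the cubic, 61.34% for the logarithmic fit), assuming n = 104, the innings count in the summary table; the actual number of points in the fitted series may differ.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictor terms (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n = 104  # assumption: one observation per innings listed in the summary table

# Cubic polynomial: three predictor terms (x, x^2, x^3).
print(f"cubic:       R^2 = 0.9100, adjusted = {adjusted_r_squared(0.91, n, 3):.4f}")
# Logarithmic trend: one predictor term (ln x).
print(f"logarithmic: R^2 = 0.6134, adjusted = {adjusted_r_squared(0.6134, n, 1):.4f}")
```

With around a hundred observations, the adjustment moves each fit by less than half a percentage point, consistent with the point made above.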

The method used here would not be the one chosen if we wished to model those factors, apart from age, which influence the batting performance of individual players. For example, to find the determinants of a cricketer’s batting performance, we might use a regression analysis containing such variables as the quality of the opposing team (is he more likely to score against India or England?) and the location of the innings (home or away). These factors would differ for each chosen batsman. By contrast, here we use the common factor of ageing to investigate its effect on the performance of several players.
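For contrast, the alternative approach described above, a regression of innings scores on age and match conditions, could be sketched as an ordinary least-squares fit with dummy variables. Everything here is hypothetical: the data are simulated from known coefficients, the three opponents are unnamed categories, and real scores (non-negative and heavily skewed) would call for more care than plain OLS.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200                                  # number of simulated innings
age = rng.uniform(20, 40, n)             # batsman's age at each innings
home = rng.integers(0, 2, n)             # 1 = home innings, 0 = away
opponent = rng.integers(0, 3, n)         # three unnamed opponents coded 0, 1, 2

# Design matrix: intercept, age, home dummy, opponent dummies (opponent 0 is the baseline).
X = np.column_stack([
    np.ones(n),
    age,
    home,
    (opponent == 1).astype(float),
    (opponent == 2).astype(float),
])

# Simulate toy scores from known "true" effects so the fit can be checked.
true_coef = np.array([30.0, 1.5, 8.0, -5.0, 4.0])
scores = X @ true_coef + rng.normal(0, 1, n)

coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
for name, b in zip(["intercept", "age", "home", "vs opp 1", "vs opp 2"], coef):
    print(f"{name:>9}: {b:6.2f}")
```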

In the analysis, the fitted trend line is extrapolated to predict the player’s average over the next 10 to 20 innings, depending on the reliability of the fit.
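Extrapolating a logarithmic trend is a short calculation once its two coefficients are fitted. The sketch below, with hypothetical function names and a synthetic series, fits y = a + b·ln(x) by least squares over innings 1..n and projects it forward; the 10-innings horizon matches Figure 1.

```python
import numpy as np


def fit_log_trend(x, y):
    """Least-squares fit of y = a + b*ln(x); returns (a, b)."""
    b, a = np.polyfit(np.log(x), y, 1)  # polyfit returns the highest power first
    return a, b


def extrapolate_log_trend(y, horizon=10):
    """Fit a log trend to a series indexed 1..n and project it `horizon` steps ahead."""
    y = np.asarray(y, dtype=float)
    x = np.arange(1, y.size + 1)
    a, b = fit_log_trend(x, y)
    x_future = np.arange(y.size + 1, y.size + horizon + 1)
    return x_future, a + b * np.log(x_future)


# Synthetic running-average series (not real data) covering 60 innings.
rng = np.random.default_rng(1)
history = 70 + 8 * np.log(np.arange(1, 61)) + rng.normal(0, 3, 60)

x_next, forecast = extrapolate_log_trend(history, horizon=10)
for inn, value in zip(x_next, forecast):
    print(f"innings {inn}: projected average {value:.1f}")
```

A logarithmic trend is monotonic, so the direction of any projection is fixed by the sign of b: if the fitted b is positive, the projected average keeps rising, only ever more slowly.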

2.0 What the statistics say

2.1 Sir Donald Bradman
Date of birth: 27/08/1908
Age at debut: 20.27 years
Number of innings: 104
Highest average: 112.29 (29/01/1932)
Average on retirement: 99.94

Figure 1: D.G. Bradman’s test batting average over time, with the trend line projected out 10 more innings

2.2 Conclusion

Several trend line formulations were used. A third-order polynomial obtained a higher R-squared (91%) but did not appear to describe the pattern of the data properly. Therefore, a logarithmic function (shown above) was used, as it appeared to satisfy our criteria better despite its lower fit. This function explained 61.34% of the variation in Bradman’s test average, which is nonetheless more than acceptable (statistically) given the number of observations for a single-variable model. In any case, the choice of model made no difference, as both forecast that Bradman’s average would have risen in the short run (the next 10 innings). In this sense, age did not appear to be reducing his form, and he may have retired prematurely. Had he played on, he would almost certainly have reached a test batting average of 100.


  1. See Borooah, V. and Mangan, J. (2010). “The ‘Bradman Class’: An Exploration of Some Issues in the Evaluation of Batsmen for Test Matches, 1877–2006.” Journal of Quantitative Analysis in Sports, 6(3), 1–21.
  2. See the YouTube video of the dismissal: https://www.youtube.com/watch?v=RG29g-jf7f8
  3. See http://www.tastats.com.au/about-us.htm. Thanks to Ric Finlay for showing the author how to manipulate the database to obtain a time series. However, any errors resulting from the use of these data remain the sole responsibility of the author.
  4. See http://office.microsoft.com/en-au/help/choosing-the-best-trendline-for-your-data-HP005262321.aspx
  5. Stevenson, C. “Tutorial: Polynomial Regression in Excel.” facultystaff.richmond.edu. Retrieved 22 January 2017.
