Skill Comparison
Comparing the skill of two forecasts is fundamental to improving forecast systems. Such comparison requires a statistical test to decide whether the difference in skill is significantly larger than expected by random chance. Unfortunately, comparing skill measures like correlation or mean square error is problematic because standard significance tests assume that the skills were computed from independent samples. Skills computed over the same period or validated with the same set of observations, however, are not independent (e.g., forecasts tend to bust at the same time). Applying these tests when the skills are not independent therefore gives incorrect results and can lead to serious biases. Rigorous statistical methods for comparing forecast skill have been developed in the economics literature and in some underappreciated papers in weather and climate prediction. DelSole and Tippett (2014, Mon. Wea. Rev.) review these tests, including the sign test, the Wilcoxon Signed-Rank test, the Morgan-Granger-Newbold test, and permutation tests. On this page, we provide code for these tests and links to interesting applications of these methods.
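As an illustration of the kind of paired comparison these methods perform, here is a minimal sketch of the sign test in Python. The function name, inputs (per-event error magnitudes for each forecast), and the use of an exact binomial p-value are assumptions for illustration, not the code distributed on this page:

```python
from math import comb

def sign_test(errors_a, errors_b):
    """Two-sided sign test for equal forecast skill (illustrative sketch).

    errors_a, errors_b: paired per-event error magnitudes for forecasts
    A and B (hypothetical inputs; any scalar loss works).
    Returns the p-value for H0: on any given event, each forecast is
    equally likely to have the smaller error.
    """
    # Count how often each forecast has the strictly smaller error.
    wins_a = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    wins_b = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    n = wins_a + wins_b  # ties are discarded
    k = max(wins_a, wins_b)
    # Two-sided tail probability of Binomial(n, 1/2) at or beyond k.
    p = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(p, 1.0)
```

Because the test compares the two forecasts event by event on the same cases, it remains valid when the two skill estimates are not independent, which is exactly the situation described above.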
If you use these algorithms and find bugs or improvements, then please let us know.
If you are just getting started or haven't used R or Matlab previously, then we highly recommend that you start with the paper Forecast Comparison Based on Random Walks (see Applications below). The random walk test is so simple there is no need for code: given N events, simply count the number of times forecast A is more skillful than forecast B, and vice versa, then reject the hypothesis of equal skill (at the 5% significance level) if the difference in these two counts falls outside the interval [−2√N, 2√N].
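Although no code is needed, the counting procedure above can be sketched in a few lines of Python. The function name and inputs (per-event error magnitudes for each forecast) are hypothetical; the rejection rule is exactly the ±2√N interval described above:

```python
import math

def random_walk_test(errors_a, errors_b):
    """Random walk test for equal forecast skill (sketch of the rule above).

    errors_a, errors_b: paired per-event error magnitudes for forecasts
    A and B (hypothetical inputs; any scalar loss works).
    Returns True if equal skill is rejected at the 5% significance level.
    """
    n = len(errors_a)
    # Count how often each forecast is more skillful (smaller error).
    wins_a = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    wins_b = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    # Reject if the count difference falls outside [-2*sqrt(N), 2*sqrt(N)].
    return abs(wins_a - wins_b) > 2 * math.sqrt(n)
```

For example, if forecast A has the smaller error on all of 100 events, the count difference is 100, far outside [−20, 20], and equal skill is rejected; if the two forecasts split the events evenly, the difference is 0 and the test does not reject.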
Main references
Software
Applications
Seminar Presentations
