# Dependencies in statistical hypothesis tests for climate time series

Statistical hypothesis tests are a crucial tool for supporting a given
conjecture. Such tests have usually been developed under a restrictive
set of assumptions, for example, the independence of successive data
within the time series. Because many weather and climate phenomena are
persistent, the hypothesis of independence is likely invalid for most
problems. A consequence of persistence in a time series is that each new
datum brings less new information about the statistical properties than
it would in a time series of independent data. A frequently used way to
account for this reduced information is to replace the number of data
used in the statistical test with a value smaller than the actual sample
size, the so-called effective sample size. This is the length of a
hypothetical time series of independent data carrying the same amount of
information on the statistical distribution. An estimate of this
effective sample size is commonly derived from a widely used formula
based on the autocorrelation function. In practice, the sample
autocorrelation function is substituted for the true, unknown
autocorrelation function of the time series, although a few studies have
highlighted the volatility of this approach.
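
The summary does not print the formula it refers to. A commonly cited form, given here as an assumption about which formula is meant, writes the effective sample size of a stationary series of length n with autocorrelation rho_k at lag k as:

```latex
n_{\mathrm{eff}} = \frac{n}{1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k}
```

Replacing the unknown rho_k with the sample autocorrelation at every lag is the plug-in step described above as volatile, because the sample autocorrelation is noisy, especially at large lags.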

We propose a more robust technique to estimate the effective sample size for the case of an autoregressive process of order 1 (AR1), a suitable hypothesis for many time series in meteorology and climate sciences. Our method, which involves a two-tiered estimation of the autocorrelation function to refine its estimate, decreases the root mean square error (RMSE) of the estimated effective sample size by as much as a factor of 3 to 4 compared with the original formula for AR1 time series with large persistence (i.e., a large lag-1 autocorrelation coefficient). This new method has been implemented in the statistical programming language R and is distributed as part of the s2dverification package, available from the Comprehensive R Archive Network (CRAN) repository (http://cran.r-project.org/). Time series exhibiting long memory (such as power-law decay autocorrelation) or quasiperiodic behavior, however, may require a more refined treatment.
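
To give a feel for how strongly persistence shrinks the effective sample size, here is a minimal Python sketch for an idealized AR1 process whose lag-1 autocorrelation is known. It assumes the commonly cited formula n / (1 + 2 * sum over k of (1 - k/n) * rho_k), which the summary does not print, together with the standard AR1 property rho_k = rho**k. The function names are hypothetical and are not taken from s2dverification.

```python
def ar1_effective_sample_size(n, rho):
    """Effective sample size of n AR1 samples with lag-1 autocorrelation rho.

    Assumed formula: n / (1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho_k),
    with rho_k = rho**k for an AR1 process.
    """
    if n < 1 or not -1.0 < rho < 1.0:
        raise ValueError("need n >= 1 and -1 < rho < 1")
    weighted_sum = sum((1.0 - k / n) * rho**k for k in range(1, n))
    return n / (1.0 + 2.0 * weighted_sum)


def ar1_effective_sample_size_large_n(n, rho):
    """Large-n approximation of the same quantity: n * (1 - rho) / (1 + rho)."""
    return n * (1.0 - rho) / (1.0 + rho)


if __name__ == "__main__":
    # Compare the finite-n sum with the large-n shortcut for a 50-point series.
    for rho in (0.0, 0.3, 0.6, 0.9):
        exact = ar1_effective_sample_size(50, rho)
        approx = ar1_effective_sample_size_large_n(50, rho)
        print(f"rho={rho:.1f}  finite-n={exact:6.2f}  large-n={approx:6.2f}")
```

With rho = 0 the function returns n itself, and the effective sample size falls as rho grows. In real problems rho is unknown and has to be estimated, which is where the difficulty discussed in this article arises.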

The development of this new method was triggered by the alarming observation that the RMSE of the estimated effective sample size could be as large as 60% of the effective sample size when using the original formula. Furthermore, the effective sample size is mostly overestimated, leading to confidence intervals that are, on average, too narrow by a factor of 2. Although the original formula is valid under a wide range of hypotheses, using the sample autocorrelation in place of the true, unknown autocorrelation function introduces substantial uncertainty in the estimate of the effective sample size. Our approach fits an AR1 model to the full sample autocorrelation function to optimally reduce the uncertainty of the estimated effective sample size.
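
The idea of fitting an AR1 model to the sample autocorrelation function can be sketched in a few lines of Python. This is an illustration only: it is not the s2dverification code and does not reproduce the paper's two-tiered procedure, whose details the summary does not give. The least-squares grid search, the biased autocorrelation estimator, and all function names are assumptions made for this example.

```python
import random


def simulate_ar1(n, rho, seed=0):
    """Draw n values from a zero-mean, unit-variance Gaussian AR1 process."""
    rng = random.Random(seed)
    innovation_sd = (1.0 - rho * rho) ** 0.5
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, innovation_sd))
    return x


def sample_acf(x, max_lag):
    """Biased sample autocorrelation at lags 0..max_lag (lag 0 equals 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def fit_ar1_to_acf(acf, grid_size=1999):
    """Grid-search rho in (-1, 1) minimising sum_k (acf[k] - rho**k)**2, k >= 1."""
    best_rho, best_cost = 0.0, float("inf")
    for i in range(1, grid_size + 1):
        rho = -1.0 + 2.0 * i / (grid_size + 1)
        cost = sum((r - rho**k) ** 2 for k, r in enumerate(acf) if k > 0)
        if cost < best_cost:
            best_rho, best_cost = rho, cost
    return best_rho


def effective_sample_size_from_fit(x):
    """Fit rho to the full sample ACF, then apply the assumed AR1 formula."""
    n = len(x)
    rho = fit_ar1_to_acf(sample_acf(x, n - 1))
    weighted_sum = sum((1.0 - k / n) * rho**k for k in range(1, n))
    return n / (1.0 + 2.0 * weighted_sum), rho


if __name__ == "__main__":
    series = simulate_ar1(200, 0.7, seed=1)
    n_eff, rho_hat = effective_sample_size_from_fit(series)
    print(f"fitted rho = {rho_hat:.3f}, effective sample size = {n_eff:.1f}")
```

Constraining every lag to follow a single rho**k curve means that noise in individual sample autocorrelations, particularly at large lags, cannot feed directly into the estimate. That is the intuition behind using a fitted model rather than the raw sample autocorrelation function.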

A second issue, even more challenging to tackle, comes from the hypothesis of identically distributed data used to derive the original formula. This hypothesis is hardly ever valid in real-world climate or meteorological problems. An illustrative example is the influence of the annual cycle: winter and summer data should be modeled as coming from two different probability distributions. More insidious is the climate change signal, which induces a long-term, and not necessarily monotonic, change in the distribution of most climate variables. The violation of the hypothesis of identically distributed random variables is the source of spurious autocorrelations in most climate time series. To properly address this issue, further investigation is required.

--VIRGINIE GUEMAS [INSTITUT CATALA DE CIENCIES DEL CLIMA (BARCELONA, SPAIN), AND METEO-FRANCE], L. AUGER, F. J. DOBLAS-REYES, H. RUST, AND A. RIBES, "Hypothesis Testing for Auto-Correlated Short Climate Time Series," in the March Journal of Applied Meteorology and Climatology.
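
The spurious autocorrelation caused by non-identically distributed data can be illustrated with a toy Python experiment: independent noise has a lag-1 autocorrelation near zero, but adding a slow trend in the mean makes the same estimator report persistence that is not there. The trend slope, series length, and function names are arbitrary choices for this sketch, and removing a straight line is shown only as the simplest possible remedy, not as the treatment the authors have in mind.

```python
import random


def lag1_autocorrelation(x):
    """Biased sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / c0


def remove_linear_trend(x):
    """Subtract the ordinary least-squares straight line from x."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - x_mean - slope * (t - t_mean) for t, v in enumerate(x)]


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]    # independent data
    trended = [v + 0.03 * t for t, v in enumerate(noise)]  # mean drifts in time
    detrended = remove_linear_trend(trended)
    print("independent noise:", round(lag1_autocorrelation(noise), 2))
    print("noise plus trend: ", round(lag1_autocorrelation(trended), 2))
    print("after detrending: ", round(lag1_autocorrelation(detrended), 2))
```

An effective sample size computed from the trended series would be too small for the noise component, because the estimator mistakes the changing mean for persistence.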

| Title Annotation: | NOWCAST: PAPERS OF NOTE |
|---|---|
| Author: | Guemas, Virginie; Auger, L.; Doblas-Reyes, F.J.; Rust, H.; Ribes, A. |
| Publication: | Bulletin of the American Meteorological Society |
| Geographic Code: | 1USA |
| Date: | Nov 1, 2014 |
