NCAR'S REAL-TIME CONVECTION-ALLOWING ENSEMBLE PROJECT: NCAR's real-time convection-allowing ensemble project began as an experimental forecast demonstration to advance research in high-resolution ensemble design and evolved into a considerable outreach effort uniting research and operational meteorologists.
Between 7 April 2015 and 30 December 2017, the U.S. National Center for Atmospheric Research (NCAR) produced real-time, 48-h, 10-member ensemble forecasts with 3-km horizontal grid spacing over the entire conterminous United States (CONUS). These forecasts were colloquially known as the "NCAR ensemble."
The NCAR ensemble project was conceived in late 2014. At this time, although previous efforts had demonstrated high-resolution convection-allowing ensembles (CAEs) provide useful forecast guidance (e.g., Schwartz et al. 2010; Clark et al. 2012; Evans et al. 2014), the United States lacked an operational CAE and many questions remained about optimal CAE configuration. Thus, the NCAR ensemble project was developed primarily to further fundamental research about CAE design, with additional goals to showcase innovative methods of CAE visualization; produce a continuous, long-term dataset of publicly available CAE output; use the long-term statistics to identify systematic model errors; and demonstrate a next-generation CAE prototype that could be operationally implemented over the CONUS.
Ideally, ensembles, including CAEs, should be designed such that they are easy to maintain and postprocess, produce forecasts with simple probabilistic interpretations, enable straightforward identification of systematic biases, account for known error sources, and possess initial condition (IC) perturbations that reflect both forecast model configurations and day-to-day variations in meteorological uncertainty (Table 1). Many of these properties can be achieved by designing an ensemble that employs continuously cycling data assimilation (DA) to produce ICs and follows the principle of equal likelihood, where, over many cases, each member's forecast is equally likely of representing the "true" atmospheric state. (1)
In an equally likely ensemble, all members have identical physical parameterizations and dynamics and are initialized from states that are themselves equally likely closest to truth; equally likely limited-area ensembles further have lateral boundary conditions (LBCs) drawn from equally likely states. Adhering to equal likelihood engenders many beneficial properties, such as easy maintenance, post-processing, and statistical interpretation (Table 1). Furthermore, equally likely ensembles have members with similar model climatologies that avoid the deleterious clustering (e.g., Fig. 1a) that often occurs in unequally likely ensembles composed of members with varied physics and dynamics (e.g., Johnson et al. 2011; Gowan et al. 2018). An equally likely ensemble can be considered "formally designed."
Other desirable properties can be obtained by using continuously cycling DA to produce equally likely, flow-dependent IC perturbations that reflect forecast model configurations (Table 1). In contrast to randomly produced IC perturbations, flow-dependent IC perturbations yield larger spread in areas of greater forecast uncertainty, like those collocated with sharp gradients (Figs. 1b-d). Moreover, by leveraging the forecast model itself to create ICs, continuously cycling DA systems can produce IC perturbations spanning the full range of scales resolvable by the forecast model, promoting good error growth characteristics. Conversely, when ensembles are initialized from analyses provided by unrelated external numerical weather prediction (NWP) models, mismatches between the forecast model and external ICs regarding biases, scale representation, physics, or dynamics can produce undesirable error growth characteristics that limit usefulness of short-range forecasts (e.g., Raynaud and Bouttier 2016; Schwartz 2016; Gustafsson et al. 2018).
Although most global ensembles are formally designed and possess all the desirable properties (e.g., Bowler et al. 2008; Buizza et al. 2008; Zhou et al. 2017), CAEs are comparatively new, and, in part owing to computational constraints, real-time CAEs preceding the NCAR ensemble featured relatively ad hoc configurations. For example, the first real-time CAE forecasts, produced by the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma in spring 2007, were initialized from an external model and featured varied physics across 10 members (Kong et al. 2007; Xue et al. 2007). CAPS continued to produce real-time, multiphysics CAE forecasts in subsequent years during springtime, while other American institutions developed year-round real-time CAEs that were also informally designed: both the U.S. Air Force Weather Agency's (now the 557th Weather Wing) high-resolution CAE (Hacker et al. 2011; Kuchera et al. 2014) and the National Oceanic and Atmospheric Administration (NOAA) Storm Prediction Center's (SPC) Storm-Scale Ensemble of Opportunity (SSEO; Jirak et al. 2012) were amalgamations of independent deterministic forecasts featuring varied physics and IC sources, and NOAA's National Severe Storms Laboratory's CAE (Gallo et al. 2017) used ICs from several unequally likely independent external sources, including analyses and short-term forecasts. Similarly, European CAEs preceding the NCAR ensemble, including operational CAEs in Germany (Gebhardt et al. 2011; Peralta et al. 2012) and the United Kingdom (Hagelin et al. 2017), relied on external analyses to obtain IC perturbations. Nonetheless, although the designs of these CAEs meant they did not possess all the desirable properties (Table 1), these collective efforts were highly successful and provided valuable forecast guidance and cutting-edge research datasets.
However, increased computing power and advances regarding limited-area DA have recently enabled development of more cohesive CAEs, with hopes of eventually realizing performance improvements. In particular, continuously cycling ensemble Kalman filters (EnKFs; Evensen 1994; Houtekamer and Zhang 2016) show promise for CAE initialization (e.g., Schwartz et al. 2014) because they produce an ensemble of flow-dependent IC perturbations centered about a high-quality ensemble mean analysis in a single step and offer a means toward equally likely CAE IC perturbations that reflect forecast model configurations (see "Continuously cycling EnKF DA" sidebar).
Among earlier efforts using continuously cycling EnKFs to initialize real-time convection-allowing forecasts were those for tropical cyclone prediction and field campaign support (Cavallo et al. 2013; Barth et al. 2015). Real-time EnKF-initialized CAE forecasts were introduced by Texas Tech University in 2012 and NCAR (Schwartz et al. 2015a) in May and June 2013; the two systems had much in common. Notably, the NCAR CAE demonstration occurred in support of the Mesoscale Prediction Experiment (MPEX; Weisman et al. 2015), where the forecasts were evaluated daily. These forecasts provided valuable guidance during MPEX and produced objectively skillful forecasts (Schwartz et al. 2015a), which motivated planning for a longer-term, real-time demonstration at NCAR again focusing on CAEs initialized from continuously cycling EnKF analyses. This longer-term effort became the NCAR ensemble project.
NCAR ENSEMBLE CONFIGURATIONS AND POTENTIAL LIMITATIONS. The NCAR ensemble was designed to possess many desirable properties (Table 1) and consisted of separate but related analysis and forecast components, where the analysis component provided equally likely ICs for the forecast component. As Schwartz et al. (2015b) comprehensively described the configurations, the documentation provided here is brief.
Configurations. As in NCAR's MPEX demonstration, the analysis component consisted of continuously cycling EnKF DA (see "Continuously cycling EnKF DA" sidebar) using the ensemble adjustment Kalman filter (Anderson 2001) implemented in the Data Assimilation Research Testbed (DART; Anderson et al. 2009) software. New EnKF analyses were produced every 6 h by updating ensembles of 6-h forecasts through assimilation of a set of conventional observations (Fig. 2a) subjected to numerous quality-control measures (Schwartz et al. 2015b). Initially, the EnKF DA system had 50 ensemble members (Schwartz et al. 2015b), but on 2 May 2016, the ensemble size was increased to 80 to bolster system performance. Also beginning 2 May 2016, assimilation of global positioning system radio occultation (GPSRO) refractivity observations commenced and a "spread restoration" algorithm in DART was activated to slightly increase analysis spread. Use of continuously cycling DA meant the analysis system was self-contained, with data from an external NWP model required solely for boundary condition updates.
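To make the ensemble adjustment idea concrete, the following is a minimal sketch of a deterministic update for a single observed scalar, in the spirit of the observation-space formulation of Anderson (2001). It is not the DART implementation: the function name `eakf_scalar_update`, the five-member background, and the observation values are hypothetical, and the regression of increments onto model state variables, localization, and inflation are all omitted.

```python
import numpy as np

def eakf_scalar_update(prior, obs, obs_var):
    """Adjust a prior ensemble of one observed quantity toward an observation.

    prior   : 1D array of background ensemble values of the observed quantity
    obs     : observed value
    obs_var : observation error variance
    Assumes the prior ensemble has nonzero sample variance.
    """
    prior = np.asarray(prior, dtype=float)
    prior_mean = prior.mean()
    prior_var = prior.var(ddof=1)  # sample variance of the background ensemble

    # Product of Gaussian prior and Gaussian likelihood: precisions add.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)

    # Shift members to the new mean and contract deviations deterministically,
    # so the updated ensemble has exactly the posterior mean and variance.
    return post_mean + np.sqrt(post_var / prior_var) * (prior - prior_mean)

# Hypothetical 2-m temperature background (K) that is warmer than the observation.
background = np.array([285.0, 286.0, 287.0, 288.0, 289.0])
analysis = eakf_scalar_update(background, obs=285.0, obs_var=1.0)
increment = analysis.mean() - background.mean()  # analysis minus background
```

Because the hypothetical background is warmer than the observation, the ensemble mean increment is negative, and the analysis spread is smaller than the background spread.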
The analysis component ran over a domain with 15-km horizontal grid spacing covering the CONUS and adjacent areas, while the forecast component ran with a 3-km horizontal grid spacing nest embedded within the 15-km analysis domain (Fig. 2b). All ensemble members in both components used version 3.6.1 of the Advanced Research Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008; Powers et al. 2017) with 40 vertical levels, a 50-hPa top, and identical dynamics and physical parameterizations (Schwartz et al. 2015b), except no cumulus parameterization was used on the 3-km domain. (2) Unique LBC perturbations were derived for each ensemble member in both the analysis and forecast components by adding random, correlated, Gaussian noise to NCEP's Global Forecast System (GFS) analyses and forecasts (Torn et al. 2006; Schwartz et al. 2015b), yielding equally likely LBCs for all members.
For each 0000 UTC DA cycle between 7 April 2015 and 30 December 2017, 15-km analyses from 10 members were downscaled onto the 3-km grid to initialize 48-h forecasts on the nested 15- and 3-km domains (Fig. 2b). These 48-h, 10-member forecasts containing the 3-km nest required 5,120 processor cores for 4 h on dedicated nodes of NCAR's Yellowstone supercomputer, and the 3-km forecasts were the end result of the system. The choice to initialize the 3-km, 48-h forecasts solely at 0000 UTC with just 10 members was mostly driven by computational constraints; nonetheless, 10-member CAEs can provide useful probabilistic forecasts (e.g., Clark et al. 2011, 2018; Schwartz et al. 2014). Latency was approximately 4 h, with daily NCAR ensemble forecasts beginning around 0400 UTC, and the 48-h forecasts were typically complete by 0800 UTC.
Given the continuously cycling analysis system, CAE forecasts with 50 (or 80) members could have been initialized every 6 h; in fact, during the 13-14 March 2017 northeast U.S. blizzard, a special real-time, 1200 UTC, 10-member, 3-km forecast was produced. Moreover, analyses could have been performed more frequently (e.g., every hour). Thus, given sufficient computing, the NCAR ensemble analysis system framework could be useful for future rapidly updating or "on demand" CAE forecasts.
Potential limitations. Given the randomly produced LBCs, common physics and dynamics across all members, and ICs drawn from equally likely states provided by EnKF analyses, the NCAR ensemble forecasts conformed to the principle of equal likelihood (3) and to our knowledge represented the first formally designed, year-round, real-time CAE spanning the entire CONUS. Additionally, application of continuously cycling EnKF DA yielded flow-dependent IC perturbations. However, because 15-km analyses initialized 3-km forecasts, the ICs were inconsistent with the forecast model, meaning the 3-km forecasts were prone to "spinup" errors where small-scale structures require several hours to develop from comparatively coarse-resolution ICs. Nonetheless, when the NCAR ensemble project began, use of real-time, continuously cycling EnKF DA to derive flow-dependent IC perturbations was uncommon and a step toward designing a CAE over the full CONUS possessing all the desirable properties (Table 1).
Although the NCAR ensemble had many attractive properties and experiences during MPEX suggested the forecasts would be credible, there were risks associated with the NCAR ensemble configurations. In particular, model imperfections engender biases that perpetually remain in limited-area continuously cycling DA systems and can degrade forecasts (e.g., Hsiao et al. 2012; Torn and Davis 2012; Romine et al. 2013). Moreover, EnKFs have limitations when ensemble sizes are relatively small, and techniques to address these limitations are still under development (e.g., Houtekamer and Zhang 2016). Furthermore, most single-physics, single-dynamics CAEs are spread deficient (e.g., Duc et al. 2013; Schumacher and Clark 2014; Schwartz et al. 2014; Hagelin et al. 2017), and incorporating physics or dynamics diversity can increase CAE spread (e.g., Schumacher and Clark 2014; Johnson and Wang 2017) to yield forecasts with improved verification metrics.
Therefore, using continuously cycling EnKF DA to initialize a single-dynamics, single-physics CAE was accompanied by the knowledge that the forecasts might not be as objectively good as possible, as measured by common performance statistics. Nonetheless, these configurations were chosen to advance research regarding storm-scale ensemble design and to further understanding of CAE forecasts initialized from continuously cycling EnKFs so that they can be improved.
THE NCAR ENSEMBLE WEBSITE. NCAR ensemble forecast data were primarily disseminated through the Internet via graphical products, which depicted a diverse collection of meteorological fields over eight regional subdomains and a CONUS-wide domain. During forecast integration, when output at a particular forecast hour was available from all members, graphics were generated for that hour using Python-based mapping and visualization tools (e.g., matplotlib, basemap, SciPy, NumPy). Graphics were uploaded to the project website (4) immediately upon creation and displayed using a collection of rollover links, enabling quick and easy visualization of temporal evolution of forecast fields. Approximately 100,000 graphics per forecast were generated and are archived for retrospective analysis.
Common ensemble-based products, such as ensemble means, probabilities, standard deviations, maxima, and minima, were plotted, in addition to novel diagnostics and fields primarily related to convective storms and winter weather. For example, "paintball plots" (Fig. 3a) and neighborhood maximum ensemble probabilities (Ben Bouallegue and Theis 2014; Golding et al. 2016; Schwartz and Sobash 2017), which are probabilities of event occurrence within a distance of a location, were used for precipitation accumulations and various convective storm diagnostics including updraft helicity (UH; Kain et al. 2008), a useful field for detecting rotating thunderstorms in high-resolution NWP models (Fig. 3b).
While website visitors mostly viewed forecast data, the NCAR ensemble website also displayed real-time products related to the analysis system, including objective performance statistics. Furthermore, near-real-time forecast verification information was provided, in part by overlaying observed rawinsonde profiles atop ensemble skew T diagrams and observed reflectivity atop forecast reflectivity products (e.g., Fig. 3a). In addition, National Weather Service (NWS) severe weather warning polygons were overlaid on ensemble forecasts of severe weather surrogates (Sobash et al. 2016a) to provide further verification information (Fig. 6), which revealed the NCAR ensemble was excellent at highlighting areas that were warned for severe weather over 24-h periods.
Given its layout, functionality, innovative tools, near-real-time verification, eye-catching graphics, and novel CAE forecast products, the NCAR ensemble website arguably broke new ground for a website presenting CAE output and sets a benchmark for future websites devoted to displaying CAE data. Respondents to an informal survey were remarkably consistent in highly ranking the website as among the most important components of the project (Fig. 7a).
COMMUNITY IMPACTS. The NCAR ensemble's cutting-edge web interface fostered two-way communication between NCAR and the community and heavily contributed toward the success and longevity of the project, which had wide-reaching impacts touching educational, operational, and research communities in both the public and private sectors. Google Analytics statistics indicated the project website was visited by over 40,000 unique users and received approximately 840 page views each day, and of those who responded to an informal survey, 75% visited the website at least once per week (Fig. 7b). Most users had favorable impressions of forecast spread and skill (Figs. 7c,d), which bolstered community usage.
Students and faculty frequently visited the NCAR ensemble website, reflecting use of the NCAR ensemble in classroom activities. Although daily or weekly map discussions represented the primary method of bringing the NCAR ensemble into the classroom, courses at the University of Utah (J. Steenburgh 2016, personal communication) and the University at Albany, State University of New York (SUNY), built class projects around evaluating NCAR ensemble output. This exposure provided students with experiences examining what a future, formally designed, operational CAE might look like and introduced them to challenges and opportunities associated with interpreting high-resolution probabilistic forecasts.
Additionally, although the NCAR ensemble forecasts were experimental, they were quickly embraced by operational NWS forecasters (see sidebar on "The NCAR ensemble in NWS area forecast discussions"). Thus, to facilitate operational use of the forecasts, gridded model output [in the form of gridded binary (GRIB) files] was provided in real time to the SPC and several NWS Forecast Offices (NWSFOs) for display in workstations. Website statistics indicate NOAA and NWS employees were responsible for at least 33% of all sessions, and forecasters from 74 unique NWSFOs mentioned the NCAR ensemble approximately 1,000 times in area forecast discussions, indicating widespread operational adoption of the NCAR ensemble as a guidance tool (see "NCAR ensemble in NWS area forecast discussions" sidebar). That operational forecasters embraced these experimental products suggests the NCAR ensemble filled a void, and although difficult to quantify, it appears that operational enthusiasm for the NCAR ensemble project may have accelerated CAE developments at NCEP.
Gridded NCAR ensemble output was also provided in real time to university, government, and private-sector researchers, who used the data for many purposes. For example, the NCAR ensemble was evaluated in model intercomparison studies concerning an East Coast winter storm (Greybush et al. 2017) and high-elevation snowfall over the western CONUS (Gowan et al. 2018). Moreover, NCAR ensemble performance was assessed alongside other convection-allowing models in the 2015-17 NOAA Hazardous Weather Testbed Spring Forecast Experiments (e.g., Gallo et al. 2017; Clark et al. 2018). The NCAR ensemble was also used to explore numerical guidance products for tornado forecasting (Sobash et al. 2016b), demonstrate high-resolution model verification principles (Schwartz 2017; Schwartz and Sobash 2017), assess CAE forecasts of convective mode evolution (Carlberg et al. 2018), and investigate the diurnal cycle of low-level winds over the Great Plains (Bluestein et al. 2018).
Furthermore, the NCAR ensemble was used in postprocessing studies, including machine learning applications (Gagne et al. 2017), Bayesian linear regression (e.g., Yang et al. 2017), and adjustment of satellite-estimated rainfall (Zhang et al. 2018). Additionally, experimental power-outage prediction models were driven by NCAR ensemble output (Cerrai et al. 2016), and the NCAR ensemble was used to produce specialized real-time products regarding heavy rainfall over the Denver urban drainage and flood control district (Smirnov et al. 2018) and debris flows at Mount Rainier National Park (S. Beason 2018, personal communication).
A myriad of other topics could be explored using the NCAR ensemble data archive, which contains gridded products most relevant to high-impact weather prediction from all 10 members and is freely available to the public through NCAR's Research Data Archive (https://doi.org/10.5065/D68C9TZ3), providing unprecedented, easy access to a multiyear CAE dataset. Moreover, ICs, LBCs, model configuration files, and source code are available from the authors such that NCAR ensemble forecasts can be reproduced.
ANALYSIS SYSTEM PERFORMANCE AND DIAGNOSIS OF MODEL BIAS. In the NCAR ensemble analysis component, observations adjusted an ensemble of backgrounds to produce an analysis ensemble each DA cycle ("Continuously cycling EnKF DA" sidebar). The difference between the analysis and background is called the analysis increment (or simply "increment"); when increments are averaged over long time periods, the sign and magnitude of the time-averaged increments can diagnose systematic biases between short-term model forecasts and observations. For example, if the background is too warm compared to a temperature observation, assimilation of the temperature observation will decrease the surrounding model state temperature to render an analysis colder than the background (a negative increment). In the NCAR ensemble analysis system, as the model itself was used to produce backgrounds via continuously cycling DA ("Continuously cycling EnKF DA" sidebar), the time-averaged increments reflect biases of the 15-km forecast model relative to observations, which arise as a result of persistent sampling errors dominated by effects of imperfect physical parameterizations (e.g., Romine et al. 2013).
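As a minimal sketch of this diagnostic, the snippet below averages analysis-minus-background differences over many DA cycles. The function name `mean_increment`, the array layout (cycle, y, x), and the synthetic data are assumptions made for illustration; they are not the NCAR ensemble's actual processing code or output.

```python
import numpy as np

def mean_increment(analyses, backgrounds):
    """Time-averaged analysis increment (analysis minus background).

    analyses, backgrounds : arrays shaped (n_cycles, ny, nx) holding the
    ensemble mean analysis and background for each DA cycle.
    """
    increments = np.asarray(analyses, dtype=float) - np.asarray(backgrounds, dtype=float)
    return increments.mean(axis=0)

# Synthetic illustration: backgrounds carry a 0.5-K warm bias plus small
# random errors, and the analyses are idealized as matching the truth.
rng = np.random.default_rng(0)
truth = 280.0 + rng.normal(0.0, 2.0, size=(120, 4, 5))
backgrounds = truth + 0.5 + rng.normal(0.0, 0.1, size=truth.shape)
analyses = truth

avg_increment = mean_increment(analyses, backgrounds)
```

In this synthetic case the time-averaged increment is close to -0.5 K at every grid point, which is the signature of a warm bias in the background.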
Patterns of ensemble mean analysis increments varied geographically and seasonally (Fig. 8). For example, during winter (Fig. 8a), the Dakotas, eastern half of Oklahoma, and most of Texas had positive 2-m temperature increments (indicating cold biases), whereas in spring and summer (Figs. 8b,c), these same areas generally had negative increments (indicating warm biases). Seasonal differences were also evident over the Carolinas and Deep South, which had positive increments during winter (Fig. 8a) but negative increments over summer (Fig. 8c). More broadly, while the Intermountain West featured primarily positive increments year-round, Florida and the high plains of Colorado, Nebraska, and Kansas had mean negative increments across all seasons.
These results suggest model bias characteristics are complicated and that evaluating performance of a forecast model and associated physical parameterization suite over short time periods and small geographic regions is insufficient to fully understand how model errors behave. Rather, systematic model errors should be evaluated over multiple seasons and geographic regions to paint a full picture of physics scheme deficiencies. Ongoing research with the NCAR ensemble dataset will attempt to determine how these analysis biases impacted subsequent forecasts.
FORECAST SYSTEM PERFORMANCE. The multiyear nature of the NCAR ensemble project permits a rigorous examination of how a formally designed CAE performs seasonally. We focus on precipitation forecasts because simulated precipitation depends on many physical processes and provides an overall summary of model performance.
Methods. Objective verification scores were produced by comparing NCAR ensemble precipitation forecasts to NCEP stage IV observations (Lin and Mitchell 2005) over all grid points within the CONUS east of 105°W, where stage IV observations are most robust (e.g., Nelson et al. 2016). As many verification metrics require a common grid for corresponding forecasts and observations, the 3-km NCAR ensemble precipitation forecasts were interpolated to the stage IV grid (~4.763-km horizontal grid spacing) using a budget interpolation method (Accadia et al. 2003).
As a primary rationale for ensemble forecasting is explicit quantification of uncertainty through probabilities, we focus on verifying probabilistic precipitation forecasts. Several steps were performed to produce appropriate probabilities (Fig. 9). First, precipitation accumulation thresholds q were selected (e.g., q = 5.0 mm h⁻¹) to define events. Then, because the NCAR ensemble conformed to equal likelihood, all members had identical weights, and the probability of event occurrence at a specific grid point was simply the fraction of ensemble members predicting the event at that point (Figs. 9a-d). Probabilities computed in this manner are called point-based probabilities.
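The point-based probability calculation can be sketched in a few lines. The function name `point_probability`, the (member, y, x) array layout, and the toy precipitation values are illustrative assumptions; events are taken to be precipitation at or above the threshold q.

```python
import numpy as np

def point_probability(member_precip, q):
    """Fraction of equally weighted members with precipitation >= q.

    member_precip : array shaped (n_members, ny, nx)
    q             : accumulation threshold defining the event
    """
    event = np.asarray(member_precip, dtype=float) >= q
    return event.mean(axis=0)

# Toy 10-member ensemble on a 2 x 2 grid (values in mm per hour).
precip = np.zeros((10, 2, 2))
precip[:3, 0, 0] = 6.0   # three members exceed the threshold at point (0, 0)
precip[:, 1, 1] = 5.0    # all members meet the threshold at point (1, 1)

prob = point_probability(precip, q=5.0)
```

Averaging the binary event fields with equal weights is only appropriate because the members are equally likely; an ensemble with unequally likely members would need member-specific weights.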
However, high-resolution NWP models are inherently inaccurate at individual grid points, so it is inappropriate to verify point-based probabilities. Thus, a "neighborhood" approach (e.g., Theis et al. 2005; Ebert 2008; Roberts and Lean 2008) was applied to the point-based probabilities to incorporate spatial uncertainty. Application of the neighborhood approach involved selecting neighborhood length scales of 5, 25, 50, 75, and 100 km; these neighborhood length scales were defined as the radius of a circle and determined which grid points resided within the neighborhood of a particular point (Fig. 9d). Then, given a neighborhood length scale, neighborhood ensemble probabilities (NEPs; Schwartz et al. 2010; Schwartz and Sobash 2017) at a specific point were obtained by averaging the point-based probabilities within the neighborhood of that point (Figs. 9d,e). Although NEPs are interpreted as probabilities of event occurrence at a specific point (e.g., Schwartz and Sobash 2017), they account for spatial and temporal errors by considering surrounding grid points and are more appropriate to verify than point-based probabilities. Thus, NEPs were objectively verified to assess NCAR ensemble performance.
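A minimal sketch of the neighborhood averaging step follows, assuming a regular grid with uniform spacing. The function name `neighborhood_average` is hypothetical, and the treatment of domain edges (averaging only over neighbors that fall inside the grid) is an assumption rather than a documented choice of the study.

```python
import numpy as np

def neighborhood_average(field, radius_km, dx_km):
    """Average a 2D field over a circular neighborhood of each grid point.

    field     : 2D array, e.g., point-based probabilities
    radius_km : neighborhood length scale (radius of the circle)
    dx_km     : horizontal grid spacing
    Near the domain edge, only neighbors inside the grid are averaged.
    """
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    reach = int(radius_km // dx_km)  # largest offset (in grid points) to test
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    for dj in range(-reach, reach + 1):
        for di in range(-reach, reach + 1):
            if (dj * dj + di * di) * dx_km ** 2 > radius_km ** 2:
                continue  # offset lies outside the circle
            if abs(dj) >= ny or abs(di) >= nx:
                continue  # offset is larger than the grid itself
            # Target points (j, i) whose neighbor (j + dj, i + di) is in the grid.
            tj = slice(max(0, -dj), min(ny, ny - dj))
            ti = slice(max(0, -di), min(nx, nx - di))
            sj = slice(max(0, dj), min(ny, ny + dj))
            si = slice(max(0, di), min(nx, nx + di))
            total[tj, ti] += field[sj, si]
            count[tj, ti] += 1
    return total / count

# Toy example: a single grid point with probability 1 on a 1-km grid.
point_prob = np.zeros((5, 5))
point_prob[2, 2] = 1.0
nep = neighborhood_average(point_prob, radius_km=1.0, dx_km=1.0)
```

With a 1-km radius on a 1-km grid, each interior neighborhood contains five points (the center and its four adjacent neighbors), so the isolated probability of 1 is spread into values of 0.2.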
Seasonal variations of forecast skill and reliability. Forecast skill was quantified with the fractions skill score (FSS; Roberts and Lean 2008), which compares a probabilistic forecast to fractional observations obtained analogously to NEPs; observed fractions (i.e., fractions based on stage IV observations) at a specific point were determined by dividing the number of observed events occurring within the neighborhood of that point by the total number of points in the neighborhood. The FSS then compares the probabilistic forecast (in this case, NEPs) with observed fractions (Roberts and Lean 2008). FSS = 1 indicates a perfect forecast, while FSS = 0 indicates no skill.
FSSs for 1-h accumulated precipitation decreased as both forecast length and precipitation accumulation threshold (i.e., q) increased (Fig. 10), indicating poorer performance at longer lead times and heavier rainfall rates. During autumn and winter, FSSs were notably higher than those during spring and summer for all thresholds, and similar patterns were documented for FSSs computed with both 50- (Figs. 10a-c) and 100-km (Figs. 10d-f) neighborhood length scales. Differences between cool- and warm-season FSSs were maximized over the first few hours, reflecting larger spinup errors during the warm season, when deep convection provides a considerable amount of precipitation.
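For readers unfamiliar with the score, the sketch below computes the FSS from forecast and observed fraction fields using the standard definition of Roberts and Lean (2008), FSS = 1 - FBS/FBS_worst. The function name `fractions_skill_score` and the toy inputs are illustrative; returning NaN when no events are forecast or observed is a convention assumed here, not one stated in the text.

```python
import numpy as np

def fractions_skill_score(forecast_frac, observed_frac):
    """FSS from forecast probabilities (e.g., NEPs) and observed fractions.

    Both inputs are arrays of the same shape with values in [0, 1].
    """
    f = np.asarray(forecast_frac, dtype=float)
    o = np.asarray(observed_frac, dtype=float)
    fbs = np.mean((f - o) ** 2)                 # fractions Brier score
    fbs_worst = np.mean(f ** 2) + np.mean(o ** 2)  # largest FBS obtainable
    if fbs_worst == 0.0:
        return np.nan  # no events forecast or observed: score undefined
    return 1.0 - fbs / fbs_worst

perfect = fractions_skill_score([0.2, 0.8, 0.0], [0.2, 0.8, 0.0])
no_overlap = fractions_skill_score([1.0, 0.0], [0.0, 1.0])
```

The two toy cases bracket the score: identical fields give the perfect value of 1, while fields with no overlap give 0.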
The FSS can also be used to assess how forecast skill changes as a function of spatial scale by varying the neighborhood length scale. In general, as the neighborhood length scale increases, more leniency is given to forecasts with displacement errors and FSSs increase. Moreover, Roberts and Lean (2008) showed how the FSS corresponding to a minimum useful scale ($\mathrm{FSS}_{\mathrm{useful}}$) can be calculated as $\mathrm{FSS}_{\mathrm{useful}} = 0.5 + f/2$, where f is the fractional coverage of observed points where the event has occurred (i.e., fraction of observed points with precipitation ≥ q). The neighborhood length scale that yields FSS = $\mathrm{FSS}_{\mathrm{useful}}$ is interpreted as the smallest spatial scale over which NWP output should be presented to forecasters (Roberts and Lean 2008).
As expected, as the neighborhood length scale increased, FSSs increased for both 24- (Figs. 11a-c) and 48-h (Figs. 11d-f) forecasts of 1-h accumulated precipitation. Seasonal differences regarding the minimum useful scale were apparent: 24-h forecasts of 1-h precipitation ≥ 1.0 mm h⁻¹ (Fig. 11b) had a useful scale near 5 km during winter but between 50 and 75 km for summer, indicating that east of the Rockies, smaller-scale forecasts could be trusted more during winter than summer. Minimum useful scales for 48-h precipitation forecasts (Figs. 11d-f) were larger than those at 24 h, consistent with forecast degradation with lead time (e.g., Fig. 10). These seasonal differences reflect strength of synoptic-scale forcing, which is relatively strong over the CONUS during winter and weak in summer, and NWP models often perform best under strongly forced regimes.
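A short worked example of the useful-scale criterion is given below. The FSS values and the observed coverage f are hypothetical numbers chosen for illustration, not results from the NCAR ensemble, and the helper `minimum_useful_scale` simply returns the smallest tested scale meeting the criterion rather than interpolating between scales.

```python
def fss_useful(f_obs):
    """FSS value at the minimum useful scale (Roberts and Lean 2008)."""
    return 0.5 + f_obs / 2.0

def minimum_useful_scale(scales_km, fss_values, f_obs):
    """Smallest tested neighborhood length scale with FSS >= FSS_useful.

    Returns None if no tested scale reaches the criterion.
    """
    target = fss_useful(f_obs)
    for scale, fss in sorted(zip(scales_km, fss_values)):
        if fss >= target:
            return scale
    return None

# Hypothetical values: events observed at 4% of points, FSS rising with scale.
scales_km = [5, 25, 50, 75, 100]
fss_values = [0.40, 0.48, 0.53, 0.58, 0.62]
target = fss_useful(0.04)
useful_scale = minimum_useful_scale(scales_km, fss_values, f_obs=0.04)
```

Here the criterion is 0.5 + 0.04/2 = 0.52, which is first met at the 50-km scale in this made-up set of scores.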
Ensemble reliability was objectively measured by constructing reliability diagrams (Wilks 2011). To do so, forecast probability bins spanning 0%-5%, 5%-15%, 15%-25%, ..., 85%-95%, and 95%-100% were selected. Then, given forecast probabilities falling in a particular bin, the frequency of observed event occurrence was obtained; determining the conditional observed event frequency for each probability bin traces a reliability curve.
A perfectly reliable ensemble has forecast probabilities that agree with conditional observed event frequencies. For example, an ensemble is perfectly reliable in the 35%-45% bin if an observed event occurs on approximately 40% of the occasions when a forecast probability of 35%-45% is issued. Thus, if an ensemble is perfectly reliable across all probability bins, the reliability curve will lie along a diagonal line.
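The binning procedure can be sketched as follows, using the probability bins listed above. The function name `reliability_curve`, the handling of values that fall exactly on a bin edge (assigned to the higher bin), and the toy data are assumptions made for illustration.

```python
import numpy as np

# Bin edges for 0%-5%, 5%-15%, ..., 85%-95%, and 95%-100%.
BIN_EDGES = np.array([0.0, 0.05, 0.15, 0.25, 0.35, 0.45,
                      0.55, 0.65, 0.75, 0.85, 0.95, 1.0])

def reliability_curve(forecast_prob, observed_event):
    """Conditional observed event frequency for each forecast probability bin.

    forecast_prob  : array of forecast probabilities in [0, 1]
    observed_event : array of 0/1 flags marking whether the event occurred
    Returns (observed frequency per bin, sample count per bin); bins with
    no forecasts have a frequency of NaN.
    """
    p = np.asarray(forecast_prob, dtype=float).ravel()
    o = np.asarray(observed_event, dtype=float).ravel()
    # Interior edges only, so indices run from 0 to 10 inclusive.
    bin_index = np.digitize(p, BIN_EDGES[1:-1])
    n_bins = len(BIN_EDGES) - 1
    obs_freq = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = bin_index == k
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            obs_freq[k] = o[in_bin].mean()
    return obs_freq, counts

# Toy data: five 40% forecasts, of which two verified.
probs = [0.0, 0.0, 0.4, 0.4, 0.4, 0.4, 0.4, 1.0]
events = [0, 0, 1, 1, 0, 0, 0, 1]
obs_freq, counts = reliability_curve(probs, events)
```

In this toy case the 35%-45% bin (index 4) has an observed frequency of 0.4, so that point would sit on the diagonal of a reliability diagram.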
Shapes of the reliability curves (Fig. 12) indicate 24-h forecasts of 1-h precipitation were generally overconfident throughout the year at spatial scales up to 100 km. For example, during spring and summer at the 1.0 mm h⁻¹ threshold for point-based probabilities, when there was a 75%-85% forecast probability, the observed event only occurred ~50% of the time (see gray curve and annotation on Fig. 12b). However, as the neighborhood length scale increased, 1-h accumulated precipitation forecasts generally became more reliable (curves moved closer to diagonal).
Seasonal variations were again evident. During the spring and summer (Figs. 12a-c), there were larger differences between reliability curves computed with different neighborhood length scales compared to fall and winter (Figs. 12d-f), suggesting the NCAR ensemble had greater reliability at smaller spatial scales during the cool season. Additionally, forecasts during autumn and winter were typically more reliable than spring and summer forecasts, especially for probabilities < 45%. Similar patterns were noted at other forecast lead times (not shown).
In summary, FSSs > 0 indicate the NCAR ensemble possessed skill for precipitation rates ≤ 5.0 mm h⁻¹ (Figs. 10 and 11), and useful scales varied with season and forecast length. However, the NCAR ensemble was not perfectly reliable at spatial scales up to 100 km. Nonetheless, widespread operational use ("NCAR ensemble in NWS area forecast discussions" sidebar) and favorable survey responses regarding spread and skill (Figs. 7c,d) indicate the NCAR ensemble provided valuable probabilistic forecast guidance, highlighting the complex relationship between forecast quality and value (e.g., Murphy 1993).
FINAL REMARKS. The NCAR ensemble represented the first real-time, formally designed CAE spanning the entire CONUS. The innovative website, publicly available archive, and provision of raw, real-time output to interested researchers represented unprecedented outreach efforts for a CAE. Furthermore, the project demonstrated that continuously cycling EnKFs are a viable option for initializing credible real-time CAE forecasts over the CONUS.
During the NCAR ensemble's lifetime, CAE development progressed worldwide. France (Raynaud and Bouttier 2017), Switzerland (Klasa et al. 2018), and South Korea all implemented operational CAEs during the NCAR ensemble project, while over the United States, SPC's SSEO was modified and operationalized as version 2 of the high-resolution ensemble forecast (HREF) system in late 2017. However, like the former multidynamics, multiphysics SSEO, the HREF is an ad hoc CAE that does not possess many desirable properties (Table 1). Yet, HREF-like CAEs fit within operational constraints and perform comparably to or better than more formal experimental CAEs (e.g., Jirak et al. 2016; Clark et al. 2018), primarily owing to greater spread engendered by multiple dynamics cores and physics schemes. Nevertheless, given its suboptimal design, the HREF is best considered a high-quality baseline, and ultimately, when a formally designed CAE demonstrably outperforms the HREF, NCEP should swiftly operationalize such a system.
In fact, the NCAR ensemble could be considered an initial prototype of a future operational CAE over the CONUS, provided further developments. For example, the NCAR ensemble employed 15-km analyses to initialize 3-km forecasts, but performance gains can likely be realized by initializing the 3-km forecasts from a 3-km DA system incorporating radar and high-resolution satellite observations (e.g., Gustafsson et al. 2018). Moreover, ensemble spread characteristics should be improved in the context of single-physics, single-dynamics ensembles by better understanding IC and LBC perturbation characteristics and through stochastic physics schemes (e.g., Romine et al. 2014; Berner et al. 2017; Jankov et al. 2017) that inject spatially correlated random noise during model integration, which can increase ensemble spread while preserving equal likelihood. The feasibility of incorporating both high-resolution, continuously cycling DA and stochastic physics within operational CAEs has been demonstrated in European systems (e.g., Schraff et al. 2016; Hagelin et al. 2017; Raynaud and Bouttier 2017; Klasa et al. 2018), and efforts over the United States should be devoted to designing similar systems while recognizing that implementations of high-resolution DA systems encompassing large domains like the CONUS pose additional challenges compared to implementations over comparatively small European domains.
Development activities toward achieving these goals and designing a CAE possessing all the desirable properties (Table 1) are underway at NCAR. In addition, beyond the benefits of attaining these desirable properties through formal CAE design, CAE developers should also strive for favorable outcomes, including skillful predictions where the model climate is indistinguishable from the true atmosphere, forecasts with optimal dispersion characteristics, and probabilistic guidance that permits good decision-making (e.g., Murphy 1993). These outcomes should be achieved through the most efficient use of available resources.
Largely because NCAR ensemble output was displayed in an eye-catching, easily navigable manner, the forecasts were immediately accessible and generated excitement throughout all factions of the United States meteorological community. Thus, while generally, forging synergistic collaborations between American research and operational institutions has been challenging (e.g., UCAR 2015), the NCAR ensemble showed how long-term, real-time forecast demonstrations can be a powerful uniting force because of their ability to instantaneously reach and inspire all meteorologists, from forecaster to researcher and from amateur to professional.
Overall, the popularity and success of the NCAR ensemble indicate a broad appetite for experimental forecasts that surpass current operational capabilities, suggesting additional real-time demonstrations of forward-looking forecast systems and techniques are primed for success if products are cleanly displayed. Therefore, despite challenges associated with disseminating experimental, unproven forecasts, we encourage researchers working on novel forecast capabilities to publish real-time forecasts in an attempt to further unite forecasters and researchers and accelerate progress toward improved NWP models.
AFFILIATIONS: SCHWARTZ, ROMINE, SOBASH, FOSSELL, AND WEISMAN--National Center for Atmospheric Research, Boulder, Colorado
CORRESPONDING AUTHOR: Craig Schwartz, email@example.com
In final form 17 September 2018
ACKNOWLEDGMENTS. The NCAR ensemble was run on the Yellowstone supercomputer (ark:/85065/d7wd3xhc) provided by NCAR's Computational and Information Systems Laboratory (CISL). Contributions from Michael Coniglio and Kent Knopfmeier greatly benefited this project. Thanks to Julie Demuth for helping with the survey. We are also grateful for support from Davide Del Vento, Al Kellie, Rich Rotunno, Carter Borst, Doug Schuster, David Edwards, Michael Moran, Wei Wang, Jenny Sun, Stan Trier, David Ahijevych, Jim Hurrell, Jeff Anderson, Chris Snyder, Greg Thompson, David John Gagne, Rebecca Adams-Selin, Chris Davis, Steve Weiss, Israel Jirak, Minghua Zhang, Steven Greybush, Gary Lackmann, Matt Parker, Marina Astitha, Manos Anagnostou, Jay Shafer, Jim Steenburgh, Brian Tang, Lance Bosart, Kristen Corbosiero, and Ryan Torn. Additionally, we acknowledge UPP support, CISL's RDA, and NCAR's DART developers. Three anonymous reviewers provided constructive comments that improved this paper. The NCAR ensemble was partially supported by NCAR's Short-Term Explicit Prediction (STEP) program and NOAA Grant NA15OAR4590191. NCAR is sponsored by the National Science Foundation.
Accadia, C., S. Mariani, M. Casaioli, A. Lavagnini, and A. Speranza, 2003: Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. Wea. Forecasting, 18, 918-932, https://doi.org/10.1175/1520-0434(2003)018<0918:SOPFSS>2.0.CO;2.
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884-2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
--, T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Arellano, 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283-1296, https://doi.org/10.1175/2009BAMS2618.1.
Barker, D. M., 2005: Southern high-latitude ensemble data assimilation in the Antarctic Mesoscale Prediction System. Mon. Wea. Rev., 133, 3431-3449, https://doi.org/10.1175/MWR3042.1.
Barth, M. C., and Coauthors, 2015: The Deep Convective Clouds and Chemistry (DC3) field campaign. Bull. Amer. Meteor. Soc., 96, 1281-1309, https://doi.org/10.1175/BAMS-D-13-00290.1.
Ben Bouallegue, Z., and S. E. Theis, 2014: Spatial techniques applied to precipitation ensemble forecasts: From verification results to probabilistic products. Meteor. Appl., 21, 922-929, https://doi.org/10.1002/met.1435.
Berner, J., and Coauthors, 2017: Stochastic parameterization: Toward a new view of weather and climate models. Bull. Amer. Meteor. Soc., 98, 565-588, https://doi.org/10.1175/BAMS-D-15-00268.1.
Bluestein, H. B., G. S. Romine, R. Rotunno, D. W. Reif, and C. C. Weiss, 2018: On the anomalous counterclockwise turning of the surface wind with time in the plains of the United States. Mon. Wea. Rev., 146, 467-484, https://doi.org/10.1175/MWR-D-17-0297.1.
Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 134, 703-722, https://doi.org/10.1002/qj.234.
Buizza, R., M. Leutbecher, and L. Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 134, 2051-2066, https://doi.org/10.1002/qj.346.
Carlberg, B. R., W. A. Gallus Jr., and K. J. Franz, 2018: A preliminary examination of WRF ensemble prediction of convective mode evolution. Wea. Forecasting, 33, 783-798, https://doi.org/10.1175/WAF-D-17-0149.1.
Cavallo, S. M., R. D. Torn, C. Snyder, C. Davis, W. Wang, and J. Done, 2013: Evaluation of the Advanced Hurricane WRF data assimilation system for the 2009 Atlantic hurricane season. Mon. Wea. Rev., 141, 523-541, https://doi.org/10.1175/MWR-D-12-00139.1.
Cerrai, D., and Coauthors, 2016: Enhanced outage prediction modeling for strong extratropical storms and hurricanes in the northeastern United States. Fall Meeting of the American Geophysical Union, San Francisco, CA, Amer. Geophys. Union, NH51B-1930, https://agu.confex.com/agu/fm16/meetingapp.cgi/Paper/166625.
Clark, A. J., and Coauthors, 2011: Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble. Mon. Wea. Rev., 139, 1410-1418, https://doi.org/10.1175/2010MWR3624.1.
--, and Coauthors, 2012: An overview of the 2010 Hazardous Weather Testbed Experimental Forecast Program Spring Experiment. Bull. Amer. Meteor. Soc., 93, 55-74, https://doi.org/10.1175/BAMS-D-11-00040.1.
--, and Coauthors, 2018: The Community Leveraged Unified Ensemble (CLUE) in the 2016 NOAA/Hazardous Weather Testbed Spring Forecasting Experiment. Bull. Amer. Meteor. Soc., 99, 1433-1448, https://doi.org/10.1175/BAMS-D-16-0309.1.
Duc, L., K. Saito, and H. Seko, 2013: Spatial-temporal fractions verification for high-resolution ensemble forecasts. Tellus, 65A, 18171, https://doi.org/10.3402/tellusa.v65i0.18171.
Ebert, E. E., 2008: Fuzzy verification of high resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51-64, https://doi.org/10.1002/met.25.
Evans, C., D. F. Van Dyke, and T. Lericos, 2014: How do forecasters utilize output from a convection-permitting ensemble forecast system? Case study of a high-impact precipitation event. Wea. Forecasting, 29, 466-486, https://doi.org/10.1175/WAF-D-13-00064.1.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143-10 162, https://doi.org/10.1029/94JC00572.
Gagne, D. J., A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819-1840, https://doi.org/10.1175/WAF-D-17-0010.1.
Gallo, B. T., and Coauthors, 2017: Breaking new ground in severe weather prediction: The 2015 NOAA/Hazardous Weather Testbed Spring Forecasting Experiment. Wea. Forecasting, 32, 1541-1568, https://doi.org/10.1175/WAF-D-16-0178.1.
Gebhardt, C., S. E. Theis, M. Paulat, and Z. Ben Bouallegue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variation of lateral boundaries. Atmos. Res., 100, 168-177, https://doi.org/10.1016/j.atmosres.2010.12.008.
Golding, B., N. Roberts, G. Leoncini, K. Mylne, and R. Swinbank, 2016: MOGREPS-UK convection-permitting ensemble products for surface water flood forecasting: Rationale and first results. J. Hydrometeor., 17, 1383-1406, https://doi.org/10.1175/JHM-D-15-0083.1.
Gowan, T. M., W. J. Steenburgh, and C. S. Schwartz, 2018: Validation of mountain precipitation forecasts from the convection-permitting NCAR ensemble and operational forecast systems over the western United States. Wea. Forecasting, 33, 739-765, https://doi.org/10.1175/WAF-D-17-0144.1.
Greybush, S. J., S. Saslo, and R. Grumm, 2017: Assessing the ensemble predictability of precipitation forecasts for the January 2015 and 2016 East Coast winter storms. Wea. Forecasting, 32, 1057-1078, https://doi.org/10.1175/WAF-D-16-0153.1.
Gustafsson, N., and Coauthors, 2018: Survey of data assimilation methods for convective-scale numerical weather prediction at operational centres. Quart. J. Roy. Meteor. Soc., 144, 1218-1256, https://doi.org/10.1002/qj.3179.
Hacker, J., and Coauthors, 2011: The U.S. Air Force Weather Agency's mesoscale ensemble: Scientific description and performance results. Tellus, 63A, 625-641, https://doi.org/10.1111/j.1600-0870.2010.00497.x.
Hagelin, S., J. Son, R. Swinbank, A. McCabe, N. Roberts, and W. Tennant, 2017: The Met Office convective-scale ensemble, MOGREPS-UK. Quart. J. Roy. Meteor. Soc., 143, 2846-2861, https://doi.org/10.1002/qj.3135.
Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 4489-4532, https://doi.org/10.1175/MWR-D-15-0440.1.
Hsiao, L.-F., D.-S. Chen, Y.-H. Kuo, Y.-R. Guo, T.-C. Yeh, J.-S. Hong, C.-T. Fong, and C.-S. Lee, 2012: Application of WRF 3DVAR to operational typhoon prediction in Taiwan: Impact of outer loop and partial cycling approaches. Wea. Forecasting, 27, 1249-1263, https://doi.org/10.1175/WAF-D-11-00131.1.
Jankov, I., and Coauthors, 2017: A performance comparison between multiphysics and stochastic approaches within a North American RAP ensemble. Mon. Wea. Rev., 145, 1161-1179, https://doi.org/10.1175/MWR-D-16-0160.1.
Jirak, I. L., S. J. Weiss, and C. J. Melick, 2012: The SPC storm-scale ensemble of opportunity: Overview and results from the 2012 Hazardous Weather Testbed Spring Forecasting Experiment. 26th Conf. on Severe Local Storms, Nashville, TN, Amer. Meteor. Soc., 137, https://ams.confex.com/ams/26SLS/webprogram/Paper211729.html.
--, C. J. Melick, and S. J. Weiss, 2016: Comparison of the SPC storm-scale ensemble of opportunity to other convection-allowing ensembles for severe weather forecasting. 28th Conf. on Severe Local Storms, Portland, OR, Amer. Meteor. Soc., 102, https://ams.confex.com/ams/28SLS/webprogram/Session41668.html.
Johnson, A., and X. Wang, 2017: Design and implementation of a GSI-based convection-allowing ensemble data assimilation and forecast system for the PECAN field experiment. Part I: Optimal configurations for nocturnal convection prediction. Wea. Forecasting, 32, 289-315, https://doi.org/10.1175/WAF-D-16-0102.1.
--, --, F. Kong, and M. Xue, 2011: Hierarchical cluster analysis of a convection-allowing ensemble during the Hazardous Weather Testbed 2009 Spring Experiment. Part I: Development of the object-oriented cluster analysis method for precipitation fields. Mon. Wea. Rev., 139, 3673-3693, https://doi.org/10.1175/MWR-D-11-00015.1.
Kain, J. S., and Coauthors, 2008: Some practical considerations regarding horizontal resolution in the first generation of operational convection-allowing NWP. Wea. Forecasting, 23, 931-952, https://doi.org/10.1175/WAF2007106.1.
Klasa, C., M. Arpagaus, A. Walser, and H. Wernli, 2018: An evaluation of the convection-permitting ensemble COSMO-E for three contrasting precipitation events in Switzerland. Quart. J. Roy. Meteor. Soc., 144, 744-764, https://doi.org/10.1002/qj.3245.
Kong, F., and Coauthors, 2007: Preliminary analysis on the real-time storm-scale ensemble forecasts produced as a part of the NOAA Hazardous Weather Testbed 2007 spring experiment. Preprints, 22nd Conf. on Weather Analysis and Forecasting/18th Conf. on Numerical Weather Prediction, Salt Lake City, UT, Amer. Meteor. Soc., 3B.2, http://ams.confex.com/ams/pdfpapers/124667.pdf.
Kuchera, E., S. Rentschler, G. Creighton, and J. Hamilton, 2014: The Air Force weather ensemble prediction suite. 15th Annual WRF Users' Workshop, Boulder, CO, UCAR-NCAR, www2.mmm.ucar.edu/wrf/users/workshops/WS2014/ppts/2.3.pdf.
Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2, http://ams.confex.com/ams/pdfpapers/83847.pdf.
Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293, https://doi.org/10.1175/1520-0434(1993)008<0281:WIAGFA>2.0.CO;2.
Nelson, B., O. Prat, D. Seo, and E. Habib, 2016: Assessment and implications of NCEP stage IV quantitative precipitation estimates for product comparisons. Wea. Forecasting, 31, 371-394, https://doi.org/10.1175/WAF-D-14-00112.1.
Peralta, C., Z. B. Bouallegue, S. E. Theis, C. Gebhardt, and M. Buchhold, 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. J. Geophys. Res., 117, D07108, https://doi.org/10.1029/2011JD016581.
Powers, J. G., and Coauthors, 2017: The Weather Research and Forecasting Model: Overview, system efforts, and future directions. Bull. Amer. Meteor. Soc., 98, 1717-1737, https://doi.org/10.1175/BAMS-D-15-00308.1.
Raynaud, L., and F. Bouttier, 2016: Comparison of initial perturbation methods for ensemble prediction at convective scale. Quart. J. Roy. Meteor. Soc., 142, 854-866, https://doi.org/10.1002/qj.2686.
--, and --, 2017: The impact of horizontal resolution and ensemble size for convective-scale probabilistic forecasts. Quart. J. Roy. Meteor. Soc., 143, 3037-3047, https://doi.org/10.1002/qj.3159.
Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78-97, https://doi.org/10.1175/2007MWR2123.1.
Romine, G., C. S. Schwartz, C. Snyder, J. Anderson, and M. Weisman, 2013: Model bias in a continuously cycled assimilation system and its influence on convection-permitting forecasts. Mon. Wea. Rev., 141, 1263-1284, https://doi.org/10.1175/MWR-D-12-00112.1.
--, --, J. Berner, K. R. Fossell, C. Snyder, J. L. Anderson, and M. L. Weisman, 2014: Representing forecast error in a convection-permitting ensemble system. Mon. Wea. Rev., 142, 4519-4541, https://doi.org/10.1175/MWR-D-14-00100.1.
Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Perianez, and R. Potthast, 2016: Kilometre-scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 1453-1472, https://doi.org/10.1002/qj.2748.
Schumacher, R. S., and A. J. Clark, 2014: Evaluation of ensemble configurations for the analysis and prediction of heavy-rain-producing mesoscale convective systems. Mon. Wea. Rev., 142, 4108-4138, https://doi.org/10.1175/MWR-D-13-00357.1.
Schwartz, C. S., 2016: Improving large-domain convection-allowing forecasts with high-resolution analyses and ensemble data assimilation. Mon. Wea. Rev., 144, 1777-1803, https://doi.org/10.1175/MWR-D-15-0286.1.
--, 2017: A comparison of methods used to populate neighborhood-based contingency tables for high-resolution forecast verification. Wea. Forecasting, 32, 733-741, https://doi.org/10.1175/WAF-D-16-0187.1.
--, and R. A. Sobash, 2017: Generating probabilistic forecasts from convection-allowing ensembles using neighborhood approaches: A review and recommendations. Mon. Wea. Rev., 145, 3397-3418, https://doi.org/10.1175/MWR-D-16-0400.1.
--, and Coauthors, 2010: Toward improved convection-allowing ensembles: Model physics sensitivities and optimizing probabilistic guidance with small ensemble membership. Wea. Forecasting, 25, 263-280, https://doi.org/10.1175/2009WAF2222267.1.
--, G. S. Romine, K. R. Smith, and M. L. Weisman, 2014: Characterizing and optimizing precipitation forecasts from a convection-permitting ensemble initialized by a mesoscale ensemble Kalman filter. Wea. Forecasting, 29, 1295-1318, https://doi.org/10.1175/WAF-D-13-00145.1.
--, --, M. L. Weisman, R. A. Sobash, K. R. Fossell, K. W. Manning, and S. B. Trier, 2015a: A real-time convection-allowing ensemble prediction system initialized by mesoscale ensemble Kalman filter analyses. Wea. Forecasting, 30, 1158-1181, https://doi.org/10.1175/WAF-D-15-0013.1.
--, --, R. A. Sobash, K. R. Fossell, and M. L. Weisman, 2015b: NCAR's experimental real-time convection-allowing ensemble prediction system. Wea. Forecasting, 30, 1645-1654, https://doi.org/10.1175/WAF-D-15-0103.1.
Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., https://doi.org/10.5065/D68S4MVH.
Smirnov, D., D. McGlone, A. J. Clark, C. Schwartz, and K. Stewart, 2018: On the use of high-resolution ensembles for operational heavy rainfall forecasting in the Denver Metro Area. 32nd Conf. on Hydrology, Austin, TX, Amer. Meteor. Soc., J53.6, https://ams.confex.com/ams/98Annual/webprogram/Paper333117.html.
Smith, T. M., and Coauthors, 2016: Multi-Radar Multi-Sensor (MRMS) severe weather and aviation products. Bull. Amer. Meteor. Soc., 97, 1617-1630, https://doi.org/10.1175/BAMS-D-14-00173.1.
Sobash, R. A., C. S. Schwartz, G. S. Romine, K. Fossell, and M. Weisman, 2016a: Severe weather prediction using storm surrogates from an ensemble forecasting system. Wea. Forecasting, 31, 255-271, https://doi.org/10.1175/WAF-D-15-0138.1.
--, G. S. Romine, C. S. Schwartz, D. J. Gagne, and M. L. Weisman, 2016b: Explicit forecasts of low-level rotation from convection-allowing models for next-day tornado prediction. Wea. Forecasting, 31, 1591-1614, https://doi.org/10.1175/WAF-D-16-0073.1.
Theis, S. E., A. Hense, and U. Damrath, 2005: Probabilistic precipitation forecasts from a deterministic model: A pragmatic approach. Meteor. Appl., 12, 257-268, https://doi.org/10.1017/S1350482705001763.
Torn, R. D., and C. A. Davis, 2012: The influence of shallow convection on tropical cyclone track forecasts. Mon. Wea. Rev., 140, 2188-2197, https://doi.org/10.1175/MWR-D-11-00246.1.
--, G. J. Hakim, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 2490-2502, https://doi.org/10.1175/MWR3187.1.
UCAR, 2015: Report of the UCACN Model Advisory Committee. UCAR Rep., 72 pp., www.ncep.noaa.gov/director/ucar_reports/ucacn_20151207/UMAC_Final_Report_20151207-v14.pdf.
Weisman, M. L., and Coauthors, 2015: The Mesoscale Predictability Experiment (MPEX). Bull. Amer. Meteor. Soc., 96, 2127-2149, https://doi.org/10.1175/BAMS-D-13-00281.1.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 676 pp.
Xue, M., and Coauthors, 2007: CAPS real-time storm-scale ensemble and high-resolution forecasts as part of the NOAA Hazardous Weather Testbed 2007 Spring Experiment. Preprints, 22nd Conf. on Weather Analysis and Forecasting/18th Conf. on Numerical Weather Prediction, Salt Lake City, UT, Amer. Meteor. Soc., 3B, http://ams.confex.com/ams/pdfpapers/124587.pdf.
Yang, J., M. Astitha, and C. S. Schwartz, 2017: Improvement of storm forecasts using gridded Bayesian linear regression for northeast United States. Fall Meeting of the American Geophysical Union, New Orleans, LA, Amer. Geophys. Union, NG24A-04, https://agu.confex.com/agu/fm17/meetingapp.cgi/Paper/272811.
Zhang, X., E. N. Anagnostou, and C. S. Schwartz, 2018: NWP-based adjustment of IMERG precipitation for flood-inducing complex terrain storms: Evaluation over CONUS. Remote Sens., 10, 642, https://doi.org/10.3390/rs10040642.
Zhou, X., Y. Zhu, D. Hou, Y. Luo, J. Peng, and D. Wobus, 2017: Performance of the new NCEP Global Ensemble Forecast System in a parallel experiment. Wea. Forecasting, 32, 1989-2004, https://doi.org/10.1175/WAF-D-17-0023.1.
(1) Even in equally likely ensembles, on a day-to-day basis, individual member performance is unequal; in a given forecast, one member will always be closest to what actually occurred. Yet, on average, over many cases, all members in an equally likely ensemble should be closest to truth an equal number of times.
(2) The resolution at which convective parameterization can be removed varies geographically. Over the United States, it is generally safe to remove convective parameterization with horizontal grid spacings of 4 km or less (e.g., Kain et al. 2008).
(3) Figure 16 in Gowan et al. (2018) objectively illustrates NCAR ensemble conformity to equal likelihood.
(4) The interface to NCAR ensemble products can be accessed from https://doi.org/10.5065/D68C9TZ3.
RELATED ARTICLE: CONTINUOUSLY CYCLING ENKF DA OVER A LIMITED-AREA DOMAIN.
The first step of performing limited-area (regional) continuously cycling EnKF DA is to produce an initial ensemble composed of N members. This initial N-member ensemble can originate from downscaling an external (e.g., global) ensemble or be produced by adding correlated random noise to a deterministic external analysis (e.g., Barker 2005); initial random perturbations are not flow dependent (e.g., Fig. 1d). Fortunately, for long-term continuous cycling, details of initial ensemble construction are unimportant, as the ensemble will move away from the randomly produced or downscaled perturbations toward the climatology of the model "attractor" and develop flow-dependent perturbations (e.g., Fig. 1c) through assimilation of observations.
This initial ensemble then serves as the "background" or "prior" ensemble at time T. Then, EnKF DA updates the prior ensemble by assimilating observations to produce an "analysis" or "posterior" ensemble, also valid at T. This posterior ensemble serves as a set of initial conditions for a short-term ensemble forecast that is integrated for DT hours, where DT is typically ≤ 6 h (Fig. SB1a). The ensemble valid at T + DT then becomes the background ensemble for another EnKF DA step, where the background ensemble at T + DT is updated by observations to produce an analysis ensemble at T + DT. This analysis ensemble at T + DT then serves as a set of initial conditions for another DT-h forecast valid at T + 2DT, and so on. In addition to initializing DT-h forecasts for DA purposes, analysis ensembles can initialize longer "free" forecasts of length > DT for purposes of ensemble prediction (Fig. SB1b). Thus, continuously cycling EnKF DA seamlessly melds ensemble DA and ensemble forecasting and relegates the role of external models to solely providing boundary conditions.
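The background-analysis-forecast loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the NCAR ensemble's DA system: the state is a single scalar that is observed directly, `toy_model` is a hypothetical stand-in for the forecast model, and the update is a simple deterministic adjustment, loosely in the spirit of ensemble adjustment filters (e.g., Anderson 2001), that sets the ensemble mean and variance to their Kalman posterior values.

```python
import statistics


def toy_model(x, dt_hours):
    """Hypothetical stand-in for the forecast model: relax x toward 10.0."""
    return x + 0.05 * dt_hours * (10.0 - x)


def enkf_update(prior, obs, obs_var):
    """Deterministic scalar ensemble update for a directly observed state.

    The ensemble mean and variance are set to the Kalman posterior values,
    and each member keeps its rescaled deviation from the mean.
    """
    prior_mean = statistics.mean(prior)
    prior_var = statistics.variance(prior)  # sample variance of the ensemble
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    shrink = (post_var / prior_var) ** 0.5
    return [post_mean + shrink * (x - prior_mean) for x in prior]


def run_cycles(initial_ensemble, observations, dt_hours=6, obs_var=1.0):
    """Cycle background -> analysis -> DT-h forecast -> next background."""
    prior = list(initial_ensemble)
    analyses = []
    for obs in observations:  # one observation per analysis time T, T + DT, ...
        analysis = enkf_update(prior, obs, obs_var)
        analyses.append(analysis)
        # The DT-h forecast from each analysis member is the next background.
        prior = [toy_model(x, dt_hours) for x in analysis]
    return analyses


analyses = run_cycles([1.0, 2.0, 3.0], observations=[4.0, 5.0])
for members in analyses:
    print([round(x, 3) for x in members])
```

In this sketch the external model never appears inside the loop; in a limited-area system it would enter only through the boundary conditions supplied to the forecast step, and any analysis in `analyses` could equally initialize a longer free forecast.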
In the context of the NCAR ensemble project, DT = 6 h, meaning ensembles were integrated for 6 h to advance the model state between analysis times. The randomly constructed initial prior ensemble was produced by adding random, correlated, Gaussian noise to the 1200 UTC 19 March 2015 GFS analysis, and a new analysis was produced every 6 h until 0000 UTC 30 December 2017 (4,067 total DA cycles). Beginning 7 April 2015, each 0000 UTC analysis initialized 10-member 48-h ensemble forecasts. Thus, 0000 UTC analysis ensembles served two purposes: to initialize both 6-h forecasts for the next DA cycle at 0600 UTC and 48-h forecasts (Fig. SB1b).
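The stated number of DA cycles follows directly from the start time, end time, and 6-h cycling interval. A short check in Python, where the helper name `count_analyses` is an illustrative assumption:

```python
from datetime import datetime, timedelta


def count_analyses(first, last, dt_hours):
    """Number of analysis times from first to last (inclusive), dt_hours apart."""
    return (last - first) // timedelta(hours=dt_hours) + 1


# First analysis at 1200 UTC 19 March 2015, last at 0000 UTC 30 December 2017.
n_cycles = count_analyses(datetime(2015, 3, 19, 12), datetime(2017, 12, 30, 0), 6)
print(n_cycles)  # 4067
```

The interval spans 1016.5 days, or 4,066 six-hour steps, and counting both endpoints gives 4,067 analyses.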
RELATED ARTICLE: THE NCAR ENSEMBLE IN NWS AREA FORECAST DISCUSSIONS.
NWSFOs across the country (Fig. SB2) regularly cited the NCAR ensemble in area forecast discussions (AFDs), in which NWS forecasters describe NWP model guidance and their forecast reasoning. Although formal surveys have not been performed regarding operational use of the NCAR ensemble, AFDs can provide insights regarding forecaster enthusiasm for the NCAR ensemble and how its output was used in the forecasting process.
The following AFD, issued by the La Crosse, Wisconsin, NWSFO at 2026 UTC 27 July 2015, shows how the NCAR ensemble was used as the basis for forecasting thunderstorm redevelopment ahead of a cold front:
One of my favorite guidance choices is the 10-member hi-res NCAR ensemble forecast system. This data shows over an 80 percent probability of 2000 J/kg of MLCAPE ahead of the cold front along I-35 at 00z. So ... we are forecasting storm redevelopment.
This next AFD, issued by the Buffalo, New York, NWSFO at 0923 UTC 13 January 2016 reveals specific NCAR ensemble products that were used to help the forecaster upgrade a watch to a warning. Additionally, this AFD reveals that the NCAR ensemble was used in conjunction with other high-resolution NWP models, which was common:
The 00z NCAR ensembles indicate some variability in potential snowfall amounts across northern Erie county tonight with 00z-12z 12-hr ensemble means showing 3-6 inches and maxes of 10-12 inches. Neighbor[hood] probs of > 6 in of snowfall downwind of the lake across Buffalo run above 90 percent. The combination of the output from the NCAR ensembles along with 00z high res 1.33 km NAM and 00z SSEO 12-hr snowfall of 6-8 in has increased confidence this morning to upgrade the lake effect snow watch to a warning.
The next AFD, issued by the Phoenix, Arizona, NWSFO at 0857 UTC Wednesday 12 August 2015, indicates how the NCAR ensemble helped corroborate conceptual models, leading to an increased probability of precipitation (PoP) forecast:
As [is] always the case this time of year ... it's hard to have much stock in any specific model solution. In examining a number of global models ... hi-res models ... and some of the hi-res ensembles ... solutions are all over the map. The only ensemble that appears to be relatively on track is the NCAR ensemble which correctly depicted the evening storms around the Phoenix area. It also correctly depicts the Guaymas inverted trough and suggests storms will develop mainly west of Phoenix today all the way to the coastal mountains of southeast California. Conceptually this makes sense since these areas didn't see much in the way of storms on Tuesday ... their boundary layer hasn't been worked over ... and the ascent associated with this trough is making a beeline for the lower Colorado River Valley. PoPs have been increased west of Phoenix and I have also gone ahead and introduced a mention of blowing dust.
Finally, this discussion, issued by the Jackson, Mississippi, NWSFO at 0956 UTC 2 May 2017 indicates how the NCAR ensemble suggested considerable forecast uncertainty, which was then conveyed by the forecaster:
The 00z NCAR ensemble illustrates pretty well a few plausible scenarios that offer a wide range of impacts for the forecast area, and it makes me hesitate to commit too much to highlighting a more significant severe weather or heavy rain threat.... Overall, there are a wide range of possible solutions offered by this dynamic set-up and folks should be aware of the potential for significant severe weather if low level moisture return is greater than expected.
Caption: Fig. 1. (a) Schematic of clustering in a 10-member ensemble composed of members with two different dynamic cores. Time increases to the right and magnitude of a meteorological variable increases up the y axis. With time, the forecasts diverge and cluster based on dynamics, such that the distributions of members with dynamic core 1 (red) and dynamic core 2 (blue) have no overlap. The ensemble mean (black) lies in the middle of the distributions and does not resemble any single member, which is undesirable. See Fig. 16 in Gowan et al. (2018) for a similar example, but with real data. (b) Ensemble mean and (c),(d) standard deviation of near-surface temperature (°C) for 10-member ensembles with identical means, but (c) flow-dependent and (d) randomly produced IC perturbations. Ensemble mean sea level pressure (hPa; solid white lines contoured every 4 hPa) is overlaid and ensemble mean 10-m winds (kt; 1 kt = 0.51 m s⁻¹; barbs) are shown in (b). The flow-dependent ensemble depicts more spread near a low pressure center in South Dakota and along a front (red solid lines), indicating uncertainty regarding placement of these features. Conversely, the randomly produced ensemble does not "see" the storm system and has random spread patterns lacking association between regions of enhanced spread and strong gradients, including a local spread minimum near the low pressure center.
Caption: Fig. 2. (a) Snapshot of observations assimilated during the 0000 UTC 7 May 2017 analysis. (b) NCAR ensemble computational domain. The analysis component only ran over the 15-km domain, while the forecast component ran in a nested configuration with a 3-km domain embedded within the 15-km parent domain. The 3-km ensemble forecasts were the end result of the NCAR ensemble system.
Caption: Fig. 3. For the 24-h forecast initialized at 0000 UTC 18 May 2017 and valid at 0000 UTC 19 May 2017, (a) "paintball plot" showing areas with simulated composite reflectivity ≥ 40 dBZ for each ensemble member overlaid with observed composite reflectivity ≥ 40 dBZ from Multi-Radar Multi-Sensor (MRMS) system (Smith et al. 2016) observations (black shading) and (b) probabilities of 2-5-km UH ≥ 100 m² s⁻² within 40 km of each point and ensemble mean 10-m winds (kt; barbs). The probabilities in (b) were smoothed with a Gaussian filter using a length scale of 6 km.
Caption: Fig. 4. Ensemble plume diagrams at Denver International Airport, Colorado, for the forecast initialized at 0000 UTC 18 May 2017, showing (a) individual member forecasts of 2-m temperature (°F; red lines) and ensemble mean 10-m wind speed and direction (kt; barbs) and (b) dominant precipitation type (green and blue dots, indicating dominant precipitation types of rain and snow, respectively) and running-total accumulated precipitation (in.; 1 in. = 2.54 cm) from each ensemble member (green lines). Thick black lines represent the ensemble mean, and boxplots show the 10th, 25th, 75th, and 90th percentiles of the ensemble distribution. Time is on the x axis and increases to the right.
Caption: Fig. 5. Example of the storm object viewer for the forecast initialized at 0000 UTC 18 May 2017. Individual storm objects (colored dots) were identified by regions of column-integrated graupel > 0.25 in. (0.635 cm) and dot size varies directly with 2-5-km UH magnitude. In this example, the storm objects were further filtered to show only those storms with hail ≥ 2 in. (5.08 cm) and 1-km AGL vertical vorticity ≥ 0.013 s⁻¹. Storm objects were colored by ensemble member, and the scatterplot shows within-storm 2-5-km UH and 0-3-km UH (m² s⁻²) for all 144 filtered objects. The forecast hours atop the page and member numbers along the left were shaded by number of forecast objects, with darker shading indicating more objects for that forecast hour or member. Menus with available fields and filters were provided below the scatterplot. Hovering the mouse over a particular storm (large red dot with white outline) immediately reveals its properties. The storm selected here corresponded to member 6 and hour 24 and had moderately high 2-5-km and 0-3-km UH (see corresponding red dot in the scatterplot).
Caption: Fig. 6. Probabilities of the union of hourly maximum 2-5-km UH ≥ 75 m² s⁻², surface wind ≥ 25 m s⁻¹, and hail ≥ 2.54 cm within ~40 km of a point accumulated over 24-h periods beginning at 1200 UTC (a) 13 Jul 2015, (b) 23 Dec 2015, (c) 26 Apr 2016, and (d) 28 Feb 2017. The probability at a specific point is nonzero if, within ~40 km of the point, any ensemble member met at least one of the following conditions at any time within the 24-h period: hourly maximum 2-5-km UH ≥ 75 m² s⁻², surface wind ≥ 25 m s⁻¹, or hail ≥ 2.54 cm. The probabilities were smoothed using a Gaussian filter with a 120-km length scale. Probability contours were chosen to roughly match those used in SPC convective outlooks. NWS severe weather warning polygons valid over the corresponding 24-h periods are overlaid.
Caption: Fig. 8. Ensemble mean analysis increments of 2-m temperature (°C) averaged over all NCAR ensemble analyses between (a) 0000 UTC 15 Dec 2016 and 1800 UTC 14 Mar 2017 (winter), (b) 0000 UTC 15 Mar and 1800 UTC 14 Jun 2017 (spring), (c) 0000 UTC 15 Jun and 1800 UTC 14 Sep 2017 (summer), and (d) 0000 UTC 15 Sep and 1800 UTC 14 Dec 2017 (autumn). The horizontal grid spacing of the analysis increments was 15 km, and analyses were produced every 6 h. Areas of negative increments suggest warm 2-m temperature biases in the backgrounds, while positive increments suggest cold 2-m temperature biases in the backgrounds.
Caption: Fig. 9. Schematic diagram of applying a neighborhood approach to a hypothetical three-member ensemble to produce NEPs. (a)-(c) Individual ensemble member forecasts of event occurrence, where forecast events have occurred in the gray-filled boxes. (d) Point-based probabilities were obtained by determining the fraction of members that forecast the event at each grid box. (e) Finally, NEPs were produced by averaging the point-based probabilities within the neighborhood of each point. In this example, the neighborhood is a circle with radius 1.7 times the horizontal grid spacing, as depicted in (d), where the red circle defines the neighborhood about a central grid point outlined in blue and those grid points falling within the circular neighborhood are in boldface. Similar neighborhoods exist for all grid points with NEP values in (e); note that NEPs could not be determined for points in the outer ring, since the neighborhoods about these points fall outside the grid.
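The two steps described in the Fig. 9 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the NCAR ensemble: the function names, the 5 x 5 grid, and the three hypothetical member forecasts are invented here, and a grid box is assumed to belong to the neighborhood when its center lies within the radius (given in units of grid spacing) of the central point.

```python
from math import hypot


def point_probabilities(members):
    """Fraction of members forecasting the event at each grid box.

    `members` is a list of equally sized 2-D grids of 0/1 event flags.
    """
    n_members = len(members)
    n_rows, n_cols = len(members[0]), len(members[0][0])
    return [
        [sum(m[i][j] for m in members) / n_members for j in range(n_cols)]
        for i in range(n_rows)
    ]


def neighborhood_probabilities(point_probs, radius=1.7):
    """Average point-based probabilities within a circular neighborhood.

    `radius` is in units of grid spacing. Points whose neighborhood would
    extend past the grid edge are left as None, mirroring the outer ring
    of undetermined values noted in the caption.
    """
    n_rows, n_cols = len(point_probs), len(point_probs[0])
    reach = int(radius)  # largest whole-grid-box offset inside the circle
    offsets = [
        (di, dj)
        for di in range(-reach, reach + 1)
        for dj in range(-reach, reach + 1)
        if hypot(di, dj) <= radius
    ]
    nep = [[None] * n_cols for _ in range(n_rows)]
    for i in range(reach, n_rows - reach):
        for j in range(reach, n_cols - reach):
            values = [point_probs[i + di][j + dj] for di, dj in offsets]
            nep[i][j] = sum(values) / len(values)
    return nep


def empty_grid(size=5):
    """Hypothetical helper: a size x size grid with no forecast events."""
    return [[0] * size for _ in range(size)]


# Three hypothetical member forecasts (1 = event forecast in that box).
member_a, member_b, member_c = empty_grid(), empty_grid(), empty_grid()
member_a[2][2] = member_a[2][3] = 1
member_b[2][2] = 1
member_c[1][1] = 1

point_probs = point_probabilities([member_a, member_b, member_c])
nep = neighborhood_probabilities(point_probs)
```

With a radius of 1.7 grid spacings, each neighborhood in this sketch contains nine grid boxes (the central box plus its eight immediate neighbors), so the NEP at an interior point is the mean of nine point-based probabilities.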
Caption: Fig. 10. FSSs based on 1-h accumulated precipitation computed with (a)-(c) 50- and (d)-(f) 100-km neighborhood length scales as a function of forecast hour for thresholds of (a),(d) 0.25, (b),(e) 1.0, and (c),(f) 5.0 mm h⁻¹. Aggregate statistics were computed separately for spring (15 Mar-14 Jun), summer (15 Jun-14 Sep), autumn (15 Sep-14 Dec), and winter (15 Dec-14 Mar) over the entire NCAR ensemble dataset (999 forecasts).
Caption: Fig. 11. FSSs as a function of neighborhood length scale (km) based on 1-h accumulated precipitation aggregated over (a)-(c) 24- and (d)-(f) 48-h forecasts for (a),(d) 0.25, (b),(e) 1.0, and (c),(f) 5.0 mm h⁻¹ thresholds. Aggregate statistics were computed separately for spring (15 Mar-14 Jun), summer (15 Jun-14 Sep), autumn (15 Sep-14 Dec), and winter (15 Dec-14 Mar) over the entire NCAR ensemble dataset (999 forecasts). Values of FSS_useful for each season are shown by horizontal dashed lines. Error bars denote bounds of 90% confidence intervals obtained with a bootstrap resampling technique with 1,000 resamples. Note that a neighborhood length scale of 0 corresponds to point-based probabilities (e.g., Fig. 9d), while neighborhood length scales > 0 indicate NEPs (e.g., Fig. 9e) that incorporate spatial uncertainty.
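For readers unfamiliar with the FSS, the following Python sketch shows the commonly used definition of the fractions skill score and of the "useful" threshold. It is a minimal illustration, not the verification code behind Figs. 10 and 11: the function names are hypothetical, the inputs are assumed to be paired forecast probabilities and observed fractions already flattened over grid points and forecasts, and the threshold is assumed to follow the usual convention FSS_useful = 0.5 + f_o/2, where f_o is the observed event base rate.

```python
def fss(forecast_probs, observed_fracs):
    """Fractions skill score for paired forecast and observed values.

    FSS = 1 - FBS / FBS_worst, where FBS is the mean squared difference
    between forecast probabilities and observed fractions, and FBS_worst
    is the largest FBS obtainable when forecasts and observations never
    overlap.
    """
    n = len(forecast_probs)
    fbs = sum((f - o) ** 2 for f, o in zip(forecast_probs, observed_fracs)) / n
    fbs_worst = (
        sum(f * f for f in forecast_probs) + sum(o * o for o in observed_fracs)
    ) / n
    if fbs_worst == 0:
        # No events forecast or observed anywhere: the score is undefined.
        return float("nan")
    return 1.0 - fbs / fbs_worst


def fss_useful(base_rate):
    """Conventional 'useful skill' threshold, assumed here to be 0.5 + f_o / 2."""
    return 0.5 + base_rate / 2.0


perfect = fss([0.5, 0.5], [0.5, 0.5])  # identical fields
no_overlap = fss([1.0, 0.0], [0.0, 1.0])  # forecast and observed events never coincide
partial = fss([0.5, 0.0], [1.0, 0.0])  # forecast underestimates the observed fraction
```

Identical forecast and observed fields score 1, fields with no overlap score 0, and rarer events (smaller f_o) bring the useful-skill threshold closer to 0.5.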
Caption: Fig. 12. Reliability diagrams for precipitation thresholds of (a),(d) 0.25, (b),(e) 1.0, and (c),(f) 5.0 mm h⁻¹ based on 24-h forecasts of 1-h accumulated precipitation initialized during (a)-(c) spring and summer and (d)-(f) autumn and winter over the entire NCAR ensemble dataset (999 forecasts). Diagonal lines indicate perfect reliability, and different reliability curves were computed with different neighborhood length scales. Open circles represent forecast frequencies (%) within each probability bin for each reliability curve; many bins had forecast frequencies ≪ 1%. Error bars denote bounds of 90% confidence intervals obtained with a bootstrap resampling technique with 1,000 resamples. Values were not plotted for a particular bin if fewer than 1,000 grid points had forecast probabilities in that bin over all forecasts. The dashed lines in (b) refer to an example in the text.
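A reliability curve of the kind shown in Fig. 12 can be sketched as follows. This is a simplified illustration under assumptions, not the article's verification code: the function name, the ten equal-width probability bins, and the toy data are hypothetical. The `min_count` default of 1,000 mirrors the caption's rule that bins with fewer than 1,000 grid points were not plotted.

```python
def reliability_curve(probs, outcomes, n_bins=10, min_count=1000):
    """Observed event frequency as a function of forecast probability.

    `probs` are forecast probabilities in [0, 1] and `outcomes` are the
    matching 0/1 observations. Returns one tuple per retained bin:
    (mean forecast probability, observed relative frequency, sample count).
    Bins with fewer than `min_count` samples are omitted.
    """
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        prob_sums[b] += p
        hits[b] += o
        counts[b] += 1
    curve = []
    for b in range(n_bins):
        if counts[b] >= max(min_count, 1):
            curve.append((prob_sums[b] / counts[b], hits[b] / counts[b], counts[b]))
    return curve


# Toy data: two low-probability forecasts and four high-probability ones.
toy_probs = [0.05, 0.05, 0.75, 0.75, 0.75, 0.75]
toy_outcomes = [0, 0, 1, 1, 1, 0]
curve_all = reliability_curve(toy_probs, toy_outcomes, min_count=1)
curve_filtered = reliability_curve(toy_probs, toy_outcomes, min_count=3)
```

A perfectly reliable forecast would place every retained point on the diagonal, where the observed relative frequency equals the mean forecast probability in the bin.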
Caption: Fig. SB1. (a) Schematic diagram of a continuously cycling EnKF DA system based on the NCAR ensemble analysis system with 80 members and a 6-h cycle period. In the EnKF, an ensemble of backgrounds is combined with observations to produce an ensemble of analyses. The analyses then initialize ensembles of 6-h forecasts that become backgrounds for another DA cycle 6 h later. (b) Illustration of how at certain (or all) analysis times, EnKF analysis ensembles can initialize longer-term ensemble forecasts in addition to short-term forecasts needed for DA purposes.
Caption: Fig. SB2. Map of NWSFOs over the CONUS identified by their three-letter call signs. Outlines of the area of responsibility for each NWSFO are shown in black, and gray lines are state boundaries. Those NWSFOs shaded green mentioned the NCAR ensemble in an AFD at least once. Note that NWSFOs may have used the NCAR ensemble even if they did not mention it in an AFD.
TABLE 1. Desirable properties of ensemble prediction systems that can be achieved by design considerations. Using a combination of continuously cycling data assimilation and an equally likely ensemble achieves most of these properties. Although most operational global ensembles are designed to possess all these properties, real-time CAEs historically have not been designed to possess them. There are other desirable properties of ensemble prediction systems related to forecast quality and value (e.g., Murphy 1993) that all ensembles should strive to possess, regardless of how they are designed. See the "Continuously cycling EnKF DA" sidebar for more information.

Desirable property: Easy to maintain
How to achieve: Equal likelihood
Comments and examples: Easier to maintain, improve, and upgrade one model configuration than several.
Did the NCAR ensemble possess the property? Yes

Desirable property: Easy to postprocess
How to achieve: Equal likelihood
Comments and examples: No need to weight members differently in an equally likely ensemble, as, over many cases, each member is equally likely to be closest to "truth." In an unequally likely ensemble, some members are systematically closer to truth than others and should be preferentially weighted when computing averages and probabilities.
Did the NCAR ensemble possess the property? Yes

Desirable property: Forecasts have simple probabilistic interpretations
How to achieve: Equal likelihood
Comments and examples: Say 7 of 10 ensemble members forecast an event. If the members are equally likely, the probability of event occurrence is 70%. If the members are unequally likely, probability of event occurrence is not necessarily 70%.
Did the NCAR ensemble possess the property? Yes

Desirable property: Enable identification of systematic biases
How to achieve: Equal likelihood; continuously cycling data assimilation
Comments and examples: Easier to identify biases when working with a single model configuration than several. Continuously cycling data assimilation means IC and forecast model biases are intertwined, enabling easier identification of biases compared to ensemble forecasts initialized from unrelated, external analyses with their own potentially different biases.
Did the NCAR ensemble possess the property? Yes

Desirable property: Accounts for known error sources
How to achieve: IC diversity and inclusion of model error representation schemes; if a limited-area ensemble, also lateral boundary condition diversity
Comments and examples: Failing to account for all known error sources risks ensemble forecasts with insufficient spread.
Did the NCAR ensemble possess the property? Partly: initial and lateral boundary condition diversity but no model error representation scheme

Desirable property: IC perturbations reflect forecast model configurations
How to achieve: Continuously cycling data assimilation
Comments and examples: The forecast model itself is used to create ICs, which avoids mismatches between the forecast model and ICs. Requires a continuously cycling data assimilation system with the same resolution, physics, and dynamics as the forecast model.
Did the NCAR ensemble possess the property? No: 15-km ICs used to initialize a 3-km forecast model

Desirable property: IC perturbations are flow dependent
How to achieve: Continuously cycling data assimilation
Comments and examples: IC perturbations should temporally and spatially evolve with atmospheric flow to yield more spread in areas of greater uncertainty. Randomly produced IC perturbations fail to do so.
Did the NCAR ensemble possess the property? Yes

Caption: Fig. 7. Responses to informal survey questions regarding impressions of NCAR ensemble performance and most valued aspects of the NCAR ensemble. The survey was posted online in late 2017, and data were collected from 137 volunteer respondents. Specific questions asked about (a) aspects of the ensemble that were most valued, (b) how frequently the NCAR ensemble was used, (c) perceptions of forecast skill, and (d) perceptions of forecast spread.
(a) Degree to which certain aspects of the NCAR ensemble were valued. Response scale: Not important, Neutral, Most important.
GRID data distribution: 25%, 21%, 12%, 16%, 10%, 9%, 7%
Widgets: 5%, 8%, 8%, 27%, 19%, 31%
Probability guidance: 2%, 8%, 31%, 54%
Website display: 4%, 15%, 26%, 52%
Convection-allowing: 5%, 13%, 22%, 55%
Note: Table made from bar graph.

(b) Frequency of NCAR ensemble use
Daily: 39%
At least once per week: 36%
Only during big weather events: 20%
Convective events: 1%
Other: 4%
Note: Table made from pie chart.

(c) Perception of forecast skill (rating: percentage of responses, with number of responses in parentheses)
1: 0% (0)
2: 0% (0)
3: 0.7% (1)
4: 6% (8)
5: 40% (55)
6: 40% (55)
7: 13% (18)
Note: Table made from bar graph.

(d) Perception of forecast spread (rating: percentage of responses, with number of responses in parentheses)
1: 0% (0)
2: 2% (3)
3: 30% (41)
4: 46% (63)
5: 15% (20)
6: 6% (8)
7: 1% (2)
Note: Table made from bar graph.
Title Annotation: U.S. National Center for Atmospheric Research
Author: Schwartz, Craig S.; Romine, Glen S.; Sobash, Ryan A.; Fossell, Kathryn R.; Weisman, Morris L.
Publication: Bulletin of the American Meteorological Society
Date: Feb 1, 2019