# Identifying and treating outliers in finance.

1 | INTRODUCTION

Outliers represent a persistent concern in empirical finance research. We are all aware that outliers, or observations that deviate markedly from the rest of the data, potentially lead to biased coefficient estimates in least-squares regressions (Edgeworth, 1887). (1) Researchers often seek to identify these potential outliers by examining descriptive statistics for the variables of interest (Dittmar & Duchin, 2016), effectively examining observations more than three standard deviations from the mean. After identifying these influential observations, the econometrician typically relies on mitigation techniques to remedy the outlier problem (Henry & Koski, 2017). A review of recent articles that identify outliers in prominent finance journals indicates that almost all studies rely on univariate identification. (2) Table 1 indicates that the vast majority of these studies winsorize the data or perform some sort of listwise deletion. Yet, this identification and treatment of outliers implicitly relies on outliers arising in a univariate context.

We ask a fundamental but simple question. Are the techniques we commonly use in finance to identify and treat outliers appropriate for the data structures we observe in practice? Many of the methods to identify and treat outliers, such as winsorizing, trimming, or dropping the affected observations, arose in a period with limited data sets and computer power. By necessity, these methods focused on identifying and treating outliers in a univariate setting, but studies in finance almost always require multivariate analysis. A simple example provides a useful illustration. Table 2 displays a small data set, where descriptive statistics indicate that none of the observations contain univariate outliers. Yet, two of the observations include outliers in a multivariate setting, dramatically influencing the coefficient estimates in an ordinary least squares (OLS) regression framework. (3) Intuitively, if multivariate (i.e., regression) outliers arise in a nonrandom fashion, trimming and dropping potentially introduce sample selection problems and biased coefficient estimates (Heckman, 1979). Table 2 demonstrates that neither winsorizing nor trimming mitigates the influence of the multivariate outliers. Instead, these univariate outlier mitigation strategies actually exacerbate the multivariate outlier problem (our example is consistent with Bollinger & Chandra, 2005). The example provided in Table 2 clearly demonstrates that despite being the best linear unbiased estimator of the conditional expectation function from a purely statistical standpoint, naively using OLS can lead to incorrect economic inferences when there are multivariate outliers in the data.

Outliers arise in a variety of ways including data errors, variable construction, omitted variables, sampling errors, nonnormality, or chance. Outliers can also be the most important data in a sample when they reflect some unusual fact that will lead to an improvement in economic theory or model specification (Zellner, 1981). Therefore, identifying multivariate outliers is a key step in evaluating their impact in empirical finance research. Traditional methods, such as studentized residuals or Cook's D, while simple to implement and easy to evaluate, suffer from a masking problem that occurs when specifying too few outliers in the test. For example, if we are testing for a single outlier when there are, in fact, two (or more) outliers, these additional outliers may influence the value of the test statistic enough so that no points are identified as outliers. Traditional methods also rely heavily on their assumptions of normality. In this paper, we propose an identification strategy using outlier robust estimation as in Rousseeuw and van Zomeren (1990). We find this method effectively identifies the outliers and tests for their influence. We further propose an outlier robust regression approach that minimizes the bias outliers cause in both cross-sectional and panel regressions. The objective is to provide an outlier robust estimator that is as efficient as OLS when the data are normally distributed, but is more efficient than OLS and unbiased when the data contain outliers.

Yet, there is a cost to our approach. In particular, it is computationally intensive. We use a combination of base robust estimators, specifically M-estimators and S-estimators, which in combination are termed MM-estimators as described in Yohai (1987). The MM-estimators combine a high breakdown point, or the largest percentage of outliers in a sample that the estimator can handle without producing arbitrary results, with relatively high efficiency when compared to OLS. To the best of our knowledge, there are no readily available procedures that compute MM-estimators or any other high breakdown point estimators with clustered standard errors. We rely on the theory of the generalized method of moments to calculate these clustered standard errors (Croux, Dhaene, & Hoorelbeke, 2008).

Researchers often need to control for group-specific fixed effects using binary variables. However, there are no outlier robust regression methods available that can account for a large number of fixed effects. We address this issue by presenting a new method that draws on insights from Aquaro and Cizek (2013). We conduct simulations to show this method effectively mitigates outlier bias in models that use fixed effects and/or clustered standard errors. Our analysis demonstrates that outliers in continuous variables can cause biases in the OLS coefficients of binary variables. We also find that outlier bias in binary variables increases when continuous independent variables are correlated, a common occurrence in finance samples. These findings lead to a concern that the results of studies that use binary variables suffer from outlier-induced biases. Our outlier robust estimators can be used as a diagnostic tool yielding different results than OLS when influential outliers are present.

Once identified, the origins of the outliers can be determined. The decision as to how, or even whether it is appropriate, to mitigate the influence of multivariate outliers depends upon their cause and economic theory. For example, in the case of data entry errors, the ideal mitigation strategy is dropping or correcting errant observations. Modern financial data sets are often large making it impractical or impossible to manually search for outliers at the observational level. Our approach effectively identifies the most influential outliers that change coefficient estimates. Since influential outliers often represent only a small fraction of the total observations, researchers can minimize manual examination and data cleaning costs using the proposed approach.

Of course, not all influential observations are the result of data errors. Some observations are informative in the sense they represent omitted variables or interesting phenomena not previously considered. For these informative and influential outliers, modeling enhancements (e.g., additional control variables) can be effective in mitigating the bias they cause in the estimated coefficients of the main variables of interest. Economic theory can help guide any further mitigation efforts. When the hypothesized relation between the dependent and independent variable is a general effect and not an outlier effect (i.e., driven by rarely occurring events or circumstances), outlier bias can be mitigated by either dropping the most extreme outliers or employing outlier robust regressions that place less weight on extreme observations than OLS does. For example, when theory suggests a general effect, influential outliers comprising a small fraction of the sample should not drive the empirical results.

However, these mitigation approaches are not appropriate when outliers, as tail risk events, are the most informative observations as removing the largest manifestation of an effect can make it appear insignificant when it is not. For example, when examining the impact of low probability economic disasters on equity premiums as in Rietz (1988), Barro (2006), and Welch (2016), naively dropping the most influential outliers would lead to incorrect inferences. (4) In these cases, outlier mitigation should be limited to removing or correcting data errors and improving model specifications to account for any omitted variables.

Our recommended approach to multivariate outliers comprises five steps: 1) test for the presence of multivariate outliers since they are suggestive of bad data (e.g., data entry errors, sampling errors, and omitted variables), 2) identify outliers robustly in a multivariate context, 3) carefully consider and examine the nature and origin of the outliers, 4) correct data and omitted variables errors, and 5) consider the nature of the research question and economic theory to determine whether to mitigate further by dropping the influential observations in the OLS regressions or by employing outlier robust estimators. While outliers potentially influence statistical and economic inferences, they may not systemically affect the results in finance research. We examine this possibility by replicating four recently published papers using our outlier robust estimator as a diagnostic tool. For tractability, we concentrate our analysis on two main areas in finance (i.e., corporate finance and investments) with relatively large data sets to minimize concerns about sample sizes. We identify two published articles in premier finance journals via a formal screening process that biases against finding outlier problems and where the authors made their data sets and code available. We find the estimated coefficients for the primary variables and covariates change in terms of statistical significance and economic importance after mitigating the influence of outliers. We also collect data on our own in order to evaluate two additional articles where the authors do not disclose this information, again finding evidence of multivariate outliers. We provide illustrations of robust identification and implementation across a variety of empirical settings demonstrating drastic changes in the magnitude and signs of the coefficient estimates of interest. In short, this initial analysis suggests that the common methods used to identify outliers in finance can lead to distinct distortions in empirical analyses. The results in the studies that we formally investigate, an admittedly small sample, appear to stem from multivariate outliers in the data.

Our paper makes two main contributions. First, this is the only research that proposes a comprehensive multivariate outlier identification strategy, and demonstrates how to effectively detect outliers, test for their influence, and mitigate (when appropriate) the bias they cause in both cross-sectional and panel regressions. Our approach also provides outlier robust clustered standard errors and is capable of handling the large numbers of fixed effects common in finance models. We empirically test this approach using four replications from recently published papers in premier finance journals and show how adjusting for outliers can lead to different results. Current published work that incorporates replications highlights the importance of carefully examining and treating outliers. We argue that a comprehensive approach to addressing outliers improves internal and external validity thereby reducing the need for further investigations. (5)

In addition, we contribute to a growing literature on improving identification and estimation techniques in finance research. Angrist and Pischke (2010) argue that innovations in econometric identification techniques (e.g., unobserved heterogeneity, endogeneity, clustered standard errors in panel data, measurement errors, and natural experiments) represent a credibility revolution that enhances the ability of researchers to make accurate statements. Bowen, Fresard, and Taillard (2017) argue that researchers should quickly adopt useful econometric innovations, but various frictions hinder their diffusion. In the case of robust outlier technology, the frictions center on the lack of widely known and readily available methods to identify, test, and treat outliers. These frictions are coupled with the common belief that the standard outlier mitigation techniques, such as winsorizing and trimming, provide protection against these extreme observations. In this paper, we attempt to alleviate these frictions.

2 | UNIVARIATE VERSUS MULTIVARIATE OUTLIERS

2.1 | Limitations of univariate outlier treatments

The most common outlier treatments in finance are winsorizing, trimming, and dropping (referred to hereafter as WTD). For example, of the 3,572 studies published in the top 4 finance journals from 2008 to 2017, only 999 (or 28%) mention outliers. Of the 717 studies that utilize OLS regression and mention outliers, the large majority use winsorizing (52%), trimming (16%), or dropping (17%). Researchers often argue these methods are effective or that outliers do not unduly influence their results. The following examples are excerpts from the top four finance journals over the study period. Authors note, "to prevent outliers from influencing the analysis all variables are winsorized" and "the dependent variable was winsorized to remove the effects of extreme outliers" and "to avoid potential problems with outliers all variables are winsorized." Similar language is also common when using trimming, dropping, or other techniques.

The main concern with WTD is that these are univariate attempts to correct a multivariate problem. In estimating a multivariate regression, observations that may not appear extreme in a single variable can exert outsized influence on the overall model. Accordingly, WTD can only be expected to reliably mitigate outlier-induced bias in univariate descriptive statistics. Since most empirical work in finance utilizes multivariate analysis, these outlier mitigation techniques are not consistently effective.

Univariate outlier treatments can also alter the data. For instance, across the distribution of a variable of interest, winsorizing requires identifying the h smallest and largest values and replacing them with the next smallest or largest values, where h is an integer or percentage. In a data panel with observations in rows and variables of interest in columns, the extreme value(s) within a column will be determined relative to the other column values and the researcher will alter both the highest and lowest amounts to what is deemed more reasonable. An issue with this procedure is that the setting of h is arbitrary. Also, replacing a column value for a particular observation with the value of another observation changes the information contained in that observation. Further, by identifying and targeting extreme values within a single column, trimming and dropping are similar to winsorizing, but instead of replacing the value of a column attribute, these techniques remove the entire row observation. And, while the observation is not maintained as a transformed record, trimming and dropping remove observations that may be well behaved in all other columns and can provide valuable insight into the data generating process. In fact, outliers as influential observations can be the most important data in a sample (Belsley, Kuh, & Welsch, 1980).
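The mechanics described above can be sketched in a few lines. The following is a minimal illustration, assuming NumPy is available; the function names `winsorize` and `trim_rows`, the toy data, and the choice h = 1 are hypothetical and are not taken from the paper.

```python
import numpy as np

def winsorize(values, h):
    """Replace the h smallest and h largest values with the nearest retained value."""
    x = np.asarray(values, dtype=float)
    if h == 0:
        return x.copy()
    ordered = np.sort(x)
    lower, upper = ordered[h], ordered[-h - 1]
    return np.clip(x, lower, upper)

def trim_rows(data, column, h):
    """Drop the entire rows that hold the h smallest and h largest values of one column."""
    data = np.asarray(data, dtype=float)
    if h == 0:
        return data.copy()
    order = np.argsort(data[:, column], kind="stable")
    keep = np.sort(order[h:len(order) - h])  # preserve the original row order
    return data[keep]

# Toy panel: observations in rows, two variables in columns (hypothetical values).
sample = np.array([[1.0, 9.0],
                   [2.0, 10.0],
                   [3.0, 11.0],
                   [4.0, 12.0],
                   [100.0, 13.0]])

winsorized_col0 = winsorize(sample[:, 0], h=1)  # values altered, rows kept
trimmed = trim_rows(sample, column=0, h=1)      # rows removed, including column 1
```

In this toy panel, winsorizing overwrites the first-column values 1 and 100 with 2 and 4, while trimming discards both rows even though their second-column values (9 and 13) are unremarkable.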

Another issue with WTD is that researchers continue to use OLS on the altered sample. Trimmed or winsorized least squares estimates can still be influenced (potentially greatly) by one remaining outlier. While employing a median rather than a mean estimator on the altered sample can provide more robustness, in simulations, we find that median regressions do not reliably mitigate the influence of multivariate outliers. The results of the simulations are provided by the authors upon request.

In addition, WTD can actually introduce new sample problems. Bollinger and Chandra (2005) and Verardi and Wagner (2011) find that trimming can lead to biased coefficient estimates even in samples without outliers. Moreover, arbitrarily winsorizing or removing observations with large values creates a sample selection problem (Heckman, 1979). The extreme values do not occur by accident. Rather, they arise from an underlying data generating process and removing them can introduce a new bias in parameter estimates.

2.2 | Univariate identification and treatment of multivariate outliers: An illustration

Panel A of Table 2 presents data from two small illustrative data sets. The first data set, labeled the "No Outlier Sample," features 20 simulated observations where the dependent variable, Y, equals 0.5 X1 + 0.5 X2 + a random error term. Independent variables X1 and X2 are randomly generated with values in the range of 1 to 20. The second data set, labeled the "Multivariate Outlier Sample," is identical to the first except we replace the independent variables in Observation Number 18 and the dependent variable in Observation Number 19 with smaller values. These replacements represent multivariate outliers with dependent variable values that are larger (Observation Number 18) and smaller (Observation Number 19), relative to their independent variable values, than the remaining observations.

Similar to Anscombe (1973), Table 2 demonstrates that descriptive statistics cannot reliably identify outliers. (6) The mean and median values of the two samples are essentially the same and the standard deviation values are identical. More importantly, the "No Outlier Sample" and the "Multivariate Outlier Sample" exhibit the same minimum and maximum values. Univariate identification essentially entails some selection of the largest or smallest values of each variable. However, in this example, the multivariate outliers do not have extreme dependent or independent variable values and, as such, are not identified as outliers.

Panel B of Table 2 reports the effects of univariate outlier mitigation and multivariate outliers on regression estimates. The estimated coefficients for the "No Outlier Sample" in Column 1 are reasonably close to their expected values and the adjusted $R^2$ is high. Winsorizing and trimming, univariate outlier mitigation techniques, have minimal impact on the regression estimates for the "No Outlier Sample" (Columns 2 and 3). Column 4 reports the results for the "Multivariate Outlier Sample." The regression coefficient estimates are far from their expected values and the adjusted $R^2$ is low. Columns 5 and 6 indicate that winsorizing and trimming do not effectively mitigate the influence of the multivariate outliers and actually appear to make the estimation worse. Column 7 illustrates the importance of identifying multivariate outliers. After removing the multivariate outliers, the regression estimates are similar to those obtained for the "No Outlier Sample" in Column 1.
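The logic of this illustration can be reproduced with a small constructed data set. The sketch below is not the data in Table 2: the regressors are deterministic permutations of 1 to 20, the error term is omitted so that the clean fit is exact, and the replacement values for Observations 18 and 19 are hypothetical choices that stay inside each variable's original range. It assumes NumPy is available.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def flagged_by_3sd(column):
    """Count values more than three sample standard deviations from the mean."""
    z = (column - column.mean()) / column.std(ddof=1)
    return int(np.sum(np.abs(z) > 3))

k = np.arange(1, 21)
x1 = k.astype(float)
x2 = ((8 * k) % 21).astype(float)  # a permutation of 1..20
y = 0.5 * x1 + 0.5 * x2            # noise-free, so the true coefficients are (0, 0.5, 0.5)

X_clean = np.column_stack([x1, x2])
X_out, y_out = X_clean.copy(), y.copy()
X_out[17] = [2.0, 2.0]  # Observation 18: small regressor values, Y unchanged at 18
y_out[18] = 4.0         # Observation 19: small Y, regressors unchanged

truth = np.array([0.0, 0.5, 0.5])
beta_clean = ols(X_clean, y)
beta_out = ols(X_out, y_out)

# Dropping the two multivariate outliers restores the clean estimates.
keep = np.ones(20, dtype=bool)
keep[[17, 18]] = False
beta_dropped = ols(X_out[keep], y_out[keep])

# A three-standard-deviation screen on each variable separately.
univariate_flags = sum(flagged_by_3sd(c) for c in (X_out[:, 0], X_out[:, 1], y_out))
```

In this construction the minimum and maximum of every variable are the same in both samples and the univariate screen flags nothing, yet `beta_out` departs from the true coefficients while `beta_dropped` recovers them.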

2.3 | Multivariate outlier identification

Empirical findings and conclusions can vary based upon the type of outliers. Figure 1 illustrates the three multivariate outlier types: 1) vertical, 2) good leverage, and 3) bad leverage. A vertical outlier is an observation outlying in the dependent variable dimension, but not outlying in the independent variable space. A good leverage point is an extreme observation outlying in the independent variable space, but located near the regression line. When good leverage points are very close to the regression line, they marginally affect parameter estimation, but they can affect statistical inference by deflating standard error estimates. A bad leverage point is an observation that is outlying in the independent variable space and located far from the true regression line. Bad leverage points often significantly affect the estimation of both the intercept and the slope. Since the difference between good and bad leverage is a matter of degree, we focus on bad leverage points. (7)

The first step in assessing outliers is identification. Dehon, Gassner, and Verardi (2012) follow the logic of Hausman (1978) and develop a procedure that compares estimates from outlier robust estimators and OLS. Outlier robust estimation attempts to utilize all available data, but minimize the effect of extreme observations (see Appendix A for a primer on robust estimators). When the test fails to reject, the more efficient OLS is the best estimator. (8) After testing for the presence of influential observations, the researcher should next explicitly identify extreme values to check for correctness and gain further insight into the data generating process. Researchers have previously used leverage, studentized residuals, DFBETAs, DFFITS, Cook's D, and partial regression plots as a way to identify outliers. The main limitation with these tests is their attempt to produce normal looking residuals even when the data are not normal (Rousseeuw & van Zomeren, 1990). Also, the popular Cook's D often suffers from a masking effect that occurs when a group of extreme values mask the impact of each other.

Rousseeuw and van Zomeren (1990) argue that a better identification method is to plot robust standardized residuals against robust distances. The standardized residuals are from an outlier robust estimation procedure, such as S-estimation. On the other axis is a measure of multivariate outlyingness. Multivariate outlyingness is defined as the Mahalanobis distance where the multivariate location vector and covariance matrix are estimated robustly. Specific observations merit special attention if they exceed common y- and x-dimension boundaries. Standard practice is to use y limits of ±2.25 to represent the values from the standard normal that separate the 2.5% most remote region from the central mass. For the x dimension, observations have high leverage if their robust distance exceeds the 0.975 quantile of the chi-squared distribution with degrees of freedom equal to the number of parameters in the model ($\chi^2_{p,0.975}$). We demonstrate the usefulness of the Rousseeuw and van Zomeren (1990) detection method in several of our empirical tests (see Section 4 and Figure 2).
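The decision rule behind this diagnostic plot can be written down directly. The sketch below covers only the classification step and takes the robust standardized residual and robust distance as given inputs; it does not compute them. Two assumptions are made for illustration: the squared robust distance is compared with the chi-squared quantile (equivalently, the distance with its square root), and the example uses p = 2, for which the quantile has a closed form. The function names and example values are hypothetical.

```python
import math

def chi2_quantile_2df(prob):
    """Quantile of the chi-squared distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - prob)

def classify(std_resid, robust_dist_sq, dist_cutoff, resid_cutoff=2.25):
    """Label one observation from its robust residual and squared robust distance."""
    big_resid = abs(std_resid) > resid_cutoff
    big_dist = robust_dist_sq > dist_cutoff
    if big_resid and big_dist:
        return "bad leverage"
    if big_resid:
        return "vertical outlier"
    if big_dist:
        return "good leverage"
    return "regular"

cutoff = chi2_quantile_2df(0.975)  # cutoff for squared robust distances when p = 2

# Hypothetical (standardized residual, squared robust distance) pairs.
labels = [classify(r, d, cutoff) for r, d in
          [(0.3, 1.2), (4.0, 1.0), (0.5, 12.0), (-5.0, 20.0)]]
```

The four labels correspond to the four regions of the plot: the central mass, vertical outliers, good leverage points, and bad leverage points.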

3 | OUTLIER MITIGATION

3.1 | Outlier robust estimators

Regression analysis is the primary statistical method used in empirical finance research. When properly fitted, regression estimates provide a powerful summary of relations in the data. The goal of linear regression is to converge to the true values of the coefficients by minimizing a loss function on the residuals. When the assumptions of the classical linear regression model are violated, however, coefficients are estimated with error or bias, which can lead to spurious inferences. One of the main assumptions of linear regression is that estimation errors, or residuals, are distributed normally. In this case, OLS produces the best unbiased coefficient estimates with the smallest variance. Even when the residuals are not distributed normally, the OLS estimator is still the best linear unbiased estimator, a weaker condition indicating that among all linear unbiased estimators, OLS coefficient estimates have the smallest variance.

However, it is trivial to show that when a sample contains extreme observations, OLS becomes markedly inferior to outlier robust estimators. (9) The robustness of an estimator is the level of resistance to change that an estimator has to outliers (Andersen, 2008). When the residuals are not distributed normally, it is frequently possible to find robust estimators that are more efficient or produce estimates with smaller variances than OLS. Since the underlying error distribution is rarely known with certainty, robust estimation procedures that have attracted the greatest attention in the statistics literature are those that are concerned with finding estimators that are only slightly less efficient than OLS when the errors are normally distributed, but can be considerably less biased and more efficient in the presence of outliers in the data. The trade-off of efficiency and bias is critical to selecting the appropriate estimator.

Appendix A highlights the base parametric robust estimators: L-estimators, R-estimators, M-estimators, and S-estimators. Since each estimator has its trade-off, the most recent robust modeling uses combinations of the base robust estimators. The two primary estimators used are M-estimators and S-estimators, which are termed MM-estimators. To compute MM-estimators, Verardi and Croux (2009) program the algorithm of Salibian-Barrera and Yohai (2006) and use the iteratively reweighted OLS algorithm with the S-estimate as their initial value. The algorithm for computing the initial S-estimators begins by estimating regression parameters on randomly selected subsets. The intuition for multiple subsets is to obtain at least one subset without outliers and the final S-scale estimate is from the subset with the smallest scale. Once the S-estimator is estimated, the MM-estimator is computed via the iteratively reweighted OLS algorithm. MM-estimators combine a high breakdown point (BDP) of 50% with relatively high efficiency, 95% relative to OLS under the Gauss-Markov assumptions. (10)
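The final reweighting stage can be illustrated in isolation. The sketch below implements only an iteratively reweighted least squares step with Tukey biweight weights and a fixed scale; it is not the Salibian-Barrera and Yohai (2006) algorithm and it omits the subsampling that produces the initial S-estimate. The starting coefficients and scale passed to `m_step` are hypothetical stand-ins for that S-estimate, the tuning constant 4.685 is the value commonly associated with 95% efficiency under normal errors, and the data are constructed and noise-free. NumPy is assumed.

```python
import numpy as np

def tukey_weights(u, c):
    """Tukey biweight weights: (1 - (u/c)^2)^2 inside [-c, c], zero outside."""
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w

def m_step(X, y, beta_init, scale, c=4.685, max_iter=100, tol=1e-10):
    """Iteratively reweighted least squares with the scale held fixed."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(max_iter):
        resid = y - design @ beta
        sw = np.sqrt(tukey_weights(resid / scale, c))
        new_beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Constructed data: y = 0.5*x1 + 0.5*x2 with two contaminated observations.
k = np.arange(1, 21)
X = np.column_stack([k, (8 * k) % 21]).astype(float)
y = 0.5 * X[:, 0] + 0.5 * X[:, 1]
X[17] = [2.0, 2.0]  # regressors replaced, y left at 18
y[18] = 4.0         # y replaced, regressors left unchanged

design = np.column_stack([np.ones(20), X])
beta_ols, *_ = np.linalg.lstsq(design, y, rcond=None)

# Hypothetical initial estimate and scale standing in for an S-estimate.
beta_robust = m_step(X, y, beta_init=[0.2, 0.45, 0.55], scale=1.0)
```

Because the two contaminated observations have standardized residuals beyond the cutoff at the starting values, they receive zero weight and `beta_robust` settles at the coefficients that generated the remaining observations, whereas `beta_ols` does not.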

To the best of our knowledge, there are no readily available procedures that compute MM-estimators or any other high BDP estimators with clustered standard errors. Relying on the theory of the generalized method of moments (GMM), these standard errors can be calculated (Croux et al., 2008). (11) Specifically, a preliminary S-estimator is fit with a given loss function $\rho_0$. Parameters $\beta$ and $\sigma$ are estimated by $\hat{\beta}_{S;\rho_0}$ and $\hat{\sigma}_{\rho_0}$. Then, an M-estimator with loss function $\rho$ (with first derivative $\psi$) that allows a higher efficiency is estimated with the scale parameter fixed to $\hat{\sigma}_{\rho_0}$. The functions $\rho_0$ and $\psi$, which are chosen by the statistician, are nonconstant, scalar-valued, and differentiable. Furthermore, $\psi$ is odd, $\rho_0$ is even and nondecreasing on $[0, \infty)$ with $\rho_0(0) = 0$, and $b$ is a selected constant that is usually chosen to be $E_{\Phi}[\rho_0(u)]$, where $\Phi$ denotes the standard normal distribution. In the MM-estimation procedure, the estimates are such that:

[mathematical expression not reproducible](1)

The first line of Equation (1) is the first-order condition of the final M-estimator, the second line is the first-order condition for the preliminary S-estimator, and the last line corresponds to the sample equivalent of Equation (A10) in Appendix A. The Tukey (1991) biweight function that has a smooth $\psi$ is used for both the preliminary S-estimator and the final MM-estimator. [mathematical expression not reproducible] is first-order equivalent to the generalized method of moments estimator for $(\beta', \beta_0', \sigma_{\rho_0})'$. Under usual GMM conditions:

[mathematical expression not reproducible]

The asymptotic variance of the MM-estimator is available from the standard GMM estimator (top left block of [V.sub.MM]). Thus, it is straightforward (Croux et al., 2008) to have a covariance matrix for the S- and the MM-estimator that is robust to heteroskedasticity, serial correlation, and/or clustered standard errors by relying on standard GMM theory.
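Since the displayed expressions are not reproduced above, the following is a sketch of the system that the verbal description implies, written in the notation of the surrounding text. It is a reconstruction under stated assumptions rather than the authors' typeset equation: the observation index $i$, the sample size $n$, the stacked parameter vector $\theta$, and the matrices $G$ (expected Jacobian of the moment conditions) and $\Omega$ (long-run variance of the moment conditions) are notation introduced here.

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}
  \psi\!\left(\frac{y_i - x_i'\hat{\beta}}{\hat{\sigma}_{\rho_0}}\right) x_i &= 0, \\
\frac{1}{n}\sum_{i=1}^{n}
  \rho_0'\!\left(\frac{y_i - x_i'\hat{\beta}_{S;\rho_0}}{\hat{\sigma}_{\rho_0}}\right) x_i &= 0, \\
\frac{1}{n}\sum_{i=1}^{n}
  \rho_0\!\left(\frac{y_i - x_i'\hat{\beta}_{S;\rho_0}}{\hat{\sigma}_{\rho_0}}\right) - b &= 0,
\end{aligned}
\qquad
\sqrt{n}\,\bigl(\hat{\theta} - \theta\bigr) \xrightarrow{d} N\!\left(0,\, V_{MM}\right),
\quad
V_{MM} = G^{-1}\,\Omega\,(G')^{-1}.
```

Because the system is exactly identified, the usual GMM sandwich collapses to $G^{-1}\Omega(G')^{-1}$, and heteroskedasticity, serial correlation, or clustering enter only through the estimate of $\Omega$.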

The subsampling algorithm can result in collinear subsamples when there are multiple independent binary variables, which is a common occurrence in finance data that contain numerous binary variables to capture firm, year, and event effects. To remedy this issue, Maronna and Yohai (2000) introduce an MS-estimator that alternates an S-estimator and an M-estimator for continuous and binary variables, respectively, until convergence.

For the fixed effects panel data models in finance, Bramati and Croux (2007) recommend replacing the initial S-estimator with an MS-estimator in calculating MM-estimators. As stated by Aquaro and Cizek (2013), the problem with this method is that using the MS-estimator for panel data fixed effects estimations implies that the fixed effects must be explicitly estimated, causing bias due to the nonlinearity of the procedure when the number of periods is fixed. Aquaro and Cizek (2013) argue the Bramati and Croux (2007) recommended method is consistent only if the number of time periods increases to infinity, making it unsuitable for short panels. More practically, in replications and unreported simulations, we find that Bramati and Croux's (2007) method often does not converge when there are numerous fixed effects, an attribute of many finance models.

We address this issue by developing a procedure that computes MM-estimators with clustered standard errors and can handle a larger number of fixed effects, relying on Aquaro and Cizek (2013). Consider a static linear fixed effect panel data model $y_{it} = x_{it}'\beta + \alpha_i + \epsilon_{it}$, $i = 1, \ldots, n$ and $t = 1, \ldots, T_{\max}$, where $y_{it}$ is the dependent variable, $x_{it}$ is the vector of the covariates, and $\beta$ is the vector of parameters of interest. Indices $i$ and $t$ index individuals and time, respectively. $T_{\max}$ is the maximum value for the time index $t$. The unobservable term is a combination of the individual fixed effects $\alpha_i$ and the error term $\epsilon_{it}$. Parameter $\beta$ can easily be estimated if the individual fixed effects are removed from the model equation. A simple way of estimating the parameters of interest is to apply the well-known first-difference transformation $\Delta y_{it} = \Delta x_{it}'\beta + \Delta\epsilon_{it}$ and estimate the resulting model with a linear regression estimator. Subsequently, the researcher can use an S- and/or MM-estimator to mitigate outliers.

Alternatively, the researcher can obtain more accurate estimates by eliminating individual effects through taking all pairwise differences within each individual (Aquaro & Cizek, 2013). This pairwise difference transformation yields $\Delta^s y_{it} = \Delta^s x_{it}'\beta + \Delta^s \epsilon_{it}$. This transformation removes the individual-specific effect, but generates a larger sample size of $nT(T-1)/2$ instead of $n(T-1)$ as differences for all $s = 1, \ldots, t-1$ are considered. To take into account the fact that individual-specific observations are not independent, clustered standard errors have to be systematically considered at the individual level when running the linear regression estimator.
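The two transformations can be compared on a toy balanced panel. The sketch below assumes NumPy, a single covariate, and noise-free data so that the removal of the individual effects is exact; the panel values, the fixed effects, and the function names are hypothetical.

```python
import numpy as np

def first_differences(values):
    """values has shape (n, T); returns the n*(T-1) adjacent differences."""
    return (values[:, 1:] - values[:, :-1]).ravel()

def pairwise_differences(values):
    """All within-individual differences v_it - v_i,t-s for s = 1, ..., t-1."""
    n, T = values.shape
    earlier, later = np.triu_indices(T, k=1)  # every pair with earlier < later
    return (values[:, later] - values[:, earlier]).ravel()

# Toy panel with n = 3 individuals and T = 4 periods.
x = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 3.0, 5.0, 6.0],
              [2.0, 5.0, 5.0, 9.0]])
alpha = np.array([[10.0], [-4.0], [25.0]])  # individual fixed effects
y = alpha + 0.5 * x                         # true slope is 0.5

n, T = x.shape
fd_x, fd_y = first_differences(x), first_differences(y)
pw_x, pw_y = pairwise_differences(x), pairwise_differences(y)

# Slope from the differenced data (no intercept: the fixed effects are gone).
slope_pw = (pw_x @ pw_y) / (pw_x @ pw_x)

# Differences from the same individual share a cluster identifier.
cluster_id = np.repeat(np.arange(n), T * (T - 1) // 2)
```

Here the first-difference sample has n(T - 1) = 9 rows and the pairwise sample has nT(T - 1)/2 = 18 rows, and `cluster_id` marks which rows come from the same individual for the clustered standard errors.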

In practice, the idea is to substitute $\Delta^s y_{it}$ for $y_{it}$ and $\Delta^s x_{it}'$ for $x_{it}'$ in Equation (1) and estimate the parameters using GMM. Because this is a GMM estimator, heteroskedasticity, clustering, and autocorrelation consistent estimates of the standard errors are readily available. The coefficients of interest will be [mathematical expression not reproducible] and their estimated covariance will be the upper-left block of the estimated matrix $\hat{V}$.

Naturally, one might also want to cluster at a level in addition to the individual one. As Thompson (2011) illustrates, computation of the two-way cluster-robust variance estimate is straightforward. The two-way clustered variance can be calculated from $V(\hat{\beta}) = V_1(\hat{\beta}) + V_2(\hat{\beta}) - V_{12}(\hat{\beta})$, where the three variance estimates are derived from one-way clustering on the first dimension, the second dimension (the additional dimension we are interested in), and their intersection, respectively.
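The combination rule can be sketched as follows. For simplicity, the example applies the rule to OLS residuals with a plain sandwich estimator and no small-sample correction, rather than to the MM-estimator's GMM covariance; the simulated data, the firm and year identifiers, and the function names are hypothetical. NumPy is assumed.

```python
import numpy as np

def cluster_variance(X, resid, groups):
    """One-way cluster-robust sandwich variance for OLS (no small-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)
    return bread @ meat @ bread

def two_way_cluster_variance(X, resid, g1, g2):
    """V = V_1 + V_2 - V_12, with V_12 clustered on the intersection of g1 and g2."""
    pair_ids = {}
    g12 = np.array([pair_ids.setdefault((a, b), len(pair_ids))
                    for a, b in zip(g1, g2)])
    return (cluster_variance(X, resid, g1)
            + cluster_variance(X, resid, g2)
            - cluster_variance(X, resid, g12))

# Simulated panel: 5 firms observed over 4 years.
rng = np.random.default_rng(0)
firm = np.repeat(np.arange(5), 4)
year = np.tile(np.arange(4), 5)
x = rng.normal(size=20)
y = 1.0 + 0.5 * x + rng.normal(size=20)

X = np.column_stack([np.ones(20), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

V_two_way = two_way_cluster_variance(X, resid, firm, year)
```

When each firm-year pair contains a single observation, as here, the intersection term reduces to the heteroskedasticity-robust variance with every observation as its own cluster.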

Both the first difference and the pairwise difference transformations are linear transformations of the data. Thus, robust linear S- or MM-estimation applied to such transformed data does not lose its equivariance properties. The first difference estimator has a BDP of $(T - 1)/(4T)$ for $T \geq 3$, where $T$ is the number of periods. As such, first differencing approaches 25% BDP only for large $T$. In contrast, the BDP of the pairwise estimator is 25% for any $T$.
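A short calculation makes the gap concrete for panel lengths typical of finance samples. The sketch below simply evaluates the two breakdown points stated above using exact fractions; the function name and the chosen values of T are illustrative.

```python
from fractions import Fraction

def bdp_first_difference(T):
    """Breakdown point (T - 1) / (4T) of the first-difference estimator, T >= 3."""
    if T < 3:
        raise ValueError("the expression is stated for T >= 3")
    return Fraction(T - 1, 4 * T)

BDP_PAIRWISE = Fraction(1, 4)  # stated as 25% for any T

bdp_table = {T: bdp_first_difference(T) for T in (3, 5, 10, 50)}
shortfall = {T: float(BDP_PAIRWISE - bdp) for T, bdp in bdp_table.items()}
```

With T = 3 the first-difference breakdown point is 1/6, or roughly 16.7%, and it reaches 24.5% only at T = 50, which is why the pairwise transformation is attractive for short panels.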

3.2 | A framework for handling outliers

Reproducibility is a fundamental requirement for empirical research. It is often impossible or difficult to replicate papers that do not carefully document how outliers are handled. Irreproducibility leads to questions about the validity of the research outcomes. For example, Adams, Hayunga, and Mansi (2018) find that outliers caused by data errors and comprising less than 2% of the original sample drive the Chen, Hong, Huang, and Kubik (2004) finding of mutual fund diseconomies of scale. Similarly, Adams, Hayunga, and Mansi (2018) revisit the nature of returns to scale following Pastor, Stambaugh, and Taylor (2015) and find that the documented negative relation between industry scale and return performance is an artifact of extreme observations that comprise less than 0.05% of the sample. Guthrie, Sokolowsky, and Wan (2012) report that two outliers out of a sample of 865 firms drive the Chhaochharia and Grinstein (2009) estimate on CEO pay decreases in noncompliant firms following the NYSE and NASDAQ revision of listing standards requiring majority independent boards.

We present a framework for handling outliers and improving empirical research replicability and robustness in Table 3. The primary focus in this paper is the identification and mitigation of multivariate outliers. Table 3 includes commentary for each recommendation. We call attention to Items 1 and 2 that warn against winsorizing and univariate trimming, Items 7 and 11 for documenting data decisions, Item 8 for advising a formal outlier test, and the decision to mitigate beyond data errors in Items 12 and 13. We employ this framework in the replications that follow. However, in the interest of brevity and because univariate identification is commonly used, we largely restrict our discussion to the multivariate outliers (i.e., Items 8 to 13 in Table 3).

4 | REPLICATIONS

While multivariate outliers have the potential to significantly influence OLS estimated coefficients and standard errors, it does not necessarily follow they unduly affect statistical inference in practice. Outliers may not be a concern when they are not very influential or occur infrequently. If, however, outliers are common and influential enough to bias coefficient estimates, they merit extra consideration.

This section is divided into two subsections. First, we evaluate the incidence, characteristics, and influence of outliers by replicating a study from corporate finance (Petersen, 2009) and another from asset pricing (Wahal & Yavuz, 2013). In both replications, we collect the data and conduct the empirical analyses ourselves (i.e., no code or data sets were provided by the authors of the two studies). In addition, using outlier robust regressions as a diagnostic tool, we replicate two published papers whose authors provided their code and data sets.

Appendix B provides the variable definitions, data sources, and outlier methods for the four studies. The STATA code we use to generate the figures and tables in the replications is available at https://uta.box.com/v/FinancialManagement. The code is fully documented to provide guidance for future researchers. We also provide the data we use for the corporate finance (Petersen, 2009) and asset pricing (Wahal & Yavuz, 2013) applications. A STATA package including MM-estimators, outlier diagnostics, and the identification methods we use is available by typing "net from http://homepages.ulb.ac.be/~vverardi/stata" from within STATA.

4.1 | Incidence of articles with outlier mention

Table 1, Panel B, provides the number and percentage of articles published annually from 1988 to 2017 in the top four finance journals that mention outliers, that use OLS, and that combine the two. The data are collected from the EBSCO database for JF, RFS, and JFQA and the Science Direct database for JFE using keyword searches. (12) Three issues are noteworthy. First, the percentage of research papers mentioning outliers has increased over time (7% in 1988 vs. 32% in 2017). In addition, the majority of recently published papers use OLS. Finally, most papers using OLS fail to mention potential outlier bias.

4.2 | Outlier influence in corporate finance and asset pricing

4.2.1 | Petersen (2009) study: Corporate finance application

Our first replication examines a typical specification found in the corporate finance literature. We follow Petersen (2009) and compute annual variables from the Center for Research in Security Prices (CRSP)/Compustat merged database from 1973 to 2017. We begin by applying S-estimators to the following model separately for each of the years 1973-2017 to identify vertical outliers and bad leverage points.

[mathematical expression not reproducible] (2)

Panel A of Table 4 reports the percentage of multivariate outliers (vertical and bad leverage points) occurring in each year. Overall, vertical outliers comprise about 2.6% of all observations. The data show considerable variation in the annual incidence of outliers. Vertical outliers, for example, comprise only 0.6% of observations in 2003, but 8.3% in 1991. In terms of bad leverage points, the 44-year average annual incidence is 3.7%. Bad leverage point frequencies exhibit substantial variability, from a low of 2.4% in 1979, 2003, and 2005 to a high of 11.9% in 2016.

The results in Panel A provide context as to why winsorizing and trimming schemes are not consistently effective treatments for univariate outliers. First, winsorizing using the most common cutoff points is of limited value when outlier frequencies exceed the designated threshold. For example, 1% winsorizing is not likely to effectively mitigate outlier influence when outlier frequencies exceed 1%. Additionally, naive winsorizing or trimming schemes that apply a uniform cutoff rule for all years run the risk of having too small a cutoff in some years and too large a cutoff in others. As a result, winsorizing and trimming could potentially exacerbate rather than mitigate outlier-induced biases.
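The threshold argument can be illustrated with a minimal Python sketch. It assumes a simple order-statistic version of winsorizing (statistical packages may interpolate percentiles differently) and a made-up sample in which 5% of the observations are extreme; the function name and data are hypothetical.

```python
def winsorize(values, p=0.01):
    """Replace the lowest and highest share p of observations with the
    nearest retained order statistic (assumption: no interpolation)."""
    ordered = sorted(values)
    n = len(ordered)
    k = round(p * n)                    # observations replaced in each tail
    lower, upper = ordered[k], ordered[n - 1 - k]
    return [min(max(v, lower), upper) for v in values]

# Assumed sample: 190 typical values between 0 and 1.89, plus 10 extreme
# values (5% of the sample) equal to 100.
typical = [i / 100 for i in range(190)]
extreme = [100.0] * 10
sample = typical + extreme

treated_1pct = winsorize(sample, p=0.01)
treated_5pct = winsorize(sample, p=0.05)
print(treated_1pct.count(100.0), treated_5pct.count(100.0))
```

With a 1% cutoff, the upper replacement value is itself one of the extreme observations, so all ten extreme values survive untouched. Only a cutoff at least as large as the outlier share removes them, and that share is not known in advance and varies from year to year.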

Since multivariate outliers are observations whose dependent variable, in this case the debt ratio, is not consistent with the model's prediction, examining them carefully provides insights on model improvement. We begin by comparing outliers to typical nonoutlier observations. Panel B of Table 4 reports the mean (and median) values for nonoutlier, vertical outlier, and bad leverage point observations, as well as mean (and median) difference testing results. The results report significant differences in nonoutlier and outlier mean (median) values for many of the variables. Vertical outliers in Column 2 have a mean (median) debt ratio that is much larger than the mean (median) value for typical observations, a difference that is significant at the 1% level. Vertical outlier observations also have smaller market-to-book ratios and smaller research and development expenses. Bad leverage points in Column 3 also have higher market debt ratios, are younger, have more tangible assets, higher market-to-book ratios, and invest more in advertising and research and development.

Next, we visualize the outliers using the Rousseeuw and van Zomeren (1990) plots. Because the total data set is large, Figure 2 plots observations occurring in 2017 only. Figure 2 illustrates the data set containing numerous outliers and identifies four bad leverage point examples with Internet service provider Windstream Holdings having the most extreme robust standardized residual (i.e., outlyingness in the dependent variable). The debt ratio for Windstream Holdings increased from around 40% in 2007 to over 80% in 2017. During this period, Windstream Holdings' stock price fell by more than 85% to less than \$2.00. Windstream Holdings also reported a negative book value of equity in 2017. Figure 2 also shows specialty refiner Calumet Specialty Products as a bad leverage point. Further analysis reveals Calumet Specialty Products appears financially distressed with stock prices that fell about 75% and a book value of equity that fell over 80% over the 2012 to 2017 period. Also, during the same period, Calumet Specialty Products' debt ratio increased to over 65% from less than 30%. Ingles Markets is a supermarket chain operating in the southeastern US. Ingles Markets' debt ratio is about 50%, a value that is higher than the nonoutlier sample average of less than 20%. Typical for the supermarket industry, profit margins are small (about 6% compared to over 20% for the sample) and tangible assets are high. Domino's Pizza is a high growth stock whose book and market values of equity were negative \$2.7 billion and positive \$8.1 billion, respectively, in 2017. Operating profits grew at an average annualized rate of about 15% and its stock price increased more than 200% from 2014 to 2017.

Of primary interest is whether multivariate outliers materially affect regression coefficients. We add year and firm fixed effects to the model presented in Equation (2). Panel C of Table 4 reports OLS estimated coefficients with multiple winsorizing and trimming levels in Columns 1 to 5 and MM-robust regressions in Column 6 for the full sample period. Due to the large number of year and firm fixed effects, we use the panel data MM-robust regression extension developed in Section 3. The results in Panel C report considerable variation in coefficient estimates and t-statistics across the six models. The largest differences are the loadings on Tangible Assets, Firm Age, and the Advertising/Sales Ratio. The estimated coefficients for Tangible Assets are significant in the five OLS models, but insignificant in the MM-robust regression suggesting the OLS results are driven by outliers. The Firm Age coefficients are positive and significant for two of the OLS models, positive but insignificant for three OLS models, but negative and significant for the MM-robust specification. Likewise, the Advertising/Sales Ratio OLS estimated coefficients are generally positive, but insignificant while the MM-robust coefficients are negative and statistically significant. The differences in the OLS and MM-robust results indicate the OLS results are driven by outliers.

Panel C also reports the Dehon et al. (2012) outlier test p-values for each OLS specification. The outlier tests reject the null hypothesis that OLS estimated coefficients are not materially influenced by outliers, which is consistent with Panel A of Table 4 reporting that the incidence of outliers is large in certain years. Panel C also reports the maximum outlier robust efficiency for the MM-estimation is only about 31%, further evidence concerning the magnitude of the outlier bias problem for this model specification/sample combination. However, low efficiency in large data sets is less problematic than in small ones and this sample contains over 33,000 observations.

A few other determinants are statistically significant, but demonstrate altered economic inference in the robust model. The estimated coefficients on the market value of assets range from 200% to 400% larger for the OLS specifications than the slope coefficient of 0.008 using the robust regression. The profit-to-sales ratio is insignificant using OLS in Model 1, but significantly negative for the remaining estimations. Again, the winsorizing and trimming estimates for the profit-to-sales ratio are considerably larger than the robust result. The market-to-book variable displays similar coefficients in the winsorized and trimmed results in Models 2 to 5, but these coefficients differ from the untreated OLS and MM-robust results by a factor of at least two. This finding highlights our concerns that univariate treatments, such as winsorizing and trimming, can exacerbate outlier influence.

Overall, there are considerable differences in the OLS and MM-robust regression results. Thus, we conclude influential outliers bias the reported OLS estimated coefficients. We identify several influential observations and find they are most often firms in financial distress and firms that increased leverage substantially in a short period. The most appropriate outlier mitigation strategy depends on the nature of the data and the hypothesized relation between debt ratios and the independent variable of interest. For example, if our primary variable of interest is the Market-to-Book ratio, mitigation is not necessary so long as the hypothesized effect of size on leverage is a general one (i.e., as firms become more valued in the market relative to their book value they tend to become more levered) since outliers do not appear to be driving the untreated OLS estimated coefficients. In contrast, we see that outliers drive the significantly positive OLS estimated coefficients for Tangible Assets as the outlier robust estimated coefficient is insignificant. If theory predicts an outlier effect in that the average relation between tangible assets and leverage is determined by a handful of observations, mitigation should be limited to correcting influential outliers caused by bad data. Alternatively, if there is a hypothesized general effect, better inferences can be obtained by either dropping the influential outliers or performing outlier robust regressions.

4.2.2 | Wahal and Yavuz's (2013) study: Asset pricing application

Asset pricing studies commonly use Fama and MacBeth (1973) (FM) regressions in their analysis. However, Knez and Ready (1997) note that the OLS loss function employed in almost all applications of Fama and MacBeth (1973) is sensitive to both vertical outliers and bad leverage points. Knez and Ready (1997) develop robust Fama and MacBeth (1973) estimates by replacing the OLS loss function with least trimmed squares. They replicate Fama and French (1992, 1993) and find that the estimated relation between size and returns changes from negative when estimated using FM-OLS to positive when using outlier robust FM-LTS. Knez and Ready (1997) conclude the often examined size effect in Banz (1981), where small firms outperform large firms, is driven by extreme observations accounting for as little as 1% of the data. Unfortunately, subsequent asset pricing studies appear to ignore robust estimation and instead winsorize or trim, often at the 1% level. One possible reason for the reluctance to adopt FM-LTS is that LTS estimates, while unbiased in that they fit most of the data, suffer from very low efficiency relative to OLS. Stromberg, Hossjer, and Hawkins (2000) find that LTS has a relative efficiency of only 7%.

We apply Knez and Ready's (1997) concept of robust Fama and MacBeth (1973) estimation, but instead use MM robust regressions that are more efficient than LTS. We replicate the models and sample generation process found in Wahal and Yavuz's (2013) examination of style investing's effect on return predictability. Specifically, we replicate table 1 in the Wahal and Yavuz (2013) study, where the results from Fama and MacBeth (1973) regressions of monthly future stock returns on prior style and stock returns, book-to-market, and size are reported. In their discussion, Wahal and Yavuz (2013) focus on the prototypical six-month future return on six-month prior returns and, for brevity, we follow their example. The data cover 1973-2017. Stock return, book-to-market, and size (the market value of equity) data are from the merged CRSP/Compustat database. The model uses NYSE size breakpoints along with the full set of securities for book-to-market to obtain 5 x 5 size and book-to-market style portfolios. Wahal and Yavuz (2013) drop stocks with negative book-to-market values, while prior stock returns, book-to-market ratios, and size are winsorized at the 1% level in each month. We do not apply these univariate treatments to our primary sample.
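The two-step structure of the Fama and MacBeth (1973) procedure, and the point at which a robust estimator replaces OLS, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: one regressor, a plain (not Newey-West) standard error, and a pluggable cross-sectional fitting function. All names and data are hypothetical, and the sketch does not reproduce the authors' STATA implementation.

```python
import math
from statistics import mean, stdev

def ols_slope(x, y):
    """Cross-sectional OLS slope of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def fama_macbeth(cross_sections, fit=ols_slope):
    """Step 1: estimate one slope per period with `fit`.
    Step 2: average the slopes and form a t-statistic from their
    time-series standard error (assumption: no autocorrelation adjustment).
    """
    slopes = [fit(x, y) for x, y in cross_sections]
    avg = mean(slopes)
    se = stdev(slopes) / math.sqrt(len(slopes))
    return avg, avg / se, slopes

# Assumed toy data: three months, each an exact line with slope 1, 2, or 3.
x = [1.0, 2.0, 3.0, 4.0]
months = [(x, [0.5 + b * xi for xi in x]) for b in (1.0, 2.0, 3.0)]
avg_slope, t_stat, slopes = fama_macbeth(months)
print(avg_slope, t_stat)
```

Passing a different function as `fit` (for example, a wrapper around a robust regression routine) changes only the first step, which is the sense in which FM-LTS and FM-MM modify the standard FM-OLS procedure.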

Table 5 presents the results in two panels. Panel A provides the mean (and median) values for each variable segmented by observation type, as well as testing for the mean (and median) differences between the segmentations. Multivariate vertical outliers and bad leverage points are identified monthly via S-regression estimators. Panel B reports the coefficient estimates and Newey-West t-statistics for the FM-OLS, FM-OLS with 1% winsorizing of all variables, FM-OLS with 1% trimming of all variables, and MM-robust Fama-MacBeth (FM-MM) specifications.

In terms of the dependent variable of future stock returns, Panel A reports that the mean values for vertical outliers and bad leverage points are about 10 and 25 times larger, respectively, than the mean for typical observations. Mean size values are smaller for vertical outliers and leverage points as compared to the typical observation mean value. Another difference is that the vertical outliers have a mean prior stock return that is much smaller than the mean stock return for typical observations (-0.15% vs. 3.63%), while bad leverage points have a mean prior stock return that is a bit more than twice as large as the mean for typical observations. Also, prior six-month mean style returns are smaller for vertical outliers and larger for bad leverage points than for the typical observations.

Next, we investigate outlier frequency. Figures 3 and 4 report the monthly percentage of vertical outliers and bad leverage points. Figure 3 illustrates vertical outliers occurring very frequently in some periods (e.g., mid-1970s) and not so much in others (1999-2000). The mean percentage of vertical outliers for all years is 1.8% with a minimum of 0.2% occurring in August 1999 and a maximum of 4.8% occurring in December 1974.

Similarly, Figure 4 shows substantial monthly variation in bad leverage points. Bad leverage points occurred most frequently around 1996, 1999-2000, 2003-2004, and 2010. The high incidence of bad leverage points is cause for concern given that OLS and, by extension, FM-OLS, is especially sensitive to these types of outliers. The mean percentage of bad leverage points for all years is 6.4% with a minimum of 1.7% occurring in May 1981 and a maximum of 20.8% occurring in August 1999.

Panel B reports the estimated coefficients for four specifications, FM-OLS, FM-OLS with 1% winsorizing of all variables, FM-OLS with 1% trimming of all variables, and FM-MM-robust with 28.7% efficiency (MM-robust regression with 28.7% efficiency is equivalent to S-estimation). The estimated coefficients reported for the FM-OLS 1% winsorized specification in Column 2, with the exception of the Style return estimated coefficient, are not comparable to the six-month future return results reported in Panel A of table 1 of the Wahal and Yavuz (2013) study. We suspect the discrepancies are due to their outlier mitigation approach (selective winsorizing and dropping) and differences in the sample beginning and end dates. Figures 3 and 4 show that the incidence of outliers varies considerably from month to month and year to year. Hence, different sample periods will contain different levels of outliers and FM-OLS estimates may be more biased for some sample periods and less biased for others. Also, we winsorize all of the variables and do not remove negative book-to-market ratio stocks (after winsorizing there are no negative book-to-market observations).

Panel B also reports that the Dehon et al. (2012) outlier test p-value is 0.000, which indicates the FM-OLS monthly regression coefficients in Columns 1, 2, and 3 are biased and points to the necessity of robust regression. The efficiency for the FM-MM-robust model is about 29%. The efficiency is a reflection of outliers in the data, but is still considerably higher than the 7% efficiency of the FM-LTS regressions in Knez and Ready (1997). The FM-MM Style return coefficient is about 15% larger than the FM-OLS estimate (10.32 vs. 8.96) and is significant at the 1% level in all specifications. However, because the estimated coefficients in the OLS and MM-robust regressions have the same sign and significance level, we conclude outliers are not driving the Prior Style Return empirical results. In contrast, the estimated coefficient for the prior stock return is insignificant for FM-OLS, marginally significant for FM-OLS with 1% winsorizing and FM-MM, and highly statistically significant when univariate outliers are trimmed at the 1% level (Column 3). Thus, we conclude the empirical support for stock level momentum at the six-month interval is fragile and influenced by outliers.

The estimated coefficients for size are negative and significant at the 1% level for the FM-OLS specifications and consistent with Asparouhova, Bessembinder, and Kalcheva (2013), Belo, Gala, and Li (2013), and Novy-Marx (2012). The FM-OLS results are also consistent with Fama and French (1992). In contrast, the FM robust estimated coefficient is positive and, while not quite statistically significant, is consistent with the Knez and Ready (1997) FM-LTS results. Thus, there is an outlier effect of size on future returns. For the overwhelming majority of stocks, there is a positive relation between size and future returns and not a negative relation as documented in most studies. Choosing the correct estimation method and mitigation strategy depends upon the hypothesized relation. If theory suggests an outlier effect (e.g., a few very large firms with low future returns or a few very small firms with high future returns), outlier mitigation beyond error correction is not desirable. Researchers will still want to know how many outliers are driving the result (e.g., 0.1%, 1%, 5%, or 10% of the sample). However, if theory predicts a general effect, outlier mitigation, either by dropping the most influential outliers or by using outlier robust regressions, will improve statistical inference. How outliers are treated in the empirical literature provides insight into the consensus as to whether the hypothesized relation is a general or an outlier effect. For example, in the stock momentum literature, it is common to winsorize. This suggests researchers in this area view momentum as a general effect and that outlier mitigation beyond error correction will improve inference.

Overall, the capital structure and asset pricing replications demonstrate the practical importance of formal testing for multivariate outliers, identifying them, and potentially controlling for their influence. Table 4, using a capital structure setup, and Figures 3 and 4, focusing on asset pricing, indicate the incidence of outliers varies considerably from year to year and their occurrence is more frequent than implied by commonly used winsorizing and trimming levels. Figure 2 and Tables 4 and 5 demonstrate that many of the vertical outliers and bad leverage points are extremely influential and point to the necessity of using outlier analysis for model improvement and potentially identifying omitted variables. Finally, the results suggest that multivariate outlier robust regressions often yield different coefficient estimates than OLS, regardless of the winsorizing or trimming level, for both the capital structure and the asset pricing models.

4.3 | Replications (using data sets/code provided by authors)

Next, we replicate two recently published articles to further illustrate the practical importance of identifying and potentially mitigating outlier bias. These are Becker and Stromberg (2012) and Guthrie et al. (2012). We selected these papers for replication using a formal screening procedure. Of the studies in the top four journals in 2012 using OLS, there are 36 that use commercial databases available to us. We requested the data and code of a single regression model from the authors of these studies stating our interest in outlier investigation. Because these authors provided code and data, replications of these studies should bias against finding any outlier issues (i.e., authors concerned about outlier robustness likely declined to participate). These replications are not preselected to support a particular hypothesis, but instead we investigate all published articles where authors provide working code and data. Golubov, Petmezas, and Travlos (2012) and Panousi and Papanikolaou (2012) also provide their working code and data sets. Our outlier robust replications indicate significant changes in economic and statistical significance of the estimated coefficients in both studies. However, these changes are not large enough to unequivocally change the overall conclusions in the original papers. Due to space constraints, we do not report the results here.

4.3.1 | Fixed effects in panel data models

In this section, we employ our newly developed panel data MM procedure to examine the prevalence of outlier bias by replicating Becker and Stromberg (2012) and Guthrie et al. (2012).

4.3.1.1 Becker and Stromberg (2012) study

Becker and Stromberg (2012) examine the effect of managerial fiduciary duties on equity-debt conflicts using a 1991 legal ruling that changed corporate directors' fiduciary duties in Delaware firms. The 1991 ruling decrees that fiduciary duties are owed to all interested parties not only when a firm is insolvent (the preruling standard), but also when it is in the "zone of insolvency." Consistent with their prediction that the ruling reduces affected equity holders' incentives to engage in risk shifting, Becker and Stromberg (2012) find a significant decrease in firm volatility following the ruling. Their unbalanced panel data set includes 2,145 observations for 745 Delaware firms and 653 non-Delaware incorporated firms.

Panel A of Table 6 compares the Delaware (treatment group) and non-Delaware (control group) samples in terms of multivariate outlyingness for Becker and Stromberg's (2012) firm volatility results in table 5 of their paper. Column 1 reports the mean values for Delaware firms, Column 2 presents the mean values for the non-Delaware firms, and Column 3 provides the mean differences. We also test whether the differences in the mean and median values between the two groups are significant. The objective is to assess the quality of the research design by determining whether there are significant differences in the incidence and magnitude of the multivariate outliers across the treatment and control groups. In terms of outlier classification types, the treatment and control groups have similar incidences of outliers. Likewise, there are no statistically significant differences in mean robust standardized residuals (outliers in the dependent variable space) across the Delaware and non-Delaware firms. Robust Mahalanobis distances are also similar across the two groups with none of the differences being statistically significant. However, we do note the difference in mean Mahalanobis distances for the Delaware and non-Delaware firms appears economically large.

Next, we examine the outlier detection plot in Figure 5 where the Delaware firms are represented with dots and the non-Delaware firms are represented with X's. Overall and consistent with Table 6, Panel A, the treatment and control groups appear similar in terms of vertical, good, and bad leverage outliers. However, there appear to be differences in the extreme bad leverage outlier space (denoted by the letter "B"). For example, a control group firm has the largest Mahalanobis distance, while a treatment group firm has the largest robust standardized residual. More importantly, Figure 5 demonstrates a large number of extreme bad leverage point outliers. Overall, Panel A and Figure 5 indicate the Delaware and non-Delaware firms are similar in their covariates and the quality of the research design is good. (13)

Panel B of Table 6 reports the original OLS estimated coefficients from Becker and Stromberg's (2012) table 5 in Column 1 and the outlier robust estimates in Column 2. The dependent variable in each model is the volatility of firm return on assets (ROA). The estimated coefficient for the primary variable of interest, Delaware*Post-1991, is significant at the 5% level in the original OLS specification. In contrast, Column 2 reports the MM-estimated coefficient for Delaware*Post-1991 is economically and statistically insignificant. Because the OLS results differ from the outlier robust results, we conclude outliers drive the published relation of interest. We next examine what type of outliers, vertical or bad leverage point, are driving the OLS results. Column 3 reports the OLS estimated coefficient on Delaware*Post-1991 is significant at the 5% level after dropping the vertical outliers in Panel A and Figure 5. In contrast, Column 4 indicates an insignificant Delaware*Post-1991 after dropping the bad leverage points.

We further confirm this in unreported analysis where we calculate percentiles of vertical and horizontal distances and rerun the original OLS regression. We find that less than 3% of the sample, 62 of 2,145 observations, are influential as they are responsible for the significant estimated coefficient for Delaware*Post-1991. Our identification of the influential outliers substantially reduces the number of observations subject to manual examination and the data cleaning costs from the full sample of 2,145 to 62. In the interest of brevity, we do not manually examine the influential outliers. However, we note these 62 influential outliers have much larger volatility of firm ROA (0.27 vs. 0.06), Tobin's Q (2.98 vs. 1.26), and two-year stock price change (0.32 vs. -0.02) than the noninfluential observations. We also note that 38 of the influential outliers are Delaware firms and 24 are control (not Delaware) firms.

After correcting any data or omitted variable errors, any further mitigation depends upon the nature of the hypothesized relation. That is, does the 1991 legal ruling matter for all firms? If so, dropping the influential observations or using outlier robust regressions is appropriate. If not, and the ruling only matters for certain types of firms (as described above), further mitigation is not warranted since the influential outliers represent the manifestation of the 1991 legal ruling's effect (i.e., an outlier effect). In this case, identifying and describing the influential outliers provides new insights regarding the ruling's effects.

As for the control variables, the economic and statistical importance of several coefficient estimates varies considerably in the OLS and outlier robust estimations. For example, the OLS coefficient for Ln MV is economically and statistically significant, while the MM-estimate is insignificant. While maintaining statistical significance, the MM-estimate for Ln Assets is approximately one-sixth the size of the OLS estimate. In addition, the estimated OLS coefficient on Market Leverage is large and statistically significant, but the MM-estimated ROA coefficient is close to zero and insignificant.

4.3.1.2 Guthrie et al. (2012) rebuttal to Chhaochharia and Grinstein (2009)

Bebchuk, Fried, and Walker (2002) argue that manager influence over boards of directors enables them to extract rents via compensation schemes that lower shareholder value (e.g., the managerial power hypothesis). If so, independent directors should be associated with better governance and lower CEO pay. Following the accounting scandals that led to the enactment of the Sarbanes-Oxley Act of 2002, the NYSE and NASDAQ began requiring majority independent director boards, as well as fully independent nominating and compensation committees. Chhaochharia and Grinstein (2009) examine the compliance status of firms prior to the NYSE and NASDAQ rule change and find that CEO pay decreases by about 17% more in noncompliant firms than in compliant firms. Chhaochharia and Grinstein's (2009) findings are consistent with the managerial power hypothesis in that nonindependent directors appear to allow CEOs to extract rents in the form of higher pay.

In a subsequent study, using the data and methodology of Chhaochharia and Grinstein (2009, 2012), Guthrie et al. (2012) contend that the drop in CEO pay is primarily due to decreases for just two CEOs out of a sample of over 865 (12 firm-year observations from a 5,190 firm-year observation sample). After removing the two CEOs, the change in pay becomes economically and statistically insignificant. In a rejoinder, Chhaochharia and Grinstein (2012) extend the sample period and argue their results are robust to removing outliers, asymmetric winsorizing, and median regressions. However, our unreported simulations demonstrate that these attempts do not adequately address the multivariate outlier issue. In addition, Chhaochharia and Grinstein (2012) and Guthrie et al. (2012) focus on outliers in the dependent variable space (vertical outliers), but not outliers in the independent space (bad leverage points). This is an important consideration as multivariate bad leverage points can severely influence coefficient estimates. Moreover, in both cases, the authors neither conduct formal testing of the influence of outliers nor examine outlier robust regressions. (14)

Guthrie et al. (2012) use the actual sample created by Chhaochharia and Grinstein (2009) for their main findings. They also reconstruct the sample following Chhaochharia and Grinstein (2009). Guthrie et al. (2012) share with us their reconstructed sample, which we replicate in Table 7. The table provides the published results from Guthrie et al. (2012) and our outlier robust regressions. For the sake of brevity, we focus on the compensation committee models where Guthrie et al. (2012) report statistically significant increases in CEO pay at noncompliant firms following the rule change. Panel A of Table 7 reports the results of model 5 in the Guthrie et al. (2012) study, Appendix A. Models 1 to 4 provide coefficients for four variations of model 5. Models 1 and 2 include the two outliers that were excluded by Guthrie et al. (2012) and report results using OLS and MM estimation. Models 3 and 4 repeat the specifications of Models 1 and 2, but exclude the two outliers. Panel B of Table 7 repeats these variations for the Guthrie et al. (2012) model 7 in Models 1 to 4. Guthrie et al.'s (2012) published model 7 examines the effect of the rule change in firms with high and low concentrations of institutional ownership. (15) The dependent variable in all of the models is the natural log of CEO pay.

Figure 5 illustrates the outlier detection plot of the sample for model 5, revealing several large outliers. Eight observations are particularly large vertical outliers (located below the lower horizontal and to the left of the vertical boundaries). Of the eight, four are Kinder Morgan, two are Fossil, one is Gateway, and one is Apple. The large negative robust standardized residuals of these eight observations indicate CEO pay is much smaller than predicted by model 5. Figure 6 also identifies three bad leverage points with large negative robust standardized residuals (located below the lower horizontal and to the right of the vertical boundaries) and four bad leverage points with large positive robust standardized residuals. Figure 5 makes clear that the Apple and Fossil CEOs are not the only, or even the most extreme, outliers in the sample. This means that the Chhaochharia and Grinstein (2009) and Guthrie et al. (2012) results suffer from multivariate outlier-induced bias. Next, we compare the estimated coefficients from Guthrie et al.'s (2012) OLS regressions to outlier robust MM-estimates to determine the scope of the bias. The estimated coefficient on the main variable of interest, Noncompliant x After, changes in statistical significance and economic importance when the two outliers are removed for the OLS fixed effects regressions in Models 1 and 3. This change demonstrates the potential of a few outliers to affect inference in large datasets. In contrast, the MM estimated coefficients in Models 2 and 4 are very similar and statistically insignificant. More interestingly, the estimated coefficients on the main variable of interest in OLS Models 1 and 3, Noncompliant x High Inst Conc., are statistically significant at the 1% level, but are insignificant in MM Models 2 and 4 (Panel B).
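The regions of an outlier detection plot described above can be expressed as a simple classification rule. The following is a minimal sketch, not the procedure used for the figures: the function name `classify_observation`, the cutoff arguments, and the example values are all hypothetical, and the boundary values must be chosen by the analyst.

```python
def classify_observation(robust_resid, robust_dist, resid_cut, dist_cut):
    """Label one observation by its position on an outlier detection plot.

    robust_resid: robust standardized residual (vertical axis)
    robust_dist:  robust distance in the regressor space (horizontal axis)
    resid_cut, dist_cut: boundary values supplied by the analyst (assumptions)
    """
    large_resid = abs(robust_resid) > resid_cut   # outside the horizontal boundaries
    far_in_x = robust_dist > dist_cut             # right of the vertical boundary
    if large_resid and far_in_x:
        return "bad leverage point"
    if large_resid:
        return "vertical outlier"
    if far_in_x:
        return "good leverage point"
    return "regular observation"


# Hypothetical (residual, distance) pairs with illustrative cutoffs.
points = [(-6.0, 1.0), (-5.0, 4.5), (0.3, 5.0), (0.8, 1.2)]
labels = [classify_observation(r, d, resid_cut=2.5, dist_cut=3.0) for r, d in points]
print(labels)
```

An observation with a large residual but an unremarkable position in the regressor space is a vertical outlier, while one that is extreme in both dimensions is a bad leverage point.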

In summary, the replication exercises demonstrate the importance of identifying and addressing multivariate outliers. We find evidence of outliers that deviate so much from other observations as to arouse suspicions about data quality, model specification, and the overall mechanisms generating the data. We also find that small numbers of these outliers can drive published empirical results and that commonly used mitigation techniques are not effective. The efforts of Chhaochharia and Grinstein (2009, 2012) and Guthrie et al. (2012) highlight that even when finance researchers attempt to control for outlier influence, the remedies (i.e., identifying and treating outliers in a univariate rather than multivariate regression framework) can be ineffective. In addition, the analysis of Becker and Stromberg (2012) indicates multivariate outliers also affect commonly used control variables including book-to-market ratios, market leverage, and other data from the Compustat and CRSP databases.

5 | CONCLUSION

Upon examining research in premier finance journals, we find that the majority of studies employ OLS regression as the primary statistical inference technique. When the assumptions of the OLS regression are met, OLS estimates provide a precise summary of relations in the data. However, these assumptions are simplifications that do not necessarily reflect financial or economic reality. One assumption, in particular, that the observed data have a normal distribution is problematic in finance data sets in the presence of unusual (or extreme) observations. These outliers occurring far from the majority of the data can bias OLS coefficient estimates. To remedy this problem, most research efforts attempt to make the data appear normal by altering the characteristics of outliers using univariate identification (e.g., winsorizing, trimming, or removing them altogether using dropping or filtering before applying OLS regression). The problem with these alteration schemes is that they fundamentally change the data thereby introducing new inference limitations.

In this paper, we examine the limitations associated with the use of univariate identification in finance research to remedy the multivariate outlier problem. We propose an identification strategy for multivariate outliers and find this method effectively detects outliers. We then develop a robust regression method that minimizes the bias outliers cause in both cross-sectional and panel regressions. Specifically, we use a combination of base robust estimators (MM-estimators) as described in Yohai (1987). To the best of our knowledge, there are no readily available procedures that compute MM-estimators or any other high breakdown point estimators with clustered standard errors. We rely on the theory of the generalized method of moments to calculate these clustered standard errors. This method also provides improvements that address fixed effects in cross-sectional and panel regressions. Empirically, we employ this method as a diagnostic tool using replications of four recently published studies in the finance journals to demonstrate how adjusting for multivariate outliers can lead to significantly different results.

The infrequent use of methods in finance research to reliably identify multivariate outliers and, when appropriate, remedy the bias they cause suggests there are considerable impediments preventing their widespread use. One obstacle, in particular, is the lack of readily available methods for the types of models and data structures encountered in the finance field. There also appears to be a misplaced belief that common univariate outlier mitigation techniques provide protection against extreme multivariate observations. We help reduce these limitations by proposing a methodology to identify and treat multivariate outliers in the finance field.

Finally, we wish to be clear. We do not advocate simply removing outliers. Rather, the most appropriate path is to find them and then decide whether to keep, correct, delete, or mitigate them.

ACKNOWLEDGMENTS

The authors would like to thank the editors Utpal Bhattacharya (Executive Editor) and Bing Han (Editor), an anonymous referee, Robert Andersen, Bo Becker, C.Y. Choi, Rachel Croson, Paul Goldsmith-Pinkham, Andrey Golubov, Jim Musumeci, David Rakowski, Jan Sokolowsky, Peter Westfall, Mahmut Yasar, Zeynep Senyuz, seminar participants at Virginia Tech and the University of Texas Arlington, and conference participants at the 2018 Financial Management Association Annual Conference, the 2018 Financial Management Association European Conference, and the 2017 World Finance Conference. Special thanks to Adam Harper, Vasanth Rajarajan, and Anurag Mehrotra for help with the data collection. Vincenzo Verardi gratefully acknowledges financial support from FNRS.

REFERENCES

Adams, J., Hayunga, D., & Mansi, S. (2018). Diseconomies of scale in the actively-managed mutual fund industry: What do the outliers in the data tell us? Critical Finance Review, 7, 1-48.

Andersen, R. (2008). Modern methods for robust regression. No. 152. Los Angeles, CA: Sage.

Angrist, J., & Pischke, J. (2010). The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24, 3-30.

Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17-21.

Aquaro, M., & Cizek, P. (2013). One-step robust estimation of fixed-effects panel data models. Computational Statistics and Data Analysis, 57, 536-548.

Asparouhova, E., Bessembinder, H., & Kalcheva, I. (2013). Noisy prices and inference regarding returns. Journal of Finance, 68, 665-714.

Banz, R. (1981). The relationship between return and market value of common stock. Journal of Financial Economics, 9, 3-18.

Barro, R. J. (2006). Rare disasters and asset markets in the twentieth century. Quarterly Journal of Economics, 121, 823-866.

Bebchuk, L., Fried, J., & Walker, D. (2002). Managerial power and rent extraction in the design of executive compensation. University of Chicago Law Review, 69, 751-846.

Becker, B., & Stromberg, P. (2012). Fiduciary duties and equity-debtholder conflicts. Review of Financial Studies, 25, 1931-1969.

Belo, F., Gala, V., & Li, J. (2013). Government spending, political cycles, and the cross section of stock returns. Journal of Financial Economics, 107, 305-324.

Belsley, D., Kuh, E., & Welsch, R. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York, NY: John Wiley.

Bollinger, C. R., & Chandra, A. (2005). Iatrogenic specification error: A cautionary tale of cleaning data. Journal of Labor Economics, 23, 235-257.

Bowen, D., Fresard, L., & Taillard, J. (2017). What's your identification strategy? Innovation in corporate finance research. Management Science, 63(August), 2529-2548.

Bramati, M., & Croux, C. (2007). Robust estimators for the fixed effects panel data model. Econometrics Journal, 10, 521-540.

Chan, L., & Lakonishok, J. (1992). Robust measurement of beta risk. Journal of Financial and Quantitative Analysis, 27, 265-282.

Chen, J., Hong, H., Huang, M., & Kubik, J. D. (2004). Does fund size erode mutual fund performance? The role of liquidity and organization. American Economic Review, 94, 1276-1302.

Chhaochharia, V., & Grinstein, Y. (2009). CEO compensation and board structure. Journal of Finance, 64, 231-261.

Chhaochharia, V., & Grinstein, Y. (2012). CEO compensation and board structure - There is an effect after all. Journal of Finance, Comments and Rejoinders.

Croux, C., Dhaene, G., & Hoorelbeke, D. (2008). Robust standard errors for robust estimators. Catholic University of Leuven Working Paper.

Croux, C., Rousseeuw, P., & Hossjer, O. (1994). Generalized S-estimators. Journal of the American Statistical Association, 89, 1271-1281.

Davis, J., & McKean, J. (1993). Rank-based methods for multivariate linear models. Journal of the American Statistical Association, 88, 245-251.

Dehon, C., Gassner, M., & Verardi, V. (2012). Extending the Hausman test to check for the presence of outliers. Advances in Econometrics, 29, 435-453.

Dittmar, A., & Duchin, R. (2016). Looking in the rearview mirror: The effect of managers' professional experience on corporate financial policy. Review of Financial Studies, 29, 565-602.

Edgeworth, F. (1887). On observations relating to several quantities. Hermathena, 6, 279-285.

Fama, E., & French, K. (1992). The cross-section of expected stock returns. Journal of Finance, 47, 427-465.

Fama, E., & French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3-56.

Fama, E., & MacBeth, J. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81, 607-636.

Golubov, A., Petmezas, D., & Travlos, N. (2012). When it pays to pay your investment banker: New evidence on the role of financial advisors in M&As. Journal of Finance, 67, 271-311.

Guthrie, K., Sokolowsky, J., & Wan, K. (2012). CEO compensation and board structure revisited. Journal of Finance, 67, 1149-1168.

Hampel, F. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383-393.

Hampel, F., Ronchetti, E., Rousseeuw, P., & Stahel, W. (1986). Robust statistics: The approach based on influence functions. New York, NY: Wiley.

Harvey, A. (1978). On the unbiasedness of robust regression estimators. Communications in Statistics, 7, 779-783.

Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251-1271.

Hawkins, D. (1980). Identification of outliers. London, UK: Chapman and Hall.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153-161.

Henry, T., & Koski, J. (2017). Ex-dividend profitability and institutional trading skill. Journal of Finance, 72, 461-494.

Huber, P. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73-101.

Huber, P. (1981). Robust statistics. New York, NY: Wiley.

Huber, P. (2004). Robust statistics. New York, NY: Wiley.

Jaeckel, L. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449-1458.

Kennedy, P. (2001). Guide to econometrics. Cambridge, MA: MIT Press.

Knez, P., & Ready, M. (1997). On the robustness of size and book-to-market in cross-sectional regressions. Journal of Finance, 52, 1355-1382.

Koenker, R., & Bassett, Jr., G. (1978). Regression quantiles. Econometrica, 46, 33-50.

Maronna, R., Martin, D., & Yohai, V. (2006). Robust statistics, theory and methods. New York, NY: John Wiley and Sons.

Maronna, R., & Yohai, V. (2000). Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference, 89, 197-214.

Moeller, S. B., Schlingemann, F., & Stulz, R. (2005). Wealth destruction on a massive scale? A study of acquiring-firm returns in the recent merger wave. Journal of Finance, 60, 757-782.

Novy-Marx, R. (2012). Is momentum really momentum? Journal of Financial Economics, 103, 429-453.

Pastor, L., Stambaugh, R. F., & Taylor, L. A. (2015). Scale and skill in active management. Journal of Financial Economics, 116, 23-45.

Panousi, V., & Papanikolaou, D. (2012). Investment, idiosyncratic risk, and ownership. Journal of Finance, 67, 1113-1148.

Petersen, M. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22, 435-480.

Rietz, T. A. (1988). The equity risk premium: A solution. Journal of Monetary Economics, 22, 117-131.

Roberts, M., & Whited, T. (2013). Endogeneity in empirical corporate finance. In G. Constantinides, R. Stulz, & M. Harris (Eds.), Handbook of the economics of finance (Vol. 2, Part A, pp. 493-572). Amsterdam, Netherlands: Elsevier.

Rousseeuw, P. (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-880.

Rousseeuw, P., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88, 1273-1283.

Rousseeuw, P., & van Zomeren, B. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85, 633-639.

Rousseeuw, P., & Yohai, V. (1984). Robust regression by means of S-estimators. Nonlinear Time Series Analysis: Lecture Notes in Statistics, 26, 256-272.

Salibian-Barrera, M., & Yohai, V. (2006). A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414-427.

Stromberg, A., Hossjer, O., & Hawkins, D. (2000). The least trimmed difference regression estimator and alternatives. Journal of the American Statistical Association, 95, 853-864.

Thompson, S. (2011). Simple formulas for standard errors that cluster by both firm and time. Journal of Financial Economics, 99, 1-10.

Tukey, J. (1991). Graphic displays for alternative regression fits. In W. Stahel & S. Weisberg (Eds.), Directions in robust statistics and diagnostics (Part 2). New York, NY: Springer-Verlag.

Verardi, V., & Croux, C. (2009). Robust regression in Stata. The Stata Journal, 9, 439-453.

Verardi, V., & Wagner, J. (2011). Robust estimation of linear fixed effects panel data models with an application to the exporter productivity premium. Jahrbucher f. Nationalokonomie u. Statistik, 231, 546-557.

Wahal, S., & Yavuz, M. (2013). Style investing, comovement and return predictability. Journal of Financial Economics, 107, 136-154.

Welch, I. (2016). The (Time-Varying) importance of disaster risk. Financial Analysts Journal, 72, 14-30.

Yohai, V. (1987). High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, 15, 642-656.

Zellner, A. (1981). Philosophy and objectives of econometrics. In D. A. Currie, A. R. Nobay, & D. Peel (Eds.), Macroeconomic analysis: Essays in macroeconomics and econometrics. London, UK: Croom Helm.

How to cite this article: Adams J, Hayunga D, Mansi S, Reeb D, Verardi V. Identifying and treating outliers in Finance. Financial Management. 2019;48:345-384. https://doi.org/10.1111/fima.12269

APPENDIX A: ROBUST ESTIMATORS

A.1 Estimator criteria

The ideal estimator efficiently provides precise (i.e., unbiased) coefficient estimates, but there is a trade-off between efficiency and precision. Consider sample $Z^{(n)} = \{Z_1, \ldots, Z_n\}$, with $n$ observations. Let $T(Z^{(n)})$ represent an estimator for the parameter $\theta$. Applying $T$ to $Z^{(n)}$ provides the estimate of the population parameter such that $T(Z^{(n)}) = \hat{\theta}$. The estimator is unbiased if $E[T(Z^{(n)})] = E(\hat{\theta}) = \theta$. It follows then that the bias of an estimator is given by:

$$\operatorname{bias}(\hat{\theta}) = E(\hat{\theta} - \theta) = E(\hat{\theta}) - \theta. \tag{A1}$$

To provide unambiguous statistical inference, an estimator must converge to the population parameter $\theta$ and its variance must approach zero as the sample size grows. A good method for accounting for bias and variance is to measure the mean squared error (MSE). The MSE of the estimator $\hat{\theta}$ reduces to $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [\operatorname{bias}(\hat{\theta})]^2$ and the desired estimator is one where:

$$\lim_{n \to \infty} \mathrm{MSE}(\hat{\theta}) = 0. \tag{A2}$$
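The decomposition of the MSE into variance plus squared bias follows from adding and subtracting $E(\hat{\theta})$ inside the square. A short derivation, using only the definitions above, is:

$$
\begin{aligned}
\mathrm{MSE}(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2] \\
&= E\{[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)]^2\} \\
&= E[(\hat{\theta} - E(\hat{\theta}))^2] + 2\,[E(\hat{\theta}) - \theta]\,E[\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta]^2 \\
&= V(\hat{\theta}) + [\operatorname{bias}(\hat{\theta})]^2 .
\end{aligned}
$$

The cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$, and $E(\hat{\theta}) - \theta$ is a constant equal to the bias in Equation (A1).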

Another way to account for bias and variance is efficiency. In the strictest case, an estimator's efficiency is the ratio of its minimum possible variance to the actual variance. More practically, an estimator is considered efficient if its sampling variance is relatively small. This leads to small standard errors. Since certain estimators are more efficient than others are, we use relative efficiency defined as:

$$\operatorname{Efficiency}(T_1, T_2) = \frac{E[(T_2 - \theta)^2]}{E[(T_1 - \theta)^2]}. \tag{A3}$$
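As an illustration of Equation (A3), the following sketch approximates the relative efficiency of the sample median ($T_1$) with respect to the sample mean ($T_2$) by simulation. The function name `relative_efficiency`, the sample size, the number of replications, and the seed are illustrative assumptions, and the data are drawn from a standard normal distribution.

```python
import random
import statistics


def relative_efficiency(t1, t2, theta=0.0, n=101, reps=2000, seed=12345):
    """Monte Carlo analogue of Efficiency(T1, T2) = E[(T2-theta)^2] / E[(T1-theta)^2]."""
    rng = random.Random(seed)
    sq_err_1 = sq_err_2 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        sq_err_1 += (t1(sample) - theta) ** 2
        sq_err_2 += (t2(sample) - theta) ** 2
    return sq_err_2 / sq_err_1


# T1 = sample median, T2 = sample mean, both estimating the center of normal data.
eff_median = relative_efficiency(statistics.median, statistics.fmean)
print(f"Efficiency of the median relative to the mean: {eff_median:.2f}")
```

With normally distributed data, the ratio is well below one (asymptotically about 0.64), which is the sense in which resistant estimators can be less efficient when there are no outliers.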

When considering outliers, Hampel (1974) introduces two criteria for evaluating an estimator's robustness to extreme observations. These are the influence function (IF) and the breakdown point (BDP). The IF is a measure of the dependence of the estimator on the value of a single sample observation $y_1$ on the theoretical distribution $F$. The IF for estimator $T$ is:

$$\mathrm{IF}(y_1; T, F) = \lim_{\lambda \to 0} \frac{T[(1 - \lambda)F + \lambda \Delta_{y_1}] - T(F)}{\lambda}, \tag{A4}$$

where $\Delta_{y_1}$ is the cdf of the point mass distribution at $y_1$ and $F$ is the cdf of the uncontaminated data generating process of $Z_i$ for all $i$. (16) $\lambda$ gives the proportion of contamination at $y_1$. The OLS estimator has an unbounded IF, suggesting that the influence of a single outlier on the coefficient estimate grows steadily as that observation becomes more extreme. Practically speaking, the unbounded IF of OLS implies that just one outlier can severely bias the OLS slope coefficients, even in larger samples. A more robust estimator will have a bounded IF. As such, an outlier does not unduly influence its coefficient estimates.
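A finite-sample counterpart of the IF, often called a sensitivity curve, makes the bounded versus unbounded distinction concrete. The sketch below is an illustration for location estimators only (the sample mean is OLS on a constant); the function name `sensitivity_curve` and the data are hypothetical.

```python
import statistics


def sensitivity_curve(estimator, sample, y):
    """Scaled change in the estimate caused by adding one observation y."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [y]) - estimator(sample))


sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
for y in (10.0, 1_000.0, 1_000_000.0):
    sc_mean = sensitivity_curve(statistics.fmean, sample, y)
    sc_median = sensitivity_curve(statistics.median, sample, y)
    print(f"y={y:>11}: mean {sc_mean:>11.1f}, median {sc_median:.1f}")
```

The effect on the mean grows in proportion to the added observation, while the effect on the median stops growing once the observation lies beyond the bulk of the data.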

While IF measures resistance to individual or local influential observations, the global measure of resistance is the BDP. BDP is the smallest percentage of outliers in a sample that the estimator can handle without producing arbitrary results. Following Andersen (2008), consider the replacement of $m$ observations in the data set with observations that do not fit the general trend in the data for all possible corrupted samples $Z^{(n)*}$. The maximum effect from these substitutions is:

$$\operatorname{effect}(m; T, Z^{(n)}) = \sup_{Z^{(n)*}} \left\| T(Z^{(n)*}) - T(Z^{(n)}) \right\|, \tag{A5}$$

where the supremum is over all possible $Z^{(n)*}$. The estimator breaks down if $\operatorname{effect}(m; T, Z^{(n)})$ is infinite, that is, if the $m$ outliers can have an arbitrarily large impact on $T$.

Fifty percent is the highest acceptable BDP indicating that the estimator withstands a contamination of up to half of the dataset. A BDP higher than 50% is nonsensical as it implies over half of the sample is not representative of the overall sample. Hampel, Ronchetti, Rousseeuw, and Stahel (1986) argue that the BDP should be at least 10%. OLS has BDP = 0.
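The breakdown idea can be illustrated with location estimators by replacing $m$ observations with an increasingly extreme value. This is a minimal sketch under stated assumptions: `replacement_effect` is a hypothetical helper, and because it tries only one particular corruption it gives a lower bound on the supremum in the definition of the maximum effect rather than the supremum itself.

```python
import statistics


def replacement_effect(estimator, sample, m, bad_value):
    """Change in the estimate after replacing the first m observations by bad_value."""
    corrupted = [bad_value] * m + sample[m:]
    return abs(estimator(corrupted) - estimator(sample))


sample = [float(v) for v in range(1, 11)]  # n = 10 clean observations
for m in (1, 4, 5):
    for bad in (1e3, 1e6, 1e9):
        e_mean = replacement_effect(statistics.fmean, sample, m, bad)
        e_med = replacement_effect(statistics.median, sample, m, bad)
        print(f"m={m}, bad={bad:.0e}: mean {e_mean:.3g}, median {e_med:.3g}")
```

A single replaced observation moves the mean without limit, whereas the median stays put until half of the sample has been replaced.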

Next, we provide a background of the robust estimators using the estimator metrics and Equations (A1) to (A5) above. We also discuss potentially ineffective methods that we find finance researchers consider useful for outlier mitigation (e.g., median regression). The parametric estimators are L-estimators, R-estimators, M-estimators, and S-estimators. (17) In most cases, the current preferred techniques use combinations of these base estimators.

A.2 L-estimators

Estimators that are linear combinations of the order statistics are L-estimators. A special class of L-estimators is the least power, $L_p$, estimators. These estimators result from minimizing the sum of the absolute values of the errors raised to the power of $p$, where $p$ is usually between one and two. (18) The general form minimizes $\sum_i |y_i - x_i'\beta|^p$. Note that OLS is the case when $p = 2$. When $p = 1$, the estimator minimizes the sum of the absolute errors. This specification has many names and predates OLS by about 50 years. The most common designation for this specification is the least absolute values (LAV) estimator. (19) While the LAV regression is resistant to some specific types of outliers, it is not a good robust estimator as the BDP = 0. It is resistant to vertical outliers, but potentially breaks down in the presence of a single bad leverage point. LAV also exhibits low efficiency.

Another common type of L-estimator is the regression quantile. A regression quantile is an estimate of a coefficient that results from minimizing a weighted sum of the absolute values of the errors, with positive errors possibly weighted differently than negative errors. The objective function minimizes:

$$\sum_{i=1}^{n} \rho_{\alpha}(e_i)$$

where

$$\rho_{\alpha}(e_i) = \begin{cases} \alpha\,|e_i| & \text{if } e_i \ge 0 \\ (1 - \alpha)\,|e_i| & \text{if } e_i < 0 \end{cases}$$

and $\alpha$ is the order of the quantile to estimate. Thus, the $\alpha$th regression quantile is the coefficient estimate that results from minimizing the weighted sum of the absolute values of the errors. For instance, the 0.25 regression quantile places 0.25 weight on the positive errors and 0.75 on the negative errors. Edgeworth (1887) introduces the median regression estimator, $L_1$, where positive and negative errors are weighted equally ($\alpha = 0.5$).
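The weighting scheme can be checked in the simplest possible model, a regression on a constant, where the regression quantile reduces to a sample quantile. The sketch below is illustrative only; `check_loss` and `constant_quantile_fit` are hypothetical names, and the search over the observed values relies on the fact that a minimizer of this piecewise linear objective can always be found at a data point.

```python
def check_loss(e, alpha):
    """Asymmetric absolute loss: weight alpha on positive errors, 1 - alpha on negative ones."""
    return alpha * abs(e) if e >= 0 else (1.0 - alpha) * abs(e)


def constant_quantile_fit(y, alpha):
    """Minimize the summed loss over a constant, searching the observed values."""
    return min(y, key=lambda q: sum(check_loss(v - q, alpha) for v in y))


y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = constant_quantile_fit(y, 0.5)           # equal weights: the median
lower_quartile_fit = constant_quantile_fit(y, 0.25)  # 0.25 on positive, 0.75 on negative errors
print(median_fit, lower_quartile_fit)
```

The extreme value of 100 has no effect on either fit, which illustrates the resistance of quantile estimators to vertical outliers.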

Koenker and Bassett (1978) note that trimming or trimmed least squares (TLS) can be thought of as an extension of the quantile regression. (20) To obtain the TLS estimator, the econometrician first computes $\hat{\beta}(\alpha)$ and $\hat{\beta}(1 - \alpha)$, where $\alpha$ is the desired trimming proportion ($0 < \alpha < 0.5$). Then, observations where $y_i - x_i'\hat{\beta}(\alpha) \le 0$ or $y_i - x_i'\hat{\beta}(1 - \alpha) \ge 0$ are dropped and least squares is computed on the remaining observations.

The problem with all quantile estimators is that the BDP = 0. While quantile regressions mitigate bias from vertical outliers, they do not protect against bad leverage points. Quantile estimators also suffer from low efficiency relative to OLS when error terms are distributed normally (Huber, 1981).

Rousseeuw (1984) develops two other L-estimators: least median of squares (LMS) and least trimmed squares (LTS). LMS replaces the sum of the squared errors in OLS with the median of the squared residuals. The estimating equation is:

$$\min_{\beta} M[(y_i - x_i'\beta)^2] = \min_{\beta} M(e_i^2), \tag{A6}$$

where M is the median. Since the base median location parameter is more resistant, the estimator and regression model are also resistant to influential observations. While the estimator has a BDP = 0.5, LMS does not have a well-defined influence function (Rousseeuw & Croux, 1993) and has a slow convergence rate.
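To see how replacing the sum with the median changes the fit, the sketch below approximates LMS in a simple regression by trying every line through two data points and keeping the one with the smallest median squared residual. This elemental-subset search is a simplification and is not guaranteed to reach the exact LMS solution; the function names and the data are hypothetical.

```python
import statistics
from itertools import combinations


def ols_line(x, y):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def lms_line(x, y):
    """Approximate LMS: best line through two data points by median squared residual."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = statistics.median(
            [(yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y)]
        )
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2]


# Eight points on y = 1 + 2x plus three bad leverage points (illustrative data).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
y = [1.0 + 2.0 * v for v in x[:8]] + [0.0, 0.0, 0.0]
ols_a, ols_b = ols_line(x, y)
lms_a, lms_b = lms_line(x, y)
print(f"OLS slope {ols_b:.3f}, approximate LMS slope {lms_b:.3f}")
```

In this example the three bad leverage points reverse the sign of the OLS slope, while the median-based criterion recovers the line followed by the majority of the data.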

The LTS estimator, which should not be confused with TLS, minimizes the sum of the smallest squared residuals such that $\min_{\beta} \sum_{i=1}^{q} e^2_{(i)}$, where $e^2_{(1)} \le \ldots \le e^2_{(n)}$ are the ordered squared residuals, $q = [n(1 - \alpha) + 1]$ is the number of subsample observations used to calculate the estimator, and $\alpha$ is the proportion of trimming. Algorithms are employed to find the subsample that yields the minimum sum of the squared residuals. Using $q = n/2 + 1$ yields a BDP = 0.5. However, Stromberg et al. (2000) find that LTS has a relative efficiency of only 7%. This excludes its use in most instances, but the LTS estimator can be useful as a first stage in a multistep robust regression technique.

A.3 R-estimators

Jaeckel (1972) proposes a set of estimators that use dispersion measures based on the linear combinations of ordered residuals. The R-estimators are scale equivariant, which is advantageous over the M-estimators discussed next. However, the BDP = 0 and there are issues with the intercept and determining a score function necessary for use. Accordingly, we do not recommend R-estimators and refer the interested reader to Davis and McKean (1993) and Huber (2004).

A.4 M-estimators

Huber (1964) generalized the median regression to a wider class of M-estimators by considering functions other than the absolute value of the residuals. The maximum-likelihood type or M-estimator minimizes the sum of a less rapidly increasing loss function of the errors such that:

$$\min_{\beta} \sum_{i=1}^{n} \rho(y_i - x_i'\beta) = \min_{\beta} \sum_{i=1}^{n} \rho(e_i). \tag{A7}$$

The concept is to use weights that do not continue to grow in magnitude as the absolute value of the error term grows. Equation (A7) is not scale equivariant, so the errors must be standardized by a robust estimate of their scale $\hat{\sigma}_e$, which is also estimated on the data as:

$$\min_{\beta} \sum_{i=1}^{n} \rho\left(\frac{e_i}{\hat{\sigma}_e}\right). \tag{A8}$$

M-estimation requires an iterative procedure. Iteratively reweighted least squares, also known as iterative weighted least squares, is the standard process to find M-estimates. The steps are: (1) fit an OLS model to the data to obtain the initial $\hat{\beta}$ and residuals, (2) choose a weight function, (3) apply the weight function to the initial OLS residuals to create preliminary weights, and (4) use weighted least squares (WLS) to minimize $\sum_i w_i e_i^2$ and obtain an updated $\hat{\beta}$. This is the standard solution in matrix form of $\hat{\beta}_1 = (X'WX)^{-1}X'Wy$, where $W$ is an $(n \times n)$ diagonal matrix of individual weights. The process continues by using residuals from the WLS model to calculate new weights to use in a new iteration of the WLS, and this is repeated until the estimates $\hat{\beta}$ converge. In practice, the finance researcher need not manually code this process as typical software, such as SAS and Stata, includes the routines.
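The iteration described above can be sketched for a simple regression with one regressor. This is a minimal illustration rather than the routine implemented in any particular package: the Huber weight function with tuning constant 1.345, the rescaled median absolute deviation as the scale estimate, the convergence rule, and the data are all assumptions made for the example, and `wls_line` and `huber_irls` are hypothetical names.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x, the two-parameter case of (X'WX)^{-1} X'Wy."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_irls(x, y, c=1.345, tol=1e-8, max_iter=100):
    a, b = wls_line(x, y, [1.0] * len(x))  # start from OLS (all weights equal to one)
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        # Robust residual scale: median absolute deviation, rescaled for normal errors.
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the threshold, c/|u| for standardized residuals beyond it.
        w = [1.0 if abs(r / scale) <= c else c / abs(r / scale) for r in resid]
        a_new, b_new = wls_line(x, y, w)  # weighted least squares update
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Illustrative data: y = 1 + 2x with small deterministic noise and two vertical outliers.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + 0.1 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]
y[18] += 50.0
y[19] += 50.0
ols_a, ols_b = wls_line(x, y, [1.0] * len(x))
m_a, m_b = huber_irls(x, y)
print(f"OLS slope {ols_b:.3f}, Huber M slope {m_b:.3f}")
```

Because the contaminated observations in this example are vertical outliers, the reweighting pulls the slope back toward two; the same scheme gives no such protection against bad leverage points.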

This M-estimator is efficient and an improvement with respect to outliers, but it is not reliably robust to bad leverage points. (21) This is because the iteratively reweighted OLS algorithm is only guaranteed to converge to the global minimum for monotone M-estimators, and monotone M-estimators are not robust to bad leverage points. (22) This shortcoming is not limited to this particular method of M-estimation. In general, all methods of computing M-estimators suffer from the global minimum problem or an inability to identify all leverage points when outliers are clustered (Rousseeuw & van Zomeren, 1990). Because of this issue, M-estimators are usually combined with other robust estimators in a multistep process.

A.5 S-estimators

Rousseeuw and Yohai (1984) develop S-estimators that seek to minimize a measure of residual dispersion that is less sensitive to outliers than variance. The S-estimators are the solution with the smallest possible dispersion of the residuals:

$$\min_{\beta} \hat{\sigma}^{S}[e_1(\beta), \ldots, e_n(\beta)]. \tag{A9}$$

Note that OLS is a special less robust case of S-estimators. OLS minimizes the variance of the residuals. The problem can be seen as looking for the smallest $\sigma_e$ that satisfies the equality $\frac{1}{n} \sum_{i=1}^{n} (e_i/\sigma_e)^2 = 1$, which is the definition of the variance. S-estimation replaces the square in the variance with another loss function, $\rho_0$, that awards less importance to large residuals. Specifically, S-estimation minimizes a robust M-estimate of the residual scale:

$$\frac{1}{n} \sum_{i=1}^{n} \rho_0\left(\frac{e_i}{\hat{\sigma}^{S}}\right) = b, \tag{A10}$$

where $b$ is a constant defined as $b = E_{\Phi}[\rho_0(e)]$ and $\Phi$ is the standard normal distribution. The value of $\beta$ that minimizes $\hat{\sigma}^{S}$ is the S-estimator. While S-estimators address the low breakdown point with BDP = 0.5, it comes at a cost of low efficiency compared to OLS that Croux, Rousseeuw, and Hossjer (1994) indicate is approximately 30%. Consequently, the benefits of S-estimators are commonly combined with the efficiency characteristic of M-estimation to compute MM-estimators.
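The robust scale in Equation (A10) can be computed for a given set of residuals with a few lines of code. The sketch below covers only this scale step and not the outer minimization over the regression coefficients. It assumes a Tukey biweight loss for $\rho_0$ with tuning constant 1.547 (a value commonly associated with a 50% breakdown point), obtains $b$ by numerical integration under the standard normal, and solves the equation by a fixed-point iteration; the function names and data are hypothetical.

```python
import math
import statistics

C = 1.547  # biweight tuning constant (assumed for this illustration)


def rho0(u, c=C):
    """Tukey biweight loss: close to u^2/2 near zero and constant for |u| >= c."""
    if abs(u) >= c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)


def expected_rho_under_normal(steps=4000, lim=8.0):
    """b = E_Phi[rho0(e)], approximated by the trapezoid rule on [-lim, lim]."""
    h = 2.0 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        e = -lim + k * h
        f = rho0(e) * math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)
        total += f if 0 < k < steps else 0.5 * f
    return total * h


def m_scale(resid, b, tol=1e-10, max_iter=500):
    """Solve (1/n) * sum(rho0(e_i / s)) = b for s by fixed-point iteration."""
    s = statistics.median([abs(r) for r in resid]) / 0.6745  # starting value
    for _ in range(max_iter):
        s_new = s * math.sqrt(sum(rho0(r / s) for r in resid) / (len(resid) * b))
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s


b = expected_rho_under_normal()
nd = statistics.NormalDist()
clean = [nd.inv_cdf((i + 0.5) / 200) for i in range(200)]  # normal-like residuals
dirty = [1000.0 if i % 5 == 0 else r for i, r in enumerate(clean)]  # 20% contaminated
s_clean, s_dirty = m_scale(clean, b), m_scale(dirty, b)
print(f"clean: {s_clean:.2f}, contaminated: {s_dirty:.2f}, "
      f"std. dev. of contaminated: {statistics.pstdev(dirty):.0f}")
```

For the clean residuals the robust scale is close to one, as the choice of $b$ intends. With one fifth of the residuals replaced by an extreme value, the standard deviation explodes while the robust scale increases only moderately.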

A.6 Choosing an estimator

The ideal estimator is both efficient and robust to outliers. There is a tradeoff as highly efficient estimators are often not robust to outliers and robust estimators tend to be less efficient when there are no outliers. The issue with low efficiency is that inference is problematic if the errors are normally (or nearly so) distributed. As such, selecting the best estimator requires an examination of the sample. OLS is highly efficient, but has a BDP = 0 and outliers can have unbounded influence (i.e., parameter estimate bias increases with outlier size). LMS and LTS have a high BDP and can identify outliers, but are not a practical estimation choice since standard errors are not available and the results are not stable in larger samples. S-estimation is robust to outliers with a BDP = 0.5 and is more efficient than LMS, but still much less efficient than OLS. M-estimates have poor resistance to outliers and are less efficient than OLS.

In contrast, the MM-estimator has high outlier resistance of up to BDP = 0.5 and a bounded influence with respect to outliers. (23) In addition, MM-estimation can be nearly as efficient as OLS. This suggests that MM-estimators can provide coefficient estimates with less bias than OLS when data sets contain outliers and coefficient estimates that are similar to those provided by OLS in data sets without outliers. Dehon et al. (2012) follow the logic of Hausman (1978) to develop a testing procedure that compares estimates from outlier robust estimators and OLS. When the test fails to reject OLS, there are no significant outliers and the more efficient OLS is the best estimator. In cases where the test rejects OLS, the next step is to perform a second test that compares the robust, but less efficient S-estimator to a more efficient MM-estimator. The highest possible efficiency for the MM-estimator is determined via repeated testing. Thus, OLS, S-, or MM-regressions can be the most appropriate estimator depending upon the severity of the outlier problem.

Alternatively, researchers can use S- and MM-estimations to identify outliers and then drop some or all of them prior to implementing OLS. This approach has the advantage of using the more efficient OLS, but information from the dropped observations is lost. It typically yields results similar to those from MM-estimation. At the very least, we recommend that researchers compare estimated coefficients from OLS to coefficients from S- or MM-estimators to ease concerns that outliers are biasing estimates. STATA packages to implement the robust methods used in this paper's replications are available by typing "net from http://homepages.ulb.ac.be/~vverardi/stata" from within STATA.
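The identify-then-drop approach can be sketched as below. This is an illustration under stated assumptions rather than the procedure used in the replications: a pairwise-search approximation to LMS stands in for S-estimation, the robust scale is 1.4826 times the median absolute residual, the cutoff of 2.5 is an assumed convention, and the data are hypothetical. With a single regressor, Mahalanobis distances are not computed.

```python
from itertools import combinations
from statistics import mean, median


def ols_fit(x, y):
    """Closed-form OLS (intercept, slope) for a single regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def lms_fit(x, y):
    """Approximate LMS fit (stand-in for S-estimation): best line through
    any pair of points by median squared residual."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median([(yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2]


def flag_outliers(x, y, cutoff=2.5):
    """Indices whose robust standardized residual exceeds `cutoff` (assumed)."""
    intercept, slope = lms_fit(x, y)
    resid = [yk - intercept - slope * xk for xk, yk in zip(x, y)]
    scale = 1.4826 * median([abs(r) for r in resid])  # robust residual scale
    return [k for k, r in enumerate(resid) if abs(r) / scale > cutoff]


# Hypothetical sample: y = 1 + 0.5x with small alternating noise,
# plus one bad leverage point at x = 20.
x = [float(k) for k in range(1, 10)] + [20.0]
y = [1.0 + 0.5 * xk + (0.1 if i % 2 == 0 else -0.1) for i, xk in enumerate(x)]
y[-1] = 1.0

flagged = flag_outliers(x, y)
keep = [k for k in range(len(x)) if k not in flagged]
slope_all = ols_fit(x, y)[1]
slope_clean = ols_fit([x[k] for k in keep], [y[k] for k in keep])[1]

print("flagged observations:", flagged)
print("OLS slope, full sample:", round(slope_all, 3))
print("OLS slope, flagged observations dropped:", round(slope_clean, 3))
```

In this example the single bad leverage point pulls the full-sample OLS slope close to zero, while OLS on the retained observations recovers a slope near 0.5. As noted above, the cost is that information in the dropped observations is lost.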

APPENDIX B: VARIABLE DEFINITIONS

This table provides variable definitions, data sources, and outlier mitigation efforts for the six samples used in the outlier analysis. The data definitions and outlier mitigation details are obtained from the replicated articles.
```
Variable             Definition (data source)             Winsorize (W),
                                                          Trim (T), or
                                                          Drop (D)
                                                          levels in OLS
                                                          regressions

Capital Structure
Application
(Table 4)
Debt Ratio           Book value of debt divided by        None
the sum of the book value of
assets minus the book value
of equity plus the market
value of equity (Compustat).
Ln (MV Assets)       Log of the sum of the book           None
value of assets and the market
value of equity minus the
book value of equity
(Compustat).
Ln(1 + Firm Age)     Log of one + Firm Age where          None
Firm Age is the difference in years
between the current year
and the first trading date
(Compustat).
Profit/Sales         Ratio of operating income            W & T @ 0%,
before depreciation to sales         1%,2.5%
(Compustat).
Tangible Assets      Ratio of property, plant, and        W & T @ 0%,
equipment to the book value          1%,2.5%
of total assets (Compustat).
Market to Book       Ratio of the market value of         W & T @ 0%,
assets to the book value of          1%,2.5%
total assets (Compustat).
Advertising/Sales    Ratio of advertising expenses        W & T @ 0%,
to sales (Compustat).                1%,2.5%
R&D/Sales            Ratio of research and                W & T @ 0%,
development expenses to sales        1%,2.5%
(Compustat).
R&D Dummy            Dummy variable that is equal         None
to one when the research and
development expense is
positive and zero otherwise
(Compustat).
Asset Pricing
Application
(Table 5)
Future Stock         Six-month geometric stock            None
Return               return beginning in month t + 1
(CRSP).
Prior Style Return   Six-month geometric                  None
value-weighted return on a style
portfolio constructed using
the intersection of NYSE,
Amex, and NASDAQ size
and book-to-market quintiles
for months t - 5 through
t = 0 (CRSP).
Prior Stock          Six-month geometric stock            W @ 1%
Return               return for months t - 5 through
t = 0 (CRSP).
Log Size             Log of the market value of           W @ 1%
equity (Compustat).
Log BM               Log of the ratio of the book         Negative book
value of equity to the market        value of
value of equity (Compustat).         equity stocks
dropped and
then W @ 1%
Becker and
Stromberg
(2012)-
Equity-debtholder
conflicts (Table
6)
ROA Volatility       Standard deviation of the            T if outside
previous eight quarterly changes     [0,1]
in the return on assets.
(Compustat via authors).
Delaware*Post-       Dummy variable: one if the           None
1991                 firm is incorporated in
Delaware and the
observation year is post-1991
(Compustat via authors).
Post 1991            Dummy variable: one for              None
years after 1991 and zero
otherwise (Compustat
via authors).
Return on Assets     EBITDA divided by total assets       T if outside
(Compustat via authors).             [-0.5,5]
Return on Sales      EBITDA divided by sales              T if outside
(Compustat via authors).             [-1,1]
Ln Assets            Log of the book value of assets      None
(Compustat via authors).
Ln Sales             Log of sales (Compustat              None
via authors).
Ln MV                Log of the market value of equity    None
(Compustat via authors).
Depreciation/        Depreciation divided by the          T if outside
Assets               book value of assets                 [0,.3]
(Compustat via authors).
Book Leverage        Assets minus common equity           T if outside
(book value) and minus tax           [0,1]
liabilities divided by assets
(Compustat via authors).
Market Leverage      Assets minus common equity           T if outside
(book value) and minus tax           [0,1]
liabilities divided by assets
minus common equity (book
value) and minus tax liabilities
plus the market value of
equity (Compustat via authors).
Q                    Assets minus common equity           None
(book value) plus the market
value of equity minus tax
liabilities divided by assets
minus 0.1 times the common
equity (book value) and
plus 0.1 times the market value
of equity (limits q to a
maximum value of 10)
(Compustat via authors).
Two-Year Stock       Two-year log change in stock
Price Change         price (CRSP via authors).
Guthrie et al.
(2012)-CEO
Compensation
(Table 7)
CEO pay              Log of CEO pay (Execucomp            None
via authors).
Noncompliant         Binary variable that takes a value   None
of one if the firm did not
have a majority of independent
directors on the board in
2002 and zero otherwise
(IRRC via authors).
Before/After         Period indicators taking a value     None
of one if the observation is
in the premandate (before)
period (2000-2002) or
postmandate (after) period
(2003-2005) and zero
otherwise (via authors).
High/Low Inst        Binary variable that takes a         None
Conc                 value of one if a firm's
institutional ownership
concentration falls into the top
quartile (high) or bottom
quartile (low) (Thomson
Financials 13F database
via authors).
Sales                Log of sales (Compustat              None
via authors).
ROA                  Log of one plus net income           None
before extraordinary items
scaled by the book value
of assets (Compustat via
authors).
RET                  Log of one plus the annual           None
stock return (dividends
reinvested) in the prior
year (CRSP via authors).
Tenure               Log of one plus the number of        None
years the CEO served in the
firm (Execucomp via authors).
```

John Adams (1) | Darren Hayunga (2) | Sattar Mansi (3) | David Reeb (4) | Vincenzo Verardi (5)

(1) Department of Finance and Real Estate, University of Texas at Arlington, Arlington, TX, USA

(2) Department of Insurance, Legal Studies, and Real Estate at the University of Georgia, Athens, GA, USA

(3) Department of Finance, Law, and Insurance, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA

(4) Departments of Accounting and Finance, National University of Singapore, Singapore

(5) FNRS Department of Economics, Universite de Namur, Namur, Belgium

Correspondence

John Adams, Department of Finance and Real Estate, University of Texas at Arlington, 701 S. West Street, Arlington, TX 76019, USA.

(1) While definitions vary, outliers describe observations that deviate so much from other observations as to arouse suspicions about the mechanism generating the data (Hawkins, 1980). We use the term bias to mean the difference between an estimator's expected value and the value of the parameter as determined by the bulk of the data.

(2) We examine articles in the Journal of Finance, the Journal of Financial Economics, the Review of Financial Studies, and the Journal of Financial and Quantitative Analysis. We find that 66% use OLS as the primary statistical technique.

(3) To illustrate with a more real-world example, consider a panel data set of Minnesota employees containing information on natural hair color, height, weight, eye color, and ethnicity. In this sample, neither a 5'2" person nor an employee with blue eyes or an employee with blond hair would likely register as a univariate outlier. Similarly, neither an observation regarding a Chinese male employee nor an employee weighing 235 pounds would appear as an outlier. However, if all of these characteristics describe a single employee, then we might suspect this observation is an outlier.

(4) In another example, Moeller, Schlingemann, and Stulz (2005) find large average losses in shareholder value in mergers and acquisitions (M&As). They report a small number of 87 announcements that resulted in a collective loss in acquiring-firm shareholder wealth of \$397 billion, but for the overwhelming majority of 4,109 announcements, shareholders of the acquiring firms collectively gained \$157 billion. Because the main research question in Moeller et al. (2005) is the average wealth effect of all M&A announcements on shareholder value, it would be incorrect to mitigate the influence of the 87 outliers.

(5) For example, in a study on board composition, Chhaochharia and Grinstein (2009) find a negative and significant relation between chief executive officer (CEO) pay and board independence enhancements. In a subsequent study, Guthrie, Sokolowsky, and Wan (2012) replicate Chhaochharia and Grinstein's (2009) work and show the relation is driven by two CEOs out of a sample of 865 firms. Thus, a comment by Chhaochharia and Grinstein (2009) and a rejoinder by Guthrie et al. (2012) attempt to confirm their reported results. Importantly, Chhaochharia and Grinstein (2009) and Guthrie et al. (2012) focus on outliers in a univariate context. Altogether, three publications and a tremendous amount of time were spent attempting to validate each author's work. In our own replication of their work, we find that neither Chhaochharia and Grinstein (2009) nor Guthrie et al. (2012) reliably mitigate outliers in a multivariate context.

(6) Anscombe's (1973) Quartet illustrates the importance of visualizing the data prior to analysis and demonstrates how outliers can affect causality inferences.

(7) As an illustration, Observations 18 and 19 in Table 2 are examples of bad leverage points.

(8) When outlier mitigation is the objective and the test rejects OLS as unbiased, the next step is to perform a second Dehon, Gassner, and Verardi (2012) test that compares the robust S-estimator to a more efficient MM-estimator. The highest possible efficiency for the MM-estimator is determined via repeated application of the Dehon et al. (2012) test.

(9) In unreported simulations, we find that outlier robust regressions outperform OLS in samples with outliers. These results are available on request.

(10) A breakdown point is the largest percentage of outliers in a sample that the estimator can handle without producing arbitrary results with relatively high efficiency when compared to OLS. For a more complete discussion of MM-estimators, see Maronna, Martin, and Yohai (2006, p. 124).

(11) Employing Monte Carlo simulations to compute outlier robust clustered standard errors is not feasible given the computationally intensive nature of outlier robust regressions.

(12) We use the keywords "OLS" and "least squares" to capture the use of OLS and "outlier(s)," "extreme value(s)," and "extreme observation(s)" to identify papers mentioning outliers.

(13) This approach can be extended to propensity score and other matching techniques to evaluate covariate balances across the treatment and control groups in a multivariate framework. The concern is that outliers may cause bias if they are more prevalent in either the treatment or the control group. How outliers affect design quality is an open question as some matching methods may drop outlier observations that do not have close peers.

(14) In unreported simulations, we find that median regressions used by Chhaochharia and Grinstein (2009) and Guthrie et al. (2012) provide protection against vertical outliers, but are ineffective in mitigating bad leverage outlier influence.

(15) Guthrie et al. (2012) note the fragility of the results for institutional ownership in conjunction with compensation committee independence in their footnote 21.

(16) For this definition, the sample $Z_1, \ldots, Z_n$ is assumed to be independent and identically distributed.

(17) Nonparametric estimation, such as artificial neural networks or kernel estimation, can be found in Kennedy (2001, pp. 302-303).

(18) Harvey (1978) provides a simple proof of unbiasedness for $1 < p < \infty$.

(19) Other estimator names are least absolute deviations (LAD), least absolute residual (LAR), least absolute error (LAE), and the minimum absolute deviation (MAD).

(20) Chan and Lakonishok (1992) introduce the trimmed regression quantile estimators.

(21) Theoretically, there are some robust M-estimators that estimate $\sigma$ and $\beta$ simultaneously, but these cannot be fit using the IRWLS algorithm.

(22) M-estimators are monotone if $\rho$ is convex over the entire domain.

(23) There are multiple loss functions available to compute MM-estimators.

DOI: 10.1111/fima.12269
```TABLE 1 Incidence of articles in historical finance journals with
outlier mention and treatments

Panel A. Outlier treatments in articles using OLS
Year      % Winsorize   % Trim   % Drop   % Winsorize,   % All other
                                          trim, and/or   treatments
                                          drop

2008      35            11       31        75            38
2009      46            14       21        80            25
2010      41            24       10        75            29
2011      53            12       12        78            29
2012      64            20        6        91            21
2013      54            15       39       109            11
2014      35            13       30        78            35
2015      69             7       17        93            14
2016      56            26       21       103            12
2017      72            10        8        90             5
Average   52            16       17        85            24

Panel B. Incidence of articles using OLS and mentioning outliers
Year   All papers    % All        All papers   % All papers   % OLS
       in JF, JFE,   papers       utilizing    utilizing      papers
       RFS, JFQA     mentioning   OLS          OLS            mentioning
                     outliers                                 outliers

1988   194            7%           64          33%            17%
1989   209            5%           71          34%            13%
1990   228            7%           89          39%            15%
1991   195            5%           58          30%            10%
1992   201            6%           66          33%            15%
1993   207            7%           77          37%             9%
1994   182           13%           68          37%            21%
1995   201           12%           58          29%            28%
1996   199           14%           86          43%            26%
1997   223           11%          106          48%            19%
1998   201           14%           85          42%            28%
1999   208            8%           96          46%            23%
2000   216            9%           90          42%            23%
2001   220           16%          102          46%            26%
2002   243           13%          110          45%            25%
2003   241           18%          118          49%            34%
2004   250           15%          105          42%            28%
2005   248           23%          137          55%            35%
2006   259           24%          163          63%            32%
2007   288           23%          185          64%            30%
2008   298           27%          200          67%            37%
2009   379           25%          252          66%            35%
2010   381           24%          234          61%            38%
2011   400           26%          279          70%            34%
2012   365           27%          231          63%            27%
2013   364           30%          178          49%            44%
2014   316           28%          152          48%            33%
2015   328           30%          137          42%            36%
2016   355           31%          163          46%            36%
2017   386           32%          195          51%            37%

Notes: This table provides the number and percentage of articles
published each year in the historical finance journals [Journal of
Finance (JF), Journal of Financial Economics (JFE), Review of Financial
Studies (RFS), and the Journal of Financial and Quantitative Analysis
(JFQA)]. Panel A reports the outlier mitigation methods used in the
historical finance journal articles from 2008 to 2017 using hand
collection. Percentages total more than 100% due to multiple treatments
in some papers. Panel B presents the incidences of articles with
outlier mention, with OLS mention, and with OLS and outlier mentions
from 1988 to 2017 using keyword searches. The data are from the EBSCO
database for JF, RFS, and JFQA and the Science Direct database for JFE.

TABLE 2 An illustration of the multivariate outlier issue

Panel A. Samples
              No outlier sample    Multivariate outlier sample
Observation   Y-value  X1    X2    Y-value   X1    X2
(1)           (2)      (3)   (4)   (5)       (6)   (7)

1              4       10     2     4        10     2
2              5        7     3     5         7     3
3              6        4     7     6         4     7
4              7       15     1     7        15     1
5              7        3    12     7         3    12
6              7        3    10     7         3    10
7             11       10     9    11        10     9
8             11        7    12    11         7    12
9             11       20     3    11        20     3
10            11       16     4    11        16     4
11            11       12     8    11        12     8
12            12        3    12    12         3    12
13            12       19     2    12        19     2
14            12       19    10    12        19    10
15            13       20     5    13        20     5
16            15       19     9    15        19     9
17            16       15    10    16        15    10
18            17       16    17    17         5     2
19            18       19    19     5        19    19
20            21       17    20    21        17    20
Mean          11       13     9    11        12     8
Median        11       15     9    11        14     9
Minimum        4        3     1     4         3     1
Maximum       21       20    20    21        20    20
Std. Dev.      4        6     5     4         6     5

Panel B. Regressions
                No outlier sample                     Multivariate outlier sample
                All            Winsorized   Trimmed   All            Winsorized   Trimmed   Multivariate
                observations                          observations                          outliers
                                                                                            removed
                (1)            (2)          (3)       (4)            (5)          (6)       (7)

Intercept        0.554          0.893        1.511     6.348          7.414       10.557    -0.331
                (0.550)        (0.935)      (1.203)   (2.558)        (3.208)      (3.564)   (0.297)
X1               0.443          0.451        0.448     0.219          0.196        0.154     0.478
                (7.723)        (8.062)      (6.018)   (1.476)        (1.403)      (0.816)   (7.978)
X2               0.574          0.516        0.463     0.193          0.075       -0.234     0.653
                (8.887)        (8.401)      (5.875)   (1.091)        (0.453)      (1.006)   (8.261)
Observations    20             20           13        20             20           13        18
Adjusted R^2    87%            89%          89%        7%            <1%          <1%       86%

Notes: The table provides simulated data and regressions to illustrate
the multivariate outlier problem. Panel A presents the data where the
dependent variable, Y, equals 0.5 × X1 + 0.5 × X2 + the random error
term. X1 and X2 are randomly generated with values in the range of 1 to
20. Columns (2) to (4) report the no outlier sample observations.
Columns (5) to (7) report the multivariate outlier sample. The
observations in the multivariate outlier sample are the same as the no
outlier sample with the exceptions of Observations 18 and 19 where the
dependent variable is much larger and smaller, respectively, relative
to the independent variables as indicated by the other observations and
the data generating process. Values are rounded to the nearest whole
number to ease exposition. Panel B reports the regression estimates for
the no outlier [Columns (1) to (3)] and the multivariate outlier
samples [Columns (4) to (7)].

TABLE 3 A framework for handling outliers

Item                            Comment

1) Do not winsorize             Winsorizing does not mitigate data
errors and the choice of treatment level
is arbitrary. It alters the data so that
inferences are not generalizable.
2) Do not trim based on         Similar to winsorizing, this ad hoc
univariate statistics        univariate approach does not reliably
mitigate multivariate outliers and may
introduce new statistical biases.
3) Decompose constructed        Constructed variables using two or more
variables                    variables can potentially misrepresent
economic reality or mask data errors.
Use both constructed variables and their
underlying variables to identify
outliers. For example, consider return
on equity (ROE) defined as the ratio of
net income (NI) to book equity (BE). A
researcher may assume that a firm with a
ROE of 20% is performing better than a
firm with 10% ROE. However, if the 20%
ROE firm has a negative 20 NI and
negative 100 BE, it is clearly doing
worse than a 10% ROE firm with positive
10 NI and positive 100 BE. Using both
the constructed variables and their
underlying variables helps to identify
outliers that can lead to incorrect
inferences.
4) Drop observations with       Observations with missing information
missing values for the       for any of the dependent or independent
variables used in            variables are dropped in OLS
regressions                  regression computations. Prior to
running the regression, dropping
observations with missing data for key
variables helps ensure that the
univariate summary statistics are
meaningful representations of the sample
used in the regression analysis. In
addition, removing observations that
will not be included in the subsequent
regression analysis reduces manual
checking costs.
5) Identify extreme values of   Not all univariate outliers are data
all dependent and            errors. Likewise, not all univariate
independent variables        outliers are multivariate (regression)
outliers (though many are). Identify the
minimum and maximum value observations
and those in the 1st, 5th, 50th, 95th,
and 99th percentiles. Include the
underlying variables used in the
construction of other variables (Item
3).
6) Correct or remove            Extreme values that are impossible or
impossible and implausible   implausible are generally data errors.
observations                 Examine observations with extreme values
to determine whether they are impossible
or highly improbable. The preferred
course of action is to correct data
errors thereby avoiding potential sample
selection bias. However, when the number
of potential errors is large and manual
investigation is prohibitively
expensive, the next best alternative is
to remove unlikely observations. The
exception to the removing alternative is
when there is structure in the
measurement error.
7) Report and document          Reproducibility is a fundamental
removals and sample          research requirement. It is difficult to
extremes                     replicate papers that do not explain how
outliers are treated. Carefully document
and justify data decisions and their
effects in the data section. Report
detailed summary statistics including
minimum, the 1st, 5th, 50th, 95th, and
99th percentile and maximum values.
8) Test for multivariate        Conduct a formal test to determine
(regression) outliers        whether multivariate outliers
significantly influence OLS regression
coefficient estimates. We recommend the
DGV outlier test. If the test fails to
reject the null of no outlier bias,
report the OLS results. If the test
rejects the null, continue to Item 9.
The DGV outlier test is an option for
the ROBREG command and is available by
typing, "net from
http://homepages.ulb.ac.be/~vverardi/stata"
from within STATA.
9) Identify multivariate        Identify outliers robustly in a
outliers                     multivariate context. We recommend using
S-estimation to compute robust
standardized residuals and Mahalanobis
distances. Provide outlier detection
plots to help identify and label
particularly large multivariate
outliers. S-estimation and outlier plots
are options for the SREGRESS command
that is available by typing, "net from
http://homepages.ulb.ac.be/~vverardi/stata"
from within STATA.
10) Evaluate the multivariate   Carefully consider and examine the
outliers                    nature and origin of the outliers
identified in Item 9 to identify
potential data entry, sampling, omitted
variables problems, and other errors.
Manual examination costs can often be
reduced by focusing on vertical and bad
leverage point outliers with the largest
robust standardized residuals and robust
Mahalanobis distances. That is, by
examining the most influential
observations, those that heavily
influence coefficient estimates, it is
possible to reduce manual evaluation
costs. Specifically, compute percentiles
of the robust standardized residuals and
Mahalanobis distances and rerun the OLS
regressions excluding the extremes to
identify influential observations. For
example, if a previously significant OLS
regression coefficient becomes
insignificant when observations with the
largest 1% robust standardized residuals
and the largest 1% Mahalanobis distances
are dropped, those observations in the
top 1% are influential.
11) Correct the data and        Correct data entry, sampling, omitted
document                    variables, and other errors. Remedy data
entry errors by replacing erroneous
entries with correct values. If this is
not feasible, explain why and remove the
suspected data entry errors. Remove
sampling errors and add omitted
variables to the regression model.
Carefully document data decisions to
preserve reproducibility and validity.
Report the coefficient estimates.
12) Decide if further           Identify, examine, discuss, and document
mitigation is prudent and   the remaining influential outliers.
how to treat outliers       Consider the nature of the research
question and economic theory. If the
research question involves tail-risk
events or phenomena, further mitigation
can lead to incorrect inferences.
However, even for tail-risk research
questions, we recommend identifying,
reporting, and examining influential
observations (see Items 9, 10, and 11).
Reporting influential observations
improves inference when the influential
observations are consistent with the
tail-risk events or phenomenon of the
research question. Reporting influential
observations also removes the incentives
for unscrupulous researchers to falsely
argue research questions are about
tail-risk phenomena and not general
effects. For general effect research
questions, which we contend represent
the majority of finance research,
mitigate multivariate outliers by
dropping extreme influential
observations and repeating the OLS
regressions. That is, when theory
suggests a general effect, influential
outliers comprising a small fraction of
the sample should not drive the
empirical results. We generally
recommend a maximum 10% cutoff rule for
dropping influential observations. For
example, if influential observations
comprise less than 1% of the sample,
dropping them improves inferences, but
if they account for 30% of the sample,
dropping can lead to incorrect
inferences. Justify the cutoff rule in
the context of the research question.
Alternatively, for general effect
research questions, employing outlier
robust estimators, by design, yields
general effect coefficient estimates.
The robust estimators we employ in this
paper can be found at
http://homepages.ulb.ac.be/~vverardi/stata.
13) Recommendations for         We recommend using MM robust regression
outlier robust estimators   or S-estimation to mitigate outlier
influence as they provide a good balance
of robustness and efficiency. MM robust
estimators can have greater efficiency
than S-estimations when outlier bias is
less severe. Perform repeated DGV
outlier bias tests to determine the
highest efficiency possible with outlier
robust MM-estimators. Median regressions
and other quantile estimators have a
BDP = 0 (sensitive to even a single
outlier) and while they mitigate bias
from vertical outliers, they do not
protect against bad leverage points.
Appendix A provides a summary of the
costs and benefits of several outlier
robust estimators. Finally, MM robust
regressions and S-estimation regressions
require patience. These outlier robust
estimators are computationally
intensive, particularly in asset pricing
applications where robust Fama and
MacBeth (1973) regressions may take
several hours to process on a reasonably
powerful desktop computer.

TABLE 4 Capital structure application based on Petersen (2009) study

Panel A. Incidence of outliers (vertical and bad leverage points)
Years   Vertical   Bad leverage   Years     Vertical   Bad leverage
        outliers   points         (cont.)   outliers   points

1973    0.015      0.038          1996      0.062      0.060
1974    0.017      0.026          1997      0.060      0.075
1975    0.0026     0.033          1998      0.050      0.079
1976    0.035      0.036          1999      0.024      0.044
1977    0.039      0.027          2000      0.037      0.059
1978    0.024      0.029          2001      0.020      0.052
1979    0.028      0.031          2002      0.027      0.064
1980    0.028      0.023          2003      0.027      0.040
1981    0.021      0.022          2004      0.035      0.042
1982    0.066      0.044          2005      0.036      0.053
1983    0.081      0.045          2006      0.014      0.032
1984    0.049      0.057          2007      0.026      0.069
1985    0.049      0.050          2008      0.012      0.102
1986    0.036      0.063          2009      0.029      0.097
1987    0.033      0.062          2010      0.038      0.070
1988    0.062      0.073          2011      0.042      0.071
1989    0.064      0.069          2012      0.042      0.100
1990    0.072      0.056          2013      0.040      0.097
1991    0.097      0.067          2014      0.037      0.089
1992    0.097      0.054          2015      0.034      0.108
1993    0.115      0.058          2016      0.031      0.119
1994    0.056      0.105          2017      0.033      0.109
1995    0.065      0.074          All       0.041      0.062

Panel B. Descriptive statistics
                     Observation type mean (median) values
                     Nonoutliers   Vertical outlier    Bad leverage
                     (1)           (2)                 (3)

Debt Ratio           0.174         0.448               0.453
(0.152)       (0.437)             (0.447)
Ln(MV Assets)        6.280         6.229               6.839
(6.259)       (6.102)             (6.771)
Ln(1+Firm age)       2.245         2.234               2.225
(2.398)       (2.303)             (2.398)
Profits/Sales        0.215         0.175               0.229
(0.184)       (0.140)             (0.177)
Tangible Assets      0.182         0.174               0.315
(0.080)       (0.070)             (0.238)
Market-to-Book       1.101         1.033               1.738
(1.036)       (1.009)             (1.159)
Advertising/Sales    0.016         0.016               0.029
(0.013)       (0.012)             (0.015)
R&D/Sales            0.003         0.001               0.009
(0.000)       (0.000)             (0.000)
R&D Dummy            0.195         0.117               0.273
(0.000)       (0.000)             (0.000)

Panel B. Descriptive statistics
Mean (Median) differences
Vertical - Nonoutliers   Bad - Nonoutliers
(2)-(1)                  (3)-(1)

Debt Ratio            0.274 (***)              0.279 (***)
(0.285) (***)            (0.295) (***)
Ln(MV Assets)        -0.051                    0.559
(0.157)                  (0.512)
Ln(1+Firm age)       -0.011                   -0.020
(-0.095)                 (-0.000)
Profits/Sales        -0.039 (***)              0.014 (***)
(-0.044) (***)            (0.007)
Tangible Assets       0.008                    0.132 (***)
(0.012) (***)            (0.158) (***)
Market-to-Book       -0.068 (***)              0.637 (***)
(-0.027) (***)            (0.123) (***)
Advertising/Sales     0.001 (*)                0.013 (***)
(0.001)                  (0.002) (***)
R&D/Sales            -0.002 (***)              0.006 (***)
(0.000) (***)            (0.000) (***)
R&D Dummy            -0.078 (***)              0.078 (***)
(0.000) (***)            (0.000) (***)

Panel C. Regressions
OLS                OLS               OLS
No                 1%                1%
Winsorizing        Winsorizing       Trim
(1)                (2)               (3)

Intercept                0.138 (***)       0.145 (***)       0.142 (***)
(7.448)           (8.314)           (8.284)
Ln(MV Assets)            0.017 (***)       0.027 (***)       0.030 (***)
(3.226)           (7.538)           (8.248)
Ln(1+Firm Age)           0.009 (***)       0.004             0.004
(2.744)           (1.316)           (1.361)
Profits/Sales           -0.012            -0.099 (***)      -0.129 (***)
(0.757)           (6.020)           (7.146)
Tangible Assets          0.054 (*)         0.052             0.054
(1.94)            (2.020)           (1.989)
Market-to-Book          -0.021 (***)      -0.043 (***)      -0.059 (***)
(2.617)           (18.363)         (19.017)
Advertising/Sales       -0.087             0.028             0.026
(0.770)           (0.291)           (0.228)
R&D/Sales               -0.024            -0.126            -0.159
(0.610)           (1.079)           (1.066)
R&D > 0                  0.007             0.006             0.008
(= 1 if yes)
(0.978)           (0.908)           (1.094)
Year Fixed         Yes                Yes               Yes
Effects
Firm Fixed         Yes                Yes               Yes
Effects
Outlier Test            0.000              0.000             0.000
p-value
MM Efficiency
Adjusted                0.748              0.756             0.739
R²
Observations       37,430             37,430            33,543

Panel C. Regressions
OLS               OLS               MM
2.5%              2.5%              Robust
Winsorizing       Trim              Regression
(4)               (5)               (6)

Intercept                0.156 (***)       0.145 (***)      -0.004 (***)
(9.350)           (7.914)           (4.625)
Ln(MV Assets)            0.027 (***)       0.033 (***)       0.008 (***)
(7.813)           (8.148)           (4.207)
Ln(1+Firm Age)           0.002             0.005            -0.005 (***)
(0.841)           (1.749)           (2.044)
Profits/Sales           -0.091 (***)      -0.127 (***)      -0.063
(5.271)           (6.463)           (5.460)
Tangible Assets          0.044             0.051 (*)         0.007
(1.723)           (1.829)           (0.318)
Market-to-Book          -0.053 (***)      -0.062 (***)      -0.020 (***)
(20.683)          (19.434)          (12.357)
Advertising/Sales        0.040             0.118            -0.198
(0.371)           (0.889)           (2.664)
R&D/Sales               -0.151            -0.279            -0.155
(0.937)           (1.196)           (4.842)
R&D > 0                  0.008             0.010             0.010
(= 1 if yes)
(1.107)           (1.395)           (1.211)
Year Fixed          Yes               Yes               Yes
Effects
Firm Fixed          Yes               Yes               Yes
Effects
Outlier Test             0.000             0.000
p-value
MM Efficiency                                               30.7%
Adjusted                 0.756             0.738             0.033
R²
Observations        37,430            28,384            37,430

Notes: Panel A reports the annual percentage of outliers in the capital
structure data, identified from robust regressions of the market debt
ratio on the log of market value of assets, log of 1 plus firm age, the
profit to sales ratio, tangible assets ratio, market to book ratio,
advertising to sales ratio, R&D to sales ratio, and a dummy to capture
R&D spending, using a sample of domestic firms with data in Compustat
from 1973 to 2017. Panel B reports mean (median) values for the capital
structure sample, which contains 37,430 firm-year observations from
1973 to 2017. The results are segmented by observation type and include
mean (median) differences between the nonoutlier and outlier values.
Panel C reports the regression coefficients of the market value of debt
to assets ratio on firm measures for the complete sample of domestic
firms with data in Compustat. The sample contains annual observations
on 4,919 firms from 1973 to 2017. Outlier test p-values are computed as
in Dehon et al. (2012). All models include firm and year fixed effects.
Firm-level clustered robust t-statistics are in parentheses. The
notations (***), (**), and (*) denote statistical significance at the
1%, 5%, and 10% levels, respectively. Variable definitions are provided
in Table 1.

TABLE 5 Asset pricing application based on the Wahal and Yavuz (2013) study

Panel A. Descriptive statistics

Observation type mean (median)    Mean (median) differences
values
Nonoutlier  Vertical  Bad       Vertical -       Bad -
leverage  Nonoutlier      Nonoutlier
(1)         (2)       (3)       (2) - (1)       (3) - (1)

Future      2.959      29.427    73.119    26.467 (***)    70.159 (***)
Stock
Return
(3.122)    (52.102)  (75.000)  (48.980) (***)  (71.878) (***)
Prior       2.363       1.997     5.153    -0.336 (***)     2.791 (***)
Style
Return
(2.834)     (2.277)   (3.365)  (-0.557) (***)   (0.531) (***)
Prior       3.625      -0.153    14.238    -3.778 (***)    10.613 (***)
Stock
Return
(3.321)    (-0.634)  (-0.606)  (-3.955) (***)  (-3.923) (***)
Log Size   12.760      12.112    11.274    -0.647 (***)    -1.485 (***)
(12.692)    (12.086)  (11.154)  (-0.607) (***)  (-1.539) (***)
Log BM     -0.475      -0.460    -0.887     0.016 (***)    -0.411 (***)
(-0.466)    (-0.475)  (-0.883)   (0.008) (**)   (-0.417) (***)

Panel B. Regressions

FM-OLS                FM-OLS
1% Winsorized
(1)                   (2)

Prior Style Return              8.960 (***)           9.525 (***)
(3.966)               (4.797)
Prior Stock Return              1.047                 2.134 (***)
(0.745)               (1.847)
Log Size                       -2.041 (***)          -1.520 (***)
(7.000)               (7.005)
Log BM                         -7.673 (***)          -6.577 (***)
(9.250)              (10.998)
Outlier Test p-value            0.000                 0.000
MM Efficiency
Adjusted R²                     8.02                  8.50
Number of               1,745,845             1,745,845
Observations

Panel B. Regressions

FM-OLS                FM-MM
1% Trimmed            Robust
(3)                   (4)

Prior Style Return             10.891 (***)          10.330 (***)
(5.440)               (5.296)
Prior Stock Return              3.311 (***)           2.200 (*)
(3.165)               (1.685)
Log Size                       -0.977 (***)           0.215
(5.351)               (1.566)
Log BM                         -5.237 (***)          -3.661 (***)
(10.383)               (7.401)
Outlier Test p-value            0.000
MM Efficiency                                        28.7%
Adjusted R²                     7.47                  3.32
Number of               1,663,268             1,745,845
Observations

Notes: This table reports the results from the univariate and
cross-sectional analysis of style and stock level momentum anomalies.
Panel A provides mean (median) values for nonoutlier observations,
vertical outliers, and bad leverage points, as well as mean (median)
differences. Panel B presents the monthly average coefficient estimates
obtained from regressing future stock returns on prior style returns,
stock returns, log size, and log BM. Newey-West t-statistics with
nine-month lags are reported in parentheses. Future Stock Return is
computed for the six-month period beginning the following month, Prior
Style Return is the prior six-month value-weighted return on a style
portfolio constructed using the intersection of the size and
book-to-market quintiles, Prior Stock Return is each stock's prior
six-month return, Log Size is the natural logarithm of each stock's
market value, and Log BM is the log of the book-to-market ratio. All of
the variables are winsorized (trimmed) at the 1% level in Column 2 (3).
The sample consists of all NYSE, Amex, and NASDAQ stocks from 1973 to
2017. Outlier test p-values are computed as in Dehon et al. (2012). The
notations (***), (**), and (*) indicate statistical significance at the
1%, 5%, and 10% levels, respectively. Variable definitions are provided
in Table 1.

TABLE 6 Becker and Stromberg (2012) study--Equity-debtholder conflicts
(replication using BS Data and OLS Code)

Panel A. Comparing treatment and control samples
Observation type mean values
Delaware      Not Delaware   Treatment-
(treatment)   (control)      control
(1)           (2)            (3)

Incidence by Treatment Type
Overall Sample                   0.524         0.476          0.058
Non-Outliers                     0.558         0.583         -0.025
Vertical Outliers                0.062         0.053          0.009
Good Leverage Points             0.277         0.263          0.014
Bad Leverage Points              0.102         0.100          0.002
Robust Standardized Residuals
Overall Sample                   1.087         1.008          0.079
Non-Outliers                     0.318         0.248          0.070
Vertical Outliers                3.198         3.459         -0.261
Good Leverage Points             0.240         0.315         -0.075
Bad Leverage Points              6.287         5.954          0.333
Robust Mahalanobis Distances
Overall Sample                   6.369         6.148          0.221
Non-Outliers                     2.696         2.676          0.020
Vertical Outliers                3.301         3.368         -0.067
Good Leverage Points            11.219        11.694         -0.475
Bad Leverage Points             15.145        13.309          1.836

Panel B. Regressions

Published          MM robust          OLS w/o
results                               vertical
OLS                regression         outliers
(1)                (2)                (3)

Intercept             0.1182 (***)      -0.0019 (***)       0.1023 (***)
(6.709)            (4.6702)           (6.5723)
Delaware             -0.0076 (**)       -0.0015            -0.0061 (*)
x Post-1991
(2.0898)           (1.2174)           (1.8780)
Controls
Post-1991             0.0003            -0.0016            -0.0010
(0.0953)           (1.1748)           (0.4092)
Return on Assets      0.0001            -0.0129            -0.0032
(0.0020)           (1.6326)           (0.0949)
Return on Sales      -0.0020             0.0078             0.0065
(0.0706)           (1.3474)           (0.1888)
Ln Assets            -0.0296 (***)      -0.0054 (**)       -0.0270 (***)
(3.9021)           (1.6944)           (3.1548)
Ln Sales             -0.0101             0.0012            -0.0091
(1.5968)           (0.4359)           (1.3602)
Ln MV                 0.0191 (***)      -0.0006            -0.0189 (***)
(3.1806)           (0.3126)           (3.0831)
Depreciation         -0.0804             0.0047            -0.0811
/Assets
(1.2762)           (0.2084)           (1.2921)
Book Leverage        -0.0244             0.0105 (*)        -0.0143
(0.7507)           (1.7469)           (0.4042)
Market Leverage       0.1086 (**)        0.0010             0.1028 (*)
(2.0677)           (0.1364)           (1.8792)
Q                     0.0082            -0.0003             0.0088
(0.4929)           (0.1687)           (0.5063)
Two-Year Stock        0.0006             0.0004             0.0007
Price
Change
(0.2347)           (0.5228)           (0.2664)
Firm/Year Fixed      Yes                Yes                Yes
Effects
Outlier Test          0.000                                 0.000
p-value
MM Efficiency                           50.1%
Adjusted              0.710              0.250              0.736
R²
Observations      2,145              2,145              2,074

Panel B. Regressions

leverage
points
(4)

Intercept              0.0585 (***)
(6.7103)
Delaware              -0.0004
x Post-1991
(0.3424)
Controls
Post-1991             -0.0039 (***)
(3.2219)
Return on Assets      -0.0341 (**)
(2.2451)
Return on Sales       -0.0069
(0.7308)
Ln Assets             -0.0122 (*)
(1.9279)
Ln Sales               0.0010
(0.1828)
Ln MV                  0.0070 (***)
(3.1192)
Depreciation           0.0071
/Assets
(0.2486)
Book Leverage          0.0210
(1.2035)
Market Leverage        0.0197
(1.2150)
Q                      0.0020
(0.6895)
Two-Year Stock        -0.0015 (*)
Price
Change
(1.7813)
Firm/Year Fixed       Yes
Effects
Outlier Test           0.000
p-value
MM Efficiency
R²
Observations       1,874

Notes: This table replicates the fixed effects panel OLS regressions of
table 5, model 1, in Becker and Stromberg (2012). The dependent
variable is the volatility of ROA. Panel A reports the incidence of
outliers by classification type and S-estimates of the robust
standardized residuals and Mahalanobis distances. Panel B provides the
regression results. Standard errors are clustered, where clusters are
defined by the interaction of year and the firm's state of
incorporation. Outlier test p-values are computed as in Dehon et al.
(2012). The notations (***), (**), and (*) denote statistical
significance at the 1%, 5%, and 10% levels, respectively. Variable
definitions are provided in Table 1.

TABLE 7 Guthrie et al. (2012) study--CEO compensation (revisited)
(replication using GSW data and OLS Code)

Panel A. GSW (2012) - Replications of model 5
GSW Model 5         Model 5 including
including Apple &   Apple & Fossil MM
Fossil              replication
(1)                 (2)

Noncompliant x          -0.014                 0.041
After
(0.209)               (1.126)
Sales x Before           0.313 (***)           0.328 (***)
(5.249)               (6.545)
Sales x After            0.290 (***)           0.330 (***)
(4.578)               (6.653)
ROA x Before             0.480                 0.291
(1.428)               (1.134)
ROA x After              0.191                 0.142 (***)
(1.445)               (2.430)
RET x Before             0.083 (***)           0.122 (***)
(2.689)               (4.564)
RET x After              0.285 (***)           0.306 (***)
(5.754)              (10.58)
Tenure                  -0.015                 0.039 (***)
(0.831)               (2.568)
Outlier test             0.000
p-value
Max MM                                        59.7%
Efficiency
Adjusted R²              0.096                 0.045
Observations         5,318                 5,318

Panel B. GSW (2012) - Replications of model 7
GSW Model 7       Model 7 including
including Apple   Apple & Fossil MM
& Fossil OLS      replication
replication
(1)               (2)

Noncompliant             0.156 (***)           0.089
x high inst conc        (2.684)               (1.306)
Noncompliant            -0.068                 0.031
x low inst conc         (0.863)               (0.742)
Sales x Before           0.317 (***)           0.314 (***)
(5.334)               (6.338)
Sales x After            0.296 (***)           0.316 (***)
(4.740)               (6.444)
ROA x Before             0.462                 0.386
(1.387)               (1.387)
ROA x After              0.192                 0.176 (***)
(1.452)               (3.016)
RET x Before             0.082 (***)           0.127 (***)
(2.655)               (4.837)
RET x After              0.284 (***)           0.306 (***)
(5.735)              (10.543)
Tenure                  -0.015                 0.042 (***)
(0.815)               (2.770)
Outlier test             0.001
p-value
Max MM                                        68.9%
Efficiency
Adjusted R²              0.096                 0.038
Observations         5,318                 5,318

Panel A. GSW (2012) - Replications of model 5
GSW Model 5         Model 5 excluding
Excluding Apple &   Apple & Fossil MM
Fossil Published    replication
results
(3)                 (4)

Noncompliant x           0.060 (*)            0.041
After
(1.781)              (1.126)
Sales x Before           0.354 (***)          0.329 (***)
(6.985)              (6.561)
Sales x After            0.342 (***)          0.331 (***)
(6.775)              (6.668)
ROA x Before             0.378                0.281
(1.131)              (1.095)
ROA x After              0.120                0.142 (***)
(1.144)              (2.429)
RET x Before             0.087 (***)          0.123 (***)
(2.829)              (4.576)
RET x After              0.319 (***)          0.306 (***)
(7.454)             (10.599)
Tenure                  -0.007                0.039 (***)
(0.419)              (2.565)
Outlier test             0.002
p-value
Max MM                                       68.5%
Efficiency
Adjusted R²              0.121                0.045
Observations         5,306                5,306

Panel B. GSW (2012) - Replications of model 7
GSW Model 7         Model 7 excluding
excluding Apple &   Apple & Fossil MM
Fossil Published    replication
results
(3)                 (4)

Noncompliant             0.167 (***)          0.088
x high inst conc        (2.956)              (1.298)
Noncompliant             0.026                0.031
x low inst conc         (0.669)              (0.737)
Sales x Before           0.356 (***)          0.315 (***)
(7.023)              (6.353)
Sales x After            0.346 (***)          0.317 (***)
(6.849)              (6.458)
ROA x Before             0.367                0.375
(1.103)              (1.354)
ROA x After              0.121                0.176 (***)
(1.151)              (3.014)
RET x Before             0.087 (***)          0.128 (***)
(2.807)              (4.850)
RET x After              0.318 (***)          0.306 (***)
(7.444)             (10.557)
Tenure                  -0.007                0.042 (***)
(0.410)              (2.767)
Outlier test             0.002
p-value
Max MM                                       69.5%
Efficiency
Adjusted R²              0.122                0.038
Observations         5,306                5,306

Notes: This table replicates select fixed effects regressions of
Appendix A in Guthrie et al. (2012). The dependent variable is the
natural log of CEO pay. Standard errors are clustered at the
firm-period level. Outlier test p-values are computed as in Dehon et
al. (2012). The notations (***) and (*) denote statistical significance
at the 1% and 10% levels, respectively. Variable definitions are
provided in Table 1.
```
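
The regression tables above compare OLS estimates after winsorizing and after trimming at the 1% and 2.5% levels. The following is a minimal sketch of how the two treatments differ mechanically, and it is not the authors' code. It assumes that "1%" means the 1st and 99th percentiles of each variable taken separately, and that trimming is listwise (a row is dropped if any variable falls outside its cutoffs). The function names `winsorize` and `trim` and the simulated data are hypothetical.

```python
import numpy as np

def winsorize(x, pct=1.0):
    """Clip each column of x to its [pct, 100 - pct] percentile range.

    The number of rows is unchanged; extreme values are replaced by the cutoffs.
    """
    lo = np.percentile(x, pct, axis=0)
    hi = np.percentile(x, 100.0 - pct, axis=0)
    return np.clip(x, lo, hi)

def trim(x, pct=1.0):
    """Drop every row in which any column lies outside its percentile range.

    Returns the retained rows and the boolean mask of retained rows.
    """
    lo = np.percentile(x, pct, axis=0)
    hi = np.percentile(x, 100.0 - pct, axis=0)
    keep = np.all((x >= lo) & (x <= hi), axis=1)
    return x[keep], keep

# Simulated data (an assumption for illustration): 1,000 rows, 3 variables.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 3))

w = winsorize(data, pct=1.0)   # same shape as data
t, keep = trim(data, pct=1.0)  # fewer rows than data

print("winsorized rows:", w.shape[0])
print("trimmed rows:   ", t.shape[0])
```

Under these assumptions, winsorizing leaves the observation count unchanged while listwise trimming shrinks it, which matches the pattern in Table 4, Panel C, where the winsorized columns keep 37,430 observations and the 1% trimmed column falls to 33,543. Both treatments apply cutoffs one variable at a time, so neither uses the joint distribution of the regressors and the dependent variable.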
COPYRIGHT 2019 Financial Management Association