STAS and BehaviorScan - It's Just Not That Simple.
JOHN PHILIP JONES AND I HAVE very different paradigms for what constitutes valid research that a manager should use for decision making. His rejoinder to my comments on his STAS methodology doesn't address our main point of disagreement very well. The basic issue is whether STAS is a 'solid foundation' for improving advertising productivity as he and M. H. Blair stated in the article that started our dialogue. Let me just comment on the issues raised in the latest comments by Jones and refer the reader to my original article for details that do not need repetition.
Jones describes 10 features of the STAS design. I applaud him for trying to make it as clear and as simple as possible to understand. The first feature describes the national sample and the measurement of purchases. This feature is reasonable. The second feature attempts to isolate the effect of advertising by separating purchases in households that the brand's advertising entered during the previous seven days (the Ad households) from purchases in households it did not (the Adless households). Purchases in the Adless households are subtracted from those in the Ad households to 'measure the purchases driven by advertising.' He justifies this analysis procedure as follows: 'The whole research depends on two samples which are similar in all respects except one - the inclusion or absence of the variable to be tested. This is of course the classic methodology of the controlled experiment.'
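As a sketch, the Ad-minus-Adless contrast Jones describes can be written as a simple share difference. The variable names and figures below are illustrative, not Jones's data or code:

```python
# Minimal sketch of the STAS-style contrast: brand share among Ad
# households minus brand share among Adless households.
# (Illustrative only; not Jones's actual computation or data.)

def stas_differential(ad_brand, ad_category, adless_brand, adless_category):
    """Difference in brand share between Ad and Adless households.

    ad_brand / adless_brand: brand purchases in each group
    ad_category / adless_category: total category purchases in each group
    """
    ad_share = ad_brand / ad_category
    adless_share = adless_brand / adless_category
    return ad_share - adless_share

# Hypothetical figures: 30 of 200 category purchases in Ad households,
# 20 of 200 in Adless households, i.e., a five-point differential that
# Jones's procedure would credit entirely to the week's advertising.
diff = stas_differential(30, 200, 20, 200)
print(f"{diff:.2f}")
```

The entire dispute is over whether this difference can be attributed to advertising alone, given how households end up in each group.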
Jones's research design differs from the classic 'controlled experiment' in very important ways. In a controlled experiment, the test and control groups are determined by random assignment of subjects. In Jones's design, advertising media planners and buyers do the assignment of households to the Ad or Adless condition. If, for example, the media planners used a media plan to target a certain group or groups over a year or quarter, then households that received an ad in one seven-day period would be much more likely to have been exposed in prior seven-day periods as well. If the Ad and Adless groups are correlated over time in this manner, then it is invalid to give all of the credit for share differences between the groups to this week's advertising. It is just as plausible to say that the difference in shares this week is due to all of the past advertising targeted at the Ad versus the Adless households. The Jones design does nothing to recognize or control for differences in past advertising exposure.
To make matters even worse, many media planners target current users of their products. If the Ad group had higher shares to begin with (and that was why they were targeted with advertising), why should the advertising be credited for shares that were there anyway? If households were assigned to the treatment (Ad) and control (Adless) conditions at random, these issues would not be as salient. Given that the assignment to the Ad and Adless conditions is not random, it is imperative that statistical covariate adjustment be applied to correct for the differential effects of past advertising exposures and past brand purchases between the groups. Even when group assignment is random, it is accepted, good analytical procedure to apply a statistical covariate adjustment anyway, to reflect random differences between the groups that are not associated with the treatment (e.g., advertising) differences. In the BehaviorScan analysis of test and control groups (which are randomized within matched subgroups), it is standard operating procedure to still adjust the results using statistical covariance analysis to account for differences in past purchasing and past advertising exposure (if available) at the household level. Because of the lack of randomization and the lack of any accepted statistical adjustment for the biases this creates, there is prima facie reason to be very skeptical of the results of a STAS analysis. So when Jones says, 'since a single advertising exposure is enough to put a household into the Ad household groups, the research examines the sales-generating effect of one advertisement,' he neglects some very important potentially biasing effects: the impact of other advertisements the household group may have seen before, and the possibility that the household was already a loyal purchaser of the brand.
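The targeting bias and the covariate remedy can be illustrated with a small simulation. Everything below is synthetic and hypothetical, not STAS or BehaviorScan output: households with heavier past purchasing are more likely to be targeted (the Ad group), so the raw Ad-minus-Adless difference overstates the advertising effect, while a regression that includes past purchasing as a covariate recovers something close to the true effect:

```python
import numpy as np

# Synthetic illustration of selection bias and covariate adjustment.
# All numbers here are invented for the sketch.
rng = np.random.default_rng(0)
n = 2000

past = rng.normal(0.0, 1.0, n)                  # past brand purchasing (covariate)
# Targeting favors heavy past buyers: exposure is NOT random.
treat = (past + rng.normal(0.0, 1.0, n) > 0).astype(float)
true_ad_effect = 0.10
sales = 0.5 * past + true_ad_effect * treat + rng.normal(0.0, 1.0, n)

# Naive Ad-minus-Adless comparison (the STAS-style contrast):
naive = sales[treat == 1].mean() - sales[treat == 0].mean()

# Covariate-adjusted estimate: OLS of sales on treatment and past purchasing.
X = np.column_stack([np.ones(n), treat, past])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
adjusted = beta[1]

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  true: {true_ad_effect:.2f}")
```

Because the naive contrast absorbs the past-purchasing difference between the groups, it comes out several times larger than the true effect, which is exactly the direction of bias argued above.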
However, the comparison by Schroeder et al., who used BehaviorScan and STAS to analyze the same experiments, gave STAS the benefit of the doubt on the above randomization issues. They took situations where there really was a randomized set of test and control groups, applied the STAS analysis procedure, and compared the results to those of the BehaviorScan analysis procedure. In my analysis of this comparison, I take issue only with the very simplified way in which Jones compared the Ad versus Adless groups. I do not take issue with, and in fact applaud, the national sample and the automated way exposures are captured.
What is missing from Jones's procedure is any adjustment for week-by-week, store-by-store differences between the Ad and Adless groups in promotions, distribution, shelf placement, competitive activity, and other in-store conditions. These differences arise because shoppers in the test and control groups, the Ad versus Adless groups, do not all shop in the same stores each week, and in-store conditions change considerably from week to week and from store to store. The BehaviorScan analysis procedure uses the store-week shares of the test brand for the stores in which the households shop as a covariate. This covariate adjusts for in-store factors that should impact sales of the brand. For example, assume two households shop in two different stores, store A and store B, during a given week. Assume the test brand's share is 5 percent in store A and 25 percent in store B because of in-store activity in store B. The BehaviorScan analysis would weight these two store shares to imply that a household that shops at store B during that week will have a higher probability of buying the test brand than a household that shops at store A that week. This statistical adjustment reflects the in-store activity, regardless of the effect of advertising. As any brand or sales manager knows, the effect of in-store activity can vary greatly from week to week and from store to store. It is these very large differences in in-store conditions, which occur randomly at the micro level of store-week-household, that make it necessary to adjust simple comparisons of test and control groups, even when the assignment is truly at random.
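The store A / store B example above can be made concrete with a short sketch. The weighting here is illustrative, not BehaviorScan's actual model: a household's baseline purchase probability, absent advertising, is taken from the brand's share in the store-weeks where that household shopped, so any Ad-versus-Adless contrast is measured against household-specific baselines rather than a common average:

```python
# Illustrative store-week covariate, using the 5% / 25% figures from
# the example above. Not BehaviorScan's actual procedure.
store_week_share = {"A": 0.05, "B": 0.25}  # test brand's share by store this week

def baseline_prob(stores_visited):
    """Expected purchase probability implied by store-week shares alone,
    averaged over the stores a household visited that week."""
    return sum(store_week_share[s] for s in stores_visited) / len(stores_visited)

# Household 1 shops only at store A; household 2 shops only at store B.
print(baseline_prob(["A"]))   # low baseline: little in-store support
print(baseline_prob(["B"]))   # high baseline: heavy in-store activity
```

A household shopping at store B is expected to buy the brand far more often than one shopping at store A before any advertising effect is counted, which is precisely the adjustment missing from the simple Ad-versus-Adless subtraction.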
The low correlation between STAS results and BehaviorScan results is due to the above in-store adjustment problem. In his comments, Professor Jones shows some data on the promotional intensities of brands compared to their categories and finds no relation to the STAS results across brands in a decile analysis. His analysis sheds little light on the in-store lack-of-balance problem. This problem is not so much an aggregate promotion problem as a problem of different conditions facing each household each week when purchases are made. The fact that the STAS analysis of the seven BehaviorScan tests misses 75 percent of the variation in sales changes accounted for by advertising is pretty solid evidence that the STAS analysis is too simple. It misses the impact on brand sales of in-store differences among weeks, stores, and households, even if it were applied to a randomized assignment of test and control groups. When one then considers the problems discussed above, brand-loyalty differences and past-advertising-exposure differences that are handled neither by randomization nor by statistical covariate analysis, I am apprehensive.
Professor Jones's comments on the subject of ARS Persuasion testing are directly contradicted by the ARS data I present in my rebuttal. The data show an extremely high correlation between the performance of rough ads and the performance of the actual ads that were aired.
Billions of dollars are spent on TV advertising each year. As marketing researchers, educators, and consultants, we owe it to the advertisers to help them do the best possible job in evaluating and improving the productivity of their TV advertising expenditures. Professor Jones has correctly pointed out the value of using single-source data to help evaluate and improve the process. Those data need to be coupled with appropriate analytical models and tools that isolate the impact of advertising from the other confounding factors impacting brand sales. I would be happy to work with Professor Jones and/or any entity or company that wanted to develop a system to use single-source data which does adjust for the other confounding impacts on sales. Without such adjustments, the use of simple procedures like STAS can lead to big errors in decision making.
In reference to Occam's razor, if two theories equally account for the phenomenon under study, then one selects the simpler of the two theories. But Occam's razor does not hold when the premise (of consistent theories) is violated. JAR