Structuring and measuring the size of business markets.
It is perhaps appropriate to start with a little history. In the days when manufacturing was king and service industries were hardly thought of, there was a significant market for what was called 'industrial market research'. This comprised researching small populations of extremely non-homogeneous businesses to establish the market for certain types of industrial machinery, equipment or products. The studies generally comprised equal parts of desk research and depth interviews, and a great deal of experience was required to interpret results and derive market size, shares, etc. Because the markets were small in terms of absolute numbers of buyers, and non-homogeneous, it was generally considered that large statistical samples and sophisticated sampling techniques were simply not appropriate.
In the 1960s new, bigger, 'industrial' markets emerged; for example, the agricultural research market was large at that time, the construction industry thrived and spent money on research, and office equipment began to emerge as a research market when IBM, Rank Xerox and Philips started to make copiers and electric typewriters which were affordable by most businesses.
The major change between these 'new' markets and traditional industrial research markets was that the number of potential participants was large, and therefore the application of large samples and sophisticated sampling techniques was necessary. The market research industry therefore started to apply what it had learnt in consumer markets to these 'new' markets.
Andrew McIntosh and Roger Davies' paper, 'The sampling of non-domestic populations', was published in 1970, and summarised the state of knowledge in the agency world at the end of the 1960s, recognising that there was a market for business research which could benefit from the application of the already tried and tested consumer research techniques which had been developed. The paper deliberately referred to 'non-domestic' populations to make the distinction from consumer and industrial research. Agencies began at this time to use the term 'mass' business markets, to refer to the new area.
The purpose of the present paper is to update the 1970 paper and summarise the state of business sampling at the beginning of the twenty-first century. It also aims to encourage the reader to consider what work needs to be done to develop business sampling in the future, since the industry is facing a faster rate of change than at any point in the past.
The business market has extended considerably into the consumer population, as work habits change and more people work from home. Products that would once have been considered suitable only for businesses have become available to all. Clients too have blurred the boundaries. Yet there is still need for specialist techniques to address an increasingly complex business world.
Key differences between business and consumer sampling
Whilst sampling theory is the same for all markets, there are still some key differences between business sampling and consumer sampling, as follows:
* The definition of the unit to be interviewed -- is it the site (establishment), organisation, legal entity or some other business unit? Each definition offers different benefits and drawbacks, depending upon the research objectives. In consumer research only the individual or the household has to be considered.
* The definition of respondent -- in business research, an individual is interviewed as a representative of the business. We have to be sure to interview the right individual in each business, and be consistent across businesses. This can be complex as business titles and responsibilities vary widely across businesses of different sizes and sectors.
* The lack of homogeneity in the business universe. There is much more variance between a small business and a large organisation than between any two consumers. There is also great variation between different types of small businesses.
* The available sample frames of businesses vary considerably in accuracy and coverage.
* The public sector generally counts as business, though the usage and buying patterns are frequently very different from the private sector.
The sampling techniques involved in business research are essentially the same as in consumer research, and are described in the seminal textbooks on this topic (e.g. Kish 1965). The main difference is that the lack of homogeneity in the universe means that stratification (1) is very much more critical in business markets, and there is no possibility of applying quota sampling (which has become much more commercially acceptable for consumer research studies in recent decades).
Great progress has been made since 1970, but business research sampling is still problematic and requires considerable knowledge and skill, particularly when statistically valid market estimates are the research objective.
Defining business markets
The necessity of actually defining a business market was not mentioned explicitly in McIntosh and Davies' 1970 paper, because at that point it was not considered relevant; it was fairly clear then what a business was. Home-based businesses (i.e. literally a business run from a residential address) were considered irrelevant in terms of consumption of products and services (because business products were relatively expensive), and even subsidiary sites of larger organisations had considerable autonomy of decision-making. Therefore a business was generally defined as a separate location of a commercial, industrial or public sector entity -- what we might call an establishment today -- and most research was conducted at establishment level. However, in the current world, the spread of information communication technology (ICT), due to low-cost new technology, and the growth of more complex organisations mean that the whole way in which businesses relate to each other, and decisions are made, is completely different from how it was 30 years ago. Many subsidiary sites of large organisations do not make decisions now, and so are not relevant to research, but the requirement of even the smallest businesses for services and equipment means that the definition of the business market has widened, to include home businesses. This change has made sampling more complex, and makes segmentation (or sample stratification) even more important, to achieve sample efficiency.
The elements making up a business market differ considerably in type and importance to any survey. McIntosh and Davies (1970) recommended size as the main stratifying variable. Because of the changes in organisation structures described above, size alone can no longer be used. Current practice for both sampling and marketing purposes is to segment the market into different organisational types as follows:
* Large global and national organisations
* The public sector
* SMEs (small and medium enterprises)
* SOHO (small office/home office)
These groupings encompass a great deal of the differences between businesses, in terms of the way in which the different types specify and use products and services. The groupings also take account of both organisational structure and size. Sampling considerations for each segment are discussed below.
Large global and national organisations
The issue here, for sampling or market-sizing purposes, is the complexity of the organisation structure per se, and the decision-making structure within that. For example, to measure the market for some form of office equipment (say switchboards, printers or copiers) the research needs to understand where equipment is held and who knows about it (in order to measure inventory), and where decisions are made (in order to measure the decision-making process). Further complexities include:
* Each organisation can comprise many legal entities, and they may operate autonomously or be heavily centralised. The legal entities may themselves have numerous sites of varying size and importance, some of which will make autonomous decisions. Decision autonomy will vary by type of equipment and service.
* The composition of organisations changes constantly as businesses are bought and sold, so maintaining a consistent sample frame is extremely difficult.
* Organisations do not always own outright all the legal entities involved; they will have major and minor shareholdings, sometimes in conjunction with other major organisations. Defining an organisation uniquely is therefore not always possible.
Experience shows that many clients do not understand the organisational structure of their major accounts, and have adopted various pragmatic strategies for account management. This means that they do not always appreciate the need to measure at site or establishment level to obtain inventory data, nor the influence of subsidiary sites on the decision-maker. Good practice indicates that market measurement research should always start at site (establishment) level, since the site is the end-user and equipment can be in only one place. The handling of non-decision-making sites is then a matter of meeting the survey objectives. Generally, minimal inventory and usage information is collected at non-decision-making sites, and data on the decision-making process are collected at decision-making sites. This 'bottom up' approach to sampling may appear extravagant but is the only way to ensure complete coverage of the universe given the current state of sampling databases available.
The public sector
Traditionally defined as central, regional and local government plus health and education, the public sector is a maze of complexity, and one that changes over time with government policy.
Doctors' surgeries, local social security offices and job centres are as much part of this market as the Ministry of Defence or the University of London. Decision-making autonomy changes over time from centralised to decentralised and back again, depending on political agendas. A great deal of procurement is done through consortia established specifically for the purpose, and hence the need to understand the decision-making channel. Libraries, parks and gardens, and leisure centres belonging to local authorities are included in this sector.
There are also issues as to whether private sector health and education establishments should be included within the public sector (for practical reasons), or in the SME segment. The key for survey research purposes is to follow the client's own definitions, since this eliminates potential confusion. Sometimes other non-profit-sector organisations -- charities, political parties, etc. -- are also included, for convenience, although these can equally well be treated as SMEs. Again, good practice indicates that, as for large global and national organisations, an establishment-based approach is optimal for market measurement.
SMEs (small and medium enterprises)
This segment includes every business between SOHO and the large national and global organisations. At the top end are large, multi-unit organisations which behave similarly to the large corporates. At the bottom end are small businesses, such as butchers and cafes, which have little requirement for business products and services, but also other small businesses, such as PR agencies and solicitors, which are large consumers in business markets.
In other words, although this category is typically treated by clients as a homogeneous group, there is, in fact, a wide range of attitudes and behaviour both across and within the sector, and it does therefore contain a great deal of variability and complexity for business market studies. The most appropriate way of dealing with the complexity is to further stratify the category by size (in terms of numbers of employees). The researcher should also consider additional stratification by industry, if appropriate for the study.
SOHO (small office/home office)
This segment was unheard of 30 years ago and basically covers small commercial offices and businesses, plus any business run from a home. It is the 'home office' which presents problems for business sampling and market sizing, since even businesses involving a few hours of work per week are included in the official statistics, and sometimes several 'businesses' can be run from a single home. Further, there are no databases available for sampling purposes.
It should be noted that there is no 'official' definition of SOHO and the cut-off between SOHO and SME can vary.
In summary, there is a need to be clear about the exact definition of the population to be surveyed, and to define a broad segmentation along the lines described above before starting to identify a suitable sample frame.
Variations in definition of the business universe by country
In pragmatic terms a real live business is defined the same way in all countries; however, the business researcher must be aware of the different definitions used in official statistics in various countries and in available databases. Typical problems which can arise are:
* Zero employee establishments: these are generally defined as businesses run by a self-employed person, but definitions do vary across countries. Data protection rules are sometimes applied to these businesses, though not to other businesses (those with employees).
* 'Paper' companies: this is where a company name is registered but no employees are retained, and no work is done. In Swiss databases a solicitor or lawyer can hold the registrations of hundreds of paper companies.
* Another example occurred in Finland, using the official government business database. A financial services company with zero employees was selected; on contact, this turned out to be the registered pension company of a major bank. In effect there was no separate zero employee business -- there was only the major bank. This has an interesting effect on universe estimation.
* Convenience registrations: sometimes, when using an official database, anomalies will be identified which are allowed for administrative convenience. For example, in Belgium all education employees in a local authority are recorded as one 'business' when in effect there may be several schools. In the UK Census of Employment, monthly paid employees might be registered in one entry and weekly paid workers in a second.
* Commercial databases frequently have unclear definitions of businesses and are not complete.
* Databases which are derived from telecom company customer lists may have idiosyncratic definitions of a business. The BT definition -- customers who have a business line -- is not consistent across the country. There is a certain amount of self-definition involved, and many home businesses are run on domestic telephone lines.
These examples are specified to alert business researchers to definitional issues, and to remind them that vigilance is necessary in all cases if accurate measurement is to be achieved.
Business market structures
The different channels used in business markets
An additional complexity of business markets is the role of the intermediary, which is more complex than in consumer markets. Again, this was not a particular problem in 1970 when business products and services were very expensive relative to salaries, and, since margins were relatively high, most were sold directly to the business customer, by a face-to-face sales force.
In current times, manufacturers and suppliers of business products or services may have little contact with their end-users. They may operate a completely indirect model, selling entirely through third parties; or they may operate a direct model with major corporates and the public sector, and an indirect model with SMEs and SOHOs, or some combination of these; or they may, as some major organisations do, act as a wholesaler to their competitors, which can add a whole new layer of complexity to market measurement.
Consortia may be set up to buy for groups of businesses, or local authorities, and these too may deal directly or indirectly with the manufacturer. Consultants play a significant role in decision-making, specification and implementation, particularly in large organisations.
Identifying decision-makers, and even where decisions are made, can therefore be a significant problem for research design, and different sample designs may be required to meet a market measurement research objective compared with an 'understanding decision-making' research objective.
Integrating research into customer relationship management systems
A current challenge for business-to-business (B2B) researchers is the development of customer relationship management (CRM) systems for corporate and public sector customers. (CRM systems are electronic systems which hold all details relating to every customer; they are typically used to ensure that all interactions with the customer are consistent and knowledge driven.)
A key development in the marketing of telecoms and technology products in particular is the fact that the concept of the specialist decision-maker has been completely eroded. Any senior manager can identify the requirement for a new technology product or service, and can drive the decision through, almost (though not advisably) without reference to the comms or IT specialist within the organisation.
Face-to-face marketing to this new expanded universe of business decision-makers is clearly uneconomical, and suppliers have to market to them in different, more cost-effective, ways -- hence the need for CRM systems. In order to understand the efficiency of the marketing and communications, research is required, and clearly, to optimise the learning, an integration of the two data sources is critical, with all that that entails with regard to data protection.
A new role is emerging for business research as creators and researchers of these databases.
Business universes - statistics, sample frames and sampling
Government statistics and what they mean
McIntosh and Davies (1970) referred to a planned government Central Register of Businesses which would mean that there would be 'no further need for discussion of the imperfection of non-domestic sampling frames'. The file was, in fact, still more than 25 years away at that point, and even now is not generally available for sampling purposes. Nevertheless, there has been a significant improvement in the scope and quality of the official statistics now available on the business universe in the UK as a result of the creation of the Interdepartmental Business Register (referred to as the IDBR) in the mid-1990s. This register is maintained by the Office for National Statistics on behalf of all those government departments dealing with business, and combines the (various) details provided by such diverse agencies as Customs and Excise (based on VAT returns), the Inland Revenue (based on PAYE returns) and various surveys conducted by the ONS, mainly on behalf of the Department of Trade and Industry (Figure 1).
The IDBR covers 2.4 million 'locations' including 1.7 million enterprises. Codes have also been established to identify 'enterprise groups' - those businesses which own a number of enterprises. There are approximately 43,500 such enterprise groups: 34,000 UK owned and 9,500 foreign owned.
Information from the IDBR file is restricted: it is available only in summary form, at a level of detail which ensures that individual 'records' are not identifiable. It is nevertheless possible to establish the market size for almost any product (some 4800 products are covered), with the ONS providing data on UK production (p), exports (e) and imports (i). These are available in both units and value, so companies can use their own data to obtain a reasonable estimate of market share in both unit and revenue terms (total market = p + i - e).
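The market-size identity above is simple enough to sketch directly. The following is a minimal illustration using invented figures for a hypothetical product category; only the formula (total market = p + i - e) comes from the text:

```python
def total_market(production: float, imports: float, exports: float) -> float:
    """Total domestic market = UK production + imports - exports."""
    return production + imports - exports

# Invented figures for one product category, in thousand units.
p, i, e = 500.0, 120.0, 180.0
market = total_market(p, i, e)   # 440.0 thousand units

# A supplier combining its own (invented) unit sales with the market estimate
# obtains a unit share, exactly as described above.
own_sales = 66.0
share = own_sales / market       # 0.15, i.e. a 15% unit share
```

The same arithmetic applied to value data (rather than units) gives the revenue share.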
The government statistical service publishes a statistical bulletin each July from information compiled by the SME Unit of the DTI. The latest data (as at end of 2001) are from SME Statistics for the UK 2000, and comprise information as at the beginning of 2000 (the series always runs 18 months behind). This indicates that there is an estimated total of 3.7 million enterprises in the UK, 2.3 million of which have zero employees.
In evaluating the DTI statistics (Figure 1) it is important to understand that they are compiled from several files:
* Private sector = 1.7 million enterprises, 2.3 million locations (establishments) (this is the IDBR file).
* Central government/local authorities/non-profit sector = 192,000 locations.
* 'Zero' employees estimated from Labour Force Survey (2.2 million self-employed, without employees, less some deductions).
The statistics are not perfect, but they are better than they ever have been.
Enterprises versus establishments
It is important in using the statistics to understand the difference between enterprises and establishments. Enterprises can comprise many establishments (locations). Table 1 shows the quantitative comparisons. Note that the published data show enterprises -- the establishment numbers are from a special analysis. Reassuringly, the establishment numbers compare reasonably well with the last Census of Employment (1995) -- note that the Census of Employment covered PAYE enterprises only.
We have universe statistics of 4.3 million establishments and 3.7 million enterprises. The IDBR is not generally available for sampling purposes (though it can be used for government surveys), so how do we sample from the universe?
Although the universe comprises 4.3 million establishments, only 1.9-2.0 million business establishments have a business telephone line, and lists are available for (most of) these. In addition there are an estimated 1.5-1.7 million 'zero' employee businesses, for which no good lists are available. If the research is to cover the 'total' UK business market then it is necessary to decide whether the 1.9-2.0 million covers all that is needed, or whether the 'zero' employee sector is important, in which case a sample frame for this universe is required. SOHO can be defined in many ways, but it is basically small independent businesses, and sampling is very problematic.
Research which is concerned with the major part of the UK enterprise population -- but not intended to fully cover the SOHO market -- can employ a variety of sources, the principal ones being:
* Thomson Directories: this file has 2.1 million records (locations). The information available covers full contact details for the location (but no named person), the Thomson classification (1900 codes) and the number of employees at the site. No 'ownership' data are coded.
* Market Locations: also with 2.1 million records -- in fact they license the Thomson Directories File, though this is not their only data input source. They have added 1992 Standard Industrial Classification (SIC) code, US SIC code, premise type (head office, single site, branch status) and various named decision-makers (by function).
* TDS: with 1.7 million records (this file also utilises Thomson data as input).
* The Business Database: with 1.9 million records in total, 300,000 of which are suppressed, thus access is available for 1.6 million only. This is based on Yellow Pages, part of BT, but is not exclusively BT customers. Information available is similar to Thomson with the addition of 1992 SIC.
* Dun & Bradstreet (D&B): have information on over 1.8 million actively trading UK businesses, including 580,000 sole traders and 230,000 branches. Their information also incorporates the Thomson Directories data, and is similar to Market Locations in relation to the additional data available, also giving a wide variety of financial data (including turnover).
In addition to the above sources there are a number of specialist database suppliers for specified subgroups of the business population, e.g. Compubase for IT research, etc.
Sample frame evaluation
In order to evaluate any sample frame it is necessary to ask the following questions:
* Is it complete?
* Does it define units in the way you require?
* Are any units duplicated on the list?
* What information is available for each unit?
* How up to date is it?
* Is it available?
* Is it easy to use for sampling purposes?
* Is it consistent over time? (Important for tracking surveys!)
Since 1970 databases have become available which include much more information that can be used for sample stratification purposes, and controlling the sample has become much more straightforward using computer-assisted telephone interviewing (CATI) sample management systems. The possibility of efficient sample design is therefore significantly improved.
Comparisons of the files
Table 2 shows a comparison of the number of businesses on the files (at end 2000) by number of employees, and highlights some interesting points:
* D&B tends to have company employee data rather than establishment employee data (this accounts for the high number of 500+ employee businesses). All other files have establishment employee data.
* All other files are basically similar and compare reasonably well to the DTI data, though only covering 1.5-2.0 million businesses, as noted earlier.
* The level of records with unknown employees varies between the files.
It should be noted that employee data are only 70-80% accurate on the files, i.e. an interview conducted with the establishment will record the same employee band as is coded on the sample database in about 70-80% of cases. 'Unknown employee' records have a similar distribution of employees to the remainder of the file. The differences between the establishment-based files (i.e. non-D&B files) and the DTI statistics are accounted for by: ex-directory or suppressed establishments on the telecom-based files; some establishments being included in the unknown employee group; and exclusion of many home-based businesses from the 'business' telecom files.
'The whole situation has become remarkably more practical of late.'
This sentence from McIntosh and Davies (1970) is cited mostly because it raised an ironic smile -- but also because it is a fact that with B2B sampling, pragmatism has to rule. If you are doing continuous work with a small population (e.g. of major corporates) then sensible rules on frequency of contact, definition of respondent, etc. have to be applied, sometimes to the detriment of perfect sample design.
Thirty years ago industrial research practitioners were fond of quoting the 80:20 rule, i.e. if 80% of the business comes from 20% of the customers then research can focus on the 20%. The 80:20 rule tends to apply less in mass business markets, and we tend to cover the whole market in market measurement surveys.
This approach is necessary because of the increased breadth of business markets, and is possible because of improved databases. The best sample strategy is to determine a sample stratification by size and type of organisation, to sample disproportionately to reflect market revenues and analysis requirements, and to control the sample closely throughout fieldwork, using the CATI sample management systems available.
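The disproportionate allocation described above can be sketched as follows. This is a minimal illustration, not a prescription: the universe counts and revenue shares per stratum are invented, and a real design would also reflect analysis requirements and variance per stratum:

```python
# Invented universe counts and revenue shares for the four segments
# described earlier in the paper.
strata = {
    "large corporates": {"universe": 4_000,     "revenue_share": 0.45},
    "public sector":    {"universe": 30_000,    "revenue_share": 0.15},
    "SME":              {"universe": 900_000,   "revenue_share": 0.30},
    "SOHO":             {"universe": 2_000_000, "revenue_share": 0.10},
}
total_interviews = 1_000

# Allocate interviews in proportion to revenue rather than to counts,
# so high-value strata are deliberately over-sampled.
allocation = {
    name: round(total_interviews * s["revenue_share"])
    for name, s in strata.items()
}

# Each stratum receives a grossing-up weight of universe / interviews,
# which corrects the disproportionality at the analysis stage.
weights = {
    name: strata[name]["universe"] / n
    for name, n in allocation.items() if n > 0
}
```

With these invented figures, the 4,000 large corporates receive 450 of the 1,000 interviews, while the 2 million SOHO businesses receive only 100, each carrying a correspondingly large weight.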
SOHO comprises small businesses and home businesses (single-unit companies). There are two alternatives for sampling:
(1) Screen small businesses sampled from one of the business databases, and add a residential screen sample to identify home-based businesses.
(2) Use lifestyle databases.
To extend any business sample to include effective (i.e. truly representative) coverage of the SOHO market it is necessary to undertake home-based surveys. If the research objectives do not justify the (high) cost of extensive screening of a residential sample to identify home businesses, a pragmatic solution is to select from one of the major lifestyle database companies:
* Experian -- this file has around 750,000 records of self-employed persons and around 450,000 other records of people who work from home.
* Consumer surveys -- these have a total of around 125,000 self-employed on file and around 165,000 people who work from home.
* Claritas has about 300,000 self-employed on its file.
It should be recognised that, when selecting samples from lifestyle databases, there are some definite biases in relation to age and income groups; this is less significant for the SOHO market than for some other universes, but nevertheless it is important to check the 'profile' of respondents' activity against that derived from the Labour Force Survey.
A note on industry coding in the available databases
As McIntosh and Davies noted, SIC coding is not as useful as it first appears, because companies can have more than one SIC code. The SIC system has been updated to take account of new industries, but is still very biased towards manufacturing and less useful for services.
The key point to note about the publicly available databases is that they all claim SIC coding, but their SIC codes are all based on automated recodes from Yellow Pages or Thomson classifications, and therefore can be very inaccurate.
It is also the case that some industry classifications tend to be less complete than others on the files -- for example public administration and legal services. Public administration is incomplete because many local authorities now publish a single telephone number for public access and all other numbers are ex-directory and therefore not available on the file. Legal services tend to remove themselves from telephone directories such as the Yellow Pages since they do not wish to be seen to advertise.
It is always advisable to collect industry data on the survey and code to SIC. The industry codes on the sample databases are not particularly reliable.
Sample size issues
It is obviously necessary to ensure that the sample size for a survey is sufficient to deliver the overall required levels of accuracy. However, it is also necessary to understand the level of subgroup analysis which will be required, and to ensure that the subgroups are adequately represented. Consider also whether there is an 80:20 issue, and over-sample or under-sample appropriately. Further, ensure that important niche subgroups are properly represented -- for example commercial printers and copy bureaux in the office copier and print market -- since otherwise the sample design might not cover them adequately.
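The adequacy check above can be sketched with the standard sample-size formula for a proportion. The margins shown are invented examples; a real study would set targets per subgroup from its analysis plan:

```python
import math

def required_n(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n giving +/- `margin` on a proportion p at ~95% confidence
    (z = 1.96), using the conservative p = 0.5 unless a better prior exists."""
    return math.ceil((z ** 2) * p * (1 - p) / margin ** 2)

required_n(0.05)   # overall target of +/-5 percentage points -> 385 interviews
required_n(0.10)   # a looser +/-10-point target for a subgroup -> 97 interviews
```

Applying the second figure to each subgroup that must be analysed separately quickly reveals whether niche strata need deliberate over-sampling.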
Summary -- progress in recent years, and the future
In 2001 the industry has available to it much better universe statistics than in 1970, but they must be used with care. The quality and quantity of business databases have improved beyond recognition since 1970, but the commercially available lists still cover only 1.6 million business establishments. Public sector and SOHO sampling are still major problems, which are likely to remain unresolved for some time. Sampling at organisation level is problematic and is not likely to improve. Response rates are falling, and the industry is doing little to address the issue. Nevertheless, the situation has vastly improved since 1970, when there were essentially no business databases available for sampling, and certainly none which included useful stratification data for sampling. We can now select good stratified samples for business surveys. A further real quality improvement achieved in business research sampling in the last 30 years is due to the extensive use of CATI and automated sample management systems, which have enabled sample control and stratification to be refined to an extent which was not feasible in the days when most fieldwork was conducted face to face with the consequent need for two-stage (clustered) sample designs.
It is anticipated that the quality of sample frames will improve over time, but the real challenge for the immediate future is web-based surveys. As far as email databases covering the whole internet-enabled business population are concerned, we are in a similar position to where we were with telephone/address databases in 1970, and yet there is considerable pressure to use what is available for sample surveys.
The key problem is that response to a web-survey is self-selecting, and response rates are generally low; therefore, any sample is unlikely to be representative of the total population. The bias can be identified by comparison of responses to key questions against those obtained from a representative sample. In general the biases appear to be that web-based samples tend to include high proportions of heavy internet users and this affects the representativeness of the attitudes measured. Some US agencies have developed methodologies to 'calibrate' web-based consumer surveys against representative samples of the population and can produce accurate estimates by 're-weighting' the web-based sample data. It is possible that similar approaches might be applicable to business surveys; however, considerable statistical analysis of comparable surveys will be required before such approaches can be validated.
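The 're-weighting' idea described above amounts, in its simplest form, to post-stratification: weighting web respondents so that their distribution on a key variable matches that of a representative survey. The following is a minimal sketch under that assumption, with all figures invented (here the calibration variable is employee-size band):

```python
# Target shares of a key variable from a representative survey (invented).
rep_distribution = {"1-9": 0.70, "10-99": 0.25, "100+": 0.05}
# Achieved web sample, heavily skewed towards larger businesses (invented).
web_counts = {"1-9": 200, "10-99": 500, "100+": 300}

n_web = sum(web_counts.values())

# Weight each web respondent so that the weighted sample reproduces
# the representative distribution on the calibration variable.
weights = {
    band: rep_distribution[band] / (web_counts[band] / n_web)
    for band in web_counts
}

# The weighted band shares now match the target exactly; any survey variable
# is then estimated using these weights.
weighted_shares = {
    band: weights[band] * web_counts[band] / n_web for band in web_counts
}
```

Real calibration methods are more elaborate (multiple variables, attitudinal adjustments), which is why the paper stresses that such approaches need validation against comparable representative surveys before use on business populations.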
In the meantime we need to continue to educate research buyers in the value of proper representative sampling, and use the web as an alternative or additional method of data collection, not as a replacement for proper representative surveys. Incomplete databases are not, in any case, suitable for market measurement.
Collecting usage information
In this section we deal with the specific problem of estimating the business market for a specific product or service for which usage is extremely variable across the business universe. Examples include: total telecoms spend; number of inland and international parcels sent; total copies and prints made; IT expenditure; and internet usage. The key defining feature is that the usage per business per month can vary from tens to millions of units or pounds in value.
From a statistical point of view the problem can be seen in two parts. First, a good sample design has to be developed such that the data can be accurately grossed up to universe levels to produce a market estimate. Second, an accurate usage figure has to be obtained for each sampled unit. We deal with the second issue first.
Obtaining accurate usage data from each sampled unit
The preliminary issue is that not all businesses in the selected sample frame may participate in the market under consideration, so it is necessary to screen a random sample to establish penetration. Here it is critical to avoid any over- or under-reporting bias in the response rate, i.e. interviewers recording non-usage as refusal -- as in 'of course we don't send parcels, we're a hairdresser's, and kindly get off the phone and stop wasting our time'. It would be easy for this to be recorded simply as a refusal when in fact it is a 'don't use, refusal' -- a fact which should be taken into account in the grossing up.
In market measurement studies it is more than usually crucial to record non-response in greater detail than is normally necessary; otherwise valuable information is lost. Assuming a willing respondent, the second issue is to obtain an accurate estimate of usage at the business establishment. The first problem is that the respondent may genuinely not know; the second that he/she may not be aware of total usage at the business; the third that inaccurate information may be given inadvertently; and the fourth that the interviewer may record the information incorrectly.
Respondents who genuinely do not know the usage levels can be dealt with by:
* Asking if someone else knows and transferring to that person.
* Asking the respondent to look up the data and calling back to collect it.
* Asking for an estimate. In this case, starting with a prompt-list of broad ranges (e.g. 'is it above or below £1000?') and leading the respondent through from the broad range to more detailed ranges is frequently a good strategy to get a reasonable estimate.
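The broad-to-narrow prompting strategy amounts to repeatedly halving a range and taking the midpoint of the final bracket. A minimal sketch, with the interviewer's 'above or below?' question simulated against a hypothetical true figure:

```python
# Sketch of the 'broad range to narrow range' prompting strategy.
# is_above stands in for the respondent's answer to 'is it above X?';
# here it is simulated against an invented true value.

def bracket_estimate(lo, hi, is_above, width=100):
    """Narrow [lo, hi] by repeated above/below questions, then
    take the midpoint of the final range as the estimate."""
    while hi - lo > width:
        mid = (lo + hi) / 2
        if is_above(mid):          # respondent: 'above'
            lo = mid
        else:                      # respondent: 'below (or about that)'
            hi = mid
    return (lo + hi) / 2

true_spend = 1370                  # the figure the respondent has in mind
estimate = bracket_estimate(0, 10_000, lambda x: true_spend > x)
assert abs(estimate - true_spend) <= 100
```

In practice, of course, the interviewer would use a short prompt-list of pre-agreed ranges rather than a strict halving, but the principle of converging from broad to narrow brackets is the same.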
Respondents not responsible for total usage can be checked by asking questions such as, 'are you responsible for all the telecom bills at this site?', and if the answer is 'no' identifying secondary respondents to follow-up. The problem can be particularly complex if the market being measured is naturally segmented in some way which causes it to be handled by different people in the business, e.g. slower parcels go through the post-room but urgent parcels through reception, or 'office parcels' go through the post-room but 'product' despatches through a dedicated despatch department.
Inadvertently incorrect answers are usually given when, for example, the respondent gives an annual figure instead of a monthly figure, or the figure for several sites for which he/she is responsible, rather than the sampled site that you are trying to cover in the interview.
Incorrect recording of answers by interviewers is most frequently encountered when recording actuals. A couple of extra zeros can easily be entered, and unless checked via a logic check or asking the interviewer to also code a range, can easily go undetected and cause problems at a later stage in the analysis.
'Logic' checks or checks of reasonableness are always useful. As in '1000 parcels a week is above average for a business of your size; can I just confirm that I heard you correctly?'. This is clearly a case where good intelligent interviewing is a necessity, and those who force interviewers into too rigid an adherence to a script can end up with poor results. The computer can flag that the answer is outside a 'normal' range; the phrasing of the follow-up question should allow for some interviewer discretion and common sense.
Some researchers may prefer to review results of key questions on a daily basis -- identify outliers and have them checked by a different process. The key point is to identify and check that any apparent 'outliers' are correctly recorded before moving to the weighting and grossing-up stage.
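A daily outlier review of the kind described might be sketched as follows; the size bands and 'plausible' ranges are invented purely for illustration, and in practice would be set from prior survey experience.

```python
# Sketch of a daily outlier review: flag any recorded usage figure
# outside a plausible range for the business's size band, so it can
# be re-checked before weighting. Bands and ranges are illustrative.

PLAUSIBLE = {                      # size band -> (min, max) weekly parcels
    "1-9":   (0, 200),
    "10-49": (0, 1000),
    "50+":   (0, 10_000),
}

def flag_outliers(records):
    """Return records whose recorded usage falls outside the
    plausible range for their size band."""
    flagged = []
    for r in records:
        lo, hi = PLAUSIBLE[r["band"]]
        if not lo <= r["usage"] <= hi:
            flagged.append(r)
    return flagged

records = [
    {"id": 1, "band": "1-9", "usage": 12},
    {"id": 2, "band": "1-9", "usage": 5000},   # two extra zeros keyed?
    {"id": 3, "band": "50+", "usage": 800},
]
assert [r["id"] for r in flag_outliers(records)] == [2]
```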
Grossing the data to universe levels
To obtain a market estimate the data have to be grossed to universe levels. First, any bias in overall response rates must be dealt with (see Eliminating bias due to non-response). Second, any non-response on key variables must be estimated (see Eliminating bias in market estimates caused by non-response on usage data). The first grossed-up estimates of the market are then obtained. These must then go through a rigorous checking process.
Grossing data to universe levels is most properly undertaken via inverse probability weighting. This is time-consuming since in most cases the universe must be re-estimated to take account of duplication, non-existent businesses, etc. in the sample frame employed, and there is the problem of 'non contacts' -- do they in fact exist? Generally a published target matrix can be employed, if the researcher is confident that it is a good estimate of the universe.
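A minimal sketch of inverse probability weighting follows; the universe counts, achieved sample sizes and mean usage figures are invented for illustration. Each responding unit carries a weight equal to the number of universe units it represents in its stratum.

```python
# Inverse probability weighting sketch: weight = universe count /
# achieved interviews per stratum; weighted usage sums to a
# universe-level market estimate. All figures are illustrative.

universe = {"small": 900_000, "medium": 90_000, "large": 9_000}
achieved = {"small": 300,     "medium": 300,    "large": 300}

# inverse probability weight for each stratum
weights = {s: universe[s] / achieved[s] for s in universe}

# mean usage per responding unit (e.g. parcels per month)
mean_usage = {"small": 5, "medium": 60, "large": 900}

# grossed-up market estimate: weighted usage summed over strata
market = sum(weights[s] * achieved[s] * mean_usage[s] for s in universe)
# equivalently, universe count x mean usage, summed over strata
assert market == 900_000 * 5 + 90_000 * 60 + 9_000 * 900
```

The complications noted above (duplicates and dead records in the frame, non-contacts of uncertain existence) all act by changing the effective universe counts, which is why the universe often has to be re-estimated before the weights are fixed.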
When checking estimates against other data, ideally there are some (reliable) published data available from another source, which can be used to confirm the estimates. Beware non-reliable data, particularly anecdotal information, which can be completely incorrect. Beware even your clients' own sales figures; they can be incomplete.
It is useful always to list systematically any records that are contributing more than x% to any universe estimate. (Define x based on the overall sample size: if the sample size is 100, check all records contributing more than 5% of the estimate; if it is 500, all those contributing more than 1%; and so on.) Double-check that the data have been recorded correctly. It is then sensible to adopt a policy for outliers, e.g. that no single record must account for more than n% of any market estimate, and to adjust the weight factors to achieve this result.
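The listing rule can be sketched as follows; the x = 5% threshold and the (weight, usage) records are illustrative.

```python
# Sketch of the checking rule above: list every record contributing
# more than x% of the grossed-up market estimate so it can be
# double-checked before weighting is finalised.

def large_contributors(records, x=0.05):
    """Indices of records whose weighted usage exceeds fraction x
    of the grossed-up total."""
    total = sum(w * u for w, u in records)
    return [i for i, (w, u) in enumerate(records) if w * u > x * total]

# (weight, usage) pairs: three typical units and one very heavy user
data = [(3000, 10), (3000, 12), (3000, 11), (50, 40_000)]
assert large_contributors(data) == [3]
```

Any record on the list would then be verified against the interview record and, if genuine, have its weight trimmed in line with the chosen n% cap.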
If the data do not agree with other 'published' data, consider why this might be the case. Is there something included in the published data which has not been covered in the survey? (For example, in the parcels market the top five senders, all mail-order companies, account for a large percentage of the market; they are not covered in the market surveys, hence their volume has to be removed from any overall market figures used for checking purposes.) Another possibility is that the sample frame used was incomplete, and some key parts of the market have been excluded from the research in error. The effect of outliers on the overall total should also be reassessed.
Basically, if the sample is representative, and the data have been recorded and weighted correctly, the market estimate should be correct.
Response rates and dealing with non-response
Can response rates be improved?
There is a considerable body of evidence telling us that response rates are falling across the market research industry, and this is particularly so for time-poor business executives, especially those in ICT and those with financial responsibilities. The volume of business research has increased exponentially since the late 1960s, so some decline in response rates is inevitable.
However, it is also the case that good response rates can still be achieved with good sample management, good interviewers and time. The problem is that frequently time and productivity pressures are such that it is necessary to 'throw' sample at a survey and accept what is achieved as representative. This is particularly the case with continuous surveys where monthly sample has to be achieved and management of the more difficult 'top-end' (i.e. large organisation) sample has to be 'pragmatic'.
Does all this matter? Of course it does - if we do not maintain a real understanding in the industry of the necessity of representativeness, then all too quickly junior practitioners will forget it, and the need for productivity will supersede all other considerations. How can the industry address the issues? Clearly the stratification of the sample into relevant sub-cells is critical. Management of 'top-end' sample via appointments which can run into subsequent months is a good strategy for continuous work. The key issue is the 'what's in it for me' element for respondents, i.e. 'giving back' to respondents something of value in return for their time. This 'giving back' includes the interview experience - which should be as pleasant as possible - but also some real 'value' in return for the time spent.
Clearly the industry and our clients cannot support the widespread use of incentives (except for small specialist populations such as doctors), and in any case senior respondents do not want token amounts of money, or what some see as the emotional blackmail of charity donations. Research shows that feedback on the survey is identified by respondents as the most useful benefit.
Historically, mailing of feedback to respondents has always been fraught ('what will the client allow?', 'what will be of interest?') and slow, since clearly the client must see results first and approve whatever is sent, and it is hard to make such a mailing a high priority; but now that we have the internet, surely we could at least inform respondents that the feedback will be available on a web-site on a given date, and get clients to participate in the process? Or offer some form of information helpline as a quid pro quo?
Without positive action on the part of the business research industry there is a great danger that response rates will continue to fall.
Eliminating bias due to non-response
To some extent we can never know the effect of bias caused by refusals. However, we can minimise the effect by appropriate segmentation of the sample and by capturing as much information as possible on non-respondents.
Segmenting the sample
The statistical theory here is that the sample should be segmented or stratified by variables which reflect the maximum variance in the variable to be measured. The most efficient design is then the one which minimises the standard error of the estimate for a given sample size. As McIntosh points out, this is all very well when the data on variance are known, and when only one variable is being measured. In real life this is rarely the case (McIntosh & Davies 1970).
In general terms, therefore, it has become standard practice to stratify on the most appropriate 'size' variable and to over-sample larger units relative to smaller units, the sampling fractions being defined on the basis of a mix of knowledge of the variances of the key variables to be measured and pure pragmatism, in terms of sample available, likely response rates, etc. This practice, advocated by McIntosh and Davies (1970), is still the best approach, but it is always sensible to add a further dimension which takes account both of the client's own segmentation of the market by vertical sector and of the likely variation in the variable by sector within size. A classic example is to identify a separate segment of commercial printers and copy bureaux when measuring copy or print volume, since these are likely to have high volumes though comprising only small businesses (in terms of numbers of employees); similarly for small mail-order companies when measuring the parcels market.
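Where variance data are available, the 'over-sample larger units' principle corresponds to Neyman allocation, in which each stratum's sample size is proportional to N_h x S_h (universe count times the standard deviation of the usage variable in that stratum). A sketch with invented universe counts and standard deviations:

```python
# Neyman allocation sketch: stratum sample sizes proportional to
# N_h * S_h. Universe counts and standard deviations are invented.

def neyman(universe, sd, n_total):
    """Allocate n_total interviews across strata in proportion
    to N_h * S_h (Neyman allocation)."""
    shares = {h: universe[h] * sd[h] for h in universe}
    total = sum(shares.values())
    return {h: round(n_total * shares[h] / total) for h in universe}

universe = {"small": 900_000, "medium": 90_000, "large": 9_000}
sd       = {"small": 10,      "medium": 200,    "large": 5_000}

alloc = neyman(universe, sd, n_total=1000)
# large units, though few, receive the bulk of the sample
assert alloc == {"small": 125, "medium": 250, "large": 625}
```

In practice, as the text notes, the variances are rarely known precisely, so such an allocation is a starting point to be adjusted pragmatically for sample availability and expected response rates.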
The basic rule is to think carefully about the universe before designing the sample. Are there any parts of that universe which are likely to exhibit unusually high usage relative to their size, or generally to behave differently? If so, then these groups should form separate sample strata.
Collecting information on non-response
The key point here is to remember, when screening for usage or eligibility, that this screening is a critical part of the market measurement process. Those who are not eligible are not simply to be disregarded, but carefully measured, so that the true universe of users can be calculated. Two rules apply:
(1) Collect basic demographics (at least size and industry) for as much of the sample as possible. Do not rely on the size and industry data supplied by the sample database, since these are frequently inaccurate.
(2) Collect reasons for non-response and refusal. For non-response the critical thing to establish is whether or not the business unit exists, since this affects the size of the universe. Not all businesses operate 9am-5pm office hours, so interviewers should call at different times, including in the evenings. For refusals the key question is whether or not the business qualifies, so try at least to ask the one question of someone who is likely to know the answer, and record the information in the outcome coding.
Eliminating bias in market estimates caused by non-response on usage data
Non-response at the overall contact level is obviously of critical importance to sample representativeness; elimination of bias is dealt with in the subsection entitled 'Grossing the data to universe levels', where weighting and grossing-up procedures are covered.
Non-response also occurs throughout the questionnaire where respondents genuinely do not know the answers to questions. This happens in all types of research but in business research, particularly for quantitative usage questions, it is especially important because non-response tends to be highly correlated to the level of usage, i.e. the higher the usage the higher the level of non-response.
In producing a market estimate of usage this non-response has to be taken into account. The best approach is to allocate an average to each non-respondent based on appropriate size/segment criteria, and feed these estimates into the computer tabulation programme. Otherwise complex calculations have to be made, applying averages to non-response within detailed demographic breakdowns to arrive at a total estimate. If estimates are required by other criteria, such as industry, then manual calculations become excessively burdensome -- and can result in different answers (since they are only estimates).
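The allocation of cell averages to non-respondents is a form of cell-mean imputation, sketched below; the cells and usage figures are illustrative.

```python
# Cell-mean imputation sketch: 'don't know' responses on the usage
# question are allocated the mean of respondents in the same
# size/segment cell before tabulation. Data are illustrative.

def impute_cell_means(records):
    """Replace usage=None with the mean usage of respondents in
    the same size/segment cell."""
    by_cell = {}
    for r in records:
        if r["usage"] is not None:
            by_cell.setdefault(r["cell"], []).append(r["usage"])
    means = {c: sum(v) / len(v) for c, v in by_cell.items()}
    return [dict(r, usage=means[r["cell"]] if r["usage"] is None
                 else r["usage"]) for r in records]

records = [
    {"cell": "large", "usage": 900},
    {"cell": "large", "usage": 1100},
    {"cell": "large", "usage": None},   # 'don't know'
    {"cell": "small", "usage": 10},
]
filled = impute_cell_means(records)
assert filled[2]["usage"] == 1000.0
```

Imputing once, up front, means every subsequent breakdown (by industry, region, etc.) is tabulated from the same completed dataset, avoiding the inconsistent answers that separate manual adjustments produce.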
Calculating standard error
Calculating the standard error on estimates derived from a complex business sample is not straightforward. Essentially no better methods have been found than techniques such as Tukey's jackknife and Deming's replicated samples, developed along with others in the 1950s and 1960s (see McIntosh & Davies 1970).
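For illustration, a delete-one jackknife standard error for a grossed-up total might look like the sketch below; the weights and usage figures are invented, and this is a sketch of the principle rather than a production-quality estimator.

```python
# Delete-one jackknife sketch: recompute the grossed-up total with
# each unit removed in turn, and derive a standard error from the
# spread of the replicates. All figures are illustrative.

def weighted_total(weights, usage, skip=None):
    """Grossed-up total; with unit `skip` removed, the remaining
    weights are re-scaled so they still sum to the full universe."""
    idx = [i for i in range(len(usage)) if i != skip]
    scale = sum(weights) / sum(weights[i] for i in idx)
    return scale * sum(weights[i] * usage[i] for i in idx)

def jackknife_se(weights, usage):
    """Delete-one jackknife standard error of the grossed-up total."""
    n = len(usage)
    reps = [weighted_total(weights, usage, skip=i) for i in range(n)]
    mean = sum(reps) / n
    return ((n - 1) / n * sum((r - mean) ** 2 for r in reps)) ** 0.5

weights = [3000, 3000, 3000, 50]      # one large unit, low weight
usage   = [10, 12, 11, 40_000]        # the large unit is a heavy user
se = jackknife_se(weights, usage)
assert se > 0   # the single heavy user makes the estimate unstable
```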
In general, the accuracy of the estimate in market size studies is judged by comparison with published data or the client's internal information, and the complex calculations required to estimate standard errors are not necessary.
An interesting feature of business markets is that the equivalent panel operations developed to track consumer markets have not emerged for business markets -- mainly because measurement is difficult and clients' requirements are themselves very varied, with little agreement on standard demographics or segmentations.
Most business markets are measured via data-sharing consortia or compilations of sales data by neutral parties. These data generally suffer from definitional issues, and cannot give the detail which is available to consumer marketeers.
McIntosh and Davies (1970) set out to define a 'variety of practical and theoretical techniques' which were available to increase the precision of survey estimates in mass business markets. They also recognised the clear need for further development and experimentation in the area of business research.
The intervening 30 years have seen great improvements in the quality and availability of government statistics and business sample frames. The use of CATI systems for the majority of business work has also made it possible to improve the quality and precision of sample management and data collection beyond anything dreamt of at that time, when telephone interviewing was as new as, and treated with the same suspicion as, internet interviewing is today. As a result, the majority of the technical problems identified in 1970 have been addressed and are now well understood.
However, a new set of issues has arisen. The technical challenges facing business researchers in the next decade can be summarised as follows:
* Response rates must be maintained and, if possible, improved.
* The pressures to save costs and reduce delivery times will adversely affect sample quality unless good sample management measures are utilised.
* The application of learning concerning the use of incomplete sample frames needs to be extended to web sample surveys.
* The extension of the business universe, in terms of home businesses and respondent (decision-maker) definition, needs to be better understood.
Researchers must continue to monitor and adapt the techniques they use, and not forget that business markets do present complex sampling problems. Having adopted consumer sampling techniques for use in business sampling, perhaps of greatest interest for the future will be the application to business markets of consumer research thinking and theory on advertising, brand strategy and CRM.
Phyllis Macfarlane is currently CEO of the NOP Research Group, the UK's largest ad hoc research company. Starting as Assistant Statistician at MIL Research in 1969, Phyllis has spent most of her career on B2B research and specifically on business universe and market measurement issues, working for major organisations such as BT and Xerox, both in the UK and internationally.
(1.) Stratification is defined as selection from several sub-populations called strata, into which the population is divided. Strata are typically defined as homogeneous sub-populations within the total universe. Selection from strata can be proportionate or disproportionate (Kish 1965).
Deming, W.E. (1960) Sample Design in Business Research. New York: John Wiley & Sons.
DTI (2001) SME Statistics for the UK, 2000. London: Department of Trade and Industry.
Kish, L. (1965) Survey Sampling. New York: John Wiley & Sons.
McIntosh, A.R. & Davies, R.J. (1970) The sampling of non-domestic populations. Reprinted in Journal of the Market Research Society (1996), 38, 4.
Table 1  DTI statistics: enterprises vs establishments

Number of     Enterprises (+)   Establishments    1995 Census of
employees     1999 (000)        1999 (000)        employment (000)
None          2340              618 + 1787 (*)    -
1-4           923               1183              948
5-9           204               326               -
10-19         112               190               181
20-49         48                117               80
50-99         15                38                38
100-199       8.1               19.6              18.1
200-499       4.7               10.2              9.9
500+          3.4               3.4               3.6
Total         3658              4292              1279

(+) Excludes central government/local authorities/non-profit sector.
(*) Self-employed without employees working 15+ hours per week.

Table 2  Comparisons of sizes of business sample files by number of employees

Size band     DTI (SME               D&B     The Business    Market          Thomson            TDS
(number of    establishments)        (000)   Database (000)  Location (000)  Directories (000)  (000)
employees)    (000)
0 (*)         2405                   *       *               *               *                  *
1-4           1183                   731     843             895             1132               905
5-9           326                    200     247             234             -                  292
10-19         190                    110     114             161             576                210
20-49         117                    73      113             105             -                  87
50-99         38                     22      32              32              36                 34
100-199       19.6                   12      13.6            15              15.2               17
200-499       10.2                   8.4     9.6             8.7             8.3                6.2
500+          3.4                    6.9     -               3.4             3.8                3.8
Unknown       -                      448     136             225             297                131
Total         4292                   1611    1508            1679            2068               1686

(*) All database companies count zero-employee companies as one-person companies.
Publication: International Journal of Market Research
Date: Jan 1, 2002