Methodology for the estimation of annual population stocks by citizenship group, age and sex in the EU and EFTA countries.

The paper addresses selected computational issues related to the challenge of dealing with poor statistics on international migration. Partial results of the ongoing Eurostat-funded project "Modelling of statistical data on migration and migrant population" (MIMOSA) are presented. The focus is on data on population stocks by broad group of citizenship, sex and age. After a brief overview of the main problems with data on population by citizenship for 31 European countries (27 European Union countries, Iceland, Liechtenstein, Norway and Switzerland), a range of computational methods is proposed, including cohort-wise interpolation, cohort-component projections, cohort-wise weights propagation and proportional fitting methods, as well as other, auxiliary methods. An algorithm for choosing the best method for estimating missing data on population stocks by broad citizenship group (nationals, foreigners who are EU27 citizens, foreigners who are non-EU27 citizens), five-year age group (up to 85+) and sex on 1st January 2002-2006 is proposed and illustrated with examples of its application for selected countries.

Povzetek (summary): Various methods for the estimation of demographic data are described.

Keywords: population estimates, stocks by citizenship, Europe, missing data, cohort-wise methods, fitting methods

1 Introduction

The deficiencies of statistics on international migration are well known and widely discussed in the literature [1, 7]. The aim of the paper is to contribute to the work on dealing with these shortcomings and to propose a set of computational methods, as well as an algorithmic procedure for selecting the best one, for the estimation of population stocks as of 1st January broken down by sex, age group and broad citizenship category, for the countries for which such information is unavailable or incomplete.

The study was carried out within the Eurostat-funded project "Modelling of statistical data on migration and migrant population" (MIMOSA). It covers 31 European countries, of which 27 belong to the European Union (as of 1st January 2007) and the remaining four (Iceland, Liechtenstein, Norway and Switzerland) to the European Free Trade Association (EFTA). The period of interest is 2002-2006. The citizenship groups under study are nationals, European Union (EU27) foreigners and non-EU27 foreigners, while the age groups are five-year groups, with the last, open-ended category being 85 years or more.

Generally, the proposed estimation methods aim to combine data from different sources (population census, vital statistics, data on acquisition of citizenship, specific surveys, etc.). In principle, the data that are already available are not modified (for example, in order to harmonise definitions, or for any other reason), except in the case of inconsistencies between sources. In the latter cases, the demographic data provided to Eurostat by national statistical institutes (NSIs) are given priority.

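The paper is organised as follows. Section 2 reviews the availability of the 2002-2006 data on population stocks for the 31 countries, Section 3 presents the proposed estimation methods, and Section 4 describes the procedure for selecting an estimation method, together with examples for selected countries. The discussion is summarised in Section 5.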

'JMQ' denotes the Joint Migration Questionnaire of Eurostat, the UN Statistical Division, the UN Economic Commission for Europe, the Council of Europe and the International Labour Office. 'LFS' denotes the Labour Force Survey.

2 Availability of the 2002-2006 data on population stocks for 31 European countries

* Table 7a (for 2000-2003, 8a): Usually resident population by citizenship and age, both sexes;

* Table 7b (for 2000-2003, 8b): Usually resident population by citizenship and age, males;

* Table 7c (for 2000-2003, 8c): Usually resident population by citizenship and age, females.

Data on total population stocks on 1st January, not disaggregated by citizenship, are also collected by Eurostat within the framework of the Annual Demographic Statistics data collection. These data, disaggregated by sex and age, are located under the Population and social conditions theme, in the Demography (DEMO) domain of the database, table demo_pjan. A review of the availability of these data for 2002-2006 revealed that data on total population stocks by sex and five-year age group (up to 85+) are available for all 31 countries, with the following exceptions: there are no 2006 data by age for Belgium and Italy, and for Romania the highest age group in the 2004 data is 80+.

The availability of LFS data on population by citizenship in the Eurostat database is very limited, and the reliability of the data is probably not high, due to the nature of the data source. By definition, LFS statistics are estimates and are thus subject to errors, which can be relatively large for disaggregated categories (e.g., population broken down by age, sex and citizenship group). However, LFS data could be considered as an alternative to the proposed methods in the countries where data on total nationals and total foreigners are missing.

In the Eurostat database, the LFS tables are located under the Population and social conditions theme, in the Labour market (LABOUR) domain, in the table with population data containing the nationality dimension (population by sex, age groups, nationality and labour status, table lfsa_pganws). However, the table does not contain data at the level of individual countries of citizenship, and only data on the total population and on nationals could be useful for this project. Estimates of the 2002-2006 stock of EU27 citizens therefore cannot be prepared using the LFS tables in the Eurostat database. These considerations need to be taken into account when proposing computational methods for the current study.

3 Proposed methods of estimating population stocks by citizenship, sex and age

The current section presents the theoretical background of the methods proposed for calculating the missing elements of population stocks by age, sex and citizenship group. After a brief summary of the notation (3.1), the following methods are discussed: interpolation of five-year into one-year age groups, regarded as a preparatory step (3.2), cohort-wise interpolation of population stocks (3.3), cohort-component projections (3.4), cohort-wise weights propagation (3.5) and proportional fitting (3.6). Section 3 concludes by presenting some auxiliary methods for dealing with the Unknown categories and for estimating missing elements of age distributions (3.7).

3.1 Notation and basic concepts

Throughout the paper, the notation for population variables follows the common convention presented below. In all cases, the superscript $n$ indicates one of the three broad groups of citizenship: nationals, EU27 foreigners or non-EU27 foreigners, abbreviated as N, EU and nEU, respectively, thus reflecting the composition of the European Union as of 1st January 2007. The non-EU27 group also includes stateless persons. The abbreviation FOR is used for the entire foreign population (EU27 and non-EU27 together). For transparency of presentation, the country index is omitted, as all calculations proposed in the paper are country-specific. The variables in question are as follows:

Stock variables:

$P^n(x, t)$: population in broad citizenship group $n$, aged $x$ years on 1st January of year $t$;

$P^n(x, c)$: population in broad citizenship group $n$, aged $x$ years at the census date $c$.

Event variables:

$B^n(t)$: births during calendar year $t$ in citizenship group $n$;

$D^n(x, t)$: deaths of persons aged $x$ years, belonging to citizenship group $n$, during calendar year $t$;

$I^n(x, t)$: registered immigration of persons in citizenship group $n$, aged $x$ years, during calendar year $t$, regardless of the country of origin;

$E^n(x, t)$: registered emigration of persons in citizenship group $n$, aged $x$ years, during calendar year $t$, regardless of the country of destination;

$R^n(x, t)$: outcome of the regularisation of the status of formerly irregular residents (cf. [4]) aged $x$, in year $t$; by definition this refers only to foreigners, $n \in \{\mathrm{EU}, \mathrm{nEU}\}$, so that $R^{\mathrm{N}}(x, t) \equiv 0$;

$S^n(x, t)$: statistical adjustment (official correction) of the size of the population aged $x$, in year $t$, for reasons other than regularisation;

$A^n(x, t)$: acquisitions of citizenship by the population aged $x$, in year $t$; by definition this refers only to foreigners, $n \in \{\mathrm{EU}, \mathrm{nEU}\}$, with $A^{\mathrm{N}}(x, t) \equiv 0$.

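An asterisk in place of $n$ denotes the total over all citizenship groups, e.g. $P^*(x, t)$. Further breakdowns are indicated by subscripts: a left-hand subscript 5, as in ${}_5P^n(x, t)$, denotes five-year age groups, a subscript $g$ denotes sex, and a subscript $f$ refers to females.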

3.2 Interpolation of five-year age groups into one-year groups

A preparatory step for several of the methods is the interpolation of five-year age groups of population (or events) into single years. This is necessary in order to enable cohort-wise interpolations, cohort-component projections with yearly steps, or cohort-wise weights propagation, as described in Sections 3.3, 3.4 and 3.5.

If auxiliary information is available from a different source (e.g. from a census, from the previous or next year, etc.), the population size or the number of events can be disaggregated using a 'Prorating' method [11, p. 5-61], whereby the relative distribution from the auxiliary source is imposed on the data in question. The obtained distribution might need to be further adjusted to marginal totals, by means of proportional fitting procedures, described in Section 3.6.

$$n_1 = 0.064\, N_{i-1} + 0.152\, N_i - 0.016\, N_{i+1},$$

$$n_2 = 0.008\, N_{i-1} + 0.224\, N_i - 0.032\, N_{i+1}, \quad \text{etc.}$$

The underlying rationale is the assumption that non-vital events are evenly spread over the period-age squares of the Lexis diagram. In any case, the estimates for the oldest cohorts would be close to zero for all practical migration-related applications.

Regardless of the method, if the disaggregation is performed on data broken down by sex or citizenship, the final estimate might need to be obtained by proportional fitting methods (described in Section 3.6), in order to ensure the summation to available marginal totals.
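The multiplier step can be illustrated with a minimal Python sketch. The two quoted rows appear to coincide with the first two rows of the middle-panel Karup-King osculatory multipliers; the remaining three rows used below are an assumption, completed by mirror symmetry and by the requirement that the five single years sum exactly to the group total. The function names are hypothetical, and the first and last (open-ended) groups, which need separate end-panel rules, are not handled.

```python
# Middle-panel multipliers: each row gives the weights on (N_{i-1}, N_i, N_{i+1})
# for one single year of the five-year group N_i.
# Rows 1-2 are quoted in the text; rows 3-5 are an assumed completion
# (mirror symmetry, and columns summing to 0, 1, 0 so that totals are preserved).
MULTIPLIERS = [
    (0.064, 0.152, -0.016),
    (0.008, 0.224, -0.032),
    (-0.024, 0.248, -0.024),
    (-0.032, 0.224, 0.008),
    (-0.016, 0.152, 0.064),
]


def split_middle_group(n_prev: float, n_curr: float, n_next: float) -> list[float]:
    """Split the five-year group n_curr into five single-year values."""
    return [a * n_prev + b * n_curr + c * n_next for a, b, c in MULTIPLIERS]


def split_all_middle_groups(groups: list[float]) -> dict[int, list[float]]:
    """Apply the middle panel to every group that has two neighbours.

    The first group (e.g. 0-4) and the last, open-ended one (e.g. 85+) are
    skipped: they would need end-panel multipliers or another rule.
    """
    return {
        i: split_middle_group(groups[i - 1], groups[i], groups[i + 1])
        for i in range(1, len(groups) - 1)
    }
```

For example, `split_middle_group(500, 450, 400)` gives about 94 for the first single year, and the five values add up to 450, so no further adjustment to the group total is needed.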

3.3 Cohort-wise interpolation of population stocks

Given information on the age structure of the population at two non-adjacent moments in time, a simple way to obtain the missing figures for the intermediate moments is to apply interpolation techniques. Here, we propose cohort-wise interpolation for all cohorts apart from the youngest and the oldest ones, which are discussed separately. Overall, this method requires much less input information than the cohort-component projections presented in the next section, but it requires information on the population both before and after the moment for which the estimates are made. The interpolative approach is recommended where (a) the span between the two points with available data is short (say, two to three years), and (b) no information on the distribution of vital and migratory events by citizenship is available.

In practical applications, such as those described in Section 4, it often happens that data are available for year $t$ from a census conducted at time $c$ ($t \le c < t+1$), and for 1st January of year $t+k$, not immediately following the census. Such situations can be put in a general framework, illustrated on a Lexis diagram in Figure 1, where $\alpha$ denotes the fraction of the year remaining after the census until 31st December. Figure 1 and the methodology proposed below also cover situations in which data come from sources other than the census, and situations in which the reference date of the data for year $t$ is 1st January. In the latter case it suffices to set $\alpha = 1$.

Given the population at the census date $c$ and on 1st January of year $t+k$, the interpolation can follow various patterns, but arithmetic and geometric patterns of growth [3, 10] will be the most frequent choices. As noted by Rowland [10, p. 50], "under arithmetic growth, successive population totals differ from one another by a constant amount [, while] under geometric growth they differ by a constant ratio". For short-period interpolations, both approaches should yield similar results, although this is an empirical issue and there is no convincing argument in favour of either of them. Hence, the selection of the appropriate pattern should rely on case-specific judgement.


It has to be noted that the cohort aged $x$ completed years on 1st January of year $t$ (and hence aged $x+k$ on 1st January $t+k$) was split at the census date between two age groups: the younger one (aged $x$ completed years) and the older one (aged $x+1$), as shown in Figure 1. Therefore, the interpolative estimate of $P^n(x+i, t+i)$ depends on $P^n(x, c)$, $P^n(x+1, c)$ and $P^n(x+k, t+k)$.

Given the above, the interpolative estimate of the population in age group $x+i$ and citizenship group $n$, for $i = 1, \dots, k-1$, assuming a linear pattern of change, is:

$$P^n(x+i,\, t+i) = \frac{k-i}{k-1+\alpha}\left[\alpha\, P^n(x, c) + (1-\alpha)\, P^n(x+1, c)\right] + \frac{i-1+\alpha}{k-1+\alpha}\, P^n(x+k,\, t+k), \tag{1a}$$

while for geometric change:

$$P^n(x+i,\, t+i) = \left\{\left[\alpha\, P^n(x, c) + (1-\alpha)\, P^n(x+1, c)\right]^{k-i} \cdot P^n(x+k,\, t+k)^{\,i-1+\alpha}\right\}^{1/(k-1+\alpha)}. \tag{1b}$$
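Formulas (1a) and (1b) can be sketched in Python as follows. The sketch is illustrative: the function names are hypothetical, `alpha` follows the definition above (fraction of the census year remaining after the census date), and the geometric variant is computed in logarithms, which is algebraically identical to (1b) but avoids overflow when large stocks are raised to powers.

```python
import math


def census_cohort_size(p_x_c: float, p_x1_c: float, alpha: float) -> float:
    """Cohort size at the census date: alpha * P(x, c) + (1 - alpha) * P(x+1, c).

    alpha = 1 corresponds to data referring to 1st January of year t.
    """
    return alpha * p_x_c + (1.0 - alpha) * p_x1_c


def interpolate_cohort(p_x_c: float, p_x1_c: float, p_end: float,
                       k: int, i: int, alpha: float,
                       pattern: str = "linear") -> float:
    """Estimate P(x+i, t+i) for 1 <= i <= k-1 following (1a) or (1b).

    p_end is P(x+k, t+k), the same cohort observed on 1st January t+k.
    """
    if not 1 <= i <= k - 1:
        raise ValueError("i must lie strictly between the observed dates")
    start = census_cohort_size(p_x_c, p_x1_c, alpha)
    span = k - 1 + alpha      # years from the census date to 1st January t+k
    elapsed = i - 1 + alpha   # years from the census date to 1st January t+i
    if pattern == "linear":   # (1a): constant absolute change per year
        return (k - i) / span * start + elapsed / span * p_end
    if pattern == "geometric":  # (1b): constant ratio, evaluated in logs
        if start <= 0 or p_end <= 0:
            raise ValueError("geometric interpolation needs positive values")
        log_value = ((k - i) * math.log(start) + elapsed * math.log(p_end)) / span
        return math.exp(log_value)
    raise ValueError("pattern must be 'linear' or 'geometric'")
```

With `alpha = 1`, `k = 3` and `i = 1`, the linear variant puts two thirds of the weight on the earlier observation, e.g. 100 and 130 give 110, while the geometric variant gives a slightly lower value.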

For the youngest and oldest cohorts, for which interpolation as proposed above is not possible, a simplified solution is proposed. In such cases, we suggest taking the average shares (proportions) of the respective age groups in the total population, calculated from the data available for neighbouring periods and weighted by the distance between the available data points and the estimation point.

In order to ensure consistency of the results and summation of the age-specific estimates to the marginal totals by sex or citizenship group, whenever these are available, the estimates have to be adjusted by means of proportional fitting, presented in Section 3.6.

3.4 Cohort-component projections

For the projections, let us denote by $X^n(x, t)$ the sum of all event variables not related to the natural change of population stocks (i.e. all but births and deaths), thus:

$$X^n(x, t) = I^n(x, t) - E^n(x, t) + S^n(x, t) + \sum_{k \in \{\mathrm{EU},\, \mathrm{nEU}\}} A^k(x, t), \quad \text{for } n = \mathrm{N}; \tag{2a}$$

$$X^n(x, t) = I^n(x, t) - E^n(x, t) + S^n(x, t) + R^n(x, t) - A^n(x, t), \quad \text{for } n \neq \mathrm{N}. \tag{2b}$$

Given (2a) and (2b), the population accounting equations for each broad citizenship group are:

$$P^n(0, t+1) = B^n(t) - D^n(0, t) + X^n(0, t); \tag{3a}$$

$$P^n(x, t+1) = P^n(x-1, t) - D^n(x, t) + X^n(x, t), \quad \text{for } x \in \{1, 2, \dots, x_{\max}-1\}; \tag{3b}$$

$$P^n(x_{\max}, t+1) = \left[P^n(x_{\max}-1, t) + P^n(x_{\max}, t)\right] - D^n(x_{\max}, t) + X^n(x_{\max}, t). \tag{3c}$$

In (3c), $x_{\max}$ stands for the highest (open-ended) age group for which information is available. Note also that deaths and other event variables in age group $x_{\max}$ refer to a trapezoid on the Lexis diagram rather than to a parallelogram, while those for age group 0 refer to a right triangle, as shown in Figure 2.
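The accounting equations (3a)-(3c) can be sketched as one annual projection step for a single citizenship group. The Python sketch assumes single-year ages 0..x_max, with event variables indexed by the age reached at the end of the year (as in the Lexis parallelograms of (3b)); the function name is hypothetical and the figures in the check are arbitrary, not data from the study.

```python
def project_one_year(pop, births, deaths, other):
    """One annual step of (3a)-(3c) for a single citizenship group n.

    pop[x]:    P^n(x, t) for x = 0..x_max, where x_max is open-ended
    births:    B^n(t)
    deaths[x]: D^n(x, t), indexed by the age reached at the end of year t
    other[x]:  X^n(x, t) from (2a)/(2b), indexed in the same way
    Returns the list P^n(x, t+1) for x = 0..x_max.
    """
    if len(pop) < 2:
        raise ValueError("need at least age 0 and an open-ended age group")
    x_max = len(pop) - 1
    new = [0.0] * (x_max + 1)
    new[0] = births - deaths[0] + other[0]                          # (3a)
    for x in range(1, x_max):
        new[x] = pop[x - 1] - deaths[x] + other[x]                  # (3b)
    new[x_max] = (pop[x_max - 1] + pop[x_max]                       # (3c)
                  - deaths[x_max] + other[x_max])
    return new
```

Repeating the step year by year, with $X^n$ assembled from (2a) or (2b), gives the projected stocks; the open-ended group accumulates the survivors of the two oldest groups.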


a) Ius sanguinis

$$B^n(t) = B^*(t)\, \frac{\sum_x f^n(x)\, P_f^n(x, t)}{\sum_{k,x} f^k(x)\, P_f^k(x, t)} \tag{4}$$

b) Ius soli

If the child automatically acquires the citizenship of a given country, then the balance equation for the youngest age group, (3a), becomes, depending on the citizenship in question:

$$P^{\mathrm{N}}(0, t+1) = B^*(t) - D^{\mathrm{N}}(0, t) + X^{\mathrm{N}}(0, t), \quad \text{for } n = \mathrm{N}; \tag{3a'}$$

$$P^n(0, t+1) = X^n(0, t) - D^n(0, t), \quad \text{for } n \neq \mathrm{N}. \tag{3a''}$$

In mixed cases, it is recommended to project one part of births according to formulas for ius soli and another part according to the ius sanguinis principle.
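The allocation of births under the two principles can be sketched in Python. The ius sanguinis part implements (4); the ius soli part assigns all births to nationals, as in (3a') and (3a''); the mixed rule with a single share `soli_share` is a hypothetical simplification of the recommendation above. Fertility rates, populations and function names are illustrative assumptions.

```python
def births_ius_sanguinis(total_births, fert, female_pop):
    """Allocate B*(t) among citizenship groups following (4).

    fert[n][x]:       age-specific fertility rates f^n(x)
    female_pop[n][x]: female population P_f^n(x, t)
    """
    expected = {
        n: sum(f * p for f, p in zip(fert[n], female_pop[n])) for n in fert
    }
    denom = sum(expected.values())
    return {n: total_births * e / denom for n, e in expected.items()}


def births_ius_soli(total_births, groups=("N", "EU", "nEU")):
    """Under ius soli all births enter the nationals, as in (3a') and (3a'')."""
    return {n: (total_births if n == "N" else 0.0) for n in groups}


def births_mixed(total_births, fert, female_pop, soli_share):
    """Hypothetical mixed rule: a share soli_share of births follows ius soli,
    the remainder follows ius sanguinis."""
    soli = births_ius_soli(soli_share * total_births, tuple(fert))
    sang = births_ius_sanguinis((1 - soli_share) * total_births, fert, female_pop)
    return {n: soli[n] + sang[n] for n in fert}
```

In every variant the group-specific births add up to $B^*(t)$, so the youngest cohort remains consistent with total births.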

Note also that losses of citizenship are not accounted for, as in most instances they concern persons who are in reality either already living abroad or emigrating (and thus counted in $E$). For acquisitions of citizenship, we assume that non-nationals move to the category of nationals upon naturalisation, in order to count the same people only once, regardless of the number of citizenships they hold.

Under certain conditions, the cohort-component projection reduces to a proportional scheme, whereby the citizenship distribution of the considered cohort in the previous year applies directly to all cohorts except the first and the last one in each year. In particular, this situation applies if the following four conditions hold:

1. Total population by age, ${}_5P^*(x, t)$, is known for successive years, but the citizenship structure is missing;

2. We may assume that the distribution of deaths and migration flows by broad citizenship is the same as the citizenship composition of the population;

3. Acquisitions of citizenship may be ignored;

In such cases, the projection equation (3b) combined with proportional fitting is equivalent to the proportional decomposition of ${}_5P^*(x, t)$ by citizenship group described in Section 3.6.1. The estimates can be obtained using the formula:

$${}_5P^n(x, t) = {}_5P^*(x, t) \cdot \frac{{}_5P^n(x-1, t-1)}{{}_5P^*(x-1, t-1)}. \tag{5}$$

The first and the last cohort may be disaggregated using the citizenship composition of the first and last age group in the previous year. In such cases, the following formulas apply:

$${}_5P^n(0, t) = {}_5P^*(0, t) \cdot \frac{{}_5P^n(0, t-1)}{{}_5P^*(0, t-1)}; \tag{6a}$$

$${}_5P^n(x_{\max}, t) = {}_5P^*(x_{\max}, t) \cdot \frac{{}_5P^n(x_{\max}, t-1)}{{}_5P^*(x_{\max}, t-1)}. \tag{6b}$$
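Formulas (5), (6a) and (6b) amount to applying last year's citizenship shares of each cohort to this year's totals. A minimal Python sketch, assuming age groups are indexed by consecutive integer positions exactly as written in (5) and that the previous year's group-specific stocks add up to the previous totals; the function name is hypothetical and the check uses made-up numbers.

```python
def decompose_by_cohort(total_now, total_prev, group_prev):
    """Split total_now[x] by citizenship following (5), (6a) and (6b).

    total_now[x]:     P*(x, t)
    total_prev[x]:    P*(x, t-1)
    group_prev[n][x]: P^n(x, t-1)
    Returns result[n][x] = P^n(x, t).
    """
    x_max = len(total_now) - 1
    result = {n: [0.0] * (x_max + 1) for n in group_prev}
    for n, prev in group_prev.items():
        # (6a): first group keeps its own structure from the previous year
        result[n][0] = total_now[0] * prev[0] / total_prev[0]
        # (5): other groups inherit the structure of the cohort one step younger
        for x in range(1, x_max):
            result[n][x] = total_now[x] * prev[x - 1] / total_prev[x - 1]
        # (6b): last, open-ended group keeps its own structure
        result[n][x_max] = total_now[x_max] * prev[x_max] / total_prev[x_max]
    return result
```

Because each age group is split with shares that sum to one, the citizenship-specific results automatically add up to the known totals $P^*(x, t)$.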

3.5 Cohort-wise weights propagation

In some cases, so much information on the age-sex-citizenship distribution of the components of population change is missing that projections become too dubious, given the number of assumptions that would need to be made. In practice, in such instances the only reliable information comes from the population census and from annual population stocks available in the DEMO domain of the NewCronos database. Hence, the proposed procedure is as follows.

For the census population, apply the citizenship structure of each five-year age group to the corresponding single-year age groups (i.e. from age group 0-4 to single ages 0, 1, ..., 4; from 5-9 to 5, 6, ..., 9, etc.). Let $w^n(x, c) = P^n(x, c) / P^*(x, c)$ denote the age-specific shares ('weights') of citizenship group $n$ in the census.

Further, let $\alpha$ denote the fraction of the calendar year preceding the census date. It is implicitly assumed that the census population in single-year age groups can be divided between the 'older' and 'younger' cohorts using the partition into $\alpha$ and $(1-\alpha)$. For the census date, use the following formula to calculate the share of citizenship group $n$ in the cohort that was aged $x$ years on 1st January of the census year:

$$w^n(x+\alpha, c) = \frac{(1-\alpha)\, P^n(x, c) + \alpha\, P^n(x+1, c)}{(1-\alpha)\, P^*(x, c) + \alpha\, P^*(x+1, c)}, \quad \text{for } x < x_{\max}; \tag{7a}$$

$$w^n(x_{\max}+\alpha, c) = \frac{P^n(x_{\max}, c)}{P^*(x_{\max}, c)}. \tag{7b}$$

For 1st January of the census year, assume that the weights are $w^n(x, t) = w^n(x+\alpha, c)$. For 1st January of the year following the census year ($t > c$), assume in turn:

$$w^n(x, t) = w^n(x-1+\alpha, c), \quad \text{for } 0 < x < x_{\max}; \tag{8a}$$

$$w^n(x_{\max}, t) = \frac{(1-\alpha)\, P^n(x_{\max}-1, c) + P^n(x_{\max}, c)}{(1-\alpha)\, P^*(x_{\max}-1, c) + P^*(x_{\max}, c)}. \tag{8b}$$

For the youngest age group, assume $w^n(0, t) = w^n(0, c)$, or alternatively that the shares are the same as the shares of citizenship group $n$ in the births during the census year, i.e. $w^n(0, t) = B^n(t-1) / B^*(t-1)$. For consecutive years, calculate:

$$w^n(x, t) = w^n(x-1, t-1), \quad \text{for } x = 1, \dots, x_{\max}-1; \tag{9a}$$

$$w^n(x_{\max}, t) = \frac{P^n(x_{\max}-1, t-1) + P^n(x_{\max}, t-1)}{P^*(x_{\max}-1, t-1) + P^*(x_{\max}, t-1)}; \tag{9b}$$

$$w^n(0, t) = w^n(0, t-1), \tag{9c}$$

or, as an alternative to (9c): $w^n(0, t) = B^n(t-1) / B^*(t-1)$.

Subsequently, calculate the populations for all years using the above shares and the total populations (available e.g. from DEMO), as $P^n(x, t) = P^*(x, t) \cdot w^n(x, t)$. Finally, aggregate the single-year age groups into five-year ones.
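The recursion (9a)-(9c), followed by the multiplication by total populations, can be sketched in Python. The sketch assumes single-year ages 0..x_max, starting weights for the first post-census year already computed (e.g. via (8a)-(8b)), and uses (9c) rather than the births-based alternative for age 0; the function name and the figures in the check are hypothetical.

```python
def propagate_weights(w_start, totals):
    """Cohort-wise weights propagation, (9a)-(9c), then P^n = P* x w^n.

    w_start[n][x]: weights w^n(x, t0) for the first year
    totals[j][x]:  P*(x, t0 + j) for consecutive years j = 0, 1, ...
    Returns pops[j][n][x] = P^n(x, t0 + j).
    """
    x_max = len(totals[0]) - 1
    groups = list(w_start)
    w = {n: list(w_start[n]) for n in groups}
    pops = []
    for j, tot in enumerate(totals):
        if j > 0:
            prev_pop, prev_tot = pops[-1], totals[j - 1]
            new_w = {}
            for n in groups:
                row = [0.0] * (x_max + 1)
                row[0] = w[n][0]                               # (9c)
                for x in range(1, x_max):
                    row[x] = w[n][x - 1]                       # (9a)
                row[x_max] = ((prev_pop[n][x_max - 1] + prev_pop[n][x_max])
                              / (prev_tot[x_max - 1] + prev_tot[x_max]))  # (9b)
                new_w[n] = row
            w = new_w
        # P^n(x, t) = P*(x, t) * w^n(x, t) for the current year
        pops.append({n: [tot[x] * w[n][x] for x in range(x_max + 1)] for n in groups})
    return pops
```

Since the weights of all groups sum to one at every age (provided they do so in the starting year), the estimated stocks always add up to the DEMO totals; aggregation into five-year groups can then follow.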

3.6 Proportional fitting methods

In addition to a potential application as the main estimation method, proportional fitting may be used, in almost all the countries for which estimations are needed, as the final stage of the estimation procedure, in order to adjust the initial estimates to known aggregates or marginal totals. The initial estimate might be obtained for example using interpolation or projection, or assumed to be the same as at some different time (e.g. the same as at the census date). Such an initial estimate has to be subsequently adjusted for example to the known total population size by age and sex.

3.6.1 Proportional adjustment / decomposition

For example, if the aggregates $P_g^*(x, t)$ and an initial estimate of the citizenship structure $P'^{n}_{g}(x, t)$ are known, then the final estimate $P''^{n}_{g}(x, t)$ may be obtained as:

$$P''^{n}_{g}(x, t) = P'^{n}_{g}(x, t) \cdot \frac{P_g^*(x, t)}{P'^{*}_{g}(x, t)}. \tag{10}$$

In particular, if one wants to estimate the breakdown by citizenship using the citizenship structure taken from the census, $P_g^n(x, c)$, then (10) becomes $P_g^n(x, t) = P_g^*(x, t) \cdot P_g^n(x, c) / P_g^*(x, c)$.
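Formula (10) is a simple rescaling of an initial breakdown to a known aggregate. A Python sketch for a single age-sex cell; the function name and the numbers are illustrative only.

```python
def proportional_adjust(initial, target_total):
    """Rescale an initial breakdown so that it sums to a known total, as in (10).

    initial:      dict mapping citizenship group n to P'^n_g(x, t)
    target_total: known aggregate P*_g(x, t)
    """
    current = sum(initial.values())  # P'^*_g(x, t)
    if current == 0:
        raise ValueError("initial estimate sums to zero; cannot rescale")
    return {n: v * target_total / current for n, v in initial.items()}
```

Passing the census structure $P_g^n(x, c)$ as `initial` reproduces the special case given above, since only the relative shares of the initial estimate matter.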

3.6.2 Direct proportional fitting

3.6.3 Iterative proportional fitting

$$P_g^{(2k)n}(x, t) = P_g^{(2k-1)n}(x, t) \cdot \frac{P_g^*(x, t)}{P_g^{(2k-1)*}(x, t)}; \tag{12a}$$

$$P_g^{(2k+1)n}(x, t) = P_g^{(2k)n}(x, t) \cdot \frac{P_{[??]}^{n}(*, t)}{P_{[??]}^{(2k)n}(*, t)}. \tag{12b}$$

The procedure defined by (12a) and (12b) is repeated iteratively until a convergence criterion is met; for example, the estimates yielded by consecutive steps should differ by no more than an arbitrarily selected small number $\varepsilon$. Further details of the method are discussed by Willekens [13, pp. 69-71], Willekens et al. [14], Rees [8] and Norman [6].
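A minimal Python sketch of two-margin IPF for one sex, alternating the row step (12a) and the column step (12b). Because the subscript of the second margin in (12b) did not survive extraction, the sketch assumes it is the total of each citizenship group over all ages; the function name, `eps` and `max_iter` are illustrative, and the two margins must have the same grand total for the procedure to converge.

```python
def ipf(seed, row_targets, col_targets, eps=1e-9, max_iter=1000):
    """Two-margin iterative proportional fitting in the spirit of (12a)-(12b).

    seed[x][n]:     initial estimate for age x and citizenship group n
    row_targets[x]: known totals by age over all groups, P*(x, t)
    col_targets[n]: known totals by citizenship over all ages (assumed margin)
    """
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        previous = [row[:] for row in table]
        # (12a): scale each age row to its known total
        for x, row in enumerate(table):
            s = sum(row)
            if s > 0:
                table[x] = [v * row_targets[x] / s for v in row]
        # (12b): scale each citizenship column to its known total
        for n in range(len(col_targets)):
            s = sum(row[n] for row in table)
            if s > 0:
                for row in table:
                    row[n] *= col_targets[n] / s
        # stop when consecutive iterations differ by less than eps
        change = max(abs(a - b) for r1, r2 in zip(table, previous)
                     for a, b in zip(r1, r2))
        if change < eps:
            break
    return table
```

Starting from a uniform seed, a single iteration already reproduces the independence solution (row total times column total divided by the grand total); with an informative seed, such as a census structure, the interaction pattern of the seed is preserved while both margins are matched.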

Although the IPF method is purely mechanical, its main advantage is that it requires neither additional information (such as data on vital events or migration) nor excessive labour resources, and the results obtained (in terms of joint distributions by all variables under study) are automatically consistent with the marginal distributions of particular variables. Moreover, under some general assumptions, the IPF estimates can be interpreted from a statistical viewpoint as joint probability distributions obtained using maximum likelihood or entropy maximisation methods [2, pp. 83-97; after: 13, p. 70].

3.7 Auxiliary methods

Among the auxiliary methods proposed in the current study, the foremost one is the decomposition of the Unknown category wherever it appears (i.e., with respect to age, citizenship, or even sex, as in the case of Greece for 2005). The universal solution proposed in such cases is a proportional disaggregation: the population belonging to the Unknown category is broken down proportionally to the existing, well-defined categories (citizenship groups, age groups, etc.) and the resulting parts are attached to these categories. For example, if the total population $P$ consists of $n$ well-defined groups $P_1, \dots, P_n$ and the Unknown category $P_{\text{unk}}$, such that $P = \sum_i P_i + P_{\text{unk}}$, where $i = 1, \dots, n$, then the following correction applies:

$$P'_j = P_j + P_{\text{unk}} \cdot \frac{P_j}{\sum_i P_i} = P_j \left(1 + \frac{P_{\text{unk}}}{\sum_i P_i}\right), \quad \text{for all } j, \text{ with } i = 1, \dots, n. \tag{13}$$
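Correction (13) multiplies every well-defined category by the same factor $1 + P_{\text{unk}} / \sum_i P_i$. A Python sketch; the function name and the numbers are illustrative.

```python
def distribute_unknown(known, unknown):
    """Allocate the Unknown category pro rata to the well-defined ones, as in (13).

    known:   dict mapping each well-defined category j to P_j
    unknown: size of the Unknown category, P_unk
    """
    total_known = sum(known.values())
    if total_known == 0:
        raise ValueError("no well-defined categories to distribute over")
    factor = 1.0 + unknown / total_known
    return {j: p * factor for j, p in known.items()}
```

The corrected categories sum to the original total $P$, and their relative proportions are unchanged, which is the intended property of the proportional solution.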

If some elements of age structures are missing (e.g. the tails of the respective age distributions, or a breakdown into five-year groups where only broader ones are available), we may either use a structure from a different year or fit a mathematical function.

The parameters $c$, $a_1$, $\alpha_1$, $a_2$, $\alpha_2$, $\lambda_2$ and $\mu_2$ can be estimated separately for each sex, for example using the ordinary least squares method.
In either case, when adjustment to broader age groups is needed in order to ensure summation to respective totals (e.g. for functional age groups), it can be done via proportional fitting presented in Section 3.6.
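The functional form referred to above did not survive extraction. The listed parameters appear to match the seven-parameter Rogers-Castro model schedule (a childhood component, a labour-force peak and a constant), so the Python sketch below assumes that form and uses it only to split a known broad-group total into five-year groups in proportion to the curve. The parameter values in `EXAMPLE_PARAMS` are hypothetical, not estimates from the study; in practice they would first be fitted, as described in the text.

```python
import math

# Hypothetical parameter values, for illustration only.
EXAMPLE_PARAMS = {"c": 0.003, "a1": 0.02, "alpha1": 0.1, "a2": 0.06,
                  "alpha2": 0.1, "lambda2": 0.4, "mu2": 20.0}


def rogers_castro(x, c, a1, alpha1, a2, alpha2, lambda2, mu2):
    """Assumed seven-parameter schedule evaluated at single age x."""
    childhood = a1 * math.exp(-alpha1 * x)
    labour = a2 * math.exp(-alpha2 * (x - mu2) - math.exp(-lambda2 * (x - mu2)))
    return childhood + labour + c


def split_broad_group(total, start, stop, params, width=5):
    """Split a known total for ages [start, stop) into width-year groups,
    proportionally to the curve summed over single years of age."""
    edges = list(range(start, stop, width))
    weights = [sum(rogers_castro(x, **params)
                   for x in range(lo, min(lo + width, stop)))
               for lo in edges]
    s = sum(weights)
    return {f"{lo}-{min(lo + width, stop) - 1}": total * w / s
            for lo, w in zip(edges, weights)}
```

For example, `split_broad_group(1000, 20, 30, EXAMPLE_PARAMS)` returns values for the groups 20-24 and 25-29 that add up to 1000; any remaining inconsistency with functional age groups would then be removed by proportional fitting.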

4 Estimating population stock for EU27 and EFTA countries

The current section summarises the algorithm for selecting an appropriate computation method for a given country (Section 4.1), followed by a brief illustration of the approach as applied to the 31 countries under study and a selection of results (4.2).

4.1 Procedure for selecting an estimation method

In the light of the overview of data availability presented in Section 2 and the methodological discussion presented in Section 3, it is suggested to inspect the following general options of data availability, in order to apply the relevant data estimation procedures:

Option 1. All the required data are available in the Eurostat database

Whenever all data are available in the Eurostat database, the following five-step procedure is recommended:

1. Organize the data in a database;

2. Verify the data;

3. Deal with the Unknown categories (if applicable);

4. Calculate the required aggregates;

5. Check the results.

This option includes cases where data from various parts of the Eurostat database need to be combined (e.g. in DEMO and in JMQ), and cases where there is an 'Unknown' category, which has to be disaggregated proportionally among the well-specified categories, as described in Section 3.7 on 'Auxiliary methods'.

Option 2. Some of the required data are missing from the Eurostat database

In this case, two situations are possible:

Option 2a. All the missing data may be obtained without contacting the NSI

If all the missing information is publicly available, for example from the NSI webpage, it should be downloaded and combined with the Eurostat data. The combined dataset should then be subjected to the procedure described under Option 1, points 2 through 5.

Option 2b. Some (or all) missing data are (or are suspected to be) available either from the NSI or from other sources

If some of the missing information can be downloaded from sources such as the NSI webpage, it should be collected and merged with the Eurostat data. Nonetheless, there are cases in which data are not publicly available, but it can be suspected that some, or even all, of the missing information is in the possession of the NSI. In such cases, the following actions should be undertaken:

1. Contact the NSI in order to obtain the missing data. If successful, proceed as in Option 2a;

2. For the data that are still unavailable, but can be estimated, proceed as in Option 3;

3. For the data that are still not available and cannot be estimated, look at Option 4.

Option 2 includes cases where data from various national sources have to be combined, for example aggregate data obtained using the component method and data on citizenship composition from the register of foreigners.

Option 3. Some data are not available anywhere but can be estimated

Even if some auxiliary data can be collected from other sources, in many instances the available information is still incomplete. In the vast majority of such cases, the missing information can still be estimated, either following the methodological guidelines and techniques outlined in Section 3, or by means of more straightforward, easy-to-apply solutions. The actions are then undertaken in the following order:

1. Organize the available data (from Eurostat and other sources);

2. Verify the data (perform data validation and internal consistency checks);

3. Deal with the Unknown categories (if applicable);

4. Calculate the required aggregates for the available data;

5. For each year for which data are missing, select the best method for estimating them;

6. Collect supplementary data needed for the estimations;

7. Estimate the missing data;

8. Check the results.
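Steps 2 and 8 both involve consistency checks. A minimal sketch of one such check is given below, assuming each record holds a total and its citizenship components; the function name and the `tol` parameter are illustrative, not taken from the MIMOSA procedures. The example uses the Bulgaria row of Table 3, whose components add up exactly to the total.

```python
def check_breakdown(total: int, parts: dict[str, int], tol: int = 0) -> list[str]:
    """Return a list of problems in one total-versus-components record
    (hypothetical helper; an empty list means no problem was found)."""
    problems = []
    for name, value in parts.items():
        if value < 0:
            problems.append(f"negative value for {name}: {value}")
    diff = total - sum(parts.values())
    if abs(diff) > tol:
        problems.append(f"components differ from total by {diff}")
    return problems


# Bulgaria, 1st January 2006 (Table 3).
bulgaria = {"nationals": 7_693_214, "EU27": 3_855, "non-EU27": 21_681}
print(check_breakdown(7_718_750, bulgaria))  # []
```

When the components are rounded estimates, a small positive `tol` may be more appropriate than requiring an exact match.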

For the estimations (item 7), various methods can be used, depending on the range and type of the missing information and on data availability. In general, five broad groups of methods can be distinguished here, following the outline presented in Section 3:

a. Proportional fitting methods (Section 3.6);

b. Cohort-wise weights propagation (Section 3.5);

c. Cohort-wise interpolation of population stocks (Section 3.3);

d. Cohort-component projections (Section 3.4);

e. Other solutions, not listed above, or combined approaches.

The methodology for interpolating five-year age groups into single-year ones, presented in Section 3.2, as well as some methods described in Section 3.7, should be treated as auxiliary to all the others rather than as separate estimation methods per se.

Option 4. Data are not available, and no or only very rough estimates can be produced

Figure 3 presents a decision tree summarising the procedure for selecting the estimation methodology, taking into account all the above options.
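Figure 3 is not reproduced here. As a rough, hypothetical encoding of the options described in this section, the selection can be sketched as a chain of tests on data availability. The flag names are illustrative, and Option 1 is assumed to be the case in which the Eurostat data are already complete. In practice the procedure is applied to each missing piece of data, so a single country may combine several options (as in Option 2b).

```python
from enum import Enum


class Option(Enum):
    COMPLETE_IN_EUROSTAT = "1"   # assumed meaning of Option 1
    PUBLIC_ELSEWHERE = "2a"      # all missing data downloadable, e.g. from the NSI webpage
    FROM_NSI_ON_REQUEST = "2b"   # some missing data (suspected to be) obtainable, e.g. from the NSI
    ESTIMABLE = "3"              # not available anywhere, but can be estimated
    ROUGH_ONLY = "4"             # no, or only very rough, estimates possible


def select_option(complete_in_eurostat: bool,
                  all_missing_public: bool,
                  some_missing_obtainable: bool,
                  missing_estimable: bool) -> Option:
    """Hypothetical encoding of the sequence of options described in the text."""
    if complete_in_eurostat:
        return Option.COMPLETE_IN_EUROSTAT
    if all_missing_public:
        return Option.PUBLIC_ELSEWHERE
    if some_missing_obtainable:
        # After contacting the NSI, data still missing are re-assessed
        # with some_missing_obtainable=False, leading to Option 3 or 4.
        return Option.FROM_NSI_ON_REQUEST
    if missing_estimable:
        return Option.ESTIMABLE
    return Option.ROUGH_ONLY


print(select_option(False, False, False, True))  # Option.ESTIMABLE
```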

4.2 Application of the methodology, examples, selected results

The decision tree presented in Figure 3 has been used to select the best estimation method for each of the 31 EU and EFTA countries, taking into account the availability of data in the Eurostat database (either on-line or in the JMQs), in the NSI databases, and in other sources. It turned out that the complete data needed to estimate population by broad group of citizenship, sex and age on 1st January 2002-2006 were available in the JMQs for nine countries: Austria, the Czech Republic, Denmark, Finland, Hungary (3), the Netherlands, Norway, Slovenia and Sweden. For another four countries it was possible either to collect all the missing data from the NSI websites (Belgium and Iceland), or to obtain them by contacting the NSI directly (Liechtenstein and Switzerland).


For the remaining 18 countries some estimation was necessary. The method that proved useful in the largest number of cases was some form of proportional fitting (one of the three versions presented in Section 3.6). It was used as the main method for estimating population by broad citizenship group in Cyprus, France, Germany, Greece, Italy, Latvia, Luxembourg, Malta, Slovakia, Spain and the UK. In all cases the total population was assumed to be as reported by the NSI in its demographic statistics, while the citizenship structure was taken from various sources, for example the JMQ data for the same year, data from the NSI website (Italy), census data (Cyprus, France), data for another year (Romania, Spain), LFS data (Cyprus, France) or data from the register of foreigners (Germany) (see also the examples below).

The cohort-wise interpolation method was used for Ireland, Lithuania and Portugal. For Bulgaria, Estonia and Poland, where only census data were available, cohort-wise weight propagation was applied. For Cyprus, Lithuania, Luxembourg and Poland it was originally planned to use a projection method; however, it was decided that this would require too many assumptions that would be difficult to justify, and that the final result would not be reliable enough to warrant the additional effort required by this method.

The estimations for Romania do not fit any of the above groups: they involved a simple combination of data coming from various sources.

Below, more details about the estimation procedures are provided for selected countries, with an example for each estimation method. The resulting estimates of the citizenship structure of the populations of 18 European countries (all of them EU Member States) on 1st January 2006 are presented in Table 3.

Table 3: Estimated population by broad group of citizenship in 18 EU countries, as of 1st January 2006.

As concerns particular examples: in Germany, data on foreigners come from two different sources. The component method (Bevölkerungsfortschreibung), based on the last traditional German census of 25th May 1987, is used by the NSI to produce annual figures on total population, total nationals and total foreigners, as well as on nationals and foreigners by sex and age. The other source is the Central Register of Foreigners, which contains data on foreigners by citizenship, sex and age. The total numbers of foreigners and their sex and age structures differ between the two sources. In order to obtain a single set of estimates, the total number of German citizens, the total number of foreigners, and the age structures of Germans and foreigners were taken, following the NSI procedures, from the Bevölkerungsfortschreibung data. Foreigners were then divided into EU27 and non-EU27 citizens in proportion to the shares of these groups in the respective age groups according to the Central Register of Foreigners. All in all, the proportional decomposition method was thus used.
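The German case can be sketched as follows, for one sex at a time and with lists indexed by age group: the number of foreigners in each age group comes from the Bevölkerungsfortschreibung, and only the EU27 versus non-EU27 split is borrowed from the Central Register of Foreigners. The function name and input figures are hypothetical. Nationals are not touched, since they are taken directly from the Bevölkerungsfortschreibung.

```python
def split_foreigners(foreigners: list[float],
                     register_eu: list[float],
                     register_non_eu: list[float]) -> tuple[list[float], list[float]]:
    """Split foreigners in each age group into EU27 and non-EU27 citizens in
    proportion to the register shares for the same age group (hypothetical helper)."""
    if not len(foreigners) == len(register_eu) == len(register_non_eu):
        raise ValueError("all inputs must cover the same age groups")
    eu, non_eu = [], []
    for total, r_eu, r_non in zip(foreigners, register_eu, register_non_eu):
        r_total = r_eu + r_non
        if r_total == 0:
            raise ValueError("register shows no foreigners in an age group")
        eu_part = total * r_eu / r_total
        eu.append(eu_part)
        non_eu.append(total - eu_part)  # keeps the component-method total intact
    return eu, non_eu


# Hypothetical figures for two age groups.
eu, non_eu = split_foreigners([1200, 1800], [300, 500], [700, 1500])
print(eu, non_eu)  # [360.0, 450.0] [840.0, 1350.0]
```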

In Latvia, no joint distribution of population by citizenship and age was available for 1st January 2002, only the separate structures by age and by citizenship. However, the full joint distribution was available for 2003. The iterative proportional fitting method was selected to deal with this case: the joint distribution by citizenship group and age on 1st January 2003 was taken as the starting point for estimating the 2002 structure of population, which was then iteratively adjusted to the known 2002 marginal totals.
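A minimal two-way iterative proportional fitting (IPF) sketch in the spirit of the Latvian example: the seed plays the role of the 2003 joint distribution (rows are citizenship groups, columns are age groups), and the targets play the role of the known 2002 marginal totals. The figures are hypothetical, the tolerance and iteration limit are illustrative choices, and the row and column targets must have the same grand total.

```python
def ipf(seed: list[list[float]], row_targets: list[float],
        col_targets: list[float], tol: float = 1e-9,
        max_iter: int = 1000) -> list[list[float]]:
    """Scale rows and columns of `seed` alternately until both sets of
    marginal totals are matched (classic two-way IPF)."""
    if abs(sum(row_targets) - sum(col_targets)) > tol:
        raise ValueError("row and column targets must have the same total")
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Row step: match the citizenship totals.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Column step: match the age-group totals.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now fit; stop once the rows fit as well.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            return table
    raise RuntimeError("IPF did not converge")


# Hypothetical 2003 seed: 2 citizenship groups x 2 age groups.
seed_2003 = [[30, 70], [20, 80]]
fitted_2002 = ipf(seed_2003, row_targets=[110, 90], col_targets=[60, 140])
```

Because IPF only rescales whole rows and columns, the cross-product (odds) ratios of the seed are preserved; this is how the citizenship-by-age interaction observed in 2003 is carried over to the 2002 estimate.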

Lithuania is an example of the application of the cohort-wise interpolation method. In this country, the joint distribution of population by sex, age and citizenship was available for the census date (6th April 2001), as well as for 1st January 2005. Cohort-wise interpolation, as described in Section 3.3, was used to obtain the initial estimates of males and females on 1st January 2002, 2003 and 2004. In the next step, these initial estimates were proportionally adjusted to the known numbers of males and females by age, taken from the Eurostat demographic database.
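The two steps used for Lithuania can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: counts are by single year of age, the two observations are exactly `n` whole years apart (the actual dates, 6th April 2001 and 1st January 2005, are not), each cohort's size is interpolated linearly in time, and the open-ended age group is ignored. Cohorts not observed at both dates are returned as `None`. The function names and figures are hypothetical.

```python
from typing import Optional


def cohort_interpolate(p0: list[float], p1: list[float],
                       n: int, k: int) -> list[Optional[float]]:
    """Estimate counts by age k years after the first observation (0 < k < n),
    interpolating each cohort linearly between its size in p0 and in p1.
    Assumption: the two observations are exactly n whole years apart."""
    out: list[Optional[float]] = []
    for age in range(len(p0)):
        a0 = age - k          # age of this cohort at the first observation
        a1 = a0 + n           # age of this cohort at the second observation
        if a0 < 0 or a1 >= len(p1):
            out.append(None)  # cohort not covered by both observations
        else:
            out.append(p0[a0] + (k / n) * (p1[a1] - p0[a0]))
    return out


def adjust_to_totals(groups: dict[str, list[Optional[float]]],
                     totals: list[float]) -> dict[str, list[Optional[float]]]:
    """Scale the citizenship groups at each age so that they add up to the
    known total for that age (e.g. from the Eurostat demographic database)."""
    result: dict[str, list[Optional[float]]] = {g: [] for g in groups}
    for age, total in enumerate(totals):
        values = [groups[g][age] for g in groups]
        if any(v is None for v in values) or sum(values) == 0:
            for g in groups:
                result[g].append(None)
            continue
        s = sum(values)
        for g, v in zip(groups, values):
            result[g].append(v * total / s)
    return result


# Hypothetical single-year counts, two observations 2 years apart, target 1 year later.
nationals = cohort_interpolate([10, 20, 30, 40], [5, 15, 22, 34], n=2, k=1)
foreigners = cohort_interpolate([1, 3, 2, 2], [1, 1, 3, 3], n=2, k=1)
adjusted = adjust_to_totals({"nationals": nationals, "foreigners": foreigners},
                            totals=[25, 27, 40, 35])
```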

In Bulgaria, annual data on population by citizenship were not available. The only information on citizenship structure came from the census of 1st March 2001. There are also annual data on population by age and sex prepared by the NSI using the component method, available from the Eurostat database. The estimates of annual 2002-2006 population by citizenship, sex and age were prepared using the cohort-wise weight propagation method. The census data were used as the starting point for calculating the initial shares (weights) of citizenship groups in each age cohort. These shares were iteratively propagated forward as described in Section 3.5 and the resulting weights were combined with the available data on population by sex and age to calculate the required joint distribution by citizenship, sex and age.
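A minimal sketch of cohort-wise weight propagation in the spirit of the Bulgarian example, for one sex and single years of age. The assumptions are ours and not necessarily those of Section 3.5: each cohort keeps its census citizenship shares as it ages, the youngest age takes the age-0 shares of the previous year, and the shares of the open-ended group are a population-weighted mix of the cohort entering it and the persons already in it. The function names and figures are hypothetical.

```python
Shares = dict[str, float]  # citizenship group -> share within one age


def propagate_one_year(shares: list[Shares], pop_prev: list[float]) -> list[Shares]:
    """Move citizenship shares one year forward along cohorts.

    shares[a] are the shares at age a (last index = open-ended group) and
    pop_prev[a] is the population at age a in the same (previous) year."""
    last = len(shares) - 1
    if last < 1 or len(pop_prev) != len(shares):
        raise ValueError("need at least two ages and matching population data")
    # Assumption: the youngest age gets the age-0 shares of the previous year.
    new = [dict(shares[0])]
    # Ages 1 .. last-1 inherit the shares of the cohort one year younger.
    for age in range(1, last):
        new.append(dict(shares[age - 1]))
    # The open-ended group mixes the entering cohort with those already in it.
    w_in, w_old = pop_prev[last - 1], pop_prev[last]
    new.append({g: (shares[last - 1][g] * w_in + shares[last][g] * w_old) / (w_in + w_old)
                for g in shares[last]})
    return new


def apply_shares(shares: list[Shares], pop: list[float]) -> list[dict[str, float]]:
    """Combine the shares with the known population by age (e.g. from Eurostat)."""
    return [{g: s * p for g, s in sh.items()} for sh, p in zip(shares, pop)]


# Hypothetical census shares for ages 0, 1 and the open-ended group 2+.
census = [{"nat": 0.9, "for": 0.1}, {"nat": 0.8, "for": 0.2}, {"nat": 0.5, "for": 0.5}]
next_year = propagate_one_year(census, pop_prev=[100, 100, 300])
estimates = apply_shares(next_year, pop=[200, 110, 400])
```

Repeating `propagate_one_year` year by year, each time with the population of the year being left, carries the census structure forward over 2002-2006.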

5 Conclusion

As can be seen from the country-specific overview of problems with data on population stocks by age, sex and citizenship, there is no universal solution for estimating the missing pieces of information in the European countries under study. Nevertheless, depending on the data available, either in the Eurostat database / JMQ or in the respective national statistical institutes, several estimation procedures can be proposed and applied, as described in Sections 3 and 4.

The methods and algorithm we proposed for this purpose do not, however, consider the issue of the harmonisation
Received: March 9, 2008


[5] Castro, L.J. and Rogers, A. (1983). Patterns of Family Migration: Two Methodological Approaches. Environment and Planning A, 15 (2), pp. 237-254.

[8] Rees, P. (1994). Estimating and projecting the populations of urban communities. Environment and Planning A, 26 (11), pp. 1671-1697.

[12] Willekens, F. (1977). The Recovery of Detailed Migration Patterns from Aggregate Data: An Entropy Maximizing Approach. IIASA Report RR-8130. International Institute for Applied Systems Analysis, Laxenburg.

[14] Willekens, F.J., Por, A. and Raquillet, R. (1981). Entropy, multiproportional, and quadratic techniques for inferring patterns of migration from aggregate data. In: A. Rogers (ed.), Advances in Multiregional Demography. IIASA Report RR-81-6. International Institute for Applied Systems Analysis, Laxenburg, pp. 83-124.

[15] Wilmoth, J.R., Andreev, K., Jdanov, D., Glei, D.A., with the assistance of C. Boe, M. Bubenheim, D. Philipov, V. Shkolnikov and P. Vachon (2005). Methods Protocol for the Human Mortality Database, version 4. Department of Demography, University of California, Berkeley. Available from: (accessed on 17th May 2007).

(1) Research project THESIM: Towards Harmonised European Statistics on International Migration, funded by the European Commission through the Sixth Framework Programme and executed by a research consortium led by the Groupe d'étude de démographie appliquée (GeDAP), Université Catholique de Louvain.

(2) After: 'Ockham's razor', in: Encyclopaedia Britannica Online, accessed on 21st May 2007.

(3) For Hungary, data on total population and on the number of Hungarian citizens were not always provided in the JMQ and were therefore not available in the migration part of the Eurostat database. However, data on total population were available in the demographic part of the Eurostat database, and the number of Hungarian citizens could be calculated directly as the difference between total population and total foreigners, the latter taken from the JMQ.

Jakub Bijak and Dorota Kupiszewska

Central European Forum for Migration and Population Research (CEFMR)

ul. Twarda 51/55, 00-818 Warsaw, Poland

Table 1: Availability of data on population stock by citizenship,
sex and age in the JMQ, 31 countries, as of 1st January 2002-2006.

Country            2002                 2003                 2004
Austria            +                    +                    +
Belgium            +                    x                    -
Bulgaria           -                    -                    -
Cyprus             dref                 -                    -
Czech Republic     +                    +                    +
Denmark            +                    +                    +
Estonia            na                   na                   na
Finland            +                    +                    +
France             -                    -                    -
Germany            for, broad age,      ±age, i              for, i
                   ±agesex, i
Greece             -                    -                    i
Hungary            for                  for                  +
Ireland            p, ±ctz,             p, +ctz,             p, ±ctz,
                   broad age, dref      broad age, dref      broad age, dref
Italy              dref                 -                    -
Latvia             -age                 +                    +
Lithuania          -                    -ctz                 -ctz
Luxembourg         -                    -                    tot, nat
Malta              -                    -                    -
Netherlands        +                    +                    +
Poland             dref                 -                    -
Portugal           p, for, -age         p, for               -
Romania            dref                 -                    +
Slovakia           -                    -                    for
Slovenia           +                    +                    +
Spain              +                    -                    p
Sweden             +                    +                    +
United Kingdom     -                    ±ctz, dref           ±ctz, dref, a70
Iceland            +                    +                    -
Liechtenstein      -                    -                    -
Norway             +                    +                    +
Switzerland        +                    +                    +

Country            2005                 2006
Austria            +                    -
Belgium            -                    -
Bulgaria           -                    -
Cyprus             -                    -
Czech Republic     +                    +
Denmark            +                    +
Estonia            na                   -
Finland            +                    +
France             -                    -
Germany            i                    p, i
Greece             for, ±sex            for
Hungary            +                    +
Ireland            p, ±ctz,             p, +ctz,
                   broad age, dref      broad age, dref
Italy              -age                 age, w
Latvia             +                    +
Lithuania          +                    +
Luxembourg         ±ctz, ±age, ±sex     ±ctz, ±age, ±sex
Malta              -                    -
Netherlands        +                    +
Poland             -ctz                 -
Portugal           -                    -
Romania            +                    +
Slovakia           i                    i
Slovenia           +                    +
Spain              +                    +
Sweden             +                    +
United Kingdom     ±ctz, broad age,     -
Iceland            -                    -
Liechtenstein      -                    -
Norway             +                    +
Switzerland        +                    +

+ data provided to Eurostat; - data not provided to Eurostat; -age
no disaggregation by age; -ctz no disaggregation by citizenship;
±age age disaggregation only for a few citizenship categories;
±agesex disaggregation by age not provided for males and females;
±ctz data provided for a few citizenship categories; ±sex
disaggregation by sex provided for a few citizenship categories
only; a70 age provided only up to 70 years, with the open-ended
group 70+; broad age data disaggregated by broad age groups; dref
reference date different from 1st January; for data provided for
foreigners only; i data inconsistency problems; na data not
available; nat data provided for nationals; p provisional data;
x problems detected in the data sent by the NSI; tot data provided
for total.

Table 2: Coefficients for the Karup-King interpolation formula.

                           First group, N_0

                    N_0            N_1            N_2

First fifth         +0.344         -0.208         +0.064
Second fifth        +0.248         -0.056         +0.008
Third fifth         +0.176         +0.048         -0.024
Fourth fifth        +0.128         +0.104         -0.032
Last fifth          +0.104         +0.112         -0.016

                          Middle groups, N_i

                    N_{i-1}        N_i            N_{i+1}

First fifth         +0.064         +0.152         -0.016
Second fifth        +0.008         +0.224         -0.032
Third fifth         -0.024         +0.248         -0.024
Fourth fifth        -0.032         +0.224         +0.008
Last fifth          -0.016         +0.152         +0.064

                            Last group, N_K

                    N_{K-2}        N_{K-1}        N_K

First fifth         -0.016         +0.112         +0.104
Second fifth        -0.032         +0.104         +0.128
Third fifth         -0.024         +0.048         +0.176
Fourth fifth        +0.008         -0.056         +0.248
Last fifth          +0.064         -0.208         +0.344

Source: [11], Table C-1, p. 5-69.
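A minimal sketch of splitting five-year age groups into single years with the Karup-King coefficients of Table 2. It assumes at least three consecutive closed five-year groups; the open-ended 85+ group is assumed to be excluded from the input, since the formula is not meant for it. In each panel of Table 2 the column for the group being split sums to one and the other two columns sum to zero, so the five single-year values always add back up to the original group total. The function name and input figures are hypothetical.

```python
# Coefficients from Table 2, one row per fifth (single year) of the group.
FIRST = [(0.344, -0.208, 0.064), (0.248, -0.056, 0.008), (0.176, 0.048, -0.024),
         (0.128, 0.104, -0.032), (0.104, 0.112, -0.016)]      # N_0, N_1, N_2
MIDDLE = [(0.064, 0.152, -0.016), (0.008, 0.224, -0.032), (-0.024, 0.248, -0.024),
          (-0.032, 0.224, 0.008), (-0.016, 0.152, 0.064)]     # N_{i-1}, N_i, N_{i+1}
LAST = [(-0.016, 0.112, 0.104), (-0.032, 0.104, 0.128), (-0.024, 0.048, 0.176),
        (0.008, -0.056, 0.248), (0.064, -0.208, 0.344)]       # N_{K-2}, N_{K-1}, N_K


def karup_king_split(groups: list[float]) -> list[float]:
    """Split consecutive closed five-year age groups into single years."""
    k = len(groups) - 1
    if k < 2:
        raise ValueError("need at least three five-year groups")
    single_years = []
    for i in range(k + 1):
        if i == 0:
            coeffs, window = FIRST, groups[0:3]
        elif i == k:
            coeffs, window = LAST, groups[k - 2:k + 1]
        else:
            coeffs, window = MIDDLE, groups[i - 1:i + 2]
        for row in coeffs:
            single_years.append(sum(c * n for c, n in zip(row, window)))
    return single_years


# Hypothetical five-year groups 0-4, 5-9, ..., 20-24.
single = karup_king_split([500, 450, 400, 380, 300])
```

With irregular age profiles, osculatory formulas of this kind can produce negative single-year values, which would then need separate treatment.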

Table 3: Estimated population by broad group of citizenship
in 18 EU countries, as of 1st January 2006.

Country              Total    Nationals          EU27     Non-EU27
                                           foreigners   foreigners

Bulgaria          7 718 750    7 693 214        3 855       21 681
Cyprus              766 414      678 114       52 217       36 084
Estonia           1 344 684    1 082 605        3 961      258 118
France           61 166 822   58 208 155    1 148 691    1 809 976
Germany          82 437 995   75 148 846    2 448 113    4 841 036
Greece           11 125 179   10 165 903      180 282      778 994
Ireland           4 209 019    3 779 755      295 165      134 099
Italy            58 751 711   56 081 197      538 853    2 131 661
Latvia            2 294 590    1 837 832        5 527      451 231
Lithuania         3 403 284    3 370 422        1 962       30 900
Luxembourg          469 086      280 938      171 876       16 273
Malta               404 962      392 850        7 022        5 090
Poland           38 157 055   38 115 920       18 660       22 476
Portugal         10 569 592   10 293 686       80 039      195 867
Romania          21 610 213   21 584 220        6 058       19 935
Slovakia          5 389 180    5 368 255       12 289        8 636
Spain            43 758 250   39 755 741    1 326 128    2 676 381
United Kingdom   60 393 100   56 990 704    1 365 190    2 036 807

Source: Own calculations based on the Eurostat and NSI data.
