
The prediction of economic evolution through regression and extrapolation.


Predictive methods aim at the anticipation of future decisions based on the already existing information, in order to envisage certain scenarios.

Regression is a method of statistical analysis through which, once certain values are assigned to a dependent variable y and to one or more independent variables $x_j$ ($j = 1, \dots, m$), a simple expression of the function that describes the connection between them is sought.

Key words: prediction, linear regression, square regression, cubic regression


In technical terms, the connection between two or more variables is called a correlation, and establishing the type of connection that a dependent variable has with one or more independent variables is made through regression analysis.

Three types of regression are considered here: linear regression, square (quadratic) regression and cubic regression. Linear regression through the method of least squares is the most widespread. It is the method meant by "regression", "linear regression", "multiple regression" or "least squares" when a model is built.

The purpose of multiple regression (a term used by Pearson, 1908) is to highlight the relation between a dependent variable (also called the explained, endogenous or resultant variable) and several independent variables (also called explanatory, factor, exogenous or predictor variables). Multiple regression is often used in an attempt to answer questions such as: "what is the best prediction for...?" and "what is the most efficient predictor for...?".


The method of multiple regression is generalized through the theory of the "general linear model", which allows several dependent variables simultaneously, as well as predictor variables that are not linearly independent.

The main stages in the construction of the regression model are:

1. Identification--the descriptive stage in which dependent variables are identified, along with the type of relations that they express

2. Specification--the stage in which the best expression of the variables is sought

3. Estimation of parameters for the model that is to be examined

4. Testing of the significance of parameters that get estimated in the model

5. Validation of the model under analysis

6. Use of the model in simulation and prediction operations.

In order to actually accomplish the predictions made on the basis of the regression analysis, one needs to go through the following stages:

* Formulation of the problem--the decision-making party (the manager) must define the problem that needs to be analyzed, in terms of the variables that have to be explained and whose values are to be predicted. In this initial formulation, the decision problem is described and we identify the variables for which predictions will be made, along with the variables on which they depend.

* Choosing the economic indicators--after the dependent variable has been identified, we search for the related factors that are likely to influence it and that can be included in the regression equation as independent variables.

* The first analysis of the regression equation is a statistical analysis of the equation components and is performed automatically; one of its results is the correlation matrix, obtained with the help of the Microsoft Excel tools.

* The analysis of the simple correlation matrix is performed in order to choose the variables that need to be part of the regression equation. In this analysis, we must identify those variables that are strongly correlated with the dependent variable, but only weakly correlated among themselves. At the end of this stage, 3 or 4 candidate regression equations are retained for analysis.

* The choice of a regression equation from the ones that we have identified--based on the available data, the computer will determine not only the regression coefficients, but also the elements that allow the test of their significance. Only the significant equations will be kept.

* Checking the validity of the regression conditions.

* Preparing the prediction--once we have chosen a regression equation whose correlation coefficient is high enough and which is adequate from the perspective of the significance test, the decision-making party (the manager) can use this equation as the basis for the intended prediction. At this point, a confidence interval for individual predictions needs to be established, as well as the precision of the value of every independent variable.
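The correlation-matrix stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is hypothetical (not the case-study figures), the column names `y`, `x1`, `x2`, `x3` and the 0.8 threshold are arbitrary choices made for the example, and the helper names are not taken from Excel.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of named columns."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}

# Hypothetical illustrative data, NOT the figures from the case study.
data = {
    "y":  [10.0, 12.0, 15.0, 19.0, 24.0, 30.0],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # nearly proportional to x1
    "x3": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],   # largely unrelated to y
}
THRESHOLD = 0.8  # assumed cut-off for a "strong" correlation

corr = correlation_matrix(data)

# Candidates: predictors strongly correlated with the dependent variable.
strong_with_y = [k for k in data if k != "y" and abs(corr[(k, "y")]) >= THRESHOLD]

# Pairs of candidates that are also strongly correlated with each other;
# keeping both members of such a pair in one equation adds little information.
collinear_pairs = [
    (p, q)
    for i, p in enumerate(strong_with_y)
    for q in strong_with_y[i + 1:]
    if abs(corr[(p, q)]) >= THRESHOLD
]

print(strong_with_y)
print(collinear_pairs)
```

With this data, both `x1` and `x2` qualify as predictors of `y`, but they are also flagged as a collinear pair, so only one of them would normally be kept in a given candidate equation.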

The main advantage of regression analysis consists in the fact that it is a statistical method, which presupposes an estimation of the degree of precision and significance, and it can be used in all types of causal relations on condition that the variable that we have in view depends on independent variables.

The fact that it is a statistical method is also its main disadvantage. That is why many decision-makers avoid this type of analysis. Another downside is the big volume of data and the costs for data collection in order to establish the initial regression equation and for the analysis of its validity in time (if a modification in the causal relation between an independent variable and the dependent one appears, the collection of new data is necessary, along with the redefinition of the regression equation).

Excel offers several statistical functions that can be used for making predictions with linear regression: FORECAST, TREND and LINEST. There are also functions for other types of regression (for example, the LOGEST and GROWTH functions for exponential regression), as well as the possibility of making predictions through graphs (by attaching trendlines to charts).
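For readers working outside Excel, the following minimal Python sketch shows roughly what these functions compute in the single-predictor case. The function names and argument order mirror the Excel functions only for readability; the implementations are simplified assumptions (one predictor, no statistics output), and the sample data is hypothetical.

```python
from math import exp, log

def linest(known_ys, known_xs):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast(x, known_ys, known_xs):
    """Predicted y at a single new x, from the fitted line."""
    slope, intercept = linest(known_ys, known_xs)
    return slope * x + intercept

def trend(known_ys, known_xs, new_xs):
    """Predicted y for several new x values, from the fitted line."""
    slope, intercept = linest(known_ys, known_xs)
    return [slope * x + intercept for x in new_xs]

def growth(known_ys, known_xs, new_xs):
    """Exponential fit y = b * m**x, obtained by fitting a line to ln(y)."""
    slope, intercept = linest([log(y) for y in known_ys], known_xs)
    return [exp(intercept + slope * x) for x in new_xs]

# Hypothetical data: five past periods and their sales values.
years = [1, 2, 3, 4, 5]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

print(linest(sales, years))          # slope and intercept of the fitted line
print(forecast(6, sales, years))     # prediction for period 6
print(trend(sales, years, [6, 7]))   # predictions for periods 6 and 7
print(growth([2.0, 4.0, 8.0, 16.0], [1, 2, 3, 4], [5]))
```

The exponential variant is reduced to the linear case by taking logarithms of the dependent variable, so it only applies when all known y values are positive.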

Regression is a method of statistical analysis through which, by assigning determined values to a dependent variable y and to one or more independent variables $x_j$ ($j = 1, \dots, m$), a simple expression of the function that describes the connection between them is sought.

* By extrapolation, based on this expression and on other values of independent variables, the corresponding values for the dependent variable are sought. If statistical data expresses an evolution in time, then the calculation of future values of the dependent variable y is called prognosis/prediction, and that of past values--retrognosis.

* In regression analysis we consider that the dependent variable y is a function of one or more independent variables, of the type $y = f(x_1, x_2, \dots, x_n) + \varepsilon$,

* where $\varepsilon$ represents the deviation of the theoretical (fitted) value from the experimental (observed) one.

* In regression analysis, we aim to determine the function f so that $\varepsilon$ is a random variable with zero mean and minimum variance.

* The easiest model to follow is simple regression: y=ax+b.

* If this model is not satisfactory, we can use the multilinear regression of the type $y = a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_m x^m + b$, where

* $a_j$ ($j = 1, \dots, m$) and b need to be determined.

* Describing the relation that could exist between the two variables (x and y) means analyzing whether an ascending tendency of one entails an ascending or a descending tendency of the other.

* If m=1, the dependence is linear and the regression function is represented by a line whose slope is given by $a_1$; b gives the ordinate of the point where the line intersects the Oy axis.

* Various other dependence functions can be reduced to the multilinear function through various transformations. For regression, the most widely used method is that of the least squares, through which the minimization of the sum of the squares of distances between given values for y and those calculated through the regression function is sought.

* The final aim is prediction, provided that the analysis is possible, that is, that the two variables are indeed correlated. The correlation coefficient is denoted R and takes values between -1 and 1. The closer the value is to 1 or -1, the more relevant the regression. If R=0, the two variables are not linearly associated.
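For the simple model y=ax+b, the least-squares criterion and the correlation coefficient mentioned above can be written out explicitly. The notation below is an assumption introduced for this illustration: N denotes the number of observations $(x_i, y_i)$, and $\bar{x}$, $\bar{y}$ denote their arithmetic means.

```latex
% Sum of squared distances between observed and fitted values
S(a, b) = \sum_{i=1}^{N} \left( y_i - a x_i - b \right)^2

% Setting \partial S / \partial a = 0 and \partial S / \partial b = 0 gives
a = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a \bar{x}

% Correlation coefficient, with -1 \le R \le 1
R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \; \sum_{i=1}^{N} (y_i - \bar{y})^2}}
```

The slope a and the coefficient R share the same numerator, so they always have the same sign: a positive R corresponds to an ascending regression line and a negative R to a descending one.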


In this case study, we are interested in the evolution of sales of towel manufacturing companies, according to the values of hotel construction in a country (Romania) and to other construction contracts that have been made so far.

The data known from the hypothesis:

* The values of the amount of towel sales between 1997-2013

* The values of the hotel constructions between 1997-2013

* The values of the construction contracts closed between 1997-2013

* The probable values of the hotel constructions in the next 5 years 2014-2018

* The probable values of the construction contracts in the next 5 years 2014-2018.

What is needed: to calculate the evolution of towel sales between 2014-2018.

Independent variables: $x_1$--values of the towel manufacturing (already recorded and probable over the next 5 years)

The Regression tool is not available on the ribbon by default; it can be enabled from the menu File-Options-Add-Ins-Analysis ToolPak. Once the analysis tool has been enabled, the Regression tool appears in the menu Data-Data Analysis.

The main worksheet, in which the data from the hypothesis has already been entered, has been named ENUNCIATION.

In order to interpret, economically and statistically, the results generated by the Excel Regression tool and to judge their relevance, the main instruments of the analysis are presented and explained below.

Significance F--displays the critical probability of the test: if Significance F < $\alpha$ (the chosen significance threshold), the hypothesis that the independent variables lack significance is rejected in favor of the hypothesis that the regression model is significant. We also say that the test is one of significance of $R^2$ (R Square).

Standard Error--the standard error of the coefficient (the standard deviation of the sampling distribution of the coefficient)

R Square ($R^2$)--the proportion of the variation of y that is explained by x. If the given values lie on the regression curve, then $R^2 = 1$, meaning a perfect fit. The closer $R^2$ is to 1, the more adequate the regression equation.

t Stat--the statistic of the coefficient significance test.

The F test indicates whether the regression equation as a whole is significant. The calculated F value is $F_c$. The theoretical (critical) value, obtained from the inverse of the F distribution and denoted $F_t$, is found in the statistical tables of specialized textbooks. If $F_c > F_t$, then the regression equation is significant (namely, the model is validated).

P-value--the probability, computed under the hypothesis that the estimated parameter is equal to 0, of obtaining a test statistic at least as extreme as the one observed; if the P-value is smaller than the significance threshold, we reject this hypothesis.
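The comparison between the calculated and the tabulated F value can be sketched as follows. This is an illustration under assumptions, not the output of the Excel tool: the function names are hypothetical, the number of observations (17, one per year for 1997-2013) and the number of predictor terms (3) are assumed for the example, and the critical value 3.41 is the approximate 5% value for (3, 13) degrees of freedom as read from a standard F table.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic of a regression, computed from R Square.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
    with k predictor terms and n observations.
    """
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    return (r_squared / df_regression) / ((1.0 - r_squared) / df_residual)

def is_significant(f_calculated, f_table):
    """The equation is considered significant when F_c exceeds F_t."""
    return f_calculated > f_table

# Assumed inputs for illustration only.
r2 = 0.07          # a low R Square value
n_obs = 17         # assumed: one observation per year, 1997-2013
k = 3              # assumed: three predictor terms (x, x^2, x^3)
f_table = 3.41     # approximate 5% critical value for (3, 13) d.f., from a table

f_calc = f_statistic(r2, n_obs, k)
print(round(f_calc, 3))
print(is_significant(f_calc, f_table))
```

Under these assumptions the calculated F is far below the tabulated one, which is the numerical counterpart of saying that an equation with such a low R Square should not be used for prediction.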

Multilinear regression of the type: $y = a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_m x^m + b$, where $a_j$ ($j = 1, \dots, m$) and b need to be determined.

The prediction, whose equation is $y = 35.56x + 10.72x^2 + 22.749$, was used to draw the graph in figure 5, which compares the towel sales with the Excel-generated prediction. As the graph shows, the prediction is quite faithful to reality, and the outlook for the next 5 years is optimistic, as it predicts growth from one year to the next.

a) Linear regression: y=ax+b, a=the regression coefficient (the slope of the regression line), b=the intercept

From the linear regression summary output, the resulting function is y=379x+287774. R is almost 0, which implies that the correlation between the variables is almost nonexistent. We therefore do not calculate the prediction, because it would bear no relevance, and resort to the square regression instead.

b) Square regression: it has the form $y = a_1 x + a_2 x^2 + b$, which is used to calculate the prediction only if R Square has a relevant value, namely a value as close to 1 as possible. Since the value obtained is very close to 0, the equation is not relevant and is not faithful to reality; we therefore do not calculate the prediction from the square regression and try the cubic regression.

c) Cubic regression: it has the form $y = a_1 x + a_2 x^2 + a_3 x^3 + b$

As we can notice in figure 10, R Square has the value 0.07, which is close to 0. This leads us to the conclusion that neither the square regression nor the cubic one is faithful to reality, so we do not calculate the prediction from them. The only regression that has brought relevant results has been the multilinear regression.
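The comparison of the linear, square and cubic fits through their R Square values can be reproduced outside Excel. The sketch below is a minimal pure-Python illustration, assuming hypothetical data (not the case-study figures); it solves the least-squares normal equations by Gaussian elimination, which is adequate for low degrees and small data sets but is not how a production library would do it.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [b, a_1, ..., a_degree] of
    y = b + a_1*x + ... + a_degree*x**degree, via the normal equations."""
    size = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    a = [[sum(x ** (r + c) for x in xs) for c in range(size)] for r in range(size)]
    rhs = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = (rhs[r] - tail) / a[r][r]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** power for power, c in enumerate(coeffs))

def r_squared(xs, ys, coeffs):
    """Share of the variation of y explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 3.1, 5.9, 11.2, 17.8, 27.1, 38.0, 51.2]

fits = {}
for degree, label in ((1, "linear"), (2, "square"), (3, "cubic")):
    coeffs = polyfit(xs, ys, degree)
    fits[degree] = r_squared(xs, ys, coeffs)
    print(label, "R Square =", round(fits[degree], 4),
          "| forecast for x=9:", round(predict(coeffs, 9), 2))
```

R Square can only stay the same or rise when a higher power is added, so a higher value for the cubic fit does not by itself prove that the cubic model predicts better; when all three values are close to 0, as reported in the case study, none of the single-variable equations is a sound basis for extrapolation.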


Statistical analysis instruments, such as regression, help the user make predictions. Although there are no perfect relations in the real world, regression makes it possible to predict one variable as a function of the value of another. Prediction is the process of estimating the value of a variable by knowing the value of another.

Regression is tightly connected with the concept of correlation. A strong association between two variables increases the precision with which one of them can be predicted as a function of the other.



Lecturer Sion Beatrice, Ph.D

Professor Cezar Mihalcescu, PhD

Assistant Alexandra Marginean, Ph.D (1)

(1) School of Domestic and International Economy of Tourism, Romanian-American University
COPYRIGHT 2014 Romanian-American University
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2014 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Beatrice, Sion; Mihalcescu, Cezar; Marginean, Alexandra
Publication:Journal of Information Systems & Operations Management
Date:Dec 1, 2014
