# A model for predicting the performance of a bank's mortgage loan portfolio.

INTRODUCTION

It is essential for sound operations of banks and lending institutions to have models and analytic tools available by which they can measure the performance (or health status) associated with a certain loan portfolio as well as predict this status over time from prevailing macroeconomic factors. A Markov chain defined on different payment states of a mortgage loan allows one to define and calculate a health index on the loan portfolio which can be used as a performance measure of that portfolio.

A performance measure, such as a health index, for a mortgage portfolio is useful to a bank or lending institution in setting its loan or credit policy. It helps management monitor the performance of the portfolio over time. Furthermore, an empirical model that relates a health index to macroeconomic factors is useful in forecasting the performance level. In a previous study (Liu et al., 2010), a Markov chain approach was developed to determine the transitions among payment states of a mortgage loan. Based on the probabilities of transitions among states, a loan health index was defined as a measure of its performance. In this paper, we build on the previous study and develop an empirical model relating certain macroeconomic factors to the health index of the loan for forecasting purposes.

LITERATURE REVIEW

Soyer and Feng (2010) considered reliability models for assessing mortgage default risk. White (1993) presented several models employed in the banking industry. These included discriminant analysis, decision trees, expert systems for static decisions, dynamic programming, linear programming, and Markov chains for dynamic decision making. Markov chain modeling is a common approach used in the analysis of credit risk. As discussed by White (1993), Markov decision models have been used extensively to analyze real-world data in (1) finance and investment, (2) insurance, and (3) credit.

Cyert, Davidson and Thompson (1962) developed a finite stationary Markov chain model to predict uncollectible amounts (receivables) in each past-due category. The states of the chain were defined as normal payment, past due, and bad-debt states.

Grinold (1983) used a finite Markov chain model to analyze a firm's market value. Lee (1997) used an ARMA model to analyze the linkage between time-varying risk premia in the term structure and macroeconomic state variables.

Esbitt (1986) provided empirical evidence that a bank's portfolio quality has a close relationship with the macroeconomic situation. Examples include the failure of state-chartered banks in Chicago during the Great Depression, between 1930 and 1932.

McNulty, Aigbe, and Verbrugge (2001) proposed an empirical regression modeling approach to study the hypothesis that small community banks have an information advantage in evaluating and monitoring loan quality.

Hauswald & Marquez (2004) studied the relationship between current regulatory policy and the loan quality, or risks, encountered by a financial institution.

Gambera (2000) used a vector-autoregressive (VAR) model to predict loan quality over business cycles. D'Amico et al. (2005) applied semi-Markov reliability models to the study of credit risk management. Douglas et al. (1996) proposed the use of non-stationary Markov and logistic modeling approaches to predict the performance of credit home mortgage portfolios. Pennington-Cross (2008) used a multinomial logit model to study the duration of foreclosure in the subprime mortgage market. Burkhard and De Giorgi (2006) used a non-parametric approach to model the probability distribution of defaults in residential mortgage portfolios. Hayre et al. (2008) presented a model that forecasts default rates as a function of economic variables and mortgage and borrower characteristics. Green and Shoven (1986) used a proportional hazard model to study the effects of interest rates on mortgage prepayment.

Deng et al. (2000) used the option theory approach to predict mortgage termination by prepayment or default. They showed that the model performed well but was not sufficient by itself: heterogeneity among homeowners must be taken into account in estimating or predicting prepayment behavior. Schwartz and Torous (1993) applied a Poisson regression to estimate the proportional hazard model for prepayment and default decisions in a sample of single-family fixed-rate mortgages.

THE MODEL

In this model, we consider the pool of mortgage loans in the portfolio of a commercial bank in China. Based on the bank data and loan policy, each loan is classified into one of three states according to its payment status. State $S_1$ is the normal state, which is 0-30 days past due. State $S_2$ is 30-90 days past due, and state $S_3$ is more than 90 days past due. A loan in state $S_1$ can stay in $S_1$ or move to $S_2$. A loan in state $S_2$ can remain in $S_2$ or move to $S_1$ or to $S_3$. A loan in state $S_3$ can remain in $S_3$ or move to $S_2$.
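The state definitions and permitted moves described above can be sketched in a few lines of Python. This is only an illustration: the names `State`, `classify`, and `is_allowed` are hypothetical, and the handling of the boundary days (exactly 30 and exactly 90 days past due) is an assumption, since the text leaves those cases open.

```python
from enum import Enum


class State(Enum):
    S1 = 1  # normal state: 0-30 days past due
    S2 = 2  # 30-90 days past due
    S3 = 3  # more than 90 days past due


# Moves permitted by the description in the text.
ALLOWED = {
    State.S1: {State.S1, State.S2},
    State.S2: {State.S1, State.S2, State.S3},
    State.S3: {State.S2, State.S3},
}


def classify(days_past_due: int) -> State:
    """Map days past due to a payment state.

    Assumption: exactly 30 days counts as S1 and exactly 90 days as S2.
    """
    if days_past_due < 0:
        raise ValueError("days_past_due must be non-negative")
    if days_past_due <= 30:
        return State.S1
    if days_past_due <= 90:
        return State.S2
    return State.S3


def is_allowed(src: State, dst: State) -> bool:
    """Return True if the model permits a move from src to dst."""
    return dst in ALLOWED[src]
```

For example, `classify(45)` returns `State.S2`, and `is_allowed(State.S1, State.S3)` is `False` because a normal loan cannot jump directly to the more-than-90-days state.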

Given the transition probability matrix of the Markov chain, one can calculate the expected duration of stay in each of the three states ($S_1$, $S_2$, and $S_3$). A health index of the portfolio can then be calculated from the expected duration of stay in each state and the transitions from $S_2$ and $S_3$ to the normal, or health, state $S_1$.

Loan Health Index

Let $H$ be the health index of a portfolio (the population or collection of all mortgage loans held by the bank), which at time $t$ has the three states $S_1$, $S_2$, and $S_3$.

The health index over a given time interval (0, t) is defined as

$$H = e_2\,\theta_{2,1} + e_3\,\theta_{3,1} + e_1\,\theta_{1,1} \tag{1}$$

where $e_j$ is the expected duration of stay in state $j$, $j = 1, 2, 3$, and $\theta_{j,1}$ is an intensity function measuring the transitions to the normal or health state, $S_1$.

It is clear from Eq. (1) that the health of the portfolio depends on the time the process stays in each state and on the transitions from each of the sub-health states $S_2$ and $S_3$ to the health state $S_1$. The larger the health index, the healthier the portfolio.
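As a small illustration of Eq. (1), the index is a weighted sum of the three intensities, with the expected durations as weights. The function name `health_index` and the input values below are hypothetical and are not taken from the bank data.

```python
def health_index(e, theta):
    """Eq. (1): H = e_1*theta_{1,1} + e_2*theta_{2,1} + e_3*theta_{3,1}.

    e[j-1]     -- expected duration of stay in state S_j
    theta[j-1] -- intensity of transitions from S_j to S_1
    """
    if len(e) != 3 or len(theta) != 3:
        raise ValueError("expected values for exactly three states")
    return sum(e_j * theta_j for e_j, theta_j in zip(e, theta))


# Hypothetical inputs, for illustration only.
e = [20.0, 7.0, 3.0]
theta = [0.03, 0.01, 0.002]
H = health_index(e, theta)
```

Raising any of the durations or intensities in this sketch raises `H`, which matches the reading that a larger index indicates a healthier portfolio.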

The expected duration of stay in a specific state is based on the Markov transition intensity matrix $V$, shown in Fig. 1.

Figure 1: Transition intensity matrix, $V$

$$
V = \begin{array}{c|ccc}
 & S_1 & S_2 & S_3 \\ \hline
S_1 & v_{11} & v_{12} & v_{13} \\
S_2 & v_{21} & v_{22} & v_{23} \\
S_3 & v_{31} & v_{32} & v_{33}
\end{array} \tag{2}
$$

Here, $v_{11} = -(v_{12} + v_{13})$, $v_{22} = -(v_{21} + v_{23})$, and $v_{33} = -(v_{31} + v_{32})$.

The transition intensities are defined as (Chiang, 1980):

$$v_{ij}\,\Delta t = \Pr\{\text{an individual in state } S_i \text{ at time } \tau \text{ will be in state } S_j \text{ at time } \tau + \Delta t\}, \quad i \neq j;\ i, j = 1, 2, 3$$

Furthermore, we assume that the intensities $v_{ij}$ are independent of time $\tau$ ($0 \le \tau \le t$). Thus, we are concerned here with a time-homogeneous Markov chain.

If an individual stays in its original state, its intensity is defined by $v_{ii} = -\sum_{j=1,\, j \neq i}^{3} v_{ij}$. By this definition, $1 + v_{ii}\,\Delta t = \Pr\{\text{an individual in state } S_i \text{ at time } \tau \text{ will remain in state } S_i \text{ at time } \tau + \Delta t\}$.
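The construction of the intensity matrix can be sketched as follows: the off-diagonal rates are supplied, each diagonal entry is set to minus the sum of the other entries in its row, and the short-interval probabilities follow from the two definitions above. The function names and the daily rates are hypothetical, chosen only for illustration.

```python
def intensity_matrix(off_diag):
    """Build V from off-diagonal intensities v_ij (i != j).

    off_diag maps (i, j), with states numbered 1..3 and i != j, to a
    non-negative rate; missing pairs are treated as 0. Each diagonal
    entry is set to v_ii = -sum_{j != i} v_ij, so every row sums to 0.
    """
    V = [[0.0] * 3 for _ in range(3)]
    for (i, j), rate in off_diag.items():
        if i == j or rate < 0:
            raise ValueError("need i != j and a non-negative rate")
        V[i - 1][j - 1] = float(rate)
    for i in range(3):
        V[i][i] = -sum(V[i][j] for j in range(3) if j != i)
    return V


def short_interval_probs(V, dt):
    """First-order transition probabilities over a short interval dt:
    v_ij*dt for i != j, and 1 + v_ii*dt on the diagonal."""
    return [[(1.0 if i == j else 0.0) + V[i][j] * dt for j in range(3)]
            for i in range(3)]


# Hypothetical daily rates, for illustration only.
V = intensity_matrix({(1, 2): 0.02, (2, 1): 0.05, (2, 3): 0.01, (3, 2): 0.004})
P = short_interval_probs(V, dt=1.0)
```

Each row of `V` sums to zero and each row of `P` sums to one, which is a quick consistency check on any estimated intensity matrix.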

Let $P_{ij}(\tau, t)$ be the probability that an individual in state $S_i$ at time $\tau$ will be in state $S_j$ at time $t$, $i, j = 1, 2, 3$, and let $e_j(t)$ be the expected duration of stay in state $j$. It can be shown (Chiang, 1980) that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4)

where $\pi_i$, $i = 1, 2, 3$, is the proportion of individuals in the portfolio pool who are initially in $S_i$, and $e_j$ is the expected duration of stay in state $j$ irrespective of the initial state. Here, $A_{ij}(\rho_l)$ is the $(i, j)$ cofactor of $A'(\rho_l)$, defined as

$$A'(\rho_l) = (\rho_l I - V') \tag{5}$$

where $I$ is the $3 \times 3$ identity matrix, $V'$ is the transpose of $V$, and $\rho_l$ is the $l$th eigenvalue of $V$, that is, the $l$th root of the characteristic equation $|\rho I - V'| = 0$.
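The closed-form expressions in Eqs. (3) and (4) are not reproduced here, but the same quantities can be approximated numerically. The sketch below relies on two standard facts for a time-homogeneous chain, which are assumptions relative to this paper's derivation: the transition matrix over an interval of length $s$ is the matrix exponential of $Vs$, and the expected duration in state $j$ over $(0, t)$ is the integral of the probability of being in state $j$. It is a numerical alternative, not the eigenvalue formula of Chiang (1980), and all names and input values are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(M, terms=30):
    """Matrix exponential by a truncated Taylor series.

    Adequate here because it is only applied to V*h for a small step h.
    """
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def expected_durations(V, pi, t, steps=1000):
    """Return [e_1, e_2, e_3], where
    e_j = sum_i pi_i * integral_0^t P_ij(0, s) ds (trapezoidal rule)."""
    n = len(V)
    h = t / steps
    step = expm_taylor([[v * h for v in row] for row in V])  # P over one step
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # P(0, 0) = I
    integral = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        P_next = mat_mul(P, step)
        for i in range(n):
            for j in range(n):
                integral[i][j] += 0.5 * h * (P[i][j] + P_next[i][j])
        P = P_next
    return [sum(pi[i] * integral[i][j] for i in range(n)) for j in range(n)]


# Hypothetical daily intensities and initial proportions.
V = [[-0.02, 0.02, 0.0],
     [0.05, -0.06, 0.01],
     [0.0, 0.004, -0.004]]
pi = [0.90, 0.07, 0.03]
e = expected_durations(V, pi, t=30.0)
```

A useful sanity check is that the three expected durations add up to the length of the interval (30 days here), because every loan is in exactly one state at each instant.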

In the health index of Eq. (1), $\theta_{j,1}$ measures an individual's ability to recover from the sub-health state $S_j$, $j = 2, 3$, to the health state $S_1$. For a given time period, the maximum likelihood estimate (Chiang, 1980) of $\theta_{j,1}$ is given as

$$\theta_{j,1} = \frac{\sum_{r=1}^{N} n_{j,1,r}}{\sum_{r=1}^{N} t_{j,r}}, \quad j = 1, 2, 3 \tag{6}$$

where $n_{j,1,r}$ is the number of transitions from $S_j$, $j = 1, 2, 3$, to $S_1$ by the $r$th individual. As such, $\sum_{r=1}^{N} n_{j,1,r}$ is the total number of transitions from $S_j$ to $S_1$ made by all $N$ individuals in the portfolio.

By the same reasoning, $\sum_{r=1}^{N} t_{j,r}$ is the total length of time that all individuals in the portfolio stay in $S_j$, $j = 1, 2, 3$. Therefore, from Eqs. (1), (3) and (6), the portfolio health index is given as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (7)

Let $c_i$ be the number of loans in state $i$ at the initial starting date. Thus, $\pi_i$ can be estimated as

$$\pi_i = \frac{c_i}{\sum_{k=1}^{3} c_k}, \quad i = 1, 2, 3 \tag{8}$$
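The estimators in Eqs. (6) and (8) are simple ratios, which the following sketch makes concrete. The per-loan layout of the inputs, the function names, and the sample values are hypothetical. Returning `nan` when no time was observed in a state is also an assumption, since the text does not cover that case.

```python
import math


def estimate_theta(n, t):
    """Eq. (6): theta_{j,1} = sum_r n_{j,1,r} / sum_r t_{j,r}.

    n[r][j-1] -- number of transitions from S_j to S_1 by loan r
    t[r][j-1] -- length of time loan r spent in S_j
    """
    thetas = []
    for j in range(3):
        total_n = sum(row[j] for row in n)
        total_t = sum(row[j] for row in t)
        # Assumption: the estimate is undefined if no time was spent in S_j.
        thetas.append(total_n / total_t if total_t > 0 else math.nan)
    return thetas


def estimate_pi(counts):
    """Eq. (8): pi_i = c_i / (c_1 + c_2 + c_3)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("the portfolio must contain at least one loan")
    return [c / total for c in counts]


# Hypothetical records for three loans (times in days).
n = [[0, 1, 0], [0, 0, 1], [0, 2, 0]]
t = [[50, 10, 0], [20, 25, 15], [40, 20, 0]]
theta = estimate_theta(n, t)      # [0/110, 3/55, 1/15]
pi = estimate_pi([90, 7, 3])      # initial counts in S1, S2, S3
```

In this sketch the recovery intensity from the second state is 3 transitions over 55 loan-days, and the initial proportions sum to one by construction.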

APPLICATION

Mortgage data are difficult to obtain from any bank. For this study, we were able to obtain data on retail mortgage loans over 23 one-month periods, provided by a large commercial bank in China. These data were used to estimate the health index of the loans (Eq. (7)) and to analyze its relationship to macroeconomic factors at the national and regional levels. The source for the economic factors was http://www.cnki.net/.

Our interest in this study is to demonstrate the applicability of this modeling approach to a given bank. Hence, data from one bank is deemed adequate for this purpose.

A practical method for estimating $\theta_{j,1}$ in Eq. (1) from the data over a given time period $(0, t)$ is

$$\theta^{t}_{j,1} = \frac{p_{j,1}\, N_t}{30\, N_t\, \delta_j}, \quad j = 1, 2, 3, \quad t = 1, 2, \ldots, 23 \tag{9}$$

where $\theta^{t}_{j,1}$ is the intensity function for period $t$ and $N_t = \sum_{s=1}^{3} N_{s,t}$ is the total number of retail mortgages in period $t$, so that $N_t$ counts all individuals in the three states. Also, $p_{j,1} N_t$ is the expected number of transitions from state $S_j$ to state $S_1$ made by all individual loans during period $t$, where $p_{j,1}$ is the transition probability from $S_j$ to $S_1$.

In Eq. (9), $\delta_j$ is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

We use $30\, N_t\, \delta_j$ to approximate the total length of time that all individuals in the portfolio stay in $S_j$, $j = 1, 2, 3$, during the one-month period.
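One consequence of Eq. (9) is worth spelling out. Because the portfolio size appears in both the numerator and the denominator, it cancels whenever it is positive, so the estimate depends only on the transition probability and on $\delta_j$. The numbers in the second line are hypothetical and serve only to show the arithmetic; reading the factor 30 as the number of days in a month is an assumption.

```latex
\theta^{t}_{j,1}
  = \frac{p_{j,1}\, N_t}{30\, N_t\, \delta_j}
  = \frac{p_{j,1}}{30\, \delta_j},
  \qquad N_t > 0,\ \delta_j > 0 .

% Hypothetical values: p_{2,1} = 0.3 and \delta_2 = 0.05
\theta^{t}_{2,1} = \frac{0.3}{30 \times 0.05} = \frac{0.3}{1.5} = 0.2 .
```

Under that reading, the result is a rate per day of exposure in the state.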

Regression Model

Macroeconomic factors play an important role in the performance of a mortgage loan portfolio. The health index, as a measure of performance, is more useful if it is linked to macroeconomic factors that enable bank management to forecast the quality or performance of its mortgage portfolio.

Several studies in the literature have considered the use of macroeconomic factors to predict the future health status of different industries. Liu et al. (2011) used a state space time series model to analyze the sensitivities of industrial production indices (including banking) to macroeconomic factors such as GDP, interest rate, unemployment, inflation, and disposable personal income. Ludvigson & Ng (2009) used regression and principal component methodology to analyze the relationship between bond risk and macroeconomic factors. Studies along this line were also undertaken by Bai & Ng (2008), Forni et al. (2005), and Boivin & Ng (2005).

In the present study, we have a pool of candidate regressors or independent variables (Table 1) and the problem is to determine the subset of regressors that significantly affect the health index for inclusion in the model. Finding an appropriate subset of regressors to include in the model is called variable selection. The stepwise and backward elimination procedures are two recommended procedures for determining the subset regression model (Montgomery et al., 2001). The final model chosen should satisfy the following criteria:

It should have a fairly high R-squared value, normally distributed residuals, no outliers, no multicollinearity among the regressors (variance inflation factor, VIF, less than 10), and fairly good predictive performance.

Using the software package SAS, we ran stepwise and backward elimination on the national and regional data separately because of the large number of independent variables relative to the sample size. We combined the significant national and regional variables from the stepwise and backward elimination to come up with one model. Applying the above criteria for model selection, the following subset model was selected as being the best model for the available data:

$$H_i = 5.572 + 0.01819\,X_1 + 0.00396\,X_2 - 0.04162\,X_3 \tag{10}$$

Here, $H_i$ is the health index of the mortgage portfolio, $X_1$ is the GDP rate of increase, $X_2$ is the Chinese currency rate of increase, and $X_3$ is the housing rental index. All three independent variables have the expected sign. For this model, the residuals were normally distributed, there were no outliers or influential observations, and multicollinearity was not significant (VIF less than 7 for all three variables). All independent variables were highly significant (p-values less than 0.002). The R-squared value is 0.9173, which is fairly high, and the adjusted R-squared is 0.9043.
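The reported adjusted R-squared can be checked against the usual adjustment formula, using the 23 monthly observations and the three regressors of Eq. (10). The formula is the standard textbook definition and is assumed, rather than stated, to be the one the software applied.

```latex
R^2_{\text{adj}}
  = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
  = 1 - (1 - 0.9173)\,\frac{23 - 1}{23 - 3 - 1}
  = 1 - 0.0827 \times \frac{22}{19}
  \approx 0.904 .
```

This agrees with the reported value of 0.9043 to three decimal places; the small remaining gap is consistent with rounding of the R-squared value.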

Mortgage data are difficult to obtain. For this analysis we had 23 monthly observations (December 2006 to October 2008) from a large commercial bank in China. Because of the small sample size, it was not possible to check the predictive performance of the model by splitting the data into two samples, since one would need 15-20 observations for a reliable assessment of predictive performance. In this case, an alternative to data splitting is the PRESS statistic (Montgomery et al., 2001):

$$\mathrm{PRESS} = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{(i)}\right)^2 \tag{11}$$

Here, $Y_i$ is the $i$th observation and $\hat{Y}_{(i)}$ is the predicted value of the $i$th observation when the model is fitted to the remaining $n - 1$ observations (that is, with the $i$th observation deleted). The predictive R-squared for this model was calculated as:

$$R^2_{\text{pred}} = 1 - \frac{\mathrm{PRESS}}{\text{Total Sum of Squares}} \tag{12}$$

The predictive R-squared from Eq. (12) for the model in Eq. (10) was 0.86, which indicates that the model would be expected to explain about 86% of the variability in new observations. The predictive performance of the model is therefore fairly good. Such a model may be updated as more data become available in order to predict future portfolio performance.
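The leave-one-out calculation behind Eqs. (11) and (12) can be sketched directly: refit the regression with each observation deleted in turn, predict the deleted observation, and accumulate the squared prediction errors. The code below is a minimal standard-library illustration using ordinary least squares through the normal equations. The function names and the small synthetic data set are hypothetical; it does not reproduce the SAS analysis or the bank data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(XtX, Xty)


def predict(beta, x):
    """Fitted value for one observation x given coefficients beta."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def press_and_r2_pred(X, y):
    """Eqs. (11)-(12): leave-one-out PRESS and predictive R-squared."""
    n = len(y)
    press = 0.0
    for i in range(n):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta, X[i])) ** 2
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return press, 1.0 - press / ss_total


# Synthetic data, for illustration only: two regressors, six observations.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [2.05, 2.72, 2.50, 3.30, 2.98, 3.75]
press, r2_pred = press_and_r2_pred(X, y)
```

Refitting the model once per observation is affordable for a sample of 23 months. For larger samples the same statistic is usually obtained from the residuals and leverages of a single fit.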

This modeling approach can be used by any bank to gauge the effect of economic factors on the health index, which serves as an indicator of the performance of its mortgage portfolio or other portfolios.

CONCLUSION

The modeling approach in this study provides the bank with a health index to assess the performance of its mortgage portfolio. The mortgage health index was related to economic factors in order to predict its behavior. Among all of the factors studied, only three had significant effects on the health index. These were the GDP rate of increase, the currency rate of increase, and the housing rental index at the national level in China. These three variables explained more than 90% of the variability in the sample data of 23 monthly observations. The predictive performance of the model was fairly good: it could explain 86% of the variability in predicting new observations not included in the original data.

This modeling approach to measure and predict the behavior of the health index of a mortgage loan portfolio is useful for the management of a bank in assessing the risk of a portfolio.

REFERENCES

Bai, J. & S. Ng (2008). Forecasting economic time series using targeted predictors. Journal of Econometrics, 146, 304-317.

Boivin, J. & S. Ng (2005). Understanding and Comparing Factor-Based Forecasts. International Journal of Central Banking, 1, 117-152.

Burkhard, J. & E. De Giorgi (2006). An Intensity-Based Non-parametric Default Model for Residential Mortgage Portfolios. Journal of Risk, 8, 57-95.

Chiang, C.L. (1980). An Introduction to Stochastic Processes and Their Applications. R.E. Krieger Publishing.

Cooper, D. & E. Wood (1981). Estimation of the parameters of the Markovian representation of the autoregressive-moving average model. Biometrika, 68, 320-322.

Cyert, R., J. Davidson, & G. Thompson (1962). Estimation of the Allowance for Doubtful Accounts by Markov Chains. Management Science, 8, 3-19.

D'Amico, G., J. Janssen, & R. Manca (2005). Homogeneous semi-Markov reliability models for credit risk management. Decisions in Economics and Finance, 28, 79-93.

Deng, Y., J. M. Quigley, & R. Van Order (2000). Mortgage Termination, Heterogeneity and the Exercise of Mortgage Options. Econometrica, 68, 275-307.

Douglas, S., S. Sanchez, & L. Edward (1996). A Comprehensive Model for Managing Credit Home Mortgage Portfolio. Decision Sciences, 27, 291-317.

Esbitt, M. (1986). Bank Portfolios and Bank Failures During the Great Depression. The Journal of Economic History, 46, 455-462.

Forni, M., M. Hallin, M. Lippi, & L. Reichlin (2005). The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting. Journal of the American Statistical Association, 100, 830-840.

Gambera, M. (2000). Simple Forecasts of Bank Loan Quality in the Business Cycle. Emerging Issues, Federal Reserve Bank of Chicago, April 2000 (S&R-2000-3).

Green, J. & J. Shoven (1986). The Effects of Interest Rates on Mortgage Prepayments. Journal of Money, Credit and Banking, 18, 41-59.

Grinold, R. (1983). Market value maximization and Markov dynamic programming. Management Science, 14, 23-145.

Hauswald, R. & R. Marquez (2004). Loan-Portfolio Quality and the Diffusion of Technological Innovation. FDIC Working Paper No. 2004-02.

Hayre, S., M. Saraf, R. Young, & J. Chen (2008). Modeling of Mortgage Default. Journal of Fixed Income, 17, 6-31.

Lee, Lung-Fei (1997). Simulation estimation of dynamic switching regression and dynamic disequilibrium models: some Monte Carlo results. Journal of Econometrics, 78, 179-184.

Liu, C., M. Hassan, & R. Nassar (2010). A Markov Chain Modeling Approach for Predicting a Retail Mortgage Health Index. Academy of Accounting & Financial Studies Journal, 14 (3), 101-112.

Liu, C., M. Hassan, & R. Nassar (2011). Time Series Analysis of the Sensitivity of Net Incomes of Industrial Sections to Macroeconomics Factors. International Journal of Business and Economics Perspectives. In press.

Ludvigson, S. & S. Ng (2009). Macro Factors in Bond Risk Premia. Review of Financial Studies, 22, 5027-5067.

McNulty, J., A. Aigbe, & J. Verbrugge (2001). Small bank loan quality in a deregulated environment: the information advantage hypothesis. Journal of Economics and Business, 53, 325-339.

Montgomery, D., E. Peck, & G. Vining (2001). Introduction to Linear Regression Analysis. John Wiley, New York.

Pennington-Cross, A. (2008). The Duration of Foreclosures in the Subprime Mortgage Market: A Competing Risk Model Mixing. Journal of Real Estate Finance and Economics, 40, 109-129.

SAS Online Doc, 2005 version, SAS/ETS User's Guide.

Schwartz, E. & W. Torous (1993). Mortgage Prepayment and Default Decision: Poisson Regression Approach. Journal of American Real Estate and Urban Association, 21, 31-44.

Soyer, R. & X. Feng (2010). Assessment of Mortgage Default Risk via Reliability Models. Applied Stochastic Models in Business and Industry, 26, 308-330.

Wei, W. (1990). Time Series Analysis: Univariate and Multivariate Methods. Addison-Wesley Publishing.

White, J. (1993). A Survey of Applications of Markov Decision Process. The Journal of the Operational Research Society, 44, 11-20.

Morsheda Hassan, Wiley College

Chang Liu, Southern West University of Finance and Economics, China

Raja Nassar, Louisiana Tech University
Table 1: Macroeconomic factors, at the national and regional levels in China, used for developing the empirical model

```
Factor                                    National    Regional

GDP rate of increase                      X
M1 Currency rate of increase              X
CPI Index                                 X           X
CPI-Living index                          X           X
Construction Material Price Index         X           X
Housing Sales Index                       X
Housing Development Index                 X
Housing Sale Amount                       X
HPI                                       X           X
Housing Rental Index                      X           X
Community Management Fee Index            X
Export Rate of Increase                   X
Metro Family Monthly Income               X           X
Metro Family Disposable Income            X           X
One-year mortgage Prime Rate
GDP                                                   X
CPI-Food Index                                        X
Food Sale Index                                       X
Living Material Sales Amount                          X
Number Employed                                       X
Construction Investment Price Index                   X
Metro Family Living Expenditures                      X
```
COPYRIGHT 2012 The DreamCatchers Group, LLC
No portion of this article can be reproduced without the express written permission from the copyright holder.
Hassan, Morsheda; Liu, Chang; Nassar, Raja. Academy of Banking Studies Journal, Jan 1, 2012.