15 Exchange Rate Prediction, Time Series Model


1 Introduction

Solutions of optimal investment problems, considered in the previous chapter, depend on predicted stock rates. Predictions are needed in most investment problems, because the result of an investment depends on future values of various parameters. Predictions are an important part of scheduling problems, too. The optimal schedules depend on the predicted demand and supply. Time series models are common prediction tools. Here we investigate a well-known time series model, the so-called Auto Regressive Moving Average (ARMA) model. We briefly consider the Artificial Neural Network (ANN) and the Bilinear models, too.

Mainly, financial data is predicted. The prediction of call rates using the same models is briefly mentioned, just for comparison. The call rate prediction is described in detail in the context of call center optimization problems in Chapter 16. There the traditional ARMA model is supplemented by an expert model.

Modeling economic and financial time series by ARMA has attracted the attention of many researchers in recent years [Diebold and Rudebusch, 1989, Cheung, 1993, Yin-Wong and Lai, 1993, Cheung and Lai, 1993, Koop et al., 1994, Mockus and Soofi, 1995]. Three approaches have been used to estimate the parameters of ARMA models: Maximum Likelihood (ML) [Sowel, 1992], approximate ML [Li and McLeod, 1986, Fox and Taqqu, 1986, Hosking, 1981, Hosking, 1984], and two-step procedures [Geweke and Porter-Hudak, 1983, Janacek, 1982]. In all these cases local optimization techniques were applied. Then the optimization results depend on the initial values, which implies that one cannot be sure that the global maximum is found.

Global optimization is very difficult in many cases[*]. The reason is the high complexity of multi-modal optimization problems. It is well known [Ko, 1991] that optimization of polynomial-time computable real functions cannot be done in polynomial time, unless $P=NP$[*]. In practice, this means that we need an algorithm of exponential time to obtain the $\epsilon$-exact solution. The number of operations in exponential algorithms grows exponentially with the accuracy $m$ of the solution and the dimension $n$ of the optimization problem. The accuracy $m$ means that $\epsilon \le 2^{-m}$. The dimension $n$ means that one optimizes a function $f(x)$, where $x=(x_1,...,x_n)$.

The Least Squares (LS) method is a popular approach to estimate parameters of ARMA models. Using LS one minimizes the log-sum of squared residuals of ARMA models and their extensions [Mockus and Soofi, 1995]. In this chapter, the multi-modality problems are considered using different data. The data include:
daily exchange rates of $/£ and DM/$,
closing rates of stocks of AT&T, Intel Co, and some Lithuanian banks,
the London stock exchange index [Raudys and Mockus, 1999],
call rates of a commercial call center.
The graphical images and comparison of the average prediction results of ARMA and the Random Walk (RW) models are presented.


2 Auto Regressive Moving-Average Models (ARMA)

1 Definitions

We define ARMA model as

\begin{eqnarray}
w_{t} = \sum_{i=1}^p a_i w_{t-i} +\sum_{i=1}^q b_i \epsilon_{t-i}
+\epsilon_t.
\end{eqnarray}


We assume that

\begin{eqnarray}
z_{t-i}=0,\
w_{t-i}=0,\ \epsilon_{t-i}=0,\ \ if\ \ t \le i.
\end{eqnarray}


2 Definition of Residuals

An advantage of residual minimization is that one may see directly how the objective depends on unknown parameters. Using equalities (15.1) we define residuals by recurrent expressions

\begin{eqnarray}
\epsilon_1 &=& w_1, \\
\nonumber \epsilon_2 &=& w_2-a_1 w_1 - b_1 \epsilon_1, \\
\nonumber &\vdots& \\
\epsilon_t &=& w_t-a_1 w_{t-1} - ... -a_p w_{t-p} - b_1
\epsilon_{t-1} - ... - b_q \epsilon_{t-q}.
\end{eqnarray}


Next the sum

\begin{eqnarray}
f(x)=\log f_m(x),\ \ f_m(x)= \sum_{t=1}^T \epsilon_t^2
\end{eqnarray}


is minimized. The logarithm is there to decrease the objective variation by improving the scales.
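For illustration, a minimal C++ sketch of the recurrent residuals above and of the log-sum objective may look as follows. The function names are illustrative only and are not part of the ARMAC software; values with indices $t \le i$ are taken as zero, as assumed in (15.2).

\begin{verbatim}
#include <cmath>
#include <vector>

// Residuals of the ARMA(p,q) model (15.1); terms with non-positive indices vanish.
std::vector<double> armaResiduals(const std::vector<double>& w,
                                  const std::vector<double>& a,   // a_1,...,a_p
                                  const std::vector<double>& b) { // b_1,...,b_q
    int T = w.size(), p = a.size(), q = b.size();
    std::vector<double> eps(T, 0.0);
    for (int t = 0; t < T; ++t) {
        double e = w[t];
        for (int i = 1; i <= p && t - i >= 0; ++i) e -= a[i - 1] * w[t - i];
        for (int j = 1; j <= q && t - j >= 0; ++j) e -= b[j - 1] * eps[t - j];
        eps[t] = e;
    }
    return eps;
}

// The minimized objective: logarithm of the sum of squared residuals.
double logSumSquares(const std::vector<double>& eps) {
    double s = 0.0;
    for (double e : eps) s += e * e;
    return std::log(s);
}
\end{verbatim}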


3 Minimization of Residuals of ARMA Models

Here we consider an algorithm for optimization of parameters of the ARMA model. This model is simple and may be considered as a good first approximation. Denote by $y_t$ the value of $y$ at the time $t$. Denote by $a=(a_1,...,a_p)$ a vector of auto-regression (AR) parameters, and by $b=(b_1,...,b_q)$ a vector of moving-average (MA) parameters.

\begin{eqnarray}
y_t-\sum_{i=1}^p a_i y_{t-i}= \epsilon_t - \sum_{j=1}^q b_j
\epsilon_{t-j},\ t=1,...,T.
\end{eqnarray}


The residual

\begin{eqnarray}
\epsilon_t = y_t-\sum_{i=1}^p
a_i y_{t-i}+ \sum_{j=1}^q b_j \epsilon_{t-j},
\end{eqnarray}


or

\begin{eqnarray}
\epsilon_t = B_t+\sum_{i=1}^p a_i A_t(i).
\end{eqnarray}


Here

\begin{eqnarray}
B_t=y_t+\sum_{j=1}^q b_j B_{t-j-1}
, \end{eqnarray}


and

\begin{eqnarray}
A_t(i) = -y_{t-i-1} + \sum_{j=1}^q b_j
A_{t-j-1}(i),
\end{eqnarray}


where $t-i > 0$ and $t-j >0$.

1 Optimization of AR Parameters

Denote

\begin{eqnarray}
S(a,b) =\sum_{t=1}^T \epsilon_t^2,
\end{eqnarray}


where $a=(a_1,...,a_p)$ and $b=(b_1,...,b_q)$. From expressions (15.11) and (15.8) the minimum condition is

\begin{eqnarray}
{\partial S(a,b) \over \partial a_j} = 2
\sum_{t=1}^T \epsilon_t A_t(j)=0,\ j=1,...,p,
\end{eqnarray}


or

\begin{eqnarray}
\sum_{i=1}^p A(i,j) a_i = -B(j),\
j=1,...,p.
\end{eqnarray}


Here

\begin{eqnarray}
A(i,j) =\sum_{t=1}^T A_t(i) A_t(j),
\end{eqnarray}


and

\begin{eqnarray}
B(j) =\sum_{t=1}^T A_t(j) B_t.
\end{eqnarray}


The minimum of expression (15.11) at fixed parameters $b$ is defined by the system of linear equations

\begin{eqnarray}
a(b)= A^{-1} B.
\end{eqnarray}


Here matrix $A=(A(i,j),\ i,j=1,...,p)$ and vector $B=(B(j), \
j=1,...,p)$, where the elements $A(i,j)$ are from (15.14), the components $B(j)$ are from (15.15), and $A^{-1}$ is the inverse of the matrix $A$. This way one defines the vector $a(b)=(a_i(b),\ i=1,...,p)$ that minimizes sum (15.11) at fixed parameters $b$.
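A minimal C++ sketch of this least squares step is given below. It assumes the index convention $B_t=y_t+\sum_j b_j B_{t-j}$ and $A_t(i)=-y_{t-i}+\sum_j b_j A_{t-j}(i)$, which keeps $\epsilon_t=B_t+\sum_i a_i A_t(i)$ consistent with the residual definition above; the function name and the plain Gaussian elimination are illustrative only and are not the ARMAC implementation.

\begin{verbatim}
#include <cmath>
#include <utility>
#include <vector>

// Given the MA parameters b, build A_t(i) and B_t, form the normal equations
// (15.13) with coefficients A(i,j) from (15.14) and B(j) from (15.15),
// and solve them by Gaussian elimination with partial pivoting.
std::vector<double> estimateAR(const std::vector<double>& y,
                               const std::vector<double>& b, int p) {
    int T = y.size(), q = b.size();
    std::vector<double> B(T, 0.0);
    std::vector<std::vector<double>> A(T, std::vector<double>(p, 0.0));
    for (int t = 0; t < T; ++t) {
        B[t] = y[t];
        for (int j = 1; j <= q && t - j >= 0; ++j) B[t] += b[j - 1] * B[t - j];
        for (int i = 1; i <= p; ++i) {
            double v = (t - i >= 0) ? -y[t - i] : 0.0;
            for (int j = 1; j <= q && t - j >= 0; ++j) v += b[j - 1] * A[t - j][i - 1];
            A[t][i - 1] = v;
        }
    }
    // Augmented matrix of the normal equations: sum_i A(i,j) a_i = -B(j).
    std::vector<std::vector<double>> M(p, std::vector<double>(p + 1, 0.0));
    for (int i = 0; i < p; ++i) {
        for (int j = 0; j < p; ++j)
            for (int t = 0; t < T; ++t) M[j][i] += A[t][i] * A[t][j];
        for (int t = 0; t < T; ++t) M[i][p] -= A[t][i] * B[t];
    }
    // Gaussian elimination with partial pivoting.
    for (int k = 0; k < p; ++k) {
        int piv = k;
        for (int r = k + 1; r < p; ++r)
            if (std::fabs(M[r][k]) > std::fabs(M[piv][k])) piv = r;
        std::swap(M[k], M[piv]);
        for (int r = k + 1; r < p; ++r) {
            double f = M[r][k] / M[k][k];
            for (int c = k; c <= p; ++c) M[r][c] -= f * M[k][c];
        }
    }
    std::vector<double> a(p, 0.0);
    for (int k = p - 1; k >= 0; --k) {
        double s = M[k][p];
        for (int c = k + 1; c < p; ++c) s -= M[k][c] * a[c];
        a[k] = s / M[k][k];
    }
    return a;   // a(b) = (a_1,...,a_p) minimizing sum (15.11) at fixed b
}
\end{verbatim}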

2 Optimization of MA Parameters

The sum of squared residuals (15.11) is a non-linear non-convex function of parameters $b$. Thus, we have to consider the global optimization algorithms. Denote

\begin{eqnarray}
f(x)=\log S(a(x),x).
\end{eqnarray}


Here $x=b$ and $S(a,b)$ is from (15.11) at optimal parameter $a=a(b)$. Denote

\begin{eqnarray}
b^0 =x^0 =arg
\min_x f(x).
\end{eqnarray}



3 Predicting "Next-Day" Rate

Predicting "Next-Day" Rate, one minimizes the expected "next-day" squared deviation $\epsilon_{t+1}$ using the data available at the moment $t$

\begin{eqnarray}
y_{t+1}^0=arg \min_{y_{t+1}} {\bf E}\
\epsilon_{t+1}^2.
\end{eqnarray}


Here

\begin{eqnarray}
{\bf E} \epsilon_{t+1}^2={\bf E}\
(B_{t+1}+\sum_{i=1}^p A_{t+1}(i) a_i(b^0))^2,
\end{eqnarray}


where the optimal parameter $b^0$ was obtained using the data available at the day $t$. Variance (15.20) is minimal, if

\begin{eqnarray}
y_{t+1}^0= B_{t+1}+\sum_{i=1}^p A_{t+1}(i)
a_i(b^0),
\end{eqnarray}


because the expectation of $y_{t+1}$ is $B_{t+1}+\sum_{i=1}^p
A_{t+1}(i) a_i(b^0)$ under the assumptions.

4 Evaluation of ARMA Prediction Errors

We compare the "next-day" ARMA prediction results and a simple Random Walk (RW) model. In RW model the next day value is $y_{t+1}=y_t+\epsilon_{t+1}$ and the predicted value is equal to the conditional expectation of $y_{t+1}=y_t$. This means that the present value is the RW prediction for the next day. There are three reasons to consider RW models Therefore, the difference of average prediction errors of ARMA and RW models can be regarded as some parameter of "ARMA-unpredictability" of the data. That means that if this difference is zero or negative, then the time series are not predictable by the ARMA model. Table 15.1 shows the difference between the mean square deviations of ARMA and RW models.

Table 15.1: The average "next-day" prediction results of ARMA and RW models.

  Data                    DeltaARMA %      MeanARMA        VarARMA
  $/£                     -1.779090e-01    1.293609e+00    8.454827e-02
  DM/$                    -1.191e-02       1.092e-01       9.985e-02
  Yen/$                   -1.086e+00       6.369e+00       6.446505e+00
  Fr/$                    -3.029e-01       4.285e-01       3.395e-01
  AT&T                    -1.375e+00       4.554e+00       3.621e+00
  Intel Co                +2.814e-01       2.052e+01       1.936e+00
  Hermis Bank             -4.280e+01       2.374e+01       1.998e+01
  London Stock Exchange   -5.107e-01       2.751e+02       2.346e+01
  Call Center             +3.076e+01       8.453e+02       7.111e+02

The table uses the four data sets listed in the introduction. In Table 15.1, the symbol $MeanARMA$ denotes the average prediction error of the ARMA model. The symbol $VarARMA$ means the variance of ARMA predictions. The symbol $DeltaARMA \%$ denotes the relation
$(RW-ARMA)/RW$ in percent. Here $RW$ denotes the average error of the Random Walk model. $DeltaARMA\%>0$ means that the ARMA model predicts better.

The data is divided into three equal parts.
The first part is to estimate parameters $a$ and $b$ of an ARMA model using fixed numbers $p$ and $q$.
The best values of $p$ and $q$ are defined using the second part of data.
The third part is to compare ARMA and RW models. The table shows the comparison results.

Table 15.1 shows that the ARMA model predicts all the financial data no better than RW. However, ARMA predicts call rates thirty-one percent better than RW. That is a statistically significant difference. The observed deviations between the RW and ARMA models for the financial data are too small for practical conclusions.
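A sketch of the comparison measure of Table 15.1 is given below. The averaging rule is an assumption of this sketch; it uses the mean squared "next-day" errors of the two models, and the function name is illustrative only.

\begin{verbatim}
#include <cstddef>
#include <vector>

// DeltaARMA% = (RW - ARMA)/RW * 100, where RW and ARMA are taken here as the
// mean squared "next-day" prediction errors of the two models.
double deltaArmaPercent(const std::vector<double>& errRW,
                        const std::vector<double>& errARMA) {
    double rw = 0.0, arma = 0.0;
    std::size_t n = errRW.size();
    for (std::size_t t = 0; t < n; ++t) {
        rw   += errRW[t] * errRW[t];
        arma += errARMA[t] * errARMA[t];
    }
    rw /= n; arma /= n;
    return (rw - arma) / rw * 100.0;   // positive => ARMA predicts better than RW
}
\end{verbatim}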

Figure 15.1: The optimal parameters of the ARMA model predicting call rates.
(Parameter listing, truncated in the source: b[0] = -2.888616e-01, a[0] ..., a[10] = -1.432491e+03, a[11] = 8.030343e-01.)
Figure 15.1 shows optimal parameters $b$ and $a$. Here the optimal $q=1$ and $p=12$. The "multi-day" predictions of call rates are shown in Figure 15.2.
Figure 15.2: Multi-day predictions of call rates.


4 External Factors

Often we have to predict one main factor that depends on some external factors. This means that we are not interested in the future values of the external factors while predicting the main one. For example, we predict the stock rate as the main factor depending on the GNP as the external factor. We consider the cases when the future values of all the factors are unknown. In the cases when future values of external factors are known[*], the model should be modified accordingly.

Different symbols are used in ARMA models with external factors. The predicted value is denoted by $\nu(t)$ and the external factors are $\eta(t)=(\eta_1(t),\eta_2(t),...)$. An influence of some external factors may be delayed.

For illustration, consider two-dimensional case, omitting the Moving Average (MA) part. This means the two-dimensional Auto Regressive (AR) model with an external factor $\eta(t)$

\begin{eqnarray}
\nu(t)=\sum_i^p (a_{1i} \nu(t-i)+a_{2i}
\eta(t-d-i))+\epsilon(t),\ p+d < t < T_0.
\end{eqnarray}


Here $d$ is the delay parameter. First we minimize the squared deviation $\sum_t \epsilon(t)^2$ at fixed delay $d$ and then choose the optimal delay $d$. The minimum of sum (15.22), as a function of the parameters $a$, is defined by a system of linear equations. These are obtained from the condition that all the partial derivatives are zero

\begin{eqnarray}
\sum_{t=p+d+1}^{T_0-1} ( \nu(t)-\sum_i^p (a_{1i} \nu(t-i)+a_{2i}
\eta(t-d-i)))\ \nu(t-i) =0, \\
\sum_{t=p+d+1}^{T_0-1} ( \nu(t)-\sum_i^p (a_{1i} \nu(t-i)+a_{2i}
\eta(t-d-i)))\ \eta(t-d-i)
=0,\ i=1,...,p.
\end{eqnarray}


We have to solve $2p$ linear equations with $2p$ variables $a_{1i},a_{2i},\ i=1,...,p$ and obtain the least squares estimates $a_{1i}(d),a_{2i}(d),\ i=1,...,p$ at given $d$.

Sum (15.25), as a function of the delay parameter $d$, is not necessarily uni-modal. Thus, to define the exact minimum in the model

\begin{eqnarray}
\nu(t)=\sum_i^p (a_{1i} (d)\nu(t-i)+a_{2i}(d)
\eta(t-d-i))+\epsilon(t),
\\
p+d < t < T_0. \nonumber
\end{eqnarray}


one should consider all the $K(d)$ values of the integer $d$. Here $K(d)$ is the number of "interesting" values of $d$. Therefore, expression (15.25) is an example of a multi-modal function in AR models. We mention the delay time just to show a way to include this important factor into a time series model. Later the delay factor is omitted, to simplify the expressions.
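A minimal sketch of this delay search, written for the special case $p=1$ of (15.22), may look as follows; the structure and function names are illustrative only. For a strictly fair comparison of different delays one may start all sums at the largest delay considered.

\begin{verbatim}
#include <limits>
#include <vector>

// nu(t) = a1*nu(t-1) + a2*eta(t-d-1) + eps(t): for every candidate delay d the
// least squares estimates a1(d), a2(d) are found from the 2x2 normal equations,
// and the delay with the smallest residual sum of squares is kept.
struct DelayFit { int d; double a1, a2, sse; };

DelayFit searchDelay(const std::vector<double>& nu,
                     const std::vector<double>& eta, int dMax) {
    DelayFit best{0, 0.0, 0.0, std::numeric_limits<double>::max()};
    int T = nu.size();
    for (int d = 0; d <= dMax; ++d) {
        double s11 = 0, s12 = 0, s22 = 0, s1y = 0, s2y = 0;
        for (int t = d + 1; t < T; ++t) {           // t-1 >= 0 and t-d-1 >= 0
            double x1 = nu[t - 1], x2 = eta[t - d - 1], y = nu[t];
            s11 += x1 * x1; s12 += x1 * x2; s22 += x2 * x2;
            s1y += x1 * y;  s2y += x2 * y;
        }
        double det = s11 * s22 - s12 * s12;
        if (det == 0.0) continue;
        double a1 = (s1y * s22 - s2y * s12) / det;  // Cramer's rule for the 2x2 system
        double a2 = (s2y * s11 - s1y * s12) / det;
        double sse = 0.0;
        for (int t = d + 1; t < T; ++t) {
            double e = nu[t] - a1 * nu[t - 1] - a2 * eta[t - d - 1];
            sse += e * e;
        }
        if (sse < best.sse) best = {d, a1, a2, sse};
    }
    return best;
}
\end{verbatim}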


1 Missing Data

Suppose that, predicting $\nu(t)$, we do not know some previous values $\nu(s),\ s<t$. Then we replace the missing data by the expected value $\hat{\nu(s)}$ defined as

\begin{eqnarray}
\hat{\nu(s)}=\sum_i^p (a_{1i} \nu(s-i)+a_{2i} \eta(s-i)),\ p < t
< T_0.
\end{eqnarray}


The least squares estimates of the regression parameters $a_{1i},\ a_{2i}$ are obtained using observations before the missing one and minimizing expression (15.22). The same idea can be extended, if two or more values of the factor $\nu$ are missing.

This algorithm is to "fill" the missing data only for the predicted factor $\nu$. We replace missing values of the external factor $\eta$ by the nearest previous value, because we do not predict the external factors in this model.


5 Applying ARMA to External Factors

Omitting the delay $d$ in expression (15.22) we write

\begin{eqnarray}
\nu(t)=\sum_i^p (a_{1i} \nu(t-i)+a_{2i} \eta(t-i))+\epsilon(t),\
p < t < T_0.
\end{eqnarray}


Including this expression into the ARMA model (15.6) one obtains the following expression

\begin{eqnarray}
\nu(t)=\sum_i^p (a_{1i} \nu(t-i)+a_{2i} \eta(t-i))+\nonumber \\
\sum_{i=1}^q b_{i} \epsilon_{t-i} +\epsilon_t ,\ p < t < T_0.
\end{eqnarray}


Extending expression (15.28) to $M$ external factors

\begin{eqnarray}
w^M(t)= \sum_{m=1}^{M}\sum_i^p a_{i}^m w^m(t-i) +\nonumber \\
\sum_{i=1}^q b _{i} \epsilon_{t-i} +\epsilon_t ,\ p < t <
T_0.
\end{eqnarray}


In expression (15.29), the predicted and the external factors are denoted by the same letter $w$ with different upper indices. All the factors are represented as one-dimensional time series of the special type

\begin{eqnarray}
w(Mt)=\sum_{m=1}^{M} \sum_i^p \alpha_{Mi-(M-m)} w(M(t-i)-(M-m))+ \nonumber \\
\sum_{i=1}^q b_i \epsilon_{M(t-i)} +\epsilon_{Mt}
,\ p < t < T_0.
\end{eqnarray}


Here $\alpha_{Mi-(M-m)}=a_{i}^m,\ w(M(t-i)-(M-m))=w^m(t-i),\
m=1,...,M$, and the index $Mi=M*i$. Using this expression one applies the software developed for the one-dimensional ARMA model to $M$ external factors. The data file should correspond to expression (15.30).
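A small sketch of this rearrangement is given below; it only shows how $M$ factor columns are written as the single series used by the one-dimensional software, with the main (predicted) factor in the last position of each block of $M$ values. The function name is illustrative only.

\begin{verbatim}
#include <vector>

// Interleave M factors observed at times t = 1,...,T into one series of length
// M*T so that w(M*t-(M-m)) = w^m(t), i.e. each time step contributes M
// consecutive values and the main factor w^M(t) closes its block.
std::vector<double> interleaveFactors(
        const std::vector<std::vector<double>>& factors /* [m][t] */) {
    std::size_t M = factors.size(), T = factors[0].size();
    std::vector<double> w(M * T);
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t m = 0; m < M; ++m)
            w[M * t + m] = factors[m][t];   // 0-based image of position M*t-(M-m)
    return w;
}
\end{verbatim}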


6 Auto Regressive Models (AR-ABS)

The method of least squares is very sensitive to large deviations. For example, minimizing squared deviations, a single large deviation of one hundred has the same influence as ten thousand small deviations, each equal to one. Therefore, replacement of squares by absolute values is beneficial in some cases.

1 Definitions

We define AR-ABS model as

\begin{eqnarray}
w_{t} = \sum_{i=1}^p a_i w_{t-i}
+\epsilon_t.
\end{eqnarray}


We assume that

\begin{eqnarray}
z_{t-i}=0,\ w_{t-i}=0,\ \epsilon_{t-i}=0,\ \ if\ \ t \le i.
\end{eqnarray}


2 Definition of Residuals

Using equalities (15.31) we define residuals by expressions

\begin{eqnarray}
\epsilon_1 &=& w_1, \nonumber \\
\nonumber \epsilon_2 &=& w_2-a_1 w_1, \\
\nonumber &\vdots& \\
\epsilon_t &=& w_t-a_1 w_{t-1} - ... -a_p w_{t-p}.
\end{eqnarray}


Next the sum

\begin{eqnarray}
f(x)= \sum_{t=1}^T \vert\epsilon_t\vert
\end{eqnarray}


is minimized. This explains the acronym AR-ABS meaning that one minimizes a sum of absolute values of residuals.


3 Minimization of Residuals of AR-ABS Models

To minimize $f(x)$ we apply the Linear Programming.

\begin{eqnarray}
\min_{a,u} \sum_{t=1}^T u_t,
\\
u_t \ge \epsilon_t,\ t=1,...,T, \\
u_t \ge -\epsilon_t,\ t=1,...,T, \\
u_t \ge 0,\ t=1,...,T.
\end{eqnarray}


Here
$u=(u_t^1,u_t^2, \ t=1,...,T)$, $a=(a_i^1,a_i^2, \ i=1,...,p)$,
$u_t=u_t^1-u_t^2,\ t=1,...,T$, $a_i=a_i^1-a_i^2,\ i=1,...,p$,
$u_t^k \ge 0,\ a_i^k \ge 0,\ k=1,2$. The advantage of AR-ABS is its lesser sensitivity to large deviations.


4 Applying AR-ABS to External Factors

Applying AR-ABS models to external factors is similar to applying ARMA models, as described in Section 5. One just omits the MA parameters $b_i$. Then, from (15.29), one can write

\begin{eqnarray}
w^M(t)= \sum_{m=1}^{M}\sum_i^p a_{i}^m w^m(t-i) +\epsilon_t ,\ p < t <
T_0.
\end{eqnarray}


In expression (15.39), the predicted and the external factors are denoted by the same letter $w$ with different upper indices. All the factors are represented as one-dimensional time series of the special type

\begin{eqnarray}
w(Mt)=\sum_{m=1}^{M} \sum_i^p \alpha_{Mi-(M-m)} w(M(t-i)-(M-m)) +\epsilon_{Mt}
,\ p < t < T_0.
\end{eqnarray}


Here $\alpha_{Mi-(M-m)}=a_{i}^m,\ w(M(t-i)-(M-m))=w^m(t-i),\
m=1,...,M$, and the index $Mi=M*i$. Using this expression one applies the software developed for the one-dimensional AR-ABS model to $M$ external factors. The data file should correspond to expression (15.40).


5 Applying AR models to Linear Regression (LR)

AR models with external factors can be applied to estimate Linear Regression parameters, too. The difference of LR models from AR models is that in the former case $t$ denotes the test number. It is supposed that the observed values of the main parameter $w^M(t)$ and of the external parameters $w(t)=(w^m(t), \ m=1,...,M-1)$ at different tests $t_i$ and $t_j$ are independent. For example, the diagnosis $w(t_i)$ of some patient $t_i$ does not depend on the health of other patients $t \ne t_i$. In such a case the "memory" is $p=1$ and expression (15.39) is modified this way

\begin{eqnarray}
w^M(t)= \sum_{m=1}^{M-1} a^m w^m (t) +\epsilon_t,
\end{eqnarray}


where $t=1,...,T$ defines the test number. Here the main and the external factors are denoted by the same letter $w$ with different upper indices, where the index $m=M$ means the main factor, for example a diagnosis, that is known for $t=1,...,T-1$ and should be predicted for $t=T$. The external factors are supposed to be known for all $t=1,...,T$. Here the numbers $t=1,...,T-1$ denote known cases, and $t=T$ denotes a new case, for example a new patient that has just arrived.

Let us represent the multi-factorial LR model (15.41) by a one-dimensional time series of the special type

\begin{eqnarray}
w(Mt)=\sum_{m=1}^{M-1} a^m w^m(t) +\epsilon_{Mt},
\end{eqnarray}


where $Mt=M*t$. Using this expression one applies the software developed for the one-dimensional AR model to $M$ factors of Linear Regression if data file corresponds to expression (15.42).


6 Simplest Illustration of AR and AR-ABS Models

Assume that

\begin{eqnarray}
w(x)= a x
+\epsilon
\end{eqnarray}


then the least squares estimate is

\begin{eqnarray}
a(N)=\frac {\sum_{i=1}^N x_i w(x_i)} {\sum_{i=1}^N x_i^2},
\end{eqnarray}


and the ABS estimate is

\begin{eqnarray}
a(N)= arg\min_a \sum_i \vert w(x_i) -a x_i \vert.
\end{eqnarray}


Condition (15.44) provides the minimum of squared deviations; condition (15.45) corresponds to the minimum of absolute deviations. In the case of (15.45) one minimizes a piece-wise linear function with breaking points $a_i=w(x_i)/x_i$. Therefore, the optimum is obtained at the point $a_i$ that minimizes the sum in (15.45). In one-dimensional cases this can be done by comparing all the values $a_i,\ i=1,...,N$. In general, minimization of (15.45) is performed by linear programming (15.35).
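The two estimates can be sketched as follows; the breaking-point scan implements the one-dimensional case of (15.45), and the function names are illustrative only.

\begin{verbatim}
#include <cmath>
#include <limits>
#include <vector>

// Least squares estimate (15.44) for the one-parameter model w = a*x + eps.
double leastSquares(const std::vector<double>& x, const std::vector<double>& w) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { num += x[i] * w[i]; den += x[i] * x[i]; }
    return num / den;
}

// Least absolute deviations estimate (15.45): the objective is piece-wise linear
// in a, so the minimum is found by scanning the breaking points a_i = w_i/x_i.
double leastAbsolute(const std::vector<double>& x, const std::vector<double>& w) {
    double best = 0.0, bestSum = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 0.0) continue;
        double a = w[i] / x[i];                 // candidate breaking point
        double s = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) s += std::fabs(w[j] - a * x[j]);
        if (s < bestSum) { bestSum = s; best = a; }
    }
    return best;
}
\end{verbatim}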


7 Artificial Neural Networks Models (ANN)

With respect to the Moving Average parameters $b$, the ARMA model is non-linear (see expression (15.29)). With respect to data, this model is linear. If we are interested in the non-linearities, then we may apply many other non-linear models, including the ones that are non-linear with respect to data. In this book, we discuss two of them. Here the ANN model will be considered. In the next chapter, we introduce a bilinear term into the ARMA model.

1 Involving Auto Regression (AR) into ANN

We apply ANN by involving the non-linear activation function $\phi$ into the standard Auto-Regression (AR) model

\begin{eqnarray}
w_t = \phi (\sum_{i=1}^p a_i w_{t-i})+\epsilon_t.
\end{eqnarray}


The idea behind the ANN-AR model is that the activation function $\phi$ roughly represents the activation of a real neuron. We minimize the sum

\begin{eqnarray}
f_m(x)=\sum_{t=1}^T \epsilon_t^2.
\end{eqnarray}


Here the objective $f_m(x)$ depends on $p$ unknown parameters given as a $p$-dimensional vector $x=(x_k,\ k= 1,...,p)=(a_i,\ i=1,...,p)$.

One sees from expression (15.46) that the residuals $\epsilon_t$ are non-linear functions of the parameters $a_i$[*]. This means that the minimum conditions

\begin{eqnarray}
{\partial f_m(x) \over
\partial a_i} =0,\ i=1,...,p
\end{eqnarray}


are a system of non-linear equations with multiple solutions.

An interesting activation function is derived using the Gaussian distribution function

\begin{eqnarray}
\phi(w_t(l))=\frac{\beta}{\sqrt{2\pi}\sigma}\int_{-\infty}^{w_t(l)}
e^{-\frac{(w-\mu)^2}{2\sigma^2}} dw.
\end{eqnarray}


Here $w_t(l)=\sum_{i=1}^l a_i w_{t-i}$, and $\beta$ is a scale parameter. The function (15.49) is different from the activation of a real neuron[*] but is convenient for analysis. Here sum (15.46) depends on the parameters $\beta,\ \sigma,\ \mu$, too. These parameters are unknown, as usual, and should be optimized. Thus, the objective $f_m(x)$ depends on $p+3$ unknown parameters represented as a $(p+3)$-dimensional vector $x=(x_k,\ k= 1,...,p+3)=(a_i,\ i=1,...,p,\ \beta,\ \sigma,\ \mu)$. That is the main difference of model (15.49) from the traditional ANN models.

In the traditional ANN models the activation functions are selected by their resemblance to the natural ones from biophysical experimentation. In this research the resemblance factor is neglected. The activation function (15.49) is considered just as a reasonable non-linearity that should be adapted to the available data. The multi-modality problems of ANN models are discussed in [Mockus et al., 1997].
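A minimal sketch of the residual computation (15.46) with the activation (15.49) is given below. The standard normal distribution function is evaluated through the complementary error function; the function names are illustrative only, and the sketch is not the $fi.C.anngauss$ implementation.

\begin{verbatim}
#include <cmath>
#include <vector>

// Gaussian-type activation of (15.49): phi(s) = beta * Phi((s - mu)/sigma),
// where Phi is the standard normal distribution function.
double gaussianActivation(double s, double beta, double sigma, double mu) {
    return beta * 0.5 * std::erfc(-(s - mu) / (sigma * std::sqrt(2.0)));
}

// Sum of squared residuals of the ANN-AR model (15.46); the parameter vector is
// x = (a_1,...,a_p, beta, sigma, mu).
double annObjective(const std::vector<double>& w, const std::vector<double>& x, int p) {
    double beta = x[p], sigma = x[p + 1], mu = x[p + 2];
    double f = 0.0;
    for (std::size_t t = 0; t < w.size(); ++t) {
        double s = 0.0;
        for (int i = 1; i <= p && (int)t - i >= 0; ++i) s += x[i - 1] * w[t - i];
        double e = w[t] - gaussianActivation(s, beta, sigma, mu);   // residual eps_t
        f += e * e;
    }
    return f;
}
\end{verbatim}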


8 Bilinear Models (BL)

It is well known that for the adequate description of some phenomena additional non-linear terms of the time series should be included. A simple example is a bilinear term (see [Rao and Gabr, 1984,Liu, 1989]). Here bilinear time series extend the ARMA model:

\begin{eqnarray}
w_{t} = \sum_{i=1}^p a_i w_{t-i} +\sum_{i=1}^q b_i \epsilon_{t-i}
+\sum_{i=1}^s \sum_{j=1}^r c_{ij}
z_{t-i}\epsilon_{t-j}+\epsilon_t.
\end{eqnarray}


An illustrative example is in [Mockus et al., 1997].


9 Auto Regressive Fractionally Integrated
Moving Average Models (ARFIMA)


1 Definitions

ARMA, ANN, and BL models consider stationary time series. The stationarity of a model is a simplification of reality. A well-known source of non-stationary behavior is the linear component, the trend. One eliminates the trend by differencing, since the differences of a linear function are constant. An elegant extension of this idea is the Auto Regressive Fractionally Integrated Moving Average (ARFIMA) model.

We define an ARFIMA[*] process as the following time series $z_t$[*]

\begin{eqnarray}
A(L)(1-L)^d z_t= B(L) \epsilon_t.
\end{eqnarray}


Here

\begin{eqnarray}
A(L) w_t = w_t -\sum_{i=1}^p a_i w_{t-i}
\end{eqnarray}


and

\begin{eqnarray}
B(L) \epsilon_t = \epsilon_t -\sum_{i=1}^q b_i \epsilon_{t-i},
\end{eqnarray}


where $\epsilon_t= Gaussian\ \{0, \sigma^2\}$ . We define the transformation $(1-L)^d$ as follows:

\begin{eqnarray}
w_t=(1-L)^d z_t = z_t-\sum_{i=1}^{\infty} d_i z_{t-i}.
\end{eqnarray}


Here

\begin{eqnarray}
d_i= {\Gamma (i-d) \over \Gamma(i+1) \Gamma(-d)},
\end{eqnarray}


where $d$ is a fractional integration parameter, and $\Gamma(.)$ is a gamma function. We assume that

\begin{eqnarray}
z_{t-i}=0,\
w_{t-i}=0,\ \epsilon_{t-i}=0,\ \ if\ \ t \le i.
\end{eqnarray}


We truncate sequence (15.54)

\begin{eqnarray}
d_i=0,\ \ if\ \ i > R.
\end{eqnarray}


Here $R$ is the truncation parameter, the number of non-zero components.
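A small sketch of the fractional differencing step is given below. The weights $d_i$ of (15.55) are generated by the recursion $d_1=-d$, $d_{i+1}=d_i (i-d)/(i+1)$, which follows from the ratio of the gamma functions, and are truncated after $R$ terms as in (15.57); the function names are illustrative only.

\begin{verbatim}
#include <vector>

// Weights d_i = Gamma(i-d)/(Gamma(i+1)*Gamma(-d)), i = 1,...,R, by recursion.
std::vector<double> fracDiffWeights(double d, int R) {
    std::vector<double> di(R + 1, 0.0);        // di[i] holds d_i
    if (R >= 1) di[1] = -d;
    for (int i = 1; i < R; ++i) di[i + 1] = di[i] * (i - d) / (i + 1);
    return di;
}

// w_t = (1-L)^d z_t = z_t - sum_{i=1}^{R} d_i z_{t-i}, as in (15.54);
// values with non-positive indices are taken as zero.
std::vector<double> fracDifference(const std::vector<double>& z, double d, int R) {
    std::vector<double> di = fracDiffWeights(d, R), w(z.size());
    for (std::size_t t = 0; t < z.size(); ++t) {
        double v = z[t];
        for (int i = 1; i <= R && (int)t - i >= 0; ++i) v -= di[i] * z[t - i];
        w[t] = v;
    }
    return w;
}
\end{verbatim}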

2 Minimization of Residuals

We define residuals by recurrent expressions:

\begin{eqnarray}
\epsilon_1 &=& w_1, \\
\nonumber \epsilon_2 &=& w_2-a_1 w_1 + b_1 \epsilon_1, \\
\nonumber &\vdots& \\
\epsilon_t &=& w_t-a_1 w_{t-1} - ... -a_p w_{t-p} + b_1
\epsilon_{t-1} + ... + b_q \epsilon_{t-q}.
\end{eqnarray}


Next, the sum

\begin{eqnarray}
f(x)=\log f_m(x),\
\ f_m(x)= \sum_{t=1}^T \epsilon_t^2
\end{eqnarray}


is minimized.

The logarithm is used to decrease the objective variation by improving the scales. The objective $f_m(x)$ depends on $m=p+q+1$ unknown parameters. They are represented as an $m$-dimensional vector $x=(x_k, k= 1,...,m)=(a_i, i=1,...,p, b_j,
j=1,...,q, d)$.

It follows from (15.59), (15.54), and (15.52) that the residuals $\epsilon_t$ are linear functions of the parameters $a_i$. This means that the minimum conditions

\begin{eqnarray}
{\partial f_m(x)
\over \partial a_i} =0,\ i=1,...,p
\end{eqnarray}


are given by a system of linear equations. They estimate the linear parameters $a_i=a_i(b, d)$ as functions of the non-linear ones $b_i,\ i=1,...,q$, and $d$. This reduces the number of parameters of non-linear optimization to $n=q+1$.

The system

\begin{eqnarray}
{\partial f_m(x) \over \partial b_i} =0,\ i=1,...,q
\end{eqnarray}


may have multiple solutions, because the residuals $\epsilon_t$ depend on $b_i$ as polynomials of degree $T-1$.

The equation

\begin{eqnarray}
{\partial f_m(x) \over \partial d} =0
\end{eqnarray}


may have multiple solutions, too, because the residuals depend on $d$ as a polynomial of degree $R$, where $R$ is a truncation parameter.

The objective $f_m(x)$ is a multi-modal function of the parameters $d$ and $b_i,\ i=1,...,q$ [*]. Therefore, one uses methods of global optimization (see [Mockus et al., 1997]). Denote

\begin{eqnarray}
f(x)= \log f_{q+1}(x).
\end{eqnarray}


Here

\begin{eqnarray}
f_{q+1}(x)= f_m(x),\ x_j=b_j,\ j=1,...,q,\ x_{q+1} =d, \nonumber \\ x_{q+1+
i} =a_i(b,d), \ i=1,...,p. \nonumber
\end{eqnarray}


This means that condition (15.61) defines those $x$- components that represent parameters $a_i, i=1,...,p$.

There is no variance $\sigma^2$ in expressions (15.64) and (15.59). If necessary, we have to estimate the variance by another well-known technique.

3 Discussions

Table 15.2 shows results obtained using the ARFIMA model [Mockus et al., 1997]. Parameters $b$ and $d$ were estimated using daily exchange rates of $/£, and DM/$, and closing rates of AT&T and Intel Co shares.

Table 15.2: Estimated parameters $b$ and $d$ of ARFIMA models.

  Data       b_0       b_1      d         min log f(x)
  $/£        -1.195    -0.169   0.0005    1.51675
  DM/$       -1.019    0.0120   0.0007    1.60065
  AT&T       -1.017    0.0118   0.00005   9.83208
  Intel Co   0.9975    0.0055   0.012     7.35681

Table 15.2 shows that the values of $d$ are very close to zero (see also Figure 15.3). An exception is Intel Co shares.
Figure 15.3: Log-sum (15.60) as a function of the parameter $d \in [-0.01,0.5]$. Top figure shows $/$\pounds $ exchange rates. Bottom one shows Intel Co stocks closing rates.
Hence, following the traditional approaches [Cheung, 1993, Yin-Wong and Lai, 1993, Cheung and Lai, 1993], one concludes that the underlying stochastic processes generating exchange rates do not exhibit persistence and are stationary. That contradicts the visual impression of the corresponding data (see Figures 15.4 and 15.10).

This apparent contradiction may be resolved by dropping the assumption that the parameters $a,b,d$ of the ARFIMA model remain constant. This assumption is common for most of the traditional methods. An alternative is the structural stabilization model described in Chapter 11.


10 Multi-Step Predictions

Often one wants to predict several days (hours, weeks, months, etc.) ahead. This is the Multi-Step Prediction (MSP). One can do that by using a Monte Carlo simulation. The residuals $\epsilon_t$ (see expression (15.59)) are determined up to the simulation starting moment $t(s)$ using the observed data. The rest of the residuals $\epsilon_t,\ t \ge t(s)$, are generated by a Gaussian distribution with zero mean and variance $\sigma^2$. The simulation is repeated $K$ times. Considering external factors, one just replaces them by the nearest previous values (see Section 4.1).

The illustration is in Figure 15.2. The line $call.rate.actual$ shows the observed call rate. Lines
$call.rate.min$, $call.rate.mean$, and $call.rate.max$ show the minimal, the average, and the maximal results of MSP predictions. The "min" and "max" lines denote the lower and the upper values of simulation. Therefore, these lines are called "MSP- confidence intervals," meaning that if the model is true, one may expect those "intervals" to cover the real data with some "MSP-confidence level" $\alpha(MSP)$.

It is difficult to define $\alpha(MSP)$ exactly. If the "interval deviations" may be considered as independent and uniformly distributed random variables, we obtain $\alpha(MSP)=1-1/K$. Here $K$ is the number of Monte Carlo repetitions. In the example, $K=10$, thus $\alpha(MSP)=0.9$.

This assumption over-simplifies the statistical model. Therefore, the "MSP-confidence level" $\alpha(MSP)$ is just a Monte Carlo approximation.
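A minimal sketch of such a Monte Carlo multi-step prediction is given below, written for the AR part only (the MA residuals would be simulated in the same way); the structure and function names are illustrative only.

\begin{verbatim}
#include <algorithm>
#include <random>
#include <vector>

// K simulated paths of H future values are generated with Gaussian noise of
// variance sigma^2; the minimal, average and maximal values are kept per step.
struct MSP { std::vector<double> min, mean, max; };

MSP multiStepPredict(const std::vector<double>& w, const std::vector<double>& a,
                     double sigma, int H, int K, unsigned seed = 42) {
    int p = a.size();
    std::mt19937 gen(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    MSP out{std::vector<double>(H, 1e300), std::vector<double>(H, 0.0),
            std::vector<double>(H, -1e300)};
    for (int k = 0; k < K; ++k) {
        std::vector<double> path(w);               // observed history + simulated future
        for (int h = 0; h < H; ++h) {
            double v = noise(gen);
            int t = (int)path.size();
            for (int i = 1; i <= p && t - i >= 0; ++i) v += a[i - 1] * path[t - i];
            path.push_back(v);
            out.min[h] = std::min(out.min[h], v);
            out.max[h] = std::max(out.max[h], v);
            out.mean[h] += v / K;
        }
    }
    return out;
}
\end{verbatim}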


11 Structural Stabilization

1 Stabilization of Structures of Time Series

The traditional time series models assume that their parameters do not change. Examples are the parameters $a,b$ of the ARMA model, the parameters $a,b,d$ of the ARFIMA model, the parameters $a,b,c$ of the BL model, and the parameters $a,\beta, \sigma,\mu$ of the ANN model. Natural changes of economic conditions introduce a variability of the parameters of models describing financial data, such as stock rates or currency exchange rates. The variability remains under stable economic conditions, too. The reason is the "feed-back" processes. It is well known that stock rate predictions may influence the supply and demand and, consequently, the future rates. In such cases the statistical best fit models should be complemented by game-theoretical equilibrium models. There is a possibility that the equilibrium model is precisely the Wiener process. That would mean that market rates in economics are governed by models similar to the Brownian motion. In such a case market rates would be as unpredictable as movements of individual molecules in gases. This is a working hypothesis supported by the available data. Table 15.1 shows that the predictions of the financial data by the simple Wiener process, called the Random Walk (RW) model, are as good as those of the sophisticated ARMA models.

The objective of traditional time series models is to define parameters that minimize the deviation from the available data. One may call them the best fit models. The goodness of fit is described by continuous parameters $C$ called state variables. For example, in the ARMA model (see expression (15.1)) the state variables are
$C=(a_i,\
i=1,...,p,\ b_j,\ j=1,...,q)$.

If the parameters remain constant, then models that fit best to the past data will predict the future data as well. Otherwise, the best fit to the past data can be irrelevant or even harmful for predictions. Therefore, one needs models which are not sensitive to the changes of parameters. Such models may predict the uncertain future better by eliminating the nuisance parts from the structure of the model.

Trying to solve this problem, one introduces the notion of the model structure. The model structure is determined by the Boolean parameters $S$ called structural variables. A structural variable is equal to one if the corresponding component of the time series model is included. Otherwise, the structural variable is equal to zero.

For example, in the ARMA model $S=(s_i^a,\ i=1,...,p,\ s_j^b,\ j=1,...,q)$. Here $s_i^a=1$ if the parameter $a_i$ is included into the ARMA model. Otherwise, $s_i^a=0$ [*]. We search for the structure $S$ of the model that minimizes the prediction errors in the changing environment. To achieve this we divide the available data $W=(w_t, \ t=1,...,T)$ into two parts, $W_0=(w_t,\ t=1,...,T_0)$ and $W_1=(w_t,\
t=T_0+1,...,T)$.

The first part $W_0$ is used to estimate the continuous parameters $C=C(S)$ that depend on the Boolean structural parameters $S$. The estimates are obtained for the set of all feasible $S$ by minimizing the least squares deviation using the data $W_0$.

The second part $W_1$ is used to select the $S$ that minimizes the least squares deviation. This means that the second part $W_1$ is used to estimate the Boolean structural parameters.

Denote by $R_t(S,C,W)$ the predicted value of a model $R$ with fixed parameters $S,C$ using the data $(w_1,...,w_{t-1})\subset W$. The difference between the prediction and the actual data $w_t$ is denoted by $\epsilon_t(S,C,W)=w_t-R_t(S,C,W)$. Denote by $C_0(S)$ the fitting parameters $C$ which minimize the sum of squared deviations $\Delta_{0,0}(C,S)$ using the first data set $W_0$ at fixed structure parameters $S$.

\begin{eqnarray}
\Delta_{0,0}(C,S)= \sum_{t=1}^{T_0} \epsilon_t^2(S,C,W),\\
C_{0}(S)= \arg \min_C \Delta_{0,0}(C,S).
\end{eqnarray}


We stabilize the structure $S$ by minimizing the sum of squared deviations $\Delta_{1,0}(S)$ using the second data set $W_1$ and the fitting parameters $C_{0}(S)$ that were obtained from the first data set

\begin{eqnarray}
\Delta_{1,0}(S)= \sum_{t=T_0+1}^T \epsilon_t^2(S,C_0(S),W),\\ S_1=
\arg \min_S \Delta_{1,0}(S).
\end{eqnarray}


This is a way to reach a tradeoff between the fitting parameters and the structural ones. The fitting parameters $C_0(S)$ provide the best fit to the first data set $W_0$ at fixed structure $S$. One stabilizes the structure $S$ by minimizing the prediction errors for the second set of data $W_1$ using the fitting parameters $C_0(S)$. Here stabilization is achieved because the fitting parameters are defined by the first set of data $W_0$. The stabilized structure $S=S_1$ of $R$ is obtained by eliminating unstable[*] parameters and parts of the time series.

We consider two data sets $W_0$ and $W_1$ just for simplicity. One may partition the data $W$ into many data subsets $w_t \in W_k,\ t \in T_k,\ k=1,...,K, \ \cup_k W_k=W,\
\cup_k T_k = T$. In this case we minimize the sum

\begin{eqnarray}
S_{K}=\arg \min_S \Delta(S),
\end{eqnarray}


where

\begin{eqnarray}
\Delta(S)= \sum _{k,l} \Delta_{k,l}(S). \end{eqnarray}


Here

\begin{eqnarray}
\Delta_{k,l}(S)= \sum_{t \in T_k} \epsilon_t^2(S,C_l(S),W).
\end{eqnarray}


Note, that dividing the data $W$ into many parts one may obtain sequences $T_k$ too short for the meaningful estimate of parameters $C_k$, if $T$ is not large. If $T$ is large, one may expect that most of the fitting parameters $C_k$ would be different in different data subsets $W_k$. Thus, they will be eliminated by the stabilization procedure (15.69). Therefore, $K=2$ seems a reasonable number, at the first stabilization attempt. One may try $K >2$ later.

The idea of the structural stabilization follows from the following observation. The best estimate of time series parameters, using a part $W_0$ of the data $W=W_0 \cup
W_1$, is optimal for another part $W_1$ only if all the parameters remain the same. Otherwise, one may obtain a better estimate by elimination of the changing parameters from the model. For example, in the case of changing parameters $(a_i,\ i=2,...,p,\
b_j,\ j=1,...,q)$ of the ARMA model, the best prediction is obtained by elimination of all these parameters, except $a_1=1$ (see Table 15.1).

2 Simple Example

Consider, for illustration, a simple example.

\begin{eqnarray}
w_t=a_1(t)w_{t-1}+a_2(t)w_{t-2}+\epsilon_t,\ t=3,4,5.
\end{eqnarray}


The observed values are $w_1=-1,\ w_2=w_3=1,\ w_4=2,\ w_5=1$. The first data set $W_0=\{w_1,...,w_4\}$ is used to estimate the continuous parameters $C=(a_1,a_2)$. The second data set $W_1=\{w_5\}$ is used to estimate the Boolean parameters $S=(s_1^a,s_2^a), \ s_i^a \in \{0,1\}$. The equality $s_i^a=0$ means the elimination of the continuous parameter $a_i$. There are three feasible $S$, namely $S_1=(0,1),\ S_2= (1,0),\ S_3=
(1,1)$[*].

Assume that unknown parameters depend on $t$ this way

\begin{eqnarray}
a_1(t)&=&\cases {2, &if $t=3$, \cr 1,&if $t=4$,\cr 0, &if $t=5$,\cr}
\end{eqnarray}


\begin{eqnarray}
a_2(t)=1, \nonumber\\
\epsilon(t)=0. \nonumber
\end{eqnarray}


In the case $S=S_3$, the least square estimates are $a_1(S_3)=1.5$ and $a_2(S_3)=0.5$. The prediction is $w_5(S_3)=3.5$.
If $S=S_1$, the least squares estimate of the only remaining parameter is $a_2(S_1)=0.5$. The prediction is $w_5(S_1)=0.5$.
In the case $S=S_2$, the least square estimate is $a_1(S_2)=1.5$ and the prediction is $w_5(S_2)=3.0$.

The best prediction $w_5(S_1)=0.5$ is provided by the structure $S_1=(0,1)$. The reason is obvious: this structure eliminates the highly unstable parameter $a_1$. Applying the structural stabilization one eliminates the nuisance parameter $a_1(t)$ and simplifies AR model (15.72)

\begin{eqnarray}
w_t=a_2(t)w_{t-2}+\epsilon_t,\ t=3,4,5. \end{eqnarray}


Note that we eliminate the larger parameter $a_1$ ($a_1(S_3) =
1.5 > a_2(S_3)=0.5$) because it changes (see expression (15.73)).
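The example can be reproduced by the following short sketch, which fits the continuous parameters on $W_0$ for each feasible structure and selects the structure by the prediction error on $W_1=\{w_5\}$; it is a standalone illustration, not part of the chapter's software.

\begin{verbatim}
#include <cstdio>

int main() {
    const double w[6] = {0, -1, 1, 1, 2, 1};          // w[1..5], w[0] unused
    const int S[3][2]  = {{0, 1}, {1, 0}, {1, 1}};    // structures S1, S2, S3
    int best = -1; double bestErr = 1e300;
    for (int s = 0; s < 3; ++s) {
        // Least squares over t = 3,4 for the parameters kept by structure S:
        // w_t = s1*a1*w_{t-1} + s2*a2*w_{t-2} + eps_t.
        double m11 = 0, m12 = 0, m22 = 0, r1 = 0, r2 = 0;
        for (int t = 3; t <= 4; ++t) {
            double x1 = S[s][0] * w[t - 1], x2 = S[s][1] * w[t - 2];
            m11 += x1 * x1; m12 += x1 * x2; m22 += x2 * x2;
            r1 += x1 * w[t]; r2 += x2 * w[t];
        }
        double a1 = 0, a2 = 0;
        if (S[s][0] && S[s][1]) {
            double det = m11 * m22 - m12 * m12;
            a1 = (r1 * m22 - r2 * m12) / det; a2 = (r2 * m11 - r1 * m12) / det;
        } else if (S[s][0]) a1 = r1 / m11;
        else                a2 = r2 / m22;
        double w5 = a1 * w[4] * S[s][0] + a2 * w[3] * S[s][1];   // prediction of w_5
        double err = (w[5] - w5) * (w[5] - w5);
        std::printf("S%d: a1=%.2f a2=%.2f w5=%.2f\n", s + 1, a1, a2, w5);
        if (err < bestErr) { bestErr = err; best = s; }
    }
    std::printf("selected structure S%d\n", best + 1);  // S1 = (0,1), as in the text
    return 0;
}
\end{verbatim}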


3 Examples of Structural Optimization with
External Factors

Various examples of structural optimization are described in [Mockus, 1997]. Here the structural optimization is considered in time series models with external factors.

If predictions depend on several factors, the multi-dimensional ARMA should be used. Denote by $\nu(Mt),\ t \ge
p/M$ the main statistical component and by $\nu(Mt-i),\ i=1,..,p
-1$ the external factors. One extends the traditional ARMA model this way

\begin{eqnarray}
\nu(Mt)=\sum_{i=0}^{p-1} a_{i} \nu(Mt-i-1)+\sum_{j=0}^{q-1} b_j
\epsilon(M(t-j-1))+\epsilon(Mt),
\\ p/M \le t \le
T0 \nonumber \end{eqnarray}


Here the number of the Auto Regressive (AR) components is denoted by $p$ and the number of the Moving Average (MA) ones is denoted by $q$. The continuous variables $a$ and $b$ define the state of the ARMA model. We call them the state variables. The discrete parameters $p$, $q$, and $t0$ define the structure. One calls them the structural variables. The structural variable $t0$ defines the time when one starts scanning the time series for the optimization of state variables $a,b$. Denote by $T0$ the scanning end.

One minimizes the squared deviation at fixed structural variables $p,q,t0$

\begin{eqnarray}
\Delta_{00}(p,q,t0)=\sum_{t=t0}^{T0} \epsilon(Mt)^2,\ t0 \ge
p/M.
\end{eqnarray}


Here $T0<T1<T$ and $\epsilon(Mt)^2$ is from expression (15.75). Denote the optimal values $a=a(p,q,t0)$ and $b=b(p,q,t0)$.

If $b$ is fixed, the optimal values of $a=a^b$ are defined by a system of linear equations. These equations follows from the condition that all the partial derivatives of sum (15.75) are equal to zero $\partial \Delta_{00}/\partial {a_i}=0,\ i=0,...,p-1$. Therefore, to obtain the least squares estimates $a_i=a_i^b,\ i=0,...,p-1$ at given $b$ one solves $p$ linear equations with $p$ variables $a_{i},\
i=0,...,p-1$ (see Chapter 3 for details).

To optimize the discrete structural variables $p,q,t0$ one uses another data set. It starts at $T0$ and ends at $T1$. During the optimization of the structural variables one keeps the best fitting values of the state variables $a=a(p,q,t0)$ and $b=b(p,q,t0)$. These values are obtained by minimization of the sum $\Delta_{00}(p,q,t0)$ using the data from $t0$ to $T0$. The structural variables are compared using the sum

\begin{eqnarray}
\Delta_{01}(p,q,t0)=\sum_{t=T0}^{T1} \epsilon(Mt)^2.
\end{eqnarray}



12 Examples of Squared Residuals Minimization

1 Multi-Modality Examples

Consider the following examples. The figures below show the data and the optimization results. The optimization results show how the least square deviations depend on the parameters $b$ of ARMA models. The $3D$ optimization results are presented in two forms: as surfaces and as contours.

Figures 15.4, 15.5, and 15.6 consider exchange rates of $/£ and DM/$.
Figures 15.7, 15.8, and 15.9 consider exchange rates of yen/$ and franc/$.
Figure 15.10 reflects closing rates of AT&T (top) and Intel Co. (bottom) stocks.
Figures 15.13, 15.14, and 15.15 consider the London stock exchange index.
Figures 15.16 and 15.17 show stock rates of the Hermis bank and optimization results.
Figures 15.18 and 15.19 show the daily call rates of a call center and illustrate optimization results.

Figure 15.4: Daily exchange rate of $/£(top figure) and DM/$ (bottom figure) starting from September 13, 1993.

Figure 15.5: Exchange rates of £/$: surface $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).

Figure 15.6: Exchange rates of DM/$: surface of $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.7: Exchange rates: yen/$ (top), franc/$ (bottom).
Figure 15.8: Exchange rates of fr/$, surface $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.9: Exchange rates of yen/$, surface $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.10: AT&T (top) and Intel Co.(bottom) stocks closing rates starting from August 30, 1993.

Figure 15.11: Closing rates of AT&T stocks: surface $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.12: Closing rates of Intel Co stocks: surface of $f$ depending on parameters $b_0,\ b_1$ (top) contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.13: London stock exchange index.
Figure 15.14: London stock exchange index, surface of $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.15: London stock exchange index: deviation $f$ depending on parameter $b_0$, uni-modal part (top), multi-modal part (bottom).
Figure 15.16: Closing rates of stocks of the Hermis bank (private) and the Lithuanian Savings bank (state owned).
Figure 15.17: Closing rates of Hermis bank stocks: surface of $f$ depending on parameters $b_0,\ b_1$ (top) contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).
Figure 15.18: Call rates.
Figure 15.19: Call rates: surface of $f$ depending on parameters $b_0,\ b_1$ (top), contours of $f$ depending on parameters $b_0,\ b_1$ (bottom).

To estimate the unknown ARMA parameters we minimize the log-sum of squared residuals defined by expression (15.60). Estimates of the parameters $a$ are from expression (15.13). In most of the cases, the figures show the multi-modality of the log-sum (15.60) as a function of the parameters $b_0,\ b_1$. Areas in the vicinity of the global minima often appear flat. The reason is that the differences between values of the deviation function $f$ in an area around the global minimum are small compared with those outside this area (see Figure 15.15).

2 Optimization Results

The ARMA model optimization results are the points $b$ and $a$ (see Figure 15.1). These results were defined using a sequence of two global methods called $BAYES1$ and $EXKOR$ [Mockus et al., 1997]. $BAYES1$ denotes a search by a multi-dimensional Bayesian model [Mockus et al., 1997]. The best result obtained by $BAYES1$ after fifty iterations is a starting point for local optimization. The local optimization is by one-dimensional coordinate search $EXKOR$ [Mockus et al., 1997], using sixty iterations.

The maximal number of Auto Regressive (AR) parameters $p$ was $10*M$. Here $M=1$, if no external factors are involved. The optimal number $p$ is defined by structural stabilization (see Chapter 11). Only two Moving Average (MA) parameters $q=2$ were considered while plotting surfaces and contours. The results of Table 15.1 were obtained by optimization of both structural variables $p$ and $q$.

The objective of this part of the research is to show the multi-modality arising in the prediction problem. Therefore, to save computing time, the global optimization is carried out approximately, using few iterations. The results of the global optimization are starting points for local optimization. The reason is that the squared deviation as a function of the parameters $b$ becomes uni-modal near the global minimum (see Figure 15.15). Therefore, the results are at least as good as those obtained by the traditional local optimization.

High-accuracy global optimization is very expensive. In global optimization, the computing time is an exponential function of the accuracy $m$, in the sense that $\epsilon \le 2^{-m}$. Therefore, a problem for future investigations is how to balance computing expenses and the accuracy of estimates. This task is important in both time series prediction and global optimization.

The investigation of the multi-modality of the squared deviation and of the variability of the parameters is the natural first step. The multi-modality is involved in non-linear regression models, including the ARFIMA ones[*].


13 Software Examples


1 C Version of ARMA Software (ARMAC)

The call rates depend on several factors, which is usual in statistical prediction problems. That is the main difference from the traditional ARMA and ARFIMA software for exchange rate prediction [Mockus et al., 1997]. The ARFIMA software and applications are described in [Mockus et al., 1997].

We consider a version of an extended ARMA model that includes external factors (see expressions (15.28) and (15.29)). The programs are in the files 'main.C', 'fi.C', and 'fitimeb.h' on the web-site (see Section 4). The first version of the ARMA software is designed for data sets with no future data. That means that no future values of the factors are known. Therefore, the external factors are treated as missing data. It is assumed that the future values of the external factors are equal to the last observed ones. In the ARMA software under development, this is the default case. In the new software version, known future values of external factors will replace the default ones.

The results of a simple test are in the files 'test.out' and 'test.progn.out'. The data is in the file 'test.data', and the initiation file is 'test.ini'. The results of the call rate example are in the files 'call.out' and 'call.progn.out'. The data is in the file 'call.data'. The initiation file is 'call2.ini'. The names of the data files are referred to as INP in the initiation file INIFILE. The software is compiled and run this way:

Compile by 'make onestep',
Run by 'onestep > results.out'

1 Illustrative Example of ARMA Software: Stock Rate Prediction

Figures 15.20 and 15.21 show the part of the file 'fitimeb.h' defining the control parameters and data files.
Figure 15.20: Example of the file 'fitimeb.h', part 1.
Figure 15.21: Example of the file 'fitimeb.h', part 2.
The control parameters and the optimization results (in the case INP 0) are shown in Figure 15.22.
Figure 15.22: Example of the file 'results.out.'
(Output listing, truncated in the source: 'Number of Factors M 2', ..., 'Finish progn_err 2.438730e+01'.)
The comments in Figures 15.20 and 15.21 are short. Here are some additional explanations.

2 Illustrative Example of ARMA Software: Simple Test File

The test file is defined by 'test.ini':

INP test2
COL 1
COL 2
Here INP defines the data file 'test2'
1.   1.
2.   1.
1.   1.
2.   1.
1.   1.
2.   1.
1.   1.
2.   1.
1.   1.
Here COL 1 means that the first column of the file 'test2' should be considered as the factor to be predicted. COL 2 indicates the second column as an external factor.

A fragment of the optimization results is shown in Figure 15.28.

Figure 15.28: A fragment of the control parameters and optimization results of the simplest illustration.
Here 'delta_perc' denotes the error in percent relative to the "Random Walk" prediction. 'delta_mean' is the average error. The results of the optimization are printed as $Po,Qo,ao,bo$. The predicted results are in the file
'progn.out.old' (see Figure 15.29).
Figure 15.29: Example of the prediction results of the simplest illustration.
(Prediction listing, truncated in the source: 'progn t ymin[M*t] yav[M*t ...', ..., 'progn 0.000000e+00'.)
The minimal $ymin$, the average $yav$, and the maximal $ymax$ predicted values are equal. Therefore, the prediction variance $progn$ is zero.

2 Java version of ARMA Software (ARMAJ)

The Java version of ARMA software (ARMAJ) [Kuzminaite, 1999] implements the same algorithms as the C++ version ARMAC. The ARMAJ is on the web-sites (see Section 4) and can be run by remote users.

1 Users Guide

The applet 'index.html' is started by a browser, for example, by Netscape 4.6, or by the appropriate appletviewer. One clicks the button 'Show' (see the top Figure 15.30) to open the main window 'ARMA Frame' (see the bottom Figure 15.30). There are four buttons: 'File,' 'Input,' 'Options,' and 'Output' which open corresponding windows.

File
is for data input.
There are two fields: 'INI File' and 'Working directory or URL.'
There is the button 'Browse...' and two options:
'Local file' and 'Local URL.'

The option 'Local file' activates the 'Browse...' button to select some local file.
The option 'Local URL' closes this button. Then the contents of fields 'INI File' and 'Working directory or URL' determine the data file.
The file name, for example, 'arma.ini,' is in the field 'INI File.'
The file 'arma.ini' controls the data input from the test file 'arma.test.' The directory is in the field 'Working directory or URL.'
If the 'Local URL' option is on, then the directory is, for example, this:

http:/optimum.mii.lt/~jonas/armajavaj
The default URL is the applet directory.
Input
is to change the default values of parameters (see the top Figures 15.31). The list of parameters is the same as in the C file 'fitimeb.h.'
Options
are to select options (see the top Figure 15.32):
Output options
defines the output file,
None
means no output,
System output
means console output,
Frame
defines output into separate windows,
File
outputs data into local files[*]
Graphic options
involves graphics.
Output
is for data output (see the bottom Figure 15.32):
one starts computations by clicking the button 'Calculate',
the message ' ARMA Frame: calculated' indicates the end.
Some input and output operations are not permitted by the 'Security Manager.' Using the 'appletviewer,' it is possible to bypass the 'Security Manager.' For example, in the file $/.hotjava/properties$ one writes:
acl.read=/home/jonas/public\_html/armajavaj/arma.ini
acl.write=/home/jonas/public\_html/armajavaj/arma.out
That permits reading from 'arma.ini' and writing into 'arma.out.'

The file $arma\_all.jar$ is the 'jar' archive including all the 'class' files.
$source.jar$ or $source.zip$ are archives of 'java' files.
'index.html' is a starting applet.
'arma.ini' is the input control file.
'armatest' is the test data file.

2 Illustrations

The top of Figure 15.30 shows the applet sign with the 'Show' button that starts ARMAJ. The bottom of Figure 15.30 shows the initial window where the input file 'arma.ini' is defined.

Figure 15.30: The applet sign (top figure), and the initial window defining the input file 'arma.ini'(bottom figure).

Figure 15.31 shows the list of default values of the control parameters, similar to those in the ARMA C++ version. The upper part of the list is in the top figure, the middle part in the bottom one.

Figure 15.31: The list of default values of control parameters, the upper part is on the top figure, the middle part is on the bottom one.

The top Figure 15.32 shows the option window. The bottom Figure 15.32 shows the last fragment of the output window. Here the results, similar to those in the ARMA C++ version, are written.

Figure 15.32: The option window (top figure) and the last fragment of the output window.

3 AR-ABS Model

The Java 1.3 version for estimating the AR-ABS parameters is on the web-sites and can be run by all Internet users.

The applet is started by a browser or by the appropriate appletviewer. First one clicks the label 'Model' (see Figure 15.33) to select the data file, for example the file 'armatest'.

To select the parameters one clicks the label 'Inputs' (see Figure 15.34).

The calculations are started by the label 'Count'. Figure 15.35 shows the results.

Figure 15.33: The initial window defining the input file 'armatest'.

Figure 15.34: The input data.

Figure 15.35: The results.


4 ANN Software

A version of the ANN model (see expressions 15.46 and 15.49) is in the file $fi.C.anngauss$ on the web-site (see Section 4). $fi.C.anngauss$ is designed as an objective function of the GMC global optimization system.

The software is compiled and run this way:

Extract files by 'tar -zxf gmc.tgz'
Rename the file 'fi.C.anngauss' by 'cp fi.C.anngauss fi.C'
Compile by 'make',
Run by './test'

14 Stock Exchange Game Model

It is easy to obtain daily stock rates for a long time. However, this information alone is not always useful while predicting the future stock rates [Mockus et al., 1997,Raudys and Mockus, 1999].

To improve predictions, one adds additional factors, such as the relation of cash sales[*] to the sum of inventories and credit sales[*]. However, these factors are usually available only quarterly. Therefore, it is difficult to unite them with the daily exchange rates in the same model.

The aim of the Stock Exchange Game Model is to explain the poor predictability of stock rates. Therefore, we start by considering the simplest case: a single stock-broker, $I$ major customers $i=1,...,I$, and $J$ joint-stock companies $j=1,...,J$.

1 Monte Carlo Algorithm

One assumes that the stock rate $z^j(t)$ of a joint-stock company $j$ at a time $t$ depends on the buying-selling strategies of the customers $i=1,...,I$ and on some random factors $\epsilon_j(t)$. At a time $t=1,...,T$ the customer $i$ orders the stock-broker to buy a number $n=n_i^j$ of $j$ shares[*], if

\begin{eqnarray}
z^j(t) \le z_i^{j_{buy}}(n,t).
\end{eqnarray}


Here $z_i^{j_{buy}}(n,t)$ is a buying threshold of the customer $i$ for the number $n$ of $j$ shares at the moment $t$. At a time $t=1,2,...,T$ the customer $i$ orders a stock-broker to sell a number $n=n_i^j(t)$ of $j$ shares, if

\begin{eqnarray}
z^j(t) \ge z_i^{j_{sell}}(n,t).
\end{eqnarray}


Here $z_i^{j_{sell}}(n,t)$ is a selling threshold of the customer $i$ for the number $n$ of $j$ shares at the moment $t$.

The buying threshold is feasible if

\begin{eqnarray}
n \le N_i(t).
\end{eqnarray}


Here $N_i(t)$ is the buying capacity of the customer $i$ at the time $t$. This is defined, for example, as the initial wealth $N_i$ plus the balance at the time $t$[*].

In this case, the stock rate of the $j$ share at a moment $t+1$

\begin{eqnarray}
z^j(t+1)=\cases {z^j(t) +\epsilon_j(t+1), & if $z^{j_{buy}}(t) < z^j(t) < z^{j_{sell}}(t)$ \cr
z^{j_{buy}}(n,t)+\epsilon_j(t+1), & if $z^j(t) \le z^{j_{buy}}(n,t)$ \cr
z^{j_{sell}}(n,t)+\epsilon_j(t+1), & if $z^j(t) \ge z^{j_{sell}}(n,t)$\cr}.
\end{eqnarray}


Here

\begin{eqnarray}
z^{j_{buy}}(t)=\max_{n,i} \ z_i^{j_{buy}}(n,t)
\end{eqnarray}


and

\begin{eqnarray}
z^{j_{sell}}(t)=\min_{n,i} \ z_i^{j_{sell}}(n,t).
\end{eqnarray}


The buying and selling thresholds depend on the difference $\Delta_i^j(t+1)$ between the expected profit of a $j$ share and that of a certificate of deposit (CD) of the same value:

\begin{eqnarray}
\Delta_i^j(t+1)=(\beta_i^j(t+1) + \delta_j(t+1) - \alpha(t+1))\ z^j(t).
\end{eqnarray}


Here

\begin{eqnarray}
\beta_i^j(t+1)=\frac{z_i^j(t+1)-z^j(t)}{z^j(t)},
\end{eqnarray}


where $z_i^j(t+1)$ is the stock rate of the share $j$ predicted by the customer $i$ for the time $t+1$.
Denote the fraction

\begin{eqnarray}
\delta_j(t+1) =\frac{d_j(t+1)}{z^j(t)},
\end{eqnarray}


where $d_j(t+1)$ is the dividend expected for a share $j$ at the moment $t+1$. Denote the relation

\begin{eqnarray}
\alpha(t+1)=a_j(t+1)/z^j(t),
\end{eqnarray}


where $a_j(t+1)$ is the yield of CD of value equal to $z^j(t)$ at the time $t+1$.
Now one defines the buying threshold

\begin{eqnarray}
z_i^{j_{buy}}(t)=k_i^{buy}(n)\ \Delta_i^j(t),\ k_i^{buy}(n) > 1
\end{eqnarray}


and the selling threshold

\begin{eqnarray}
z_i^{j_{sell}}(t)=k_i^{sell}(n)\ \Delta_i^j(t),\ k_i^{sell}(n) < 1.
\end{eqnarray}


Here the parameters $k_i^{buy}(n)$ and $k_i^{sell}(n)$ show the risk aversion of the customer $i$ and approximately reflect a personal utility function. As usual, $k_i^{buy}(n+1) \le k_i^{buy}(n)$ and $k_i^{sell}(n+1) \ge k_i^{sell}(n)$.
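As an illustration, the following sketch computes the difference $\Delta_i^j(t+1)$ and the corresponding buying and selling thresholds from the expressions above. It is a minimal sketch only: all names and numerical values (the predicted rate, the dividend, the CD yield, and the factors $k^{buy}$, $k^{sell}$) are hypothetical and merely show the order of the computations.

\begin{verbatim}
/** Minimal sketch of the threshold computation; all names and values are hypothetical. */
public class Thresholds {

    /** Difference between the expected profit of a share and of a CD of equal value. */
    static double delta(double zCurrent, double zPredicted, double dividend, double cdYield) {
        double beta   = (zPredicted - zCurrent) / zCurrent; // expected relative rate change
        double divRel = dividend / zCurrent;                // expected relative dividend
        double alpha  = cdYield / zCurrent;                 // relative yield of the CD
        return (beta + divRel - alpha) * zCurrent;
    }

    public static void main(String[] args) {
        double zCurrent = 100.0, zPredicted = 104.0, dividend = 1.0, cdYield = 3.0; // assumed data
        double kBuy = 1.2, kSell = 0.8;      // risk-aversion factors, kBuy > 1, kSell < 1
        double d = delta(zCurrent, zPredicted, dividend, cdYield);
        double buyThreshold  = kBuy  * d;    // buying threshold, literally k^{buy} * Delta
        double sellThreshold = kSell * d;    // selling threshold, literally k^{sell} * Delta
        System.out.printf("delta=%.2f buy=%.2f sell=%.2f%n", d, buyThreshold, sellThreshold);
    }
}
\end{verbatim}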

The number of $j$ shares owned by the customer $i$ at the time $t+1$ follows from (15.81), assuming the feasibility condition (15.80):

\begin{eqnarray}
N_i^j(t+1)&=&\cases {N_i^j(t) & if $z_i^{j_{buy}}(t) \le z^j(t) \le z_i^{j_{sell}}(t)$\ \cr
N_i^j(t) +n_i^j(t) & if $z^j(t) \le z_i^{j_{buy}}(t)$\ \cr
N_i^j(t) -n_i^j(t) & if $z^j(t) \ge z_i^{j_{sell}}(t)$\ \cr}.
\end{eqnarray}


The profit of the customer $i$ during the interval $T_0 \le t \le T$ is

\begin{eqnarray}
u_i(T,T_0)=
\\
\sum_{j=1,...,J} (N_i^j(T) z^j(T) - N_i^j(T_0) z^j(T_0)
-\sum_{t=T_0}^T (N_i^j(t+1)-N_i^j(t))\ z^j(t)).
\nonumber
\end{eqnarray}


The profit $u_i$ depends on the accuracy of stock rate predictions $\beta_i^j(t+1)$ and on random deviations $\epsilon_j(t)$.
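The following fragment sketches a single Monte Carlo time step for one joint-stock company: the aggregate thresholds are formed, the rate is updated with a random disturbance, and the holdings of each customer are adjusted. It is only a sketch under the assumptions stated above (a single share per order, hypothetical threshold values); it is not the applet implementation described later.

\begin{verbatim}
import java.util.Random;

/** One Monte Carlo time step of the stock exchange game for a single company (hypothetical data). */
public class ExchangeStep {
    public static void main(String[] args) {
        Random rnd = new Random(0);
        double z = 100.0;                      // current rate z^j(t)
        double[] buy  = {98.0, 97.5};          // buying thresholds of customers i=1,2
        double[] sell = {103.0, 104.0};        // selling thresholds of customers i=1,2
        int[] holdings = {10, 10};             // N_i^j(t)

        // Aggregate thresholds: the maximal buying and the minimal selling threshold.
        double zBuy = Math.max(buy[0], buy[1]);
        double zSell = Math.min(sell[0], sell[1]);

        // Rate update with a random disturbance epsilon_j(t+1).
        double eps = rnd.nextGaussian();
        double zNext;
        if (z <= zBuy)       zNext = zBuy + eps;   // buying pressure moves the rate to the threshold
        else if (z >= zSell) zNext = zSell + eps;  // selling pressure moves the rate to the threshold
        else                 zNext = z + eps;      // no trade: a pure random-walk step

        // Holding update: buy one share below the own buying threshold, sell one above the selling one.
        for (int i = 0; i < holdings.length; i++) {
            if (z <= buy[i]) holdings[i] += 1;
            else if (z >= sell[i]) holdings[i] -= 1;
        }
        System.out.printf("z(t+1)=%.2f holdings=[%d,%d]%n", zNext, holdings[0], holdings[1]);
    }
}
\end{verbatim}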

Suppose the customer $i$ predicts the stock rate of the $j$ share for the time $t+1$ using the Auto Regressive (AR) model:

\begin{eqnarray}
z_i^j(t+1)=\sum_{k=1}^{p_i} a_i^k(j)\ z^j(t+1-k)+\epsilon_{ji}(t+1)
.
\end{eqnarray}


Here the unknown parameters $a_i^k(j)$ are defined by minimization of the squared deviations at a fixed number $p_i$

\begin{eqnarray}
\min_{a_i(j)} \sum_{s=t_0}^{t}\ \epsilon_{ji}^2(s)
,
\end{eqnarray}


where

\begin{eqnarray}
\epsilon_{ji}(s)=z^j(s)-\sum_{k=1}^{p_i}\ a_i^k(j)\ z^j(s-k)
.
\end{eqnarray}


This time series model, hopefully, represents some aspects of stock rates.
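The least-squares estimation of the AR parameters can be sketched as follows. The fragment accumulates the normal equations of the regression of $z^j(s)$ on its $p$ lagged values and solves them by Gaussian elimination. The class name, variable names, and the sample data are hypothetical; the actual software may estimate the parameters differently.

\begin{verbatim}
/** Least-squares fit of AR(p) coefficients by solving the normal equations (illustrative sketch). */
public class ArFit {

    static double[] fit(double[] z, int p) {
        double[][] A = new double[p][p];
        double[] b = new double[p];
        // Accumulate X'X and X'y for the regression z(s) ~ sum_k a_k z(s-k).
        for (int s = p; s < z.length; s++) {
            for (int k = 1; k <= p; k++) {
                b[k - 1] += z[s - k] * z[s];
                for (int l = 1; l <= p; l++) A[k - 1][l - 1] += z[s - k] * z[s - l];
            }
        }
        return solve(A, b);
    }

    /** Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
            double[] row = A[col]; A[col] = A[piv]; A[piv] = row;
            double tb = b[col]; b[col] = b[piv]; b[piv] = tb;
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] z = {1.0, 1.2, 1.1, 1.3, 1.25, 1.4, 1.35, 1.5, 1.45, 1.6}; // assumed sample rates
        double[] a = fit(z, 2);
        System.out.printf("a1=%.3f a2=%.3f%n", a[0], a[1]);
    }
}
\end{verbatim}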

2 Nash Model

The time series generated by the Monte Carlo model depend on the parameters $p_i$ that are not defined by the model. To define these parameters, the "Nash Model" is used. The Nash Model searches for such numbers $(p_i,\ i=1,...,m)$ that satisfy the Nash equilibrium conditions. This means that no player can obtain a higher expected profit $u_i$ by changing $p_i$ individually.

To ensure the existence of the Nash equilibrium one introduces mixed strategies. Here the mixed strategy is to select the integer $p_i$ at random with probabilities

\begin{eqnarray}
x_{p_i},\ i=1,...,m, \ p_i=1,...,P_i,\ \sum_{p_i=1}^{P_i} x_{p_i} =1, \ 0 \le x_{p_i} \le 1
.
\end{eqnarray}


That means that one chooses $p_i$ by some lottery where $x_{p_i}$ is the probability to win the integer $p_i$. The lottery is implemented by generating random numbers $\xi$ uniformly distributed in the zero-one interval. This interval is partitioned into $P_i$ parts with lengths proportional to the probabilities $x_{p_i}$.
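A minimal sketch of this lottery is given below: the unit interval is split into parts of lengths $x_{p_i}$, and the part containing the uniform random number $\xi$ determines the selected order $p_i$. The probability values in the example are assumed for illustration only.

\begin{verbatim}
import java.util.Random;

/** Sampling the AR order p_i from the mixed strategy x_{p_i} by the lottery described above. */
public class Lottery {

    /** Returns an order p in {1,...,P} with probability x[p-1]; x is assumed to sum to one. */
    static int draw(double[] x, Random rnd) {
        double xi = rnd.nextDouble();      // uniform random number in [0,1)
        double cumulative = 0.0;
        for (int p = 1; p <= x.length; p++) {
            cumulative += x[p - 1];        // the unit interval is split into parts of length x_p
            if (xi < cumulative) return p;
        }
        return x.length;                   // guard against rounding errors
    }

    public static void main(String[] args) {
        double[] x = {0.5, 0.3, 0.2};      // assumed mixed strategy for P_i = 3
        Random rnd = new Random();
        int[] counts = new int[x.length];
        for (int l = 0; l < 10000; l++) counts[draw(x, rnd) - 1]++;
        System.out.printf("frequencies: %d %d %d%n", counts[0], counts[1], counts[2]);
    }
}
\end{verbatim}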

To make sense of this randomization, one repeats the Monte Carlo simulation $K$ times. Denote by $u_i(l, x_{p_i},\ i=1,...,m)$ the $l$-th sample of the profit function $u_i(T,T_0)$ obtained by the $i$-th player using the mixed strategies $(x_{p_i},\ i=1,...,m)$.

The average of all the $K$ samples:

\begin{eqnarray}
u_i^K(x_{p_i}, \ i=1,...,m) =1/K \sum_{l=1}^K\ u_i(l,x_{p_i}, \ i=1,...,m).
\end{eqnarray}


Denote the "contract" vector by $x_{p_i}^0,\ i=1,2$ Define the "fraud" vector [Raudys and Mockus, 1999] as

\begin{eqnarray}
x_{p_i}^1= \arg \max_{x_{p_i}} u_i^K(x_{p_i},x_{p_l}^0,\ l \ne i).
\end{eqnarray}


One searches for the Nash equilibrium $x_{p_i}^*$ by minimizing the difference between the fraud and contract vectors

\begin{eqnarray}
(x_{p_i}^*, \ i=1,...,m)= \arg \min_{x_{p_i}^0, \ i=1,...,m}
\sum_i (x_{p_i}^1-x_{p_i}^0)^2
\end{eqnarray}


The optimization is carried out by methods of stochastic optimization. The convexity of the profit functions is tested if the minimum exceeds the simulation errors [Raudys and Mockus, 1999].
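The search for the equilibrium can be illustrated by the following simplified sketch for two players, each controlled by a single probability. The profit function uK is only a smooth toy stand-in for the average of the $K$ Monte Carlo samples, and the grid search replaces the stochastic optimization methods used in the actual experiments; the sketch merely shows how the fraud vector and the contract-fraud difference are combined.

\begin{verbatim}
/** Grid-search sketch of the Nash equilibrium computation for two players.
 *  uK() is a hypothetical stand-in for the averaged Monte Carlo profit u_i^K. */
public class NashSearch {

    static final int GRID = 21;            // grid over the probability of choosing p_i = 1

    /** Toy concave profit of player i under the mixed strategies (x1, x2). */
    static double uK(int i, double x1, double x2) {
        double own = (i == 0) ? x1 : x2, other = (i == 0) ? x2 : x1;
        double best = 0.5 * other + 0.3;   // toy best response
        return -(own - best) * (own - best);
    }

    /** Best response ("fraud") of player i to the contract vector (x1, x2). */
    static double fraud(int i, double x1, double x2) {
        double best = 0.0, bestU = Double.NEGATIVE_INFINITY;
        for (int g = 0; g < GRID; g++) {
            double x = g / (double) (GRID - 1);
            double u = (i == 0) ? uK(0, x, x2) : uK(1, x1, x);
            if (u > bestU) { bestU = u; best = x; }
        }
        return best;
    }

    public static void main(String[] args) {
        double bestObj = Double.POSITIVE_INFINITY, e1 = 0, e2 = 0;
        for (int g1 = 0; g1 < GRID; g1++) {
            for (int g2 = 0; g2 < GRID; g2++) {
                double x1 = g1 / (double) (GRID - 1), x2 = g2 / (double) (GRID - 1);
                double f1 = fraud(0, x1, x2), f2 = fraud(1, x1, x2);
                double obj = (f1 - x1) * (f1 - x1) + (f2 - x2) * (f2 - x2); // fraud vs contract
                if (obj < bestObj) { bestObj = obj; e1 = x1; e2 = x2; }
            }
        }
        System.out.printf("approximate equilibrium: x1=%.2f x2=%.2f (objective %.4f)%n", e1, e2, bestObj);
    }
}
\end{verbatim}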

3 Prediction of Simulated Stock Rates by ARMA Model

To compare the predictions of real and simulated stock rates, the well-known ARMA model is used. The estimates of the parameters $p,q$ of this model are obtained by the stabilization procedure.

4 Software Example

This Java1.1 software implements the stock exchange model in the case $I=2$, $J=1$, and $n=1$. That means two major players buying a single share at a time and one major joint-stock company. The purpose of this simplest case is to obtain some starting data and to show advantages and disadvantages of the model. This information helps in developing more complicated models.

Figure 15.36 describes the simulation results in the case when both players predict using the Wiener model: $p=1,\ a_1=1,\ a_i=0,\ i >1$.

Figure 15.36: Simulation Results; Wiener Model
\begin{figure}\centerline{
\epsfig{file=stock.game.eps,width=12.0cm}
} \end{figure}
The working hypothesis is that the behaviour of players based on the predictions obtained by the Wiener model provides the Nash equilibrium. Figure 15.37 describes the simulation results in the case when both players predict using the AR model: $p=10,\ a_i=0,\ i >10$.
Figure 15.37: Simulation Results; AR Model, p=5
\begin{figure}\centerline{
\epsfig{file=stock.game.ar5.eps,width=12.0cm}
}
\end{figure}
One can see that the graphs are similar in both cases; the average deviations are close, too. Thus, there is no incentive for the players to change the predictions based on the Wiener model. That means that the Wiener model can be regarded as a sort of equilibrium situation. One may conveniently investigate various examples directly by visiting the web-site http://mockus.org/optimum and starting the corresponding Java applet (see the task "Stock Exchange Model" in the Section "Global Optimization").

This conclusion is supported by the results of AR models and partly supported by the results of ARMA models [Mockus et al., 1997,Raudys and Mockus, 1999]. Using other models one may obtain different results. For example, the AR model predicts better if the relation $c(t)$ is included as an external factor, where

\begin{eqnarray}
c(t)=\frac {cash(t)} {sold(t)+inventor(t)}
\end{eqnarray}


Here $cash(t)$ denotes the cash income during a time period $t$, $sold(t)$ means the contract price of the sold production during the same time, and $inventor(t)$ is the list price of the unsold production during the time $t$. There exists a positive correlation between the factors $z(t+delay\_time)$ and $c(t)$.
