Connect Fall 1996:  STATISTICS AND THE SOCIAL SCIENCES


Extrapolative Forecasting:
Exponential Smoothing with SPSS and SAS

Robert A. Yaffee

If a researcher interested in longitudinal analysis has a series of data points over a time span, there are several methods he may use to define the nature of this series. Time-series analysis may be divided into five basic classifications: extrapolative, decomposition, Box-Jenkins, spectral, and dynamic regression models.

On the basis of a model of a given time series, future values of a series may be generated as a forecast over a chosen time horizon. The researcher first graphs the data and then divides it into initial and validation datasets. After formulating a model on the basis of the initial dataset, he compares its predicted values to the actual values of the validation dataset. The residuals of different models on the validation data are then compared to determine the best-fitting model. The researcher can then use this model to forecast later developments in the time series. In this article I'll address the highlights of extrapolative analysis, the principal techniques of which are incorporated into both SAS (Statistical Analysis System) and SPSS (Statistical Product and Service Solutions). ACF offers SPSS/Windows 6.12 and SAS/Windows 6.10 on its local area networks, and SPSS/Unix 5 and SAS/Unix 6.09 on its RS6000.
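
The initial/validation workflow described above can be sketched in a few lines of Python. This is only an illustration, not what SAS or SPSS do internally: the series, the split point, and the two deliberately crude candidate models (`mean_forecast` and `last_value_forecast`) are made-up assumptions, chosen so the comparison is easy to follow.

```python
def sse(actual, predicted):
    """Sum of squared errors between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_forecast(history, horizon):
    """Hypothetical model 1: repeat the mean of the initial data."""
    return [sum(history) / len(history)] * horizon


def last_value_forecast(history, horizon):
    """Hypothetical model 2: repeat the last observed value."""
    return [history[-1]] * horizon


# Made-up series with a steady upward drift (illustrative numbers only).
series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
split = 7  # assumed split point: the first 7 values form the initial dataset
initial, validation = series[:split], series[split:]

# Fit each candidate on the initial data, then score it on the validation data.
candidates = {"mean": mean_forecast, "last value": last_value_forecast}
scores = {
    name: sse(validation, model(initial, len(validation)))
    for name, model in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("best-fitting candidate:", best)
```

Neither candidate is a serious forecasting model; the point is only the order of operations: fit on the initial data, score on the validation data, keep the candidate with the smaller error.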

Extrapolative methods consist of a variety of exponential smoothing techniques. First, there is simple exponential smoothing, with or without a constant (a baseline level for the series). Second, there is Holt exponential smoothing with a trend (a deterministic tendency over time) for long-term patterns. Third, there is Winters exponential smoothing, which involves a linear or quadratic trend with a multiplicative or additive seasonal (regular variation around the trend) component. There is also stepwise autoregressive exponential smoothing for shorter-run fluctuations.

Exponential smoothing methods are typically cheaper, easier to use, and need less data than the fifty or more equally spaced values over time required by the Box-Jenkins techniques. For these reasons, smoothing methods are often applied to production, sales, and inventory control, where strong consideration is given to keeping costs down and profits up; however, if seasonal variation is present, typically at least two years of data are necessary for both exponential smoothing and Box-Jenkins analysis. Dynamic-regression models are particularly good for development of social-science theory, insofar as they can handle more independent variables (hypotheses of simple relationships) and interactions (hypotheses of joint relationships). Dynamic regression may not be as easy to develop and apply as the more mechanical exponential-smoothing methods. Although exponential-smoothing methods do not produce theoretical models, their mechanical forecasts may under some circumstances be more feasible, more accurate, cheaper, and more timely than those derived from the Box-Jenkins models, which is why anyone studying or applying forecasting techniques should consider them.

Exponential smoothing is based on the concept of the moving average. If the researcher computes a mean of the first twelve data points (of, say, fifty), records it, moves one time period ahead from the previous starting position to compute the average for points two through thirteen, and then repeats this process until the end of the series is reached, the new data series recorded is called a moving average of order twelve. The moving average smooths out irregular fluctuation, and a double moving average -- a moving average of a moving average -- smooths it out even more.
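
The procedure just described translates almost directly into code. The sketch below is a minimal Python illustration; the function name `moving_average` and the sample data are my own assumptions, not anything taken from SAS or SPSS.

```python
def moving_average(series, order):
    """Return the mean of every run of `order` consecutive values."""
    if order < 1 or order > len(series):
        raise ValueError("order must be between 1 and len(series)")
    return [
        sum(series[start:start + order]) / order
        for start in range(len(series) - order + 1)
    ]


# Fifty made-up data points (0, 1, ..., 49) smoothed with order twelve.
fifty_points = list(range(50))
ma12 = moving_average(fifty_points, 12)

# A double moving average is a moving average of a moving average.
noisy = [3, 5, 4, 8, 6, 9, 7, 11]
ma3 = moving_average(noisy, 3)
double_ma3 = moving_average(ma3, 3)

print(len(ma12), ma12[0])
print(ma3)
print(double_ma3)
```

Note that each pass shortens the series: fifty points yield thirty-nine averages of order twelve, which is one practical cost of heavier smoothing.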

Exponential smoothing represents an improvement on moving-average smoothing. A simple moving average gives equal weight to every value in its window, whereas exponential smoothing has the decided advantage of giving more weight to recent observations and exponentially smaller weight to historically distant observations. A simple exponential forecast for one time period in the future is the forecast of the current value plus an adjustment for the current error. That adjustment is the value of the series at the present time, divided by the number of values being averaged, minus the forecast of the current value, divided by the same number; in other words, it is a fixed fraction of the difference between what was observed and what was forecast.
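
A small Python sketch may make the updating rule concrete. It assumes a smoothing weight, here called `alpha`, that plays the role of one over the number of values, and it starts the recursion from the first observation; both choices are assumptions for illustration, and SAS and SPSS may initialize and parameterize their procedures differently.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts for each period plus the next one.

    forecasts[t] is the forecast made for series[t]; the final element is
    the forecast for the first period after the data end.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]  # assumed start-up value: the first observation
    forecasts = [forecast]
    for value in series:
        # New forecast = old forecast + a fraction of the latest error.
        forecast = forecast + alpha * (value - forecast)
        forecasts.append(forecast)
    return forecasts


def implied_weights(alpha, count):
    """Weights on the most recent `count` observations, newest first."""
    return [alpha * (1 - alpha) ** age for age in range(count)]


forecasts = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
weights = implied_weights(0.5, 3)
print(forecasts)
print(weights)
```

The second function shows why the method is called exponential: with `alpha` at 0.5, each step back in time halves the weight an observation receives.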

Coupled with this moving-average concept, the Holt model accommodates a constant, linear trend for long-run forecasts or, in SAS, a quadratic trend (for projections of a recent change in the series). SPSS offers the option of a dampened trend (one that tapers off) instead of a quadratic trend. The Winters model accommodates seasonal fluctuations in the series as well. Both SPSS and SAS allow for a multiplicative as well as an additive Winters model. Both SPSS and SAS permit custom-designed smoothing and forecasting.
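
To show how a trend term and a dampened trend fit together, here is a minimal Python sketch of Holt's linear method with an optional damping factor. The parameter names (`alpha`, `gamma`, `phi`), the start-up values, and the particular damped formulation are assumptions chosen for illustration; they are not the exact parameterization used by either SAS or SPSS.

```python
def holt_forecast(series, alpha, gamma, horizon, phi=1.0):
    """Holt linear-trend forecasts; phi < 1 makes the trend taper off.

    alpha smooths the level, gamma smooths the trend, and phi is the
    damping factor (phi = 1.0 gives the ordinary linear trend).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]               # assumed start-up level
    trend = series[1] - series[0]   # assumed start-up trend
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + phi * trend)
        trend = gamma * (level - previous_level) + (1 - gamma) * phi * trend
    forecasts = []
    damping_sum = 0.0
    for step in range(1, horizon + 1):
        damping_sum += phi ** step  # equals `step` when phi is 1.0
        forecasts.append(level + damping_sum * trend)
    return forecasts


rising = [1, 3, 5, 7, 9]  # made-up series that climbs by 2 each period
linear = holt_forecast(rising, alpha=0.5, gamma=0.3, horizon=3)
damped = holt_forecast(rising, alpha=0.5, gamma=0.3, horizon=3, phi=0.8)
print(linear)
print(damped)
```

With `phi` at 1.0 the forecasts continue the straight line; with `phi` below 1.0 each successive forecast step adds a smaller increment than the one before, which is the tapering behavior of a dampened trend.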

The relative fit of these models is assessed by the sum of squared errors. SAS and SPSS produce a wide variety of measures of fit; the model with the best fit is the chosen model.
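
For readers who want to see how such measures are computed, the following Python sketch calculates the sum of squared errors along with a few other commonly used summaries. The particular set shown (MSE, RMSE, MAE, MAPE) and the sample numbers are my own illustrative assumptions, not a list of what either package prints.

```python
import math


def fit_measures(actual, predicted):
    """Return several summary measures of forecast error.

    MAPE divides by the actual values, so it assumes none of them is zero.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e ** 2 for e in errors)
    return {
        "SSE": sse,                                   # sum of squared errors
        "MSE": sse / n,                               # mean squared error
        "RMSE": math.sqrt(sse / n),                   # root mean squared error
        "MAE": sum(abs(e) for e in errors) / n,       # mean absolute error
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Made-up actual values and the predictions of two hypothetical models.
actual = [100, 110, 120, 130]
model_a = fit_measures(actual, [98, 113, 119, 134])
model_b = fit_measures(actual, [101, 108, 121, 131])
chosen = "A" if model_a["SSE"] < model_b["SSE"] else "B"
print(model_a)
print(model_b)
print("chosen model:", chosen)
```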

Extrapolative forecasting in SPSS and in SAS shares many capabilities. Both can handle additive as well as multiplicative seasonal models, and both handle no-trend and linear-trend options. SPSS has an exponential trend option, while SAS has a quadratic trend option. Each program provides for fixing chosen parameters, and both allow the user to custom-design models. Both SAS and SPSS have stepwise autoregression algorithms that may be used in forecasting. (With stepwise autoregression, a time trend is found and the differences between the actual data and the trend line are computed. These residuals are then fit using autoregressive estimation. According to the SAS/ETS User's Guide, this procedure is usually near optimal and computationally inexpensive.)
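
The trend-then-residuals idea can be illustrated with a simplified Python sketch. It fits a straight-line trend by least squares and then a single first-order autoregressive term to the residuals. This is an assumption-laden simplification: the real procedures choose among several lags stepwise, which is omitted here, and all function names are hypothetical.

```python
def fit_linear_trend(series):
    """Least-squares intercept and slope against time 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum(
        (t - t_mean) * (y - y_mean) for t, y in enumerate(series)
    ) / sum((t - t_mean) ** 2 for t in range(n))
    return y_mean - slope * t_mean, slope


def fit_ar1(residuals):
    """Least-squares coefficient relating each residual to the one before."""
    numerator = sum(
        residuals[t] * residuals[t - 1] for t in range(1, len(residuals))
    )
    denominator = sum(r ** 2 for r in residuals[:-1])
    return numerator / denominator if denominator else 0.0


def trend_plus_ar1_forecast(series, horizon):
    """Extrapolate the trend and add the decaying AR(1) residual forecast."""
    intercept, slope = fit_linear_trend(series)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
    coefficient = fit_ar1(residuals)
    residual_forecast = residuals[-1]
    forecasts = []
    for step in range(1, horizon + 1):
        residual_forecast *= coefficient
        time_index = len(series) - 1 + step
        forecasts.append(intercept + slope * time_index + residual_forecast)
    return forecasts


print(trend_plus_ar1_forecast([2, 4, 6, 8, 10], horizon=2))
print(fit_ar1([1, 0.5, 0.25, 0.125]))
```

On a perfectly linear series the residuals vanish and the forecast is just the extended trend line; the autoregressive term matters only when the departures from trend are themselves correlated over time.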

Despite these similarities, there are important differences between SPSS and SAS. For automatic parameter selection, SPSS has a grid search capability, which tries a range of parameter values and uses those that yield the minimum sum of squared errors. With SAS, forecasting is not so automatic unless the researcher knows the SAS macro language well enough to set up the automatic forecasting procedures. (You can find the macros for this automation in Brocklebank and Dickey, Forecasting Techniques Using SAS/ETS Software Course Notes.) Graphs are easier to produce in SPSS, but SAS/GRAPH has more power and flexibility in its production and presentation.
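
The grid search mentioned above is simple enough to sketch in Python. The example tries a fixed grid of smoothing weights for simple exponential smoothing and keeps the one with the smallest sum of squared one-step-ahead errors. The grid spacing, the start-up value, and the function names are assumptions for illustration, not a description of the SPSS implementation.

```python
def one_step_sse(series, alpha):
    """Sum of squared one-step-ahead errors for a given smoothing weight."""
    forecast = series[0]  # assumed start-up value: the first observation
    total = 0.0
    for value in series[1:]:
        error = value - forecast
        total += error ** 2
        forecast += alpha * error
    return total


def grid_search_alpha(series, grid=None):
    """Return (best_alpha, best_sse) over a grid of candidate weights."""
    if grid is None:
        grid = [i / 10 for i in range(1, 11)]  # assumed grid: 0.1 to 1.0
    best_alpha = min(grid, key=lambda alpha: one_step_sse(series, alpha))
    return best_alpha, one_step_sse(series, best_alpha)


# Made-up steadily rising series: a large weight tracks it best.
best_alpha, best_sse = grid_search_alpha([1, 2, 3, 4, 5, 6])
print(best_alpha, best_sse)
```

A finer grid costs more computation but locates the minimum more precisely; the same loop extends to two or three parameters by searching over every combination.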

SAS and SPSS could be improved by the addition of certain features. Although SAS lacks fully automatic detection and estimation of seasonality and trend parameters, SPSS does have a provision for a grid search for the best trend and trend parameter. While both SAS and SPSS possess the basic methods, neither includes all of the techniques. They lack such methods of smoothing as Trigg's tracking signal monitoring system, Chow's adaptive control method, and Harrison's harmonic smoothing, for example. More specialized packages for forecasting, like Sybil Runner (by Lincoln Systems in Westford, Mass.) and Forecast Pro (by Business Forecast Systems, Belmont, Mass.), possess these capabilities. Compared to the graphing capability in some specialized packages, forecast graphs complete with forecast intervals are cumbersome to construct in SAS and not automatically generated by SPSS. Even though SPSS graphs are generally easier to construct, they typically do not have the same power and flexibility as those of SAS. Although SPSS can produce the forecast values, its exponential-smoothing procedure does not automatically produce the forecast interval limits. SAS can generate both forecast values and forecast interval limits, so its graphing of forecasts tends to be more sophisticated than that of SPSS. Neither SAS nor SPSS currently offers automatic comparison of the results of different runs.


Robert A. Yaffee is a Statistician in the Social Sciences Computing Group of NYU's Academic Computing Facility.
robert.yaffee@nyu.edu

Posted 24 September 1996