Nowcasting macro-financial indicators requires combining low-frequency and high-frequency time series. Mixed data sampling (MIDAS) regressions explain a low-frequency variable based on high-frequency variables and their lags. For instance, the dependent variable could be quarterly GDP and the explanatory variables could be monthly activity or daily market data. The most common MIDAS predictions rely on distributed lags of higher frequency regressors to avoid parameter proliferation. Analogously, reverse MIDAS models predict a high-frequency dependent variable based on low-frequency explanatory variables. Compared to state-space models (view post here), MIDAS simplifies specification and theory-based restrictions for nowcasting. The R package ‘midasr’ estimates models for multiple frequencies and weighting schemes. In practice, MIDAS has been used for nowcasting financial market volatility, GDP growth, inflation trends and fiscal trends.

The sources of the post are listed at the bottom. Below are condensed, annotated quotes. Emphasis and text in brackets have been added for clarity.

The post ties in with this site’s summary on quantitative methods for macro information efficiency.

### What are MIDAS regressions?

“Mixed data sampling (MIDAS) regressions are now commonly used to deal with time series data sampled at different frequencies.” [Handbook of Statistics]

“Data are not all sampled at the same frequency. Most macroeconomic data are sampled monthly (e.g., employment) or quarterly (e.g., GDP). Most financial variables (e.g., interest rates and asset prices), on the other hand, are sampled daily or even more frequently. The challenge is how to best use available data.” [Armesto, Engemann and Owyang]

“Mixed-data sampling __(MIDAS) regressions allow estimating dynamic equations that explain a low-frequency variable by high-frequency variables and their lags__. When the difference in sampling frequencies between the regressand and the regressors is large, distributed lag functions are typically employed to model dynamics avoiding parameter proliferation.” [Foroni, Marcellino and Schumacher]

“A MIDAS regression model allows us to __‘explain’ a time-series variable that’s measured at some frequency, as a function of current and lagged values of a variable that is measured at a higher frequency__. So, for instance, we can have a dependent variable that is quarterly, and a regressor that is measured at a monthly, or daily, frequency…[Hence] a MIDAS regression model is a __very general type of autoregressive-distributed lag model__, in which high-frequency data are used to help in the prediction of a low-frequency variable…Typically, some ‘extra’ values [of] the high-frequency variable(s) will be available after the most recent sample value of the low-frequency dependent variable has been observed…These __‘extra’ observations can be used for…nowcasting__.” [Giles]

“Nowcasting…refers to the prediction of the present, the very near future, and the very recent past based on information provided by available data that are sampled at higher frequencies.” [Rufino]

“On the one hand, variables that are available at high frequency contain potentially valuable information. On the other hand, the researcher cannot use this high-frequency information directly if some of the variables are available at a lower frequency, because most time series regressions involve data sampled at the same interval…MIxed Data Sampling – or MIDAS – regressions represent __a simple, parsimonious, and flexible class of time series models that allow the left-hand and right-hand side variables of time series regressions to be sampled at different frequencies__.” [Ghysels, Sinko and Valkanov]

“MIDAS regressions are essentially tightly parameterized, reduced form regressions that involve processes sampled at different frequencies…Technically speaking __MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies__…MIDAS involve regressors with different sampling frequencies and are therefore not autoregressive models, since the notion of autoregression implicitly assumes that data are sampled at the same frequency in the past.” [Ghysels, Santa-Clara, and Valkanov]
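
As a reading aid, the distributed-lag idea in the quotes above is often written along the following lines. This is a sketch in one common notation from the MIDAS literature, not a formula quoted in this post; the symbols $y_t$, $x^{(m)}_t$, $m$, $K$, $\beta_0$, $\beta_1$, $w_k$ and $\theta$ are notational assumptions.

$$
y_t = \beta_0 + \beta_1 \sum_{k=0}^{K} w_k(\theta)\, x^{(m)}_{t-k/m} + \varepsilon_t,
\qquad \sum_{k=0}^{K} w_k(\theta) = 1
$$

Here $y_t$ is the low-frequency variable, $x^{(m)}$ is observed $m$ times per low-frequency period, and $t - k/m$ means "$k$ high-frequency periods before the end of period $t$". Because the weights $w_k$ depend only on a short parameter vector $\theta$, the number of estimated parameters does not grow with the number of lags $K$.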

“The focus in the literature has mostly been on improving the forecast of low-frequency variables by means of high-frequency information. In particular, different models have been introduced for dealing with the different sampling frequencies at which macroeconomic and financial indicators are available…__Recently, new models have been proposed for forecasting high-frequency variables by means of low-frequency variables__…Reverse Unrestricted MIDAS (RU-MIDAS) and Reverse MIDAS (R-MIDAS) model [link] high-frequency dependent variable with low-frequency explanatory variables in univariate context.” [Foroni, Ravazzolo and Rossini]

### Basic technical intuition

“When data of different sampling frequencies are mixed, one invariably deals with temporal aggregation…In empirical work, a direct treatment of mixed data samples is typically circumvented by first aggregating the highest frequency data in order to reduce all data to the same frequency. Then, in a second step, a standard regression model is estimated with pre-filtered data…__The mixed data sampling regression exploits a much larger information set and is more flexible. The cost is parameter proliferation__, as a suitable polynomial might involve many lags…We want to preserve most of the information in the MIDAS regression, while __decreasing the number of parameters to estimate…Our approach has its roots in an old literature on distributed lag models__…MIDAS regressions are more efficient than the common practice of first aggregating the highest frequency data in order to reduce all data to the same frequency.” [Ghysels, Santa-Clara, and Valkanov]

“The MIDAS approach __allows for non-equal weights (multipliers) for the components that are parsimoniously reparametrized__ through a weighing scheme anchored on the innovative use of lag polynomials. The way lag polynomials are employed in defining the weighing scheme for the multiplier represents a specific MIDAS regression model.” [Rufino]

“__The time-averaging model is parsimonious but discards any information about the timing of innovations to higher-frequency data__…[A] survey [of] some common methods for dealing with mixed-frequency data…shows that, in some cases, simply averaging the higher-frequency data produces no discernible disadvantage. In other cases, however, __explicitly modeling the flow of data (e.g., using mixed data sampling) may be more beneficial…especially if the forecaster is interested in constructing intra-period forecasts__…In principle, one could use any (normalized) weighting function…While this may be tractable when mixing quarterly and monthly observations, other sampling frequencies may be problematic…__Mixed data sampling (MIDAS)…employs (exogenously chosen) distributed lag polynomials as weighting functions__.” [Armesto, Engemann and Owyang]
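
The point about timing can be made concrete with a toy example. The sketch below is illustrative only: the monthly values and the "recent-heavy" weights are made up, and `weighted_aggregate` is a hypothetical helper, not part of any package mentioned in this post.

```python
def weighted_aggregate(values, weights):
    """Collapse high-frequency values into one number using normalized weights."""
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))


# Two hypothetical quarters with the same total shock, hitting in different months.
early_shock = [3.0, 0.0, 0.0]  # months 1, 2, 3 of the quarter
late_shock = [0.0, 0.0, 3.0]

flat = [1.0, 1.0, 1.0]          # simple time averaging
recent_heavy = [1.0, 2.0, 3.0]  # illustrative weights favoring the latest month

flat_early = weighted_aggregate(early_shock, flat)
flat_late = weighted_aggregate(late_shock, flat)
tilted_early = weighted_aggregate(early_shock, recent_heavy)
tilted_late = weighted_aggregate(late_shock, recent_heavy)
```

With flat weights both quarters collapse to the same regressor value, so a regression on the averaged data cannot tell them apart. With unequal weights the late shock counts for more than the early one, which is the timing information that time averaging discards.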

“The advantages of MIDAS, in addition to overcoming the problem of data with mixed frequency, is to minimize the number of estimated parameters and make the regression model simpler. A __weighting function is used to reduce the number of parameters in the MIDAS regression__. The weighting function can have a number of functional forms [such as] the exponential Almon function and the Beta function.” [Tri Utari and Ilma]
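
The two functional forms named in the quote can be sketched in a few lines. This is a minimal illustration under stated assumptions: the lag indexing (1 to `n_lags` for the exponential Almon form), the grid used for the Beta form, the small offset `eps` that keeps the grid away from 0 and 1, and all function names are choices made here, not definitions taken from the cited papers or from any package.

```python
import math


def exp_almon_weights(n_lags, theta1, theta2):
    """Normalized exponential Almon weights for lags 1..n_lags."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def beta_weights(n_lags, a, b, eps=1e-6):
    """Normalized Beta-type weights on an evenly spaced grid inside (0, 1)."""
    if n_lags == 1:
        return [1.0]
    grid = [eps + (1 - 2 * eps) * k / (n_lags - 1) for k in range(n_lags)]
    raw = [x ** (a - 1) * (1 - x) ** (b - 1) for x in grid]
    total = sum(raw)
    return [r / total for r in raw]


def midas_term(lagged_values, weights):
    """Weighted sum of high-frequency lags (most recent value first)."""
    return sum(w * x for w, x in zip(weights, lagged_values))


# Two parameters govern 12 lag weights in each case.
almon = exp_almon_weights(12, theta1=0.0, theta2=-0.05)  # declining weights
beta = beta_weights(12, a=1.0, b=5.0)                    # declining weights
flat = exp_almon_weights(12, theta1=0.0, theta2=0.0)     # equal weights
```

In both schemes two shape parameters determine all 12 weights, which is how the weighting function keeps the parameter count small. Setting both exponential Almon parameters to zero gives equal weights, so simple averaging is a special case.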

“__In macroeconomic applications…differences in sampling frequencies are often small__. In such a case, it might not be necessary to employ distributed lag functions [leading to] unrestricted lag polynomials in MIDAS regressions.” [Foroni, Marcellino and Schumacher]
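
For the quarterly/monthly case, an unrestricted specification simply gives each month of the quarter its own coefficient and estimates everything by ordinary least squares. The sketch below uses simulated, noise-free data and hypothetical coefficients purely to show the mechanics; the helper functions are written here for self-containment and are not from any MIDAS package.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)
monthly = [rng.gauss(0.0, 1.0) for _ in range(60)]  # 20 quarters, simulated
true_coefs = [0.5, 0.6, 0.3, 0.1]  # hypothetical: intercept, month 3, 2, 1

design, quarterly = [], []
for q in range(20):
    m1, m2, m3 = monthly[3 * q: 3 * q + 3]
    row = [1.0, m3, m2, m1]  # one free coefficient per month, latest first
    design.append(row)
    quarterly.append(sum(c * v for c, v in zip(true_coefs, row)))

estimates = ols(design, quarterly)
```

With three monthly lags there are only four parameters, so no weighting function is needed. The same approach with daily regressors would require dozens of free coefficients per quarter, which is where restricted lag polynomials become useful.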

### MIDAS regression versus Kalman filter

“The theory of the Kalman filter applies, strictly speaking, to linear homoskedastic Gaussian systems and yields an optimal [linear projection] in population…However, there are two important limitations to this result. First, it applies only in population, ignoring parameter estimation error. Second, it of course assumes that the state space model is correctly specified – __state space model predictions can be suboptimal if the regression dynamics are mis-specified__…State space models can be quite involved, as __one must explicitly specify a linear dynamic model for all the series involved: low-frequency data series, latent low-frequency series treated as missing and the high-frequency observed processes__. The system of equations therefore typically requires a lot of parameters, for the measurement equation, the state dynamics and their error processes.” [Bai, Ghysels and Wright]

“__MIDAS regression can also be viewed as a reduced-form representation of the linear projection which emerges from a state space model approach__ – by reduced form we mean that the MIDAS regression does not require the specification of a full state space system of equations…The Kalman filter…has several disadvantages: (1) it is more prone to specification errors as a full system of measurement and state equations is required and as a consequence (2) requires a lot more parameters, which in turn results in (3) computational complexities which often limit the scope of applications.” [Ghysels, Kvedaras and Zemlys]

“__Kalman filter state space models…involve a system of equations, whereas in contrast MIDAS regressions involve a (reduced form) single equation__. As a consequence, MIDAS regressions might be less efficient, but also less prone to specification errors…Forecasts from MIDAS regressions are generally quite similar to those from the Kalman filter. Kalman filter forecasts are typically a little better, but MIDAS regressions can be more accurate if the state-space model is mis-specified or over-parameterized.” [Bai, Ghysels and Wright]

### R package midasr

“The R package midasr…enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework…We __define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface__…[The key feature of] the package is its flexibility in terms of the model formulation and estimation, which allows for:

- estimation of regression models with their parameters defined (restricted) by certain functional constraints…
- estimation of MIDAS models with many variables and (numerous) different frequencies;
- various mixtures of restrictions/weighting schemes and also lag orders…
- statistical testing for the adequacy of the model specification…
- information criteria and testing-based selection of models;
- forecasting and nowcasting functionality, including various forecast combinations.” [Ghysels, Kvedaras and Zemlys]

“From a data handling point of view, the key specificity of the MIDAS regression model is that the length of observations of variables observed at various frequencies differs and needs to be aligned…[A special package function] performs exactly the transformation…converting an observation vector of a given (potentially) higher-frequency series into the corresponding stacked matrix of observations of low-frequency series.” [Ghysels, Kvedaras and Zemlys]
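
The alignment step described in the quote can be illustrated outside of R. The following Python sketch shows the general idea only: `stack_high_frequency` is a hypothetical helper written for this illustration and is not the package's function, and the convention of ordering each row from the latest value backwards is an assumption.

```python
def stack_high_frequency(x, m, n_lags):
    """Align a high-frequency series with low-frequency periods.

    x       high-frequency observations, oldest first, length a multiple of m
    m       number of high-frequency observations per low-frequency period
    n_lags  number of high-frequency values to keep per row

    Returns one row per low-frequency period holding the n_lags most recent
    high-frequency values as of the end of that period, latest first.
    Positions that reach before the start of the sample are None.
    """
    if len(x) % m != 0:
        raise ValueError("length of x must be a multiple of m")
    rows = []
    for t in range(len(x) // m):
        end = (t + 1) * m - 1  # index of the last observation in period t
        row = []
        for lag in range(n_lags):
            idx = end - lag
            row.append(x[idx] if idx >= 0 else None)
        rows.append(row)
    return rows


# Twelve monthly observations become four quarterly rows with four lags each.
monthly = list(range(1, 13))
stacked = stack_high_frequency(monthly, m=3, n_lags=4)
```

Each row of `stacked` lines up with one quarterly observation of the dependent variable, so the stacked matrix can be used as a block of regressors in an ordinary low-frequency regression. The first row contains a `None` because the fourth lag would fall before the start of the sample.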

### Applications of MIDAS regressions

“The interest in MIDAS regressions __addresses a situation often encountered in practice where the relevant information is high frequency data, whereas the variable of interest is sampled at a lower frequency__…For example, some macroeconomic data are sampled monthly, like price series and monetary aggregates, whereas other series are sampled quarterly or annually, typically real activity series like GDP and its components…[Another] example pertains to models of stock market volatility. The low frequency variable is for instance the quadratic variation or other volatility process over some long future horizon corresponding to the time to maturity of an option, whereas the high frequency data set is past market information potentially at the tick-by-tick level.” [Ghysels, Santa-Clara, and Valkanov]

“As a key indicator of real economic activity, GDP is published at quarterly frequency and with a considerable delay. Due to this limited data availability, typically __more timely high-frequency business cycle indicators such as industrial production or surveys about business expectations might help monitoring the current state of the economy__ as well as for forecasting…We derive unrestricted MIDAS regressions from linear high-frequency models…and show that their parameters can be estimated by OLS…In an empirical application on out-of-sample nowcasting GDP in the US and the Euro area using monthly predictors, we find a good performance of unrestricted MIDAS for a number of indicators.” [Foroni, Marcellino and Schumacher]

“MIDAS (Mixed Data Sampling) regression [can] solve the mixed frequency problem in implementing the __nowcasting of the country’s economic growth…using quarterly Real GDP data and monthly data on inflation, industrial production__…Results indicate the relative superiority of the MIDAS framework in accurately predicting the growth trajectory of the economy using information from high-frequency economic indicators.” [Rufino]

“We apply the MIDAS regression model to __forecast the growth of the Indonesian GDP using the value of Indonesian agricultural exports__…Exports will directly increase a country’s income. It is expected that an increase in…income will also result in an increase of its GDP. We use the Mixed Data Sampling (MIDAS) regression model to deal with a period or frequency difference issues of GDP and export variables.” [Tri Utari and Ilma]

“__Online price index is tested as a predictor of the monthly core inflation__ in Argentina…there is a slight improvement compared to the low-frequency benchmark autoregression and the unconditional mean.” [Libonatti]

“__We analyse the importance of low frequency hard and soft macroeconomic information__, respectively the industrial production index and the manufacturing Purchasing Managers’ Index surveys, __for forecasting high-frequency daily electricity prices__…We do that by means of mixed-frequency models, introducing a Bayesian approach to reverse unrestricted MIDAS models (RU-MIDAS)…Results indicate that the macroeconomic low frequency variables are more important for short horizons than for longer horizons.” [Foroni, Ravazzolo and Rossini]

“We employ a Mixed Data Sampling (MiDaS) approach to __analyze mixed frequency fiscal data__…We use quarterly fiscal data to forecast a very disaggregated set of fiscal series at annual frequency…Once data for the third quarter is incorporated, the annual forecast becomes very accurate (very close to actual data). We also benchmark against the European Commission’s forecast and find the results fare favorably, particularly when considering that they stem from a simple univariate framework.” [Asimakopoulos, Paredes and Warmedinger]

### Linked sources

Bai, Jennie, Eric Ghysels and Jonathan Wright (2010), “State Space Models and MIDAS Regressions.”

Giles, Dave (2016), “MIDAS Regression is Now in EViews.”

Handbook of Statistics, Chapter 4 – Mixed data sampling (MIDAS) regression models.

Libonatti, Luis (2018), “MIDAS Modeling for Core Inflation Forecasting.”

Rufino, Cesar (2019), “Nowcasting Philippine Economic Growth Using MIDAS Regression.”