Appendix A: The use of polynomial coefficients in likelihood function inference for MAD

The computational cost of inferring a likelihood function scales with its dimensionality.  We first demonstrate the challenge posed by dimensionality, and then explain why that challenge carries a computational cost.

A primary objective of any MAD analysis is to stably infer the likelihood function (a step that can be validated with the convergence tab in MAD#), which is an application of the weak law of large numbers. Consider a simple illustrative example: sample randomly from a 1D normal PDF and from a 2D normal PDF, with the goal of reconstructing the underlying PDFs using non-parametric kernel density estimation (the same methodology MAD# employs to infer the form of the likelihood function).  We can think of this dimensional dependence as the number of points required to fit a curve in the 1D case versus a surface in the 2D case; intuition correctly suggests that more points will be required in the higher-dimensional case.
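
As a hypothetical illustration of this experiment (not part of MAD#), the Python sketch below samples from 1D and 2D standard normal PDFs and reconstructs them with SciPy's gaussian_kde, used here as a stand-in for MAD#'s kernel density machinery; the sample size and error metric are arbitrary choices.

```python
# A minimal sketch (assumptions: scipy's gaussian_kde stands in for MAD#'s
# kernel density estimator; 100 samples; max-abs error as the metric).
import numpy as np
from scipy.stats import gaussian_kde, norm, multivariate_normal

rng = np.random.default_rng(0)
n = 100  # number of random samples ("regression points")

# 1D case: sample from a standard normal and reconstruct its PDF
x1 = rng.standard_normal(n)
kde_1d = gaussian_kde(x1)
grid = np.linspace(-4, 4, 200)
err_1d = np.max(np.abs(kde_1d(grid) - norm.pdf(grid)))

# 2D case: sample from a standard bivariate normal and reconstruct it
x2 = rng.standard_normal((2, n))            # gaussian_kde expects (dim, n)
kde_2d = gaussian_kde(x2)
gx, gy = np.meshgrid(grid, grid)
pts = np.vstack([gx.ravel(), gy.ravel()])
err_2d = np.max(np.abs(kde_2d(pts) - multivariate_normal(mean=[0, 0]).pdf(pts.T)))

print(f"max abs error, 1D KDE with {n} samples: {err_1d:.4f}")
print(f"max abs error, 2D KDE with {n} samples: {err_2d:.4f}")
```

With the same number of samples, the 2D reconstruction error will generally be larger, which is the dimensional dependence described above.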

Using an equal number of regression points, the contours in 2D match the underlying true PDF much more poorly than the 1D curves do.  Let N be the number of regression points required for stable inference of the underlying PDF and D the dimensionality of that PDF. Following Izenman [1991], N grows with D: typically N grows exponentially in D, and it almost always grows faster than linearly.

In the 1D case we can make one more valuable observation: the fitted curves change little between 100 and 1,000 regression points.  There are slight differences from the true underlying probability, but nothing as severe as the 10-point regression fit.  In practical applications we might therefore declare 100 regression points sufficient for stable inference and save the cost of producing the additional 900, because in MAD analyses these points are not random samples but simulations from a forward model, whose individual cost can range from seconds to minutes to hours depending on the complexity of the scenario being modeled.
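
A practical version of this stability check can be phrased as comparing successive fits as the number of regression points grows, rather than comparing to the (in practice unknown) true PDF. The sketch below does so with SciPy's gaussian_kde on the same 10/100/1,000 sample sizes; the convergence metric is an assumption for illustration, not the criterion used by MAD#'s convergence tab.

```python
# Minimal convergence-check sketch (assumed metric: mean absolute change in
# the fitted PDF between successive sample sizes).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 200)
previous = None

for n in (10, 100, 1000):
    current = gaussian_kde(rng.standard_normal(n))(grid)
    if previous is not None:
        delta = np.mean(np.abs(current - previous))
        print(f"mean abs change in fitted PDF at n = {n}: {delta:.4f}")
    previous = current
```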

Therefore, defining the computational expense of a single simulation as Csim, the total computational time required to generate the ensemble of regression points for a single likelihood calculation is approximately N × Csim.  However, this is the cost of generating the simulations for just a single point in the parameter space, and it must be repeated for each new configuration of parameters.
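
To make the bookkeeping concrete, here is a back-of-the-envelope estimate with purely illustrative numbers (N, Csim, and the number of parameter configurations below are assumptions, not values prescribed by MAD#):

```python
# Illustrative cost arithmetic only; all three numbers are assumptions.
N = 100            # regression points per likelihood evaluation
C_sim = 60.0       # seconds per forward-model simulation
n_configs = 500    # parameter configurations to be evaluated

cost_per_likelihood = N * C_sim               # seconds for one likelihood
total_cost = n_configs * cost_per_likelihood  # seconds for the full search
print(f"one likelihood evaluation: {cost_per_likelihood / 3600:.2f} h")
print(f"full parameter search:     {total_cost / 3600:.2f} h")
```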

This is the practical, cost-motivated basis for implementing low-dimensionality likelihood functions. Lower likelihood-computation expenses also permit more extensive searches of the parameter domain for a given computational budget.


Now we consider how a time series (or a collection of time series) would be used, unaltered, in the likelihood function inference procedure.

The maximum dimensionality for a collection of measurements to be used as conditioning information can be defined as follows:

Dmax = Σ_i Σ_j nt_i,j

where nt_i,j is the number of temporal instances of the measurement of the jth Type-B variable at the ith spatial location, and the summations are over all Type-B variables collected (j) and all spatial locations of measurements (i).  Obviously, this summation can grow quite large, quite rapidly.
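
As a small illustration, the sum can be tallied directly; the counts below are hypothetical.

```python
# nt[i][j]: number of temporal observations of the j-th Type-B variable at
# the i-th spatial location (hypothetical values).
nt = [
    [240, 120],   # location 1: two Type-B variables
    [240, 365],   # location 2
    [52, 52],     # location 3
]
D_max = sum(sum(row) for row in nt)
print(f"maximum conditioning dimensionality: {D_max}")  # 1069
```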

Hence, the goal is to try to represent the same information contained in the various time series of Type-B data at the different spatial locations in a more dimensionally compact form.  The MAD# software refers to this dimension reducing approximation as “aggregation”.

Currently, the software offers representation of time series data in terms of regressed polynomial coefficients.  While a regressed curve may not represent a very dynamic time series well (right pane, low R² for a 2nd-order polynomial fit), where the series is smoother, hundreds of temporal data points may be represented by just a few regression coefficients (left pane, high R² for a 2nd-order polynomial fit).

On the left, while not perfectly matching the data, representing the temporal measurements by a regressed polynomial results in more than a 33X reduction in dimensionality relative to the entire time series.  On the right, where the match is very poor, the dimension reduction is probably not justifiable (unless gross/average behavior is the desired conditioning information).  However, if we increase the polynomial to 6th order, the gross shape of the regressed polynomial is much improved (though perhaps still inadequate for some applications) and corresponds to more than a 14X reduction in dimensionality relative to the full time series.
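
The sketch below mimics this aggregation on a synthetic, smooth time series using NumPy's polyfit; the series, noise level, and R² bookkeeping are illustrative assumptions, not MAD#'s internal implementation. With 100 time steps, a 2nd-order fit (3 coefficients) gives roughly the 33X reduction quoted above and a 6th-order fit (7 coefficients) roughly the 14X reduction.

```python
# Aggregation sketch: replace a time series with regressed polynomial
# coefficients and report fit quality (R^2) and the dimension reduction.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 100)                      # 100 time steps
series = 5.0 * np.exp(-0.3 * t) + 0.05 * rng.standard_normal(t.size)

def aggregate(t, y, order):
    coeffs = np.polyfit(t, y, order)                 # regressed coefficients
    fitted = np.polyval(coeffs, t)
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coeffs, r2

for order in (2, 6):
    coeffs, r2 = aggregate(t, series, order)
    reduction = t.size / coeffs.size                 # e.g. 100 -> 3 is ~33X
    print(f"order {order}: {coeffs.size} coefficients, R^2 = {r2:.3f}, "
          f"~{reduction:.0f}X dimension reduction")
```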

It should be clearly understood that inferring the likelihood of regressed polynomial coefficients is an approximation of inferring the likelihood of the time series of measurements directly; however, using the complete time series may be extremely computationally demanding, and aggregation can significantly reduce that cost.

It is important to note that using polynomial coefficients does not preclude the use of subsets of data in the time series.  Coefficients and measurements can be intermingled in the likelihood function; perhaps in addition to the gross behavior of the time series, a few maxima or minima are extremely important to the physics of the model, so both should be dependent variables in the likelihood.   
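
A minimal sketch of such intermingling, again on synthetic data: the vector of dependent variables for the likelihood is assembled from the regressed coefficients plus two retained raw measurements (here the series maximum and minimum, a purely illustrative choice).

```python
# Mixing aggregated coefficients with retained raw measurements (synthetic).
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 100)
series = 5.0 * np.exp(-0.3 * t) + 0.05 * rng.standard_normal(t.size)

coeffs = np.polyfit(t, series, 2)                     # 3 aggregated coefficients
retained = np.array([series.max(), series.min()])     # raw measurements kept
likelihood_vars = np.concatenate([coeffs, retained])  # 5 dependent variables
print(f"likelihood dimensionality: {likelihood_vars.size}")
```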

Finally, while there is no theoretical reason that subsets of the polynomial coefficients cannot be selected, we stress the importance of examining the family of curves associated with these partial sets of coefficients before excluding any of them.
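
One possible way to carry out that examination (a hypothetical sketch, not a MAD# feature): fit an ensemble of synthetic series, hold the retained coefficients fixed, sweep the candidate-for-exclusion coefficient over its ensemble range, and inspect the spread of the reconstructed curves.

```python
# Sketch of inspecting the family of curves implied by a dropped coefficient.
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 10.0, 100)

# ensemble of synthetic time series (stand-ins for forward-model outputs)
ensemble = [5.0 * np.exp(-k * t) + 0.05 * rng.standard_normal(t.size)
            for k in rng.uniform(0.2, 0.4, size=50)]
coeff_sets = np.array([np.polyfit(t, y, 2) for y in ensemble])  # shape (50, 3)

retained = coeff_sets.mean(axis=0)                       # quadratic and linear terms
lo, hi = coeff_sets[:, 2].min(), coeff_sets[:, 2].max()  # candidate to drop: intercept

family = [np.polyval([retained[0], retained[1], c0], t)
          for c0 in np.linspace(lo, hi, 5)]
spread = np.ptp(np.array(family), axis=0).max()
print(f"max spread across the curve family from the dropped coefficient: {spread:.3f}")
```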
