9.3.2.2 Maximum Likelihood/OEM

The aim is to adjust the parameters of the postulated model so that the model responses closely match the measurements. A likelihood function, similar to a probability density function (PDF), is defined when measurements are used. This likelihood function is maximized to obtain the estimates of the parameters of the dynamic system. The OEM is an ML estimator that accounts only for measurement noise and not process noise. The main idea is to define a function of the data and the unknown parameters, called the likelihood function; the parameter estimates are those values that maximize this function. If $b_1, b_2, \ldots, b_r$ are the unknown parameters of a system and $z_1, z_2, \ldots, z_n$ the measurements of the true values $y_1, y_2, \ldots, y_n$, then these true values can be expressed as a function of the unknown parameters:

$$ y_i = f_i(b_1, b_2, \ldots, b_r), \qquad i = 1, \ldots, n $$
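As a minimal numerical sketch of this setup (the exponential model, parameter values, and noise level below are assumptions made for illustration, not taken from the text), consider a scalar response measured in additive noise:

```python
import numpy as np

# Hypothetical scalar model: the true responses y_i are a function of the
# unknown parameters b = (b1, b2); the exponential form is assumed here
# purely for illustration.
def model_response(b, t):
    b1, b2 = b
    return b1 * np.exp(-b2 * t)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
b_true = np.array([2.0, 0.7])                 # "true" parameters (assumed)
y_true = model_response(b_true, t)            # true values y_1, ..., y_n
z = y_true + rng.normal(0.0, 0.1, t.size)     # measurements z_1, ..., z_n
```

Consistent with the OEM assumption, only measurement noise is added; the model dynamics themselves are noise-free.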
If z is a random variable with PDF p(z, b), then to estimate b from z, choose the value of b that maximizes the likelihood function L(z, b) = p(z, b). Thus, the problem of parameter estimation is reduced to the maximization of a real function, the likelihood function, of the parameter b and the data z. In essence, p becomes L once the measurements are obtained and substituted into p. The value of b that makes the observed measurements most probable is called the ML estimate. Such a likelihood function is constructed from the joint PDF of the measurements, viewed as a function of b.
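For concreteness, under the common OEM assumption of independent Gaussian measurement noise with variance R (this particular closed form is supplied here as a representative example), the likelihood takes the form

$$ L(z, b) = p(z|b) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi R}} \exp\!\left(-\frac{\bigl(z_i - y_i(b)\bigr)^2}{2R}\right) $$

Maximizing this over b is equivalent to minimizing the weighted output error $\sum_i (z_i - y_i(b))^2 / R$, which is the sense in which the OEM matches model responses to measurements.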
The main point in any estimator is to determine or predict the error made in the estimates relative to the true parameters, even though the true parameters are unknown; only statistical indicators of the errors can be worked out. The Cramer-Rao lower bound is perhaps the best measure for such errors. The log-likelihood function is defined as
$$ L(z|b) = \log p(z|b) \qquad (9.19) $$
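For the sketch from earlier (Gaussian noise with known variance R, an assumption of this example), Equation 9.19 reduces to a sum of squared residuals plus a constant:

```python
# Log-likelihood for the sketch above, assuming independent Gaussian
# measurement noise with known variance R.
def log_likelihood(b, z, t, R=0.01):
    resid = z - model_response(b, t)
    return -0.5 * np.sum(resid**2) / R - 0.5 * z.size * np.log(2.0 * np.pi * R)
```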
The likelihood differential equation is obtained as
$$ \frac{\partial}{\partial b} L(z|b) = L'(z|b) = \frac{p'(z|b)}{p(z|b)} = 0 \qquad (9.20) $$
The equation is nonlinear in b, so a first-order Taylor series expansion about a nominal value $b_0$ is used to obtain the estimate $\hat{b}$:

$$ L'(z|b) \cong L'(z|b_0) + L''(z|b_0)\,\Delta b = 0 \qquad (9.21) $$
The increment in b is obtained as
$$ \Delta b = -\frac{L'(z|b_0)}{L''(z|b_0)} = -\bigl(L''(z|b_0)\bigr)^{-1} L'(z|b_0) \qquad (9.22) $$
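A minimal sketch of this Newton-Raphson iteration for the running example, using finite-difference approximations of L' and L'' (step sizes, iteration limits, and the starting guess are arbitrary choices; a practical implementation would add step-size control):

```python
def grad_and_hess(f, b, eps=1e-4):
    # Central finite-difference approximations of L'(b) and L''(b).
    r = len(b)
    g = np.zeros(r)
    H = np.zeros((r, r))
    for i in range(r):
        ei = np.zeros(r); ei[i] = eps
        g[i] = (f(b + ei) - f(b - ei)) / (2.0 * eps)
        for j in range(r):
            ej = np.zeros(r); ej[j] = eps
            H[i, j] = (f(b + ei + ej) - f(b + ei - ej)
                       - f(b - ei + ej) + f(b - ei - ej)) / (4.0 * eps**2)
    return g, H

def ml_estimate(b0, z, t, iters=25, tol=1e-8):
    # Newton-Raphson on the log-likelihood: delta_b = -(L'')^(-1) L' (Eq. 9.22).
    b = np.asarray(b0, dtype=float)
    f = lambda bb: log_likelihood(bb, z, t)
    for _ in range(iters):
        g, H = grad_and_hess(f, b)
        delta = -np.linalg.solve(H, g)
        b = b + delta
        if np.linalg.norm(delta) < tol:
            break
    return b

b_hat = ml_estimate([1.0, 1.0], z, t)   # starting guess b0 is arbitrary
```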
The likelihood-related partials can be evaluated once the details of the dynamic system are specified. In a general sense, the expected value of the negative of the denominator of Equation 9.22 is defined as the information matrix:
$$ I_m(b) = E\{-L''(z|b)\} \qquad (9.23) $$
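In the running example, the expectation in Equation 9.23 can be approximated for a single data record by the observed information, i.e., the negative Hessian of the log-likelihood at the estimate (a standard approximation, assumed here):

```python
# Observed information: negative Hessian of log L at the estimate, standing
# in for the expectation of Eq. 9.23 for this single data record.
_, H_hat = grad_and_hess(lambda bb: log_likelihood(bb, z, t), b_hat)
I_m = -H_hat
```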
From Equation 9.23, we see that if the data have a large information content, then |L''| tends to be large and the uncertainty in the estimate of b is small (Equation 9.22). The Cramer-Rao inequality provides a lower bound on the variance of an unbiased estimator. If $\hat{b}(z)$ is any estimator of b based on the measurement z, and $\bar{b}(z) = E\{\hat{b}(z)\}$ is the expectation of the estimate, then the Cramer-Rao inequality for an unbiased estimator is
$$ \sigma^2_{\hat{b}} \geq \bigl(I_m(b)\bigr)^{-1} \qquad (9.24) $$
For an unbiased efficient estimator, $\sigma^2_{\hat{b}} = I_m^{-1}(b)$. For certain special cases, the inverse of the information matrix is the covariance matrix, and hence we have a theoretical expression for the variance of the estimator. Thus, for an efficient estimator the actual variance equals the predicted variance, whereas for other estimators it can be greater but never less than the predicted value; the predicted value provides the lower bound.
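As an illustrative check of the bound on the running example (the number of Monte Carlo runs and all numerical settings are arbitrary), the scatter of repeated ML estimates can be compared against the inverse information matrix:

```python
# Monte Carlo check of the Cramer-Rao bound: the sample variance of repeated
# ML estimates should not fall below the predicted lower bound.
estimates = np.array([
    ml_estimate([1.0, 1.0],
                y_true + np.random.default_rng(k).normal(0.0, 0.1, t.size), t)
    for k in range(200)
])
sample_var = estimates.var(axis=0)     # actual variance of the estimator
crlb = np.diag(np.linalg.inv(I_m))     # predicted lower bound (Eq. 9.24)
print("sample variance:", sample_var, "CRLB:", crlb)
```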