In this post I shall talk more about linear least-mean-squares (minimum-variance) estimators, which are essential for understanding the Kalman filter (KF).

A) what is a least-mean-squares estimator?

$x$ is the random variable to be estimated, and $\hat x$ is its estimator, i.e., an estimate computed from the available data. If the estimator minimizes the mean-square error matrix

$E(x-\hat{x})(x-\hat{x})^T$

then $\hat{x}$ is called a least-mean-squares estimator. When the estimation error $x-\hat{x}$ has zero mean, $E(x-\hat{x})(x-\hat{x})^T$ is exactly the covariance matrix of that error, which is why a least-mean-squares estimator is also known as a minimum-variance estimator. If it is additionally unbiased, it is the well-known Minimum Variance Unbiased Estimator (MVUE).
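A small Monte Carlo sketch makes the zero-mean condition concrete. It assumes a hypothetical scalar setup ($x \sim N(0,1)$ observed as $y = x + v$ with independent $v \sim N(0,1)$) and compares the estimate $0.5y$ with the same estimate shifted by a constant. The helper name `error_stats` is illustrative only. The mean-square error always equals the error variance plus the squared error mean, so it coincides with the error variance only when the error is zero-mean.

```python
import random


def error_stats(errors):
    """Return (mean-square error, error mean, error variance) of a list of scalar errors."""
    n = len(errors)
    mean = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return mse, mean, var


random.seed(0)
N = 100_000
# Hypothetical setup: x ~ N(0, 1) observed as y = x + v, with v ~ N(0, 1) independent of x.
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [x + random.gauss(0.0, 1.0) for x in xs]

results = {
    # x_hat = 0.5 * y: the error x - x_hat has zero mean.
    "unbiased": error_stats([x - 0.5 * y for x, y in zip(xs, ys)]),
    # The same estimate shifted by a constant 0.3: the error has mean -0.3.
    "biased": error_stats([x - (0.5 * y + 0.3) for x, y in zip(xs, ys)]),
}
for name, (mse, mean, var) in results.items():
    # mse == var + mean**2, so the MSE equals the error variance only when mean == 0.
    print(f"{name}: mse={mse:.4f}  error mean={mean:.4f}  error var={var:.4f}")
```

In this setup the gain 0.5 happens to be the linear minimum-variance gain ($1/(1+1)$), so the unbiased row shows an error variance near 0.5, while the biased row pays an extra $0.3^2$ in mean-square error.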

B) how to minimize a matrix?

As mentioned, a minimum-variance estimator minimizes the error covariance matrix. But what does it mean to minimize a matrix? In practice we minimize some scalar index of the matrix, such as $a^TE(x-\hat{x})(x-\hat{x})^Ta$ for an arbitrary vector $a$, or the trace of the covariance matrix. However, the most convenient way to think about it is this: an estimator $\hat{x}^*$ is optimal in the minimum-covariance sense iff

$E(x-\hat{x})(x-\hat{x})^T -E(x-\hat{x}^*)(x-\hat{x}^*)^T \ge 0$

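where $\hat{x}$ is any other estimator. The inequality means that the left-hand side is positive semi-definite. This ordering implies the scalar criteria above: a positive semi-definite difference gives $a^T(\cdot)a \ge 0$ for every $a$ and a nonnegative trace, so $\hat{x}^*$ minimizes those indices as well. The optimal estimator can often be found by completing the square.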
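The Python sketch below checks this ordering numerically for two hypothetical 2x2 error covariance matrices; the values of `P_opt` and `P_other` are made up for illustration, and the helper names are not from any library. It tests positive semi-definiteness of the difference with the principal-minor rule for symmetric 2x2 matrices, then confirms that the scalar indices $a^TPa$ (over random vectors $a$) and the traces are ordered the same way.

```python
import random


def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix [[a, b], [b, d]] is PSD iff a >= 0, d >= 0 and a*d - b*b >= 0."""
    (a, b), (c, d) = m
    if abs(b - c) > tol:
        raise ValueError("matrix must be symmetric")
    return a >= -tol and d >= -tol and a * d - b * c >= -tol


def quad(m, v):
    """Scalar index v^T M v for a 2x2 matrix M."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def trace(m):
    return m[0][0] + m[1][1]


# Hypothetical error covariances of the optimal estimator and of some other estimator.
P_opt = [[0.5, 0.1], [0.1, 0.4]]
P_other = [[0.8, 0.2], [0.2, 0.6]]
D = [[P_other[i][j] - P_opt[i][j] for j in range(2)] for i in range(2)]

print("difference is PSD:", is_psd_2x2(D))

random.seed(0)
vs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
# A PSD difference implies v^T P_other v >= v^T P_opt v for every v ...
print("quadratic forms ordered:", all(quad(P_other, v) >= quad(P_opt, v) for v in vs))
# ... and therefore also trace(P_other) >= trace(P_opt).
print("traces ordered:", trace(P_other) >= trace(P_opt))
```

The random vectors only spot-check the implication; the principal-minor test is what actually establishes it for this pair.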

C) what is a linear minimum-variance estimator?

Given measurements $y$, if the estimator is a linear function of $y$, say

$\hat{x}=Ky$

then it is called a linear estimator; more generally, an affine estimator has the form $\hat{x}=Ky+b$. Linear estimators are the most widely used class of estimators.
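To connect the linear form with the completion-of-squares idea from part B, here is a short derivation under stated assumptions: $x$ and $y$ are zero-mean, $R_x = Exx^T$, $R_{xy} = Exy^T = R_{yx}^T$ and $R_y = Eyy^T$ (notation introduced here), and $R_y$ is invertible. Expanding the error covariance of $\hat{x}=Ky$ and completing the square in $K$ gives

```latex
\begin{aligned}
E(x-Ky)(x-Ky)^T &= R_x - K R_{yx} - R_{xy} K^T + K R_y K^T \\
&= (K - R_{xy}R_y^{-1})\, R_y\, (K - R_{xy}R_y^{-1})^T + R_x - R_{xy} R_y^{-1} R_{yx}
\end{aligned}
```

The first term is positive semi-definite for every $K$ and vanishes only at $K_o = R_{xy}R_y^{-1}$, while the last two terms do not depend on $K$. Hence the error covariance of any $Ky$ minus that of $K_o y$ is positive semi-definite, which is exactly the optimality criterion of part B, and the minimum error covariance is $R_x - R_{xy}R_y^{-1}R_{yx}$.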

D) with or without model?

Somewhat surprisingly, a linear minimum-variance estimator can be designed without any model relating $x$ and $y$. For zero-mean variables the l.m.v.e. is a linear function of $y$; for non-zero means it is an affine function of $y$. All the estimator needs are the first- and second-order statistics of $x$ and $y$ (their means, covariances, and cross-covariance). See 'Linear Estimation', Section 3.2, for details.
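As a sketch of this point, the scalar Python example below fits the affine l.m.v.e. $\hat{x} = m_x + k(y - m_y)$ with $k = P_{xy}/P_{yy}$ from sample means and (co)variances alone; the fitting function never uses the rule that generated the data. The data-generating rule and the function name `lmmse_scalar_from_samples` are hypothetical choices for illustration.

```python
import random


def lmmse_scalar_from_samples(xs, ys):
    """Fit x_hat = m_x + k (y - m_y) using only sample means and (co)variances."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    p_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / n
    p_yy = sum((y - m_y) ** 2 for y in ys) / n
    return m_x, m_y, p_xy / p_yy


random.seed(1)
N = 100_000
# Hypothetical data source; the estimator only ever sees the (x, y) samples.
xs = [random.gauss(2.0, 1.0) for _ in range(N)]
ys = [x + 1.0 + random.gauss(0.0, 1.0) for x in xs]

m_x, m_y, k = lmmse_scalar_from_samples(xs, ys)


def estimate(y):
    # Affine rather than linear in y, because the means are non-zero.
    return m_x + k * (y - m_y)


mse_lmmse = sum((x - estimate(y)) ** 2 for x, y in zip(xs, ys)) / N
mse_prior = sum((x - m_x) ** 2 for x in xs) / N  # ignore y and use the mean only
print(f"k={k:.3f}  mse_lmmse={mse_lmmse:.3f}  mse_prior={mse_prior:.3f}")
```

For this setup the population values are $k = 1/2$ and a minimum mean-square error of $1/2$, so the sample estimates should land close to those, and the mean-square error should fall below that of the constant estimate $m_x$.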

Usually a model of $x$ and $y$ is available. The role of the model is then to establish the relations among the statistics of $x$ and $y$, so the results obtained without a model apply directly to the cases with one. The most widely used model is:

$y=Hx+v$

This is the measurement model of a linear state-space model, and from here we start to see what the Kalman filter looks like.
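Combining this model with the statistics-only result gives the familiar gain. Assuming $x$ has covariance $P$, $v$ is zero-mean with covariance $R$, and $x$ and $v$ are uncorrelated (symbols introduced here), the model gives $P_{xy} = PH^T$ and $P_{yy} = HPH^T + R$, so the gain is $K = PH^T(HPH^T+R)^{-1}$ and the error covariance is $P - PH^T(HPH^T+R)^{-1}HP$, the same form as the Kalman filter's measurement update. The Python sketch below evaluates these expressions for a hypothetical two-state, scalar-measurement case, which keeps the matrix inverse a simple division; the helper name is illustrative.

```python
def lmmse_gain_scalar_meas(P, H, R):
    """Gain and error covariance for y = H x + v with a scalar measurement.

    Hypothetical helper. P: n x n symmetric covariance of x (list of lists),
    H: length-n row vector, R: variance of v. Assumes v is zero-mean and
    uncorrelated with x.
    """
    n = len(P)
    # P_xy = P H^T  (an n-vector)
    p_xy = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    # P_yy = H P H^T + R  (a scalar here, so its inverse is a division)
    p_yy = sum(H[i] * p_xy[i] for i in range(n)) + R
    # K = P_xy P_yy^{-1}
    K = [p / p_yy for p in p_xy]
    # Error covariance P - K P_yx, with P_yx = H P = P_xy^T since P is symmetric
    P_err = [[P[i][j] - K[i] * p_xy[j] for j in range(n)] for i in range(n)]
    return K, P_err


# Hypothetical example: two states, only the first one is measured.
P = [[2.0, 0.5], [0.5, 1.0]]
H = [1.0, 0.0]
R = 1.0
K, P_err = lmmse_gain_scalar_meas(P, H, R)
print("K =", K)          # approximately [0.667, 0.167]
print("P_err =", P_err)  # approximately [[0.667, 0.167], [0.167, 0.917]]
```

The unmeasured second state is also corrected (its variance drops from 1 to 11/12) because it is correlated with the first state through $P$.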