The least mean squares (LMS) filter solution converges to the Wiener filter solution, assuming that the unknown system is LTI and the noise is stationary. Both filters can be used to identify the impulse response of an unknown system, knowing only the original input signal and the output of the unknown system. By relaxing the error criterion to minimize the error at the current sample instead of the total error over all n, the LMS algorithm can be derived from the Wiener filter.
Derivation of the Wiener filter for system identification
Given a known input signal s[n], the output of an unknown LTI system x[n] can be expressed as:
x[n] = \sum_{k=0}^{N-1} h_k s[n-k] + w[n]
where h_k are the unknown filter tap coefficients and w[n] is noise.
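As an illustration of this setup, the following Python sketch simulates such an unknown FIR system driven by a known input. The tap values, signal length, and noise level are hypothetical, chosen only for demonstration.

    import numpy as np

    # Hypothetical system-identification setup: known input s[n], unknown FIR
    # system with taps h_k of order N, and additive noise w[n].
    rng = np.random.default_rng(0)
    N = 4
    h_true = np.array([0.5, -0.3, 0.2, 0.1])   # unknown taps h_k (illustrative)

    s = rng.standard_normal(1000)              # known input signal s[n]
    w = 0.01 * rng.standard_normal(1000)       # stationary noise w[n]

    # x[n] = sum_{k=0}^{N-1} h_k s[n-k] + w[n]
    x = np.convolve(s, h_true)[: len(s)] + w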
The model system \hat{x}[n], using a Wiener filter solution of order N, can be expressed as:
\hat{x}[n] = \sum_{k=0}^{N-1} \hat{h}_k s[n-k]
where \hat{h}_k are the filter tap coefficients to be determined.
The error between the model and the unknown system can be expressed as:
e[n] = x[n] - \hat{x}[n]
The total squared error E can be expressed as:
E = \sum_{n=-\infty}^{\infty} e[n]^2
E = \sum_{n=-\infty}^{\infty} (x[n] - \hat{x}[n])^2
E = \sum_{n=-\infty}^{\infty} (x[n]^2 - 2 x[n] \hat{x}[n] + \hat{x}[n]^2)
Use the Minimum mean-square error criterion over all of n by setting its gradient to zero:
\nabla E = 0
which is
\frac{\partial E}{\partial \hat{h}_i} = 0
for all i = 0, 1, 2, ..., N-1
\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[ x[n]^2 - 2 x[n] \hat{x}[n] + \hat{x}[n]^2 \right]
Substitute the definition of \hat{x}[n]:
\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty} \left[ x[n]^2 - 2 x[n] \sum_{k=0}^{N-1} \hat{h}_k s[n-k] + \left( \sum_{k=0}^{N-1} \hat{h}_k s[n-k] \right)^2 \right]
Distribute the partial derivative:
\frac{\partial E}{\partial \hat{h}_i} = \sum_{n=-\infty}^{\infty} \left[ -2 x[n] s[n-i] + 2 \left( \sum_{k=0}^{N-1} \hat{h}_k s[n-k] \right) s[n-i] \right]
Using the definition of discrete cross-correlation:
R_{xy}[i] = \sum_{n=-\infty}^{\infty} x[n] y[n-i]
\frac{\partial E}{\partial \hat{h}_i} = -2 R_{xs}[i] + 2 \sum_{k=0}^{N-1} \hat{h}_k R_{ss}[i-k] = 0
Rearrange the terms:
R_{xs}[i] = \sum_{k=0}^{N-1} \hat{h}_k R_{ss}[i-k]
for all i = 0, 1, 2, ..., N-1
This system of N equations in N unknowns can be solved for the coefficients \hat{h}_k.
In matrix form, the resulting Wiener filter coefficients are given by:
W = R_{ss}^{-1} P_{xs}
where R_{ss} is the autocorrelation matrix of the input s and P_{xs} is the cross-correlation vector between x and s.
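A minimal sketch of this solution, continuing the hypothetical s, x, and N from the simulation above: the auto- and cross-correlations are replaced by finite-data estimates, and the Toeplitz normal equations are solved directly.

    import numpy as np
    from scipy.linalg import toeplitz

    M = len(s)
    # Finite-data estimates of R_ss[m] and R_xs[m] for lags m = 0..N-1.
    R_ss = np.array([np.dot(s[m:], s[: M - m]) for m in range(N)]) / M
    R_xs = np.array([np.dot(x[m:], s[: M - m]) for m in range(N)]) / M

    # Normal equations R_xs[i] = sum_k h_hat_k R_ss[i-k]; the matrix is Toeplitz.
    h_hat = np.linalg.solve(toeplitz(R_ss), R_xs)   # approaches h_true for long data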
Derivation of the LMS algorithm
By relaxing the infinite sum of the Wiener filter to just the error at time n, the LMS algorithm can be derived. In the notation that follows, x[n] is the filter input, d[n] is the desired signal, and y[n] = \sum_{k=0}^{N-1} \hat{w}_k x[n-k] is the filter output. The squared error can be expressed as:
E = (d[n] - y[n])^2
Using the Minimum mean-square error criterion, take the gradient:
\frac{\partial E}{\partial w} = \frac{\partial}{\partial w} (d[n] - y[n])^2
Apply the chain rule and substitute the definition of y[n]:
\frac{\partial E}{\partial w} = 2 (d[n] - y[n]) \frac{\partial}{\partial w} \left( d[n] - \sum_{k=0}^{N-1} \hat{w}_k x[n-k] \right)
\frac{\partial E}{\partial w_i} = -2 e[n] x[n-i]
where e[n] = d[n] - y[n].
Using gradient descent and a step size \mu:
w[n+1] = w[n] - \mu \frac{\partial E}{\partial w}
which becomes, for i = 0, 1, ..., N-1,
w_i[n+1] = w_i[n] + 2 \mu e[n] x[n-i]
This is the LMS update equation.
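The update can be turned into a short adaptive loop. In the following Python sketch the input, desired signal, filter order, and step size are hypothetical, and the desired signal is generated from an assumed unknown system so that convergence toward its taps can be observed.

    import numpy as np

    rng = np.random.default_rng(1)
    N, mu = 4, 0.01                              # filter order and step size (illustrative)
    h_true = np.array([0.5, -0.3, 0.2, 0.1])     # hypothetical unknown system

    x = rng.standard_normal(5000)                # filter input x[n]
    d = np.convolve(x, h_true)[: len(x)]         # desired signal d[n]

    w_hat = np.zeros(N)
    for n in range(N, len(x)):
        x_vec = x[n - N + 1 : n + 1][::-1]       # [x[n], x[n-1], ..., x[n-N+1]]
        y = np.dot(w_hat, x_vec)                 # filter output y[n]
        e = d[n] - y                             # error e[n] = d[n] - y[n]
        w_hat = w_hat + 2 * mu * e * x_vec       # w_i[n+1] = w_i[n] + 2*mu*e[n]*x[n-i]

With a sufficiently small step size, w_hat drifts toward h_true, mirroring the convergence to the Wiener solution noted at the start of this section.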
References
J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, 4th ed., 2007.