In multivariate statistics, if $\varepsilon$ is a vector of $n$ random variables, and $\Lambda$ is an $n$-dimensional symmetric matrix, then the scalar quantity $\varepsilon^{T}\Lambda\varepsilon$ is known as a quadratic form in $\varepsilon$.
Expectation
It can be shown that[1]

$$\operatorname{E}\left[\varepsilon^{T}\Lambda\varepsilon\right] = \operatorname{tr}\left[\Lambda\Sigma\right] + \mu^{T}\Lambda\mu$$
where $\mu$ and $\Sigma$ are the expected value and variance-covariance matrix of $\varepsilon$, respectively, and $\operatorname{tr}$ denotes the trace of a matrix. This result depends only on the existence of $\mu$ and $\Sigma$; in particular, normality of $\varepsilon$ is not required.
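As a quick numerical illustration (an illustrative sketch, not taken from the cited sources; all matrices and the sample size are arbitrary choices), the expectation formula can be checked against a Monte Carlo estimate in NumPy. A normal distribution is used purely for sampling convenience, since the identity needs only $\mu$ and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative mean, covariance, and symmetric Lambda
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Lam = np.array([[1.0, 0.3], [0.3, 2.0]])

# Theoretical value: tr(Lambda Sigma) + mu^T Lambda mu
theory = np.trace(Lam @ Sigma) + mu @ Lam @ mu

# Monte Carlo estimate of E[eps^T Lambda eps]
eps = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.einsum('ij,jk,ik->i', eps, Lam, eps).mean()
```

Here `theory` evaluates to $4.3 + 10.2 = 14.5$, and `mc` agrees to within Monte Carlo error.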
A book treatment of the topic of quadratic forms in random variables is that of Mathai and Provost.[ 2]
Proof
Since the quadratic form is a scalar quantity, $\varepsilon^{T}\Lambda\varepsilon = \operatorname{tr}(\varepsilon^{T}\Lambda\varepsilon)$.
Next, by the cyclic property of the trace operator,

$$\operatorname{E}[\operatorname{tr}(\varepsilon^{T}\Lambda\varepsilon)] = \operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^{T})].$$
Since the trace is a linear function of the entries of its matrix argument, it follows from the linearity of the expectation operator that

$$\operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^{T})] = \operatorname{tr}(\Lambda\operatorname{E}(\varepsilon\varepsilon^{T})).$$
A standard property of variances, $\operatorname{E}(\varepsilon\varepsilon^{T}) = \Sigma + \mu\mu^{T}$, then tells us that this is

$$\operatorname{tr}(\Lambda(\Sigma + \mu\mu^{T})).$$
Applying the cyclic property of the trace operator again, we get

$$\operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\Lambda\mu\mu^{T}) = \operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\mu^{T}\Lambda\mu) = \operatorname{tr}(\Lambda\Sigma) + \mu^{T}\Lambda\mu.$$
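The two trace manipulations used in the proof are easy to sanity-check numerically (an illustrative sketch with arbitrary values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
eps = rng.standard_normal(n)
mu = rng.standard_normal(n)
Lam = rng.standard_normal((n, n))
Lam = (Lam + Lam.T) / 2   # symmetric, as in the proof

# A scalar equals its own trace, and the trace is cyclic:
# tr(eps^T Lam eps) = tr(Lam eps eps^T)
lhs = eps @ Lam @ eps
rhs = np.trace(Lam @ np.outer(eps, eps))

# The same cyclic step collapses tr(Lam mu mu^T) to the scalar mu^T Lam mu.
t1 = np.trace(Lam @ np.outer(mu, mu))
t2 = mu @ Lam @ mu
```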
Variance in the Gaussian case
In general, the variance of a quadratic form depends greatly on the distribution of $\varepsilon$. However, if $\varepsilon$ does follow a multivariate normal distribution, the variance of the quadratic form becomes particularly tractable. Assume for the moment that $\Lambda$ is a symmetric matrix. Then,

$$\operatorname{var}\left[\varepsilon^{T}\Lambda\varepsilon\right] = 2\operatorname{tr}\left[\Lambda\Sigma\Lambda\Sigma\right] + 4\mu^{T}\Lambda\Sigma\Lambda\mu.$$[3]
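This variance formula can likewise be checked by simulation (a hypothetical sketch with arbitrarily chosen matrices; normality is essential here, unlike for the mean):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Lam = np.array([[1.0, 0.3], [0.3, 2.0]])   # symmetric

# var[eps^T Lam eps] = 2 tr(Lam Sigma Lam Sigma) + 4 mu^T Lam Sigma Lam mu
M = Lam @ Sigma
theory = 2 * np.trace(M @ M) + 4 * mu @ Lam @ Sigma @ Lam @ mu

# Monte Carlo variance of the quadratic form under multivariate normality
eps = rng.multivariate_normal(mu, Sigma, size=500_000)
q = np.einsum('ij,jk,ik->i', eps, Lam, eps)
mc = q.var()
```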
In fact, this can be generalized to find the covariance between two quadratic forms on the same $\varepsilon$ (once again, $\Lambda_{1}$ and $\Lambda_{2}$ must both be symmetric):

$$\operatorname{cov}\left[\varepsilon^{T}\Lambda_{1}\varepsilon, \varepsilon^{T}\Lambda_{2}\varepsilon\right] = 2\operatorname{tr}\left[\Lambda_{1}\Sigma\Lambda_{2}\Sigma\right] + 4\mu^{T}\Lambda_{1}\Sigma\Lambda_{2}\mu.$$[4]
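The covariance formula admits the same kind of simulation check (again an illustrative sketch; the two symmetric matrices and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])
L1 = np.array([[1.0, 0.2], [0.2, 0.5]])    # symmetric
L2 = np.array([[2.0, -0.3], [-0.3, 1.0]])  # symmetric

# cov = 2 tr(L1 Sigma L2 Sigma) + 4 mu^T L1 Sigma L2 mu
theory = 2 * np.trace(L1 @ Sigma @ L2 @ Sigma) + 4 * mu @ L1 @ Sigma @ L2 @ mu

# Monte Carlo covariance of the two quadratic forms on the same draws
eps = rng.multivariate_normal(mu, Sigma, size=500_000)
q1 = np.einsum('ij,jk,ik->i', eps, L1, eps)
q2 = np.einsum('ij,jk,ik->i', eps, L2, eps)
mc = np.cov(q1, q2)[0, 1]
```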
In addition, a quadratic form such as this follows a generalized chi-squared distribution .
Computing the variance in the non-symmetric case
The case for general $\Lambda$ can be derived by noting that

$$\varepsilon^{T}\Lambda^{T}\varepsilon = \varepsilon^{T}\Lambda\varepsilon,$$

so

$$\varepsilon^{T}{\tilde{\Lambda}}\varepsilon = \varepsilon^{T}\left(\Lambda + \Lambda^{T}\right)\varepsilon/2$$

is a quadratic form in the symmetric matrix ${\tilde{\Lambda}} = \left(\Lambda + \Lambda^{T}\right)/2$, so the mean and variance expressions are the same, provided $\Lambda$ is replaced by ${\tilde{\Lambda}}$ therein.
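The symmetrization step is a deterministic identity, so it can be verified exactly (an illustrative sketch with an arbitrary non-symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
eps = rng.standard_normal(n)
Lam = rng.standard_normal((n, n))   # generally non-symmetric

Lam_tilde = (Lam + Lam.T) / 2       # symmetrized matrix

# The quadratic form is unchanged by symmetrizing Lambda.
q = eps @ Lam @ eps
q_tilde = eps @ Lam_tilde @ eps
```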
In the setting where one has a set of observations $y$ and an operator matrix $H$, the residual sum of squares can be written as a quadratic form in $y$:

$$\textrm{RSS} = y^{T}(I-H)^{T}(I-H)y.$$
For procedures where the matrix $H$ is symmetric and idempotent, and the errors are Gaussian with covariance matrix $\sigma^{2}I$, $\textrm{RSS}/\sigma^{2}$ has a chi-squared distribution with $k$ degrees of freedom and noncentrality parameter $\lambda$, where

$$k = \operatorname{tr}\left[(I-H)^{T}(I-H)\right]$$

$$\lambda = \mu^{T}(I-H)^{T}(I-H)\mu/2$$
may be found by matching the first two central moments of a noncentral chi-squared random variable to the expressions given in the first two sections. If $Hy$ estimates $\mu$ with no bias, then the noncentrality $\lambda$ is zero and $\textrm{RSS}/\sigma^{2}$ follows a central chi-squared distribution.
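This can be illustrated with an ordinary least-squares hat matrix, which is symmetric and idempotent (a hypothetical sketch; the design matrix, coefficients, noise level, and replication count are all arbitrary choices). With the mean in the column space of $X$, the fit is unbiased, $\lambda = 0$, and the simulated mean of $\textrm{RSS}/\sigma^{2}$ should match $k$, the mean of a central chi-squared with $k$ degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(5)

n, p = 20, 3
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix: symmetric and idempotent

I = np.eye(n)
k = np.trace((I - H).T @ (I - H))       # equals n - p, since I - H is idempotent

# Mean in the column space of X, so H y is unbiased for mu (noncentrality zero)
beta = np.array([1.0, -2.0, 0.5])
mu = X @ beta
sigma = 1.3

# Replicate the experiment many times and average RSS / sigma^2
ys = mu + sigma * rng.standard_normal((100_000, n))
rss = ((ys @ (I - H).T) ** 2).sum(axis=1)
mc_mean = (rss / sigma**2).mean()       # should be close to k
```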
References
^ Bates, Douglas. "Quadratic Forms of Random Variables" (PDF). STAT 849 lectures. Retrieved August 21, 2011.
^ Mathai, A. M. & Provost, Serge B. (1992). Quadratic Forms in Random Variables. CRC Press. p. 424. ISBN 978-0824786915.
^ Rencher, Alvin C. & Schaalje, G. Bruce (2008). Linear Models in Statistics (2nd ed.). Hoboken, NJ: Wiley-Interscience. ISBN 9780471754985. OCLC 212120778.
^ Graybill, Franklin A. Matrices with Applications in Statistics (2nd ed.). Belmont, Calif.: Wadsworth. p. 367. ISBN 0534980384.