.. _estimate_spectral_density_function:
Estimation of a spectral density function
==========================================
| Let :math:`X: \Omega \times \cD \rightarrow \Rset^d` be a multivariate
stationary normal process of dimension :math:`d`. We only treat here
the case where the domain is of dimension 1: :math:`\cD \in \Rset`
(:math:`n=1`).
| If the process is continuous, then :math:`\cD=\Rset`. In the discrete
case, :math:`\cD` is a lattice.
| :math:`X` is supposed to be a second order process with zero mean and
we suppose that its spectral density function
:math:`S : \Rset \rightarrow \mathcal{H}^+(d)` defined in
:eq:`specdensFunc` exists.
:math:`\mathcal{H}^+(d) \in \mathcal{M}_d(\Cset)` is the set of
:math:`d`-dimensional positive definite hermitian matrices.
| This objective of this use case is to estimate the spectral density
function :math:`S` from data, which can be a sample of time series or
one time series.
Depending on the available data, we proceed differently:
- if the data correspond to several independent realizations of the
process, a *statistical
estimate* is performed using statistical average of a
realization-based estimator;
- if the data correspond to one realization of the process at different
time stamps (stored in a *TimeSeries* object), the process being
observed during a long period of time, an *ergodic estimate* is
performed using a time average of an ergodic-based estimator.
| The estimation of the spectral density function from data may use some
parametric or non parametric methods.
| The *Welch* method is a *non parametric* estimation technique, known
to be performant. We detail it in the case where the available data on
the process is a time series which values are
:math:`(\vect{x}_0, \dots,\vect{x}_{N-1})` associated to the time grid
:math:`(t_0, \dots, t_{N-1})` which is a discretization of the domain
:math:`[0,T]`.
| We assume that the process has a spectral density :math:`S` defined on
:math:`| f | \leq \frac{T}{2}`.
| The method is based on the segmentation of the time series into
:math:`K` segments of length :math:`L`, possibly overlapping (size of
overlap :math:`R`).
| Let :math:`\vect{X}_{1}(j), \ j = 0, 1,...,L-1` be the first such
segment. Then:
.. math:: \vect{X}_{1}(j) = \vect{X}(j) , \ j = 0, 1,...,L-1
Applying the same decomposition,
.. math:: \vect{X}_{2}(j) = \vect{X}(j + (L - R)) , \ j = 0, 1,...,L-1
and finally:
.. math:: \vect{X}_{K}(j) = \vect{X}(j + (K-1)(L-R)) , \ j = 0, 1,...,L-1
The objective is to get a statistical estimator from these :math:`K`
segments. We define the *periodogram* associated with the segment
:math:`\vect{X}_k` by:
.. math::
\begin{aligned}
\vect{X}_{k}(f_p,T)&=&\Delta t\sum_{n=0}^{L-1}\vect{x}(n\Delta t)\exp\left[\frac{-j2\pi pn}{N}\right], \quad p=0,\dots, L-1\\
\hat{G}_{\vect{x}}(f_p,T)&=&\frac{2}{T}\vect{X}_{k}(f_p,T)\,{\vect{X}_{k}(f_p,T)^*}^t,\quad p=0,\dots,L/2-1\end{aligned}
with :math:`\Delta t=\frac{T}{N}` and
:math:`f_p=\frac{p}{T}=\frac{p}{N}\frac{1}{\Delta t}`.
| It has been proven that the *periodogram* has bad statistical
properties. Indeed, two quantities summarize the properties of an
estimator: its *bias* and its *variance*. The bias is the expected
error one makes on the average using only a finite number of time
series of finite length, whereas the covariance is the expected
fluctuations of the estimator around its mean value. For the
periodogram, we have:
- Bias\ :math:`=\mathbb{E}[\hat{G}_{\vect{x}}(f_p, T)-G_{\vect{X}}(f_p)]=(\frac{1}{T}W_B(f_p, T)-\delta_0)*G_{\vect{X}}(f_p)`
where :math:`W_B(f_p, T) = \left(\frac{\sin\pi fT}{\pi fT}\right)^2`
is the squared module of the Fourier transform of the function
:math:`w_B(t, T)` (*Barlett window*) defined by:
.. math:: w_B(t, T) = \mathbf{1}_{[0,T]}(t)
This estimator is *biased* but this bias vanishes when
:math:`T\rightarrow\infty` as
:math:`\lim_{T\rightarrow\infty} \frac{1}{T}W_B(f_p, T)=\delta_0`.
- Covariance\ :math:`=\frac{1}{T}W_B(f_p, T)*G_{\vect{X}}(f_p)\rightarrow G_{\vect{X}}(f_p)`
as :math:`T\rightarrow\infty`, which means that the fluctuations of
the periodogram are of the same order of magnitude as the quantity to
be estimated and thus the estimator is not convergent.
The periodogramâ€™s lack of convergence may be easily fixed if we consider
the *averaged periodogram* over :math:`K` independent time series or
segments:
.. math:: \hat{G}_{\vect{x}}(f_p,T)=\frac{2}{KT}\sum_{k=0}^{K-1}\vect{X}^{(k)}(f_p,T)\vect{X}^{(k)}(f_p,T)^t
The averaging process has no effect on the significant bias of the
periodogram.
The use of a *tapering window* :math:`w(t, T)` may significantly reduce
it. The time series :math:`\vect{x}(t)` is replaced by a tapered time
series :math:`w(t, T)\vect{x}(t)` in the computation of
:math:`\vect{X}(f_p,T)`. One gets:
.. math:: \mathbb{E}[\hat{G}_{\vect{x}}(f_p, T)-G_{\vect{X}}(f_p)=(\frac{1}{T}W(f_p, T)-\delta_0)*G_{\vect{X}}(f_p)
where :math:`W(f_p, T)` is the square module of the Fourier transform
of :math:`w(t, T)` at the frequency :math:`f_p`. A judicious choice of
tapering function such as the *Hanning window* :math:`w_H` can
dramatically reduce the bias of the estimate:
.. math::
:label: HamEff
w_H(t, T) = \sqrt{\frac{8}{3}}\left(1-\cos^2\left(\frac{\pi t}{T}\right)\right)\mathbf{1}_{[0,T]}(t)
The library builds an estimation of the spectrum on a *TimeSeries* by
fixing the number of segments, the *overlap* size parameter and a
*FilteringWindows*. The available ones are:
- The *Hamming* window
.. math:: w(t, T) = \sqrt{\frac{1}{K}}\left(0.54-0.46\cos^2\left(\frac{2 \pi t}{T}\right)\right)\mathbf{1}_{[0,T]}(t)
with :math:`K` = :math:`\sqrt{0.54^2 + \frac{1}{2} 0.46^2}`
- The *Hanning* window described in :eq:`HamEff` which is supposed to be
the most useful.
.. topic:: API:
- See :class:`~openturns.Hanning`
- See :class:`~openturns.Hamming`
- See :class:`~openturns.WelchFactory`
.. topic:: Examples:
- See :doc:`/auto_data_analysis/estimate_stochastic_processes/plot_estimate_spectral_density_function`