Filtro de suavizado Savitzky-Golay para datos no igualmente espaciados

16

Tengo una señal que se mide a 100Hz y necesito aplicar el filtro de suavizado Savitzky-Golay en esta señal. Sin embargo, en una inspección más cercana, mi señal no se mide a una velocidad perfectamente constante, el delta entre mediciones oscila entre 9.7 y 10.3 ms.

¿Hay alguna manera de usar el filtro Savitzky-Golay en datos no espaciados por igual? ¿Hay otros métodos que podría aplicar?

filters smoothing

— VLC
fuente

Un artículo de 1991 de Gorry trata sobre este tema exacto datapdf.com/… . Pero tldr, la respuesta de datageist es la idea principal correcta (mínimos cuadrados locales). Lo que Gorry observa es que los coeficientes dependen solo de las variables independientes y son lineales en las variables dependientes (como Savitzky-Golay). Luego da una manera de calcularlos, pero si no está escribiendo una biblioteca optimizada, se podría usar cualquier antiguo ajustador de mínimos cuadrados.

— Dave Pritchard

5

Un método sería volver a muestrear sus datos para que estén igualmente espaciados, luego puede hacer el procesamiento que desee. El remuestreo ilimitado de banda usando filtrado lineal no será una buena opción ya que los datos no están espaciados uniformemente, por lo que podría usar algún tipo de interpolación polinómica local (por ejemplo, splines cúbicas) para estimar cuáles son los valores de la señal subyacente "exactos" Intervalos de 10 milisegundos.

— Jason R
fuente

Tenía esta solución en mente como último recurso. Me pregunto si al final este enfoque ofrece una mejor solución que simplemente asumir que mi señal se mide a una velocidad constante.

— VLC

Creo que incluso si se muestrea de manera no uniforme, aún puede usar la interpolación sinc () (o un filtro de paso bajo altamente muestreado diferente). Esto puede dar mejores resultados que una spline o un pchip

— Hilmar

1

@Hilmar: Tienes razón. Hay varias formas de volver a muestrear los datos; la interpolación sinc aproximada sería el método "ideal" para el remuestreo de banda ilimitada.

— Jason R

15

Debido a la forma en que se deriva el filtro Savitzky-Golay (es decir, como ajustes polinomiales de mínimos cuadrados locales), existe una generalización natural al muestreo no uniforme: es mucho más costoso desde el punto de vista computacional.

Filtros Savitzky-Golay en general

Para el filtro estándar, la idea es ajustar un polinomio a un conjunto local de muestras [usando mínimos cuadrados], luego reemplazar la muestra central con el valor del polinomio en el índice central (es decir, en 0). Eso significa que los coeficientes de filtro SG estándar pueden generarse invirtiendo una matriz de Vandermonde de indicios de muestra. Por ejemplo, para generar un ajuste parabólico local en cinco muestras (con las indicaciones locales -2, -1,0,1,2), el sistema de ecuaciones de diseño sería el siguiente: $y_0\dots y_4$ $Ac = y$

[\begin{matrix} - 2^{0} & - 2^{1} & - 2^{2} \\ - 1^{0} & - 1^{1} & - 1^{2} \\ 0^{0} & 0^{1} & 0^{2} \\ 1^{0} & 1^{1} & 1^{2} \\ 2^{0} & 2^{1} & 2^{2} \end{matrix}] [\begin{matrix} c_{0} \\ c_{1} \\ c_{2} \end{matrix}] = [\begin{matrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{matrix}] .

$\begin{bmatrix}-2^0 & -2^1 & -2^2 \\ -1^0 & -1^1 & -1^2 \\ 0^0 & 0^1 & 0^2 \\ 1^0 & 1^1 & 1^2 \\ 2^0 & 2^1 & 2^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}.$

En lo anterior, los son los coeficientes desconocidos del polinomio de mínimos cuadrados . Dado que el valor del polinomio en es solo , calcular el pseudoinverso de la matriz de diseño (es decir, ) arrojará los coeficientes del filtro SG en la fila superior. En este caso, serían $c_0 \dots c_2$ $c_0 + c_1x + c_2x^2$ $x = 0$ $c_0$ $c = (A^TA)^{-1}A^T y\space$

[\begin{matrix} c_{0} \\ c_{1} \\ c_{2} \end{matrix}] = [\begin{matrix} - 3 & 12 & 17 & 12 & - 3 \\ - 7 & - 4 & 0 & 4 & 7 \\ 5 & - 3 & - 5 & - 3 & 5 \end{matrix}] [\begin{matrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{matrix}] .

$\begin{bmatrix}c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} -3 & 12 & 17 & 12 & -3 \\ -7 & -4 & 0 & 4 & 7 \\ 5 & -3 & -5 & -3 & 5 \\ \end{bmatrix} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}.$

$c_0 + c_1x + c_2x^2$ $c_1 + 2c_2x$ $c_1$

Muestreo no uniforme

$x_n$ $t_n$ $0$

\begin{aligned} t_{- 2} & = x_{- 2} - x_{0} \\ t_{- 1} & = x_{- 1} - x_{0} \\ t_{0} & = x_{0} - x_{0} \\ t_{1} & = x_{1} - x_{0} \\ t_{2} & = x_{2} - x_{0} \end{aligned}

$\begin{align} t_{-2} & = x_{-2} - x_0 \\ t_{-1} & = x_{-1} - x_0 \\ t_0 & = x_0 - x_0 \\ t_1 & = x_1 - x_0 \\ t_2 & = x_2 - x_0 \end{align}$

entonces cada matriz de diseño tendrá la siguiente forma:

A = [\begin{matrix} t_{- 2}^{0} & t_{- 2}^{1} & t_{- 2}^{2} \\ t_{- 1}^{0} & t_{- 1}^{1} & t_{- 1}^{2} \\ t_{0}^{0} & t_{0}^{1} & t_{0}^{2} \\ t_{1}^{0} & t_{1}^{1} & t_{1}^{2} \\ t_{2}^{0} & t_{2}^{1} & t_{2}^{2} \end{matrix}] = [\begin{matrix} 1 & t_{- 2} & t_{- 2}^{2} \\ 1 & t_{- 1} & t_{- 1}^{2} \\ 1 & 0 & 0 \\ 1 & t_{1} & t_{1}^{2} \\ 1 & t_{2} & t_{2}^{2} \end{matrix}] .

$A = \begin{bmatrix} t_{-2}^0 & t_{-2}^1 & t_{-2}^2 \\ t_{-1}^0 & t_{-1}^1 & t_{-1}^2 \\ t_0^0 & t_0^1 & t_0^2 \\ t_1^0 & t_1^1 & t_1^2 \\ t_2^0 & t_2^1 & t_2^2 \end{bmatrix} = \begin{bmatrix} 1 & t_{-2} & t_{-2}^2 \\ 1 & t_{-1} & t_{-1}^2 \\ 1 & 0 & 0 \\ 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{bmatrix}.$

$A$ $c_0$

— Datageist
fuente

suena como si se moviera de O (log (n)) a O (n ^ 2).

— EngrStudent - Restablece a Monica

Aquí hay una implementación de Scala descrita por datageist hacia arriba.

— Núcleo medio

1

@Mediumcore No agregaste un enlace a tu publicación original. Además, lo eliminé porque no proporcionaba una respuesta a la pregunta. Intenta editar la publicación de datageist para agregar un enlace; será moderado después de la revisión.

— Peter K.

4

$f$ $N$ $\sqrt{N/2}$
$\qquad -$

(derivación alguien?)

$\pm N/2$ $t$ $t$
$t_i \to i$

— denis
fuente

1

Descubrí que hay dos formas de usar el algoritmo savitzky-golay en Matlab. Una vez como filtro y otra como función de suavizado, pero básicamente deberían hacer lo mismo.

yy = sgolayfilt (y, k, f): Aquí, se supone que los valores y = y (x) están igualmente espaciados en x.
yy = suave (x, y, span, 'sgolay', grado): ¡Aquí puede tener x como entrada adicional y haciendo referencia a la ayuda de Matlab x no tiene que estar equidistante!

— Jochen
fuente

0

Si es de alguna ayuda, hice una implementación en C del método descrito por datageist. Uso gratuito bajo su propio riesgo.

/**
 * @brief smooth_nonuniform
 * Implements the method described in  /signals/1676/savitzky-golay-smoothing-filter-for-not-equally-spaced-data
 * free to use at the user's risk
 * @param n the half size of the smoothing sample, e.g. n=2 for smoothing over 5 points
 * @param the degree of the local polynomial fit, e.g. deg=2 for a parabolic fit
 */
bool smooth_nonuniform(uint deg, uint n, std::vector<double>const &x, std::vector<double> const &y, std::vector<double>&ysm)
{
    if(x.size()!=y.size()) return false; // don't even try
    if(x.size()<=2*n)      return false; // not enough data to start the smoothing process
//    if(2*n+1<=deg+1)       return false; // need at least deg+1 points to make the polynomial

    int m = 2*n+1; // the size of the filter window
    int o = deg+1; // the smoothing order

    std::vector<double> A(m*o);         memset(A.data(),   0, m*o*sizeof(double));
    std::vector<double> tA(m*o);        memset(tA.data(),  0, m*o*sizeof(double));
    std::vector<double> tAA(o*o);       memset(tAA.data(), 0, o*o*sizeof(double));

    std::vector<double> t(m);           memset(t.data(),   0, m*  sizeof(double));
    std::vector<double> c(o);           memset(c.data(),   0, o*  sizeof(double));

    // do not smooth start and end data
    int sz = y.size();
    ysm.resize(sz);           memset(ysm.data(), 0,sz*sizeof(double));
    for(uint i=0; i<n; i++)
    {
        ysm[i]=y[i];
        ysm[sz-i-1] = y[sz-i-1];
    }

    // start smoothing
    for(uint i=n; i<x.size()-n; i++)
    {
        // make A and tA
        for(int j=0; j<m; j++)
        {
            t[j] = x[i+j-n] - x[i];
        }
        for(int j=0; j<m; j++)
        {
            double r = 1.0;
            for(int k=0; k<o; k++)
            {
                A[j*o+k] = r;
                tA[k*m+j] = r;
                r *= t[j];
            }
        }

        // make tA.A
        matMult(tA.data(), A.data(), tAA.data(), o, m, o);

        // make (tA.A)-¹ in place
        if (o==3)
        {
            if(!invert33(tAA.data())) return false;
        }
        else if(o==4)
        {
            if(!invert44(tAA.data())) return false;
        }
        else
        {
            if(!inverseMatrixLapack(o, tAA.data())) return false;
        }

        // make (tA.A)-¹.tA
        matMult(tAA.data(), tA.data(), A.data(), o, o, m); // re-uses memory allocated for matrix A

        // compute the polynomial's value at the center of the sample
        ysm[i] = 0.0;
        for(int j=0; j<m; j++)
        {
            ysm[i] += A[j]*y[i+j-n];
        }
    }

    std::cout << "      x       y       y_smoothed" << std::endl;
    for(uint i=0; i<x.size(); i++) std::cout << "   " << x[i] << "   " << y[i]  << "   "<< ysm[i] << std::endl;

    return true;
}

— techwinder
fuente