Seek and You Shall Scan... on partitioned tables

I have read these articles in PCMag by Itzik Ben-Gan:

Seek and You Shall Scan Part I: When the Optimizer Doesn't Optimize
Seek and You Shall Scan Part II: Ascending Keys

I currently have a "Grouped Max" problem with all of our partitioned tables. We use the trick Itzik Ben-Gan provided for getting a max(ID), but sometimes it simply never finishes:

DECLARE @MaxIDPartitionTable BIGINT
SELECT  @MaxIDPartitionTable = ISNULL(MAX(IDPartitionedTable), 0)
FROM    ( SELECT    *
          FROM      ( SELECT    partition_number PartitionNumber
                      FROM      sys.partitions
                      WHERE     object_id = OBJECT_ID('fct.MyTable')
                                AND index_id = 1
                    ) T1
                    CROSS APPLY ( SELECT    ISNULL(MAX(UpdatedID), 0) AS IDPartitionedTable
                                  FROM      fct.MyTable s
                                  WHERE     $PARTITION.PF_MyTable(s.PCTimeStamp) = PartitionNumber
                                            AND UpdatedID <= @IDColumnThresholdValue
                                ) AS o
        ) AS T2;
SELECT @MaxIDPartitionTable

I get this plan:

But after 45 minutes, look at the reads:

reads          writes   physical_reads
12,949,127        2       12,992,610

which I got from sp_whoisactive.

Normally it runs quite fast, but not today.

Edit: partitioned table structure:

CREATE PARTITION FUNCTION [MonthlySmallDateTime](SmallDateTime) AS RANGE RIGHT FOR VALUES (N'2000-01-01T00:00:00.000', N'2000-02-01T00:00:00.000' /* and many more */)
go
CREATE PARTITION SCHEME PS_FctContractualAvailability AS PARTITION [MonthlySmallDateTime] TO ([Standard], [Standard])
GO
CREATE TABLE fct.MyTable(
    MyTableID BIGINT IDENTITY(1,1),
    [DT1TurbineID] INT NOT NULL,
    [PCTimeStamp] SMALLDATETIME NOT NULL,
    Filler CHAR(100) NOT NULL DEFAULT 'N/A',
    UpdatedID BIGINT NULL,
    UpdatedDate DATETIME NULL
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED 
(
    [DT1TurbineID] ASC,
    [PCTimeStamp] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = ROW) ON [PS_FctContractualAvailability]([PCTimeStamp])
) ON [PS_FctContractualAvailability]([PCTimeStamp])

GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_UpdatedID_PCTimeStamp] ON [fct].MyTable
(
    [UpdatedID] ASC,
    [PCTimeStamp] ASC
)
INCLUDE (   [UpdatedDate]) 
WHERE ([UpdatedID] IS NOT NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = ROW) ON [PS_FctContractualAvailability]([PCTimeStamp])
GO


— Henrik Staun Poulsen

The basic problem is that the Index Seek is not followed by a Top operator. This is an optimization that is normally introduced when the seek returns rows in the right order for a MIN/MAX aggregate.

This optimization exploits the fact that the min/max row is the first row in ascending or descending order, respectively. It may also be that the optimizer cannot apply this optimization to partitioned tables; I forget.

Anyway, the point is that without this transformation, the execution plan ends up processing every row per partition that qualifies for S.UpdatedID <= @IDColumnThresholdValue, rather than just the one desired row per partition.

You haven't provided table, index, or partitioning definitions in the question, so I can't be much more specific. You should check that your index supports this transformation. More or less equivalently, you could also express the MAX as a TOP (1) ... ORDER BY UpdatedID DESC.

If this results in a Sort (including a TopN Sort), you will know your index is not helpful. For example:

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.IDPartitionedTable), 0)
FROM    
( 
    SELECT
        O.IDPartitionedTable
    FROM      
    ( 
        SELECT
            P.partition_number AS PartitionNumber
        FROM sys.partitions AS P
        WHERE 
            P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
            AND P.index_id = 1
    ) AS T1
    CROSS APPLY 
    (    
        SELECT TOP (1) 
            S.UpdatedID AS IDPartitionedTable
        FROM fct.MyTable AS S
        WHERE
            $PARTITION.PF_MyTable(S.PCTimeStamp) = T1.PartitionNumber
            AND S.UpdatedID <= @IDColumnThresholdValue
        ORDER BY
            S.UpdatedID DESC
    ) AS O
) AS T2;

The plan shape this should produce is:

Notice the Top operator applied directly to the output of the Index Seek. This limits processing to one row per partition.
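To check whether an existing index can support this Top-per-partition shape, a catalog query along the following lines may help. It is only a sketch, assuming nothing beyond the standard catalog views sys.indexes, sys.data_spaces, sys.index_columns, and sys.columns. Roughly, you would hope to find an index whose first key column is UpdatedID and which is stored on the same partition scheme as the table (partition-aligned), so that within each partition the rows are already ordered by UpdatedID.

```sql
-- Diagnostic sketch: list every index on fct.MyTable with its key and
-- included columns, any filter, and the data space (filegroup or
-- partition scheme) it is stored on.
SELECT
    I.name              AS index_name,
    I.type_desc         AS index_type,
    I.has_filter,
    I.filter_definition,
    DS.name             AS data_space_name,   -- partition scheme name if partitioned
    DS.type_desc        AS data_space_type,   -- e.g. PARTITION_SCHEME or ROWS_FILEGROUP
    IC.key_ordinal,                           -- 0 for included columns
    C.name              AS column_name,
    IC.is_descending_key,
    IC.is_included_column,
    IC.partition_ordinal                      -- > 0 for the partitioning column
FROM sys.indexes AS I
JOIN sys.data_spaces AS DS
    ON DS.data_space_id = I.data_space_id
JOIN sys.index_columns AS IC
    ON IC.[object_id] = I.[object_id]
    AND IC.index_id = I.index_id
JOIN sys.columns AS C
    ON C.[object_id] = IC.[object_id]
    AND C.column_id = IC.column_id
WHERE I.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
ORDER BY
    I.index_id,
    IC.is_included_column,
    IC.key_ordinal;
```

With the DDL from the question, IX_UpdatedID_PCTimeStamp is the index to look at: it leads on UpdatedID, is built on PS_FctContractualAvailability, and is filtered on UpdatedID IS NOT NULL, which should show up in filter_definition.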

Or, using a temporary table to hold the partition numbers:

CREATE TABLE #Partitions
(
    partition_number integer PRIMARY KEY CLUSTERED
);

INSERT #Partitions
    (partition_number)
SELECT
    P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE 
    P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
    AND P.index_id = 1;

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.UpdatedID), 0)
FROM #Partitions AS P
CROSS APPLY 
(
    SELECT TOP (1) 
        S.UpdatedID
    FROM fct.MyTable AS S
    WHERE
        $PARTITION.PF_MyTable(S.PCTimeStamp) = P.partition_number
        AND S.UpdatedID <= @IDColumnThresholdValue
    ORDER BY
        S.UpdatedID DESC
) AS T2;

DROP TABLE #Partitions;

Side note: accessing a system table in your query prevents parallelism. If that matters, consider materializing the partition numbers in a temporary table and then APPLY from that. Parallelism is not usually helpful in this pattern (with correct indexing), but it would be remiss of me not to mention it.

Side note 2: there is an active Connect item requesting built-in support for MIN/MAX aggregates and Top on partitioned objects.

— Paul White says GoFundMonica