I am trying to work out which indexes to use for a SQL query with a WHERE condition and a GROUP BY, which currently runs very slowly.
My query:
SELECT group_id
FROM counter
WHERE ts between timestamp '2014-03-02 00:00:00.0' and timestamp '2014-03-05 12:00:00.0'
GROUP BY group_id
The table currently has 32,000,000 rows. The query's execution time increases sharply as I widen the time frame.
The table in question looks like this:
CREATE TABLE counter (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, group_id bigint NOT NULL
);
I currently have the following indexes, but performance is still slow:
CREATE INDEX ts_index
ON counter
USING btree
(ts);
CREATE INDEX group_id_index
ON counter
USING btree
(group_id);
CREATE INDEX comp_1_index
ON counter
USING btree
(ts, group_id);
CREATE INDEX comp_2_index
ON counter
USING btree
(group_id, ts);
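With four overlapping indexes on the same two columns, it can help to see which ones the planner actually uses and how large each one is. A minimal sketch, assuming the standard statistics view pg_stat_user_indexes is available and that statistics collection is enabled; the only value taken from this post is the table name counter:

```sql
-- Sketch: scan count and on-disk size for every index on "counter".
-- idx_scan counts index scans since statistics were last reset.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'counter'
ORDER  BY idx_scan DESC;
```

An index that shows idx_scan = 0 after a representative workload is a candidate for removal, since every extra index adds write overhead.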
Running EXPLAIN on the query gives the following result:
"QUERY PLAN"
"HashAggregate (cost=467958.16..467958.17 rows=1 width=4)"
" -> Index Scan using ts_index on counter (cost=0.56..467470.93 rows=194892 width=4)"
" Index Cond: ((ts >= '2014-02-26 00:00:00'::timestamp without time zone) AND (ts <= '2014-02-27 23:59:00'::timestamp without time zone))"
SQL Fiddle with sample data: http://sqlfiddle.com/#!15/7492b/1
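Plain EXPLAIN reports planner estimates only. To see actual row counts, timings, and buffer usage, the same query could be run under EXPLAIN (ANALYZE, BUFFERS), which is available in PostgreSQL 9.3. A sketch using the query from this post; note that ANALYZE really executes the statement, so it takes as long as the query itself:

```sql
-- Sketch: measured execution statistics instead of estimates.
-- ANALYZE runs the query; BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT group_id
FROM   counter
WHERE  ts BETWEEN timestamp '2014-03-02 00:00:00'
              AND timestamp '2014-03-05 12:00:00'
GROUP  BY group_id;
```

Comparing the estimated rows with the actual rows in the output shows whether the planner's statistics are off, and a high "read" count relative to "hit" suggests the time is going into disk I/O.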
The question
Can the performance of this query be improved by adding better indexes, or do I need to increase the processing power?
Edit 1
PostgreSQL version 9.3.2 is used.
Edit 2
I tried @Erwin's proposal with EXISTS:
SELECT group_id
FROM groups g
WHERE EXISTS (
SELECT 1
FROM counter c
WHERE c.group_id = g.group_id
AND ts BETWEEN timestamp '2014-03-02 00:00:00'
AND timestamp '2014-03-05 12:00:00'
);
Unfortunately, this did not seem to improve performance. The query plan:
"QUERY PLAN"
"Nested Loop Semi Join (cost=1607.18..371680.60 rows=113 width=4)"
" -> Seq Scan on groups g (cost=0.00..2.33 rows=133 width=4)"
" -> Bitmap Heap Scan on counter c (cost=1607.18..158895.53 rows=60641 width=4)"
" Recheck Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
" -> Bitmap Index Scan on comp_2_index (cost=0.00..1592.02 rows=60641 width=0)"
" Index Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
Edit 3
The query plan for ypercube's LATERAL query:
"QUERY PLAN"
"Nested Loop (cost=8.98..1200.42 rows=133 width=20)"
" -> Seq Scan on groups g (cost=0.00..2.33 rows=133 width=4)"
" -> Result (cost=8.98..8.99 rows=1 width=0)"
" One-Time Filter: ($1 IS NOT NULL)"
" InitPlan 1 (returns $1)"
" -> Limit (cost=0.56..4.49 rows=1 width=8)"
" -> Index Only Scan using comp_2_index on counter c (cost=0.56..1098691.21 rows=279808 width=8)"
" Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
" InitPlan 2 (returns $2)"
" -> Limit (cost=0.56..4.49 rows=1 width=8)"
" -> Index Only Scan Backward using comp_2_index on counter c_1 (cost=0.56..1098691.21 rows=279808 width=8)"
" Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
group_id and not a count?
different group_id values are there in the table?