Estuve buscando en Google, autodidacta y buscando solución durante horas, pero sin suerte. Encontré algunas preguntas similares aquí, pero no este caso.
Mis mesas:
- personas (~ 10 millones de filas)
- atributos (ubicación, edad, ...)
- enlaces (M: M) entre personas y atributos (~ 40 millones de filas)
Situación:
trato de seleccionar todos los identificadores de persona ( person_id
) de algunas ubicaciones ( location.attribute_value BETWEEN 3000 AND 7000
), de algún género ( gender.attribute_value = 1
), nacido en algunos años ( bornyear.attribute_value BETWEEN 1980 AND 2000
) y con el color de algunos ojos ( eyecolor.attribute_value IN (2,3)
).
Esta es mi consulta bruja tomó 3 ~ 4 min. y me gustaría optimizar:
SELECT person_id
FROM person
LEFT JOIN attribute location ON location.attribute_type_id = 1 AND location.person_id = person.person_id
LEFT JOIN attribute gender ON gender.attribute_type_id = 2 AND gender.person_id = person.person_id
LEFT JOIN attribute bornyear ON bornyear.attribute_type_id = 3 AND bornyear.person_id = person.person_id
LEFT JOIN attribute eyecolor ON eyecolor.attribute_type_id = 4 AND eyecolor.person_id = person.person_id
WHERE 1
AND location.attribute_value BETWEEN 3000 AND 7000
AND gender.attribute_value = 1
AND bornyear.attribute_value BETWEEN 1980 AND 2000
AND eyecolor.attribute_value IN (2,3)
LIMIT 100000;
Resultado:
+-----------+
| person_id |
+-----------+
| 233 |
| 605 |
| ... |
| 8702599 |
| 8703617 |
+-----------+
100000 rows in set (3 min 42.77 sec)
Explique extendido:
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| 1 | SIMPLE | bornyear | range | attribute_type_id,attribute_value,person_id | attribute_value | 5 | NULL | 1265229 | 100.00 | Using where |
| 1 | SIMPLE | location | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | eyecolor | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | gender | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.eyecolor.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | person | eq_ref | PRIMARY | PRIMARY | 4 | test1.location.person_id | 1 | 100.00 | Using where; Using index |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.02 sec)
Perfilado:
+------------------------------+-----------+
| Status | Duration |
+------------------------------+-----------+
| Sending data | 3.069452 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.968915 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.042468 |
| Waiting for query cache lock | 0.000043 |
| Sending data | 3.264984 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.823919 |
| Waiting for query cache lock | 0.000038 |
| Sending data | 2.863903 |
| Waiting for query cache lock | 0.000014 |
| Sending data | 2.971079 |
| Waiting for query cache lock | 0.000020 |
| Sending data | 3.053197 |
| Waiting for query cache lock | 0.000087 |
| Sending data | 3.099053 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 3.064186 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.939404 |
| Waiting for query cache lock | 0.000018 |
| Sending data | 3.440288 |
| Waiting for query cache lock | 0.000086 |
| Sending data | 3.115798 |
| Waiting for query cache lock | 0.000068 |
| Sending data | 3.075427 |
| Waiting for query cache lock | 0.000072 |
| Sending data | 3.658319 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.335427 |
| Waiting for query cache lock | 0.000049 |
| Sending data | 3.319430 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.496563 |
| Waiting for query cache lock | 0.000029 |
| Sending data | 3.017041 |
| Waiting for query cache lock | 0.000032 |
| Sending data | 3.132841 |
| Waiting for query cache lock | 0.000050 |
| Sending data | 2.901310 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.107269 |
| Waiting for query cache lock | 0.000062 |
| Sending data | 2.937373 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.097082 |
| Waiting for query cache lock | 0.000261 |
| Sending data | 3.026108 |
| Waiting for query cache lock | 0.000026 |
| Sending data | 3.089760 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 3.012763 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 3.069694 |
| Waiting for query cache lock | 0.000046 |
| Sending data | 3.591908 |
| Waiting for query cache lock | 0.000060 |
| Sending data | 3.526693 |
| Waiting for query cache lock | 0.000076 |
| Sending data | 3.772659 |
| Waiting for query cache lock | 0.000069 |
| Sending data | 3.346089 |
| Waiting for query cache lock | 0.000245 |
| Sending data | 3.300460 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.135361 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.909447 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 3.337561 |
| Waiting for query cache lock | 0.000140 |
| Sending data | 3.138180 |
| Waiting for query cache lock | 0.000090 |
| Sending data | 3.060687 |
| Waiting for query cache lock | 0.000085 |
| Sending data | 2.938677 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 2.977974 |
| Waiting for query cache lock | 0.000872 |
| Sending data | 2.918640 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 2.975842 |
| Waiting for query cache lock | 0.000051 |
| Sending data | 2.918988 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.943810 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.330211 |
| Waiting for query cache lock | 0.000025 |
| Sending data | 3.411236 |
| Waiting for query cache lock | 0.000023 |
| Sending data | 23.339035 |
| end | 0.000807 |
| query end | 0.000023 |
| closing tables | 0.000325 |
| freeing items | 0.001217 |
| logging slow query | 0.000007 |
| logging slow query | 0.000011 |
| cleaning up | 0.000104 |
+------------------------------+-----------+
100 rows in set (0.00 sec)
Tablas de estructuras:
CREATE TABLE `attribute` (
`attribute_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`attribute_type_id` int(11) unsigned DEFAULT NULL,
`attribute_value` int(6) DEFAULT NULL,
`person_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`attribute_id`),
KEY `attribute_type_id` (`attribute_type_id`),
KEY `attribute_value` (`attribute_value`),
KEY `person_id` (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000001 DEFAULT CHARSET=utf8;
CREATE TABLE `person` (
`person_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`person_name` text CHARACTER SET latin1,
PRIMARY KEY (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=20000001 DEFAULT CHARSET=utf8;
La consulta se realizó en el servidor virtual DigitalOcean con SSD y 1 GB de RAM.
Supongo que puede haber problemas con el diseño de la base de datos. ¿Tiene alguna sugerencia para diseñar mejor esta situación, por favor? ¿O solo para ajustar la selección anterior?
(attribute_type_id, attribute_value, person_id)
y (attribute_type_id, person_id, attribute_value)
attribute (person_id, attribute_type_id, attribute_value)