Estoy tratando de entender el mensaje MCE para encontrar qué módulo de memoria es malo en un servidor. Este mensaje aparece en /var/log/kern.log
un servidor que se congela dos veces hoy.
Apr 13 22:39:22 mbox kernel: [36247975.116860] sbridge: HANDLING MCE MEMORY ERROR
Apr 13 22:39:22 mbox kernel: [36247975.116867] CPU 0: Machine Check Exception: 0 Bank 5: 8c00004000010090
Apr 13 22:39:22 mbox kernel: [36247975.116869] TSC 0 ADDR 4a0d75900 MISC 21405cdc86 PROCESSOR 0:206d7 TIME 1428957562 SOCKET 0 APIC 0
Apr 13 22:39:22 mbox kernel: [36247975.951013] EDAC MC0: 1 CE memory read error
Sospecho que hay un módulo de memoria defectuoso. El servidor es un Xeon E5-2650 2x con módulos de memoria 8x8Go (8 ranuras de memoria para cada CPU)
Aquí está la población del módulo de memoria de lshw
:
*-memory:0
description: System Memory
physical id: 2d
slot: System board or motherboard
*-bank:0
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-197.A
vendor: Kingston
physical id: 0
serial: B83AE5C2
slot: P1_DIMMA1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:1
description: DIMM Synchronous [empty]
product: Dimm1_PartNum
vendor: Dimm1_Manufacturer
physical id: 1
serial: Dimm1_SerNum
slot: P1_DIMMA2
width: 64 bits
*-bank:2
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 2
serial: EC309238
slot: P1_DIMMB1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:3
description: DIMM Synchronous [empty]
product: Dimm4_PartNum
vendor: Dimm4_Manufacturer
physical id: 3
serial: Dimm4_SerNum
slot: P1_DIMMB2
width: 64 bits
*-bank:4
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 4
serial: E9305438
slot: P1_DIMMC1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:5
description: DIMM Synchronous [empty]
product: Dimm7_PartNum
vendor: Dimm7_Manufacturer
physical id: 5
serial: Dimm7_SerNum
slot: P1_DIMMC2
width: 64 bits
*-bank:6
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 6
serial: E7305738
slot: P1_DIMMD1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:7
description: DIMM Synchronous [empty]
product: Dimm10_PartNum
vendor: Dimm10_Manufacturer
physical id: 7
serial: Dimm10_SerNum
slot: P1_DIMMD2
width: 64 bits
*-memory:1
description: System Memory
physical id: 3f
slot: System board or motherboard
*-bank:0
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-197.A
vendor: Kingston
physical id: 0
serial: B63A08C3
slot: P2_DIMME1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:1
description: DIMM Synchronous [empty]
product: Dimm1_PartNum
vendor: Dimm1_Manufacturer
physical id: 1
serial: Dimm1_SerNum
slot: P2_DIMME2
width: 64 bits
*-bank:2
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 2
serial: EA309638
slot: P2_DIMMF1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:3
description: DIMM Synchronous [empty]
product: Dimm4_PartNum
vendor: Dimm4_Manufacturer
physical id: 3
serial: Dimm4_SerNum
slot: P2_DIMMF2
width: 64 bits
*-bank:4
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 4
serial: E7305938
slot: P2_DIMMG1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:5
description: DIMM Synchronous [empty]
product: Dimm7_PartNum
vendor: Dimm7_Manufacturer
physical id: 5
serial: Dimm7_SerNum
slot: P2_DIMMG2
width: 64 bits
*-bank:6
description: DIMM DDR3 1333 MHz (0,8 ns)
product: 9965516-048.A
vendor: Kingston
physical id: 6
serial: E7305B38
slot: P2_DIMMH1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:7
description: DIMM Synchronous [empty]
product: Dimm10_PartNum
vendor: Dimm10_Manufacturer
physical id: 7
serial: Dimm10_SerNum
slot: P2_DIMMH2
width: 64 bits
*-memory:2 UNCLAIMED
physical id: 7
*-memory:3 UNCLAIMED
physical id: 9
Como puede observar, no hay un módulo de memoria en el banco # 5 que. Entonces mi pregunta es: ¿está de acuerdo con que este mensaje se trata de un fallo de memoria? Y si es así, ¿cómo puedo encontrar qué módulo se va a reemplazar?