Sunblade 1000 Rebooting by itself, ECC errors
807559, Jun 29 2005 (edited Jun 29 2005)
Hi there,
I just got a Sunblade 1000 about two days ago, and coming from a long career on the SGI platform, I was pleasantly surprised by the speed of the Sunblade.
I am, however, experiencing tons of memory problems. I was able to install and configure Solaris 10 yesterday, and it ran fine for about 5-6 hours. Today I turned the machine on and started installing some applications. Everything went fine for a while, then all of a sudden came the big debacle: the machine rebooted by itself, and /var/adm/messages shows:
Jun 29 00:33:04 Riddler EVENT-ID: 2b42ea22-2bef-cc31-ed26-acbb23282bc1
Jun 29 00:33:04 Riddler DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4U-8000-35 for more information.
Jun 29 00:33:04 Riddler AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
Jun 29 00:33:04 Riddler IMPACT: Total system memory capacity will be reduced as pages are retired.
Jun 29 00:33:04 Riddler REC-ACTION: Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u <EVENT_ID> to identify the module.
Running fmdump -v -u with that event ID returns:
Jun 29 01:00:07.4529 2b42ea22-2bef-cc31-ed26-acbb23282bc1 SUN4U-8000-35
95% fault.memory.bank
FRU: mem:///component=J0100,J0202,J0304,J0406
rsrc: mem:///component=J0100,J0202,J0304,J0406
This is with four DIMMs in bank 0. How do I interpret this exactly? Does it mean all four DIMMs (or worse!) are faulty?
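For what it's worth, here is a minimal Python sketch of how I am reading that fmdump output. It only pulls out the certainty, the fault class, and the labels on the FRU line. It assumes the value after component= is a plain comma-separated list of labels, which is my guess and not something the output states, and the function name parse_fault is made up for illustration.

```python
import re

# fmdump -v output copied verbatim from the post above
FMDUMP_OUTPUT = """\
Jun 29 01:00:07.4529 2b42ea22-2bef-cc31-ed26-acbb23282bc1 SUN4U-8000-35
95% fault.memory.bank
FRU: mem:///component=J0100,J0202,J0304,J0406
rsrc: mem:///component=J0100,J0202,J0304,J0406
"""

def parse_fault(text):
    """Return (certainty_percent, fault_class, fru_labels) from fmdump -v text.

    Hypothetical helper: it is not part of Solaris or of fmdump itself.
    """
    certainty = None
    fault_class = None
    labels = []
    for line in text.splitlines():
        line = line.strip()
        # A line such as "95% fault.memory.bank"
        match = re.match(r"(\d+)%\s+(\S+)$", line)
        if match:
            certainty = int(match.group(1))
            fault_class = match.group(2)
        elif line.startswith("FRU:"):
            # Assumption: everything after "component=" is a
            # comma-separated list of labels.
            _, _, value = line.partition("component=")
            labels = [label for label in value.split(",") if label]
    return certainty, fault_class, labels

certainty, fault_class, labels = parse_fault(FMDUMP_OUTPUT)
print(certainty, fault_class, labels)
```

All this shows is that the fault class is fault.memory.bank and that the FRU line names four labels at once. It does not tell me whether one DIMM or all four are bad, which is exactly what I am asking.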