
fmadm faulty on Supermicro X9DRi board: which component has failed?

Ergi, Mar 3 2017 (edited Mar 6 2017)

Hi guys,

We are trying to read the fmadm faulty output to find out which component has failed, or whether the whole motherboard needs replacement. Can someone help us identify the faulty part? The logs are below.

fmadm faulty

--------------- ------------------------------------  -------------- ---------

TIME            EVENT-ID                              MSG-ID         SEVERITY

--------------- ------------------------------------  -------------- ---------

Mar 03 04:57:47 aa09c1fb-0c8d-4678-954b-bdd866ca6ce0  SUNOS-8000-J0  Major   

Problem Status    : open

Diag Engine       : eft / 1.16

System

    Manufacturer  : unknown

    Name          : unknown

    Part_Number   : unknown

    Serial_Number : unknown

System Component

    Manufacturer  : Supermicro

    Name          : X9DRi-LN4+/X9DR3-LN4+

    Part_Number   : To be filled by O.E.M.

    Serial_Number : 0123456789

    Host_ID       : 00454a86

----------------------------------------

Suspect 1 of 2 :

   Problem class : fault.sunos.eft.unexpected_telemetry

   Certainty   : 50%

   FRU

     Status           : faulty

     FMRI             : "hc://:chassis-mfg=Supermicro:chassis-name=X9DRi-LN4+-X9DR3-LN4+:chassis-part=To-Be-Filled-By-O.E.M.:chassis-serial=0123456789/motherboard=0/chip=1"

     Manufacturer     : unknown

     Name             : unknown

     Part_Number      : unknown

     Revision         : unknown

     Serial_Number    : unknown

     Chassis

        Manufacturer  : Supermicro

        Name          : X9DRi-LN4+/X9DR3-LN4+

        Part_Number   : To Be Filled By O.E.M.

        Serial_Number : 0123456789

   Resource

     Status           : faulted but still in service

----------------------------------------

Suspect 2 of 2 :

   Problem class : defect.sunos.eft.unexpected_telemetry

   Certainty   : 50%

   Resource

     FMRI             : "hc://:chassis-mfg=Supermicro:chassis-name=X9DRi-LN4+-X9DR3-LN4+:chassis-part=To-Be-Filled-By-O.E.M.:chassis-serial=0123456789/motherboard=0/chip=1"

     Manufacturer     : unknown

     Name             : unknown

     Part_Number      : unknown

     Revision         : unknown

     Serial_Number    : unknown

     Chassis

        Manufacturer  : Supermicro

        Name          : X9DRi-LN4+-X9DR3-LN4+

        Part_Number   : To-Be-Filled-By-O.E.M.

        Serial_Number : 0123456789

     Status           : faulted but still in service

Description : The diagnosis engine encountered telemetry from the listed

              devices for which it was unable to perform a diagnosis - all

              hypotheses were disproved.

Response    : Error reports have been logged for examination.

Impact      : Automated diagnosis and response for these events will not occur.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.

              Use 'fmdump -epV -u aa09c1fb-0c8d-4678-954b-bdd866ca6ce0' to view

              the unexpected telemetry. Please refer to the associated

              reference document at http://support.oracle.com/msg/SUNOS-8000-J0

              for the latest service procedures and policies regarding this

              diagnosis.
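Both suspects above carry the same FMRI, and the part after the chassis details is the hardware path that names the component. As a minimal sketch (the helper name `parse_hc_fmri` is hypothetical, and the parsing assumes only the `hc://authority/name=instance/...` shape visible in this output), the string can be split like this:

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI into (authority dict, [(name, instance), ...])."""
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI")
    # Everything up to the first "/" is the authority (chassis details);
    # the rest is the component path.
    authority_text, _, path_text = fmri[len(prefix):].partition("/")
    authority = dict(
        item.split("=", 1) for item in authority_text.split(":") if item
    )
    path = []
    for segment in path_text.split("/"):
        if segment:
            name, _, instance = segment.partition("=")
            path.append((name, int(instance)))
    return authority, path


# FMRI copied from "Suspect 1 of 2" above.
FMRI = (
    "hc://:chassis-mfg=Supermicro:chassis-name=X9DRi-LN4+-X9DR3-LN4+"
    ":chassis-part=To-Be-Filled-By-O.E.M.:chassis-serial=0123456789"
    "/motherboard=0/chip=1"
)

authority, path = parse_hc_fmri(FMRI)
print(authority["chassis-name"])  # X9DRi-LN4+-X9DR3-LN4+
print(path)                       # [('motherboard', 0), ('chip', 1)]
```

Read this way, the suspect is chip 1 on motherboard 0, not the motherboard as a whole. Which physical socket "chip 1" corresponds to is not stated in the output and would need to be checked against the board manual.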

root@SV02:~# fmdump -epV -u aa09c1fb-0c8d-4678-954b-bdd866ca6ce0

TIME                           CLASS

Mar 03 2017 04:57:47.848174195 ereport.cpu.intel.quickpath.home_agent_mc_ce

nvlist version: 0

        class = ereport.cpu.intel.quickpath.home_agent_mc_ce

        ena = 0xe0e6ccb2b8814001

        detector = hc:///motherboard=0/chip=1

        compound_errorname = MC_CH1_RD_ERR

        IA32_MCG_STATUS = 0x0

        machine_check_in_progress = 0

        bank_number = 0x7

        bank_msr_offset = 0x41c

        IA32_MCi_STATUS = 0xcc00008000010091

        overflow = 1

        error_uncorrected = 0

        error_enabled = 0

        processor_context_corrupt = 0

        error_code = 0x91

        model_specific_error_code = 0x1

        threshold_based_error_status = No tracking

        IA32_MCi_ADDR = 0xa5215bd0c0

        IA32_MCi_MISC = 0x152122086

        physaddr = 0xa5215bd0c0

        resource = (array of embedded nvlists)

        (start resource[0])

        nvlist version: 0

                version = 0x1

                scheme = hc

                hc-list = (array of embedded nvlists)

                (start hc-list[0])

                nvlist version: 0

                        hc-name = motherboard

                        hc-id = 0

                (end hc-list[0])

                (start hc-list[1])

                nvlist version: 0

                        hc-name = chip

                        hc-id = 1

                (end hc-list[1])

                (start hc-list[2])

                nvlist version: 0

                        hc-name = memory-controller

                        hc-id = 0

                (end hc-list[2])

                (start hc-list[3])

                nvlist version: 0

                        hc-name = dram-channel

                        hc-id = 0

                (end hc-list[3])

                hc-specific = (embedded nvlist)

                nvlist version: 0

                        offset = 0xffffffffffffffff

                (end hc-specific)

        (end resource[0])

        (start resource[1])

        nvlist version: 0

                version = 0x1

                scheme = hc

                hc-list = (array of embedded nvlists)

                (start hc-list[0])

                nvlist version: 0

                        hc-name = motherboard

                        hc-id = 0

                (end hc-list[0])

                (start hc-list[1])

                nvlist version: 0

                        hc-name = chip

                        hc-id = 1

                (end hc-list[1])

                (start hc-list[2])

                nvlist version: 0

                        hc-name = memory-controller

                        hc-id = 0

                (end hc-list[2])

                (start hc-list[3])

                nvlist version: 0

                        hc-name = dram-channel

                        hc-id = 1

                (end hc-list[3])

                hc-specific = (embedded nvlist)

                nvlist version: 0

                        offset = 0xffffffffffffffff

                (end hc-specific)

        (end resource[1])

        cap_support_recovery = 1

        signal_mce = 0

        attention_to_recovery = 0

        __ttl = 0x1

        __tod = 0x58b9684b 0x328e1c73

root@SV02:~# c
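The IA32_MCi_STATUS value in the ereport can be decoded by hand to cross-check the fields fmdump prints. The sketch below assumes the architectural bit layout described in the Intel Software Developer's Manual (flag bits 63 to 57, model-specific code in bits 31:16, MCA error code in bits 15:0) and the memory-controller compound code form `000F 0000 1MMM CCCC`. The function and table names are made up for illustration, and the layout should be checked against the manual for this CPU model.

```python
# Value copied from the fmdump output above.
STATUS = 0xCC00008000010091

# MMM field of a memory-controller compound error code (assumed from the Intel SDM).
MEMORY_TRANSACTION = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def bit(value, position):
    """Return the single bit of value at the given position."""
    return (value >> position) & 1


def decode_mci_status(status):
    """Decode the architectural fields of an IA32_MCi_STATUS value."""
    error_code = status & 0xFFFF
    fields = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "error_uncorrected": bit(status, 61),
        "error_enabled": bit(status, 60),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "processor_context_corrupt": bit(status, 57),
        "model_specific_error_code": (status >> 16) & 0xFFFF,
        "error_code": error_code,
    }
    # Compound memory-controller form: 000F 0000 1MMM CCCC (bit 12 ignored).
    if (error_code & 0xEF80) == 0x0080:
        fields["transaction"] = MEMORY_TRANSACTION.get(
            (error_code >> 4) & 0x7, "reserved"
        )
        channel = error_code & 0xF
        # 0b1111 means the channel was not specified.
        fields["channel"] = None if channel == 0xF else channel
    return fields


for name, value in decode_mci_status(STATUS).items():
    print(f"{name} = {value}")
```

Under those assumptions the decode agrees with the report: overflow is set, the error is not uncorrected, error_code is 0x91, and the compound code reads as a memory read error on channel 1, which matches `MC_CH1_RD_ERR`. That suggests a corrected memory read error seen by the memory controller on chip 1, so the DIMMs attached to that CPU are worth checking before the board itself, though this output alone does not single out a DIMM.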

Post Details
Locked on Apr 3 2017
Added on Mar 3 2017