We have a Sun T6320 that rebooted unexpectedly a couple of times.
reciosys01# last reboot | more
reboot system boot Mon Mar 28 09:45
reboot system down Mon Mar 28 09:38
reboot system boot Mon Mar 28 08:44
reboot system down Mon Mar 28 08:37
The problem is that we cannot find anything in /var/adm/messages that indicates the cause of the reboots.
The day before, an Emulex card was replaced in this box, but there is no error message that links the reboots to this change.
Here are the logs from /var/adm/messages:
Mar 28 09:37:08 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] uptmagnt[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:37:09 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] uptmagnt[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:38:40 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] bgssd[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.notice] ^MSunOS Release 5.10 Version Generic_142909-17 64-bit
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.info] Ethernet address = x:xx:xx:xx:xx:xx
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] NOTICE: Kernel Cage is ENABLED
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] mem = 66977792K (0xff8000000)
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] avail mem = 66732310528
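To confirm across the whole file that nothing unusual was logged right before a reboot, you can print the last entry written before each boot banner. This is a minimal Python 3 sketch, meant to be run against a copy of the messages file on any machine with Python 3. It assumes the standard syslog layout shown above and uses the "SunOS Release" banner as the boot marker; the names last_before_boot and report are hypothetical.

```python
import sys
from typing import List, Optional, Tuple

# Solaris writes this banner into /var/adm/messages at the start of each boot.
BOOT_MARKER = "SunOS Release"


def last_before_boot(lines: List[str]) -> List[Tuple[str, str]]:
    """Pair each boot banner with the last log line written before it."""
    pairs: List[Tuple[str, str]] = []
    previous: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if BOOT_MARKER in line:
            if previous is not None:
                pairs.append((previous, line))
            previous = None  # start fresh after each boot
        else:
            previous = line
    return pairs


def report(path: str) -> None:
    """Print the last entry before every boot found in a messages file."""
    with open(path, errors="replace") as fh:
        for last, banner in last_before_boot(fh.readlines()):
            print("last entry before boot:", last)
            print("boot banner:           ", banner)
            print()


# Only read a file when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

The script prints one pair per reboot. If the last entry before a boot is ordinary daemon activity, like the 09:38:40 bgssd line here, rather than a panic or shutdown message, that fits an abrupt loss of power more than an orderly reboot.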
We eventually managed to check the event logs on the service processor (SP) through ILOM.
There we found this specific error: "Host Power Failure: MB_DC_POK Fault".
I suspect this is related to the power supply: its voltage output might not be at the expected level.
-> cd /SP/logs/event
/SP/logs/event
-> show list
/SP/logs/event/list
Targets:
Properties:
Commands:
cd
show
ID Date/Time Class Type Severity
----- ------------------------ -------- -------- --------
70701 Mon Mar 28 01:41:50 2011 Chassis Log major
Host is running
70700 Mon Mar 28 01:38:20 2011 Fault Repair minor
SP detected fault cleared at time Mon Mar 28 01:38:18 2011. Host Power: M
B_DC_POK is OK
70699 Mon Mar 28 01:37:14 2011 Chassis Log major
Host has been powered on
70698 Mon Mar 28 01:37:03 2011 Chassis Log critical
Host has been powered off
70697 Mon Mar 28 01:37:03 2011 Chassis Log major
Power cycling Host System. Please wait.
70696 Mon Mar 28 01:37:01 2011 Fault Fault critical
SP detected fault at time Mon Mar 28 01:37:01 2011. Host Power Failure: M
B_DC_POK Fault
Paused: press any key to continue, or 'q' to quit
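To pick the power faults out of the SP event list and compare them with the host's clock, you can parse the captured output with a short script. This is a sketch that depends on assumptions the post does not state. First, it assumes the SP clock runs on UTC and the Solaris host on UTC+8. Under that assumption the 01:37:01 fault lands at about 09:37 host time, within a few minutes of the 09:38 "down" record from last once you allow for drift between the two clocks. Second, it assumes ILOM wraps long descriptions at a fixed width, which is why "M" and "B_DC_POK" appear on separate lines above. The names parse_events, power_faults, to_host_time and HOST_OFFSET are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List

# ASSUMPTION: SP clock on UTC, Solaris host on UTC+8. The post does not
# state either time zone; change this if your clocks are set differently.
HOST_OFFSET = timedelta(hours=8)

# Header line of one event, e.g. "70696 Mon Mar 28 01:37:01 2011 Fault Fault critical"
HEADER = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<when>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<sev>\S+)\s*$"
)


def parse_events(text: str) -> List[Dict[str, str]]:
    """Turn captured 'show list' output into one dict per event."""
    events: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            events.append(dict(match.groupdict(), desc=""))
        elif events and line and not line.startswith(("Paused:", "->")):
            # Description (continuation) line. Wrapped pieces such as "M" and
            # "B_DC_POK" are joined with no separator; a space that fell exactly
            # at the wrap point would be lost.
            events[-1]["desc"] += line
    return events


def power_faults(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep raised faults (not repairs) that mention the DC_POK signal."""
    return [
        e for e in events
        if e["cls"] == "Fault" and e["type"] == "Fault" and "DC_POK" in e["desc"]
    ]


def to_host_time(event: Dict[str, str], offset: timedelta = HOST_OFFSET) -> datetime:
    """Convert the SP timestamp of an event to the (assumed) host time zone."""
    sp_time = datetime.strptime(event["when"], "%a %b %d %H:%M:%S %Y")
    return sp_time + offset
```

Pass in the text captured from show list. Looping over power_faults(parse_events(text)) and printing each event's id, to_host_time value and description then lists every DC_POK fault next to the host time it probably corresponds to.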
We searched the web for related incidents, but found no cases specific to the T6320.
The closest match was for the T6340, "False Power Failure Faults Might Be Reported (CR 6895793)", but that issue occurs during POST or SunVTS memory testing.
So it does not quite match our case.
Since there was a recent change on this box, it's a good idea to ask our vendor about it, as the two might be related. We updated the service request for the Emulex card replacement to include this problem.
Hopefully on our next update, we will have a better picture of this problem.
The problem has not recurred since.
Oracle Support is asking for the output of showfaults:
sc> showfaults -v
Last POST Run:Mon Mar 28 01:55:48 2011
Post Status: Passed all devices
No failures found in System
sc>