unix sysadmin archives

Thursday 3 May 2012

How to Replace System Board for Sun Fire E6900 Systems

Applies to:
Sun Fire 4800 Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire 4810 Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire 6800 Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire E4900 Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire 3800 Server - Version: Not Applicable and later [Release: N/A and later]
Information in this document applies to any platform.

H/W ON-SITE Action Plan #1. Parts: 540-6295
NOTE: THIS ACTION PLAN WAS AUTOGENERATED USING https://actionplans.us.oracle.com/atr/.

***********************************************************
************ Start Hardware Onsite Action Plan ************


A. DISPATCH INSTRUCTIONS

A1. WHAT SKILLS DOES THE ENGINEER NEED (IS A SITE ENGINEER AVAILABLE?):

A2. PARTS REQUIRED: USE INFORMATION IN TASK UNLESS ENTERED BELOW
Part number: [F] 540-6295
Part location: SB4
Quantity: 1
Description: CPU/MEM W/ 4 US IV 1.35GHZ, 0GB (FRU)
Prior part DOA: No
Alternate parts: 540-6803
SPECIAL INSTRUCTIONS: Verify the new board's firmware matches that of the System Controller and other boards in the configuration.
See http://sunsolve.sun.com/search/document.do?assetkey=1-61-214805-1 for details.

A3. DELIVERY REQUIREMENT:
Preferred Onsite Time: Within Service SLA

A4. ONSITE VISIT DETAILS:
Account name:
Contact Name:
Contact Telephone #:
Email address:
Street Address:
City:
State:
Country:
Postal Code:
Alt. Contact name:
Alt. Contact email:
Alt. Contact phone:
Special instructions:

B. FIELD ENGINEER INSTRUCTIONS

NOTE : READ MANDATORY NOTES SECTION OF ACTION PLAN.
This Action Plan is not complete until all mandatory actions outlined below have been completed.

B1. PROBLEM OVERVIEW:
General problem: There is a component failure
Fault for part 540-6295: NA
*** Start System Error Message ***
cat-a:SC> showchs -c SB4 -v
Total # of records: 1
Component : /N0/SB4
Time Stamp : Sun Dec 04 19:26:26 EST 2011
New Status : Faulty
Old Status : OK
Event Code : HW
Initiator : ScApp
Message : 1.E6900.FAULT.ASIC.CHEETAH.AFSR_2_HI_ISAP.71191111.20-16.1

*** End System Error Message ***
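
The record above is a set of 'Field : Value' lines, so it can be turned into a lookup table before it is pasted into a case note. The following is a minimal Python sketch, assuming every field line uses ' : ' as its separator, as in the sample above. The name parse_chs_record is illustrative only and is not part of ScApp.

```python
def parse_chs_record(text):
    """Return a dict built from the 'Field : Value' lines of one showchs record."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # lines without ' : ' (such as the record count) are skipped
            record[key.strip()] = value.strip()
    return record


# Record text copied from the system error message in this action plan.
SAMPLE = """\
Total # of records: 1
Component : /N0/SB4
Time Stamp : Sun Dec 04 19:26:26 EST 2011
New Status : Faulty
Old Status : OK
Event Code : HW
Initiator : ScApp
Message : 1.E6900.FAULT.ASIC.CHEETAH.AFSR_2_HI_ISAP.71191111.20-16.1
"""

record = parse_chs_record(SAMPLE)
print(record["Component"], record["New Status"])
```

Splitting on the first ' : ' only keeps the time stamp intact, because the colons inside '19:26:26' have no surrounding spaces.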

B2. WHAT ACTIONS DOES THE ENGINEER NEED TO TAKE:
Oracle Confidential (INTERNAL). Do not distribute to customers

Reason: FRU CAP


Goal
How to Replace System Board for Sun Fire 3800, 4800, 4810, 6800, E4900, and E6900 Systems

******************************************************************************

To report errors or request improvements on this procedure,
please go to http://support.us.oracle.com
and put a comment on Doc ID: 1306577.1

******************************************************************************

Solution
DISPATCH INSTRUCTIONS

WHAT SKILLS DOES ENGINEER NEED:

ScApp, lom

Task Complexity: 4

Time Estimate: 60 minutes

FIELD ENGINEER INSTRUCTIONS

CAP PROBLEM OVERVIEW:
System Board Failure

WHAT STATE SHOULD SYSTEM BE IN TO BE READY TO PERFORM RESOLUTION ACTIVITY?

Examples use a board location of '#'

1) Determine whether Dynamic Reconfiguration (DR) can be used. If the board is listed in the output of 'cfgadm -av | grep -i perm', DR cannot be used.

2a) If DR can be used, issue 'cfgadm -c disconnect N0.SB#'

2b) If DR cannot be used, issue 'init 0' on the domain and then 'poweroff sb#' at the SC prompt
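
The choice between steps 2a and 2b can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the text passed in is assumed to be saved 'cfgadm -av' output, and the two SAMPLE lines are invented placeholders, not real cfgadm output.

```python
import re


def board_in_perm_lines(cfgadm_output, board):
    """Mimic 'cfgadm -av | grep -i perm' and report whether board is in a matching line."""
    board_re = re.compile(r"\b" + re.escape(board) + r"\b", re.IGNORECASE)
    for line in cfgadm_output.splitlines():
        if "perm" in line.lower() and board_re.search(line):
            return True
    return False


def next_commands(cfgadm_output, board):
    """Return the commands from step 2b (no DR) or step 2a (DR) for the given board."""
    if board_in_perm_lines(cfgadm_output, board):
        # Step 2b: 'init 0' on the domain, then poweroff at the SC prompt.
        return ["init 0", "poweroff " + board.lower()]
    # Step 2a: the board can be disconnected with DR.
    return ["cfgadm -c disconnect N0." + board]


# Invented placeholder lines for illustration; NOT real cfgadm output.
SAMPLE = """\
N0.SB2 placeholder-line-without-the-keyword
N0.SB4 placeholder-line permanent
"""

for name in ("SB2", "SB4"):
    print(name, next_commands(SAMPLE, name))
```

The sketch only selects which commands to issue; it does not run them, and the engineer should still confirm the result against the live 'cfgadm -av' output.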

WHAT ACTION DOES ENGINEER NEED TO TAKE:

You will need to move DIMMs from the 'old' board to the 'new' one (same slots).

1) Perform physical SB replacement per Service Manual

2) Issue 'poweron SB#' at the SC prompt

3) Issue 'showchs -b' at the SC prompt

4) Reset any 'Suspect' or 'Faulty' components to 'ok' from the Main SC or from the lom prompt:
setchs -s OK -r 'SR number' -c <comp>

NOTE: With ScApp 5.20.15 or higher, service mode access IS NO LONGER REQUIRED to execute setchs.
With ScApp earlier than 5.20.15, contact service to obtain a Service Mode password or generate one yourself at https://modepass.us.oracle.com
(a backup server is also available at https://modepass-bak.us.oracle.com)
Repeat the 'setchs' command until all components are 'ok'.

Verify 'showchs -b' is empty.

5) Verify new board firmware matches existing boards & SC(s) ('showboards -p proms' at SC prompt).

If needed, copy firmware from a like board: 'flashupdate -c (source board) (destination board)'

6) Consider running extended POST (On domain issue 'eeprom diag-level=max' or at ok prompt 'setenv diag-level max').

7) If you're replacing a COD (Capacity on Demand) enabled board, refer to
Sun Fire[TM] 12K/15K/E20K/E25K/F3800/Fx800/Ex900 servers: How to replace a COD CPU/memory board (Doc ID 1002102.1)
for the needed steps to follow

8a) If able to DR, issue 'cfgadm -c configure N0.SB#' at domain level

8b) If unable to DR, issue 'setkeyswitch -d (domainID) off' followed by 'setkeyswitch -d (domainID) on'.

9) Monitor POST.
* If new errors are detected, collect the POST output and contact Support.
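
The note under step 4 depends on whether the ScApp version is 5.20.15 or higher. Comparing version strings as plain text can mis-order them, so the sketch below compares them number by number. It is a minimal Python illustration that assumes the version is written as dot-separated integers; the function names are hypothetical and how the version string is obtained is outside its scope.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.20.15' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def service_mode_required(scapp_version):
    """True when the ScApp version is below 5.20.15, per the note under step 4."""
    return parse_version(scapp_version) < (5, 20, 15)


for version in ("5.20.14", "5.20.15", "5.9.30"):
    print(version, service_mode_required(version))
```

A text comparison would rank '5.9.30' above '5.20.15' because '9' sorts after '2'; the tuple comparison treats 9 as smaller than 20 and so reports that service mode is still required.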

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES CUSTOMER NEED TO TAKE TO RETURN SYSTEM TO AN OPERATIONAL STATE:

Boot system if not already booting.

REFERENCE INFORMATION:

Replacement procedures are documented in the Service Manuals:

* Chapter 8, 3800/48x0/6800 Manual http://download.oracle.com/docs/cd/E19095-01/sf3800.srvr/805-7363-15/805-7363-15.pdf

* Chapter 8, E4900/E6900 Manual http://download.oracle.com/docs/cd/E19095-01/sfe4900.srvr/817-4120-13/817-4120-13.pdf


B3. SHOULD DYNAMIC RECONFIGURATION BE USED:

B4. IS OUTAGE REQUIRED AND AGREED TO BY THE CUSTOMER: Yes

B5. NOTICES THAT ENGINEER MUST TAKE INTO ACCOUNT:
ROHS NOTICE: This system has NOT been adequately identified during remote diagnosis for the purposes of RoHS. You must check the system's RoHS compliance by referring to information within FIN 102250, or by verifying with your support centre before commencing service.

B6. ADDITIONAL COMMENTS:

B7. WHAT TROUBLESHOOTING TESTS WERE DONE:


C. GENERAL ACTION PLAN INFORMATION
Action plan for case:
Action plan reference number: 1 (always reference with case id)
Affected product: SUN FIRE E6900
Platform version: N/A
(Please update MOS with correct serial number if necessary)


************* End Hardware Onsite Action Plan *************
***********************************************************


**************** GENERAL INSTRUCTIONS FOR THIS ACTION PLAN ***************
**************************************************************************
Make sure a new explorer [explorer -w all,interactive,scextended] is run and
submitted to proactive.central after installing the new parts. Email explorer to:
explorer-database-americas@sun.com - Americas
explorer-database-emea@sun.com - EMEA (Europe, Middle East, Africa)
explorer-database-apac@sun.com - APAC (Asia, Pacific)
**************************************************************************
***********************************************************************