
Sun Enterprise Server Maintenance

IT-ETC-033

Sun Microsystems LTD Citygate Cross Street Sale Manchester M33 7JF UK

Revision E June 2001, Brian Jackson

Copyright 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303, U.S.A. All rights reserved.

This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, the Sun Logo, SunVTS, OpenBoot, Sun Enterprise, UltraSPARC, Solstice SyMON, Gigaplane, SPARCstorage, RSM, RSM Array, SunFastEthernet, SunFDDI, StorEdge, SunDiag, SunPCI, SunBus, AnswerBook, and OBDiag are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

U.S. Government approval required when exporting the product. RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Govt is subject to restrictions of FAR 52.227-14(g) (2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015 (b)(6/95) and DFAR 227.7202-3(a).

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.


Contents
Introduction to Sun Enterprise Servers
    Additional Resources
    Enterprise Introduction
    Ex000 servers versus Ex500 servers
    Server Specifications
        Sun Enterprise 3000
        Sun Enterprise 3500
        Sun Enterprise 4500
        Sun Enterprise 5500
        Sun Enterprise 6500
    Reliability, Availability, and Serviceability Features
    Reliability
    Availability
    Serviceability
    Scalability
    Concurrent Maintenance Tools
        Dynamic Reconfiguration
        Alternate Pathing
    Monitoring and Administration
        Solstice SyMON
Hardware Component Overview
    The Sun Enterprise 3000 Server
        Specifications
    The Sun Enterprise 3500 Server
        Specifications
    The Sun Enterprise 4000/4500 Server
        Specifications
    The Sun Enterprise 5500 Server
        Specifications
    The Sun Enterprise 6500 Server
        Specifications
    Gigaplane Architecture
        Centerplane Configuration
Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services June 2001, rev E

        Centerplane Numbering Scheme
    PCM/Slot layout
    Performance
    Hot Swap and Hot Plug
    Power Supplies
        Power/Cooling Module (PCM)
        Peripheral Power Supply (PPS)
    Hot Pluggable Boards
        Hot Plug Architecture
    Sun Enterprise Deskside Chassis Designs
    Common and unique components
    Exercise: Component Removal and Replacement
Bus Structures and Types
    UPA Bus Architecture
        The CPU/memory Board and the UPA Bus
        Ultra Port Architecture Features
    SBus Architecture
        SBus Features
    PCI Architecture
        PCI Mechanical Specifications
        PCI Electrical Specifications
        PCI Board connectors
    SCSI Introduction
        Small Computer System Interface Features
        Fast SCSI - Higher Bus Speed
        Wide SCSI - Wider Is Better
        Differential SCSI - Less Interference
        Ultra2 SCSI
        Termination
        Cable quality
        Conclusion
    SCSI implementation on I/O boards
    Fibre Channel Interface
CPU/Memory and Clock Boards
    CPU/Memory+ Board
        CPU Module
        400 MHz, 8MB Ecache Module
        CPU Module Handling Precautions
        Removing and Replacing a CPU Module
    Memory
    Board Status Indicators
    Clock+ Board
        Console Bus
        Clocks
        Reset logic

        TOD/NVRAM
        Serial, keyboard, mouse ports
        JTAG
        Remote console commands
        XIR
        LED Status codes
    Passive Boards
        Filler Panel
        Load Board
I/O Boards
    Types of I/O Boards
        I/O Addressing
    SBus I/O Boards
        SBus I/O Boards - Type 1
        SBus+ I/O Board - Type 4
        SBus I/O Boards - Type 1
        SBus+ I/O Boards - Type 4
    Graphics I/O Boards
        Graphics I/O Board - Type 2
        Graphics+ I/O Board - Type 5
        Graphics I/O Board - Type 2
        Graphics+ I/O Board - Type 5
    PCI I/O Boards
        PCI+ I/O Board - Type 3
    Board Status Indicators
        Enterprise 3500 Fibre Channel Interface Board
        SCSI Disk Board
        SCSI Disk Board Addressing
Open Boot PROM / NVRAM
    Introducing OBP
    Features of OBP
    The OBP User Interface
    System Testing Commands
    Informational Commands
    The Device Tree
    Displaying the Device Tree
        Using the .properties Command
        Using the dev Command
    Listing System Devices
    Displaying Device Aliases
        Device Alias Commands
        nvalias Command
    Open Boot PROM Commands for the NVRAM
        The printenv Command
    General NVRAM parameters

    Platform specific NVRAM parameters
    Environmental monitoring
    NVRAM security
    NVRAMRC editing commands
    Updating Flash PROM and FCode
    Correcting a Faulty Flash PROM
    Synchronizing NVRAM/TOD chips
Power on self test (POST)
    Introducing POST
    Self test overview
    POST control commands
        s-flag
        v-flag
    POST Menus
        option 7
    POST Board status messages
    Sample error messages
    POST error reporting
        show-post-results
    When things go wrong
    Accessing and Displaying POST
        tip session
Internal Disk Subsystems
    Internal Storage Capacities
        The SCSI Disk Board
        The SCSI Disk Board Addressing
    Disk Addressing
        Examples
    Sun Enterprise 3500
        Fibre Channel Interface Board
    Disk Addressing
        probe-fcal-all
        world-wide numbers
    E3500 boot disk replacement
    E3500 data disk replacement
    Sun Enterprise 3000
    I/O Addressing test


Solaris Support Utilities
    How Solaris References System Components
        Logical Device Names
        Physical Device Names
        Instance Names
    Configuring Components in Solaris
        Automatic Device Configuration
    Displaying System Configuration Information
        The prtconf Utility
        The sysdef Utility
        The format Utility
    Displaying Diagnostic Information
        The dmesg Command
        The prtdiag Command
    Setting NVRAM Configuration Parameters From Solaris
        The eeprom Command
SunVTS System Diagnostics
    Introduction
        SunVTS Software Overview
    Test categories
    Hardware and software requirements
    Starting the SunVTS Software
    The SunVTS Graphical Interface
    The SunVTS Window Panels
    The SunVTS Window Icons
    The SunVTS Menu Selections
    The Schedule Options Menu
    The Test Execution Menu
    The Advance Options Menu
    Intervention Mode
    Performance Monitor Panel
    Using SunVTS in TTY Mode
    Negotiating the SunVTS TTY Interface
    Running SunVTS Remotely
        Requirements
        Running SunVTS Through a Remote Login
        Running SunVTS Through telnet or tip
    SunVTS Test Summary
        Advanced Frame Buffer Test
        SunATM Adapter Test
        Audio Test
        Bidirectional Parallel Port Printer Test
        Compact Disc Test
        Frame Buffer, GX, GX+ and TGX Options Test
        Disk and Floppy Drives Test


        ECP 1284 Parallel Port Printer Test
        Sun Enterprise Network Array Test
        StorEdge 1000 Enclosure Test
        Frame Buffer Test
        Fast Frame Buffer Test
        SunVTS Test Summary
        Floating Point Unit Test
        Sun GigabitEthernet Test
        Intelligent Fibre Channel Processor Test
        Dual Basic Rate ISDN (DBRI) Chip
        M64 Video Board Test
        Multiprocessor Test
        Network Hardware Test
        SPARCstorage Array Controller Test
        Physical Memory Test
        Prestoserve Test
        Serial Asynchronous Interface Test
        Sun Enterprise Cluster 2.0 Network Hardware Test
        Environmental Sensing Card Test
        Soc+ Host Adapter Card Test
        Serial Parallel Controller Test
        Serial Ports Test
        SunButtons Test
        SunDials Test
        HSI Board Test
        Sun PCi Test
        System Test
        Tape Drive Test
        Virtual Memory Test
    Test Message Syntax
Alternate Pathing
    Introducing Alternate Pathing
    Supported Devices
        Disk Devices
        Network Devices
    Installing AP
    How AP Works
    Physical paths
    Metadisk
    Disk Pathgroup
    Metanetwork
    AP With Mirroring
    AP and DR
    AP State Database
    Creating the AP State Database


        The apinst Utility
    Creating a disk pathgroup and metadisks
    Using the metadisks
    Placing your boot disk under AP control
    Manually switching the active path
    Automatic disk pathgroup switching (AP2.1)
    Creating a network pathgroup
    Alternately pathing the primary network interface
    Switching a network pathgroup
Dynamic Reconfiguration
    Introducing Dynamic Reconfiguration
        What Is Dynamic Reconfiguration?
        Benefits of DR
        Disadvantages of DR
        Supported Hardware
        DR Limitations
    Displaying Board Status
        Basic Status Display
        Detailed Status Display
    Reconfiguration Considerations
        Device driver interface DDI
        Suspend-Safe and Suspend-Unsafe Devices
        Hot-Plug Hardware
        Permanent memory management
        Required additions to /etc/system
    Dynamic Reconfiguration Procedures
        Removing a CPU/Memory Board
        Installing or Replacing a CPU/Memory Board
        Removing an I/O Board
        Removing Boards that Use Detach-Unsafe Drivers
        Installing a New I/O Board
        Installing a Replacement I/O Board


Introduction to Sun Enterprise Servers

Copyright 2001 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services June 2001, rev E

Additional Resources
- http://docs.sun.com
- Server Rack Installation Manual, Part Number 802-7573
- Sun Enterprise 6500/5500/4500 Systems Installation Guide, Part Number 805-2631
- SPARC Hardware Platform Guide, Part Number 802-5341
- Solstice SyMON User's Guide, Part Number 802-5355
- Sun Enterprise 6x00, 5x00, 4x00, and 3x00 Systems Dynamic Reconfiguration User's Guide, Part Number 806-0280-05
- Sun Enterprise Expansion Cabinet Installation and Service Manual, Part Number 805-4009
- Sun Enterprise 6/5/4/3x00 Systems SIMM Installation Guide, Part Number 802-5032
- SBus+ and Graphics+ I/O Boards (100 MB/sec. Fibre Channels) for Sun Enterprise 6/5/4/3x00 Systems, Part Number 805-2704
- PCI+ I/O Board Installation and Component Replacement for Sun Enterprise 6/5/4/3x00 Systems, Part Number 805-1372
- Sun Enterprise 3500 System Reference Manual, Part Number 805-2630
- Sun Enterprise 6500/5500/4500 System Reference Manual, Part Number 805-2632
- Sun Enterprise Server Alternate Pathing User's Guide, Part Number 805-5444
- Sun Enterprise 6x00/5x00/4x00 Disk Board Installation Guide, Part Number 802-6740
- Sun Enterprise Systems Peripheral Power Supply Installation Guide, Part Number 802-5033
- Sun Enterprise Systems Power/Cooling Module Installation Guide, Part Number 802-6244
- Sun Enterprise 6/5/4/3x00 Systems Board Installation Guide, Part Number 805-4007

Enterprise Introduction
This course introduces you to some new concepts and some new hardware. It is intended to give you an adequate understanding of the enterprise computing environment and of how Sun servers, software, and applications fit into that enterprise. After you have been introduced to the systems and understand their capabilities, you will have an opportunity to take the systems apart and put them back together.

A main goal of this course is to help you understand the enterprise computing environment better so that you can develop an appropriate concurrent maintenance strategy. Troubleshooting a system in the enterprise computing environment is quite different from troubleshooting a desktop. You must understand the function that the system you are working on has in a company's enterprise computing environment, and how critical it is that the system continue to operate while you troubleshoot and repair it. A company can no longer afford to shut down a mission-critical element of its enterprise operation while you perform maintenance on that system.

Sun Microsystems has developed several products that can assist you in performing your tasks with minimal effect on the customer's enterprise computing environment. This course introduces you to those products and tools and shows you how to become proficient with them so that you can work safely on Sun Enterprise servers.

Ex000 Servers versus Ex500 Servers
The original Enterprise servers (the E3000, E4000, E5000, and E6000) have been upgraded, a process the marketing people called a mid-life enhancement. The enhanced servers are called the E3500, E4500, E5500, and E6500.

Note: The key difference is that the Ex000 servers run their interconnect at 83 MHz. The E3500, E4500, and E5500 run their interconnect at 100 MHz using enhanced system boards and centerplane. The E6500 is constrained to a maximum interconnect speed of 90 MHz.

E6000 v E6500
The E6000 is housed in a 56-inch cabinet whilst the E6500 is housed in a 68-inch cabinet. This makes room for an additional A5000 or D1000.

E5000 v E5500
The E5000 is housed in a 56-inch cabinet whilst the E5500 is housed in a 68-inch cabinet. This again makes room for an additional A5000 or D1000.

E4000 v E4500
No major difference, apart from the faster interconnect.

E3000 v E3500
Very different. The E3500 has been totally redesigned. There are too many differences to outline briefly here, but we shall cover them all.

Introduction to Sun Enterprise Servers


Server Specifications - E3000

Figure 1-1 The Sun Enterprise 3000 Cabinet

Main system features and options:

- Deskside chassis
- Enterprise 3000 is a four-slot model
- One CPU/memory+ and one I/O+ board minimum
- 1 to 6 UltraSPARC CPU modules
- 64 Mbytes to 12 Gbytes of RAM
- Up to ten internal SCSI disk drives
- Ultra SCSI-2 CD-ROM32 and 4 mm or 8 mm tape drive

Server Specifications - E3500

Figure 1-2 The Sun Enterprise 3500 Cabinet

Main system features and options:

- Deskside chassis
- Five-slot system (Enterprise 3000 is a four-slot model)
- One CPU/memory+ and one I/O+ board minimum
- 1 to 8 UltraSPARC CPU modules
- 64 Mbytes to 16 Gbytes of RAM
- Up to eight internal FC-AL disk drives
- Ultra SCSI-2 CD-ROM32 and 4 mm or 8 mm tape drive
- Over 6 Tbytes of external storage

Server Specifications - E4500

Figure 1-3 The Sun Enterprise 4500 Cabinet

Main system features and options:

- Desktop chassis
- Eight-slot system, four in front and four in back
- One CPU/Memory+ and one I/O+ board minimum
- 1 to 14 UltraSPARC CPU modules
- 64 Mbytes to 28 Gbytes of RAM
- Up to 33.6 Gbytes of internal storage mounted on four disk boards
- Ultra SCSI-2 CD-ROM32 and 4 mm or 8 mm tape drive
- Over 10 Tbytes of external storage

Server Specifications - E5500

Figure 1-4 The Sun Enterprise 5500 Cabinet

Main system features and options:

- Datacentre cabinet
- An E4500, without cosmetic panels, mounted in a cabinet
- 1 to 14 UltraSPARC CPU modules
- 64 Mbytes to 28 Gbytes of RAM
- Up to 720 Gbytes of internal storage, comprising four disk boards and A5000 or D1000 disk trays
- Ultra SCSI-2 CD-ROM32 and 4 mm or 8 mm tape drive
- Over 10 Tbytes of external storage

Server Specifications - E6500

Figure 1-5 The Sun Enterprise 6500 Cabinet

Main system features and options:

- Datacentre cabinet
- Sixteen-slot system, eight in front and eight in back
- Minimum configuration: one CPU/memory+ and one I/O+ board
- 1 to 30 UltraSPARC CPU modules
- 64 Mbytes to 60 Gbytes of RAM
- Up to 576 Gbytes of internal storage, comprising two disk boards and A5000 or D1000 disk trays
- Ultra SCSI-2 CD-ROM32 and 4 mm or 8 mm tape drive
- Over 20 Tbytes of external storage
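The CPU and memory maxima quoted in the specification lists for the five models follow a simple pattern, which the following Python sketch makes explicit. The slot counts are taken from the lists. The figures of two CPU modules and 4 Gbytes per CPU/memory board are assumptions inferred from the quoted maxima (for example, 30 CPUs and 60 Gbytes across 15 boards on the E6500), and the function names are illustrative only.

```python
# Slot counts come from the specification lists in this module.
SLOTS = {"E3000": 4, "E3500": 5, "E4500": 8, "E5500": 8, "E6500": 16}

CPUS_PER_BOARD = 2      # assumption: two CPU modules per CPU/memory board
GBYTES_PER_BOARD = 4    # assumption: inferred from the quoted memory maxima
MIN_IO_BOARDS = 1       # every configuration needs at least one I/O board


def max_cpu_boards(model: str) -> int:
    """Slots left for CPU/memory boards once the mandatory I/O board is fitted."""
    return SLOTS[model] - MIN_IO_BOARDS


def max_cpus(model: str) -> int:
    """Largest CPU module count for the given model."""
    return max_cpu_boards(model) * CPUS_PER_BOARD


def max_memory_gbytes(model: str) -> int:
    """Largest RAM size, in Gbytes, for the given model."""
    return max_cpu_boards(model) * GBYTES_PER_BOARD


if __name__ == "__main__":
    for model in SLOTS:
        print(model, max_cpus(model), "CPUs,", max_memory_gbytes(model), "Gbytes")
```

Under these assumptions the sketch reproduces the quoted limits for every model, which is a useful cross-check when you are asked whether a proposed configuration is possible.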

Reliability, Availability, and Serviceability Features
Reliability, availability, and serviceability (RAS) is a set of enterprise computing technologies that furnish a high degree of protection for corporate data (reliability), provide near-continuous data access (availability), and incorporate procedures to correct problems with minimal business impact (serviceability). These capabilities are a standard part of traditional monolithic, centralized processing systems.

Many businesses today are moving to network computing, where the flexible, scalable architecture enables them to expand IT systems easily as their needs grow while maintaining a reliable, stable computing environment. Sun Microsystems has become a trusted vendor of safe, innovative network computing solutions by delivering mainframe-class RAS features and capabilities in its commercial computing solutions.

New features that improve data integrity, system reliability, and availability include a simpler system design, improved environmental and hardware monitoring tools, redundant power and cooling, and a hot-plug design for some components. Hot plug means that these system components can be replaced or added while the server is up and running.

Serviceability features include the need for only one tool for disassembly and reassembly (a Phillips screwdriver), identical components across the Sun Enterprise server family, and improved diagnostic utilities.

The focus of the RAS feature set is to warn the operator about problems and to act on their effects. New sensors in the hardware are monitored by the software for just about everything. For example, the software monitors the temperature not only of each board but also of each central processing unit (CPU) module, as well as the state of each fan. There are unique monitoring tools, such as Sun Management Centre, which can display the state of the machine down to board level and which works on a predictive failure model. For example, it gives the system administrator warnings that indicate the likely effects of a detected problem on the system.

Reliability
Sun Enterprise Ex500 systems have many features that improve their reliability, which is defined as their ability to run continuously and correctly. These features demonstrate continuous improvement and Sun's commitment to quality systems. The goal is to minimize the burden on system operators and system administrators.

ECC and Parity Protection

- End-to-end error checking and correction (ECC) protection of data
- Address and control lines are parity protected
- Improved hardware monitors (time-outs and parity)

Enhanced Environmental Monitoring

- Advanced monitoring tools for power supplies, fans, CPU/memory and input/output (I/O) boards, disks, and system temperatures. For example, if a CPU module overheats, the system takes it off-line.

Availability
The following describes some of the availability features of the Sun Enterprise servers:

System Monitoring
System monitoring enhancements improve reliability by directing error messages to other applications that can dynamically alter the system's configuration without stopping or rebooting the system. New capabilities of the power-on self-test (POST) analyze parts and report failures to the automatic reconfiguration software.

Automatic System Reconfiguration (ASR)

ASR uses the POST output to identify and remove failed components from the system's configuration before rebooting the system. Hot-pluggable power supplies and disk drives that have failed can be replaced without any system downtime or reboot, which increases the system's availability.

Redundant Components
This feature provides for an immediate replacement of a failed component. A redundant power supply provides the current necessary for the system to continue to operate if another power supply fails. Large systems have multiple power supplies, each capable of providing power for a specific number of boards (not specific boards or slots). Should two or more power supplies fail, the system's ASR software would reconfigure for fewer boards, reducing the power requirement to match the available power supplies, and the system would continue to operate at reduced capacity until the failed power supplies are replaced.
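The arithmetic behind this behaviour can be sketched in a few lines of Python. This is only an illustration: it assumes each supply powers two boards (the figure this course gives for the 300-watt power/cooling modules), the function names are hypothetical, and it says nothing about which boards ASR would actually choose to remove.

```python
BOARDS_PER_SUPPLY = 2  # assumption: one 300-watt supply powers two boards


def boards_powerable(working_supplies: int,
                     boards_per_supply: int = BOARDS_PER_SUPPLY) -> int:
    """Number of boards the working supplies can power between them."""
    return max(working_supplies, 0) * boards_per_supply


def boards_to_deconfigure(installed_boards: int, working_supplies: int) -> int:
    """Boards that must be dropped so the remaining supplies can cope."""
    return max(installed_boards - boards_powerable(working_supplies), 0)


if __name__ == "__main__":
    # Eight boards with five supplies fitted (four needed plus one redundant).
    for failed in range(3):
        print(failed, "failed:", boards_to_deconfigure(8, 5 - failed), "boards dropped")
```

With eight boards and five supplies, one failure costs nothing, while a second failure leaves power for only six boards, which matches the "two or more power supplies" wording above.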

Serviceability
The following describes some of the serviceability features of the Sun Enterprise servers:

Hot Plug and Hot Swap Components

These components can be replaced while the system is running, which does away with the need for downtime.

Dynamic Reconfiguration
Eliminates the need for a reboot to logically attach a new or replacement board.

Improved Diagnostics
Identify a system component failure more accurately. The tests that run on system components at power-on illuminate status light emitting diodes (LEDs).

Scalability
The modular design allows customers to expand and enhance the system as they require. Because Sun has leveraged the same technology across the entire line of servers, from small (2-4 CPU) workgroup servers to large (up to 30 CPU) enterprise servers, upgrade costs can be kept to a minimum and customers can protect their investments. The following hardware components are the same in workgroup servers and enterprise servers:

- CPU/Memory(+) and I/O(+) boards
- Clock boards
- Power and cooling modules
- Peripheral power supplies (184-watt and 195-watt models)

Concurrent Maintenance Tools
Dynamic Reconfiguration
Dynamic Reconfiguration (DR) is the ability to alter the configuration of a running system by bringing components online or taking them off-line without disrupting system operation or requiring a system reboot. With DR, system boards can be logically and physically included in the system configuration, or logically and physically removed, while the system is running. This is useful in mission-critical environments if a system board has failed and needs to be replaced, or if new system boards need to be added to the system for additional performance and capacity. DR is a critical part of the concurrent maintenance strategy prevalent in the enterprise computing environment.

Alternate Pathing
Alternate Pathing (AP) creates a new layer of device drivers, called meta-disks and meta-networks, which route access to one of two physical device drivers. Applications and operating system components, including the disk management software, use the meta-device name to access the resource. Only the drivers know the actual physical paths. The active path can be switched manually from the primary to the alternate at any time, with no interruption to data traffic. With AP software operating and configured, automatic switch-over to the alternate path occurs if the primary path fails. A manual AP switch back to the primary path is required after service has been completed.

Meta-device definitions are stored in an AP state database that is used early in the boot process. There are usually several copies of this database. You must create the meta-devices yourself; the system does not create them for you automatically.
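The switching rules just described can be modelled with a small Python class. This is a conceptual sketch only, not the AP software or its command interface: the class, its methods, and the device and path names are all hypothetical, chosen to show automatic switch-over on failure and the need for a manual switch back after repair.

```python
class MetaDevice:
    """Toy model of a meta-device with a primary and an alternate path."""

    def __init__(self, name: str, primary: str, alternate: str) -> None:
        self.name = name
        self.paths = (primary, alternate)
        self.active = primary      # callers only ever see self.name
        self.failed = set()

    def switch(self, path: str) -> None:
        """Manual switch to the named path."""
        if path not in self.paths:
            raise ValueError(f"{path} is not a path of {self.name}")
        if path in self.failed:
            raise RuntimeError(f"{path} is marked as failed")
        self.active = path

    def report_failure(self, path: str) -> None:
        """Mark a path as failed; switch over automatically if it was active."""
        if path not in self.paths:
            raise ValueError(f"{path} is not a path of {self.name}")
        self.failed.add(path)
        if path == self.active:
            survivors = [p for p in self.paths if p not in self.failed]
            if not survivors:
                raise RuntimeError(f"no working path left for {self.name}")
            self.active = survivors[0]

    def repair(self, path: str) -> None:
        """Clear the failure. The active path is deliberately left alone."""
        self.failed.discard(path)


if __name__ == "__main__":
    disk = MetaDevice("metadisk0", "path-A", "path-B")
    disk.report_failure("path-A")   # automatic switch-over
    disk.repair("path-A")           # service completed
    print(disk.active)              # still on the alternate path
    disk.switch("path-A")           # manual switch back
    print(disk.active)
```

The point of the model is that `repair` does not move traffic: as in the description above, returning to the primary path is an explicit operator action.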

Monitoring and Administration
Sun Management Centre
Sun Management Centre, formerly known as SyMON, is a comprehensive system monitoring tool for the Sun Enterprise servers. Its graphical user interface (GUI) and intuitive design make it easy to learn and use. Sun Management Centre is a powerful system management solution that dramatically increases RAS by allowing system administrators to monitor and quickly manage large enterprise system configurations. Sun Management Centre addresses the following system management functions:

- Manages thousands of systems
- Supports heterogeneous GUI (Java technology-based)
- Supports full Simple Network Management Protocol (SNMP) connectivity
- Supports active configuration management controls (supports DR)
- Supports historical data storage
- Supports system management capabilities


Hardware Component Overview

The Sun Enterprise 3000 Server
The Enterprise 3000 is a deskside tower enclosure. All the boards plug into the rear of the E3000. There are four slots in the bottom portion of the cabinet for CPU/memory boards and I/O boards. The slots are numbered 1, 3, 5, and 7, from right to left. The clock board is located in the lower right, next to board slot 1. It has its own slot and does not use one of the four slots for the CPU/memory or I/O boards.

A fully loaded E3000 requires two power/cooling modules (PCMs): the first located above slots 1 and 3, the second above slots 5 and 7. A third PCM can be used for redundant power in a fully loaded system. If a third PCM is not used, a fan tray must be installed above the peripheral power supplies to provide cooling.

The peripheral power supply is located in the lower left of the cabinet rear. A spot for a redundant peripheral power supply is located to the right of the first peripheral power supply.

Internal Disk Drives


The E3000 holds up to ten internal hot-plug disk drives. The disks are all driven from the I/O board in slot 1. Disk addressing is covered in chapter 8.
[Figure: Sun Enterprise 3000 rear view, showing three 300-watt PCMs, the four board slots, the clock board, Peripheral Power Supply #1, and Peripheral Power Supply #2 (optional)]

The Sun Enterprise 3000 Server
Specifications

Table 2-1 Sun Enterprise 3000 Server Specifications and Features

Features | System/Board Configuration
Number of Gigaplane slots | Four slots. Minimum configuration requires one I/O and one system board
Number of processors | One to six Superscalar SPARC Version 9, UltraSPARC microprocessor modules
CPU interface | One to six 128-bit Ultra Port Architecture (UPA) slots
Memory | 256 Mbytes to 12 Gbytes
System Interconnect | Gigaplane, 2.68 GB/sec at 83 MHz
Three different power supply systems | Up to three power and cooling modules (PCM) (power supply + fan module) for system and I/O boards. A peripheral power supply (PPS1) for auxiliary power and a peripheral power supply/AC (PPS0)
Internal disk | Up to ten 3.5 inch hot-pluggable SCSI disk drives
Internal tape | 8 mm, 4 mm, and 0.25 inch
CD-ROM | SunCD12 drive standard
Height | 65 cm (25.5 inches)
Width | 43 cm (17.0 inches)
Depth | 60 cm (23.5 inches)

The Sun Enterprise 3500 Server
The Sun Enterprise 3500 is very different from the E3000. There are five slots in the bottom portion of the cabinet for CPU/memory boards and I/O boards. The slots are numbered 1, 3, 5, 7, and 9 from right to left.

The Sun Enterprise 3500 server comes with at least one power/cooling module, located above slots 1 and 3. If a second power/cooling module is required, it fits above slots 5 and 7, to the left of the first power/cooling module. A fan tray above the peripheral power supply is also included in an entry configuration. A third power/cooling module can be used for redundant power in a system with three or more boards. To install the third power/cooling module, the existing fan tray, located to the left of the second power/cooling module, must be removed. The third power/cooling module fits into this slot.

In addition to three power/cooling modules, a second peripheral power supply is required for full N+1 power supply redundancy in a five-board Sun Enterprise 3500 server configuration. The first peripheral power supply is located in the lower left of the cabinet rear. A spot for the second, optional peripheral power supply is located in the lower right of the Sun Enterprise 3500 cabinet front. On the Sun Enterprise 3000, this second peripheral power supply is located in the rear of the system cabinet. It was redesigned and moved to the front of the Sun Enterprise 3500 system cabinet in order to provide space for the additional system slot. The second peripheral power supply on the Sun Enterprise 3500 server is now 195 watts, instead of the 184-watt peripheral power supply used on the Sun Enterprise 3000 server.

Internal FC-AL Drives
The Sun Enterprise 3500 server has two internal disk banks (four disks per disk bank), which support up to eight 9.1-GB FC-AL disks with optional dual-port connections. The number of internal disks supported in the Sun Enterprise 3500 server was reduced in order to provide room for the additional system slot in the rear of the server. The inclusion of the fifth system slot in the back of the cabinet required that the optional second peripheral power supply be redesigned and moved to the front of the cabinet, resulting in less space in the front of the cabinet for disk drives.

The newer drives, however, can be configured to provide better disk availability than that offered by the Sun Enterprise 3000 server. Each of the two disk banks can have one or two FC-AL loops connected to the installed drives, for a total of up to four loops. Dual-loop configurations provide a highly available, redundant hardware configuration. Because the two banks are independent, a full configuration of eight disk drives requires a minimum of two loops: one for each bank of four drives. On the other hand, a minimum configuration requires only one FC-AL connection for up to four disk drives.

The new FC-AL drives in the Sun Enterprise 3500 server still provide the hot-swap capability offered with the internal SCSI drives on the Sun Enterprise 3000 server. Disk addressing is covered in chapter 8.
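The loop-count rule for the internal FC-AL disks can be written as a short Python function. It is a sketch under two assumptions that the text does not state outright: banks are filled one at a time, and a dual-loop configuration puts two loops on every bank in use. The function name is illustrative.

```python
import math

DISKS_PER_BANK = 4   # from the text: four disks per disk bank
BANKS = 2            # from the text: two internal disk banks


def fcal_loops_needed(disks: int, dual_loop: bool = False) -> int:
    """Loops needed for the given number of internal FC-AL disks.

    Assumes banks are filled in order and, when dual_loop is set,
    that each bank in use gets two loops.
    """
    if not 0 <= disks <= DISKS_PER_BANK * BANKS:
        raise ValueError("the E3500 holds between 0 and 8 internal disks")
    banks_in_use = math.ceil(disks / DISKS_PER_BANK)
    return banks_in_use * (2 if dual_loop else 1)


if __name__ == "__main__":
    for disks in (4, 8):
        print(disks, "disks:", fcal_loops_needed(disks), "loop(s),",
              fcal_loops_needed(disks, dual_loop=True), "with dual loops")
```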
[Figure: Sun Enterprise 3500 front and rear views, showing three 300-watt power/cooling modules, 32X CD-ROM, tape drive, key switch, internal FC-AL disks, second peripheral power supply, peripheral power supply with AC inlet, FC-AL interface board, five board slots (9, 7, 5, 3, 1), clock board, and fan tray]

The Sun Enterprise 3500 Server
Specifications

Table 2-2 Sun Enterprise 3500 Server Specifications and Features

Features | System/Board Configuration
Number of Gigaplane slots | Five slots. Minimum configuration requires one I/O and one system board
Number of processors | One to eight Superscalar SPARC Version 9, UltraSPARC microprocessor modules
CPU interface | One to eight 128-bit Ultra Port Architecture (UPA) slots
Memory | 256 Mbytes to 16 Gbytes
System Interconnect | Gigaplane, 2.68 GB/sec at 83 MHz, 3.2 GB/sec at 100 MHz
Three different power supply systems | Up to three power and cooling modules (PCM) (power supply + fan module) for system and I/O boards. A peripheral power supply (PPS1) for auxiliary power and a peripheral power supply/AC (PPS0)
Internal disk | Up to eight 3.5 inch hot-swappable FC-AL disk drives with dual porting
Internal tape | 8 mm, 4 mm, and 0.25 inch
CD-ROM | SunCD32 drive standard
Height | 65 cm (25.5 inches)
Width | 43 cm (17.0 inches)
Depth | 60 cm (23.5 inches)

The Sun Enterprise 4000/4500 Server
A compact mid-range server with tremendous computing power, this server nearly doubles the expandability of the Sun Enterprise 3500 server. You can install up to fourteen UltraSPARC II processor modules in a single chassis, with four CPU/memory boards in the front and three CPU/memory boards in the back. You can install up to four Sun Enterprise 4500 servers in a single data center cabinet. When properly configured, each Enterprise 4500 system can support over 4 terabytes of disk storage.

[Figure: Sun Enterprise 4500 front and rear views, showing the 32X CD-ROM drive, tape drive (optional), key switch, power/cooling modules, peripheral power supply, clock board, and CPU/memory, I/O, and disk boards]

The Enterprise 4500, like the Sun Enterprise 5500 and 6500 servers, uses a horizontal card cage.

The Sun Enterprise 4500 Server
Specifications

Table 2-3 Sun Enterprise 4500 Server Specifications and Features

Features | System/Board Configuration
Number of Gigaplane slots | Eight slots. Minimum configuration requires one I/O and one system board
Number of processors | Two to 14 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface | One to 14 128-bit Ultra Port Architecture (UPA) slots
Memory | 256 Mbytes to 28 Gbytes
System Interconnect | Gigaplane, 2.68 GB/sec (E4000 at 83 MHz), 3.2 GB/sec (at 100 MHz)
Two different power supply modules used | Up to four PCM (300-watt power supply + fan module) for system and I/O boards. One PPS1 (184-watt peripheral power supply) for auxiliary power
Internal disk | Up to eight 9.1 GByte disk drives on up to four disk boards
Internal tape | 8 mm, 4 mm, and 0.25 inch
CD-ROM | SunCD32 drive standard
Height | 34 cm (13.5 inches)
Width | 50 cm (19.7 inches)
Depth | 56 cm (22 inches)

The Sun Enterprise 5500 Server
The Sun Enterprise 5500 is a 68-inch data center cabinet with an 8-slot E4500 card cage mounted inside. The data center cabinet provides power distribution and cooling for the system and up to one half terabyte of disk space. Each Enterprise 5500 data center rack can accommodate up to four Sun StorEdge A5000 disk subsystems. The Sun Enterprise 5000 server can accommodate up to six removable storage modules (RSMs). When configured with the proper features and options, the system can support over six terabytes of disk space, although this requires additional disk expansion racks.

[Figure: Sun Enterprise 5500 front and rear views, showing the 32X CD-ROM drive, tape drive (optional), key switch, power/cooling modules, peripheral power supply, clock board, CPU/memory, I/O, and disk board slots, cabinet fan tray, Sun StorEdge Library FlexiPack trays or hub trays, Sun StorEdge A5000 and Sun StorEdge D1000 arrays, power sequencer, and optional second power sequencer]

Note: You can have A5000s or D1000s

The Sun Enterprise 5500 Server
Specifications

Table 2-4 Sun Enterprise 5500 Server Specifications and Features

Features | System/Board Configuration
Number of Gigaplane slots | Eight slots. Minimum configuration requires one I/O and one system board
Number of processors | Two to 14 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface | Up to 14 128-bit Ultra Port Architecture (UPA) slots
Memory | 256 Mbytes to 28 Gbytes
System Interconnect | Gigaplane, 2.68 GB/sec (E5000 at 83 MHz), 3.2 GB/sec (at 100 MHz)
Two different power supply modules used | Up to four PCM (300-watt power supply + fan module) for system and I/O boards. A PPS1 (184-watt peripheral power supply) for auxiliary power
Internal disk | Up to eight 9.1 GByte disk drives on up to four disk boards
A5200 option | Up to four subassemblies for over 1 TByte of storage
Internal tape | 8 mm, 4 mm, and 0.25 inch
CD-ROM | SunCD32 drive standard
Height | 173 cm (68.3 inches)
Width | 77 cm (30 inches)
Depth | 99 cm (39 inches)

The Sun Enterprise 6500 Server
The Sun Enterprise 6500 server is a 68-inch data center cabinet with a 16-slot card cage: eight board slots in the front and eight in the back. The E6000 holds one fewer storage array, because it is housed in a 56-inch cabinet.

[Figure: Sun Enterprise 6500 front and rear views, showing the CD-ROM drive, tape drive (optional), key switch, power/cooling modules, peripheral power supply, clock board, CPU/memory and I/O board slots, cabinet fan tray, Sun StorEdge Library FlexiPack trays or hub trays, Sun StorEdge A5000 and Sun StorEdge D1000 arrays, power sequencer, and optional second power sequencer]

Note: You can have A5000s or D1000s

The Sun Enterprise 6500 Server
Specifications

Table 2-5 Sun Enterprise 6500 Specifications and Features

Features | System/Board Configuration
Number of Gigaplane slots | 16 slots. Minimum configuration requires one I/O and one system board
Number of processors | Two to 30 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface | Up to 30 128-bit Ultra Port Architecture (UPA) slots
Memory | 256 Mbytes to 60 Gbytes
System Interconnect | Gigaplane, 2.68 GB/sec at 84 MHz
Two different power supply modules used | Up to eight PCM (300-watt power supply + fan module) for system and I/O boards. A PPS1 (184-watt peripheral power supply) for auxiliary power
Internal disk | Up to four 18.2 GByte disk drives on two disk boards (slots 14 and 15 only)
A5200 option | Up to three subassemblies for over 760 GByte of storage
Internal tape | 8 mm, 4 mm, and 0.25 inch
CD-ROM | SunCD32 drive standard
Height | 6500: 173 cm (68.3 inches); 6000: 141 cm (56 inches)
Width | 77 cm (30 inches)
Depth | 99 cm (39 inches)

Gigaplane Architecture
Ultra Port Architecture (UPA)
The Gigaplane interconnect is based around the Sun4u (UPA) architecture. Each board on the Gigaplane is assigned two UPA port numbers, which the system uses to derive addressing information that is passed to the Solaris kernel.

Board Layout

- CPU/memory boards are usually in even-numbered slots in the front (component side down) of E4500, E5500, and E6500 systems.
- I/O boards are usually in odd-numbered slots in the back.
  - I/O boards are in the back because of the interface ports and connected cables.

Note: You can install any CPU/Memory board in any slot, front or back, and you can install any I/O board in any slot, front or back. You must install an I/O board in slot 1 to drive the internal CD-ROM and tape unit. The clock board has its own special slot, which is numbered slot 16 in all the systems.
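The placement rules in this note, together with the minimum configuration of one I/O board and one CPU/memory board, can be expressed as a simple checker. The Python sketch below is illustrative only: the function name and the `"cpu"` and `"io"` labels are hypothetical, and it checks nothing beyond the rules stated here.

```python
CLOCK_BOARD_SLOT = 16   # from the text: the clock board's own slot


def check_configuration(boards: dict) -> list:
    """Return a list of problems for a {slot number: "cpu" or "io"} layout."""
    problems = []
    kinds = set(boards.values())
    if "cpu" not in kinds:
        problems.append("no CPU/memory board installed")
    if "io" not in kinds:
        problems.append("no I/O board installed")
    if boards.get(1) != "io":
        problems.append(
            "slot 1 needs an I/O board to drive the internal CD-ROM and tape unit")
    if CLOCK_BOARD_SLOT in boards:
        problems.append("slot 16 is reserved for the clock board")
    return problems


if __name__ == "__main__":
    print(check_configuration({1: "io", 3: "cpu"}))   # valid minimum layout
    print(check_configuration({1: "cpu", 3: "io"}))   # I/O board in the wrong slot
```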

Packet Switched Bus
- 256-bit data width (plus error correction)
- Out-of-order completion
  - A centerplane transaction does not tie up the bus. Because bus data is sent in packets, up to 112 transactions can be waiting for completion. Because there are no unused cycles when different boards access the centerplane, the sustained bandwidth is 97 percent of the maximum.
- Pipeline transactions
  - Up to 7 outstanding transactions from each processor
  - Up to 7 outstanding transactions from each board on the Gigaplane
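The figure of 112 waiting transactions is consistent with 7 outstanding transactions from each of 16 boards, the slot count of the largest system. That reading is an inference rather than something the text states, and the short Python sketch below simply shows the arithmetic; the function name is illustrative.

```python
TRANSACTIONS_PER_BOARD = 7   # from the text: up to 7 outstanding per board


def max_outstanding_transactions(board_slots: int) -> int:
    """Upper limit on transactions waiting for completion on the Gigaplane."""
    return board_slots * TRANSACTIONS_PER_BOARD


if __name__ == "__main__":
    # Slot counts for the 16-slot, 8-slot, and 5-slot systems.
    for slots in (16, 8, 5):
        print(slots, "slots:", max_outstanding_transactions(slots))
```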

Gigaplane Speed
- Sun Enterprise x000 systems use a clock speed of 83 MHz
  - 83 MHz provides for up to 2.6 Gbytes per second of bandwidth
- Sun Enterprise x500 systems use a clock speed of 100 MHz
  - 100 MHz provides for up to 3.2 Gbytes per second of bandwidth

Note: You can install a 100 MHz board in an 83 MHz system and it should operate properly, although the board will run at only 83 MHz. Installing an 83 MHz board in a 100 MHz system, however, reduces the Gigaplane speed to 83 MHz. The 100 MHz boards are identified by a plus (+) sign in their product name.
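The bandwidth figures follow from the 256-bit data width: 32 bytes move on every clock. The Python sketch below works through that arithmetic and the mixed-speed rule from the note. It is an illustration under stated assumptions: the function names are hypothetical, the 97 percent sustained fraction is the figure quoted for the packet-switched bus, and boards are judged solely by whether their name contains a plus sign.

```python
DATA_WIDTH_BITS = 256        # from the text: 256-bit data width
SUSTAINED_FRACTION = 0.97    # from the text: 97 percent of the maximum


def peak_bandwidth(clock_hz: int) -> int:
    """Peak Gigaplane bandwidth in bytes per second."""
    return DATA_WIDTH_BITS // 8 * clock_hz


def sustained_bandwidth(clock_hz: int) -> float:
    """Sustained bandwidth in bytes per second."""
    return peak_bandwidth(clock_hz) * SUSTAINED_FRACTION


def gigaplane_clock_mhz(board_names, system_mhz: int = 100) -> int:
    """Clock the Gigaplane settles on for a given set of boards.

    Assumption: any board without "+" in its name is an 83 MHz board.
    """
    board_limit = 100 if all("+" in name for name in board_names) else 83
    return min(system_mhz, board_limit)


if __name__ == "__main__":
    for mhz in (83, 100):
        print(mhz, "MHz:", peak_bandwidth(mhz * 1_000_000) / 1e9, "Gbytes/sec peak")
    print(gigaplane_clock_mhz(["CPU/Memory+", "I/O"]), "MHz with one 83 MHz board")
```

At exactly 83 MHz the peak works out to 2.656 Gbytes per second, which sits between the 2.6 and 2.68 figures quoted in this module.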

Centerplane Configuration
The centerplane is a backplane with more connections to the bus for the same linear space.

[Figure: centerplane configuration, showing CPU/memory and I/O boards plugged into the system front and the system rear of the centerplane, the clock board, the data bus, and the address bus]

It does not matter which type of board plugs into which side or slot, with the exception of slot 1, which we will talk about later. The main consideration is that you want the boards as close to one another as possible to reduce noise and latency. You should place boards with external cabling in the back of the system. The next page gives a layout of the UPA port numbers assigned to each Gigaplane slot. We have included the SCSI assignments for the slots; we will cover these later in the course.

Centreplane Slot Assignment

E3000 PCM and Slot Layout

Note: If you do not have PS5 in place, you will need to fit a fan tray in its place to provide cooling for the PPSs.

E3500 PCM and Slot Layout

Note: If you do not have PS5 in place, you will need to fit a fan tray in its place to provide cooling for the PPSs.

E4500 - 6500 PCM and Slot Layout

Performance
Memory performance
Memory performance is improved by:

- 512 bits plus ECC (error correction code) = 576 bits transferred per CPU clock cycle
- Cache-to-cache transfers, with the same-line buffer reducing latency and processor intervention
- High memory bandwidth
  - 500 Mbytes per second per bank (600 Mbytes per second for 2 banks on one board)
  - Up to 16-way interleaved memory
- Address and data packets, 2 cycles each, so contention delay is small

I/O performance
I/O performance is improved by:

- Multiple I/O boards for greater bandwidth
- Efficient interrupt processing
  - Interrupt packets carry data, and interrupts route to any CPU
- Two SBus controllers, three SBus slots per SBus I/O+ board
  - 64-bit, 25 MHz, 64-byte bursts
  - 100 Mbytes per second direct memory access (DMA) read, 120 Mbytes per second DMA write for each SBus
  - Double-buffered streaming buffers for read-ahead, write-behind
- Graphics I/O card replaces one SBus and slot with a UPA bus and graphics adapter slot. Other components and ports are the same.
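The SBus figures can be put in context with a little arithmetic, sketched here in Python. A 64-bit bus at 25 MHz has a theoretical peak of 200 Mbytes per second, against the quoted DMA rates of 100 (read) and 120 (write). The aggregate figures assume the buses never contend with one another and that nothing upstream limits them, which is an idealization; a Mbyte is taken as one million bytes, and the function names are illustrative.

```python
SBUS_WIDTH_BITS = 64        # from the text
SBUS_CLOCK_HZ = 25_000_000  # from the text
SBUSES_PER_BOARD = 2        # from the text: two SBus controllers per board
DMA_READ_MBYTES = 100       # quoted DMA read rate for each SBus
DMA_WRITE_MBYTES = 120      # quoted DMA write rate for each SBus


def sbus_peak_mbytes() -> int:
    """Theoretical peak of one SBus in Mbytes per second."""
    return SBUS_WIDTH_BITS // 8 * SBUS_CLOCK_HZ // 1_000_000


def aggregate_dma_mbytes(io_boards: int) -> tuple:
    """Idealized (read, write) DMA totals across all SBus I/O boards."""
    buses = io_boards * SBUSES_PER_BOARD
    return buses * DMA_READ_MBYTES, buses * DMA_WRITE_MBYTES


if __name__ == "__main__":
    print("peak per SBus:", sbus_peak_mbytes(), "Mbytes/sec")
    for boards in (1, 3):
        print(boards, "I/O board(s):", aggregate_dma_mbytes(boards))
```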

Hot Swap and Hot Plug Devices
Be aware of the difference between hot swap and hot plug:

Hot Swap
The unit automatically detaches from the system software. Examples are:

- PCMs
- PPSs

Hot Plug
The unit has to be manually detached from the system software. Examples are:

- Disk drives
- CPU/Memory boards
- I/O boards
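One way to keep the distinction straight is as a lookup that answers a single question before you pull a unit: does it need to be detached from the system software by hand first? The Python sketch below encodes only the examples listed here; the names are illustrative and the sets are not a complete component inventory.

```python
HOT_SWAP = {"PCM", "PPS"}                                     # detach automatically
HOT_PLUG = {"disk drive", "CPU/memory board", "I/O board"}    # detach manually


def needs_manual_detach(component: str) -> bool:
    """True if the unit must be detached from the system software by hand."""
    if component in HOT_SWAP:
        return False
    if component in HOT_PLUG:
        return True
    raise KeyError(f"{component!r} is not one of the listed examples")


if __name__ == "__main__":
    for unit in ("PCM", "I/O board"):
        print(unit, "-> manual detach needed:", needs_manual_detach(unit))
```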

Power Supplies
Power/Cooling Module (PCM)

AC Input: 100-240V AC @ 5.5A
DC Output: Maximum Continuous 300 watts

  +3.3V   51A
  +5V     32A
  +2.0V   5A

PCM power supplies are used in Enterprise 3x00, 4x00, 5x00, and 6x00 systems. There must be a 300W PCM for every two adjacent boards in the system, because the fans inside the PCM are the only cooling for those boards. This means that if a board is added to the system, there must be an associated PCM. If one is not present, it must be added. Each 300W PCM supplies enough power for two boards, although in a fully loaded configuration one supply can be lost and there will still be enough power for the remaining boards (N+1).

The PCMs:
- Are hot pluggable
- Supply cooling for two adjacent boards
- Operate in redundant current share mode (N+1)
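The rule of one PCM for every two adjacent boards can be sketched as a small check. This is only an illustrative sketch: it assumes, hypothetically, that board slots are numbered from 0 and that slots 2n and 2n+1 form the adjacent pair served by PCM position n. The real slot-to-PCM mapping must be taken from the chassis layout diagram for the system in question.

```python
def required_pcms(occupied_slots):
    """Return the set of PCM positions needed to cool the given board slots.

    Assumption (hypothetical numbering, not from the course text):
    slots 2n and 2n+1 are the adjacent pair served by PCM position n.
    """
    return {slot // 2 for slot in occupied_slots}


def missing_pcms(occupied_slots, installed_pcms):
    """Return the PCM positions that still have to be added, in order."""
    return sorted(required_pcms(occupied_slots) - set(installed_pcms))


# Boards in slots 0, 1 and 4, with PCMs fitted in positions 0 and 1.
print(missing_pcms([0, 1, 4], [0, 1]))
```

A non-empty result means a board has been added without its associated PCM.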

Peripheral Power Supply (PPS)
The PPS is used in Enterprise systems to power internal SCSI devices (CD, tape, and disks), in addition to the devices below. There are two types of PPS: one with an AC input, which is specific to the E3x00 systems, and one without an AC input, which is common to all the servers.

Backup PPS
You will find one PPS per 4x00, 5x00, or 6x00 system, and one or two PPSs in the E3x00. This is because the PPS in the 4x00, 5x00, and 6x00 systems powers the CD-ROM and tape only, whilst the PPS in an E3x00 also powers the internal disks. Losing a PPS in an E3x00 brings the system down, hence the backup. The PPS provides the following:
- +5Vdc and +12Vdc peripheral tray power
- +5Vdc and +12Vdc drive precharge (nonredundant)
- +3.3Vdc and +5Vdc system precharge (nonredundant)
- +5Vdc redundant system power
- +12Vdc redundant power for PCM fans
- +12Vdc redundant power for E3000/E3500 Auxiliary Fan Module
- +5Vdc auxiliary power for Clock Board remote console serial port
- E4000/E4500 Keyswitch Assembly fan power
- E5000/E5500 and E6000/E6500 AC Input Box fan power

Internal Disk Board


It is the PCM, not the PPS, that powers the disk board.


184 Watt PPS, used in the E4x00, E5x00 and E6X00. Used as a backup PPS in an E3000. Part number 300-1301

184 Watt PPS with AC Input, used as a main PPS in an E3000. Part number 300-1307


195 Watt PPS, used as a backup PPS in an E3500. Part number 300-1358

300-1358
AC Input: 100-240V AC @ 3A
DC Output: Maximum Continuous 195 Watts

  +5V     20A
  +5V     5A
  +12.0V  13A
  -12.0V  1.5A
  +14V    1A

300-1301/1307
AC Input: 100-240V AC @ 3A
DC Output: Maximum Continuous 184 Watts

  +5V     20A
  +5V     5A
  +12.0V  13A
  +12.0V  1.5A

PCM and PPS Status Lights
Status LED Codes

Green   Yellow   Description
Off     Off      No AC Input
On      Off      Normal Operation
On      On       Fan Failure
Off     On       DC Output Failure
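The status codes lend themselves to a simple lookup. The sketch below reads the table column by column (green states, then yellow states, then descriptions); the function name and the boolean representation of the LEDs are illustrative assumptions, not part of any Sun tool.

```python
# (green_on, yellow_on) -> meaning, as listed in the PCM/PPS status table.
LED_STATUS = {
    (False, False): "No AC Input",
    (True, False): "Normal Operation",
    (True, True): "Fan Failure",
    (False, True): "DC Output Failure",
}


def decode_leds(green_on, yellow_on):
    """Translate the observed LED states into the documented description."""
    return LED_STATUS[(bool(green_on), bool(yellow_on))]


print(decode_leds(green_on=True, yellow_on=True))
```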

Hot Pluggable Boards
Hot Plug Architecture
The CPU/Memory+ boards and the I/O+ boards are hot pluggable under certain conditions. You can remove a system board only if its amber light is the only light that is on, and even then there are checks to be made to ensure the board may be removed.

In the middle of the centerplane connector are three large pins that are larger and longer than the Gigaplane connector pins.

These connectors provide for connection to the power bus before data and address pins make contact in the Gigaplane connectors. Each of the power connectors is a different length, which provides for a sequential connection process. The first pin to make contact when a board is plugged into the card slot is the ground pin. Next is the precharge voltage connection. This applies a low voltage to the logic and prepares the logic for full voltage with less current drain at contact than would be required if the precharge was not provided. This eliminates the power surge that would otherwise corrupt data and address lines and cause systems to halt when boards are inserted.

Warning: The precharge voltage is provided by the PPS. Ensure the precharge is available before attempting a hot-plug.

# /usr/platform/sun4u/sbin/prtdiag -v | grep precharge
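The same precharge check can be scripted where a Python 3.7+ interpreter happens to be available on the host. The sketch below only filters the text for lines containing "precharge", exactly as the grep does; it makes no assumption about what the matching lines look like, and the helper names are illustrative.

```python
import subprocess

# Path as given in the warning above.
PRTDIAG = "/usr/platform/sun4u/sbin/prtdiag"


def precharge_lines(prtdiag_output):
    """Return the lines of prtdiag output that mention precharge."""
    return [line for line in prtdiag_output.splitlines()
            if "precharge" in line]


def read_prtdiag():
    """Run prtdiag -v and return its text output (sun4u Solaris hosts only)."""
    result = subprocess.run([PRTDIAG, "-v"], capture_output=True, text=True)
    return result.stdout
```

On the server itself, `precharge_lines(read_prtdiag())` returns the lines to inspect before starting the hot-plug; an empty list means prtdiag reported nothing about precharge and the operation should not proceed.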

Trigger Pin
Just before the data and address pins in the Gigaplane are connected, a logic pin called the trigger pin makes connection. This informs the clock board to suspend activity on the Gigaplane for 200 ms whilst the board insertion completes.

Caution: You cannot hot-plug the clock board, for two reasons. First, there are no precharge connections for the slot. Second, it is the clock board that controls board insertion.

Common and Unique components
Common Components
One of the features of the Exx00 range is the commonality between major components. Some common components include:
- CPU/Memory boards
- I/O boards
- CPU Modules
- Memory
- PCMs
- PPSs
- Clock Boards
- Disk Boards

Unique Components
Some unique components are:
- AC Input units
- Media bays
- Load boards
- E3500, IB boards
- E3500, auxiliary PPS
- E3500, FC-AL drives
- E6500 load boards

Cooling Considerations
Filler Panel
The filler panel shown below directs airflow inside the card cage and helps shield electromagnetic interference (EMI) type emissions from interfering with normal operations.

[Figure: filler panel, showing the springfingers]

Caution: Empty slots in Enterprise 4X00 and 5X00 systems must have a filler panel installed. Whenever you remove a board and do not immediately replace it, you must install a filler panel.

Cooling & Loading Considerations
Load Board
The load board shown below does the same tasks as the filler panel, but it also helps maintain a constant load on the power supply system, reducing the occurrences of voltage spikes. Whenever you remove a system board in an E6x00 and it is not immediately replaced, you must install a load board in its place.

[Figure: load board, showing the centerplane connector and the springfingers]

Caution: Load boards are used only in Enterprise 6X00 systems. All slots in Enterprise 6X00 systems that do not contain system boards must have a load board installed.

Exercise: Component Removal and Replacement
Sun Enterprise 4500, 5500, and 6500 Systems FRU Removal Procedures

Caution: Before beginning any procedure to remove static-sensitive components from any Sun Enterprise server, attach an approved ESD wrist strap to your wrist and connect the other end to the system chassis. Connect the ESD mat provided to the same chassis and verify that it is properly grounded before proceeding. Always place removed system components on the ESD mat provided.

Removing the Power and Cooling Modules

Note: Remember the following rules for hot-plug replacement of a PCM:
- The peripheral power supply must be operational (to provide precharge current).
- Hot-plugging requires adequate redundancy of electrical power, or an overload condition might occur when a power supply is removed.
- Use the prtdiag command to determine if precharge current is present before removing or installing a hot pluggable power supply.

1. Use a #1 Phillips screwdriver to turn each quarter-turn access screw on the power supply to the unlocked position.

2. Pull the end of the extraction lever outward to release the power supply from the centerplane.

Figure 2-1 Extracting a Power and Cooling Module (front and rear views; extraction levers toward near side)

3. Slide the power and cooling module out of the chassis.

Removing the Peripheral Power Supplies
1. Use a Phillips #1 screwdriver to unlock the quarter-turn access slots on the power supply.
2. Pull the ends of the extraction levers outward to release the power supply from the centerplane.

Figure 2-2 E5500/6500 PPS Removal

Figure 2-3 E3500 PPS/AC Removal

Removing the Auxiliary Peripheral Power Supply 1 (PPS1) From the E3500
1. Release the power supply from the system chassis by loosening the captive screws.
2. Pull the ends of the extraction levers outward to release the power supply from the centerplane.
3. Pull the power supply straight out.

Figure 2-4 E3500 Auxiliary Peripheral Power Supply 1 Removal

Removing the Removable Media Tray
E3500/4500
1. Remove the front bezel.
   a. Grasp the front bezel on both sides near the center.
   b. Place your thumbs on top of the front bezel and place your other fingers at the slight indentations under the front bezel for leverage.
   c. Pull the front bezel straight out toward you and set it aside.
2. Loosen the bottom two captive screws that secure the media tray to the chassis tray.

Figure 2-5 E3500/E4500 Media Tray Removal

3. Use a screwdriver in the notch at the bottom center of the tray to assist in separating the media tray from the rear slip connectors, and pull out the tray.

E5500/6500
1. Remove the left side panel.
2. Release the device enclosure from the media tray by removing three screws on the left side of the media tray.

3. Pull the device enclosure forward and disconnect the data and power cables from the rear of each device.
4. After the cabling is removed, remove the device enclosure from the media tray.

Figure 2-6 E5500/6500 Media Tray Removal


Bus Structures and Types

UPA Bus Architecture
The CPU/memory Board and the UPA Bus
The figure below shows the relationship between the CPU modules and the system board. The area within the shaded box is supported by the UPA bus.

[Figure: CPU/Memory board block diagram with the UPA bus area shaded]

The table below shows you the bus widths for different system functions.

UPA and Gigaplane bus widths

UPA bus
  Processor:  128 data + 16 ECC
  SYSIO:      64 data + 8 ECC
  FFB:        64 data
  41 address
Gigaplane bus
  256 data + 32 ECC

Ultra Port Architecture Features
The Ultra Port Architecture (UPA) supports the high-performance UltraSPARC design. Sun Microsystems created this new component interconnect bus to optimize data transfers between devices and system boards. Designed specifically for multitasking, multiprocessing environments, the UPA interconnect handles multiple simultaneous requests for data transfers between processors, memory, and I/O devices. UPA features include:
- Packet-switched bus
- High speed (1.6 Gbytes/second)
- High bandwidth
- Direct CPU to memory without crossbar switching
- Improved 3D graphics acceleration

This new high-performance architecture has a processor-to-memory interconnect using the UPA bus. The UPA bus runs at one-half the CPU clock rate because it is twice as wide. This enables the CPU to load each half of the bus's data before the next bus cycle.

To increase the data flow between the CPU and other subsystems, the UPA uses crossbar packet switching. Packets from various subsystems, such as memory, graphics, and I/O devices, can be multiplexed. This allows multiple transactions to occur seemingly simultaneously, with peak transfers in excess of 1.6 Gbytes per second.
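The 1.6 Gbytes per second figure is consistent with simple width-times-clock arithmetic. The sketch below uses the 128-bit processor data width from the bus-width table earlier in this chapter; the 100 MHz bus clock is an assumption made for illustration, since this section does not state the UPA clock rate.

```python
def peak_bandwidth_bytes_per_s(data_bits, clock_hz):
    """Peak rate of a bus that moves data_bits on every clock cycle."""
    return (data_bits // 8) * clock_hz


UPA_DATA_BITS = 128                  # processor UPA data width (ECC excluded)
ASSUMED_UPA_CLOCK_HZ = 100_000_000   # assumption: 100 MHz bus clock

peak = peak_bandwidth_bytes_per_s(UPA_DATA_BITS, ASSUMED_UPA_CLOCK_HZ)
print(f"{peak / 1e9:.1f} Gbytes per second")
```

With a slower bus clock the same function gives a proportionally lower peak.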

SBus Architecture
SBus Features
Sun Microsystems designed the SunBus (SBus) to provide the SPARC products with a high-performance, space-efficient, and cost-effective system bus. The 25 MHz 32-bit data and address SBus specifications have been adopted by the Institute of Electrical and Electronic Engineers (IEEE) and are available to third-party developers.

SBus provides for device autoconfiguration. Installing SBus expansion boards is easy because of an EPROM containing machine-independent Forth code that describes the board's function and contains a POST that is compatible with Sun systems' POST commands. The system retrieves configuration information from the expansion boards at power-up, thereby identifying and initializing all devices connected on the SBus.

PCI Bus Architecture
PCI Mechanical Specifications
PCI boards have two basic form factors, standard or long length (312 mm) and short length (119-167 mm). Board edge connectors are keyed for 3.3V signaling, 5V signaling, or universal signaling. Universal boards are designed to fit in 3.3V or 5V connectors.

The 32-bit, 124-pin PCI connector has 120 signal pins and 4 key pins. The 32-bit connector defines the system signaling as 3.3V or 5V. An optional 64-bit extension is built into the same connector molding, extending the number of pins to 184.

A 32-bit PCI board can be installed in either a 32-bit or 64-bit connector, and identifies itself for 32-bit transfers in both cases. A 64-bit PCI board identifies itself for 32-bit transfers when it is installed in a 32-bit connector, and for 64-bit transfers when it is installed in a 64-bit connector. The signals that enable 64-bit operation are REQ64 and ACK64. They are Side A Pin-60 and Side B Pin-60 of the 32-bit connector.

PCI Electrical Specifications
The PCI specification provides for 3.3V and 5V signaling. Signaling is determined by the motherboard. Signaling for a 3.3V PCI board is at 3.3V. Signaling for a 5V PCI board is at 5V. Signaling for a universal PCI board is at 3.3V or 5V.

All PCI connectors require four power rails: +3.3V, +5V, +12V, and -12V. The distinction between 3.3V and 5V PCI boards is in the signaling protocol, not the connector power rails. The maximum power allowed for a PCI board is 25 Watts from all four power rails combined.

PCI Board Connections
PCI Boards are shown with the solder side up because this is the orientation in many PCI systems.

SCSI SBus card
You will find a number of SCSI connections within the Exx00 servers.

[Figure: Single-Ended Fast/Wide (SunSwift), part number 501-2739]

There are SBus SCSI cards and PCI SCSI cards, and each I/O board has an on-board SCSI port which is driven by a FEPS chip on the board.

SCSI PCI card

[Figure: Single-Ended Ultra/Wide (SunSwift PCI), part number 501-2741]

This is a PCI SCSI card. Note the driver chip. This card will provide an Ultra-SCSI bus.

SCSI Features - Fast SCSI
Small Computer System Interface Features
The Small Computer System Interface (SCSI)-1 standard defines two modes of data transfer: asynchronous (handshaking) and synchronous (streamed) mode. SCSI-1 synchronous transfer rates are limited to 5 Mbytes per second. In many environments this is acceptable. But in configurations with multiple high-performance devices on the bus, 5 Mbytes per second can make the bus a bottleneck.

Besides a better-defined set of required features, the SCSI-2 standard defines several optional features that have an impact on users: Fast, Wide, differential, and tagged queueing. A specific implementation can be SCSI-2-compliant, yet implement none of these four features. In fact, all current Sun Microsystems SCSI disk and CD-ROM products, as well as the tape drive devices, are compliant with SCSI-2.

There are many more features to the SCSI-2 standard than these four options. This section discusses only these options, because they are the most commonly used features of SCSI-2.

Fast SCSI - Higher Bus Speed
The SCSI-2 standard defines an option known as Fast SCSI, which increases the synchronous transfer rate to 10 Mbytes per second. The terms Fast SCSI and 10-Mbyte SCSI are synonymous, and are used interchangeably. The term SCSI-2 is often incorrectly used to mean Fast SCSI.

10-Mbytes-per-second, 5-Mbytes-per-second, and asynchronous devices can be mixed on a SCSI bus. Transfer rates are negotiated on an individual basis between the host and each device. Fast SCSI requires the proper protocol chips in both the host adapter and device controller, as well as a modified software driver. Solaris 2.0 (and higher) software supports Fast SCSI.

The SPARC desktop systems developed after the SPARCstation 10 offer the Fast SCSI host adapter on the system board. There are also additional host adapter SBus cards available from Sun Microsystems that support Fast SCSI.

Wide SCSI, Differential SCSI
Wide SCSI - Wider Is Better
In SCSI-1, all data transfer paths are parallel and 8 bits wide. The SCSI-2 standard defines two options that widen the bus to 16 or 32 bits. Each of these options is referred to as Wide SCSI. Most implementations of Wide SCSI are 16 bits wide and also implement the Fast option, thus yielding burst-transfer rates of 20 Mbytes per second.

Differential SCSI - Less Interference
The SCSI standard defines two types of electrical interfaces: single-ended and differential. Single-ended uses a 50-pin, high-density connector. Differential SCSI uses special hardware drivers and receivers that reference the signals to each other rather than to ground. Sun Microsystems' differential implementation uses a slightly larger, industry-standard, 68-pin, high-density connector.

There is no performance benefit to differential SCSI, but it accommodates considerably longer SCSI bus lengths than does the single-ended interface. Differential SCSI busses can be up to 25 meters (82 feet) in length. Single-ended SCSI is limited to 6 meters (19.7 feet) total bus length. In fact, the SCSI-2 standard recommends that busses with Fast SCSI devices be limited to 3 meters. However, with high-quality shielded cables and proper active (regulated) bus termination, 6-meter Fast busses are quite reliable.

SCSI Termination, Ultra-SCSI
Termination
SCSI buses need to be correctly terminated. If the bus is not terminated, you may get signal reflections on the bus, which will give SCSI transport errors. There are two types of termination: active (or regulated) and passive (or standard). Active termination is the better of the two.

Ultra-SCSI
Ultra-SCSI is also known as Fast-20. It combines the features of Fast SCSI with Wide SCSI and doubles the transfer rate to 40 MBytes per second. This increase in transfer rate requires the faster (33MHz) PCI bus systems to handle the increased transfer speeds.
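The transfer rates quoted for the SCSI variants all follow from bus width multiplied by transfer clock. The sketch below reproduces the 5, 10, 20, and 40 Mbytes per second figures; the clock values of 5, 10, and 20 million transfers per second are the usual figures for these variants and are supplied here as assumptions, since the text only gives the resulting rates.

```python
def scsi_rate_mbytes(bus_width_bits, mega_transfers_per_s):
    """Burst rate in Mbytes per second: bytes per transfer times transfer clock."""
    return (bus_width_bits // 8) * mega_transfers_per_s


# name -> (bus width in bits, assumed millions of transfers per second)
VARIANTS = {
    "SCSI-1 synchronous": (8, 5),
    "Fast SCSI": (8, 10),
    "Fast/Wide SCSI": (16, 10),
    "Ultra-SCSI (Fast-20, wide)": (16, 20),
}

for name, (width, clock) in VARIANTS.items():
    print(f"{name}: {scsi_rate_mbytes(width, clock)} Mbytes per second")
```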

SCSI Icons, cable quality
Cable Quality
The following figures assume Sun cables are being used. Ensure your customer is using these cables, or cables of a similar quality.

SCSI icons
Below are the icons which denote single-ended and differential. The icon on the left is for a single-ended SCSI controller or terminator.

To the left is the icon for a differential SCSI controller or terminator.

Conclusion - SCSI Cable Lengths
The Signal Frequency and the Electrical Wiring can then be used to calculate the Maximum Cable Length. The following table shows the Maximum Cable Length in meters (m):

                                Cable length
Signal Freq.       Devices   Single ended   Differential
SCSI-2 Fast/wide   1-16      6.00m          25.00m
Ultra-SCSI         1-4       3.00m          12.50m
Ultra-SCSI         4-8       1.50m          6.25m

SCSI Implementation on Ex00 I/O Boards

Caution: You must include the internal cable lengths of the I/O boards and peripherals in your calculations.

Device        Internal cable length
I/O boards    0.5 m
Disk boards   1.0 m

I/O Board in Slot 1
This is a special case, since the I/O board in slot 1 drives the internal CD-ROM and tape drive.

Rules
E3500         4.5 m cable length supported
E4x00         4.5 m cable length supported
E5x00, 6x00   SCSI devices are not supported on slot 1 in an E6500, apart from the internal CD-ROM and tape.

I/O Boards in all other slots
All other slots support 5.5 m of cable length.
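The cable-length calculation can be sketched as follows. The internal lengths come from the table in the caution above, and the 6 m limit used in the example is the single-ended figure from the cable length table; the function names, the device keys, and the way the external cables are listed are illustrative assumptions.

```python
# Internal cable lengths in meters, from the table in the caution above.
INTERNAL_LENGTH_M = {"io_board": 0.5, "disk_board": 1.0}


def total_bus_length(external_cables_m, internal_devices):
    """Sum the external cable lengths and the internal length of each device."""
    internal = sum(INTERNAL_LENGTH_M[device] for device in internal_devices)
    return sum(external_cables_m) + internal


def within_limit(external_cables_m, internal_devices, limit_m):
    """True if the whole bus, internal cabling included, fits the limit."""
    return total_bus_length(external_cables_m, internal_devices) <= limit_m


# One 2.0 m external cable between an I/O board and a disk board.
print(total_bus_length([2.0], ["io_board", "disk_board"]))
print(within_limit([2.0], ["io_board", "disk_board"], limit_m=6.0))
```

Forgetting the internal lengths would make a bus look shorter than it really is, which is the mistake the caution warns against.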

Fibre Channel Interface
Fibre Channel
SCSI is by far the most common peripheral interconnect today, although others are in common use. The primary disk interconnect used by Sun today is Fibre Channel (FC), an ANSI standard (ANSI X3T9.3) that defines a SCSI-like command set but which is carried via a fiber optic connection instead of copper wires. Sun's SPARCstorage Array uses a Fibre Channel connection to carry standard SCSI-2 commands and data.

Although Fibre Channel is an ANSI standard, it has been brought under the SCSI-3 umbrella. Future FC standards will be generated as a subset of the SCSI-3 specification, which includes a bewildering variety of options for command sets, interconnect media, and interoperability.

Fibre Channel Topologies


The familiar SCSI-2 really has only one or two ways to connect: a tree of peripherals is connected to a host. Alternatively, the peripheral tree is connected to two hosts via some sort of multi-initiator arrangement. Fibre Channel has three very different topology options:
- point-to-point, in which a device connects to exactly one other device;
- arbitrated loop (normally abbreviated FC-AL), in which the peripherals and one or more hosts are connected together in a ring topology using many point-to-point links;
  - FC-AL is architecturally similar to a full-duplex FDDI;
- fabric, in which switches and hubs are used to create an arbitrarily complex network, possibly including multiple paths from a host to a peripheral.

These topologies are shown overleaf.

Topologies

World Wide Names (WWN)
Fibre Channel devices use a flat, universal addressing structure in which every device is assigned a unique address, known as the world wide name (WWN). The WWN must be unique in the FC topology; because Fibre Channel domains can potentially be connected into arbitrary fabrics, the usual practice is to assign completely unique WWNs to devices, in much the same way that Ethernet addresses are assigned uniquely.

The SPARCstorage Array uses the simplest of the topology options, a point-to-point link that connects a disk array controller to one or two hosts. The controller connects to a host via a point-to-point link using a two-strand fiber cable. Fibre Channel is a full-duplex medium, requiring a strand for each direction. The SPARCstorage Array can be connected to two hosts through the simple expedient of having two (independent) FC interfaces. Expanding the point-to-point mechanism into a more complex network is impossible without resorting to hubs and switches and the use of a fabric.

Fibre Channel Transfer Specifications
The FC standard defines several classes of signal, corresponding to different capabilities when combined with actual fiber connectors. Each signal type uses a different type of laser, so the varieties are not interchangeable. The classes are normally described in terms of their data speed: 25 MB/sec, 50 MB/sec, and 100 MB/sec. Because FC is a full-duplex standard, transferring between two devices at double these speeds is theoretically possible, although in practice few devices are capable of handling this much data.

Although Sun has fielded over 20,000 SPARCstorage Arrays using FC-25, the industry as a whole deferred acceptance of Fibre Channel until the arrival of FC-100 parts. The market seems to have bypassed FC-50 completely. A few vendors are now delivering products capable of FC-100 interoperability, but little volume has been achieved to date (mid 1996). However, every major storage vendor is planning FC-100 products in late 1996 or early 1997, and a safe bet is that high-end storage will be dominated by FC-100 products by 1998.

Fibre Channel Distance Capability
One of the most useful capabilities of the FC medium is that its lasers are capable of transmitting signals reliably over distances that are far in excess of those attainable using standard copper SCSI technology. Whereas SCSI-2 is limited to six meters in single-ended implementations and 25 meters using differential transceivers, Fibre Channel uses 50 micron multimode fiber capable of 2 km transmission distance, although Sun itself offers cable lengths only up to 15 meters. The FC standard permits distances up to 10 km.

Fibre Channel also makes it possible to disperse storage geographically across much wider distances than other technologies allow. With a practical cabling distance of several kilometers, it is possible to mirror data onto two different disk arrays located on opposite ends of a campus, or even nearby in a metropolitan area. Because the FC connection operates at full disk subsystem speed, disaster recovery can be simplified without loss of performance.

This capability is similar to those offered by a few mainframe disk vendors, with one major exception: the FC operates at full FC speeds with negligible transmission latency, whereas the wide-area disk mirroring available on some mainframe storage units is subject to significant delays due to wide-area networking latency. For bandwidth-sensitive applications,

Fibre Channel Cable
As its name implies, the fibre channel devices use a glass fibre instead of a copper wire to carry the signal from the source to the destination. The glass fibre shown in Figure 3-1 is about the thickness of three sheets of paper.

Figure 3-1 Cross Section of a Fibre Optic Cable (buffer coating; 125 micron cladding; 62.5 micron core of pure glass fibre)

The jacket on the fibre cable provides something a connecting device can bond with, because the glass fibre is too thin and fragile for direct access. The connector ends of the cable are precision-manufactured to guide the end of the glass fibre so it matches up exactly with the transceiver port. If the glass fibre is not aligned perfectly with the laser LED, the light does not pass along the cable.

Caution: Be careful how you handle fibre cable. It has a minimum bend radius, and it must not be bent more tightly than this.

Fibre Channel Interface - FC/OM and GBIC
The jacket helps prevent the cable from being bent or kinked. Any damage to the glass causes a loss of signal. If the cable is bent sharply, the laser beam will not go around the corner. If the cable is cracked or crushed, the laser beam bounces back because it cannot pass through.

Figure 3-2 FC/OM and GBIC Optical Cable and Connector

The fibre channel optical module (FC/OM, predecessor to the GBIC) and GBIC fibre cable plug and module connectors are keyed so they can connect together only one way. Always observe the two pieces and ensure they are properly aligned before connecting them.

Dual Porting
Fibre-channel allows disk drives and arrays to be dual ported. This gives a great RAS advantage; alternate pathing or dynamic multipathing (DMP) software can be installed which protects the storage from a failing I/O path. Dual porting has implications for device addressing, which we shall look at in chapter 5.


CPU/Memory and Clock Boards

Sun Enterprise 3x00/4x00/5x00/6x00 CPU/Memory Boards
CPU/Memory+ board block diagram showing the major component groups and the interconnecting buses.

The CPU/Memory+ board includes:
- An Address Controller (AC+)
- 8 x Data Controllers (DC+s)
- A Bootbus Controller, also known as the fhc
- Onboard devices (including a Flash PROM and SRAM)
- Two UPA bus CPU processor slots

CPU/Memory Board - Overview

CPU/Memory Board Component Layout.

Note the plastic cover over the address and data controllers. It is there to prevent the heatsinks from being knocked loose during board insertion or removal. Loose heatsinks cause many reliability problems in the field. If you find a loose heatsink in the field, replace the board. The older boards do not have this cover. Be especially careful with these boards.

CPU/Memory Board - Physical

A 501-2976 supports 2MB cache modules and runs at 83 MHz
A 501-4312 supports 8MB cache modules and runs at 83 MHz
A 501-4882 supports 8MB cache modules and runs at 100 MHz

Memory DIMMs
Each CPU/Memory+ board has 16 DIMM sockets, which are divided into two banks of 8 DIMMs each. Bank 0 and bank 1 DIMMs occupy alternate slot locations; bank 0 DIMMs are in the even numbered slots, and bank 1 DIMMs are in odd numbered slots. Memory DIMMs come in sizes ranging from 8 MBytes to 128 MBytes each. Memory must be installed in a complete bank of eight DIMMs with each DIMM being the same size, type, and speed. Bank 0 can contain different size DIMMs than bank 1.

UPA Ports
Proc 0 is assigned the first port number associated with the slot, proc 1 the second.

DC - DC convertors
These ensure that the CPU modules get the voltage they require. You do not necessarily have to upgrade a CPU/Memory board if you upgrade the CPU module.

CPU Modules
Processing power on each CPU/Memory+ board is provided by one or two UltraSPARC II CPU modules, each with 0.5 to 8 Mbytes of local high-speed external cache (Ecache) memory, depending on the module. Supported modules are as listed below.
- 167 MHz, 0.5/1.0 MB Ecache
- 250 MHz, 1.0/4.0 MB Ecache
- 336 MHz, 4.0 MB Ecache
- 400 MHz, 4.0/8.0 MB Ecache

Figure 4-1 UltraSPARC II CPU Module (labels: 144-pin connector, 288-pin connector, screws)

A CPU/Memory+ board is not required to contain an UltraSPARC II processor module and can operate as a memory-only board.

400 MHz, 8 MB Ecache processor modules
When trying to install Solaris 2.5.1 HW 11/97 or 2.6 HW 3/98 on an Ex000 server with a 400 MHz/8 MB cache CPU module, booting from CD-ROM or a network install server gives the error message "Fast Data Access MMU Miss" or panics with "mutex_enter: bad mutex". This is because there is no support for the 8 MB cache without the patches listed below. The procedure is as follows.

NOTE: This procedure requires downloading and applying patches, so the install client must have a network connection.

1. Verify the OBP version at the ok prompt by typing
   ok .version
   or check at the UNIX prompt:
   # /usr/sbin/prtconf -V
   If needed, upgrade to at least flash PROM version 3.2.21 using patch 103346-22 or greater.
2. ok setenv auto-boot? false
3. ok reset
4. ok limit-ecache-size
5. ok boot cdrom (at least 2.5.1 HW 11/97 or 2.6 HW 3/98)
6. Install the OS but do not allow auto-reboot!
7. # init 0
8. ok reset (usually not needed with 2.6)
9. ok limit-ecache-size
10. ok boot
11. Make sure you have a network connection, FTP to sunsolve.sun.com and get the latest kernel patch (minimum levels to support the 400 MHz/8 MB cache are listed):
    Solaris 2.5.1 --> 103640-27 and prtdiag patch 104595-08
    Solaris 2.6 --> 105181-14
12. Change run level to single-user mode using init S.
13. Install the patches from the FTP download directory.
14. Reboot.
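The minimum patch levels quoted in this procedure can be checked mechanically. The following is a minimal sketch, assuming a patch ID has the form base-revision (as in 103346-22) and that "or greater" means a higher revision of the same base number. The function names and the dictionary are illustrative only and are not part of any Sun tool.

```python
def parse_patch(patch_id):
    """Split a patch ID such as '103346-22' into (base, revision)."""
    base, revision = patch_id.split("-")
    return base, int(revision)


def meets_minimum(installed, minimum):
    """True if `installed` is the same patch as `minimum` at that revision or later."""
    inst_base, inst_rev = parse_patch(installed)
    min_base, min_rev = parse_patch(minimum)
    return inst_base == min_base and inst_rev >= min_rev


# Minimum levels quoted in the procedure (the keys are descriptive labels only).
MINIMUM_PATCHES = {
    "flash PROM": "103346-22",
    "Solaris 2.5.1 kernel": "103640-27",
    "Solaris 2.5.1 prtdiag": "104595-08",
    "Solaris 2.6 kernel": "105181-14",
}

if __name__ == "__main__":
    # A hypothetical installed revision, compared against the quoted minimum.
    print(meets_minimum("103640-31", MINIMUM_PATCHES["Solaris 2.5.1 kernel"]))
```

The comparison is deliberately strict about the base number, so a revision of a different patch never satisfies the check.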

CPU Module Handling Precautions
Use the following precautions when handling UltraSPARC II modules:

Caution Do not handle the modules by touching the gold pins on the compression connectors. The natural oils on your hands cause these connectors to oxidize and corrode over a period of time. Corroded connector pins cause the module to fail, requiring you to replace the module again.

Caution Handle the UltraSPARC modules by the edges only. Do not handle them by the heatsinks because they can break easily.

Warning The heatsinks attached to the UltraSPARC processor chip can get very hot. Avoid touching the heatsink because you can get a severe burn. You could also damage the module if you drop it.

Removing and Replacing a CPU Module
Use a 3/32 hex-driver to loosen all screws on each of the compression connectors on the module to be removed (three screws for the 288-pin connector, two screws for the 144-pin connector). Lift the module straight up, off the board mating surface and the single standoff that positions the module on the board.

Figure 4-2 Removing a CPU Module

Each module is located on the main board with a single standoff and is connected to the main board by two spring-loaded connectors. The pins within the connectors are compressed against the board's mating surfaces by a compression bar which, when secured with screws, connects the module connector pins to the board's corresponding connector surface. To ensure that the connectors are correctly aligned, you must align the post on the MLB with the corresponding hole in the module. When you have the post and hole aligned, you can insert the five hex-socketed screws and finger-tighten them. Now you must torque the screws, in the order described below, to six inch-pounds using the torque-driver (Sun part number 560-2324) supplied with the system.

Ignore the reference to Method B. The torque sequence has gone through a number of changes. Take up the slack on each screw, then go around the screws in the order shown putting a 1/4 turn on each screw. Each screw should reach the correct torque setting at the same time. FOLLOW THIS PROCEDURE. IT IS IMPORTANT. DO NOT MAKE UP YOUR OWN SEQUENCE. DO NOT RUSH THIS PROCEDURE.

Memory Interleaving
Enterprise servers allow up to 16-way interleaving. An OBP parameter, memory-interleave, sets up interleaving: min disables interleaving, and max sets interleaving to the maximum possible factor. How you populate memory will have a major effect on system performance. The rules are below.

Note You must set memory-interleave=min to allow dynamic reconfiguration of CPU/Memory boards.

Memory Conguration Rules


The following rules apply to configuring the system's memory:
- DIMMs are 72-pin.
- Eight DIMMs form a bank.
- All DIMMs in a bank must have the same capacity.
- The first bank of memory can be either Bank 0 or Bank 1.
- There is better performance from many smaller banks than from fewer bigger banks.
- Install one bank on each CPU/Memory board before installing the second bank on any board.
- Install the largest density banks (128 MB DIMMs) first, then medium density banks (32 MB DIMMs), and finally the smallest density banks (8 MB DIMMs).
- All DIMMs in a bank should have the same speed rating. If DIMMs of different speeds are mixed in a bank, the bank will function, but at the lowest speed.
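The population rules can be expressed as a short planning aid. This is only a sketch under stated assumptions: it encodes the "eight identical DIMMs per bank", "one bank per board before any second bank" and "largest density first" rules, takes two banks per board from the DIMM description earlier in this chapter, and uses hypothetical function names and board labels. It does not model DIMM speed or interleaving.

```python
DIMMS_PER_BANK = 8
BANKS_PER_BOARD = 2


def check_bank(dimm_sizes_mb):
    """Return a list of rule violations for one bank (empty list means OK)."""
    problems = []
    if len(dimm_sizes_mb) != DIMMS_PER_BANK:
        problems.append("a bank must contain exactly 8 DIMMs")
    if len(set(dimm_sizes_mb)) > 1:
        problems.append("all DIMMs in a bank must have the same capacity")
    return problems


def install_order(boards, bank_dimm_sizes_mb):
    """Assign banks to boards: largest DIMMs first, and one bank on every
    board before any board receives its second bank.

    Returns a list of (board, bank_count_on_that_board, dimm_size_mb).
    """
    if len(bank_dimm_sizes_mb) > BANKS_PER_BOARD * len(boards):
        raise ValueError("each board holds at most two banks")
    ordered = sorted(bank_dimm_sizes_mb, reverse=True)
    return [
        (boards[i % len(boards)], i // len(boards) + 1, size)
        for i, size in enumerate(ordered)
    ]


if __name__ == "__main__":
    # Hypothetical system: two boards, three banks of different densities.
    for step in install_order(["board A", "board B"], [8, 128, 32]):
        print(step)
```

The plan only says how many banks a board has received so far, since the rules allow the first bank on a board to be either Bank 0 or Bank 1.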

CPU/Memory Board Status Indicators
CPU/Memory+ boards have three LEDs indicating the status of that board. With the advent of dynamic reconfiguration (DR), the meaning of the amber Service LED has changed. Before DR, the only time a board had an amber light on was when it had failed POST. The correct meaning of the amber light, as highlighted below, is that the board is in low power mode. Either it has a fault or it has been DR'd out.

Table 4-1

Power  Service  Running  Condition
Off    Off      Off      Board has no electrical power
Off    On       Off      Board is in low power mode, can be unplugged
Off    Off      On       Undefined
Off    On       On       Undefined
On     Off      Off      System is hung, either in POST/OpenBoot or in the operating system
On     Off      On       Hung in OS
On     On       Off      Hung in POST/OBP or hung in OS and has failed component on board
On     On       On       Hung in POST/OBP or hung in OS and has failed component on board
On     Off      Flash    OS running
On     On       Flash    OS running and failed component on board
On     Flash    Off      Slow flash = POST. Fast flash = OBP.
On     Flash    On       Undefined

The General Rules
The following lists the general LED condition rules for the CPU/Memory+ boards:
- If no LEDs are lit, there is no electrical power to the board.
- If the green Power and Running LEDs are not lit, and only the amber light is lit, the board is ready for removal.
- If no LEDs are flashing, the system is hung or in the process of booting up.
- It used to be the case that the board required service if the amber Service LED was lit continuously (not flashing). The amber light is not a fault light, it is a low power indicator. There may well be a fault, or equally the board may have been dynamically reconfigured out of operation.
- It is a normal condition for the Service LED to flash during POST testing.
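As a quick reference, the LED combinations in Table 4-1 can be held in a lookup table. The sketch below is illustrative only: the row alignment is transcribed from Table 4-1 as best it can be read, and the function name and the "not listed" message are assumptions, not part of any Sun diagnostic.

```python
# (Power, Service, Running) -> condition, transcribed from Table 4-1.
LED_CONDITIONS = {
    ("off", "off", "off"): "Board has no electrical power",
    ("off", "on", "off"): "Board is in low power mode, can be unplugged",
    ("off", "off", "on"): "Undefined",
    ("off", "on", "on"): "Undefined",
    ("on", "off", "off"): "System is hung, either in POST/OpenBoot or in the operating system",
    ("on", "off", "on"): "Hung in OS",
    ("on", "on", "off"): "Hung in POST/OBP or hung in OS and has failed component on board",
    ("on", "on", "on"): "Hung in POST/OBP or hung in OS and has failed component on board",
    ("on", "off", "flash"): "OS running",
    ("on", "on", "flash"): "OS running and failed component on board",
    ("on", "flash", "off"): "Slow flash = POST. Fast flash = OBP.",
    ("on", "flash", "on"): "Undefined",
}


def decode_leds(power, service, running):
    """Look up the board condition for one observed LED combination."""
    key = (power.lower(), service.lower(), running.lower())
    return LED_CONDITIONS.get(key, "Combination not listed in Table 4-1")


if __name__ == "__main__":
    # Amber Service LED only: the board is in low power mode.
    print(decode_leds("Off", "On", "Off"))
```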

Clock+ Board Introduction
There are, at the time of writing, four different clock boards. The main difference between them is the clock ratios they support.
- 501-2975 provides a 1:2 clock ratio.
- 501-4286 supports 1:2 and 1:3 clock ratios.
- 501-4946 supports 1:2, 1:3, and 1:4 clock ratios.
- 501-5365 supports 1:2, 1:3, 1:4, 1:5, and 1:6 clock ratios.

These ratios are used to derive the Gigaplane frequency from the processor frequency. The maximum Gigaplane speed is 100 MHz. So, for example, 400 MHz processors need a 1:4 ratio (400 / 4 = 100 MHz), for which you would need a 501-4946.

Note Full details of which clock board is used alongside which processor module, is provided in the FE Handbook.
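The clock ratio arithmetic can be sketched as follows. This assumes the Gigaplane frequency is the processor frequency divided by N for a 1:N ratio, and that the smallest N keeping the Gigaplane at or below 100 MHz is the one used. Both are assumptions made for illustration; the FE Handbook remains the authority on which board goes with which module. The function names are hypothetical.

```python
# Clock ratios supported by each clock board, as listed above (1:N stored as N).
CLOCK_BOARD_RATIOS = {
    "501-2975": (2,),
    "501-4286": (2, 3),
    "501-4946": (2, 3, 4),
    "501-5365": (2, 3, 4, 5, 6),
}
MAX_GIGAPLANE_MHZ = 100


def required_ratio(cpu_mhz):
    """Smallest N (for a 1:N ratio) that keeps the Gigaplane at or below 100 MHz."""
    for divisor in range(2, 7):
        if cpu_mhz / divisor <= MAX_GIGAPLANE_MHZ:
            return divisor
    raise ValueError("no listed ratio brings %s MHz down to 100 MHz" % cpu_mhz)


def boards_for(cpu_mhz):
    """Clock boards from the list above that support the required ratio."""
    divisor = required_ratio(cpu_mhz)
    return [part for part, ratios in CLOCK_BOARD_RATIOS.items() if divisor in ratios]


if __name__ == "__main__":
    for mhz in (167, 250, 336, 400):
        n = required_ratio(mhz)
        print(mhz, "MHz -> 1:%d" % n, "Gigaplane %.1f MHz" % (mhz / n), boards_for(mhz))
```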

Clock+ Board Block Diagram
The Clock+ board block diagram below shows a high-level view of the functionality of the Clock+ board.

[Figure: Clock+ Board Block Diagram. Labelled blocks: LEDs, Serial ports, Keyboard/mouse, Console, Console bus, led [2.0], Centerplane connector, Clock Frequency, Clocks, Clock bus, Reset Button, Reset Button (xir), Reset, Reset bus, JTAG, JTAG bus]

The Clock+ board consists of the following subsystems:
- Console Bus
- Clocks
- Reset logic
- JTAG logic and interface port for factory testing only
- Centerplane connector signals monitoring

Clock+ Board - Physical

Backpanel and Connectors

Clock+ Board Console Bus

Note The console bus passes information such as environmental data and POST results between boards. It is a back-door path between boards.

Console Bus
The Console Bus provides CPU/Memory+ boards access to global system control and status as well as to the keyboard, mouse, and serial ports. In addition, there is an NVRAM/time of day (TOD) chip that maintains the date and time and 8 Kbytes of data when the power to the system is shut off.

The state of physical hardware conditions is maintained in registers on the Clock+ board. Each of these registers has inputs generated from other subsystems on the Clock+ board, from other boards, or from the power supplies in the system. Some Clock+ board registers are reserved for controlling various states of the machine.

The Clock+ board allows you to connect an ASCII terminal to the serial port and a Sun keyboard and mouse to the keyboard port. This gives you access to the local system console. The serial port allows POST messages to be displayed on a local ASCII terminal. You can also configure the serial port for standard serial devices, such as modems and printers.

Clocks
The clock subsystem generates the clocks for the entire system. The base clock is synthesized and then divided into various frequencies. These clock signals are then distributed to the centerplane by an array of driver chips. Two clocks for processor slots and one system timing clock go to each of the board slots on the centerplane. Clock synthesizer and drivers. The clock synthesizer generates the base clock signal, which is divided into several different signals by the clock divider. These clocks are then distributed to the centerplane by the clock drivers.

Clock+ Board - Overview
Reset Logic
Generates and sends reset commands to all system boards when either an XIR or POR reset signal is received.

TOD/NVRAM
Centralized time-of-day (TOD) chip that includes NVRAM. You can copy the contents to each I/O board in the system for redundancy and backup.

Serial, keyboard and mouse ports


There are two tty connections, along with the kbd/mouse.

JTAG
There is a JTAG (Joint Test Action Group) connection between the system ASICs and the Clock board. POST information is passed around the system via JTAG. There is a further connection on the clock board which is blanked off and used for factory testing only.

Clock+ Board Reset Logic
There are four circuits that control system reset and error state:
- Manual Reset
- System Reset
- System Error Reset
- Externally initiated reset (XIR)

We can initiate resets in a number of ways:
- Power the machine off and on. This is the power-on reset (POR).
- Type reset at the ok prompt. This is a software reset (SOR).
- Use the reset buttons on the clock board. The button labelled POR will initiate a power-on reset. The button labelled XIR will run an externally initiated reset (see below).

We can also use the remote console commands.

Remote Console Commands


The remote console feature is a very basic method of controlling the Exx00 servers. A customer may send reset commands to the servers via the ttya port. The system constantly monitors ttya for the commands listed below.

Key sequence           Action
CR CR ~ CNTL SHFT P    Power cycle reset
CR CR ~ CNTL SHFT R    Software reset
CR CR ~ CNTL SHFT X    XIR

On receiving the key sequences on ttya, the system will initiate the appropriate reset.

Clock+ Board - XIR resets
Note The secure position of the keyswitch disables the remote console. Enter the remote console characters with a delay of 0.5 to 5 seconds between them.

Externally Initiated Reset XIR


This is a useful reset to use if you are resetting a hung machine. When an XIR occurs, memory is cleared and a snapshot of the CPU registers and processes is saved. To view this snapshot of CPU registers, you must be at the ok prompt. Type

ok .xir-state-all

This displays information similar to the following:

CPU ID#1 TL=1 TT=3 TPC=e0028688 TnPC=e0028688 TSTATE=9900001e06
CPU ID#5 TL=1 TT=3 TPC=e002755c TnPC=e0027560 TSTATE=4477001e03

Note It is outside the scope of this course to go into decoding the XIR log reports. An XIR does not override the NVRAM auto-boot? variable.

You can initiate an XIR either by using the XIR button on the Clock+ board or the remote console XIR sequence.
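If the .xir-state-all output has been captured from the console, it can be split into fields for comparison between CPUs. The sketch below assumes each CPU is reported on one line in the "CPU ID#n NAME=value ..." form of the sample output; it only separates the fields and does not attempt to decode them. The function name is hypothetical.

```python
import re

# One line per CPU: "CPU ID#<n>" followed by NAME=value pairs.
LINE_RE = re.compile(r"CPU ID#(\d+)\s+(.*)")


def parse_xir_state(text):
    """Return {cpu_id: {register_name: hex_string}} from captured output."""
    cpus = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip prompts, blank lines and anything unrecognised
        pairs = (item.split("=", 1) for item in match.group(2).split() if "=" in item)
        cpus[int(match.group(1))] = dict(pairs)
    return cpus


SAMPLE = """\
CPU ID#1 TL=1 TT=3 TPC=e0028688 TnPC=e0028688 TSTATE=9900001e06
CPU ID#5 TL=1 TT=3 TPC=e002755c TnPC=e0027560 TSTATE=4477001e03
"""

if __name__ == "__main__":
    for cpu_id, registers in sorted(parse_xir_state(SAMPLE).items()):
        print(cpu_id, registers["TPC"], registers["TnPC"])
```

Values are kept as the hexadecimal strings shown on the console, so nothing is lost if they need to be quoted in an escalation.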

Clock+ Board Status Indicators
LED States
Note The Clock+ Board LEDs display the same information as the system LEDs. This has led people in the past to assume that the clock board has a fault on it. Always check for other fault conditions before assuming a clock board fault.

Table 4-2 Clock+ Board LED States

Power  Service   Cycling   Condition
Off    Off       Off       No power
Off    On        Off       Failure mode
Off    Off       On        Failure mode
Off    On        On        Failure mode
On     Off       Off       Hung in POST/OBP or OS
On     Off       On        Hung in OS
On     On        Off       Hung in POST/OBP, or hung in OS / failed component
On     On        On        Hung in POST/OBP, or hung in OS / failed component
On     Off       Flashing  OS running normally
On     On        Flashing  OS running / failed component
On     Flashing  Off       Slow flash = POST. Fast flash = OBP.
On     Flashing  On        OS or OBP error

I/O Boards

Types of I/O Boards:
The Enterprise systems support five types of I/O boards, identified as follows.
- Type 1 SBus I/O board with FC-OM Fibre Channel
- Type 4 SBus+ I/O board with FC-AL Fibre Channel
- Type 2 Graphics I/O board with FC-OM Fibre Channel
- Type 5 Graphics+ I/O board with FC-AL Fibre Channel
- Type 3 PCI+ I/O board

The + denotes boards capable of connecting to the 100 MHz Gigaplane bus in the X500 series. Each board has three LEDs that provide board status codes.

I/O Addressing
It is essential that you fully understand how disk subsystems, networks, SBus cards and PCI cards are addressed. If your customer has errors on the database /engineering/parts, you need to find where this partition is mounted. If your customer tells you hme4 is faulty, where do you start?

We will be going through many examples of I/O addresses. Physical paths are derived using UPA port numbers and device driver names. These are the most common driver names that may appear in a device path.
- fas - driver for fast/wide SCSI FEPS controllers
- hme - driver for Fast Ethernet
- isp - driver for differential SCSI controllers and the SunSwift card
- sf - driver for soc+ or socal Fibre Channel Arbitrated Loop (FC-AL)
- soc - driver for SPARCstorage Array (SSA) controllers
- socal - driver for serial optical controllers for FC-AL (soc+)
- pln - SPARCstorage Array nexus driver
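When reading a device path, it helps to pull out the name in each component and look it up in the driver list. The following is a minimal sketch: it assumes each path component has the form [vendor,]name@unit-address, and the function names are illustrative. The two example paths are the PCI+ onboard paths quoted later in this chapter, with the UPA port numbers left as the placeholders x and y.

```python
# Driver names and descriptions from the list above.
DRIVERS = {
    "fas": "fast/wide SCSI FEPS controller",
    "hme": "Fast Ethernet",
    "isp": "differential SCSI controller or SunSwift card",
    "sf": "soc+ or socal FC-AL",
    "soc": "SPARCstorage Array (SSA) controller",
    "socal": "serial optical controller for FC-AL (soc+)",
    "pln": "SPARCstorage Array nexus",
}


def component_names(device_path):
    """Return the name part of every component in a physical device path."""
    names = []
    for component in device_path.strip("/").split("/"):
        name = component.split("@", 1)[0]  # drop the unit address after '@'
        name = name.split(",")[-1]         # drop a vendor prefix such as 'SUNW,'
        names.append(name)
    return names


def describe(device_path):
    """Pair each recognised driver name in the path with its description."""
    return [(n, DRIVERS[n]) for n in component_names(device_path) if n in DRIVERS]


if __name__ == "__main__":
    print(component_names("/pci@x,4000/SUNW,hme@1,1"))
    print(describe("/pci@x,4000/SUNW,hme@1,1"))
    print(component_names("/pci@y,4000/SUNW,isptwo@3"))
```

Names that are not in the list, such as pci or isptwo, are simply left undescribed rather than guessed at.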

System Slot 1
Slot 1 in an Enterprise server will always have an I/O board installed, since its on-board SCSI FEPS chip drives the internal CD-ROM and tape drive.

SBus I/O Boards
Block diagram of the SBus I/O board showing two SBuses connecting the components and SBus card slots.

Onboard devices include a Flash PROM, SRAM, and environmental sensors.

SBus I/O Board Type 1
The Type 1 was the original 83 MHz SBus I/O board.

Part Numbers 501-2977, 501-4287 (83 MHz)

[Figure: SBus I/O board Type 1, with SBus 0, SBus 1 and SBus 2 labelled]

The SBus I/O board provides the following interface connections:
- Two SBus channels for three SBus slots
- SunFastEthernet
- Fast/wide SCSI-II
- Two OLC sockets for FC/OM (Fibre Channel Optical Module) interface converter modules

SBus + I/O Board Type 4
A Type 4 I/O board is the newer 100 MHz SBus I/O board, which differs from a Type 1 in its on-board serial optical controller.

Part Numbers 501-4266 (83 MHz), 501-4883 (83, 90, 100 MHz)

[Figure: SBus+ I/O board Type 4, with SBus 0, SBus 1 and SBus 2 labelled]

The SBus+ I/O board provides the following interface connections:
- Two SBus channels for three SBus slots
- SunFastEthernet
- Fast/wide SCSI-II
- Two FC-AL sockets for hot-pluggable gigabit interface converter (GBIC) modules

SBus I/O Board Type 1, Physical layout

This is the original dual SYSIO board. Type 1 boards have an on-board SOC chip, which drives two on-board Fibre Channel optical modules (FC-OM). These are otherwise known as optical link controllers (OLC). The on-board FC-OMs are used to drive a SPARCstorage Array. You may drive 2 km of fibre cable from these boards. Note the connector layout: pln@a is on the right and pln@b is on the left.

SBus + I/O Board Type 4, Physical layout

A Type 4 board has an on-board SOC+, otherwise known as the socal (SOC arbitrated loop). The SOC+ drives two on-board GBICs, which are used to drive the A500 disk systems. You may drive 500 m of fibre cable from these boards. The GBIC on the right is addressed as sf@0, the one on the left is addressed as sf@1.

Graphics I/O Boards

The Graphics(+) I/O board is similar to the SBus(+) I/O board with the following differences:
- The Graphics I/O boards (Type 2 and Type 5) have one SBus implemented with one SYSIO chip with two SBus card slots.
- The Graphics I/O board has one UPA port number assigned to the SYSIO chip, and one UPA port for a fast-frame buffer.

Graphics I/O Board Type 2
The Graphics I/O board shown below provides you with the SBus you need and a UPA interface for those systems on which you need to install a monitor.

Part Numbers 501-2749, 501-4288 (83 MHz)

[Figure: Graphics I/O board Type 2, with UPA Bus, SBus 0 and SBus 1 labelled]

The Graphics I/O board provides the following interface connections:
- One SBus channel, for two SBus slots
- One UPA slot for Creator and Creator3D graphics cards
- SunFastEthernet
- Fast/wide SCSI-II
- Two OLC sockets for FC/OM interface converter modules

Graphics+ I/O Board Type 5
The Graphics+ I/O board shown below is the 100 MHz (+) version of the Type 2 board.

Part Number 501-4884 (83, 90, 100 MHz)

[Figure: Graphics+ I/O board Type 5, with UPA Bus, SBus 0 and SBus 1 labelled]

The Graphics+ I/O board provides the following interface connections:
- One SBus channel, for two SBus slots
- One UPA slot for Creator and Creator3D graphics cards
- SunFastEthernet
- Fast/wide SCSI-II
- Two FC-AL sockets for hot-pluggable gigabit interface converter (GBIC) modules

Graphics I/O Board Type 2, Physical layout

The difference from a Type 1 is that both sbus0 and sbus2 are driven from one SYSIO chip, which takes the second UPA port number for the board. The first UPA port number is assigned to the Creator3D graphics card.

Graphics+ I/O Board Type 5, Physical layout

The difference from a Type 4 is that both sbus0 and sbus2 are driven from one SYSIO chip, which takes the second UPA port number for the board. The first UPA port number is assigned to the Creator3D graphics card.

PCI+ I/O Board Type 3

The PCI+ I/O board provides the following interface connections:
- Risers for 32- or 64-bit cards, 33- or 66-MHz cards, and 3.3- or 5-volt cards. The riser must match the specification of the PCI card used.
- One on-board 10/100-Mb-per-second Ethernet port (twisted pair)
- Ultra SCSI

The diagram of the PCI interface board shown below has two PCI interface connectors to which you must connect a riser for the specific type of PCI card you are installing.

Part Numbers 501-4325 (83 MHz), 501-4926 (100 MHz)

[Figure: PCI+ I/O board Type 3, with PCI Bus 0 and PCI Bus 1 labelled]

The PCI+ I/O board provides the following interface connections:
- Four PCI bus channels for two configurable interface riser card slots
- SunFastEthernet
- On-board SCSI implemented by an ISP 1040 controller, which gives an Ultra SCSI connection. Note: Ultra SCSI transfer rates are not supported as of 6/98, and should be disabled. Refer to PCI I/O Product Note, 805-3364-10 of September 1997.

PCI+ I/O Board Type 3, Physical layout

Type 3 boards have PSYCHO chips instead of SYSIO chips. PCI0 on the right takes the first UPA port number. PCI1 on the left takes the second UPA port number.

PCI+ I/O Board Type 3 Port Definitions
/pci@x,4000/SUNW,hme@1,1 is the device path (or physical name) for the onboard Fast Ethernet port on a PCI I/O board. This port is controlled by the PCI 0 Psycho chip on the board.

/pci@y,4000/SUNW,isptwo@3 is the device path (or physical name) for the onboard UltraSCSI port on a PCI I/O board. This port is controlled by the PCI 1 Psycho chip on the board.

The PCI slot labelled J3200 is driven from PCI0 and has a device path beginning with /pci@x,2000/, which denotes that it can drive PCI cards at 33 MHz or 66 MHz.

Similarly, the PCI slot labelled J4200 is driven from PCI1 and has a device path beginning with /pci@y,2000/, which denotes that it can drive PCI cards at 33 MHz or 66 MHz.
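The port definitions above can be turned into a small classifier. This sketch assumes that x and y in the paths stand for the board's first and second UPA port numbers, as described for the Type 3 physical layout. The caller supplies them exactly as they appear in the path, since their actual values depend on the slot. The function name and return strings are illustrative.

```python
def classify_pci_path(path, first_port, second_port):
    """Return (psycho_chip, role) for a device path on a PCI+ I/O board.

    first_port and second_port are the board's two UPA port numbers,
    given as the strings that appear after 'pci@' in the path.
    """
    root = path.strip("/").split("/")[0]       # for example 'pci@x,4000'
    name, _, address = root.partition("@")
    port, _, offset = address.partition(",")
    if name != "pci" or port not in (first_port, second_port):
        raise ValueError("not a PCI path on this board: " + path)
    psycho = "PCI 0" if port == first_port else "PCI 1"
    if offset == "4000":
        role = "onboard device"
    elif offset == "2000":
        role = "riser slot " + ("J3200" if psycho == "PCI 0" else "J4200")
    else:
        role = "unknown"
    return psycho, role


if __name__ == "__main__":
    # The placeholders 'x' and 'y' are used literally, as in the text above.
    print(classify_pci_path("/pci@x,4000/SUNW,hme@1,1", "x", "y"))
    print(classify_pci_path("/pci@y,2000/", "x", "y"))
```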

Board Status Indicators
CPU/Memory+ boards and I/O boards have three LEDs indicating the status of that board. With the advent of dynamic reconfiguration (DR), the meaning of the amber Service LED has changed. Before DR, the only time a board had an amber light on was when it had failed POST. The correct meaning of the amber light, as highlighted below, is that the board is in low power mode. Either it has a fault or it has been DR'd out.

LED Status Codes


Table 5-1 LED Codes for the CPU/Memory+ and I/O Boards

Power  Service  Running  Condition
Off    Off      Off      Board has no electrical power
Off    On       Off      Board is in low power mode, can be unplugged
Off    Off      On       Undefined
Off    On       On       Undefined
On     Off      Off      System is hung, either in POST/OpenBoot or in the operating system
On     Off      On       Hung in OS
On     On       Off      Hung in POST/OBP or hung in OS and has failed component on board
On     On       On       Hung in POST/OBP or hung in OS and has failed component on board
On     Off      Flash    OS running
On     On       Flash    OS running and failed component on board
On     Flash    Off      Slow flash = POST. Fast flash = OBP.
On     Flash    On       Undefined

The General Rules
The following lists the general LED condition rules for the CPU/Memory+ and I/O+ boards:
- If no LEDs are lit, there is no electrical power to the board.
- If the green Power and Running LEDs are not lit, and only the amber light is lit, the board is ready for removal.
- If no LEDs are flashing, the system is hung or in the process of booting up.
- It used to be the case that the board required service if the amber Service LED was lit continuously (not flashing). The amber light is not a fault light, it is a low power indicator. There may well be a fault, or equally the board may have been dynamically reconfigured out of operation.
- It is a normal condition for the Service LED to flash during POST testing.

Enterprise 3500 Fibre Channel Interface Board
This is a new board designed to provide connectivity to the internal disk drives in the Sun Enterprise 3500 server. The internal disk drives operate with the fibre channel arbitrated loop (FC-AL) architecture. Each of the four potential FC-AL loops corresponds to one of four gigabit interface converter (GBIC) modules on the Fibre Channel interface board.

Part Number 501-4820

GBIC LA GBIC LB GBIC UA GBIC UB

SCSI Disk Board
Part Numbers 501-3113 (no disks) 501-4168, 501-5137

[Figure label: High density UltraSCSI connector]

You can install up to four SCSI disk boards in the Sun Enterprise 4x00 and 5x00 systems and two in the Sun Enterprise 6x00. Each SCSI disk board can contain one or two 2.1, 4.2 or 9.1 GByte 7200 RPM disk drives.

SCSI Disk Board Addressing


SCSI disk addressing depends on the drive position and on the Gigaplane slot that the SCSI disk board is plugged into. We will cover addressing in chapter 8.

Open Boot PROM/NVRAM

Introducing OBP
History
The original SPARC boot PROM was based on revision 1.x. A boot command at this revision was of the form

>b sd(3,0,0)

The first Open Boot PROM was OBP 2.x. The disadvantage with this revision was that to upgrade the firmware, you had to change the chip. Enterprise servers operate on OBP 3.x, which has the advantage that it is downloadable.

The OpenBoot architecture provides a significant increase in functionality and portability when compared to proprietary systems of the past. Although this architecture was first implemented by Sun Microsystems as OpenBoot on SPARC systems, its design is processor-independent.

Caution Do not confuse NVRAM with OBP.

The OBP holds Device drivers, POST code and provides some user diagnostics. The NVRAM holds the hostid, MAC address, time-of-day and parameters which dictate how the OBP code will interact with the system. Refer back to your desktop course notes.

Introducing OBP (cont)
Open Boot PROM on each CPU/Memory Board
The PROMs on the CPU/Memory boards all contain the same OBP and POST and should all be at the same revision. The OBP loaded into memory at boot time will be from the POST master.

Open Boot PROM on each I/O Board


The PROMs on the I/O boards hold FCode and iPOST specific to that type of board.

Master NVRAM
Resides on the Clock board.

Backup NVRAM
Resides on each I/O board. There are no backup NVRAM chips on the CPU/Memory boards.

Introducing OBP (cont)
POST and OpenBoot work together in the system to test and manage system hardware. When the system is turned on, or if a system reset is issued, POST detects and tests buses, power supplies, boards, CPUs, DIMMs, and many board functions. Only POST can configure the system hardware at power up, and only POST can enable hot-pluggable boards (if DR and AP are not present and operating).

ok prompt
Once POST is completed, OBP checks the NVRAM parameters to see how it should configure the system. The OBP is then loaded into main memory. The system may then return to the ok prompt, assuming it has been set up to do so.

{6} ok

Note The number preceding the ok prompt specifies the POST/JTAG master. It is usually the first CPU module in the system.

Features of OBP
Plug-in Device Drivers
A plug-in device driver is usually loaded from a plug-in device, such as an SBus card. You can use a plug-in device driver to boot the operating system from a device other than the default boot device. Another example would be to display text on an output device, other than the one attached to ttya, before the operating system has loaded its own device drivers.

FCode Interpreter
Plug-in drivers are written in a machine-independent interpreted language called FCode. Each OpenBoot system PROM contains an FCode interpreter. This means that the same device and driver can be used on machines with different types of CPUs (SPARC, Intel).

Device tree
The device tree is a data structure describing the devices (permanently installed and plug-in) attached to a system. Both the user and the operating system can determine the hardware configuration of the system by inspecting the device tree.

Forth toolkit
The OpenBoot User Interface is based on the interactive programming language Forth. You can combine sequences of user commands to form complete programs. This provides a powerful capability for debugging hardware and software.

Flash Programmable
This makes upgrading the system's POST, OBP, and I/O device FCode fast, easy, and inexpensive. You can upgrade several Sun Enterprise servers with little downtime to the enterprise. The new OBP program information can come from a CD-ROM or a network server.

POST
The code that runs the power-on self tests resides within the OBP chip. It too can be upgraded, for example to include tests for newly released boards.

Recovery Features
These keyboard functions reset variable parameters in the NVRAM configuration file.

Note: These keyboard functions work only from a local keyboard. They do not work from an ASCII terminal or remote access terminal connected to the system's serial port A. If your system is down because it does not complete POST, you must connect a Sun keyboard to the keyboard connector to enable these recovery functions.

To activate these recovery functions:

1. Start with power off.
2. Press and hold the Stop key and the action key simultaneously.
3. Apply power to the system while continuing to hold the keys down until the keyboard LEDs flash.

The key combinations and functions available are:

Stop-F    Forces I/O to ttya. Enters Forth command mode on ttya before probing hardware. Use fexit to continue probing hardware.
Stop-N    Resets NVRAM contents to default values.
Stop-D    Sets the diag-switch? parameter to true and enables verbose output during POST.

The OBP User Interface
The OBP user interface is based on an interactive command interpreter that gives you access to an extensive set of functions for hardware and software development, fault isolation, and debugging.

You can enter the OpenBoot environment, that is, get to the ok prompt, in the following ways:

- Shut down the operating system:
  # shutdown -y -g0 -i0
- Execute the Stop-A keystroke sequence. You will sometimes see Stop-A referred to as L1-A.
- Press the reset switch on systems equipped with one (not recommended unless absolutely necessary).
- Power-cycle the system (also not recommended).

Note: A reset gets you to the OpenBoot user interface (the ok prompt) only if the OBP parameter auto-boot? is set to false.

System Testing Commands
The Open Boot PROM contains many commands used to test the system hardware.

test-all
Tests all devices that have built-in self-test methods. Testing starts with the current device node, or the specified device, and includes all children.

test device-specifier
Tests the specified device. The NVRAM diag-switch? parameter and the front panel keyswitch control the verbosity and depth of the test command.

Caution: After you enter an OpenBoot probe command, a WARNING message is displayed. It informs you that if the operating system has been running, you must type the reset-all command before you probe anything. Failure to do this causes the system to hang (lock up).

probe-scsi
Identifies devices attached to the (primary) SCSI bus.

probe-scsi-all
Identifies devices attached to all SCSI host adapters on all system boards.

probe-fcal-all
Identifies devices within the E3500 on the FC-AL loops.

watch-clock
Tests the clock function.

watch-net
Monitors the network connection.

probe-net-all
Monitors all network connections of built-in and plugged-in networking cards.

Informational Commands
Some OpenBoot commands provide information about the system components, including their contents if applicable.

banner
Displays the power-on banner.

.enet-addr
Displays the current Ethernet address.

.idprom
Displays the ID PROM contents.

.traps
Displays a list of SPARC trap types.

.version
Displays the PROM version for all the boards in the system.

The Device Tree
Devices are attached to a host computer through a hierarchy of interconnected buses. OpenBoot represents the interconnected buses and their attached devices as a tree of nodes. Such a tree is called the device tree. A node representing the host computer's main physical address bus forms the tree's root node.

The physical address generally represents a physical characteristic unique to the device (such as the bus address or the slot number where the device is installed). The use of physical addresses to identify devices prevents device addresses from changing when other devices are installed or removed.

Note: The system generates the device tree structure after POST and passes it to memory. It is this structure that maps low-level addresses to high-level addresses. For example:

/sbus@3,0/SUNW,fas@3,8800000/sd@0,0 maps to /dev/dsk/c0t0d0s0
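To make the structure of a device path concrete, the sketch below splits a path into its nodes. It is an illustration written for these notes, not an OBP facility. The function name is hypothetical, and it assumes the usual name@unit-address:arguments layout of each path component.

```python
def parse_device_path(path):
    """Split a device path into (name, unit_address, arguments) tuples.

    Assumes each component looks like name@unit-address:arguments, where
    the "@unit-address" and ":arguments" parts are both optional.
    """
    stripped = path.strip("/")
    if not stripped:
        return []  # the root node "/" has no components
    nodes = []
    for component in stripped.split("/"):
        # Optional device arguments follow a colon, as in "sd@6,0:f".
        component, _, arguments = component.partition(":")
        # The node name precedes "@"; the unit address follows it.
        name, _, unit_address = component.partition("@")
        nodes.append((name, unit_address, arguments))
    return nodes


if __name__ == "__main__":
    for node in parse_device_path("/sbus@3,0/SUNW,fas@3,8800000/sd@0,0"):
        print(node)
```

Each tuple corresponds to one level of the tree, so the last tuple describes the device itself and the earlier ones describe the buses it hangs from.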

Typical Device Tree
OpenBoot deals directly with hardware devices in the system. Each device has a unique name representing the type of device and where that device is located in the system addressing structure. The following example shows a typical device tree.
Figure 6-1 Typical Device Tree (diagram; node names shown: machine, central, fhc, ac, eeprom, memory, sbus, cpu-module, upa, flashprom, SUNW,hme, SUNW,fas, SUNW,socal, ethernet, scsi-disk, zs, scsi-tape, ssd, clock-board, sf)

Displaying the Device Tree
You can browse the device tree to examine and modify individual device tree nodes. The device tree browsing commands are similar to the Solaris commands for changing the working directory (cd), listing its contents (ls), and displaying its name (pwd). Selecting a device node makes it the current node.

Table 6-1 Commands for Browsing the Device Tree

Command                    Description
.properties                Displays the names and values of the current node's properties.
dev device-path            Chooses the indicated device node, making it the current node.
dev node-name              Searches for a node with the given name in the subtree below the current node, and chooses the first such node found.
dev ..                     Chooses the device node that is the parent of the current node.
dev /                      Chooses the root machine node.
device-end                 Exits the device tree.
ls                         Displays the names of the current node's children.
pwd                        Displays the device path name that names the current node.
show-devs [device-path]    Displays all the devices directly beneath the specified device in the device tree. Used by itself, show-devs shows the entire device tree.

Using the .properties Command
The .properties command displays the names and values of all the properties in the current node:

ok dev /zs@1,f0000000
ok .properties
address             ffee9000
port-b-ignore-cd
port-a-ignore-cd
keyboard
device_type         serial
slave               00000001
intr                0000000c 00000000
interrupts          0000000c
reg                 00000001 0000000 00000008
name                zs
ok

Using the dev Command
The dev command sets the current node to the named node so you can view its contents. For example, to make the ACME company's SBus device named ACME,widget the current node:

ok dev /sbus/ACME,widget

The find-device command is identical to the dev command, differing only in the way the input pathname is passed:

ok /sbus/ACME,widget find-device

Note: After choosing a device node with dev or find-device, you usually cannot execute that node's methods, because dev does not establish the current instance. For a detailed explanation of this issue, refer to Writing FCode 3.x Programs, part number 802-3239-10.

Listing System Devices
The show-devs command displays a listing of all devices currently available in the system. If a device has been added to a disable list (discussed in the next section) but the system has not been reset or gone through a POST, the device still shows up on the dev report.

A device can be physically installed in the system chassis but not show up on the following report because it is listed on the disabled-board-list. You must remove the entry from the disabled-board-list after the board has been replaced. You must then do a system reset to enable POST and OBP to add the device back to the dev listing.

The following device listing is from a Sun Enterprise 4000.

ok show-devs
/SUNW,ffb@2,0
/counter-timer@7,3c00
/sbus@7,0
/counter-timer@6,3c00
/fhc@6,f8800000
/sbus@6,0
/counter-timer@3,3c00
/sbus@3,0
/fhc@2,f8800000
/SUNW,UltraSPARC@5,0
/SUNW,UltraSPARC@4,0
/fhc@4,f8800000
/SUNW,UltraSPARC@1,0
/SUNW,UltraSPARC@0,0
/fhc@0,f8800000
/central@1f,0
/virtual-memory
/memory@0,0
/aliases
/options
/chosen
/openprom
/packages
/sbus@7,0/SUNW,fas@3,8800000
/sbus@7,0/SUNW,hme@3,8c00000
/sbus@7,0/SUNW,fas@3,8800000/st
/sbus@7,0/SUNW,fas@3,8800000/sd
/fhc@6,f8800000/sbus-speed@0,500000
/fhc@6,f8800000/eeprom@0,300000
/fhc@6,f8800000/flashprom@0,0
/fhc@6,f8800000/environment@0,400000
/fhc@6,f8800000/ac@0,1000000
/sbus@6,0/SUNW,soc@d,10000
/sbus@3,0/SUNW,fas@3,8800000
/sbus@3,0/SUNW,hme@3,8c00000
/sbus@3,0/SUNW,soc@d,10000
/sbus@3,0/SUNW,fas@3,8800000/st
/sbus@3,0/SUNW,fas@3,8800000/sd
/sbus@3,0/SUNW,soc@d,10000/SUNW,pln@a0000000,78c0c9
/sbus@3,0/SUNW,soc@d,10000/SUNW,pln@a0000000,78c0c9/SUNW,ssd
/fhc@2,f8800000/sbus-speed@0,500000
/fhc@2,f8800000/eeprom@0,300000
/fhc@2,f8800000/flashprom@0,0
/fhc@2,f8800000/environment@0,400000
/fhc@2,f8800000/ac@0,1000000
/fhc@4,f8800000/flashprom@0,0
/fhc@4,f8800000/sram@0,200000
/fhc@4,f8800000/environment@0,400000
/fhc@4,f8800000/simm-status@0,600000
/fhc@4,f8800000/ac@0,1000000
/fhc@0,f8800000/flashprom@0,0
/fhc@0,f8800000/sram@0,200000
/fhc@0,f8800000/environment@0,400000
/fhc@0,f8800000/simm-status@0,600000
/fhc@0,f8800000/ac@0,1000000
/central@1f,0/fhc@0,f8800000
/central@1f,0/fhc@0,f8800000/clock-board@0,900000
/central@1f,0/fhc@0,f8800000/zs@0,904000
/central@1f,0/fhc@0,f8800000/zs@0,902000
/central@1f,0/fhc@0,f8800000/eeprom@0,908000
/openprom/client-services
/packages/disk-label
/packages/obp-tftp
/packages/deblocker
/packages/terminal-emulator
ok
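A flat show-devs listing like this one is easier to read when it is grouped by parent node. The Python sketch below shows one way to find the direct children of a node from a captured listing, roughly what the ls command reports for the current node. It is an offline illustration only, and the function name is hypothetical.

```python
def children_of(parent, device_paths):
    """Return the sorted names of the nodes directly below parent.

    device_paths is a list of full paths as printed by show-devs.
    """
    prefix = parent.rstrip("/") + "/"
    children = set()
    for path in device_paths:
        if path.startswith(prefix) and len(path) > len(prefix):
            # Keep only the first path component below the parent.
            children.add(path[len(prefix):].split("/")[0])
    return sorted(children)


if __name__ == "__main__":
    listing = [
        "/sbus@7,0",
        "/sbus@7,0/SUNW,fas@3,8800000",
        "/sbus@7,0/SUNW,hme@3,8c00000",
        "/sbus@7,0/SUNW,fas@3,8800000/st",
        "/sbus@7,0/SUNW,fas@3,8800000/sd",
    ]
    print(children_of("/sbus@7,0", listing))
```

Passing "/" as the parent returns the top-level nodes of the listing.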


Caution: If you boot the operating system, exit from the operating system into OpenBoot without resetting the system, and then use some OpenBoot commands, the commands might not work as expected. In this case, you might have to power cycle the system to restore normal operation.

For example, suppose you boot the operating system, exit to OpenBoot, then execute the probe-scsi command. You find that probe-scsi fails, hangs the system, and you cannot resume (ok go) the operating system. To regain control of the system, you must perform a hardware reset (power cycle or reset switch).

The correct method for executing OpenBoot probe commands is to reset the system before entering the command. You must type reset-all as the first OBP command, then invoke the desired probe command, as shown:

ok reset-all
ok probe-scsi-all

sifting Command
sifting acts very much like the UNIX grep command. If you have a command you wish to run and you can't remember the syntax, type:

ok sifting test

Displaying Device Aliases
The devalias command prints a listing of shortcuts or nicknames for long device addresses. The system has no trouble remembering long device addresses, but humans do, so the device alias list was created. You should be familiar with one or two of these aliases, such as disk and cdrom, because you have used both of these to boot the system. You can always use the entire device path at the ok prompt when booting.

Systems usually have predefined device aliases for the most commonly used devices, such as the following listing taken from a Sun Enterprise 3500.

ok devalias
disk         /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
disksocal    /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
disk         /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
diskbrd      /sbus@3,0/SUNW,fas@3,8800000/sd@a,0
diskisp      /sbus@3,0/QLGC,isp@0,10000/sd@0,0
net          /sbus@3,0/SUNW,hme@3,8c00000
cdrom        /sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f
tape         /sbus@3,0/SUNW,fas@3,8800000/st@4,0
scsi         /sbus@3,0/SUNW,fas@3,8800000
disk0        /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
disk1        /sbus@3,0/SUNW,fas@3,8800000/sd@1,0
disk2        /sbus@3,0/SUNW,fas@3,8800000/sd@2,0
disk3        /sbus@3,0/SUNW,fas@3,8800000/sd@3,0
disk4        /sbus@3,0/SUNW,fas@3,8800000/sd@4,0
disk5        /sbus@3,0/SUNW,fas@3,8800000/sd@5,0
tape0        /sbus@3,0/SUNW,fas@3,8800000/st@4,0
tape1        /sbus@3,0/SUNW,fas@3,8800000/st@5,0
ttya         /central/fhc/zs@0,902000:a
ttyb         /central/fhc/zs@0,902000:b
keyboard     /central/fhc/zs@0,904000
keyboard!    /central/fhc/zs@0,904000:forcemode
name         aliases
ok

Device Alias Commands
A device alias, or simply alias, is a shorthand representation of a device path. For example, the boot disk, partition a, can be aliased as disk, which represents the complete device path name to the boot disk drive. The devalias commands are used to examine, create, and change aliases.

Table 6-2 Device Alias Commands

Command                       Description
devalias                      Displays all current device aliases.
devalias alias                Displays the device path name corresponding to alias.
devalias alias device-path    Creates and defines an alias representing device-path. If an alias with the same name already exists, the new value supersedes the old.

Caution: User-defined aliases are lost after a system reset or power cycle. To create permanent aliases, use the nvalias command.

ok devalias disk
disk /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
ok devalias disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
ok devalias disk
disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
ok

This changed the default boot disk from one in a storage subsystem connected to a GBIC (socal@d) to a local disk on a fast SCSI SBus card.
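The behaviour of an alias table can be modelled with an ordinary dictionary. The sketch below is a simplified Python model, not OBP code. The function name is hypothetical, and it assumes an alias may stand for the leading part of a path, with anything after the first slash appended unchanged.

```python
def resolve_device(specifier, aliases):
    """Expand a leading alias in a device specifier.

    Full paths (starting with "/") pass through unchanged. This is a
    simplified model: the text before the first "/" is looked up as an
    alias and the remainder of the specifier is appended as-is.
    """
    if specifier.startswith("/"):
        return specifier  # already a full device path
    head, slash, tail = specifier.partition("/")
    if head not in aliases:
        raise KeyError("unknown device alias: " + head)
    return aliases[head] + slash + tail


if __name__ == "__main__":
    aliases = {"disk": "/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0"}
    print(resolve_device("disk", aliases))
    # Redefining the alias supersedes the old value, as devalias does.
    aliases["disk"] = "/sbus@3,0/SUNW,fas@3,8800000/sd@0,0"
    print(resolve_device("disk", aliases))
```

Because the dictionary lives only in memory, it also mirrors the caution above: a devalias definition lasts only until the next reset, whereas nvalias stores it in nvramrc.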

nvalias command
An easy method of setting up an alias is to use the show-disks command.

Example
We will set up a boot device on the first disk on a disk board located in slot 3.

{0} ok show-disks
a) /sbus@7,0/SUNW,fas@3,8800000/sd
b) /sbus@3,0/SUNW,fas@3,8800000/sd
q) NO SELECTION
Enter Selection, q to quit: a
/sbus@7,0/SUNW,fas@3,8800000/sd has been selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
for creating devalias mydev for /sbus@7,0/SUNW,fas@3,8800000/sd
{0} ok nvalias bootdisk CTRL-Y

Pressing Control-Y here inserts /sbus@7,0/SUNW,fas@3,8800000/sd. You must then add @a,0 to complete the path.

To set the boot device, the boot-device NVRAM parameter must be changed:

ok setenv boot-device bootdisk
ok reset

Open Boot PROM Commands for the NVRAM
Whenever you are not sure of the correct command or what the command is used for, you can ask for help. The OBP displays a listing of the command categories available.

ok help
Main categories are:
Repeated loops
Defining new commands
Numeric output
Radix (number base conversions)
Arithmetic
Memory access
Line editor
System and boot configuration parameters
Select I/O devices
Floppy eject
Power on reset
Diag (diagnostic routines)
Resume execution
File download and boot
nvramrc (making new commands permanent)
Enable/Disable selected hardware subsystems
Environmental monitor

After listing and selecting a command you think might be the one you want, you can ask for help on that one command. Type help command-name or help category-name for more specific help.

Note: Use ONLY the first word of a category-name or category description. For example, type help select:

ok help select

OBP Commands for displaying and changing the NVRAM Parameters
The printenv Command
The printenv command displays NVRAM parameter names, current values, and default values. The following is a listing of current parameter names. Each system type and model can have different parameters available. Desktops have one set; single main logic board (MLB) servers, such as the Sun Enterprise 250, have a different set; and multiple CPU board servers, such as the Sun Enterprise 5500, have another set of parameters.

To display the contents of the NVRAM, use the printenv command.

ok printenv
Variable Name           Value            Default Value
disabled-memory-list
disabled-board-list
configuration-policy    board            component
memory-interleave       max              max
diag-passes             1                1
diag-verbosity          0                0
diag-continue?          false            false
tpe-link-test?          true             true
scsi-initiator-id       7                7
keyboard-click?         false            false
keymap
ttyb-mode               9600,8,n,1,-     9600,8,n,1,-
ttya-mode               9600,8,n,1,-     9600,8,n,1,-
ttyb-rts-dtr-off        false            false
ttyb-ignore-cd          true             true
ttya-rts-dtr-off        false            false
ttya-ignore-cd          true             true
reboot-flag             false            false
reboot-posc             4294582272       0
reboot-posl             0                0
reboot-cmd              boot net -r
diag-level              min              min
env-monitor             enabled          enabled
#power-cycles           4
system-board-serial#    802F01F0
system-board-date       34cf6a6b
fcode-debug?            false            false
output-device           screen           screen
input-device            keyboard         keyboard
load-base               16384            16384
boot-command            boot             boot
auto-boot?              true             true
auto-boot-on-error?     false            false
watchdog-reboot?        false            false
diag-file
diag-device             net              net
boot-file
boot-device             net              disk net
local-mac-address?      false            false
ansi-terminal?          true             true
screen-#columns         80               80
screen-#rows            34               34
silent-mode?            false            false
use-nvramrc?            false            false
nvramrc
security-mode           none
security-password
security-#badlogins     0
oem-logo
oem-logo?               false            false
oem-banner
oem-banner?             false            false
hardware-revision
last-hardware-update
diag-switch?            false            false
ok

To show a specific parameter, for example the diag-switch? variable, type printenv and the variable name.

ok printenv diag-switch?
diag-switch? = true
ok

You can modify the values of the configuration variables, and any changes you make remain in effect even after a power cycle.

Caution: Configuration variables should be adjusted cautiously. These NVRAM variables determine the startup routine of the system, so an incorrect configuration can cause the system to operate in an unexpected manner.

To change a parameter, use the setenv command. To change the diagnostic switch:

ok setenv diag-switch? true

ok set-defaults

The set-defaults command restores the default setting of all parameters.

ok set-default variable

The set-default variable command resets the value of variable to the default setting.

General NVRAM Parameters
Below are the NVRAM parameters that apply to all Sun servers. The list is as compiled by the Solaris eeprom command.

Note: Not all OpenBoot systems support all parameters. Defaults may vary depending on the system and the PROM revision.

List of NVRAM Configuration Parameters

Variable                Typical Default       Description
auto-boot?              true                  If true, boot automatically after power-on or reset.
boot-command            boot                  Command executed if auto-boot? is true.
boot-device             disk net              Device from which to boot.
boot-file               empty string          File to boot. An empty string lets the secondary booter choose the default.
diag-device             net                   Diagnostic boot source device.
diag-file               empty string          File from which to boot in diagnostic mode.
diag-level              platform-dependent    Diagnostics level. Values include off, min, max, and menus.
diag-switch?            false                 If true, run in diagnostic mode.
fcode-debug?            false                 If true, includes name parameter for plug-in device FCodes.
hardware-revision       N/A                   System version information.
input-device            keyboard              Input device used at power-on (usually keyboard, ttya, or ttyb).
keyboard-click?         false                 If true, enable keyboard click.
last-hardware-update    N/A                   System update information.
local-mac-address?      false                 If true, network drivers use their own MAC address, not the system's.
nvramrc                 empty                 Contents of NVRAMRC.
output-device           screen                Output device used at power-on (usually screen, ttya, or ttyb).
screen-#columns         80                    Number of on-screen columns (characters/line).
screen-#rows            34                    Number of on-screen rows (lines).
scsi-initiator-id       7                     SCSI bus address of host adapter, range 0-7.
security-mode           none                  Firmware security level (options: none, command, or full). If set to command or full, the system prompts for the PROM security password.
security-password       N/A                   Firmware security password (never displayed). Can be set only when security-mode is set to command or full.
selftest-#megs          1                     Megabytes of RAM to test. Ignored if diag-switch? is true.
tpe-link-test?          true                  Enable 10baseT link test for built-in twisted pair Ethernet.
ttya-mode               9600,8,n,1,-          TTYA line discipline (baud rate, #bits, parity, #stop, handshake).
ttyb-mode               9600,8,n,1,-          TTYB line discipline (baud rate, #bits, parity, #stop, handshake).
ttya-ignore-cd          true                  If true, operating system ignores carrier-detect on TTYA.
ttyb-ignore-cd          true                  If true, operating system ignores carrier-detect on TTYB.
ttya-rts-dtr-off        false                 If true, operating system does not assert DTR and RTS on TTYA.
ttyb-rts-dtr-off        false                 If true, operating system does not assert DTR and RTS on TTYB.
use-nvramrc?            false                 If true, execute commands in NVRAMRC during system start-up.
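The ttya-mode and ttyb-mode values pack five fields into one comma-separated string: baud rate, data bits, parity, stop bits, and handshake. A value such as 9600,8,n,1,- uses - in the last field to mean no handshake. The Python sketch below unpacks such a string. The class and function names are hypothetical, and no validation of the individual field values is attempted.

```python
from typing import NamedTuple


class SerialMode(NamedTuple):
    baud: int
    data_bits: int
    parity: str
    stop_bits: str   # kept as text because "1.5" is a possible value
    handshake: str


def parse_tty_mode(value):
    """Split a ttya-mode/ttyb-mode string into its five fields."""
    fields = value.strip().split(",")
    if len(fields) != 5:
        raise ValueError("expected 5 comma-separated fields, got %d" % len(fields))
    baud, bits, parity, stop, handshake = fields
    return SerialMode(int(baud), int(bits), parity, stop, handshake)


if __name__ == "__main__":
    print(parse_tty_mode("9600,8,n,1,-"))
```

A helper like this is handy when comparing the serial settings reported by eeprom against what a terminal server is configured to use.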

Platform-specific NVRAM Commands
The OpenBoot PROM Version 3.x used in Sun Enterprise server systems includes additional parameters for managing the hardware. These new parameters include:

- disabled-board-list
  A list of boards, identified by system backplane slot number, to be disabled at boot. This example puts the boards in slots 4 and 6 in the NVRAM disabled-board-list parameter:

  ok setenv disabled-board-list 46

  To return disabled-board-list to its default value, type:

  ok set-default disabled-board-list

- disabled-memory-list (whole board at a time)
  A list of CPU boards whose memory is to be disabled and left unused by the operating system. The value (for example, 7a) identifies the CPU boards, here those in slots 7 and 10, containing the memory that is to be disabled. There is no way to disable individual memory banks at this time. The CPU modules, if any, on the board continue to operate normally. To disable the memory on the CPU boards in slots 7 and 10, type:

  ok setenv disabled-memory-list 7a

- memory-interleave
  Used to enable or disable memory interleaving. Values are min to disable memory interleaving and max to set the maximum possible memory interleaving.
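The two examples above (46 for slots 4 and 6, 7a for slots 7 and 10) indicate that each character of the value is one slot number written as a hexadecimal digit. The Python sketch below converts in both directions under that reading. The function names are hypothetical.

```python
def slots_from_list(value):
    """Decode a value such as "46" or "7a" into slot numbers.

    Each character is read as one hexadecimal digit, following the
    examples in the text (7a means slots 7 and 10).
    """
    return [int(digit, 16) for digit in value.strip()]


def slots_to_list(slots):
    """Encode slot numbers (0-15) back into the one-digit-per-slot form."""
    for slot in slots:
        if not 0 <= slot <= 15:
            raise ValueError("slot out of range: %r" % slot)
    return "".join(format(slot, "x") for slot in slots)


if __name__ == "__main__":
    print(slots_from_list("7a"))
    print(slots_to_list([4, 6]))
```

Decoding the value before you type it is a quick check that you are about to disable the boards you intend to.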

- configuration-policy
  Defines how the system handles devices when they fail POST. The values are component, board, or system. For example, if a SYSIO chip on an I/O board in slot 5 fails its self test, POST disables the entire board if the variable is set to board. POST disables only the SBus if the variable is set to component.

- sbus-probe-default
  sbus-probe-default d3120

  This variable defines the SBus device probe order on an I/O board, per SBus, where:

  d   = On-board SOC
  3   = On-board FEPS
  0-2 = SBus slots 0, 1, and 2

  On a Type 2 and a Type 5 I/O board, since there is only one SBus, the probe order is d 3 2 0 (there is no slot 1).

  To change the default probe order to 123d0, enter the following at the ok prompt:

  ok setenv sbus-probe-default 123d0

  Remember that this changes the default probe order for all boards in the system. You can also use this variable to skip an SBus slot: simply leave that slot out of the list of devices to probe. To change the probe order for a specific board, use the sbus-specific-probe variable.

- sbus-specific-probe
  This variable controls the SBus probe order on a given list of boards. To set the probe order to 320 on I/O board 4, enter the following at the ok prompt:

  ok setenv sbus-specific-probe 4:320

  The number preceding the colon is the slot number; the characters following it are the SBus device numbers in the desired probe order. All unlisted I/O boards in the system use the default probe order as defined by the sbus-probe-default NVRAM variable. Multiple boards can be defined by this variable as follows:

  ok setenv sbus-specific-probe 4:320 6:d3210
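The relationship between the per-board setting and the system-wide default can be sketched in a few lines of Python. This is an illustration of the lookup rule described above, not firmware code. The function names are hypothetical, and reading the slot number as a hexadecimal digit is an assumption carried over from the disabled-board-list examples.

```python
def parse_specific_probe(value):
    """Parse a value such as "4:320 6:d3210" into {slot: [device, ...]}.

    The slot number is ASSUMED to be hexadecimal, as in disabled-board-list.
    """
    orders = {}
    for entry in value.split():
        slot, colon, order = entry.partition(":")
        if not colon or not slot or not order:
            raise ValueError("malformed entry: %r" % entry)
        orders[int(slot, 16)] = list(order)
    return orders


def probe_order_for(slot, specific_value, default_value):
    """Return the probe order for a board: its specific entry, else the default."""
    specific = parse_specific_probe(specific_value)
    return specific.get(slot, list(default_value))


if __name__ == "__main__":
    print(probe_order_for(4, "4:320 6:d3210", "d3120"))
    print(probe_order_for(5, "4:320 6:d3210", "d3120"))
```

Board 4 gets its own order, while board 5, which is not listed, falls back to the sbus-probe-default value.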

Environmental Monitoring
Some of the functions of the OBP do not use inputs from a user. These functions are preprogrammed operations that start automatically after the system has booted. Some take input from the Solaris operating system and perform tasks as described in their initial configuration. These configurations might not be configurable by you or the operating system.

- ok disable-environmental-monitor
  Stops the monitoring of power supply status, board temperatures, and board hot plug while the screen displays the ok prompt.

- ok enable-environmental-monitor
  Starts monitoring power supply status, board temperatures, and board hot plug while the screen displays the ok prompt.

Note: The environmental-monitor function is enabled by default.

Console messages for environmental conditions appear as follows:

- PROM NOTICE: Overtemp detected on board <n>.
- PROM NOTICE: System has cooled down.
- PROM WARNING: Board <n> is too hot.
- PROM NOTICE: Insufficient power detected.
- PROM NOTICE: Power supply restored.
- PROM NOTICE: Board insert detected.
- PROM NOTICE: Reset Initiated...

If a board is over the predetermined temperature, the PROM sends a warning message to the console and performs a reset, which results in POST disabling the faulty board and the system rebooting the operating system. If insufficient power is detected and is not fixed within 30 seconds, the OBP initiates a reset to allow POST to deconfigure some of the boards according to the amount of available power.

NVRAM Security
The NVRAM system security variables are:

security-mode
Sets the firmware security level (options: none, command, or full). Default is none.

security-password
Sets the firmware security password (never displayed). No default.

security-#badlogins
Records the number of incorrect security password attempts. No default.

Caution: Do not set a password at the OBP level. Your customer may or may not wish to set one. If the customer does and then forgets the password, there is no way to recover back to a default.

NVRAMRC Editing Commands for the NVRAM
The script editor, nvedit, lets you create and modify the nvramrc script using the commands listed in Table 6-3.

Table 6-3 NVRAMRC Script Editor Commands

Command                      Description
nvalias alias device-path    Stores the command devalias alias device-path in the script. The alias persists until either nvunalias or set-defaults is executed.
$nvalias                     Performs the same function as nvalias, except that it takes its arguments, name-string device-string, from the stack.
nvedit                       Enters the script editor. If data remains in the temporary buffer from a previous nvedit session, resumes editing those previous contents. If not, reads the contents of nvramrc into the temporary buffer and begins editing it.
nvquit                       Discards the contents of the temporary buffer, without writing it to nvramrc. Prompts for confirmation.
nvrecover                    Recovers the contents of nvramrc if they have been lost as a result of the execution of set-defaults; then enters the editor as with nvedit. nvrecover fails if nvedit is executed between the time that the nvramrc contents were lost and the time that nvrecover is executed.
nvrun                        Executes the contents of the temporary buffer.
nvstore                      Copies the contents of the temporary buffer to nvramrc; discards the contents of the temporary buffer.
nvunalias alias              Deletes the specified alias from nvramrc.
$nvunalias                   Performs the same function as nvunalias, except that it takes its argument, name-string, from the stack.

NVRAM Command Precautions
There are two commands you should understand, along with their implications:

- set-defaults and the escape hatch Stop-N
  Sets all NVRAM variables to the default values.

Note: The key switch in the secure position inhibits Stop key functions.

- use-nvramrc? set to false
  Clears the nvramrc memory location.

If any device aliases had been set, they would have been in nvramrc, along with any other tests or code required to execute during POST and boot. The nvrecover command can restore the contents, provided you do not run the nvstore command after you type the set-defaults command. If the nvstore command was run, the contents of the nvramrc memory area are not recoverable. This is one more reason why it is important to write down the contents of the nvramrc before attempting any changes to it.

Updating Flash PROM and FCode
Do you need to update?
At the ok prompt, type .version. The banner command gives the OBP revision but not the FCode revisions.

    ok .version
    Slot 1 - I/O Type 4    FCODE 1.8.7 1997/12/08 15:39    iPOST 3.4.4 1997/08/26 17:37
    Slot 3 - I/O Type 3    FCODE 1.8.7 1997/05/09 11:18    iPOST 3.0.2 1997/05/01 10:56
    Slot 7 - I/O Type 1    FCODE 1.8.3 1997/11/14 12:41    iPOST 3.4.6 1998/04/16 14:22
    Slot 9 - CPU/Memory    OBP 3.2.16 1998/06/08 16:58     POST 3.9.4 1998/06/09 16:25

You can use the .properties command to display the CPU/Memory Board Flash PROM revision in hexadecimal ASCII, but this is a long way round to get to the information above. It is included to demonstrate how the flash PROMs connect to the fhc, also known as the fire hose controller or bootbus controller.

Note: Remember that the show-devs command lists all the devices in the OpenBoot device tree, which you need for the following commands.

    ok cd /fhc@12,f8800000/flashprom@0,0
    ok .properties
    version    4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39
    model      SUNW,525-1431
    name       flashprom

Note: 4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39 is the hex code for OBP 3.2.16 1999
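The hex-to-text conversion in the Note can be checked off-line. The following is a minimal Python sketch, not part of the OpenBoot tooling; the function name decode_prom_version is hypothetical and chosen only for illustration.

```python
def decode_prom_version(hex_dump: str) -> str:
    """Turn a space-separated hex dump into the ASCII text it encodes."""
    # bytes.fromhex() skips the spaces between the byte pairs.
    return bytes.fromhex(hex_dump).decode("ascii")


# Value copied from the version property shown by .properties above.
raw = "4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39"
text = decode_prom_version(raw)

# The PROM pads the name with extra spaces (the three 20 bytes);
# collapse them so the result reads like the Note.
print(" ".join(text.split()))
```

Each pair of hex digits is one ASCII character, so 4f 42 50 is "OBP" and every 20 is a space.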

    ok cd /fhc@e,f8800000/flashprom@0,0
    ok .properties
    version    46 43 4f 44 45 20 31 2e 38 2e 33 20 31 39 39 37
    model      SUNW,525-1432
    name       flashprom

Note: 46 43 4f 44 45 20 31 2e 38 2e 33 20 31 39 39 37 = FCODE 1.8.3 1997

Use the .properties command to display the I/O Board SOC Controller FCode revision.

    ok cd /sbus@2,0/SUNW,soc@d,10000
    ok .properties
    soc-fcode    1.3 95/09/28
    model        501-2069
    name         SUNW,soc

Use the .properties command to display the I/O Board SOC+ Controller FCode revision.

    ok cd /sbus@2,0/SUNW,socal@d,10000
    ok .properties
    version      @(#) FCode 1.11 97/12/07
    model        501-3060
    name         SUNW,socal

Checking version under UNIX


At the UNIX prompt, you can obtain the OBP revision level using:

    # prtconf -V
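If you collect revision levels from several machines, it can help to pull the fields out of the revision string with a script. The Python sketch below is only an illustration: it assumes the string has the same shape as the entries printed by .version (a PROM name, a dotted revision, a date and a time), and the names REVISION and parse_revisions are hypothetical.

```python
import re

# One revision entry: a PROM name, a dotted revision, then a date and a time.
# This layout is an assumption based on the .version listing in this module.
REVISION = re.compile(
    r"\b(?P<name>OBP|POST|FCODE|iPOST)\s+"
    r"(?P<rev>\d+(?:\.\d+)+)\s+"
    r"(?P<date>\d{4}/\d{1,2}/\d{1,2})\s+(?P<time>\d{1,2}:\d{2})"
)


def parse_revisions(text: str) -> dict:
    """Map each PROM name found in text to its (revision, date, time)."""
    return {
        m.group("name"): (m.group("rev"), m.group("date"), m.group("time"))
        for m in REVISION.finditer(text)
    }


# Sample entry in the format shown for the CPU/Memory board.
sample = "OBP 3.2.16 1998/06/08 16:58 POST 3.9.4 1998/06/09 16:25"
print(parse_revisions(sample))
```

Text that does not match the assumed layout simply produces an empty dictionary, so check the result before relying on it.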

Where do I obtain the latest revisions?
At the time of this writing, patch 103346-24 updates the OBP to 3.2.24. The patch is available on the Sunsolve CD and from sunsolve.sun.com. Both the Flash PROM and the FCode updates are included in this patch.
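Deciding whether an update is needed comes down to comparing the installed revision with the one in the patch. Revisions must be compared number by number, not as text, otherwise 3.2.9 would sort after 3.2.16. A minimal Python sketch follows; the function names are hypothetical and it assumes revisions contain only dot-separated numbers.

```python
def revision_tuple(rev: str) -> tuple:
    """Split a dotted revision such as '3.2.16' into a tuple of integers."""
    return tuple(int(part) for part in rev.split("."))


def needs_update(installed: str, available: str) -> bool:
    """Return True when the available revision is newer than the installed one."""
    return revision_tuple(installed) < revision_tuple(available)


# Installed OBP 3.2.16 compared with the 3.2.24 delivered by the patch.
print(needs_update("3.2.16", "3.2.24"))
```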

Caution: You cannot use patchadd or installpatch to upgrade the Flash PROM and FCode. You must obtain the patch, uncompress it, and extract the files. You then use the Flash PROM programming utility to update the OpenBoot PROM on the CPU/Memory boards and the FCode on the I/O boards.

Example
    # zcat 103346-24.tar.Z | tar xvf -

or, for a gzip-compressed copy of the patch:

    # gzcat 103346-24.tar.gz | tar xvf -

The gzcat utility does not come as standard on Solaris 2.6 systems, but is available on the Sunsolve CD, under the directory /cdrom/cdrom0/gzip/bin/svr4

Procedure to update FlashPROM and FCode

Caution: As a consequence of the upgrade, the system's NVRAM configuration variables may be reset to their default values. If you have any custom NVRAM configuration, note it down before proceeding.

Change to the directory extracted in the previous step. The flash update is performed by running the UNIX programme within that directory.

    # cd 103346-24
    # ./flash-update-<latest-rev>

    Generating flashprom driver...
    Generating SUNW,Ultra-Enterprise flash-update program...
    Current System Board PROM Revisions:
    ---------------------------------------------------------
    Board 0: cpu OBP 3.2.23 1999/10/01 10:07 POST 3.9.23 1999/10/01 17:54
    Board 2: cpu OBP 3.2.23 1999/10/01 10:07 POST 3.9.23 1999/10/01 17:54
    Board 1: Dual SBus + IO Board FCODE 1.8.23 1999/10/01 10:07 iPOST 3.4.23 1996/03/16 17:55
    Board 3: Dual PCI IO Board FCODE 1.8.23 1999/10/01 10:07 iPOST 3.0.23 1999/10/01 17:55
    Available Update Revisions:
    -----------------------------------------
    CPU/Memory Board: OBP 3.2.24 1999/12/23 17:31 POST 3.9.24 1999/12/23 17:35
    IO Graphics Board:

    I/O Type 2 FCODE 1.8.24 1999/12/23 17:29 iPOST 3.4.24 1999/12/23 17:34
    IO Graphics + Board: I/O Type 5 FCODE 1.8.24 1999/12/23 17:34 iPOST 3.4.24 1999/12/23 17:34
    Dual Sbus IO Board: I/O Type 1 FCODE 1.8.24 1999/12/23 17:29 iPOST 3.4.24 1999/12/23 17:34
    Dual Sbus + IO Board: I/O Type 4 FCODE 1.8.24 1999/12/23 17:30 iPOST 3.4.24 1999/12/23 17:34
    Dual PCI IO Board: I/O Type 3 FCODE 1.8.24 1999/12/23 17:30 iPOST 3.0.24 1999/12/23 17:34
    Verifying Checksums: Okay
    Do you wish to flash update your firmware? y/[n]: y
    Are you sure? y/[n]: y

    Updating Board 0: Type cpu
    1 Erasing ... Done.
    1 Verifying Erase... Done.
    1 Programming... Done.
    1 Verifying Program... Done.
    Updating Board 2: Type cpu
    1 Erasing... Done.
    1 Verifying Erase... Done.
    1 Programming... Done.
    1 Verifying Program... Done.

    Updating Board 1: Type dual-sbus
    1 Erasing... Done.
    1 Verifying Erase... Done.
    1 Programming... Done.
    1 Verifying Program... Done.
    Updating Board 3: Type upa-sbus
    1 Erasing... Done.
    1 Verifying Erase... Done.
    1 Programming... Done.
    1 Verifying Program... Done.
    #

NOTE: The flash PROMs are write protected by either of the following two conditions:

a) Front panel key switch in secure mode.
b) Jumper (P601) removed on clock board.

At the time of writing this document, systems are shipped with the jumper on the clock board installed. This means that only the front panel key switch being in the secure position write protects the PROMs. If the PROMs are detected to be write protected, the flash update process will fail with the following message:

    FPROM Write Protected: Check Write Enable Jumper or Front Panel Key Switch.
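The two write-protect conditions in the NOTE combine as a simple OR. The following Python sketch only restates that rule as a pre-flight check; the function and parameter names are hypothetical and nothing here reads the real hardware state.

```python
def flash_prom_write_protected(keyswitch_secure: bool, p601_jumper_installed: bool) -> bool:
    """Apply the rule from the NOTE: either condition write protects the PROMs.

    keyswitch_secure      -- front panel key switch is in the secure position
    p601_jumper_installed -- jumper P601 is fitted on the clock board
    """
    return keyswitch_secure or not p601_jumper_installed


# As shipped (jumper installed), only the key switch position matters.
print(flash_prom_write_protected(keyswitch_secure=False, p601_jumper_installed=True))
```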

Caution: If there is a power failure while the flash PROMs are being upgraded, you need to follow the recovery steps described under Correcting a Faulty Flash PROM.

Correcting a Faulty Flash PROM
You will have a problem if you lose power in the middle of a flash PROM update. If the system has only one CPU/Memory board, you may need to replace it. If there are two CPU/Memory boards, however, there are a number of options for recovery.

update-proms
Assuming the system gets to the ok prompt, there will be a message stating that xxxxx. Synchronize all Flash PROMs of the same board type to the most current level available in the system by typing:

    ok update-proms

prom-copy
You can copy the contents of one I/O board's Flash PROM to another I/O board, for example from the board in slot 3 to the board in slot 9. To do this, type:

    ok prom-copy 3 9

Correcting a Faulty Flash PROM - Updating within Extended POST
You can reprogram a corrupted PROM if another board of the same type with uncorrupted code is available. Refer to the Flash PROM Programming Guide, 805-5579, for more information.

To reprogram a faulty Flash PROM:

1. Connect an ASCII terminal to Serial Port A.
2. Remove the board with corrupted code from the backplane.
3. Install a known good board in any available slot.
4. Turn the keyswitch to On.
5. Wait 15 seconds and press s to enter Extended POST.
6. Select f for fcopy from the Extended POST Menus.
7. Insert the board with corrupted code into the backplane (the board is hot-pluggable).
8. Select 4 for Activate System Board and follow the instructions.
9. Select 1 to copy the code and follow the instructions.
10. Turn the keyswitch to Standby.

Synchronizing NVRAM/TOD chips
The NVRAM/TOD chips on the Clock board and on all I/O boards contain the same information, including the NVRAM environmental variables and configuration settings. The master NVRAM/TOD parameters are kept on the NVRAM chip on the Clock board. On occasion, you will see a message at the ok prompt stating:

    Clock TOD does not match any I/O board

This means that the NVRAM/TOD chip on the Clock board and the chips on the I/O boards have got out of step. Figure 6-2 illustrates how to recover a corrupted TOD Clock value.

Figure 6-2 NVRAM/TOD Contents Can Be Copied Automatically or Manually From One Source to Another

This happens, for example, when a new I/O board is fitted. To correct the time of day, copy the correct information from the clock board to the I/O boards.

    ok copy-clock-tod-to-io-boards

Correcting a Corrupted NVRAM/TOD

It could happen that the master chip gets corrupted. If this happens, copy the contents from an I/O board with the correct data to the clock board TOD chip.

    ok (ioboard# in hex) copy-io-board-tod-to-clock-tod

In this example the correct data is on the I/O board in slot three.

    ok 3 copy-io-board-tod-to-clock-tod
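Because the board number must be entered in hex, slots 10 and above are easy to mistype. The Python sketch below just formats the command line for a given decimal slot number. The helper name tod_copy_command is hypothetical, and the 0 to 15 range check is an assumption about the slot numbering rather than something stated in this module.

```python
def tod_copy_command(io_board_slot: int) -> str:
    """Build the ok-prompt command that copies an I/O board's TOD to the clock board."""
    # Assumed valid range for system board slots; adjust for your chassis.
    if not 0 <= io_board_slot <= 15:
        raise ValueError(f"slot {io_board_slot} is outside the assumed range 0-15")
    # The board number is given to the command in hex, without a prefix.
    return f"{io_board_slot:x} copy-io-board-tod-to-clock-tod"


print(tod_copy_command(3))
print(tod_copy_command(10))
```

For slot 3 the decimal and hex forms are the same; for slot 10 the command takes a instead of 10.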


Power On Self Test (POST)

Introducing POST
Always runs after a reset
The Sun Enterprise servers always execute the power on self tests (POST) at power up and whenever a system reset is initiated. The POST initializes all of the hardware devices before OBP starts booting the operating system. The POST also identifies new boards that have been installed in the system and makes them available to the OBP and the system.

Checks the environment


Once POST is complete, the OpenBoot PROM environmental monitoring process checks the temperature sensors in the system to detect any overheating. If the temperature sensed is above the predefined level, a warning message is written to the system console. If the temperature sensed exceeds a higher predefined level, the OBP disables the board and places it into low power mode.
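The two-level temperature policy described above can be sketched as follows. This Python fragment is an illustration only: the threshold values are invented placeholders (the real predefined levels are not given in this module), and the names are hypothetical.

```python
# Placeholder thresholds in degrees Celsius. These are NOT the firmware's
# real values, which this module does not state.
WARNING_LEVEL_C = 60.0
SHUTDOWN_LEVEL_C = 75.0


def temperature_action(temp_c: float,
                       warning_c: float = WARNING_LEVEL_C,
                       shutdown_c: float = SHUTDOWN_LEVEL_C) -> str:
    """Return the action the monitoring process would take for one reading."""
    if temp_c > shutdown_c:
        # Higher level exceeded: board is disabled and put into low power mode.
        return "disable board, low power mode"
    if temp_c > warning_c:
        # Lower level exceeded: warning written to the system console.
        return "console warning"
    return "no action"


for reading in (45.0, 65.0, 80.0):
    print(reading, temperature_action(reading))
```

The higher threshold is tested first so that a very hot board is disabled rather than merely reported.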

POST Output on Serial Port A


To service Exx00 servers effectively, there must be either a VT100-type terminal connected to ttya or a tip session from another system.

POST resides on each system board


POST resides in the OBP on each CPU/Memory+ board.

POST sets LED indicators


POST controls the status LEDs on the system front panel and all boards. Only POST can configure the system hardware at power up, and only POST can enable hot-pluggable boards (if DR and AP are not present and operating).

Level of testing
- Over 90 percent of system board interconnects
- Over 80 percent of each system board's ASICs
- Identify 95 percent of detectable faults to FRU level

Performance
- Runtime should be less than 90 seconds (diag-level set to minimum)
- Code size should be less than 256 Kbytes for CPU boards
- Code size should be less than 64 Kbytes for I/O boards

Coverage
POST is designed to test just about everything that is internal to the system and the system boards. POST tests the following:

- CPU modules and caches
- System board ASICs (DC, AC, and FHC)
- Busses (SBus, UPA, centerplane, boot-bus)
- I/O ASICs (Sysio, FEPS, SOC)
- Clock board and console bus devices (NVRAM, TOD, EEPROM)
- DIMMS
    - Environmental Sensors

What POST doesn't cover

POST will not test SBus cards or PCI cards. In fact, there is a jumper on the PCI riser 501-8888 to enable or disable JTAG. Disable it or POST may hang.

Bootbus Controller
Otherwise known as the fhc (fire hose controller). Each board in the system has an fhc, which connects to a bootbus running on the Gigaplane and to various on-board ASICs, including the SRAM and temperature sensors. The purpose of the bootbus is twofold: it passes the POST data around the system, and it is used by the clock board to pass NVRAM parameters to the CPU/Memory boards. Also connected to the fhc is the JTAG scan controller.

JTAG
JTAG is a 4-wire connection between various ASICs in the system. The specification was developed by the Joint Test Action Group, from which it takes its name, and is defined by IEEE 1149.1. Its purpose is to pass POST information between boards and ASICs, assuming the ASICs are JTAG compliant.

Warning: Not all ASICs in the system are JTAG compliant, and certainly not the ASICs on the PCI cards plugged into a Type 3 I/O board. Set the JTAG jumper on the PCI riser appropriately. For details regarding JTAG specs, scan rings, and so on, refer to http://solutions.sun.com/embedded/databook/pdf/whitepapers/WPR-0018-01.pdf

POST Master
After a power-on reset (POR), each CPU module checks itself, its cache, and its Gigaplane interface using JTAG loops via the bootbus. POST runs in SRAM on each board. The first CPU that passes is elected the POST Master, normally (0,0). Once the master has been determined, the CPU and the OBP on the master system board run the self-test routines for each I/O board. The master then sets the I/O board configuration parameters according to the resident firmware.

OBP Parameters
diag-switch?
    false    Diagnostic level determined by diag-level parameter
    true     Full (verbose) diagnostics run

diag-level
    min      Minimum diagnostics run
    max      Full (verbose) diagnostics run

Keyswitch Positions
    Normal power-on        Diagnostic level determined by diag-level parameter
    Diagnostic power-on    Full (verbose) diagnostics run

Note: The diag-switch? and diag-level parameters are not particularly useful on the Enterprise servers, since if you want to run full diagnostics, you can power on the system by turning the keyswitch to the diagnostic position.
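The way the keyswitch and the two OBP parameters combine can be written as a small decision function. This Python sketch restates the settings listed above; it is not firmware code, and the function name and the string values used for the keyswitch are hypothetical.

```python
def effective_diag_level(keyswitch: str, diag_switch: bool, diag_level: str) -> str:
    """Return 'min' or 'max' for the POST run that the settings select.

    keyswitch   -- 'normal' or 'diagnostic' (power-on position)
    diag_switch -- value of the diag-switch? parameter
    diag_level  -- value of the diag-level parameter, 'min' or 'max'
    """
    if keyswitch not in ("normal", "diagnostic"):
        raise ValueError(f"unknown keyswitch position: {keyswitch!r}")
    if diag_level not in ("min", "max"):
        raise ValueError(f"unknown diag-level: {diag_level!r}")
    # A diagnostic power-on or diag-switch? = true forces full diagnostics.
    if keyswitch == "diagnostic" or diag_switch:
        return "max"
    # Otherwise the diag-level parameter decides.
    return diag_level


print(effective_diag_level("normal", False, "min"))
print(effective_diag_level("diagnostic", False, "min"))
```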

Power on Self Test Overview
Sample Output
The following is an example of what you see if you have an ASCII terminal device connected to serial port ttya on the clock board of an Enterprise x000/x500 server. POST runs a complete and in-depth set of tests when the system keyswitch is set to the diagnostic position or the NVRAM parameter diag-switch? is set to true.

    Hardware Power ON POST COMPLETE
    7,0>
    7,0>@(#) POST 3.9.4 1998/06/09 16:25
    7,0> SelfTest Initializing (Diag Level 10, ENV 0000ff00) IMPL 0011 MASK 20
    7,0>Board 7 CPU FPROM Test
    7,0>Board 7 Basic CPU Test
    7,0> Set CPU UPA Config and Init SDB Data
    7,0> SRAM Mode = 22, Clock Mode = 4:1, PCON = 6fa, MCAP = 0
    7,0>Board 7 MMU Enable Test
    7,0> DMMU Init
    7,0> IMMU Init
    7,0> Mapping Selftest Enabling MMUs
    7,0>Board 7 Ecache Test
    7,0> Ecache Probe
    7,0> Ecache Tags
    7,0> Ecache Quick Verify
    7,0> Ecache Init
    7,0> Ecache RAM
    7,0> Ecache Address Line
    7,0> Configure Ecache Limit
    7,0>Ecache Size = 00400000, Limited to 00400000
    7,0>Board 7 FPU Functional Test
    7,0> FPU Enable
    7,0>Board 7 Board Master Select Test
    7,0> Selecting a Board Master
    7,0>Board 7 FireHose Devices Test
    7,0>Board 7 Address Controller Test
    7,0> AC Initialization
    7,0> AC DTAG Init

    7,0>Board 7 Dual Tags Test
    7,0> AC DTAG Init
    7,0>Board 7 FireHose Controller Test
    7,0> FHC Initialization
    7,0>Board 7 JTAG Test
    7,0> Verify System Board Scan Ring
    7,0>Board 7 Centerplane Test
    7,0> Centerplane Join
    7,0>Setting JTAG Master
    7,0>Clear JTAG Master
    7,0>Board 7 Setup Cache Size Test
    7,0> Setting Up Cache Size
    7,0>Board 7 System Master Select Test
    7,0> Setting System Master
    7,0>POST Master Selected (JTAG,CENTRAL)
    7,0>Board 16 Clock Board Test
    7,0> Clock Board Initialization
    7,0> Clock Board Temperature Check
    7,0>Board 16 Clock Board Serial Ports Test
    7,0>Board 16 NVRAM Devices Test
    7,0> M48T59 (TOD) Init
    7,0>Board 7 System Board Probe Test
    7,0> Probing all CPU/Memory BDA
    7,0> Probing System Boards
    7,0> Probing CPU Module JTAG Rings
    7,0>Setting System Clock Frequency
    7,0> CPU Module mid 14 Checked in OK (speed code = 4)
    7,0> CPU mid 18 Version=00170011.20000507
    7,0> CPU Module mid 18 Checked in OK (speed code = 4)
    7,0> CPU mid 19 Version=00170011.20000507
    7,0> CPU Module mid 19 Checked in OK (speed code = 4)
    7,0> ******** Clock Reset - retesting
    7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
    7,0>
    7,0>@(#) POST 3.9.4 1998/06/09 16:25
    7,0> SelfTest Initializing (Diag Level 40, ENV 0000ff80) IMPL 0011 MASK 20
    7,0>Board 7 CPU FPROM Test
    7,0> CPU/Memory Board FPROM Checksum Test
    7,0>Board 7 Basic CPU Test
    7,0> FPU Registers and Data Path Test
    7,0> Instruction Cache Tag RAM Test
    7,0> Instruction Cache Instruction RAM Test
    7,0> Instruction Cache Next Field RAM Test
    7,0> Instruction Cache Pre-decode RAM Test
    7,0> Data Cache RAM Test

    7,0> Data Cache Tags Test
    7,0> DMMU Registers Access Test
    7,0> DMMU TLB DATA RAM Access Test
    7,0> DMMU TLB TAGS Access Test
    7,0> IMMU Registers Access Test
    7,0> IMMU TLB DATA RAM Access Test
    7,0> IMMU TLB TAGS Access Test
    7,0> Set CPU UPA Config and Init SDB Data
    7,0> SRAM Mode = 22, Clock Mode = 3:1, PCON = 6fa, MCAP = 0
    7,0>Board 7 MMU Enable Test
    7,0> DMMU Init
    7,0> IMMU Init
    7,0> Mapping Selftest Enabling MMUs
    7,0>Board 7 Ecache Test
    7,0> Ecache Probe
    7,0> Ecache Tags
    7,0> Ecache Quick Verify
    7,0> Ecache Init
    7,0> Ecache RAM
    7,0> Ecache 6N RAM Pattern Test
    7,0> Ecache Address Line
    7,0> Configure Ecache Limit
    7,0>Ecache Size = 00400000, Limited to 00400000
    7,0>Board 7 FPU Functional Test
    7,0> FPU Enable
    7,0>Board 7 Board Master Select Test
    7,0> Selecting a Board Master
    7,0>Board 7 FireHose Devices Test
    7,0> PROM Datapath Test
    7,0> FHC CPU SRAM Test
    7,0>Board 7 Address Controller Test
    7,0> AC Registers Test
    7,0> AC Initialization
    7,0> Memory Registers Test
    7,0> Memory Registers Initialization Test
    7,0> AC DTAG Init
    7,0>Board 7 Dual Tags Test
    7,0> AC DTAG Test
    7,0> AC DTAG Init
    7,0>Board 7 FireHose Controller Test
    7,0> FHC Initialization
    7,0>Board 7 JTAG Test
    7,0> Verify System Board Scan Ring
    7,0>Board 7 Centerplane Test
    7,0> Centerplane and Arbiter Check Test
    7,0>Setting JTAG Master

    7,0>Clear JTAG Master
    7,0> Centerplane Join
    7,0>Setting JTAG Master
    7,0>Clear JTAG Master
    7,0>Board 7 Setup Cache Size Test
    7,0> Setting Up Cache Size
    7,0>Board 7 System Master Select Test
    7,0> Setting System Master
    7,0>POST Master Selected (JTAG,CENTRAL)

Note: At this point POST has completed the system board testing and assigned a master to start testing other boards on the backplane. For example, each I/O board has its own PROM containing information about the board (type, revision, speed) and tests for components and interfaces. The tests are initiated by the master CPU. I/O POST reports from these tests are sent to the master, indicating the state of the system. The master CPU deactivates I/O boards or components according to the report.

    7,0>Board 16 Clock Board Test
    7,0> Clock Board Registers Test
    7,0> Clock Board Initialization
    7,0> Clock Board Temperature Check
    7,0>Board 16 Clock Board Serial Ports Test
    7,0> 85C30 Register Test
    7,0> 85C30 Serial Ports Test
    7,0> Keyboard Loopback
    7,0> Mouse Loopback
    7,0> Serial Port B Loopback
    7,0> Remote Serial Port A Loopback
    7,0> Remote Serial Port B Loopback
    7,0>Board 16 NVRAM Devices Test
    7,0> M48T59 (TOD) Init
    7,0> M48T59 (TOD) Functional Part 1 Test
    7,0> NVRAM (Non-Destructive) Test
    7,0>Board 7 System Board Probe Test
    7,0> Probing all CPU/Memory BDA
    7,0> Probing System Boards
    7,0> Probing CPU Module JTAG Rings
    7,0>Setting System Clock Frequency
    7,0> CPU Module mid 14 Checked in OK (speed code = 4)
    7,0> CPU mid 18 Version=00170011.20000507
    7,0> CPU Module mid 18 Checked in OK (speed code = 4)

    7,0> CPU mid 19 Version=00170011.20000507
    7,0> CPU Module mid 19 Checked in OK (speed code = 4)
    7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
    7,0>TESTING BOARD 1
    7,0>Board 1 JTAG Test
    7,0> Verify System Board Scan Ring
    7,0>Board 1 Centerplane Test
    7,0> Centerplane Check
    7,0>Board 1 Address Controller Test
    7,0> AC Registers Test
    7,0> AC Initialization
    7,0>Setting Freq to 25MHZ
    7,0> Memory Registers Test
    7,0> Memory Registers Initialization Test
    7,0> AC DTAG Init
    7,0>Board 1 FireHose Controller Test
    7,0> FHC Initialization
    7,0>Board 1 NVRAM Devices Test
    7,0> M48T59 (TOD) Init
    7,0> M48T59 (TOD) Functional Part 1 Test
    7,0> NVRAM (Non-Destructive) Test
    7,0>TESTING BOARD 3
    7,0>Board 3 JTAG Test
    7,0> Verify System Board Scan Ring
    7,0>Board 3 Centerplane Test
    7,0> Centerplane Check
    7,0>Board 3 Address Controller Test
    7,0> AC Registers Test
    7,0> AC Initialization
    7,0>Setting Freq to 25MHZ
    7,0> Memory Registers Test
    7,0> Memory Registers Initialization Test
    7,0> AC DTAG Init
    7,0>Board 3 FireHose Controller Test
    7,0> FHC Initialization
    7,0>Board 3 NVRAM Devices Test
    7,0> M48T59 (TOD) Init
    7,0> M48T59 (TOD) Functional Part 1 Test
    7,0> NVRAM (Non-Destructive) Test
    7,0>Re-mapping to Local Device Space
    7,0>Begin Central Space Serial Port access
    7,0>Enable AC Control Parity
    7,0>Hotplug Trigger Test
    7,0>Init Counters for Hotplug
    7,0>Board 7 Cross Calls Test
    7,0> Cross Calls Test

    7,0>Displaying PROM Versions
    7,0>Slot 1 IO Type 4 FCODE 1.8.7 1997/12/8 15:39 iPOST 3.4.6 1998/4/16 14:22
    7,0>Slot 3 IO Type 4 FCODE 1.8.7 1997/12/8 15:39 iPOST 3.4.6 1998/4/16 14:22
    7,0>Slot 7 CPU/Memory OBP 3.2.16 1998/6/8 16:58 POST 3.9.4 1998/6/9 16:25
    7,0>Slot 9 CPU/Memory OBP 3.2.16 1998/6/8 16:58 POST 3.9.4 1998/6/9 16:25
    7,0>Board 7 Environmental Probe Test
    7,0> Environmental Probe
    7,0>Checking Power Supply Configuration
    7,0>Power is more than adequate, load 4 ps 3
    7,0>Reconfig memory due to POR or CLOCK RESET
    7,0>Reconfig memory due to DIAG_LEVEL
    7,0>Board 7 Probing Memory SIMMS Test
    7,0> Probe SIMMID
    7,0> Populated Memory Bank Status
    7,0> bd # Size Address Way Status
    7,0> 9 256 Normal
    7,0>Board 7 Memory Configuration Test
    7,0> Memory Interleaving
    7,0> Total banks with 8MB SIMMs = 0
    7,0> Total banks with 32MB SIMMs = 1
    7,0> Total banks with 128MB SIMMs = 0
    7,0> Total banks with 256MB SIMMs = 0
    7,0> Overall memory default speed = 60ns
    7,0>Do OPTIMAL INTLV
    7,0> Board 9 AC rev 5 RCTIME = 0 (Tras 71)
    7,0> Memory Refresh Enable
    7,0>Board 7 SIMMs Test
    7,0> MP Memory SIMM Clear Test
    7,0> Memory Size is 256Mbytes
    7,0> CPU MID 18 clearing 00000000.00004000 to 00000000.05500000
    7,0> CPU MID 19 clearing 00000000.05500000 to 00000000.0aa00000
    7,0> CPU MID 14 clearing 00000000.0aa00000 to 00000000.10000000
    7,0> CPU MID 14 clearing 00000000.00000000 to 00000000.00004000
    7,0> Memory Walking Rows and Columns Test
    7,0> MP Memory SIMM (6N RAM Patterns) Test
    7,0> Memory Size is 256Mbytes
    7,0> CPU MID 18 testing 00000000.00000000 to 00000000.05500000
    7,0> CPU MID 19 testing 00000000.05500000 to 00000000.0aa00000
    7,0> CPU MID 14 testing 00000000.0aa00000 to 00000000.10000000
    7,0> MP Memory SIMM (moving inverse) Test
    7,0> Memory Size is 256Mbytes
    7,0> CPU MID 18 testing 00000000.00000000 to 00000000.05500000

    7,0> CPU MID 19 testing 00000000.05500000 to 00000000.0aa00000
    7,0> CPU MID 14 testing 00000000.0aa00000 to 00000000.10000000
    7,0>Slave CPU Functional Tests
    7,0> Slave CPU MID 18 started
    9,0>Board 9 Functional CPU 0 Test
    9,0> Dcache Init
    9,0> Dcache Enable Test
    9,0> Dcache Functionality Test
    9,0> Ecache Stress Test
    9,0> Ecache Functional Test
    9,0> CPU Dispatch (Multi-Scalar) Test
    9,0> SPARC Atomic Instructions Test
    9,0> SPARC Prefetch Instructions Test
    9,0> CPU Softint Registers and Interrupts Test
    9,0> Uni-Processor Cache Coherence Test
    9,0> Branch Memory Test
    9,0> SDB ECC CE Test
    9,0> SDB ECC Uncorrectable Test
    9,0> FPU Instruction Test
    7,0> Slave CPU MID 19 started
    9,1>Board 9 Functional CPU 1 Test
    9,1> Dcache Init
    9,1> Dcache Enable Test
    9,1> Dcache Functionality Test
    9,1> Ecache Stress Test
    9,1> Ecache Functional Test
    9,1> CPU Dispatch (Multi-Scalar) Test
    9,1> SPARC Atomic Instructions Test
    9,1> SPARC Prefetch Instructions Test
    9,1> CPU Softint Registers and Interrupts Test
    9,1> Uni-Processor Cache Coherence Test
    9,1> Branch Memory Test
    9,1> SDB ECC CE Test
    9,1> SDB ECC Uncorrectable Test
    9,1> FPU Instruction Test
    7,0>Board 7 Functional CPU 0 Test
    7,0> Dcache Init
    7,0> Dcache Enable Test
    7,0> Dcache Functionality Test
    7,0> Ecache Stress Test
    7,0> Ecache Functional Test
    7,0> CPU Dispatch (Multi-Scalar) Test
    7,0> SPARC Atomic Instructions Test
    7,0> SPARC Prefetch Instructions Test
    7,0> CPU Softint Registers and Interrupts Test
    7,0> Uni-Processor Cache Coherence Test

    7,0> Branch Memory Test
    7,0> SDB ECC CE Test
    7,0> SDB ECC Uncorrectable Test
    7,0> FPU Instruction Test
    7,0>TESTING IO BOARD 1
    7,0>Board 1 I/O FPROM Test
    7,0> I/O Board EPROM checksum Test
    7,0>@(#) iPOST 3.4.6 1998/04/16 14:22
    7,0> TESTING IO BOARD 1 ASICs
    7,0> TESTING SysIO Port 0
    7,0>Board 1 SysIO Registers Test
    7,0> SysIO Register Initialization
    7,0> IOMMU Registers and RAM Test
    7,0> Streaming Buffer Registers and RAM Test
    7,0> SBus Control and Config Registers Test
    7,0> SysIO RAM Initialization
    7,0>Board 1 SysIO Functional Test
    7,0> Clear Interrupt Map and State Registers
    7,0> SysIO Interrupts Test
    7,0> SysIO Timers/Counters Test
    7,0> IOMMU Virtual Address TLB Tag Compare Test
    7,0> Streaming Buffer Flush Test
    7,0> DMA Merge Buffer Test
    7,0> SYSIO ECC Correctable Test
    7,0> SYSIO ECC UnCorrectable Test
    7,0> SysIO Sbus Probe Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>Board 1 OnBoard IO Chipset (SOC) Test
    7,0> SOC SRAM Test
    7,0> SOC Registers Test
    7,0> SOC Interrupt Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0> TESTING SysIO Port 1
    7,0>Board 1 SysIO Registers Test
    7,0> SysIO Register Initialization
    7,0> IOMMU Registers and RAM Test
    7,0> Streaming Buffer Registers and RAM Test
    7,0> SBus Control and Config Registers Test
    7,0> SysIO RAM Initialization
    7,0>Board 1 SysIO Functional Test
    7,0> Clear Interrupt Map and State Registers
    7,0> SysIO Interrupts Test
    7,0> SysIO Timers/Counters Test
    7,0> IOMMU Virtual Address TLB Tag Compare Test

    7,0> Streaming Buffer Flush Test
    7,0> DMA Merge Buffer Test
    7,0> SYSIO ECC Correctable Test
    7,0> SYSIO ECC UnCorrectable Test
    7,0> SysIO Sbus Probe Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>Board 1 OnBoard IO Chipset (FEPS) Test
    7,0> FAS366 Registers Test
    7,0> ESP FAS366 DVMA burst mode read/write Test
    7,0> FAS366 FIFO TO DMA Test
    7,0> DMA TO FAS366 FIFO Test
    7,0> FEPS (Ethernet) Registers Test
    7,0> FEPS Ethernet(BM, DP83840, Twister) Internal Loopbacks Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>IO BOARD 1 TESTED
    7,0>TESTING IO BOARD 3
    7,0>Board 3 I/O FPROM Test
    7,0> I/O Board EPROM checksum Test
    7,0>@(#) iPOST 3.4.6 1998/04/16 14:22
    7,0> TESTING IO BOARD 3 ASICs
    7,0> TESTING SysIO Port 0
    7,0>Board 3 SysIO Registers Test
    7,0> SysIO Register Initialization
    7,0> IOMMU Registers and RAM Test
    7,0> Streaming Buffer Registers and RAM Test
    7,0> SBus Control and Config Registers Test
    7,0> SysIO RAM Initialization
    7,0>Board 3 SysIO Functional Test
    7,0> Clear Interrupt Map and State Registers
    7,0> SysIO Interrupts Test
    7,0> SysIO Timers/Counters Test
    7,0> IOMMU Virtual Address TLB Tag Compare Test
    7,0> Streaming Buffer Flush Test
    7,0> DMA Merge Buffer Test
    7,0> SYSIO ECC Correctable Test
    7,0> SYSIO ECC UnCorrectable Test
    7,0> SysIO Sbus Probe Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>Board 3 OnBoard IO Chipset (SOC) Test
    7,0> SOC SRAM Test

    7,0> SOC Registers Test
    7,0> SOC Interrupt Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0> TESTING SysIO Port 1
    7,0>Board 3 SysIO Registers Test
    7,0> SysIO Register Initialization
    7,0> IOMMU Registers and RAM Test
    7,0> Streaming Buffer Registers and RAM Test
    7,0> SBus Control and Config Registers Test
    7,0> SysIO RAM Initialization
    7,0>Board 3 SysIO Functional Test
    7,0> Clear Interrupt Map and State Registers
    7,0> SysIO Interrupts Test
    7,0> SysIO Timers/Counters Test
    7,0> IOMMU Virtual Address TLB Tag Compare Test
    7,0> Streaming Buffer Flush Test
    7,0> DMA Merge Buffer Test
    7,0> SYSIO ECC Correctable Test
    7,0> SYSIO ECC UnCorrectable Test
    7,0> SysIO Sbus Probe Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>Board 3 OnBoard IO Chipset (FEPS) Test
    7,0> FAS366 Registers Test
    7,0> ESP FAS366 DVMA burst mode read/write Test
    7,0> FAS366 FIFO TO DMA Test
    7,0> DMA TO FAS366 FIFO Test
    7,0> FEPS (Ethernet) Registers Test
    7,0> FEPS Ethernet (BM, DP83840, Twister) Internal Loopbacks Test
    7,0> SysIO Register Initialization Test
    7,0> SysIO RAM Initialization Test
    7,0> Clear Interrupt Map and State Registers Test
    7,0>IO BOARD 3 TESTED
    7,0>SYSTEM LEVEL TESTING
    7,0>Board 7 Cache Coherency Test
    7,0> Multi-Processor Cache Coherence Test
    7,0> Testing CPU MID 18
    7,0> Testing CPU MID 19
    7,0>Probing for Disk System boards
    7,0>Board 7 System Interrupts Test
    7,0> System Interrupts Test
    7,0>Checking Power Supply Configuration
    7,0>Power is more than adequate, load 4 ps 3 (Four boards, and 3 power supplies)
    7,0> Check Board Present Test

    7,0> Board Present Interrupt Test
    7,0>
    7,0> System Board Status
    7,0>----------------------------------------------------------------
    7,0> Slot   Board Status     Board Type     Failures
    7,0>----------------------------------------------------------------
    7,0>  0 | Not installed |              |
    7,0>  1 | Normal        |+IO Type 4    |
    7,0>  2 | Not installed |              |
    7,0>  3 | Normal        |+IO Type 4    |
    7,0>  4 | Not installed |              |
    7,0>  5 | Not installed |              |
    7,0>  6 | Not installed |              |
    7,0>  7 | Normal        |+CPU/Memory   |
    7,0>  8 | Not installed |              |
    7,0>  9 | Normal        |+CPU/Memory   |
    7,0> 16 | Normal        | Clock Board  |
    7,0>----------------------------------------------------------------
    7,0>
    7,0> CPU Module Status
    7,0>----------------------------------------------------------------
    7,0> MID  OK  Cache  Speed  Version
    7,0>----------------------------------------------------------------
    7,0> 14 | y | 4096 | 248 | 00170011.20000507
    7,0> 18 | y | 4096 | 248 | 00170011.20000507
    7,0> 19 | y | 4096 | 248 | 00170011.20000507
    7,0>----------------------------------------------------------------
    7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
    7,0> Populated Memory Bank Status
    7,0> bd # Size Address Way Status
    7,0> 9 256 0 0 Normal
    7,0>
    7,0> POST COMPLETE
    7,0>Entering OBP
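When a captured POST log runs to several hundred lines, it can help to group the messages by the prefix at the start of each line. In the sample output the prefix looks like a board number and a CPU number (for example, the lines for "Board 9 Functional CPU 1 Test" begin with 9,1>); that reading is an inference from the sample, not a documented format. The Python sketch below splits lines on that assumption, and its names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape: "<board>,<cpu>>" followed by an optional message.
PREFIX = re.compile(r"^\s*(\d+),(\d+)>\s?(.*)$")


def split_post_line(line: str):
    """Return (board, cpu, message) for a POST line, or None if it has no prefix."""
    m = PREFIX.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3).strip()


def group_by_cpu(lines):
    """Collect the non-empty messages under their (board, cpu) prefix."""
    groups = defaultdict(list)
    for line in lines:
        parsed = split_post_line(line)
        if parsed and parsed[2]:
            board, cpu, message = parsed
            groups[(board, cpu)].append(message)
    return dict(groups)


sample = [
    "7,0> Slave CPU MID 19 started",
    "9,1>Board 9 Functional CPU 1 Test",
    "9,1> Dcache Init",
    "7,0>",
]
for key, messages in group_by_cpu(sample).items():
    print(key, messages)
```

Grouping this way makes it easier to see which tests ran on the master and which ran on each slave CPU.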

POST Control Commands
The following are the control commands for POST.

Note: These commands are entered on the terminal connected to ttya, or on the keyboard of the workstation running the tip session. Do not try to enter these commands on the Sun keyboard connected to the clock board.

Each toggle key turns its feature on or off with each stroke of the key. There are two particularly useful commands:

s - Toggle Stop flag
This flag stops POST on completion and enters the extended POST menus. Get into the habit of pressing the s key during POST, which then puts you into the extended POST menus.

v - Toggle verbose print flag
Normally, the only way to get a display of POST on ttya is to power on in diagnostic mode or to have diag-switch? set to true. Pressing the v key during a normal power-on displays POST on ttya.


Toggle Loop on full POST

POST Menus

Option 7... Display system summary
This is the most useful command, since it gives a display of the final system configuration:

7,0> System Board Status
7,0>----------------------------------------------------------------
7,0> Slot   Board Status     Board Type    Failures
7,0>----------------------------------------------------------------
7,0>  0 | Not installed  |             |
7,0>  1 | Normal         |+IO Type 4   |
7,0>  2 | Not installed  |             |
7,0>  3 | Normal         |+IO Type 4   |
7,0>  4 | Not installed  |             |
7,0>  5 | Not installed  |             |
7,0>  6 | Not installed  |             |
7,0>  7 | Normal         |+CPU/Memory  |
7,0>  8 | Not installed  |             |
7,0>  9 | Normal         |+CPU/Memory  |
7,0> 16 | Normal         | Clock Board |
7,0>----------------------------------------------------------------
7,0>
7,0> CPU Module Status
7,0>----------------------------------------------------------------
7,0> MID  OK  Cache  Speed  Version
7,0>----------------------------------------------------------------
7,0> 14 | y | 4096 | 248 | 00170011.20000507
7,0> 15 | y | 4096 | 248 | 00170011.20000507
7,0> 18 | y | 4096 | 248 | 00170011.20000507
7,0> 19 | y | 4096 | 248 | 00170011.20000507
7,0>----------------------------------------------------------------
7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
7,0> Populated Memory Bank Status
7,0> bd #  Size  Address  Way  Status
7,0> 9     256   0        0    Normal


Warning: Note the MID address for the processors. POST numbers processors in decimal (as does Solaris), whereas OBP numbers the processors in hex. Be aware of this difference.

Experiment with the POST menus. Some of the tests return the message STILL UNDER DEVELOPEMENT and should not be relied upon too heavily for fault finding.
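The decimal and hex forms of a module ID are easy to cross-check. The following Python sketch is illustrative only; the function names are hypothetical and are not part of POST, OBP, or Solaris.

```python
def mid_to_obp(mid: int) -> str:
    """Return a decimal module ID (as POST and Solaris show it) in hex, as OBP shows it."""
    return format(mid, "x")


def obp_to_mid(obp_id: str) -> int:
    """Return an OBP hex module ID as the decimal number POST and Solaris show."""
    return int(obp_id, 16)


if __name__ == "__main__":
    # MIDs taken from the CPU Module Status table in the system summary.
    for mid in (14, 15, 18, 19):
        print(f"POST/Solaris MID {mid:>2} = OBP MID {mid_to_obp(mid)}")
```

For the MIDs in the summary above, 14 and 15 become e and f, while 18 and 19 become 12 and 13.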

POST Board Status Messages
On completion of testing, POST will display the status of each board. There are four board status types:

- Normal
- On-line/Failed: A component on that board has failed POST.
- Low-power mode: Either the whole board has failed POST, or the OBP parameter configuration-policy is set to board, or the board has been detached using DR.
- Not Installed

7,0> System Board Status
7,0>----------------------------------------------------------------
7,0> Slot   Board Status     Board Type    Failures
7,0>----------------------------------------------------------------
7,0>  0 | Not installed  |             |
7,0>  1 | Normal         |+IO Type 4   |
7,0>  2 | Not installed  |             |
7,0>  3 | Low Power Mode |+IO Type 4   | AC
7,0>  4 | Not installed  |             |
7,0>  5 | Not installed  |             |
7,0>  6 | Not installed  |             |
7,0>  7 | Online/failure |+CPU/Memory  | CPU 1
7,0>  8 | Not installed  |             |
7,0>  9 | Normal         |+CPU/Memory  |
7,0> 16 | Normal         | Clock Board |
7,0>----------------------------------------------------------------

Sample Error Messages

POST Error Reporting
You can view the output from the last POST by running the show-post-results command, and then examine the report for error messages. The report generated by the show-post-results command gives a synopsis of the POST tests that is easier to read than the raw POST output observed on the serial port connection. The symbols used in the show-post-results report are defined as follows:

- P = present
- *** = failed component
- NOT = not found
- 0 = no failures

The following is sample output from the show-post-results command.

ok show-post-results
Slot 0 - Status=Okay, Type: CPU/Memory
Cpu0=P     Cpu0-OK=P   FailCode=0  Cpu1=P     Cpu1-OK=P   FailCode=0
AC=P       FHC=P       SRAM=P      FPROM=P    LabCon=Not  Ovtemp=Not
Bank0=0    Bank1=0     DTag0=P     DTag1=P    JTAG=P      CntrPl=P
Bank0=P    Bank1=Not   DC=ff
Slot 1 - Status=Fail, Type: IO board Type 2
Sysio0=P   Sysio1=P    FEPS=P      FEPSFC=0   SOC=P       FFB=P
Sbus0=P    Sbus2=P
AC=P       FHC=P       SRAM=P      FPROM=P    LabCon=Not  Ovtemp=Not
TODC=***   JTAG=P      CntrPl=P    DC=ff
Slot 2 - Status=Okay, Type: CPU/Memory
Cpu0=P     Cpu0-OK=P   FailCode=0  Cpu1=P     Cpu1-OK=P   FailCode=0
AC=P       FHC=P       SRAM=P      FPROM=P    LabCon=Not  Ovtemp=Not
Bank0=0    Bank1=0     DTag0=P     DTag1=P    JTAG=P      CntrPl=P
Bank0=Not  Bank1=Not   DC=ff
Slot 16 - Status=Fail, Type: Clock
Clock=P    Serial=P    KbdMse=P    PPS-DC=P   DCReg0=P    DCReg1=P
AC=P       ACFan=P     KeyFan=P    PSFail=0   Ovtemp=Not  TODC=P
RKFan=P
P = Present or Passed
*** = Failed Component
Not = Not present
ok

The following few pages provide a key to the show-post-results output.
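When a report covers many boards, the *** markers are easy to overlook. The following Python sketch scans captured show-post-results text for failed components. It is a hypothetical helper, not a Sun tool, and it assumes that each slot header sits on its own line followed by lines of NAME=VALUE fields; the row layout in the sample is likewise an assumption.

```python
import re
from typing import Dict, List

SLOT_HEADER = re.compile(r"Slot\s+(\d+)\s+-\s+Status=(\w+)")
FIELD = re.compile(r"([\w.-]+)=(\S+)")


def failed_components(report: str) -> Dict[int, List[str]]:
    """Map each slot number to the fields reported as *** (failed component)."""
    failures: Dict[int, List[str]] = {}
    current = None
    for line in report.splitlines():
        header = SLOT_HEADER.search(line)
        if header:
            # A header line starts a new slot; it carries no component fields.
            current = int(header.group(1))
            failures.setdefault(current, [])
            continue
        if current is None:
            continue
        for name, value in FIELD.findall(line):
            if value == "***":
                failures[current].append(name)
    # Report only the slots that actually have a failed component.
    return {slot: names for slot, names in failures.items() if names}


# Fields copied from the sample report; the line breaks are assumed.
SAMPLE = """\
Slot 1 - Status=Fail, Type: IO board Type 2
Sysio0=P Sysio1=P FEPS=P FEPSFC=0 SOC=P FFB=P
Sbus0=P Sbus2=P
AC=P FHC=P SRAM=P FPROM=P LabCon=Not Ovtemp=Not
TODC=*** JTAG=P CntrPl=P DC=ff
Slot 2 - Status=Okay, Type: CPU/Memory
Cpu0=P Cpu0-OK=P FailCode=0 Cpu1=P Cpu1-OK=P FailCode=0
"""

if __name__ == "__main__":
    print(failed_components(SAMPLE))
```

For the sample, only slot 1 is returned, with TODC (the Time of Day Clock) as the failed field.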

POST Error Reporting - definitions

CPU/Memory Board
Cpu0/Cpu1     CPU modules on the board
CPU{0,1}-OK   CPU module status
FailCode      Failure code (valid only if CPU failed)
FHC           Fire Hose Controller
SRAM          Static RAM
FPROM         Flash PROM
LabCon        Lab Console
Ovtemp        Overtemp
Bank0         Bank0 status (a bit indicates a missing or failed SIMM)
Bank1         Bank1 status (a bit indicates a missing or failed SIMM)
DTag0         DTags0 status
DTag1         DTags1 status
JTAG          JTAG status
CntrPl        Centerplane status
DC            Data Controllers (0 bit indicates a failed DC)

I/O Board
Sysio0        SysIO 0 status
Sysio1        SysIO 1 status
FEPS          Onboard FEPS chip
FEPSFC        FEPS fail code (valid only if failed)
SOC           Onboard SOC status
FFB           FFB card status
Sbus0         SBus0 slot status
Sbus1         SBus1 slot status
Sbus2         SBus2 slot status
AC            Address Controller
FHC           Fire Hose Controller
SRAM          Static RAM
FPROM         Flash PROMs
LabCon        Lab Console
Ovtemp        Overtemp
TODC          Time of Day Clock
JTAG          JTAG status
CntrPl        Centerplane status
DC            Data Controllers (0 bit indicates a failed DC)

Disk Board
Disk0         Disk0 ID (valid only if disk present)
Disk1         Disk1 ID (valid only if disk present)
Disk0P        Disk0 Present
Disk1P        Disk1 Present
VDDOK         SCSI VDD status
Fan           Fan Fail status
JTAG          JTAG status

Clock Board
Clock         Clock running
Serial        Serial Port
KbdMse        Keyboard Mouse status
PPS-DC        Peripheral PS ok (all DC levels OK)
AC            AC power status
ACFan         AC box fan status
KeyFan        KeySwitch fan status
PSFail        Power Supply fail status (bit position indicates which power supply failed)
Ovtemp        Overtemp
TODC          Time of Day Clock
V5-P          Peripheral 5V
V12-P         Peripheral 12V
V5-Aux        Auxiliary 5V
V5P-PC        Peripheral 5V Precharge
V12-PC        Peripheral 12V Precharge
V3-PC         System 3.3V Precharge
V5-PC         System 5.0V Precharge
RKFan         Rack Fan Status
3.3V          Clock board 3.3 V
5.0V          Clock board 5.0 V

When things go wrong...
What constitutes a minimum system?
If you have a system that hangs under POST, or is unpredictable in its results, run POST with a minimum configuration. You can run POST with a clock board and a CPU/Memory board with one CPU module and no memory. You do not need any memory for POST, since it runs in SRAM on each board.

Frequency Margining
Again, if you have intermittent faults, increase the frequency of the Gigaplane interconnect to trap these faults. Do not margin it too high, since the system will then fail automatically.

Loop on Diagnostics
Remember the loop function, which you can set on the POST control menu.

Warning POST does not check SBus cards, or peripherals. It is no use running POST with a loop command and with frequency margined high, if the fault is that the system will not see any disks.

Accessing and Displaying POST
To access the host's operating system from the console, and to interact with the OBP and POST programs, you must access the system's serial port A. For interactive capability, you must have an ASCII terminal with a keyboard attached to serial port A.

[Figure: serial port A, null modem cable, ASCII terminal or workstation]

tip session
The best method of getting POST output is to tip into serial port A from another Sun system. Typically, you will tip out of port B on a workstation. The method is outlined below.

workstation# more /etc/remote | grep hardwire
hardwire:dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:
workstation# tip hardwire
connected

Note the tip commands:

~#   break (Stop-A)
~.   exit


Internal Disk Subsystems

Internal Storage Capacities
Sun Enterprise systems have the following maximum internal storage capacities:

- Sun Enterprise 3000: Up to ten 18.2-Gbyte SCSI drives are used to populate the internal bays.
- Sun Enterprise 3500: Up to eight 36.4-Gbyte FC-AL dual-ported disk drives can be used to populate the internal bays.
- Sun Enterprise 4x00 and Enterprise 5x00: Up to eight 18.2-Gbyte SCSI drives, mounted on four disk boards.
- Sun Enterprise 6x00: Up to four 18.2-Gbyte SCSI drives, mounted on two disk boards.
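The drive counts and sizes above translate into total internal capacity by simple multiplication. The Python sketch below works the arithmetic; the dictionary and function names are illustrative, and Decimal is used only so that the tenths of a Gbyte stay exact.

```python
from decimal import Decimal

# (maximum number of internal drives, drive size in Gbytes), from the list above.
INTERNAL_DISKS = {
    "Enterprise 3000": (10, Decimal("18.2")),
    "Enterprise 3500": (8, Decimal("36.4")),
    "Enterprise 4x00/5x00": (8, Decimal("18.2")),
    "Enterprise 6x00": (4, Decimal("18.2")),
}


def max_internal_gbytes(model: str) -> Decimal:
    """Return the maximum internal capacity in Gbytes for the given model."""
    count, size = INTERNAL_DISKS[model]
    return count * size


if __name__ == "__main__":
    for model in INTERNAL_DISKS:
        print(f"{model}: {max_internal_gbytes(model)} Gbytes")
```

This gives 182.0 Gbytes for the Enterprise 3000, 291.2 for the 3500, 145.6 for the 4x00 and 5x00, and 72.8 for the 6x00.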

Disk Subsystems
Sun Enterprise Servers can support several terabytes of disk storage when external assemblies are used. This module focuses on disks that are configured as internal devices.

The SCSI Disk Board


With the exception of the Sun Enterprise 3000 and 3500 systems, Sun Enterprise servers support dual-SCSI disk boards that contain one or two UltraSCSI disk drives. The disk board capacity for these servers varies as follows:

- The Sun Enterprise 4x00 supports up to four disk boards.
- The Sun Enterprise 5x00 supports up to four disk boards.
- The Sun Enterprise 6x00 supports a maximum of two disk boards. This is because the disk boards do not put a load on the Gigaplane; indeed, the only thing a disk board takes from the Gigaplane is power. Putting more than two disk boards in an E6x00 would leave spaces on the bus, which is not allowed (this is why load boards are fitted in empty slots). Disk boards are limited to slots 14 and 15 only, which are the slots closest to the Gigaplane terminators.

The SCSI Disk Board Addressing


SCSI addressing is assigned according to the Gigaplane slot in which the board is installed, as shown in Table 8-1.

Note: The SCSI disk board requires a SCSI-2 interface from an I/O board that connects to the external SCSI-2 port. The SCSI disk boards can be daisy-chained, so only one interface is required.

Table 8-1 Default Drive Address Settings

SLOT  DISK 0 ADDRESS  DISK 1 ADDRESS    SLOT  DISK 0 ADDRESS  DISK 1 ADDRESS
0     4               5                 8     10              11
1     6               7                 9     0               1
2     0               1                 10    12              13
3     10              11                11    2               3
4     2               3                 12    14              15
5     12              13                13    8               9
6     8               9                 14    0               1
7     14              15                15    10              11
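The default addresses in Table 8-1 amount to a lookup keyed on the Gigaplane slot, with disk 1 always one above disk 0. A minimal Python sketch of that lookup follows; the names are illustrative, and it models only the defaults, not any jumper override.

```python
# Disk 0 default SCSI address for each Gigaplane slot, copied from Table 8-1.
DISK0_ADDRESS = {
    0: 4, 1: 6, 2: 0, 3: 10, 4: 2, 5: 12, 6: 8, 7: 14,
    8: 10, 9: 0, 10: 12, 11: 2, 12: 14, 13: 8, 14: 0, 15: 10,
}


def default_targets(slot: int) -> tuple:
    """Return the default (disk 0, disk 1) SCSI addresses for a disk board slot."""
    disk0 = DISK0_ADDRESS[slot]
    # In every row of Table 8-1, disk 1 sits one address above disk 0.
    return disk0, disk0 + 1


if __name__ == "__main__":
    for slot in (3, 14, 15):
        print(f"slot {slot}: disk 0 = {default_targets(slot)[0]}, "
              f"disk 1 = {default_targets(slot)[1]}")
```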

Jumpers J0702 and J0703 override the default drive address settings assigned by the centerplane slot position, as shown in Table 8-2.

Table 8-2 SCSI Disk Board Disk Addressing Override Jumper Configurations

JUMPER  PINS    SETTING      DESCRIPTION
J0702   1-2     Out          Disk 0 default address selection
J0702   1-2     In           Disk 0 manual address selection
J0702   A0-A3   As required  Disk 0 address select
J0703   1-2     Out          Disk 1 default address selection
J0703   1-2     In           Disk 1 manual address selection
J0703   A0-A3   As required  Disk 1 address select
J0705   1-2     As required  Disk 0 delay spin
J0706   1-2     As required  Disk 1 delay spin

Disk Addressing
You can type a complete physical path name or a complete logical path name to specify the device or controller. How Solaris derives device addresses is covered in the upcoming Solaris module. In this module, you are given sample addresses for both SCSI devices and FC-AL devices.

- Physical addresses are designed to follow a hardware tree to a specific device.
- Logical addresses allow applications to point to a specific device on a specific bus.

Solaris performs the translation between logical and physical addresses transparently to the end user.

Examples
A typical physical path name for a disk device is:

/sbus@3,0/SUNW,fas@3,8800000/sd@0,0:a,raw

or

/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w2100002037000f96,0:a,raw

A typical logical path name is:

c2t1d0s1

Additional information on addressing that is specific to the server type is covered with the individual servers.

Sun Enterprise 3500
Enterprise 3500 Fibre Channel Interface Board
This is a new board designed to provide connectivity to the internal disk drives in the Sun Enterprise 3500 server. The internal disk drives operate with the fibre channel arbitrated loop (FC-AL) architecture. Each of the four potential FC-AL loops corresponds to one of four gigabit interface converter (GBIC) modules on the Fibre channel interface board.

Part Number 501-4820

Figure 8-1 Sun Enterprise 3500 Fibre Channel Interface Board (GBIC sockets LA, LB, UA, and UB)

The Fibre channel interface board comes with two hot-pluggable GBIC modules. The 2-meter fibre channel cables establish a loop or connection with the internal disk drives. This board is part of the standard internal disk drive option. If no internal drives are ordered, this board is not present.

Table 8-3 GBIC to Disk Drive Bay and Drive Port Connection

GBIC name              Drive Port   Disk Drives
GBIC LA (lower bank)   A            0, 1, 2, 3
GBIC LB (lower bank)   B            0, 1, 2, 3
GBIC UA (upper bank)   A            4, 5, 6, 7
GBIC UB (upper bank)   B            4, 5, 6, 7

The Sun Enterprise 3500 can be ordered without internal disk drives. Any of the bootable external Sun StorEdge disk products (such as the Sun StorEdge UniPack, MultiPack, D1000, A3500, and A5X00 products) can be used as a boot device for a Sun Enterprise 3500 without internal disk drives. Such a configuration does not require an FC-AL Interface board, because the FC-AL Interface board's only purpose is to connect to the internal disk bays.

The interface boards can be connected to the SBus I/O Board and the Graphics I/O Board, both of which come with a pair of on-board 100 MB/second FC-AL sockets. In addition, both types of boards support an SBus Host Adapter that has a pair of 100 MB/second FC-AL sockets. Each of these pairs of sockets can support the internal disk drives in the Sun Enterprise 3500 or the Sun StorEdge A5000, but a pair cannot be split so that one socket supports one type of device while the other socket supports a different type of device.

However, a PCI-only configuration in a Sun Enterprise 3500 does not provide a way to connect the internal FC-AL disk drives. This is because the PCI I/O Board does not have on-board FC-AL sockets and there is currently no PCI FC-AL card available. So, if you want to use the internal disk drives in the Sun Enterprise 3500, you must have at least one SBus I/O Board or one Graphics I/O Board installed. There are no plans to add on-board FC-AL sockets to the PCI I/O Board, because there is not enough physical space on the board to accommodate them.

Even though an FC-AL connection cannot be split between internal and external connections, the individual FC-AL connections on the FC-AL Interface board are logically independent. The components do get their power through a single connection. However, the power to the FC-AL Interface board comes from the backplane, which is supported by redundant power supplies. Therefore the design has practically no single point of failure.

Fibre Channel Interface Board
The FC-AL board comes with two GBIC modules and one 2-meter fibre channel cable to establish one loop (connection).

Figure 8-2 Basic FC-AL Loop (labels: fibre channel cable, SBus cards, Ethernet, SCSI, GBICs, to lower disk bays, to upper disk bays, Gigaplane bus connector, SBus I/O board, Interface board)

One GBIC module is installed on the FC-AL Interface board and, typically, the other is installed on the I/O board (or SBus card), leaving three empty FC-AL sockets on the FC-AL Interface board. Each additional loop requires two additional GBIC modules and one 2-meter fibre channel cable. The GBIC modules on the FC-AL Interface board are exactly the same as those used in the Sun StorEdge A5X00 arrays, on the FC-AL SBus Host Adapter, and on the SBus I/O board.

Sun Enterprise 3500 - Disk Addressing
A typical configuration, as illustrated in the figure below, takes advantage of the dual-ported capability of the Sun Enterprise 3500 disk structure. Having two paths to each disk eliminates the path to the disk as a single point of failure.

[Figure: Sun Enterprise 3500 Disk Configuration. Labels: two I/O boards, interface board (IB) GBICs LA, LB, UA, and UB. Loop IDs by disk bay: 0 = ef, 1 = e8, 2 = e4, 3 = e2, 4 = e1, 5 = e0, 6 = dc, 7 = da]

In the Sun Enterprise 3500, the lower four drives are configurable as one group of disks, or they can be accessed as two smaller independent groups of disks. The configuration is application dependent. The same is true for the upper four disk bays.
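The loop IDs shown beside each disk bay in the disk configuration figure (ef for bay 0 through da for bay 7) can be turned into a lookup, which helps when a utility reports only the loop ID. The Python sketch below is illustrative: the names are hypothetical, and the bay-to-bank split follows the lower (0 to 3) and upper (4 to 7) grouping described in this module.

```python
# Loop ID for each internal disk bay, as read from the disk configuration figure.
LOOP_ID_BY_BAY = {
    0: 0xEF, 1: 0xE8, 2: 0xE4, 3: 0xE2,
    4: 0xE1, 5: 0xE0, 6: 0xDC, 7: 0xDA,
}
BAY_BY_LOOP_ID = {loop_id: bay for bay, loop_id in LOOP_ID_BY_BAY.items()}


def bay_for_loop_id(loop_id: str) -> int:
    """Return the disk bay for a loop ID given as a hex string such as 'e1'."""
    return BAY_BY_LOOP_ID[int(loop_id, 16)]


def bank_for_bay(bay: int) -> str:
    """Return 'lower' for bays 0-3 and 'upper' for bays 4-7."""
    if bay not in LOOP_ID_BY_BAY:
        raise ValueError(f"no such internal disk bay: {bay}")
    return "lower" if bay < 4 else "upper"


if __name__ == "__main__":
    for loop_id in ("ef", "e1"):
        bay = bay_for_loop_id(loop_id)
        print(f"loop ID {loop_id}: bay {bay} ({bank_for_bay(bay)} bank)")
```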

probe-fcal-all
A new command has been introduced to look at the FC-AL disks on an E3500.

{e} ok probe-fcal-all
/sbus@6,0/SUNW,socal@d,10000/sf@1,0
/sbus@6,0/SUNW,socal@d,10000/sf@0,0
WWN 200d080020940232 Loopid 1
WWN 21000020370cbc0e Loopid e1
Disk SEAGATE ST19171FCSUN9.0G117E9804P938
/sbus@2,0/SUNW,socal@d,10000/sf@1,0
/sbus@2,0/SUNW,socal@d,10000/sf@0,0
WWN 2005080020940232 Loopid 1
WWN 21000020370d8ad0 Loopid ef
Disk SEAGATEST19171FCSUN9.0G117E9814T324

Each disk in an E3500 has an independent world-wide number (WWN). These numbers are assigned by the manufacturer and are unique to the disk. The FC-AL specification states that each component in a fibre channel loop must have a unique WWN. This includes the interface boards. The WWN of the IBs is derived from the host MAC address, in this case 8:00:20:94:02:32.

The WWN is mapped to a logical path at install time. Do a long listing on the logical path to view how the numbers relate.

# ls -l /dev/dsk/c0t0d0s0
lrwxrwxrwx 1 root root 74 Jan 22 15:00 /dev/dsk/c0t0d0s0 -> ../../devices/sbus@2,0/SUNW,socal@d,10000/ssd@w21000020370d8ad0,0:a

Fortunately, we don't have to boot the device using the WWN. We can boot using the disk ID.

ok boot /sbus@2,0/SUNW,socal@d,10000/ssd@0,0

The proper approach is to put the above in the boot-device parameter of the NVRAM and then boot from the alias.

ok devalias disk
disk=/sbus@2,0/SUNW,socal@d,10000/ssd@0,0
ok boot disk
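In the probe-fcal-all example, the interface board WWNs 200d080020940232 and 2005080020940232 both end in the twelve hex digits of the host MAC address 8:00:20:94:02:32. The Python sketch below extracts those digits. It assumes that the MAC occupies the low 48 bits of the WWN, which is inferred from this one example rather than taken from a specification, and the function name is hypothetical.

```python
def mac_from_ib_wwn(wwn: str) -> str:
    """Return the MAC address held in the last 12 hex digits of a 16-digit WWN."""
    wwn = wwn.lower()
    if len(wwn) != 16 or any(ch not in "0123456789abcdef" for ch in wwn):
        raise ValueError(f"expected 16 hex digits, got {wwn!r}")
    low48 = wwn[-12:]
    # Split the 12 digits into six two-digit octets.
    return ":".join(low48[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    for wwn in ("200d080020940232", "2005080020940232"):
        print(wwn, "->", mac_from_ib_wwn(wwn))
```

Both interface board WWNs yield 08:00:20:94:02:32, while the disk WWNs (for example 21000020370d8ad0) do not contain the host MAC address.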

Sun Enterprise 3500 - Boot Disk Replacement
A host that boots from a non-mirrored FC-AL disk (either an A5000 or the E3500 internal disks) has to overcome the hard-coded World Wide Number (WWN) that each of these disks uses as an integral part of its device path. On failure of the boot disk, the system administrator must ensure that this WWN is correctly updated throughout the system so that it will reboot.

Procedure
When the boot disk is replaced and the system is booted from CD-ROM, a device tree is built in memory as part of the boot sequence. However, when the data is restored from a backup tape, the old path_to_inst file with the old WWN is put back on the disk. To recover, mount the root filesystem that you have just restored on /a. Run the following commands to rebuild the devices tree:

# drvconfig -r /a -p /a/etc/path_to_inst
# cd /devices
# find . -print | cpio -pduVm /a/devices
# disks -r /a
# devlinks -r /a

NOTE: It is currently necessary to use both "drvconfig" and "find | cpio" because of bugid 4161768: drvconfig does not work properly with socal disks.

Restore the other filesystems on that disk, or comment out their entries in /a/etc/vfstab. At a minimum, you must have all the Solaris filesystems (root, /var, /usr, /opt, etc.) recovered. Reboot the system from the recovered disk.

For full details, see Internal SRDB 17658.

Sun Enterprise 3500 - Data Disk Replacement
We will still have to overcome the hard-coded World Wide Number (WWN) that each of these disks uses as an integral part of their device path.

Procedure
Ensure that the following patches, or higher revisions, are installed:

Solaris 2.6
    sf/socal/ib/luxadm patch - 105375-10
    ssd patch - 105356-08
Solaris 2.5.1
    sf/socal/ib/luxadm patch - 105310-08
    ssd patch - 104708-16

These provide support for the luxadm commands on the E3500.

Unmount the disk and then stop it with:
# luxadm stop <logical path, physical path or WWN ...>

Remove the device entries with the following command:
# luxadm remove_device <logical path, physical path or WWN ...>

Replace the disk and then run:
# luxadm insert <no arguments required>

This recreates the device entries; the device is now ready to be used.

For full details, see Internal SRDB 18595.

Sun Enterprise 3000 Disk Addressing

As you look at the front of an E3000, the top four disks are assigned SCSI targets 0-3, and the bottom six disks are assigned SCSI targets 10-15. Note that the system addresses the disks in hex.

Note: All ten drives, plus the tape unit and CD-ROM, are driven from the onboard SCSI controller in slot 1.

I/O Addressing Test
The following output has been generated from an E3500. Outline all the boards within the system, with part numbers. You may assume that we have 400MHz processors.

{e} ok show-disks
a) /pci@b,4000/SUNW,isptwo@3/sd
b) /sbus@7,0/SUNW,fas@0,8800000/sd
c) /sbus@7,0/SUNW,fas@3,8800000/sd
d) /sbus@6,0/SUNW,socal@d,10000/sf@1,0/ssd
e) /sbus@6,0/SUNW,socal@d,10000/sf@0,0/ssd
f) /sbus@3,0/SUNW,fas@3,8800000/sd
g) /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd
h) /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd
q) NO SELECTION


Solaris Support Utilities

How Solaris References System Components
In the Solaris 2.x and 7 operating environments, system components are referenced in three different ways:

- Logical device names: Names used by system administrators and software to access system resources.
- Physical device names: Names that represent the full device path name in the device information hierarchy (or tree).
- Instance names: The kernel's abbreviated names for every possible device on the system. dmesg displays instance names, such as sd0 and sd1.

Logical Device Names


These names are symbolically linked to their corresponding physical device (/devices) names. The logical names are located in the /dev directory and are created at the same time as the physical names.

It is important to remember that in most cases, software applications and system administrators view system resources (such as disks) through their logical names. When a system fault occurs, it might be necessary to translate a device's logical name to some physical identifier so that you can repair the problem. The next few pages show the relationship between the logical name and the physical name.

The following examples show the logical names of a diskette drive and hard disk drive 0.

# ls /dev/diskette*
/dev/diskette /dev/diskette0
# ls /dev/rdsk/c0t0d0*
c0t0d0s0 c0t0d0s1 c0t0d0s2 c0t0d0s3 c0t0d0s4 c0t0d0s5 c0t0d0s6 c0t0d0s7

Figure 9-1 shows the relationship of the hard disk drive logical name syntax to traditional SCSI components.

/dev/[r]dsk/c#t#d#s#

c#   Controller number
t#   Target number
d#   Disk or logical unit number (LUN)
s#   Slice or partition number

Figure 9-1 Logical Name Syntax
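The c#t#d#s# syntax in Figure 9-1 can be split into its four numbers mechanically. The Python sketch below is a minimal illustration; the class and function names are hypothetical, and it handles only the plain decimal c#t#d#s# form shown in the figure, with or without a /dev/dsk/ or /dev/rdsk/ prefix.

```python
import re
from typing import NamedTuple


class LogicalDisk(NamedTuple):
    controller: int
    target: int
    disk: int       # disk or logical unit number (LUN)
    slice_no: int   # slice or partition number


_LOGICAL_NAME = re.compile(r"^(?:/dev/r?dsk/)?c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_logical_name(name: str) -> LogicalDisk:
    """Split a logical disk name such as c2t1d0s1 into its numeric components."""
    match = _LOGICAL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a c#t#d#s# logical name: {name!r}")
    return LogicalDisk(*(int(part) for part in match.groups()))


if __name__ == "__main__":
    print(parse_logical_name("c2t1d0s1"))
    print(parse_logical_name("/dev/rdsk/c0t0d0s0"))
```

For c2t1d0s1 this yields controller 2, target 1, disk 0, slice 1.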

Physical Device Names
The physical names are located in the /devices directory, where the entries are created during installation, during subsequent automatic device configuration, or by using the drvconfig command. The device file provides a pointer to the kernel device drivers.

The following examples show the relationship of the diskette drive and hard disk drive 0 physical names to their logical names.

Note: The following example is from an Enterprise 450.

# ls -l /dev/diskette*
lrwxrwxrwx 1 root root 49 Aug 5 13:52 /dev/diskette -> /devices/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
lrwxrwxrwx 1 root root 49 Aug 5 13:52 /dev/diskette0 -> /devices/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
# ls -l /dev/rdsk/c0t0d0s0
lrwxrwxrwx 1 root root 45 Aug 5 13:52 /dev/rdsk/c0t0d0s0 -> /devices/pci@1f,4000/scsi@3/sd@0,0:a,raw

The next two examples show the corresponding OBP device tree and devalias entries for the same two devices.

ok show-devs
.
/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
.
/pci@1f,4000/scsi@3/disk
.
ok devalias
.
floppy   /pci@1f,4000/ebus@1/fdthree
.
disk0    /pci@1f,4000/scsi@3/sd@0,0
.

Instance Names
In the Solaris 2.x and 7 environments, the instance name is bound to the physical name by references in the /etc/path_to_inst file. The device instance is the number that follows the physical name on each line of the file. The kernel uses these names to identify every possible device instance. The instance numbers are assigned in order of insertion or configuration and therefore do not necessarily follow any recognizable or usable pattern. However, they do map to groupings of the minor device numbers listed in the /devices/... subdirectories.

The following example shows the entries in the /etc/path_to_inst file for the same diskette drive and hard disk drive 0 seen earlier.

/pci@1f,4000/ebus@1/fdthree@14,3023f0 0 fd
/pci@1f,4000/scsi@3/sd@0 0 sd
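An instance name such as sd0 is simply the driver name followed by the instance number from the matching path_to_inst entry. The Python sketch below builds that mapping from entries in the three-field form shown above. The function name is hypothetical; stripping double quotes is an allowance for files that quote the path and driver fields.

```python
from typing import Dict


def instance_names(path_to_inst_text: str) -> Dict[str, str]:
    """Map each physical path to its instance name (driver name + instance number)."""
    names: Dict[str, str] = {}
    for line in path_to_inst_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        path, instance, driver = line.split()
        path = path.strip('"')
        driver = driver.strip('"')
        names[path] = f"{driver}{int(instance)}"
    return names


# The two entries shown in the example above.
SAMPLE = """\
/pci@1f,4000/ebus@1/fdthree@14,3023f0 0 fd
/pci@1f,4000/scsi@3/sd@0 0 sd
"""

if __name__ == "__main__":
    for path, name in instance_names(SAMPLE).items():
        print(f"{name:<4} {path}")
```

For the two sample entries this gives fd0 for the diskette drive and sd0 for hard disk drive 0.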

Configuring Components in Solaris
Automatic Device Configuration
The kernel, consisting of a small generic core with a platform-specific component and a set of modules, is configured automatically in the Solaris environment. A kernel module is a hardware or software component that is used to perform a specific task on the system. An example of a loadable kernel module is a device driver that is loaded when the device is accessed.

The system determines what devices are attached to it at boot time. Then the kernel configures itself dynamically, loading needed modules into memory. At this time, device drivers are loaded when devices, such as disk and tape devices, are accessed for the first time. This process is called autoconfiguration because all kernel modules are loaded automatically when needed.

Adding New Components to Solaris


Note: The following procedure should be used only when configuring components that are not hot-pluggable, or when Dynamic Reconfiguration is unavailable.

If Solaris is running, perform the following steps:

1. Become superuser.
2. Create the /reconfigure file.
   # touch /reconfigure
   The /reconfigure file causes the Solaris software to check for the presence of any newly installed devices the next time you turn on or boot your system.
3. Shut down the system.
   # shutdown -i0 -g30 -y
4. Turn off power to the system after it is shut down.
5. Turn off the system.
6. Install the device.
7. Turn on the power to the system. The system boots to multiuser mode and the login prompt is displayed.
8. Verify that the device has been configured.

Note: If the system is in OBP, execute the boot -r command to force a Solaris reconfiguration.

How to Add a Device Driver
This procedure assumes that the device has already been added to the system.

1. Become superuser.
2. Place the tape, diskette, or CD-ROM into the appropriate drive.
3. Use the pkgadd command to install the driver.
   # pkgadd -d device package-name
   where
   -d device      Identifies the device pathname.
   package-name   Identifies the package name that contains the device driver.
4. Verify that the package has been added correctly by using the pkgchk command. The system prompt returns with no response if the package is installed correctly.
   # pkgchk package-name

Displaying System Configuration Information - prtconf, sysdef and format
Solaris provides you with a variety of utilities that you can use to monitor Sun Enterprise systems. The following utilities display system and device configuration information:

- prtconf - Displays system configuration information, including the total amount of memory and the device configuration as described by the system's device hierarchy. The output displayed by this command depends upon the type of system.
- sysdef - Displays device configuration information, including system hardware, pseudo devices, loadable modules, and selected kernel parameters.
- format - Displays both logical and physical device names.

The prtconf Utility


The following prtconf output is displayed on an Enterprise 450 system. To execute the prtconf command, type the following:

# /usr/sbin/prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 256 Megabytes
System Peripherals (Software Nodes):

SUNW,Ultra-4
packages (driver not attached)
terminal-emulator (driver not attached)
deblocker (driver not attached)
obp-tftp (driver not attached)
disk-label (driver not attached)
ufs-file-system (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
associations
slot2disk
slot2led
slot2dev
pci, instance #0
ebus, instance #0
auxio (driver not attached)
power, instance #0 (driver not attached)
SUNW,pll (driver not attached)
sc (driver not attached)
se, instance #0
su, instance #0
su, instance #1
ecpp, instance #0 (driver not attached)
fdthree, instance #0
eeprom (driver not attached)
flashprom (driver not attached)
.
.
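When the prtconf listing is long, it can help to pull out only the nodes that have no driver attached. The following is a minimal sketch that assumes the output layout shown in this section; the function name list_unattached is hypothetical. Remember that drivers are loaded on first access, so an unattached node is not necessarily a fault.

```shell
#!/bin/sh
# Sketch: read prtconf-style text on standard input and print the name
# of every node marked "(driver not attached)".
list_unattached() {
    awk '/\(driver not attached\)/ {
        sub(/^[ \t]+/, "")                      # drop leading indentation
        sub(/ \(driver not attached\).*$/, "")  # drop the status marker
        sub(/, instance #[0-9]+$/, "")          # drop any instance number
        print
    }'
}
```

A typical use would be to pipe the command into the function, for example /usr/sbin/prtconf | list_unattached.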

The sysdef Utility
The following sysdef output is displayed on an Enterprise 450 system. To execute the sysdef command, type the following:

# /usr/sbin/sysdef
*
* Hostid
*
  8095febb
*
* sun4u Configuration
*
*
* Devices
*
packages (driver not attached)
terminal-emulator (driver not attached)
deblocker (driver not attached)
obp-tftp (driver not attached)
disk-label (driver not attached)
ufs-file-system (driver not attached)
openprom (driver not attached)
client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
associations (driver not attached)
slot2disk (driver not attached)
slot2led (driver not attached)
slot2dev (driver not attached)
counter-timer (driver not attached)
pci, instance #0
ebus, instance #0
auxio (driver not attached)
power, instance #0 (driver not attached)
SUNW,pll (driver not attached)
sc (driver not attached)
se, instance #0
su, instance #0
su, instance #1
fdthree, instance #0
eeprom (driver not attached)
flashprom (driver not attached)
SUNW,envctrl, instance #0
network, instance #0 (driver not attached)
scsi, instance #0
disk (driver not attached)
tape (driver not attached)
sd, instance #0
sd, instance #1
sd, instance #2
sd, instance #3
sd, instance #4 (driver not attached)
sd, instance #5 (driver not attached)
sd, instance #6 (driver not attached)
sd, instance #7 (driver not attached)
sd, instance #8 (driver not attached)
sd, instance #9 (driver not attached)
sd, instance #10 (driver not attached)
sd, instance #11 (driver not attached)
sd, instance #12 (driver not attached)
sd, instance #13 (driver not attached)
sd, instance #14 (driver not attached)
scsi, instance #1
disk (driver not attached)
tape (driver not attached)
sd, instance #15
sd, instance #16
sd, instance #17
sd, instance #18
sd, instance #19 (driver not attached)
sd, instance #20 (driver not attached)
sd, instance #21 (driver not attached)
sd, instance #22 (driver not attached)
sd, instance #23 (driver not attached)
sd, instance #24 (driver not attached)
sd, instance #25 (driver not attached)
sd, instance #26 (driver not attached)
sd, instance #27 (driver not attached)
sd, instance #28 (driver not attached)
sd, instance #29 (driver not attached)
pci, instance #1
mc (driver not attached)
bank (driver not attached)
dimm (driver not attached)
dimm (driver not attached)
dimm (driver not attached)
dimm (driver not attached)
bank (driver not attached)
bank (driver not attached)
bank (driver not attached)
SUNW,UltraSPARC-II (driver not attached)
pci, instance #2
pci, instance #3
pci, instance #4
SUNW,m64B, instance #0
pci, instance #5
pseudo, instance #0
clone, instance #0
ip, instance #0
tcp, instance #0
.
.
*
* Loadable Objects
*
* Loadable Object Path = /platform/SUNW,Ultra-4/kernel
*
misc/platmod
misc/sparcv9/platmod
*
* Loadable Object Path = /platform/sun4u/kernel
*
cpu/sparcv9/SUNW,UltraSPARC-II
cpu/sparcv9/SUNW,UltraSPARC-IIi
cpu/sparcv9/SUNW,UltraSPARC
*
* Loadable Object Path = /kernel
*
drv/isp
drv/log
drv/le
.
.
*
* Loadable Object Path = /usr/kernel
*
drv/sparcv9/tnf
drv/sparcv9/audiocs
drv/sparcv9/dbri
strmod/u8lat2
*
* System Configuration
*
*
  swap files
swapfile             dev   swaplo  blocks    free
/dev/dsk/c0t0d0s1    32,1      16  308800  308800
*
* Tunable Parameters
*
 5095424  maximum memory allowed in buffer cache (bufhwm)
    3898  maximum number of processes (v.v_proc)
      99  maximum global priority in sys class (MAXCLSYSPRI)
    3893  maximum processes per user id (v.v_maxup)
      30  auto update time limit in seconds (NAUTOUP)
      25  page stealing low water mark (GPGSLO)
       5  fsflush run rate (FSFLUSHR)
      25  minimum resident memory for avoiding deadlock (MINARMEM)
      25  minimum swapable memory for avoiding deadlock (MINASMEM)
.
.

The format Utility
The format utility is normally used to prepare a disk drive for access by the Solaris operating system. Maintenance personnel also use this utility as a visibility tool to determine which disk drives can be seen by Solaris. To execute the format command, type the following:

# format
AVAILABLE DISK SELECTIONS:
  0. c0t0d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
     /pci@1f,4000/scsi@3/sd@0,0
  1. c0t3d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
     /pci@1f,4000/scsi@3/sd@3,0
Specify disk (enter its number):

This format example identifies two 9-Gbyte SCSI disk drives (sd@0,0 and sd@3,0).

Note - Press Control-d to exit the format utility.

The following is a more realistic example:

# format
AVAILABLE DISK SELECTIONS:
  0. c0t12d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
     /sbus@3,0/SUNW,fas@3,8800000/sd@c,0
  1. c0t13d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
     /sbus@3,0/SUNW,fas@3,8800000/sd@d,0
  2. c1t0d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
     /sbus@7,0/SUNW,fas@3,8800000/sd@0,0
  3. c1t1d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
     /sbus@7,0/SUNW,fas@3,8800000/sd@1,0
  4. c2t4d0 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@2/rdriver@4,0
  5. c2t4d1 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@2/rdriver@4,1
  6. c2t4d2 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@2/rdriver@4,2
  7. c3t5d3 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@3/rdriver@5,3
  8. c3t5d4 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@3/rdriver@5,4
  9. c3t5d5 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@3/rdriver@5,5
 10. c4t5d0 <SYMBIOS-RSMArray2000-0205 cyl 8108 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@4/rdriver@5,0
 11. c4t5d2 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@4/rdriver@5,2
 12. c4t5d3 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@4/rdriver@5,3
 13. c4t5d4 <SYMBIOS-RSMArray2000-0205 cyl 8108 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@4/rdriver@5,4
 14. c5t4d1 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
     /pseudo/rdnexus@5/rdriver@4,1
Specify disk (enter its number):
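When format is used only as a visibility tool, the logical disk names are usually all that is needed. The following sketch extracts them from a saved disk selection list. The function name list_disks is hypothetical, and the pattern assumes the numbered "N. cXtYdZ <label>" layout shown in the examples above.

```shell
#!/bin/sh
# Sketch: read a format-style disk selection list on standard input and
# print only the logical device names (cXtYdZ), one per line.
list_disks() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*\.[[:space:]]*\(c[0-9][0-9]*t[0-9][0-9]*d[0-9][0-9]*\)[[:space:]].*$/\1/p'
}
```

Lines that do not start with a selection number, such as the physical device paths, are ignored, so the count of output lines equals the number of disks that Solaris can see.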

Displaying Diagnostic Information
In addition to monitoring utilities, Solaris provides you with commands that you can use to display diagnostic information. The following commands are used for this purpose:
- dmesg - Looks in a system buffer for recently printed diagnostic messages and prints them on the standard output.
- prtdiag - Displays system configuration and diagnostic information. The diagnostic information lists any failed Field Replaceable Units (FRUs) in the system.

Note - The /var/adm/messages file contains error messages relating to the current operating system initialization.

The dmesg Command


The following dmesg output is from an Enterprise 450 system. To execute the dmesg command, type the following:

# /usr/sbin/dmesg
Mon Aug 9 12:50:07 MDT 1999
Aug 5 14:02:31 proto144 unix: pseudo-device: winlock0
Aug 5 14:02:31 proto144 unix: winlock0 is /pseudo/winlock@0
Aug 5 14:02:31 proto144 unix: pseudo-device: devinfo0
Aug 5 14:02:31 proto144 unix: devinfo0 is /pseudo/devinfo@0
Aug 5 14:02:32 proto144 unix: pseudo-device: vol0
Aug 5 14:02:32 proto144 unix: vol0 is /pseudo/vol@0
Aug 5 14:02:32 proto144 unix: pseudo-device: llc10
Aug 5 14:02:32 proto144 unix: llc10 is /pseudo/llc1@0
Aug 5 14:02:32 proto144 unix: pseudo-device: pm0
Aug 5 14:02:32 proto144 unix: pm0 is /pseudo/pm@0
Aug 5 14:02:32 proto144 unix: pseudo-device: tod0
Aug 5 14:02:32 proto144 unix: tod0 is /pseudo/tod@0
Aug 5 14:02:32 proto144 unix: ecpp0 at ebus0: offset 14,3043bc
Aug 5 14:02:32 proto144 unix: ecpp0 is /pci@1f,4000/ebus@1/ecpp@14,3043bc
Aug 5 14:02:59 proto144 unix: SUNW,hme0: Link Down - cable problem?
Aug 5 14:03:09 proto144 last message repeated 2 times
Aug 6 10:07:50 proto144 unix: syncing file systems...
Aug 6 10:07:50 proto144 unix: done
Aug 6 10:08:37 proto144 unix: SunOS Release 5.7 Version Generic_10654104 64-bit [UNIX(R) System V Release 4.0]
Aug 6 10:08:37 proto144 unix: Copyright (c) 1983-1999, Sun Microsystems, Inc.
Aug 6 10:08:37 proto144 unix: Ethernet address = 8:0:20:95:fe:bb
Aug 6 10:08:37 proto144 unix: mem = 262144K (0x10000000)
Aug 6 10:16:45 proto144 unix: avail mem = 250568704
Aug 6 10:16:45 proto144 unix: root nexus = Sun Enterprise 450 (UltraSPARC-II 296MHz)
Aug 6 10:16:45 proto144 unix: pci0 at root: UPA 0x1f 0x4000
Aug 6 10:16:45 proto144 unix: pci0 is /pci@1f,4000
Aug 6 10:16:45 proto144 unix: pci1 at root: UPA 0x1f 0x2000
Aug 6 10:16:45 proto144 unix: pci1 is /pci@1f,2000
.
.
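The dmesg buffer mixes routine attach messages with genuine problem reports such as the hme "Link Down" message in this example. The following sketch picks out the interfaces that reported a link-down condition. The function name link_down_interfaces is hypothetical, and the field position assumes the "month day time host unix: device: message" layout seen in this output.

```shell
#!/bin/sh
# Sketch: read dmesg-style text on standard input and print each device
# that logged a "Link Down" message, once per device.
link_down_interfaces() {
    awk '/Link Down/ {
        name = $6            # sixth field is the device name plus a colon
        sub(/:$/, "", name)  # strip the trailing colon
        print name
    }' | sort -u
}
```

For the sample output above, the only device reported would be SUNW,hme0.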

The prtdiag Command
To execute the prtdiag command, type the following:

# /usr/platform/platform-name/sbin/prtdiag -v

Note - The command options are -l, which logs information to disk if any error is found, and -v, which provides verbose output.

The following is an example of prtdiag command output.

- CPU

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 7    14     0      248    4.0    US-II   2.0
 9    18     0      248    4.0    US-II   2.0
 9     9     1      248    4.0    US-II   2.0

- Memory group

=============================== Memory ==================================

                                               Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition   Speed   Factor   With
---  -----  ----  -------  ----------  ------  -------  -------
 9     0    256    Active      OK       60ns    1-way

- I/O boards

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                      Model
---  ----  ----  ----  ------------------------  --------
 1   SBus   25     0   DOLPHIN,sci
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,socal/sf (scsi-3)    501-3060
 3   SBus   25     0   DOLPHIN,sci
 3   SBus   25     3   SUNW,hme
 3   SBus   25     3   SUNW,fas/sd (block)
 3   SBus   25     3   SUNW,socal/sf (scsi-3)    501-3060

- Detached boards

No failures found in System
===========================

- Fatal hardware reset
  - This information is collected from components after a hardware failure. It is useful in determining the correct FRU to replace.

No failures found in System
===========================

- POST-detected failures

No System Faults found
======================

- OS-detected system faults
  - System-detected faults light the yellow LED on the failing board. You can repair system-detected faults, such as overtemperature, fan failure, or power supply failure. These faults are removed from the display when they are repaired.

Most recent AC Power Failure:
=============================
Fri Mar 12 10:44:07 1999

- Environmental display

========================= Environmental Status =========================
Keyswitch position is in Normal Mode
System Power Status: Redundant
System LED Status:    GREEN     YELLOW     GREEN
Normal                 ON        OFF       BLINKING

Fans:
-----
Unit   Status
----   ------
Rack   OK
Key    OK
AC     OK

System Temperatures (Celsius):
------------------------------
Brd   State   Current   Min   Max   Trend
---   -----   -------   ---   ---   -----
 1     OK        8       37    38   stable
 3     OK       44       44    44   stable
 7     OK       40       39    41   stable
 9     OK       44       43    45   stable
CLK    OK       35       35    35   stable

Power Supplies:
---------------
Supply                       Status
------                       ------
0                            OK
1                            OK
2                            OK
3                            OK
PPS                          OK
System 3.3v                  OK
System 5.0v                  OK
Peripheral 5.0v              OK
Peripheral 12v               OK
Auxilary 5.0v                OK
Peripheral 5.0v precharge    OK
Peripheral 12v precharge     OK
System 3.3v precharge        OK
System 5.0v precharge        OK
AC Power                     OK

- Firmware levels

========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type       Attributes
---  ---  --  -----  -----  ----  ----  ----  ----------       ----------
 0    1    5                                  CPU              98MHz Capable
 1    1    5    1      1                 22   Dual-SBus-SOC+   98MHz Capable
 2    1    5                                  CPU              98MHz Capable
 3    1    5    1      1                 22   Dual-SBus-SOC+   98MHz Capable
 4    1    5                                  CPU              98MHz Capable
 6    1    5                                  CPU              98MHz Capable

System Board PROM revisions:
----------------------------
Board 0: OBP   3.2.24 1999/12/23 17:31   POST  3.9.24 1999/12/23 17:35
Board 1: FCODE 1.8.24 1999/12/23 17:30   iPOST 3.4.24 1999/12/23 17:34
Board 2: OBP   3.2.24 1999/12/23 17:31   POST  3.9.24 1999/12/23 17:35
Board 3: FCODE 1.8.24 1999/12/23 17:30   iPOST 3.4.24 1999/12/23 17:34
Board 4: OBP   3.2.24 1999/12/23 17:31   POST  3.9.24 1999/12/23 17:35
Board 6: OBP   3.2.24 1999/12/23 17:31   POST  3.9.24 1999/12/23 17:35

Setting NVRAM Configuration Parameters From Solaris
The eeprom Command
Solaris provides system administrators and service personnel with the ability to change system configuration parameters in NVRAM so that they take effect when the system is restarted. This is accomplished by using the eeprom command. The eeprom command displays or changes the values of parameters in the EEPROM. It processes parameters in the order given. When processing a parameter accompanied by a value, eeprom makes the indicated alteration to the EEPROM; otherwise it displays the parameter's value. When given no parameter specifiers, eeprom displays the values of all EEPROM parameters. The following are examples of the eeprom command:

- To display all configuration parameter settings, type
  # eeprom

- To display the current setting of the auto-boot? parameter, type
  # eeprom auto-boot?

- To disable boards in slots 3 and 5, type
  # eeprom disable-board-list=35

- To set the configuration policy to board, type
  # eeprom configuration-policy=board
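When eeprom is run without arguments it prints one name=value pair per line, so a saved listing can be searched for a single parameter. The following is a minimal sketch under that assumption. The function name nvram_value is hypothetical, and the sample values in the usage note are illustrative only.

```shell
#!/bin/sh
# Sketch: read "name=value" lines on standard input and print the value
# of the parameter named in $1. Returns non-zero if it is not found.
nvram_value() {
    awk -v want="$1" '
        { eq = index($0, "=") }                 # position of the first "="
        eq > 0 && substr($0, 1, eq - 1) == want {
            print substr($0, eq + 1)
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, printf 'auto-boot?=true\n' | nvram_value 'auto-boot?' prints true. The name is compared as a literal string, so the question mark in auto-boot? needs no escaping inside the function, although it should be quoted on the command line.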


SunVTS System Diagnostics

10

Introduction
SunVTS Software Overview
SunVTS is Sun's online validation test suite. With SunVTS, you can verify the functionality of most of Sun's hardware devices. You can use the SunVTS tests to stress certain areas of the system as needed for diagnostic and troubleshooting purposes. The SunVTS diagnostic software is the successor to SunDiag diagnostics, which shipped with the Solaris 2.4 operating system and earlier releases. SunVTS runs on the Solaris 2.5 operating system and later releases. Like its SunDiag predecessor, SunVTS software can run concurrently with customer applications and the Solaris operating system. SunVTS is a vital part of the Sun Enterprise server concurrent maintenance strategy.

Test Categories
SunVTS comprises many individual tests that support testing of a wide range of products and peripherals. Most of the tests are capable of testing devices in a 32-bit or 64-bit Solaris environment. Use SunVTS to test one device or multiple devices. Some of the major test categories are:

- Audio Tests
- Communication (Serial and Parallel) Tests
- Graphic/Video Tests
- Memory Tests
- Network Tests
- Peripherals (Disks, Tape, CD-ROM, Printer, Floppy) Tests
- Processor Tests
- Storage Tests

Hardware and Software Requirements
The following are the requirements to run SunVTS Version 3.1 software successfully in the Common Desktop Environment (CDE):

- The Solaris 7 3/99 operating environment
- The SunVTS 3.1 package
- Operating system kernel configured to support all peripherals to be tested
- Superuser access to start up SunVTS software
- Connection of loopback connectors, installation of test media, or the availability of disk space

Note - In this module, all references to SunVTS imply SunVTS 3.1.

Starting the SunVTS Software
The SunVTS program starts when the superuser types one of the following commands. The /opt/SUNWvts/bin directory needs to be defined as part of the PATH variable.

- sunvts - Runs the SunVTS kernel and default graphical interface (CDE) on the local machine
- sunvts -l - Runs the SunVTS kernel and OpenLook graphical interface on the local machine
- sunvts -t - Runs the SunVTS kernel with the TTY interface (vtstty)
- sunvts -h host_name - Runs the graphical interface on the local machine while connecting to and testing a remote machine

Note - The SUNWvts package and, if needed, the SUNWvtsx package must be installed on both local and remote machines to perform remote diagnostics.
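The PATH requirement can be met with a few lines in the superuser's shell startup file. The following is a sketch for a Bourne or Korn shell. The function name add_to_path is hypothetical, and the guard simply avoids appending the directory a second time.

```shell
#!/bin/sh
# Sketch: append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, leave PATH unchanged
        *) PATH="${PATH}:$1" ;;       # otherwise append it
    esac
    export PATH
}

add_to_path /opt/SUNWvts/bin
```

After this runs, the sunvts, vtsk, and vtstty commands can be typed without their full pathnames.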

The SunVTS Graphical Interface
The initial SunVTS graphic menu is shown in Figure 10-1.

Figure 10-1 SunVTS Graphical Interface

The SunVTS Window Panels
The five major panels of the SunVTS window are:

- System Status Panel - Test status, host name, model type, number of passes and errors, and elapsed time are displayed in the upper area of the SunVTS menu.
- System Map - This area of the initial menu displays a logical device view consisting of a selectable list of devices to test by default. You can turn each device test on or off by clicking on the check box. You can select particular devices, such as CPUs, network interfaces, or disks, by clicking on the plus sign box.
- Select Devices - This area of the SunVTS menu enables you to quickly select the devices to test, including a default set (shown in Figure 6-2).
- Select Mode - A SunVTS test session runs in one of two test modes: Connection test mode and Functional test mode.
  - Connection Test Mode - In Connection test mode, the tests determine if the devices are connected to the system you are testing and if they are accessible. Functional testing is not done in this mode, but the devices are accessed to establish system connection and accessibility. You can safely run this mode when the system is online. When SunVTS testing is started in Connection test mode, each test is run sequentially until all tests are run. The limited nature of the tests in this mode makes it possible to run periodic checks for configuration verification on the system.
  - Functional Test Mode - Checks the operation of the system devices. This mode finds any faults and exercises the system by running tests to increase the load and stress on the system. Do not run critical applications on the system or use the system for production purposes in Functional test mode.
- Test Messages - This area displays any information or error messages that are issued during test execution.

The SunVTS Window Icons
Seven icons are provided at the top of the SunVTS menu. These are:

- Start - Begins the test, according to the selections made in the System map, Select devices, and Select mode areas. Progressive updates are displayed in the Information Panel during testing.
- Stop - Stops current testing, without exiting SunVTS.
- Reset - Sets the System map area to its previous state.
- Host - Provides a submenu in which you can enter a remote host name for a test connection. This host must be reachable, with SunVTS installed.
- Log - Displays the log file, and provides a menu to select the amount of information to log, including errors, information, and UNIX messages (/var/adm/messages).
- Meter - Invokes the performance monitor utility, which graphically displays system resource activity during testing.
- Quit - Exits the SunVTS program.

The SunVTS Menu Selections
The top horizontal bar of the SunVTS menu has four selections with lists of associated submenus.

- Commands - This menu provides the following commands:
  - Start testing - Begins testing
  - Stop testing - Halts testing
  - Connect to host - Specifies the host name of the target host
  - Trace test - Selects a test to trace, and a location for the output
  - Reprobe system - Probes the hardware
  - Quit VTS - Exits SunVTS
- View - This menu provides two options:
  - Open System map - Displays the full device selection list
  - Close System map - Displays the default device selection list
- Options - The following selections are available:
  - Thresholds - Specifies the number of passes, errors, and time to run
  - Notify - Specifies a user to mail with test status information
  - Schedule, Test Execution, and Advanced - Runs the specified number of tests with the stress, verbose, core file, or run on error option (described in the sections that follow)
  - Option files - Loads, stores, or removes a test options file
- Reports - Two selections are provided:
  - System configuration - Displays the system configuration report as obtained with the prtconf command
  - Log files - Displays the log file and allows selection of the level of information to log

The Schedule Options Menu
Clicking on the Schedule option beneath the Options selection on the horizontal bar of the SunVTS window displays the window in Figure 10-2.

Figure 10-2 Schedule Options Window

The available options are:

- Auto Start - Runs tests selected in a previously saved option file using a command-line specification when sunvts is invoked.
- Single Pass - Runs only one pass of each selected test.
- System Concurrency - Specifies the maximum number of tests that can be run concurrently on the machine being tested.
- Group Concurrency - Specifies the number of tests to be run at the same time in the same group.

The Test Execution Menu
Clicking on Test Execution beneath the Options selection on the upper horizontal bar of the SunVTS menu displays the window in Figure 10-3.

Figure 10-3 Test Execution Options Window

The following options are available in the Test Execution menu:

- Stress - Runs certain tests in stress mode, working the system harder than normal.
- Verbose - Enables more information to be logged and displayed during testing.
- Core file - Allows a core dump to be generated in the SunVTS bin directory when abnormal conditions occur. The core file name format is core.testname.xxxxxx.
- Run on Error - Continues testing until the max_errors value is reached.
- Max Passes - Specifies the maximum number of passes that tests can run. A value of zero indicates no limit.
- Max Errors - States the maximum number of errors any test allows before stopping. A value of zero causes tests to continue regardless of errors.
- Max Time - Specifies the maximum number of minutes tests are allowed to run. A value of zero indicates no limit.
- Number of Instances - Specifies the number of instances to run for all tests that are scalable.
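The "zero means no limit" convention used by Max Passes, Max Errors, and Max Time can be expressed as a single comparison. The following sketch only illustrates that convention; it is not SunVTS code, and the function name limit_reached is hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a counter has reached its configured maximum.
# $1 = current count, $2 = configured maximum (0 means "no limit").
limit_reached() {
    [ "$2" -ne 0 ] && [ "$1" -ge "$2" ]
}
```

With a maximum of 0 the function never reports that the limit was reached, which mirrors how a zero value lets tests continue indefinitely.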

The Advanced Options Menu
Clicking on Advanced beneath the Options selection on the topmost horizontal bar of the SunVTS window displays the window in Figure 10-4.

Figure 10-4 Advanced Options Window

The available options are:

- System Override - Supersedes group and test options in favor of the options selected in a Global Options window; it sets all options on all test group and test option menus.
- Group Override - Supersedes specific test options in favor of the group options set in a Group Options window.
- Group Lock - Protects specific group options from being changed by the options set at the system level. (System Override supersedes this option.)
- Test Lock - Protects specific test options from being changed by options set at the group or system level. (System Override and Group Override supersede this option.)

Intervention Mode
Certain tests require that you intervene before you can run the test successfully. These include tests that require media or loopback connectors.
- Loopback connectors are required to run certain tests, such as serial port tests, successfully. See the SunVTS Test Reference Manual for more information about loopback connectors, and which tests need them.

- Media (tapes, diskettes, or CD-ROMs) must be present in the drive(s) before the system is probed at SunVTS startup. If this is not done, the following error message is displayed:

Using old or damaged tapes and diskettes may cause errors in corresponding tests. You cannot select these tests until you enable the intervention mode. This setting reminds you that you must intervene before the test can be successfully completed.

Performance Monitor Panel
The performance monitor displays system resource activity. A brief description of each component follows the figure.

Figure 10-5 Perfmeter Window

The information displayed with the SunVTS Performance Monitor is the same as that displayed by the operating system perfmeter utility.
- cpu - Percentage of CPU used per second
- pkts - Ethernet packets per second
- page - Paging activity in pages per second
- swap - Jobs swapped per second
- intr - Number of device interrupts per second
- disk - Disk use in transfers per second
- cntxt - Number of context switches per second
- load - Average number of processes that have run over the last minute
- colls - Collisions per second detected on the Ethernet
- errs - Errors per second on receiving packets

Using SunVTS in TTY Mode
If you use the SunVTS software in TTY mode, no frame buffer is required. To run in TTY mode, perform the following steps:

1. Start the SunVTS kernel with the vtsk command.
   # /opt/SUNWvts/bin/vtsk
2. Start the SunVTS TTY user interface with the vtstty command
   # /opt/SUNWvts/bin/vtstty
   or the sunvts command with the -t option.
   # /opt/SUNWvts/bin/sunvts -t

Figure 10-6 SunVTS Window

Negotiating the SunVTS TTY Interface
The SunVTS TTY interface provides a screen with four working panels: Message, Status, Control, and System map. The following keys operate the TTY interface:

- Tab - Selects a screen panel for keyboard input
- Spacebar - Selects an option within a panel
- Arrows - Move between the options in a panel
- Esc - Closes pop-up option windows
- Control-l - Refreshes the TTY window

Figure 10-7 Various Working Panels of the SunVTS TTY Interface (Control panel, System map, Status panel, Message area)

Running SunVTS Remotely
A testing session can be run across a network or even a modem. Both the kernel and the user interface components are used in remote testing.

Requirements
The following requirements must be met to run SunVTS on a remote system:

- There must be network connectivity between the local and remote system.
- You must install the same revision of SunVTS on both the local and remote system.

Running SunVTS Through a Remote Login


1. Use the xhost command to allow the remote system to display on your local system. $ /usr/openwin/bin/xhost + remote_hostname where remote_hostname is the name of the remote system. 2. Log in to the remote system and substitute user to root. $ rlogin remote_hostname $ su 3. Start SunVTS. # /opt/SUNWvts/bin/sunvts -display \ local_hostname:0 where local_hostname is the name or IP address of the local system. Note The SunVTS kernel starts on the remote system and the user interface displays on your system.

4. Configure SunVTS for the test session and start the tests.
5. Review the SunVTS logs for test results. You can view the remote system test logs through the local SunVTS interface. The log files are stored on the system under test (SUT).

Running SunVTS Through telnet or tip
You can run SunVTS on a remote system, with the TTY interface, through a telnet or tip session. You need to set the correct terminal type and the number of columns and rows before starting the interface. The steps below describe this process.
1. Use the echo command to display the value of the TERM variable:
$ echo $TERM
sun-cmd
Note: In this example, TERM is a Korn or Bourne shell variable and its value is sun-cmd.
2. Use the stty command to display the settings of your terminal:
$ stty
speed 9600 baud; -parity hupcl
rows = 60; columns = 80; ypixels = 780; xpixels = 568;
switch = <undef>;
brkint -inpck -istrip icrnl -ixany imaxbel onlcr
echo echoe echok echoctl echoke iexten
Note: You must have a minimum of 80 columns and 24 rows to run the SunVTS TTY interface. Write down the values of your TERM variable and of the rows and columns settings. You will need these values later.
3. Connect to the remote system using either the telnet or the tip command.

4. Become superuser on the remote system.
5. Identify your terminal type and settings in the telnet (or tip) session window:
# TERM=sun-cmd
# stty rows 60
# stty columns 80
6. Start SunVTS with the TTY interface:
# /opt/SUNWvts/bin/sunvts -t
7. Configure SunVTS for the test session and start the tests.
8. Review the SunVTS logs for test results. You can view the remote system test logs through the local SunVTS TTY interface. The log files are stored on the system under test (SUT).
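
The minimum terminal size given in the procedure above (80 columns by 24 rows) can be checked before the TTY interface is started. The following is a minimal sketch only, not part of SunVTS: the function name vts_tty_size_ok and the two variables are hypothetical, and the values passed in are the ones you wrote down from the stty output.

```shell
# Minimum geometry stated in the procedure for the SunVTS TTY interface.
VTS_MIN_ROWS=24
VTS_MIN_COLS=80

# vts_tty_size_ok ROWS COLS
# Hypothetical helper: succeeds (exit status 0) only when both values
# meet the stated minimum.
vts_tty_size_ok() {
    [ "$1" -ge "$VTS_MIN_ROWS" ] && [ "$2" -ge "$VTS_MIN_COLS" ]
}

# Example with the values recorded in step 2 (60 rows, 80 columns).
if vts_tty_size_ok 60 80; then
    echo "terminal is large enough"
else
    echo "terminal is too small"
fi
```

If the check fails, enlarge the window and repeat the stty commands from step 5 with the new values.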

SunVTS Test Summary
SunVTS supports a number of tests that are applicable to Sun Enterprise servers. This section gives a brief description of these tests. Further details on each test can be found in the SunVTS 3.x Test Reference Manual.

Advanced Frame Buffer Test


The afbtest verifies the functionality of the Advanced Frame Buffer.

Note: This test supports Function Test mode only.

Caution: Do not run any other application or screen saver program that uses the AFB accelerator port while running afbtest. This combination causes SunVTS to return incorrect errors.

SunATM Adapter Test


The atmtest checks the functionality of the SunATM-155 and SunATM-622 SBus and PCI bus adapters. It runs only in loopback (external or internal) mode. The Asynchronous Transfer Mode (ATM) adapter and the ATM device driver must be present. To run the atmtest in external loopback mode, a loopback connector must be attached to the ATM adapter. The internal loopback mode does not require a loopback connector.

Note: This test supports Function Test mode only.

Note: Do not run nettest while running atmtest.

Note: Bring the ATM interface down to make sure that the interface is in offline mode before running atmtest.

Audio Test
The audiotest verifies the hardware and software components of the audio subsystem. This test supports all Sun audio implementations.

Note: This test supports Connection and Function Test modes.

Note: The audio device is an exclusive use device. Only one process or application can interface with it at a time.

Bidirectional Parallel Port Printer Test


The bpptest verifies the functionality of the bidirectional parallel port. The bpptest verifies that your SBus card and its parallel port are working properly by attempting to transfer a data pattern from the SBus card to the printer.

Note: This test supports Connection and Function Test modes.

Compact Disc Test


The cdtest checks the CD-ROM unit by reading the CD. cdtest is not a scalable test. Each track is classified as follows:
• Mode 1 uses error detection/correction code (288 bytes).
• Mode 2 uses that space for auxiliary data, or as an audio track.

Note: Load a compact disc into the drive before starting the test.

Note: This test supports Connection and Function Test modes.

Frame Buffer, GX, GX+ and TGX Options Test
The cg6 test verifies the cgsix frame buffer and the graphics options offered with most SPARC-based workstations and servers.

Note: This test supports Function Test mode only.

Disk and Floppy Drives Test


The disktest verifies the functionality of hard disk drives and floppy drives using three subtests: Media, File System, and Asynchronous I/O. Most disk drives, such as SCSI disks, native or SCSI floppy disks, IPI, and so on, are supported. The type of drive being tested is displayed at the top of the Test Parameter option menu. The WriteRead option of the Media subtest is allowed only if a selected partition is not mounted. By default, disktest does not mount any partitions.

Caution: If a power failure occurs while the Media subtest is running in WriteRead mode, disk data might be destroyed.

Caution: Running the Media subtest on a disk partition in WriteRead mode can cause data corruption if the same partition is being used by other programs. Select this mode only when the system is offline (not used by any other users or programs).

Note: This test supports Connection and Function Test modes.

ECP 1284 Parallel Port Printer Test
The ecpptest verifies the functionality of the ecpp IEEE 1284 parallel printer port device.

Note: The ecpp device is an exclusive use device. Only one application can interface with it at a time.

Note: This test supports Connection and Function Test modes.

Sun Enterprise Network Array Test


The enatest provides configuration verification, fault isolation, and repair validation of the Sun Enterprise Network Array. The Sun Enterprise Network Array is a high availability mass storage subsystem consisting of:
• SCSI fibre channel protocol host adapters with dual 100-Megabyte FC-AL ports.
• A disk enclosure.
• A front panel display for configuration information.
• Up to two interface boards in the enclosure, which provide FC-AL connections to the enclosure and also provide status information and control of the conditions within the enclosure.
• Other field-replaceable units (FRUs) within the enclosure, including power supply units, fan trays, and the backplane.

enatest detects all Sun Enterprise Network Array enclosures connected to the host and collects relevant configuration information.

Note: This test supports Connection and Function Test modes.

StorEdge 1000 Enclosure Test
The enctest tests the StorEdge 1000 enclosures. The enclosure can support either twelve 1-inch 4-Gbyte drives or eight 1.6-inch 9-Gbyte drives and has redundant power and cooling. Two enclosure models are available:
• StorEdge A1000: Disk tray with the hardware RAID controller
• StorEdge D1000: Disk tray without the hardware RAID controller

You can use enctest for validation, configuration verification, repair verification, and fault isolation of both models. The enctest probe detects all the connected StorEdge enclosures and displays the status of the various elements in the enclosure.

Note: This test supports Connection and Function Test modes.

Frame Buffer Test


The fbtest is a generic test for all dumb frame buffers used with the Solaris 2.x and Solaris 7 software.

Note: This test supports Function Test mode only.

Fast Frame Buffer Test


The ffbtest verifies the functionality of the Fast Frame Buffer. ffbtest can detect and adapt to the video modes of single- and double-buffer versions of the fast frame buffer (FFB).

Note: This test supports Function Test mode only.

Floating Point Unit Test
The fputest checks the floating point unit on machines with the SPARC-based architecture.

Note: This test supports Connection and Function Test modes.

Sun GigabitEthernet Test


The gemtest provides functional test coverage of the Sun GigabitEthernet SBus and PCI bus adapters. It runs in loopback (external or internal) mode and must not be selected at the same time as nettest. The gemtest provides better fault isolation than nettest.

Note: This test supports Function Test mode only.

Intelligent Fibre Channel Processor Test


The ifptest tests the functionality of the PCI FC_AL card when there are no devices attached to the loop. The driver checks for devices on the fibre loop. If devices are detected, the driver blocks any diagnostic commands.

Note: When devices are attached to the loop, do not run ifptest. Instead, run disktest on the individual devices. This tests the whole subsystem, including the FC_AL controller.

Note: This test supports Connection and Function Test modes.

Dual Basic Rate ISDN (DBRI) Chip
The isdntest verifies the functionality of the ISDN portion of the Dual Basic Rate ISDN (DBRI) chip.

Note: This test supports Function Test mode only.

M64 Video Board Test


The m64test tests the PCI-based M64 video board by performing the following subtests:
• Video Memory test
• RAMDAC test
• Accelerator Port test

Caution: Do not run any other application or screen saver program that uses the Pineapple accelerator port while running m64test. Do not run power management software. These programs cause SunVTS to return incorrect errors.

Note: This test supports Function Test mode only.

Multiprocessor Test
The mptest verifies the functionality of multiprocessing hardware. mptest can test up to 256 processors.

Note: This test supports Connection and Function Test modes.

Network Hardware Test
The nettest checks all the networking hardware on the system CPU board and separate networking controllers (for example, a second SBus Ethernet controller). For this test to be meaningful, the machine under test must be attached to a network with at least one other system on the network.

Note: This version of nettest is used for all networking devices, including Ethernet (ie and le), token ring (tr, trp), quad Ethernet (QED), fiber optic (fddi, nf, bf, pf), SPARCcluster 1 system, ATM (sa, ba), HiPPI, and 100-Mbit per second Ethernet (be, hme) devices.

Note: This test supports Connection and Function Test modes.

SPARCstorage Array Controller Test


The plntest checks the functionality of the controller board on the SPARCstorage Array (SSA). The SSA controller card is an intelligent, CPU-based board with its own memory and ROM-resident software. In addition to providing a communications link to the disk drives, it also buffers data between the host system and disk drives in its nonvolatile RAM (NVRAM). For data to go from the host to a particular disk, it must first be successfully transferred to this NVRAM space.

The host machine, SBus host adapter card, fiber-channel connection, and the SSA controller board must be working properly to perform this data transfer operation. By verifying and stressing this operation, plntest can isolate failures on the SSA disk drives from failures on the SSA controller board.

Note: This test supports Connection and Function Test modes.

Physical Memory Test
The pmemtest checks the physical memory of the system. The pmemtest locates parity errors, hard and soft error correction code (ECC) errors, memory read errors, and addressing problems. This test reads through all available physical memory. It does not write to any physical memory location.

Note: This test supports Connection and Function Test modes.

Prestoserve Test
Prestoserve is a Network File System (NFS) accelerator. It reduces the frequency of disk I/O access by caching the written data blocks in nonvolatile memory. Prestoserve then flushes the cached data to disk asynchronously, as necessary. The pstest verifies the Prestoserve accelerator's functionality with the following three checks:
• Board battery check
• Board memory check
• Board performance and file I/O access check

Note: This test supports Function Test mode only.

Serial Asynchronous Interface Test
The saiptest checks the functionality of the Serial Asynchronous Interface card through its device driver.

Note: You must run the saiptest in Intervention mode.

Note: This test supports Function Test mode only.

Sun Enterprise Cluster 2.0 Network Hardware Test


The scitest verifies the functionality of the Sun Enterprise Cluster 2.0 by checking the networking hardware. For this test to be meaningful, the cluster must already be configured before the test is run. After finding the cluster nodes (targets), scitest performs the following tests:
• Random test: sends out 256 packets with random data length and random data.
• Incremental test: sends out packets with length from minimum to maximum packet size using incremental data.
• Pattern test: sends 256 packets of maximum length.

Note: This test supports Function Test mode only.

Environmental Sensing Card Test
The sentest checks the SCSI Environmental Sensing card (SEN) installed in the SPARCstorage RSM to monitor the enclosure environment. The SEN card monitors the enclosure's over-temperature condition, fan failures, power-supply failures, and drive activity. sentest verifies the following control functions in the enclosure:
• Alarm (enable/disable): sentest toggles the alarm to the disable state, then to the enable state.
• Alarm time (0-0xff seconds): sentest sets the time (from 0 to 4095), then reads it back to verify the time setting.
• Drive fault LED (DL0-DL6): sentest toggles each LED to its OFF and ON states.

Note: This test supports Connection and Function Test modes.

SOC+ Host Adapter Card Test

The socaltest aids the validation and fault isolation of the SOC+ host adapter card. In the case of a faulty card, the test tries to isolate the fault to the card, the Gigabit Interface Controller (GBIC) module, or the DMA between the host adapter card and the host memory.

Note: This test supports Function Test mode only.

Serial Parallel Controller Test
The spiftest accesses card components such as the cd-180 and ppc2 chips, and the serial and parallel ports, through the serial parallel controller device driver.

Note: The spiftest must be run in Intervention mode.

Note: This test supports Function Test mode only.

Serial Ports Test


The sptest checks the system's on-board serial ports (zs[0,1], zsh[0,1], se[0,1], se_hdlc[0,1]), as well as any multi-terminal interface (ALM2) boards (mcp[0-3]). Data is written and read in asynchronous and synchronous modes using various loopback paths.

Note: The sptest must be run in Intervention mode.

Note: This test supports Connection and Function Test modes.

SunButtons Test
The sunbuttons test verifies that the SunButtons graphics manipulation device is working correctly.

Note: This test supports Function Test mode only.

SunDials Test
The sundials test verifies that the SunDials graphics manipulation device controls are working properly. sundials also verifies the connection between the dialbox and the serial port.

Note: This test supports Function Test mode only.

HSI Board Test


The sunlink test verifies the functionality of the SBus and PCI bus High Speed Serial Interface (HSI) boards by using the High-level Data Link Control (HDLC) protocol. sunlink initializes and configures the selected channel.

Note: This test will not pass unless you install the correct loopback connectors or port-to-port cables on the ports you are testing.

Note: This test supports Function Test mode only.

Sun PCi Test


The sunpcitest tests the SunPCi plug-in PCI card, which is an x86 processor embedded in an add-on card.

Note: This test supports Function Test mode only.

System Test
The systest checks the CPU board by exercising the I/O, memory, and CPU channels simultaneously as threads. There is no quick test option for systest; it is a CPU stress test.

Note: This test supports Function Test mode only.

Tape Drive Test


The tapetest synchronous I/O test writes a pattern to a specified number of blocks (or, for a SCSI tape, writes to the end of the tape). The tapetest then rewinds the tape and reads and compares the data just written.

The tapetest asynchronous I/O test sends a series of up to five asynchronous read/write requests to the tape drive, writing to the tape and then reading and comparing the data.

The tapetest file test writes four files to the tape and then reads them back, comparing the data. For tape library testing, the pass count is incremented only after all tapes in the library have been tested.

Note: A blank writable tape (scratch tape) must be loaded before you start this test.

Note: This test supports Connection and Function Test modes.

Virtual Memory Test


The vmemtest checks virtual memory; that is, it tests the combination of physical memory and the swap partitions of the disk(s).

Note: This test supports Function Test mode only.

Test Message Syntax
All SunVTS test messages follow this format:

SUNWvts.testname[.subtest_name].message_number date time testname device_name [FRU_path] ERROR|FATAL|INFO|WARNING|VERBOSE message

Table 10-1 lists the SunVTS test message arguments and gives a brief description.

Table 10-1 SunVTS Test Message Arguments

Argument          Description
SUNWvts           SunVTS package name
testname          SunVTS test name
subtest_name      The subtest module name (optional)
message_number    The message identifier, which is a unique number for the test. The number is usually within the following ranges:
                    VERBOSE: 1 - 1999
                    INFO: 2000 - 3999
                    WARNING: 4000 - 5999
                    ERROR/FATAL: 6000 - 7999
                    FATAL: 8000 - 9998
                  (The number 9999 is reserved for any possible old message types in previous SunVTS releases for compatibility reasons.)
date time         Tells when the error occurred
testname          The name of the test reporting the error
device_name       The device being tested when the error occurred
FRU_path          A full Solaris device path of the failed FRU; this argument varies, depending on the type of test running when the error occurred
message           Contains test messages, in addition to probable cause and recommended action
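
When scanning a SunVTS log, the message number alone is usually enough to tell how serious an entry is. The following is a minimal sketch under stated assumptions, not a SunVTS utility: the function name vts_msg_class is hypothetical, the ranges are copied from Table 10-1, and the message identifier in the example is made up for illustration.

```shell
# vts_msg_class NUMBER
# Hypothetical helper: prints the severity class that Table 10-1 associates
# with a message number, or UNKNOWN (exit status 1) if the argument is not
# a number or falls outside every listed range.
vts_msg_class() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo UNKNOWN; return 1 ;;
    esac
    if   [ "$n" -ge 1 ]    && [ "$n" -le 1999 ]; then echo VERBOSE
    elif [ "$n" -ge 2000 ] && [ "$n" -le 3999 ]; then echo INFO
    elif [ "$n" -ge 4000 ] && [ "$n" -le 5999 ]; then echo WARNING
    elif [ "$n" -ge 6000 ] && [ "$n" -le 7999 ]; then echo ERROR/FATAL
    elif [ "$n" -ge 8000 ] && [ "$n" -le 9998 ]; then echo FATAL
    elif [ "$n" -eq 9999 ]; then echo RESERVED
    else echo UNKNOWN; return 1
    fi
}

# The message number is the last dot-separated field of the identifier.
id=SUNWvts.disktest.6012        # made-up identifier, for illustration only
vts_msg_class "${id##*.}"
```

Because the text says the number is "usually" within these ranges, treat the result as a hint and rely on the ERROR|FATAL|INFO|WARNING|VERBOSE keyword in the message itself when the two disagree.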


Alternate Pathing


Introducing Alternate Pathing


Alternate Pathing (AP) provides high availability to storage and network devices. With AP, you have two physical paths to the same A5000 or SSA storage array or network interface, transparent to the operating system. Only one path can be active at a time. If a path fails, the alternate path can be switched in place of the failed path. Path switching does not always occur automatically; you might need to switch it manually.

The system uses the meta-device, a name representing the end object (such as the disk partition or network interface), but does not use the physical path names to access the device.

Note: The AP material covered in this module applies to the AP 2.2 support that Solaris 7 provides for the Sun Enterprise x000 and x500 servers.

Supported Devices
Disk Devices
AP supports the StorEdge A5000 and SPARCstorage arrays. SCSI devices are not supported. The StorEdge A3000 is not supported, but has its own internal AP capability. After you set up AP for disks, you can use Solstice DiskSuite Version 4.1 and Sun Volume Manager Versions 2.3, 2.4, and 2.5 normally. (However, on installation, Dynamic Multipathing (DMP) automatically disables itself in Volume Manager 2.5 if AP is already installed.)

Caution: You must make sure that any AP devices used by these products are used by their meta-device names only.

You can place your boot disk and primary network interface under AP control. This makes it possible for the system to boot unattended, even if the primary network or boot disk controller is not accessible, as long as a usable alternate path for these devices is defined and available.

Network Devices
The following network devices are supported by AP 2.2:
• SunFastEthernet 2.0 (hme)
• Sun FDDI 5.0 (nf) SAS and DAS
• Lance Ethernet (le)
• Quad Ethernet (qe)
• Sun Quad FastEthernet (qfe)
• GigabitEthernet (ge)

Installing AP
Solaris 7
Solaris 7 supports AP 2.2.

Solaris 2.6
Solaris 2.6 supports AP 2.1.

Solaris 2.5.1
Solaris 2.5.1 supports AP 2.0.

Installing AP
Install the following packages:
• SUNWapr AP subsystem (root)
• SUNWapu AP subsystem (usr)
• SUNWapdv

Documentation:
• SUNWabap AP AnswerBook (AP 2.0 only. AP 2.1 documentation is in the Hardware AnswerBook, SUNWabhdw.)
• SUNWapdoc AP man pages
• Apply all appropriate patches

The installation process uses the pkgadd command to install the AP packages. There is no order dependency.

How AP Works
AP creates a new layer of device drivers (meta-disks and meta-networks), which accesses one of two physical device drivers to access the device. Applications and the OS components, including the disk management software, use the meta-device name to access the resource. Only the drivers know the actual physical paths.

No component other than AP is aware that the normal device paths are to the same device. This can cause problems for applications that use the physical paths instead of the meta-devices to scan or inspect disk or network devices; they might identify the meta-device paths as separate devices.

AP automatically switches from the active disk path to the alternate disk path if the active path fails. Additionally, you can manually switch the active path to the alternate, at any time, with no interruption to active traffic using the meta-device for both disk and networks.

Note: In the Enterprise x000 and x500 computers there is no automatic switch-over to the alternate path during a DR operation.

Meta-device definitions are stored in an AP state database that is used early in the boot process. There are usually several copies of this database. You must create the meta-devices yourself; the system will not automatically create these for you.

Note: AP can be used with Sun Enterprise Volume Manager (SEVM) or Solstice DiskSuite (SDS).


Physical Paths
For the purposes of AP, an I/O device is either a disk or network device. The only types of disk device currently supported by AP are the StorEdge A5000 and the SPARCstorage Array (SSA). In this module, the term disk always refers to one of these devices. An I/O adapter is the controller for an I/O device such as an A5000 SOC+ adapter. A device node is a path in the devices directory that is used to access a physical device, such as /dev/dsk/c0t1d1s0. The term physical path refers to the electrical path from the host to a disk or network.


Metadisk
A metadisk is a logical name that enables you to access a physical disk device without having to specify the particular path to the device. You reference a metadisk just as if it were a real device, using an AP-specific device node, such as /dev/ap/dsk/mc0t1d1s0. The AP software determines which path is active and uses that path to access the device.

The path /dev/ap/dsk/mc0t1d1s0 is used to access a slice on a metadisk, regardless of which pln port is currently active (handling I/O) for the metadisk. For the A5000, the sf ports (representing an SOC+ adapter) are where AP activates the paths.
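
The relationship between a physical device node and its metadisk node can be sketched as a simple string rewrite. This is an illustration only: the function name ap_metadisk_node is hypothetical, the mapping is inferred from the two example paths in this module (/dev/dsk/c0t1d1s0 and /dev/ap/dsk/mc0t1d1s0), and it applies only to the physical node of the path that gives the metadisk its name.

```shell
# ap_metadisk_node PHYSICAL_NODE
# Hypothetical helper: maps /dev/dsk/NAME to /dev/ap/dsk/mNAME, following
# the naming pattern of the examples in the text. Only the /dev/dsk
# directory shown in the text is handled; anything else is rejected.
ap_metadisk_node() {
    case $1 in
        /dev/dsk/?*) echo "/dev/ap/dsk/m${1#/dev/dsk/}" ;;
        *) echo "unsupported device node: $1" >&2; return 1 ;;
    esac
}

ap_metadisk_node /dev/dsk/c0t1d1s0
```

A rewrite like this can help when converting entries that still name physical paths, but check each result against the metadisks that are actually defined on the system before using it.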

Disk Pathgroup
A disk pathgroup consists of two physical paths leading to the same storage array. When a physical path is part of a pathgroup, it is called an alternate path. An alternate path to a disk can be uniquely identified by the pln or sf port that the alternate path uses.

Make sure that you understand the use of the term alternate. It means either possible path, not just the spare path. Only one alternate path at a time is allowed to handle disk I/O; the alternate path that is currently handling I/O is called the active alternate.

One of the alternate paths is designated the primary path. Although you can change which path is the active alternate, the primary path is always the same. The primary path has several properties:
• It is initially the active alternate.
• It provides the metadisk name.
• It identifies the metadisk.

You reference a disk pathgroup by specifying the pln or sf port (such as pln1 or sf7) that corresponds to the primary path. For example, if the primary path is sf1, the pathgroup name is msf1. Some considerations are:
• Both array interfaces in a pathgroup must be attached to the same array.
• Only one interface is active at a time through the meta-device.
• There must be exactly two adapters in a pathgroup.
• If you have two interface boards, consider connecting a path to each.
• If you are using hubs in your configuration, use a separate hub for each interface.


Metanetwork
A metanetwork, just like a metadisk, is a logical interface that enables you to access a network through either of two physical paths without having to reference either path explicitly within your scripts and programs. You reference a metanetwork by using a metanetwork interface name such as mle1. Interface mle1 is used to access the metanetwork, regardless of which physical adapter (le1 or le6) is currently active for the metanetwork device.


Network Pathgroup
Similar to a disk pathgroup, a network pathgroup consists of two network adapters connected to the same physical network. To specify a network pathgroup, use the metanetwork interface name, such as mle1. As with a disk pathgroup, you use this name when you switch the active alternate. Some considerations are:
• Network adapters in a pathgroup must be attached to the same subnet.
• Only one adapter is active at a time.
• Use a separate hub for each path for even more redundancy.
• There must be exactly two adapters in a pathgroup.
• Both network adapters must be of the same device type.


AP With Mirroring
AP is similar to, but not the same as, disk mirroring. Disk mirroring replicates data to separate devices and thus achieves data redundancy. AP, on the other hand, achieves pathing redundancy. Disk mirroring and AP are complementary; you can use them together to achieve both data redundancy and pathing redundancy. Mirroring occurs on top of AP, which enables switching of the underlying adapters used to implement the mirror from one board to another without disruption of the disk mirroring or any active I/O. AP does not provide mirroring itself.


AP and DR
AP supports Dynamic Reconfiguration (DR), which is used to logically attach and detach system boards from the operating system without having to halt and reboot. For example, with DR you can detach a board from the operating system, physically remove and service the board, and then re-insert the board and attach it to the operating system again. You can do all of this without halting the operating system or terminating any user applications.

To detach a board that is connected to an alternately pathed I/O device, you can first use AP to redirect the I/O flow to a controller on a different board. You can then use DR to detach the system board without interrupting the I/O flow.

The AP State Database
AP maintains a database that contains information about all defined meta-disks, meta-networks, and their corresponding alternate paths and properties. Each system has its own database. Conceptually, a single AP database is maintained in a single system. However, you should set up multiple copies of this database. In this way, if a given database copy is not accessible or becomes corrupted, AP can automatically begin to use a current, non-corrupted database copy. All of the AP databases synchronize their contents during system initialization and DR operations.

When choosing partitions for the AP database, remember that:
• You must dedicate an entire raw disk slice, of at least 300 Kbytes, to each AP database copy. As configured at the factory, slice 4 of the root disk is appropriately sized for an AP database (2 Mbytes) and is not allocated to any other purpose.
• You should set up three to five database copies.
• The database copies should have no I/O adapters in common with each other. This helps protect against an adapter failure.
• The copies can be on any slice of any type of disk device. They do not need to be on devices that AP supports, and do not need to have alternate paths.
• If you are using Dynamic Reconfiguration (DR), the database copies should be on I/O adapters on different system boards so that at least one database copy is always accessible if one of the system boards is detached. Generally, you should have one separate copy per system board.
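
Two of the placement guidelines above (at least three copies, and no I/O adapter shared between copies) can be checked mechanically against a planned list of slices. The following is a rough sketch, not an AP tool: the function names are hypothetical, the slice names in the usage example are illustrative, and the controller number in a cNtXdYsZ name is used as a stand-in for the I/O adapter, which is an assumption. It cannot tell which system board an adapter is on, so the DR guideline still has to be checked by hand.

```shell
# ap_db_controllers SLICE...
# Prints the controller number (the N in cNtXdYsZ) of each slice, one per
# line. Input is assumed to be well formed; no validation is done.
ap_db_controllers() {
    for slice in "$@"; do
        name=${slice##*/}       # /dev/rdsk/c0t3d0s4 -> c0t3d0s4
        ctrl=${name%%t*}        # c0t3d0s4 -> c0
        echo "${ctrl#c}"        # c0 -> 0
    done
}

# ap_db_layout_ok SLICE...
# Succeeds when at least three slices are given and no controller number
# appears more than once.
ap_db_layout_ok() {
    [ "$#" -ge 3 ] || return 1
    dups=$(ap_db_controllers "$@" | sort | uniq -d)
    [ -z "$dups" ]
}

# Illustrative plan: three copies on three different controllers.
if ap_db_layout_ok /dev/rdsk/c0t3d0s4 /dev/rdsk/c1t3d0s4 /dev/rdsk/c2t0d0s4; then
    echo "layout follows the guidelines checked here"
else
    echo "layout needs review"
fi
```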

Creating the AP Database
Before you can begin configuring AP, you must create at least one AP database. The AP database is created with the apdb command. You can use apdb to create the original database or a copy.

The apdb Command


# apdb -c /dev/rdsk/c0t3d0s4 -f

The -c (create) option is followed by the raw disk slice that will contain the new AP database copy. Each copy requires its own dedicated slice, which must be at least 300 Kbytes in size. The -f (force) option is only necessary to create the first AP database copy. It is not used otherwise.

If you want an AP database copy to reside on an AP disk, you must create two copies of the AP database. The AP configuration process can only access database locations by the physical disk slice address, and is not aware of meta-devices at this level. You must create this database copy twice, specifying each of the physical paths to the AP meta-disk. For example, if c1 and c9 are connected to the same AP pathgroup, to create a copy of the AP database residing on target 3, slice 4, use the following two commands:

# apdb -c /dev/rdsk/c1t3d0s4 -f
# apdb -c /dev/rdsk/c9t3d0s4

The AP software will be aware of two copies of the database when actually there is only one, because the disk is accessible through two paths. This database "alias" is safe, because AP always updates and accesses its database copies sequentially. The AP copy is updated twice with the same information, but this is insignificant overhead.

This aliasing takes place outside of AP: AP is not aware that the two database entries refer to the same physical copy.

Example
# apdb -c /dev/rdsk/c0t1d0s4 -f

The -c option specifies the raw disk slice (under /dev/rdsk) where you want to create the database copy. You must dedicate an entire disk partition to each database copy. The disk partition must have at least 300 Kbytes. The -f (force) option is only necessary to create the first AP database copy.

# apconfig -D
path: /dev/rdsk/c3t3d0s1
major: 32
minor: 145
timestamp: Wed Mar 10 18:45:58 1999
checksum: 2636010350
default: yes
corrupt: no
inaccessible: no

path: /dev/rdsk/c3t3d0s6
major: 32
minor: 150
timestamp: Wed Mar 10 18:50:43 1999
checksum: 2636010350
default: no
synced: yes
corrupt: no
inaccessible: no
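The corrupt and inaccessible fields in the apconfig -D listing are the ones to watch when several database copies exist. The following is a minimal sketch of a filter that prints only the copies with a problem. It assumes the output has one "key: value" field per line and that each record starts with a path: line; the function name ap_db_problems is hypothetical, and the sample input is invented for illustration.

```shell
# Read an apconfig -D style listing on standard input and print the path of
# every database copy whose corrupt: or inaccessible: field is "yes".
ap_db_problems() {
    awk '
        $1 == "path:" { path = $2 }
        ($1 == "corrupt:" || $1 == "inaccessible:") && tolower($2) == "yes" {
            print path, $1, $2
        }
    '
}

# Invented sample: the second copy is marked corrupt.
ap_db_problems <<'EOF'
path: /dev/rdsk/c3t3d0s1
corrupt: no
inaccessible: no
path: /dev/rdsk/c3t3d0s6
corrupt: yes
inaccessible: no
EOF
```

On a system with AP installed, the filter would be fed from the real command, for example apconfig -D | ap_db_problems.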

AP Utility Examples
Identifying Disk Host Adapter Instances
The apinst command identifies all ports and provides the name, instance number, and disk special files (/dev/dsk targets) attached to each port.

# apinst
isp0
        /dev/dsk/c0t0d0
        /dev/dsk/c0t1d0
        /dev/dsk/c0t2d0
pln0
        /dev/dsk/c1t0d0
        /dev/dsk/c1t1d0
        /dev/dsk/c1t2d0
        /dev/dsk/c1t3d0
        /dev/dsk/c1t4d0
        /dev/dsk/c1t5d0
pln1
        /dev/dsk/c2t0d0
        /dev/dsk/c2t1d0
        /dev/dsk/c2t2d0
        /dev/dsk/c2t3d0
        /dev/dsk/c2t4d0
        /dev/dsk/c2t5d0
sf0
        /dev/dsk/c3t0d0
        /dev/dsk/c3t1d0
        /dev/dsk/c3t2d0
        /dev/dsk/c3t3d0
        /dev/dsk/c3t4d0
        /dev/dsk/c3t5d0
sf1
        /dev/dsk/c4t0d0
        /dev/dsk/c4t1d0
        /dev/dsk/c4t2d0
        /dev/dsk/c4t3d0
        /dev/dsk/c4t4d0
        /dev/dsk/c4t5d0

Meta-Disk Configuration
# ssaadm disp c1
        SPARCstorage Array 110 Configuration
        (ssaadm version: 1.20 97/05/14)
Controller path:/devices/sbus@45,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a0e2f:ctlr

                 DEVICE STATUS
        TRAY 1        TRAY 2        TRAY 3
slot
1       Drive: 0,0    Drive: 2,0    Drive: 4,0
2       NO SELECT     NO SELECT     NO SELECT
3       NO SELECT     NO SELECT     NO SELECT
4       NO SELECT     NO SELECT     NO SELECT
5       NO SELECT     NO SELECT     NO SELECT
6       Drive: 1,0    Drive: 3,0    Drive: 5,0
7       NO SELECT     NO SELECT     NO SELECT
8       NO SELECT     NO SELECT     NO SELECT
9       NO SELECT     NO SELECT     NO SELECT
10      NO SELECT     NO SELECT     NO SELECT

                 CONTROLLER STATUS
Vendor:        SUN
Product ID:    SSA110
Product Rev:   1.0
Firmware Rev:  3.12
Serial Num:    00000083BE1D
Accumulate Performance Statistics: Enabled

For A5000s, you would use:

# luxadm disp c2

Note that the luxadm command includes the ssaadm command functionality. You could use luxadm to obtain information for both A5000 and SSA devices.

Creating a Disk Pathgroup and Meta-Disks
1. Use apdisk to create an uncommitted disk pathgroup. The apdisk command creates the metadisk names and updates the AP database with the alternate paths for all six SSA disks.

# apdisk -c -p pln0 -a pln1

The -c operand specifies creation of a pathgroup, and the -p and -a operands specify the primary and alternate paths, respectively.

2. Verify the results with apconfig -S -u.

# apconfig -S -u
c1      pln0  P A
c3      pln1
        metadiskname(s):
                mc1t5d0  U
                mc1t4d0  U
                mc1t3d0  U
                mc1t2d0  U
                mc1t1d0  U
                mc1t0d0  U

Note that the entries are uncommitted.

3. Use apdb -C to commit the new database entries.

# apdb -C

4. Use apconfig -S to view the new disk entries in the database. Note that the U is now gone.

# apconfig -S
c1      pln0  P A
c3      pln1
        metadiskname(s):
                mc1t5d0
                mc1t4d0
                mc1t3d0
                mc1t2d0
                mc1t1d0
                mc1t0d0

5. Run drvconfig to create the new metadevice entries in the /devices directory. The -i operand ensures that only AP metadevices are created.

# drvconfig -i ap_dmd

6. Use the ls command to confirm that the device nodes have been created.

# ls /devices/pseudo/ap_dmd*
/devices/pseudo/ap_dmd@0:128,blk
/devices/pseudo/ap_dmd@0:128,raw
/devices/pseudo/ap_dmd@0:129,blk
/devices/pseudo/ap_dmd@0:129,raw
/devices/pseudo/ap_dmd@0:130,blk
/devices/pseudo/ap_dmd@0:130,raw
...

7. Use apconfig -R to create the /dev directory links to the new /devices directory nodes. /dev/ap/dsk and /dev/ap/rdsk links for each possible partition on each drive will be created, just as the disks command does for regular disk devices.

# apconfig -R

8. Use the ls command to confirm that the /dev links to the device nodes have been created.

# ls -l /dev/ap/dsk
total 8
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s0 -> ../../../devices/pseudo/ap_dmd@0:128,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s1 -> ../../../devices/pseudo/ap_dmd@0:129,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s2 -> ../../../devices/pseudo/ap_dmd@0:130,blk

Similar entries will exist for /dev/ap/rdsk.
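The command sequence in steps 1 through 8 can be collected into one script for rehearsal. The following is a minimal sketch, assuming a dry-run wrapper that only prints each command unless DRY_RUN is set to 0. The names run, create_disk_pathgroup, and DRY_RUN are hypothetical; the AP and drvconfig commands are the ones shown in the steps above. The two ls checks (steps 6 and 8) are left as manual inspections.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 only on a system that has the AP utilities installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: create_disk_pathgroup <primary-port> <alternate-port>
create_disk_pathgroup() {
    primary=$1
    alternate=$2
    run apdisk -c -p "$primary" -a "$alternate" || return 1  # step 1: uncommitted pathgroup
    run apconfig -S -u || return 1                           # step 2: show uncommitted entries
    run apdb -C || return 1                                  # step 3: commit
    run apconfig -S || return 1                              # step 4: show committed entries
    run drvconfig -i ap_dmd || return 1                      # step 5: create /devices nodes
    run apconfig -R || return 1                              # step 7: create /dev/ap links
}

create_disk_pathgroup pln0 pln1
```

Stopping at the first failing command keeps a failed apdisk from being followed by a commit.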

Using the Meta-Devices
You must modify every reference to a physical device node (such as a path name that begins with /dev/dsk or /dev/rdsk) to use the corresponding meta-disk device node, the path that begins with /dev/ap/dsk or /dev/ap/rdsk. If a partition is currently mounted under a physical path name, it should be unmounted and remounted under the meta-disk path name. This can be done by changing the vfstab file and having the meta-device become active on the next reboot.

Do not do this for the boot device. If you are placing the boot disk under AP control, you will need to modify the vfstab file by using the apboot command. Refer to "Placing the Boot Disk Under AP Control" for further information.

Placing the Boot Disk Under AP Control
1. Create an AP pathgroup for the physical path that includes the boot disk.

2. Run apboot, specifying the boot meta-disk name, to define the new AP boot device. apboot modifies /etc/vfstab and /etc/system.

# apboot mc2t0d0

where mc2t0d0 is the meta-disk name of the boot disk. apboot examines /etc/vfstab and replaces the physical device name of the boot disk, such as /dev/dsk/c2t0d0sx, with the meta-disk name, such as /dev/ap/dsk/mc2t0d0sx. It also edits /etc/system so that the drivers required for AP boot disk usage are force loaded.

Do not manually replace the physical devices in /etc/vfstab with meta-disks for the boot disk. Instead, use the apboot command to ensure that all required changes are made. Just changing /etc/vfstab will prevent the system from booting.

3. Set the OBP environment variable boot-device to the physical path most likely to be used for booting. Do not use multiple device names from the devalias command, including the other path.

4. Define an OBP devalias for the alternate boot device physical path in case you need to perform a manual boot from the alternate path. Set the OBP boot-device parameter to this name. Do not add it to the boot-device parameter value.

5. At this point, just reboot the system to begin using the AP boot device.

Warning: If you want to create a new AP database copy after you have placed the boot disk under AP control, and the new database copy is to be located on a partition controlled by a pln port that does not control any of the current AP database copies, you must first remove the boot disk from AP control. Make sure that the new AP database has been created. Then place the boot disk under AP control again. Failure to follow this procedure may cause the AP database to become inaccessible during boot.

Manually Switching the Active Path
Note: You can perform a switch at any time, even while I/O is occurring on the device. You might want to experiment with the switching process to verify that you understand it and that your system is set up properly, rather than wait until a critical situation occurs.

1. Use apconfig -S to view the current configuration:

# apconfig -S
c1      pln0  P A
c3      pln1
        metadiskname(s):
                mc1t5d0
                mc1t4d0
                mc1t3d0
                mc1t2d0
                mc1t1d0

2. To perform the switch, use apconfig -P -a, where -P identifies the pathgroup and -a specifies the path to become active.

# apconfig -P pln0 -a pln1

3. Verify the results with the apconfig -S command. You can see that the active alternate has been switched to pln1.

# apconfig -S
c1      pln0  P
c3      pln1  A
        metadiskname(s):
                mc1t5d0
                mc1t4d0
                mc1t3d0
                mc1t2d0
                mc1t1d0

Note: Remember that switch operations take effect immediately.

Automatic Disk Pathgroup Switching (AP 2.1)
AP 2.1 provides the ability to automatically switch the active path of a disk pathgroup. This will occur only under two conditions:

- The currently active path has failed
- DR requests the switch (Enterprise 10000 only)

If AP detects that a path has failed, it will be marked with a T in the apconfig -S output.

# apconfig -S
c1      pln0  P A
c3      pln1  T
        metadiskname(s):
                mc1t5d0
                mc1t4d0
                mc1t3d0
                mc1t2d0
                mc1t1d0

When a path is marked T (tried), AP will not automatically switch to it. You can reset the tried flag by:

- Rebooting the system
- Using DR to detach and then attach the board
- Resetting the flag manually with apdisk -w. Specify the tried path, not the pathgroup name.

# apdisk -w pln1
#

Note: Resetting the flag manually should only be done after the cause of the failure has been repaired. You can still manually switch to a path marked tried with the apconfig -P -a command.

Creating a Network Pathgroup
This example assumes that you are creating a network pathgroup using physical interfaces le0 and le2, with le0 as the primary interface.

1. Use apnet to create an uncommitted network pathgroup. The apnet command creates the metainterface names and updates the AP database with the alternate paths.

# apnet -c -p le0 -a le2

The -c operand specifies creation of a pathgroup, and the -p and -a operands specify the primary and alternate paths, respectively.

2. Verify the results with apconfig -N -u.

# apconfig -N -u
metanetwork: mle0 U
physical devices:
        le2
        le0  P A

3. Use apdb -C to commit the new database entries.

# apdb -C

4. Use apconfig -N to view the new network entries in the database. Note that the U is now gone.

# apconfig -N
metanetwork: mle0
physical devices:
        le2
        le0  P A

Alternately Pathing the Primary Network Interface
The primary network interface between your system and the other machines on the network is difficult to configure down. There are three ways to solve this problem:

- Create the appropriate AP database entries, create a new /etc/hostname.mxxx file or rename the corresponding /etc/hostname.xxx file, and then reboot your system.
- Set up a script file to perform the transition in your system without rebooting.
- Log in to your system from another network interface so that you can stay connected when the primary network interface is disabled.

The commands that perform the transition are listed here. You can also execute these commands all on one line, separated with semicolons. Ensure that you do not have any syntax errors. Remember to remove any /etc/hostname.qe0 and /etc/hostname.qe4 files, and add the /etc/hostname.mqe0 file.

# ifconfig qe0 down unplumb
# ifconfig qe4 down unplumb
# ifconfig mqe0 plumb
# ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up

An example of a script to perform this operation follows.

- Generate a script to configure the qe0 and qe4 interfaces down, then configure up the meta-network interface. This method does not require you to reboot your system, but you will briefly lose all communication over the primary network interface.

# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
qe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
        ether 0:0:be:0:8:c5
# cat > /tmp/washington.restart
ifconfig qe0 down unplumb
ifconfig qe4 down unplumb
ifconfig mqe0 plumb
ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up
^D
# chmod 700 /tmp/washington.restart
# nohup /tmp/washington.restart &
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
mqe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
        ether 0:0:be:0:8:c5
#

Boot Time Interface Failure


If the primary network path fails at boot time, AP will switch the primary interface to the other alternate. An automatic switch due to an error will not occur at any other time.

Switching a Network Pathgroup
Remember that you can switch the active interface of a network pathgroup while the meta-interface is active. The change is recorded in the state databases. The new active path will be used until you switch back, even after a reboot. To switch the active interface, use the apconfig command. The change will occur immediately. There is no commit process for pathgroup switching.

# apconfig -P mle0 -a le2

You can see that the switch has occurred by using the apconfig -N command.

# apconfig -N
metanetwork: mle0
physical devices:
        le2  A
        le0  P

Note: Remember that switch operations take effect immediately; there is no commit process for them.

Warning: When you switch interfaces, AP does not check that the interface you are going to is the correct path. AP does not know if the new interface is connected to the wrong subnet, disconnected, or inoperative.


Dynamic Reconfiguration

Introduction to Dynamic Reconfiguration
What Is Dynamic Reconfiguration?
Dynamic Reconfiguration (DR) is the ability to alter the configuration of a running system by bringing components online or taking them offline without disrupting system operation or requiring a system reboot. With the availability of DR, system boards can be logically and physically included in the system configuration, or logically deactivated and removed while the system is running. DR is useful in mission-critical environments if a system board fails and must be replaced or if new system boards need to be added to the system for additional performance and capacity. It is a critical part of the concurrent maintenance strategy prevalent in the enterprise computing environment.

Note: DR capability requires that the system OBP be at revision 3.2.22 or later (refer to the prtconf -V command) and the operating system be Solaris 7 5/99 or later (refer to the /etc/release file).

Benefits of DR
DR increases system availability and flexibility by allowing the hot-swap CPU/memory and I/O board functionality that the Sun Enterprise 3000-6000 server hardware has supported from the beginning. Hot-swap functionality means that the components can be physically and logically removed or added while the system is running. DR includes:

- Dynamic attachment of system boards, making them available for use without rebooting the system
- Dynamic detachment of system boards, making them ready for physical removal without rebooting the system
- Display of board status
- Initiation of board testing

Disadvantages of DR
The main disadvantage is that to dynamically add and remove CPU/Memory boards, you must set memory_interleaving to min (that is, disable it), because DR cannot handle memory spread across boards. This has a major impact on performance.

Supported Hardware
Table 2-1 lists the supported system board types that the cfgadm command displays. System I/O boards are classified by numerical type value.

Table 2-1 DR Supported Boards

Type         Name and Identifying Characteristics
CPU/mem      CPU/memory board with at least one CPU module
Mem          CPU/memory board with no CPU module
Disk board   System board containing two SCSI disk drives
Type 1       SBus I/O board with 3 SBus slots and 2 FC-OM
Type 2       Graphics I/O with 1 UPA slot, 2 SBus slots and 2 FC-OM
Type 3       PCI+ I/O board with 2 PCI card adapter slots
Type 4       SBus+ I/O board with 3 SBus slots and 2 GBIC
Type 5       Graphics+ I/O with 1 UPA slot, 2 SBus slots and 2 GBICs

Caution: Do not assume that because an I/O board supports DR, the SBus cards on it also support DR. For a complete list of supported hardware, refer to http://sunsolve5.sun.com/sunsolve/Enterprise-dr/

Limitations to Dynamic Reconfiguration

Slot 1 board cannot be removed

- The slot 1 board provides the electrical path to devices on the clock board, and is normally the lowest-numbered working I/O board.

First CPU board cannot be removed

- The POST Master is also set up as the JTAG Master. It cannot be detached with DR because the JTAG Master controls the DR POST.

It is not too difficult to crash the system

- Inserting a failed board can immediately crash the system.
- Connecting a bad board that passes POST can also crash the system.
- Bending a pin when inserting a board can crash the system. Hardware slots are not isolated.
- Inserting a board too slowly can panic Solaris. If an interrupt is in flight when the pause pin is asserted during insert for more than one second, Solaris will panic.

Fails using 168 MHz modules

- POST fails during DR connect on a 168 MHz machine. A DR connect operation with a CPU/Memory board that has UltraSPARC I modules can fail or take a long time.

Fails in single-user mode

- DR connect operations performed in single-user mode cause the system to hang.

Displaying Board Status
Basic Status Display using cfgadm
When used without options, the cfgadm command displays information about all known attachment points, the collective term for a board and its card cage slot (or receptacle). There are two types of system names for attachment points:
- Physical attachment point: Describes the software driver and location of the card cage slot. For example:
  /devices/central@1f,0/fhc@0,f8800000/clockboard@0,900000:sysctrl,slot0

- Logical attachment point: An abbreviated name created by the system to refer to the physical attachment point. For example:
  sysctrl0:slot0

DR displays the status of the slot, the board, and the attachment point. The DR definition of a board also includes the devices connected to it. The term occupant is used to refer to the combination of board and attached devices. The following display shows a typical cfgadm output:

Ap_Id            Receptacle     Occupant       Condition
ac0:bank0        connected      configured     ok
ac0:bank1        empty          unconfigured   unknown
ac1:bank0        connected      configured     ok
ac1:bank1        empty          unconfigured   unknown
ac2:bank0        connected      configured     ok
ac2:bank1        empty          unconfigured   unknown
sysctrl0:slot0   connected      configured     ok
sysctrl0:slot1   connected      configured     ok
sysctrl0:slot2   disconnected   unconfigured   unknown
sysctrl0:slot3   connected      configured     ok
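On a system with many slots it can help to filter the listing down to the attachment points that need attention. The following is a minimal sketch, assuming the four-column layout of the display above (Ap_Id, Receptacle, Occupant, Condition) with one header line; if the cfgadm output on a given system has additional columns, the field numbers would need adjusting. The function name not_ok_points is hypothetical.

```shell
# Print every attachment point whose Condition column is not "ok".
# Input: a four-column listing on standard input, header on the first line.
not_ok_points() {
    awk 'NR > 1 && $4 != "ok" { print $1, $2, $3, $4 }'
}

# Three rows taken from the display above.
not_ok_points <<'EOF'
Ap_Id            Receptacle     Occupant       Condition
ac0:bank0        connected      configured     ok
ac0:bank1        empty          unconfigured   unknown
sysctrl0:slot2   disconnected   unconfigured   unknown
EOF
```

On a live system the filter would be fed from the command itself, for example cfgadm | not_ok_points.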

The following lists the possible states of the receptacle and occupant. The LED assignments are as you look at them from left to right.

Receptacle Status   Explanation

Empty               No board is present in the slot. All LEDs are off.

Disconnected        A board is present but is electrically disconnected. The system can identify the board type. The board is in low power mode and can be unplugged at any time. LED state: off on off

Connected           The board is electrically connected and powered up. The system is actively monitoring the board for temperature and cooling. LED state: on off off

Occupant Status     Explanation

Configured          Devices on the board are fully initialized and can be mounted or configured for use. LED state: on off blink

Unconfigured        The unconfigured state covers all device states that are not configured, including receptacles in the empty state. LED state: on off off
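The LED states in the two tables can be turned into a quick lookup for use at the card cage. The following is a minimal sketch that maps a left-to-right LED reading to the state it indicates, using only the combinations listed above. The function name led_meaning is hypothetical, and any reading not in the tables is reported as not listed rather than guessed.

```shell
# Map a left-to-right LED reading ("on", "off", or "blink" for each of the
# three LEDs) to the receptacle/occupant state given in the tables.
led_meaning() {
    case "$1" in
        "off off off")  echo "empty" ;;
        "off on off")   echo "disconnected" ;;
        "on off off")   echo "connected, unconfigured" ;;
        "on off blink") echo "connected, configured" ;;
        *)              echo "not listed" ;;
    esac
}

led_meaning "off on off"
```

A board showing the disconnected pattern is the only one that the tables describe as safe to unplug.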


Condition   Explanation

Unknown     The current condition cannot be determined. This situation results either when a new board is inserted in a running system, or a board is placed on the disabled board list before a reboot. A transition to a connected receptacle state changes an attachment point condition from unknown to either OK or Failed.

OK          No problems detected. This condition occurs only after a board has been connected. This condition persists either until the board is physically removed, or a problem is detected. An OK condition requires correct hardware compatibility, correct firmware revision, adequate power, adequate cooling, and adequate precharge.

Failing     A failing condition occurs when a board that was in the OK condition develops a problem.

Failed      The board has failed POST/OBP. A failed condition can occur either during bootup or after a failed connect attempt. This condition is considered uncorrectable and will persist until the board is physically removed.

Unusable    Either an attachment point has incompatible hardware, or an empty attachment point lacks power, cooling, or precharge current.

Detailed Status Display using cfgadm -v
For a more detailed status report, use the command cfgadm -v. The -v option turns on expanded (verbose) descriptions. Figure B-1 shows a breakdown of each field found in the output of the cfgadm -v command. The example shown is of a 64MB memory module.

ac0:bank0 connected configured ok slot0 64mb base 0x00000000 May 1 13:00 memory n /devices/fhc@0,f8800000/ac@0,1000000/bank0

Field value                                  Label in figure
ac0:bank0                                    Attachment point
connected                                    Slot electrical condition
configured                                   Board status
ok                                           Board operational condition
slot0                                        Location
memory                                       Board type
n                                            Board activity (board not busy)
/devices/fhc@0,f8800000/ac@0,1000000/bank0   Physical ID and location

Figure B-1 Detail status display entry

Reconfiguration Considerations
Device Driver Interface (DDI)
For a device to fully conform to DR, its device driver must support DDI_ATTACH, DDI_DETACH and DDI_SUSPEND/RESUME. All drivers support DDI_ATTACH, but not all drivers support DDI_DETACH and DDI_SUSPEND/RESUME. A DR detach must pause (quiesce) the operating system, and to do this the driver must be suspend-safe.

Suspend-Safe and Suspend-Unsafe Devices


A suspend-safe driver is one that supports operating system quiescence: it does not access memory or interrupt the system while the operating system is in quiescence (suspend/resume). It also guarantees that when a suspend request is successfully completed, the device that the driver manages does not attempt to access memory, even if the device is open when the suspend request is made. A suspend-unsafe device is one that allows a memory access or a system interruption while the operating system is in quiescence. Suspend-safe drivers allow you to:

- Stop user threads.
- Execute the DDI_SUSPEND call in each device driver.
- Stop the clock and CPUs.

The operating system refuses a quiescence request if a suspend-unsafe device is open. To manually suspend the device, you will have to issue a modunload command.

Testing for Suspend-Safe Drivers
The quiesce-test option tests for suspendable drivers. For example:

# cfgadm -x quiesce-test sysctrl0:slot<number>

Note: All tape drivers are considered suspend-unsafe.

Hot-Plug Hardware
Hot-plug boards and modules have special connectors that supply electrical power to the board or module before the data pins make contact. Boards and devices that do not have hot-plug connectors cannot be inserted or removed while the system is running.

Caution: Before inserting a board into the centreplane, it is essential that the precharge voltages are present. Ensure the PPS is supplying these voltages by typing:

# /usr/platform/sun4u/sbin/prtdiag -v | grep precharge

I/O boards and CPU/memory boards used in Enterprise x000 and x500 systems are hot-plug devices. Some devices, such as the clock board, are not hot-plug modules and cannot be removed while the system is running.

Permanent Memory Management
Certain parts of memory cannot be paged out during a detach. This permanent memory includes the kernel and OBP. The kernel is loaded to high order memory during boot up. The kernel must be confined to one system board, a process known as caging the kernel. The only system board that cannot be removed from an operating system is the board in the lowest numerical slot. It is recommended that steps be taken to force the kernel to load on that board so that only one system board is restricted.

Required additions to /etc/system


The following entries must be added to the /etc/system file.

The following enables DR on I/O boards:

set soc:soc_enable_detach_suspend=1
set pln:pln_enable_detach_suspend=1

The following enables DR on CPU/Memory boards:

set kernel_cage_enable=1
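Before attempting a DR operation it is worth confirming that these entries are actually present. The following is a minimal sketch that reports which of the three lines are missing from a given file. It assumes a POSIX grep (the -F, -q and -x options) and that each entry appears on a line of its own exactly as written above, with no extra whitespace; the function name missing_dr_settings is hypothetical.

```shell
# Print each required /etc/system entry that is NOT found as an exact line
# in the file given as the first argument. Prints nothing if all are present.
missing_dr_settings() {
    file=$1
    for setting in \
        'set soc:soc_enable_detach_suspend=1' \
        'set pln:pln_enable_detach_suspend=1' \
        'set kernel_cage_enable=1'
    do
        grep -Fqx -- "$setting" "$file" || printf '%s\n' "$setting"
    done
}

# Typical use on the server itself:
#   missing_dr_settings /etc/system
```

Because /etc/system is read at boot, an entry added after this check only takes effect after the next reboot.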

Procedures - Removing a CPU/Memory Board

Note: Performing the following board removal procedures is the responsibility of the system administrator. However, it is important for you to understand these procedures in order to assist where possible.

The memory modules on a CPU/memory board can be shared by other CPU/memory boards. Therefore, you must halt all use of memory modules on a board before you can remove the board.

1. Log into the system console as root.

2. Use the cfgadm command to determine the system name for the CPU/memory board and associated memory banks.

Note: A CPU/memory board can have up to two banks of memory. Memory banks have logical names of the form acnumber:banknumber. The term acnumber identifies the driver instance, but the number is not directly related to the board slot number. The banknumber is either bank0 or bank1.

Note: For the example in this procedure, the board is ac1, which has one memory bank (bank1).

Also, verify that you can relocate the memory modules on the CPU board.

# cfgadm -v

You cannot unconfigure non-relocatable memory pages in the memory span (a section of memory that is reserved for system use). Non-relocatable memory is identified as permanent in a cfgadm listing.

3. If the memory is relocatable, stop all activity in the memory modules on the board.

# cfgadm -c unconfigure ac1:bank1

This step halts all accesses by other CPU/memory boards and prevents any further memory use until the board is replaced.

4. Verify that the CPUs on the board are not bound to any processes running in the system. If a CPU is bound to a process, the board cannot be removed until the process is unbound.

The CPUs are identified by numbers that are related to the board number. The first CPU number is twice the board number (2*n). The second CPU number is twice the board number, plus one (2*n + 1).

To list all bound processes, use the pbind command. If any of the listed processes show the CPUs in question, the related boards cannot be removed until those processes are unbound.

The following example shows that process ID 1145 is bound to processor 10 (board number 5, CPU 0). The pbind -u (unbind) command unbinds the process. The pbind -q (query) command shows that process ID 1145 is no longer bound.

# pbind
process id 1145: 10
# pbind -u 1145
# pbind -q 1145
process id 1145: not bound

5. Unconfigure the board.

# cfgadm -c unconfigure sysctrl0:slot<number>

where slot<number> is the slot location (number) in the card cage.

6. If the previous step did not also disconnect the board, disconnect the board by typing the following command:

# cfgadm -c disconnect sysctrl0:slot<number>

7. When the LEDs on the board indicate that the board is ready for removal (two outer LEDs off and the middle LED on), you can physically remove the board.

Caution: If a replacement board is not available and you remove the board, you must fill the empty slot to maintain the proper flow of cooling air in the card cage. For Sun Enterprise 3000, 3500, 4000, 4500, 5000, and 5500 systems, use a dummy board (part number 504-2592). For Sun Enterprise 6000 or 6500 systems, use a load board (part number 501-3142).
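The CPU numbering rule in step 4 (2*n and 2*n + 1 for board n) lends itself to a small helper. The following is a minimal sketch, assuming pbind output lines of the form "process id 1145: 10" as in the example in step 4. The function names cpus_on_board, board_of_cpu, and pids_bound_to_board are hypothetical.

```shell
# CPU IDs carried by board n: 2*n and 2*n + 1.
cpus_on_board() {
    board=$1
    echo "$((2 * board)) $((2 * board + 1))"
}

# Inverse mapping: the board that carries a given CPU ID.
board_of_cpu() {
    cpu=$1
    echo "$((cpu / 2))"
}

# Read pbind-style output on standard input and print the process IDs
# that are bound to either CPU of the given board.
pids_bound_to_board() {
    board=$1
    awk -v lo="$((2 * board))" -v hi="$((2 * board + 1))" '
        $1 == "process" && $2 == "id" && ($4 == lo || $4 == hi) {
            pid = $3
            sub(/:$/, "", pid)   # strip the trailing colon from "1145:"
            print pid
        }
    '
}

cpus_on_board 5    # prints "10 11"
```

On a live system, pbind | pids_bound_to_board 5 would list the processes that must be unbound before board 5 can be removed.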

Procedures - Installing or Replacing a CPU/Memory Board
1. Verify that the selected board slot can accept a board.

# cfgadm

The states and conditions should be:

- Empty, Unconfigured, Unknown

2. Physically insert the board into the slot and watch for an acknowledgment on the system console or in the system log file. The acknowledgment is of the form:

Name inserted into slot<number>

where Name is the name of the system board being installed and <number> is the slot location (number) in the card cage.

After a CPU/memory board is inserted, the states and conditions should become:

- Disconnected, Unconfigured, Unknown

Note: Any other states or conditions should be considered an error.

Operating System Quiescence
During a board insertion operation, the operating system is briefly paused. This pause is known as operating system quiescence. All operating system and device activity on the backplane must cease for a few seconds during a critical phase of the operation. When prompted, reply yes to continue, or no to stop the configuration process and allow the operating system to continue operating normally.

Before quiescence can be achieved, the operating system must temporarily suspend all processes, CPUs, and device activities. If the operating system cannot achieve quiescence, it displays the reasons, which can include the following:

- A user thread did not suspend
- Real-time processes are running
- A device exists that cannot be paused by the operating system

The conditions that cause processes to fail to suspend are generally temporary. Examine the reasons for the failure. If the operating system encountered a transient condition causing a failure to suspend a process, you can try the operation again.

3. Configure the board.

# cfgadm -v -c configure sysctrl0:slot<number>

This command should both connect and configure the receptacle. Use the cfgadm command to verify this. The states and conditions for a connected and configured attachment point should be:

- Connected, Configured, OK

Now the system is aware of the usable devices on the board and the devices can be used.

4. Configure the memory devices on the board in Solaris.

# drvconfig -i ac

5. Determine the system numbers of the new CPU modules.

# psrinfo
4    on-line        since 5/15/99 08:01:14
5    on-line        since 5/15/99 08:01:19
6    powered-off    since 5/16/99 09:27:21

In this example, there is one new CPU module (system number 6). The module has not yet been enabled, so it is listed as being powered off.

The system number for a CPU is equal to twice the board number, plus 0 for CPU module 0, or 1 for CPU module 1. In the example shown, system number 6 represents module 0 on board number 3.

6. Enable the new CPU module or modules.

# psradm -n 6

where 6 is the system number of the CPU module to be enabled.

7. Test the new memory banks.

# cfgadm -o test_type -t acnumber:bank0
# cfgadm -o test_type -t acnumber:bank1

where test_type is one of three memory tests:

- Quick: Writes a pattern of ones and zeros.
- Normal: Detects specific memory address failures.
- Extended: Tests interference between memory cells.

Note: The acnumber can be found in the basic or detailed status display.

8. Configure the new memory banks.

# cfgadm -c configure acnumber:bank0
# cfgadm -c configure acnumber:bank1

9. Verify that the board and the memory banks are configured.

- For the CPU status, use the psrinfo or mpstat commands.
- For the memory status, use the prtconf or vmstat commands.
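
Steps 5 and 6 can be illustrated with a short Python sketch that finds powered-off CPU modules in psrinfo output and maps each system number back to its board and module. The function names are hypothetical, and the parser assumes the three-column layout of the example output (system number, state, timestamp).

```python
def locate_cpu(system_number):
    """Split a CPU system number into (board number, module number).

    The system number is 2 * board + module, so divmod by 2 inverts it.
    """
    return divmod(system_number, 2)

def powered_off_cpus(psrinfo_output):
    """Return the system numbers of CPUs reported as powered-off."""
    result = []
    for line in psrinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "powered-off":
            result.append(int(fields[0]))
    return result

# Sample text copied from the psrinfo example in step 5.
sample = """\
4 on-line since 5/15/99 08:01:14
5 on-line since 5/15/99 08:01:19
6 powered-off since 5/16/99 09:27:21
"""

for cpu in powered_off_cpus(sample):
    board, module = locate_cpu(cpu)
    # Prints the command to run; it does not execute anything.
    print(f"psradm -n {cpu}   (board {board}, module {module})")
```

For the sample input, the loop prints a single psradm -n 6 line for board 3, module 0, which matches the example in the text.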

Procedures - Removing an I/O Board
This procedure assumes that all activity going to the I/O board to be removed has been stopped, file systems have been unmounted, and network interfaces have been shut down. Or, if AP is in use, all I/O functions have been switched to the alternate I/O board.

1. Verify that all I/O activity to the board has been terminated.

2. Check the status of the board.

# cfgadm

For a board removal or replacement, the states and conditions must be one of the following sets:

If the board is OK, the state is:

- Connected, Configured, OK

If the board is failing, the state is:

- Connected, Configured, Failing

3. Unconfigure the board.

# cfgadm -c unconfigure sysctrl0:slot<number>

4. Use the cfgadm command to confirm that the board is unconfigured. If the unconfigure operation failed, verify that:

- The board is Detach-Safe.
- Activity on the board has been quiesced.

Caution: A failure of step 4 results in a partially unconfigured condition. If this happens, attempt to unconfigure again. A configuration operation is not permitted at this point.

5. When the board is unconfigured, you can do one of the following:

- Leave the board in the system unconfigured.
- Configure the board.
- Disconnect the board manually, if the unconfiguration operation did not do so automatically, by typing the following command:

# cfgadm -v -c disconnect sysctrl0:slot<number>

6. If you wish to remove the board from the card cage, first verify the board status.

- Use the cfgadm command to verify that the board is logically disconnected.
- Check the LEDs on the board to verify that the board is electrically disconnected. The two outer LEDs must be off and the middle LED must be on.

Caution: If a replacement board is not available and you remove the board, you must fill the empty slot to maintain the proper flow of cooling air in the card cage. For Sun Enterprise 6000 or 6500 systems, use a load board (part number 501-3142); for all other systems, use a dummy board (part number 504-2592).
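
The checks in this procedure reduce to a few small rules, sketched below in Python. All names are hypothetical; the state tuples, the LED pattern, and the part numbers are taken from the text, and the model list for the dummy board is the one given in the CPU/memory board caution.

```python
# States that allow a removal to begin (step 2).
REMOVABLE_STATES = {
    ("connected", "configured", "ok"),
    ("connected", "configured", "failing"),
}

# Filler part numbers from the cautions in this appendix.
FILLER_PARTS = {
    3000: "504-2592", 3500: "504-2592",  # dummy board
    4000: "504-2592", 4500: "504-2592",
    5000: "504-2592", 5500: "504-2592",
    6000: "501-3142", 6500: "501-3142",  # load board
}

def can_start_removal(receptacle, occupant, condition):
    """True if the reported states match one of the two permitted sets."""
    state = (receptacle.lower(), occupant.lower(), condition.lower())
    return state in REMOVABLE_STATES

def leds_show_disconnected(left, middle, right):
    """True when the two outer LEDs are off and the middle LED is on."""
    return bool(middle) and not left and not right

def filler_part(model):
    """Return the filler part number for a Sun Enterprise model number."""
    return FILLER_PARTS[model]

print(can_start_removal("Connected", "Configured", "Failing"))
print(leds_show_disconnected(False, True, False))
print(filler_part(6500))
```

A lookup table is used for the filler parts so that an unlisted model raises a KeyError instead of silently returning a part number.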

Procedures - Removing Boards that Use Detach-Unsafe Drivers
Some drivers do not yet support DR on Sun Enterprise 3x00, 4x00, 5x00, and 6x00 systems. DR cannot detach these drivers, but you can remove some undetachable drivers manually.

1. Halt all use of the device controller.

2. Halt the use of all other controllers of the same type on all boards in the machine. The remaining controllers can be used again after the DR unconfigure operation is complete.

3. Use UNIX commands to manually close all such drivers on the board and use the modunload command to unload them.

# modinfo | grep tape
107 f66a0000 dfe9 33 1 st (SCSI tape driver 1.1)
# modunload -i 107
#

4. Disconnect the board.

# cfgadm -c disconnect sysctrl0:slot<number>

The disconnected board can be physically removed now or at a later time.

Caution: Many third-party drivers (those purchased from vendors other than Sun Microsystems) do not yet properly support the standard Solaris software modunload interface. Test these driver functions during the qualification and installation phases of any third-party device.
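
Step 3 looks up the module ID in the modinfo output and passes it to modunload -i. A minimal Python sketch of that lookup follows. The function name is hypothetical, and the parser assumes the column layout of the single example line above, with the module ID in the first column and the module name in the sixth.

```python
def find_module_id(modinfo_output, module_name):
    """Return the module ID for module_name, or None if it is not listed."""
    for line in modinfo_output.splitlines():
        fields = line.split()
        # Assumed columns: id, load address, size, info, revision, name, ...
        if len(fields) >= 6 and fields[0].isdigit() and fields[5] == module_name:
            return int(fields[0])
    return None

# Sample line copied from the modinfo example in step 3.
sample = "107 f66a0000 dfe9 33 1 st (SCSI tape driver 1.1)\n"

module_id = find_module_id(sample, "st")
if module_id is not None:
    # Prints the command to run; it does not unload anything.
    print(f"modunload -i {module_id}")
```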

Procedures - Installing a New I/O Board
1. Verify that the selected board slot is ready for a board.

# cfgadm

The states and conditions should be:

- Empty, Unconfigured, Unknown

2. Physically insert the board into the slot and look for an acknowledgment on the console in the form of:

Name board inserted into slot<number>

After an I/O board is inserted, the states and conditions should become:

- Disconnected, Unconfigured, Unknown

Note: Any other states or conditions should be considered an error.

3. Connect any peripheral cables and interface modules to the board.

4. Configure the board with the following command:

# cfgadm -v -c configure sysctrl0:slot<number>

Note: This command should both connect and configure the receptacle.

5. Verify with the cfgadm command. The states and conditions for a connected and configured attachment point should be:

- Connected, Configured, OK

Now the system is also aware of the usable devices that reside on the board, and all devices can be mounted or configured for use.

If the command fails to connect and configure the board and slot, try the connection and configuration as separate steps:

a. Connect the board and slot by typing the following:

# cfgadm -v -c connect sysctrl0:slot<number>

The states and conditions for a connected attachment point should be:

- Connected, Unconfigured, OK

Now the system is aware of the board, but not the usable devices that reside on the board. Temperature is monitored, and power and cooling affect the attachment point condition.

b. Configure the board and slot by typing the following:

# cfgadm -v -c configure sysctrl0:slot<number>

The states and conditions for a configured attachment point should be:

- Connected, Configured, OK

Now the system is also aware of the usable devices that reside on the board, and all devices can be mounted or configured.

6. Reconfigure the devices on the board.

# drvconfig; devlinks; disks; ports; tapes;

Reconfiguring the system normally falls under one or more of the following categories:

- Board removal: If you remove a board that is not to be replaced, you can (but do not have to) execute the reconfiguration sequence to clean up the /dev links for disk devices.
- Board change: If you remove a board and then insert it into a different slot, or replace a board with another board that has different I/O devices, you must execute the reconfiguration sequence to configure the I/O devices associated with the board.


- Board installation: When adding a board, you must execute the reconfiguration sequence to configure the I/O devices associated with the board.
- Board replacement: If you replace a board with another board that hosts the same set of I/O devices, inserting the replacement into the same slot, you might not need to execute the reconfiguration sequence.

The console should display a list of devices and their addresses.

7. Activate the devices on the board using commands, such as mount and ifconfig, as appropriate.
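
The four reconfiguration categories in step 6 amount to a small decision table, sketched here in Python. The dictionary keys and the function name are hypothetical labels; the command sequence and the wording of each outcome come from the text.

```python
# The reconfiguration sequence from step 6, in the order given.
RECONFIGURE_SEQUENCE = ("drvconfig", "devlinks", "disks", "ports", "tapes")

# Whether the sequence is needed for each category described in step 6.
RECONFIGURE_NEEDED = {
    "removal": "optional",                 # cleans up /dev links for disks
    "change": "required",                  # different slot or different devices
    "installation": "required",            # new board added
    "replacement": "might not be needed",  # same devices, same slot
}

def reconfigure_advice(category):
    """Return (need, command line) for a reconfiguration category."""
    need = RECONFIGURE_NEEDED[category]
    command = "; ".join(RECONFIGURE_SEQUENCE)
    return need, command

for category in RECONFIGURE_NEEDED:
    need, command = reconfigure_advice(category)
    print(f"{category}: {need} -> {command}")
```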

Procedures - Installing a Replacement I/O Board
This procedure assumes that you have previously performed the Removing an I/O Board procedure discussed earlier in this module.

1. If you are not continuing from the procedure Removing an I/O Board, use the cfgadm command and select a card cage slot to use, but do not insert the board yet.

2. View the configuration list and verify that the slot is unconfigured.

# cfgadm

3. Insert the board in the slot and look for an acknowledgment on the console, such as:

Name board inserted into slot<number>

4. Use the cfgadm command again to look for the system name assigned to the new board.

5. Configure the board using the system name for the board.

# cfgadm -c configure sysctrl0:slot<number>

6. Configure any I/O devices on the board using commands, such as drvconfig and devlinks, as appropriate.

7. Activate the devices on the board using commands, such as mount and ifconfig, as appropriate.
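
The installation procedures in this appendix check the same sequence of attachment-point states after each step. The Python sketch below collects those expected states in one place. The stage names and function names are hypothetical; the state triples and the cfgadm command lines are the ones given in the procedures, and the sketch only builds command strings without running them.

```python
# Expected (receptacle, occupant, condition) at each stage of an installation.
EXPECTED = {
    "empty slot": ("Empty", "Unconfigured", "Unknown"),
    "inserted": ("Disconnected", "Unconfigured", "Unknown"),
    "connected": ("Connected", "Unconfigured", "OK"),
    "configured": ("Connected", "Configured", "OK"),
}

def matches(stage, observed):
    """True if the observed states equal the expected ones for this stage.

    Any other combination should be treated as an error, as the notes
    in the procedures state.
    """
    expected = tuple(s.lower() for s in EXPECTED[stage])
    return tuple(s.lower() for s in observed) == expected

def next_command(stage, slot):
    """Return the cfgadm command that advances the board, or None."""
    if stage in ("inserted", "connected"):
        # From "inserted", configure normally connects the board as well.
        return f"cfgadm -v -c configure sysctrl0:slot{slot}"
    return None

print(matches("inserted", ("disconnected", "unconfigured", "unknown")))
print(next_command("inserted", 3))
```

If the configure command fails from the inserted stage, the separate connect step (cfgadm -v -c connect) described for new I/O boards applies before configuring again.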
