Unix SysAdmin Archives: June 2011

Thursday, 30 June 2011

Where is the MAC address located physically?

Here is a quick hint.

On SPARC systems, the MAC address is stored in the NVRAM (non-volatile random-access memory). The banner command reads it from there.
Sometimes you will see a value like ff:ff:ff:ff:ff:ff. This usually means the address could not be read from the NVRAM, and the NVRAM itself may be defective.
On some other SPARC systems, the MAC address is written to a separate chip.

On x86 systems, the MAC address is written only in the NIC (network interface card). No special chip or separate memory holds this information.
Now, what if it is a new machine and you need to know the MAC address? You cannot run ifconfig at that point. It is easy: power on the box and press F12 when it prompts you for BIOS setup. The network boot attempt will display the MAC address while it broadcasts on the network. There is one requirement: the box must be equipped with an up-to-date PXE boot environment.

Tuesday, 28 June 2011

How to change timeout value on GRUB

When your x86 Solaris system boots, the BIOS loads the boot loader (GRUB) from the boot device. GRUB then takes control of the boot process.

The only issue is that the default timeout is just 10 seconds. You may want to decrease or increase this value.

This is quite simple. Just open up the /boot/grub/menu.lst file in your favorite text editor. I’m using vi:

# vi /boot/grub/menu.lst

Now find the section that looks like this:

# menu timeout in second before default OS is booted
# set to -1 to wait for user input
timeout 10

The timeout value is in seconds. You can set it to -1 to make GRUB wait for user input.

Save the file; the change takes effect on the next reboot.
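
If you need to make this change on several boxes, the edit can be scripted. Here is a minimal sketch, not an official tool: the function name set_grub_timeout and the .bak backup suffix are my own assumptions, and it only rewrites a line that starts with the word timeout followed by a space.

```shell
#!/bin/sh
# Sketch: set the GRUB menu timeout in a menu.lst-style file.
# set_grub_timeout is a hypothetical helper name, not a Solaris command.
set_grub_timeout() {
    file=$1
    seconds=$2
    # Keep a backup copy before touching the file.
    cp "$file" "$file.bak" || return 1
    # Rewrite only an uncommented line that starts with "timeout ".
    sed "s/^timeout .*/timeout $seconds/" "$file.bak" > "$file"
}

# Example: set_grub_timeout /boot/grub/menu.lst 30
```

Comment lines such as "# menu timeout in second..." are left alone because they do not start with the word timeout.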

How to create ASM links in Solaris hosts

Here is a quick guide to creating ASM links on Solaris boxes.
First, check that the ASM instance is running. If it is, have the names of the disks/LUNs ready.
Do not forget to turn on MPxIO on Solaris 10 servers by running the following command.

# stmsboot -e

Note that this will prompt you to reboot the system if MPxIO was off.

Make sure the disks are not part of any SVM metadevice or Veritas (VxVM) disk group, and are not in use by any zpool.
Use appropriate commands to verify.

Now, find the physical device path for each disk.
# ls -l /dev/rdsk/ | grep <disk name>
For example:
# ls -l /dev/rdsk/ | grep c4t600A0B8000562790000005D04998C446d0
lrwxrwxrwx   1 root     root          64 Feb 16 02:10 c4t600A0B8000562790000005D04998C446d0s0 -> ../../devices/scsi_vhci/ssd@g60060e800542f000000042f000001290:a,raw
lrwxrwxrwx   1 root     root          64 Feb 16 02:10 c4t600A0B8000562790000005D04998C446d0s1 -> ../../devices/scsi_vhci/ssd@g60060e800542f000000042f000001290:b,raw

The idea is to figure out which partition points to what physical device path.
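
To avoid reading long ls -l lines by eye, a small helper can print just the physical path behind a logical device name. This is a minimal sketch; the function name link_target is an assumption of mine, and it parses ls -l output because a readlink command may not be available on every box.

```shell
#!/bin/sh
# Sketch: print the target of a symbolic link by parsing "ls -l".
# link_target is a hypothetical helper name.
link_target() {
    # Everything after " -> " on the ls -l line is the link target.
    # Prints nothing if the argument is not a symbolic link.
    ls -l "$1" | sed -n 's/^.* -> //p'
}

# Example: link_target /dev/rdsk/<disk name>s0
```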

Make sure none of the links under /dev/ora_rdsk point to the disks that you are going to use.
Look for both the logical and the physical name of the disk.

# ls -l /dev/ora_rdsk/ | grep c4t600A0B8000562790000005D04998C446d0

# ls -l /dev/ora_rdsk/ | grep ssd@g600a0b8000562790000005d04998c446

Now partition the disk so that slice 0 starts at sector 256 and runs to the end.
Here is the trick: create a temporary zpool on the disk.

# zpool create tmp <disk name>

Then destroy the pool.

# zpool destroy tmp

This works better than the format command.

This will create 9 slices for your disk. The first slice (slice 0) is the one that must be used for ASM links.

Create the symbolic links for the requested ASM links under /dev/ora_rdsk, pointing to the physical device names for slice 0 of the disks.

# ln -s ../../devices/scsi_vhci/ssd@g600a0b8000562790000005d04998c446:a,raw /dev/ora_rdsk/VOLUME_NAME

Change the ownership of the physical device to oracle:dba; the default owner is root:sys.

# ls -lhL /dev/rdsk/c4t600A0B8000562790000005D04998C446d0s0
crw-r-----   1 root     sys      118, 64 Feb 16 02:10 /dev/rdsk/c4t600A0B8000562790000005D04998C446d0s0
# chown oracle:dba /dev/rdsk/c4t600A0B8000562790000005D04998C446d0s0

#chown oracle:dba /devices/scsi_vhci/ssd@g600a0b8000562790000005d04998c446:a,raw

# ls -lhL /dev/rdsk/c4t600A0B8000562790000005D04998C446d0s0
crw-r----- 1 oracle dba 118, 64 Feb 16 03:00 /dev/rdsk/c4t600A0B8000562790000005D0

You may now ask your DBA to verify.

Monday, 27 June 2011

How to roll back a patch

Let's say you finally realize that the patch you installed on your box is doing no good. Here is a short rollback procedure. Hopefully you have not re-attached your secondary mirror yet; otherwise it is too late for this remedy.

Note: The device names depend on your SVM disk naming standards.

Mount the secondary disk.

# mount /dev/dsk/cxtxdxsx /mnt

Edit the system file on the secondary disk. Comment out the lines that start with rootdev and set md.

# vi /mnt/etc/system

For example:
set md:mirrored_root_flag=1
rootdev:/pseudo/md@0:0,0,blk
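
If you prefer not to edit by hand, the commenting-out can be sketched with sed. This is only an illustration under my own assumptions: the function name comment_md_lines and the .orig backup suffix are made up here. It relies on the fact that a comment line in /etc/system starts with an asterisk.

```shell
#!/bin/sh
# Sketch: comment out the SVM root-mirror lines in an /etc/system-style file.
# comment_md_lines is a hypothetical helper name.
comment_md_lines() {
    file=$1
    # Keep the untouched file next to the edited one.
    cp "$file" "$file.orig" || return 1
    # Prefix lines starting with "rootdev:" or "set md:" with "* ".
    sed -e 's/^rootdev:/* &/' -e 's/^set md:/* &/' "$file.orig" > "$file"
}

# Example: comment_md_lines /mnt/etc/system
```

Other set lines in the file are not touched, so unrelated tunables survive the edit.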

Back up your vfstab file.
# cp /mnt/etc/vfstab /mnt/etc/vfstab.md

Then edit /mnt/etc/vfstab and replace all references to /dev/md devices with the equivalent physical disk slices.

Then unmount /mnt and boot from the secondary disk.
Next, destroy the original mirrors and re-create the metadevices,
this time with the secondary disk as the primary submirror.

# metaclear -r d0
# metaclear -r d6
# metainit d0 -m d20
# metainit d6 -m d26
# metaroot d0

Reboot again from the secondary mirror, then re-create the metadevices for the primary disk.

# metainit d10 1 1 cxtxdxs0
# metainit d16 1 1 cxtxdxs6

Then re-attach the metadevices to resync.

# metattach d0 d10
# metattach d6 d16

After the resync, you are back to the original patch level.

Patching a global zone with more than 3 local zones

If you are patching a global zone with more than 3 local zones, detach all of the zones from the container before patching. There is a chance you will encounter errors while re-attaching, which leaves a zone in the incomplete state. One cause might be a full file system, so make a quick check and clean up before re-attaching. If the attach still fails, here is a workaround.

globalsys# zoneadm -z zone-sys1 attach -u
Getting the list of files to remove
Removing 5 files
Remove 7 of 7 packages
Installing 594 files
Add 192 of 192 packages
Installation of these packages generated warnings: SUNWzfsr SUNWzfsu SUNWzoner SUNWzoneu
Updating editable files
pkgserv: ERROR: pkglog is not complete
pkgserv: ERROR: Ignoring 5889 bytes from log
pkgserv: ERROR: cannot rewrite the contents file
The file </var/sadm/system/logs/update_log> within the zone contains a log of the zone update.
cat: output error (0/240 characters written)
No space left on device
zoneadm: zone 'zone-sys1': '/etc/release' failed with exit code 2.
could not update zone

globalsys# zoneadm -z zone-sys1 attach -u
zoneadm: zone 'zone-sys1': zone is incomplete; uninstall required.

globalsys# zoneadm list -vic
ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - zone-sys1          incomplete /zones/zone-sys1                 native   shared
   - zone-sys2          installed /zones/zone-sys2                 native   shared
   - zone-sys3          configured /zones/zone-sys3                 solaris9 shared
   - zone-sys4          configured /zones/zone-sys4                 native   shared
   - zone-sys5          configured /zones/zone-sys5                 solaris9 shared
   - zone-sys6          configured /zones/zone-sys6                 native   shared
   - zone-sys7          configured /zones/zone-sys7                 solaris9 shared
   - zone-sys8          configured /zones/zone-sys8                 native   shared
globalsys# cd /zones/zone-sys1

globalsys# ls
SUNWdetached.xml lost+found        root
dev               lu

Move the SUNWdetached.xml file somewhere safe.

globalsys# mv SUNWdetached.xml SUNWdetached.xml_bak.zone-sys1
globalsys# mv SUNWdetached.xml_bak.zone-sys1 /var/tmp

Then make a copy of the /etc/zones/index file.

globalsys# cp /etc/zones/index /etc/zones/index.incomplete

Now, edit /etc/zones/index. On the line for the affected zone, replace the "incomplete" keyword with "installed".

globalsys# vi /etc/zones/index

# Copyright 2004 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)zones-index        1.2     04/04/01 SMI"
#
# DO NOT EDIT: this file is automatically generated by zoneadm(1M)
# and zonecfg(1M). Any manual changes will be lost.
#
global:configured:/:
zone-sys1:incomplete:/zones/zone-sys1:66e6b6eb-8666-ebe6-ebe8-b6fb6bf66868
zone-sys2:installed:/zones/zone-sys2:f6b66668-6866-eb6e-eeb6-8b66f6bb66fb
zone-sys3:installed:/zones/zone-sys3:ef66e8e6-688b-e6b8-bfbe-866e8f666f6e
zone-sys4:installed:/zones/zone-sys4:66b66b66-68e6-668f-ebe6-eb6bb68efbf6
zone-sys5:installed:/zones/zone-sys5:6e666bf6-8bbb-6688-b6b6-b66eebb666be
zone-sys6:installed:/zones/zone-sys6:6e6b6e6f-6eb8-6b68-b66e-ee686bebbe66
zone-sys7:installed:/zones/zone-sys7:6bf6e6e6-8f6f-66b6-e66e-b6668e66febb
zone-sys8:installed:/zones/zone-sys8:e6666b6b-86ee-e6e6-bbb6-86e6eb66f686
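
The same one-word change can be sketched with sed instead of vi. Treat this as an illustration only: the function name mark_zone_installed is my own, and it assumes the colon-separated name:state:path:uuid layout shown in the file above. It keeps the same .incomplete backup used earlier.

```shell
#!/bin/sh
# Sketch: flip one zone from "incomplete" to "installed" in a zones index file.
# mark_zone_installed is a hypothetical helper name.
mark_zone_installed() {
    index=$1
    zone=$2
    # Keep a copy of the index before changing it.
    cp "$index" "$index.incomplete" || return 1
    # Only the line that starts with "<zone>:incomplete:" is rewritten.
    sed "s/^$zone:incomplete:/$zone:installed:/" "$index.incomplete" > "$index"
}

# Example: mark_zone_installed /etc/zones/index zone-sys1
```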

Now, detach the zone to put it in the configured state.

globalsys# zoneadm -z zone-sys1 detach
globalsys# zoneadm list -vic
ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - zone-sys2          installed /zones/zone-sys2                 native   shared
   - zone-sys1          configured /zones/zone-sys1                 native   shared
   - zone-sys3          installed /zones/zone-sys3                 solaris9 shared
   - zone-sys4          installed /zones/zone-sys4                 native   shared
   - zone-sys5          installed /zones/zone-sys5                 solaris9 shared
   - zone-sys6          installed /zones/zone-sys6                 native   shared
   - zone-sys7          installed /zones/zone-sys7                 solaris9 shared
   - zone-sys8          installed /zones/zone-sys8                 native   shared

That's it. Try to reattach the zone again. It should be good by now.

globalsys# zoneadm -z zone-sys1 attach -u
Getting the list of files to remove
Removing 5 files
Remove 8 of 8 packages
Installing 6 files
Add 9 of 9 packages
Updating editable files
The file </var/sadm/system/logs/update_log> within the zone contains a log of the zone update.
globalsys# zoneadm list -vic
ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - zone-sys2          installed /zones/zone-sys2                 native   shared
   - zone-sys1          installed /zones/zone-sys1                 native   shared
   - zone-sys3          installed /zones/zone-sys3                 solaris9 shared
   - zone-sys4          installed /zones/zone-sys4                 native   shared
   - zone-sys5          installed /zones/zone-sys5                 solaris9 shared
   - zone-sys6          installed /zones/zone-sys6                 native   shared
   - zone-sys7          installed /zones/zone-sys7                 solaris9 shared
   - zone-sys8          installed /zones/zone-sys8                 native   shared

Wednesday, 22 June 2011

Paging vs Swapping

These two terms are associated with virtual memory.

In the past, the cost of RAM (physical memory) was relatively high compared to the cost of hard disk drives.

Back then, it was almost always preferable to rely on virtual memory.

In a nutshell, virtual memory combines physical RAM with a particular slice or file on the hard disk (the swap space) to form the total memory.

When the system runs out of physical memory and a more important job is waiting to be executed, it transfers some of the contents of RAM to the swap space.

The terms paging and swapping both refer to the transfer of data from RAM to the HDD. They differ only in the amount of data they transfer. In paging, only a portion of a process's data in memory is transferred, while in swapping all of it is transferred.

Paging happens in normal operation. In many cases it is unavoidable to page some of the memory contents out to the swap space, especially on development servers where coreadm uses the swap space for its core files.

On the other hand, swapping happens when there is an abnormal event in the system. It is very rare, and if it does happen there is probably something wrong with the configuration of the server.

It's very unlikely for a healthy Unix box to experience swapping.

Monday, 20 June 2011

SVR4

Just in case you are reading the admin guide and come across the term SVR4, here is a quick wrap-up of the story.

SVR4 - System V Release 4

Unix System V, commonly abbreviated SysV (and usually pronounced—though rarely written—as "System Five"), is one of the first commercial versions of the Unix operating system. It was originally developed by American Telephone & Telegraph (AT&T) and first released in 1983. Four major versions of System V were released, termed Releases 1, 2, 3 and 4. System V Release 4, or SVR4, was commercially the most successful version, being the result of an effort, marketed as Unix System Unification, which solicited the collaboration of the major Unix vendors. It was the source of several common commercial Unix features.

While AT&T sold their own hardware that ran System V (see AT&T Computer Systems), most customers ran a version from a reseller, based on AT&T's reference implementation. A standards document called the System V Interface Definition outlined the default features and behavior of implementations. The most widely used versions of System V today are IBM's AIX, based on System V Release 3, and Sun's Solaris and Hewlett-Packard's HP-UX, both based on System V Release 4.

In the 1980s and early-1990s, System V was considered one of the two major "flavors" of UNIX, the other being Berkeley Unix (BSD). During the period of the Unix wars System V was known for being the primary choice of manufacturers of large multiuser systems, in opposition to BSD's dominance of desktop workstations. However, with standardization efforts such as POSIX and the commercial success of Linux, this generalization is not as accurate as it once was.

Enabling MPXIO on Solaris 10

Solaris MPxIO provides a multipathing solution for storage devices accessible through multiple physical paths. Multipathing provides the ability to set up multiple redundant paths to a storage system and gives you the benefits of load balancing and failover. MPxIO was initially delivered on SPARC Systems mostly on fibre attached storage. It abstracts physical paths to a device, providing unified access via a single virtual device. Physical path failures are transparently recovered, so failures are not exposed to the applications using the virtual device.

The scsi_vhci driver registers with the MPxIO framework. It is the virtual host controller providing the abstraction for SCSI protocol devices.

To enable it, edit the /kernel/drv/fp.conf file and look for this line:

mpxio-disable="yes";

Change yes to no and it will be enabled:

sys01# cat /kernel/drv/fp.conf
#
# Copyright 2006 Sun Microsystems, Inc. All rights reserved.
.
.<truncated>
.
mpxio-disable="no";
disable-sata-mpxio="no";

Then issue this command.

sys01# stmsboot -e

After you enable it, you are prompted to reboot. During the reboot, vfstab and the dump configuration are updated to reflect the device name changes.

There are three ways to check that everything is fine.

1. Look for log messages in /var/adm/messages.

As the machine comes up, you should see a message like:

Dec 18 11:42:24 sys01 mpxio: [ID 669396 kern.info]
/scsi_vhci/ssd@g60060e8005b237000000b3480000087c (ssd11)
multipath status: optimal, path /pci@9,600000/SUNW,qlc@1/fp@0,0
(fp1) to target address: 60070f9006c34826,0 is online.
Load balancing: round-robin

If everything is fine, you should see an optimal status. If there is a problem, you should see a degraded status.
Watch out for messages like "lost all paths"; we do not want to see that.

2. Use the mpathadm command. Check the "Path State:" field; it tells you whether each path is OK.

sys01# mpathadm show lu /dev/rdsk/c3t60060E8005B237000000B3480000087cd0s2
Logical Unit: /dev/rdsk/c3t60060E8005B237000000B3480000087cd0s2
        mpath-support: libmpscsi_vhci.so
        Vendor: HITACHI
        Product: OPEN-V      -SUN
        Revision: 6007
        Name Type: unknown type
        Name: 60060e8005b237000000b3480000087c
        Asymmetric: no
        Current Load Balance: round-robin
        Logical Unit Group ID: NA
        Auto Failback: on
        Auto Probing: NA

        Paths:
                Initiator Port Name: 10000000d082a05c
                Target Port Name: 60070f9006c34826
                Override Path: NA
                Path State: OK
                Disabled: no

                Initiator Port Name: 10000000d082a05b
                Target Port Name: 50060e8005b23705
                Override Path: NA
                Path State: OK
                Disabled: no

        Target Ports:
                Name: 60070f9006c34826
                Relative ID: 0

                Name: 50060e8005b23705
                Relative ID: 0

3. Use the luxadm command. Check the "State" field; it tells you whether each path is online.

sys01# luxadm display /dev/rdsk/c3t60060E8005B237000000B3480000087cd0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c3t60060E8005B237000000B3480000087cd0s2
Vendor:               HITACHI
Product ID:           OPEN-V      -SUN
Revision:             6007
Serial Num:           60 0C348087C
Unformatted capacity: 41679.141 MBytes
Write Cache:          Enabled
Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0x0
Device Type:          Disk device
Path(s):

/dev/rdsk/c3t60060E8005B237000000B3480000087cd0s2
/devices/scsi_vhci/ssd@g60060e8005b237000000b3480000087c:c,raw
   Controller           /devices/pci@0/pci@0/pci@8/SUNW,emlxs@0,1/fp@0,0
    Device Address              60070f9006c34826,1
    Host controller port WWN    10000000d082a05c
    Class                       primary
    State                       ONLINE
   Controller           /devices/pci@0/pci@0/pci@8/SUNW,emlxs@0/fp@0,0
    Device Address              50060e8005b23705,1
    Host controller port WWN    10000000d082a05b
    Class                       primary
    State                       ONLINE
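
The mpathadm check lends itself to a quick script when there are many LUNs. Below is a minimal sketch under my own assumptions: the function name count_bad_paths is made up, it reads the "mpathadm show lu" report on standard input, and it treats any "Path State:" value other than OK as bad.

```shell
#!/bin/sh
# Sketch: count paths whose state is not OK in "mpathadm show lu" output.
# count_bad_paths is a hypothetical helper name; it reads the report on stdin.
count_bad_paths() {
    # The last field of a "Path State:" line is the state itself.
    awk '/Path State:/ { if ($NF != "OK") bad++ } END { print bad + 0 }'
}

# Example: mpathadm show lu /dev/rdsk/<device> | count_bad_paths
```

A result of 0 means every listed path reported OK; anything higher deserves a closer look at the full output.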

Sunday, 19 June 2011

T6320 Host Power Failure

We have a Sun T6320 that rebooted a couple of times.

reciosys01# last reboot | more
reboot    system boot                   Mon Mar 28 09:45
reboot    system down                   Mon Mar 28 09:38
reboot    system boot                   Mon Mar 28 08:44
reboot    system down                   Mon Mar 28 08:37

The problem is that we cannot find anything in /var/adm/messages that indicates the cause of the reboot.
The day before, an Emulex card was replaced in this box, but there is no error message that links the reboots to this change.

Here are the logs from /var/adm/messages:

Mar 28 09:37:08 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] uptmagnt[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:37:09 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] uptmagnt[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:38:40 reciosys01 inetd[411]: [ID xxxxxx daemon.notice] bgssd[xxxxx] from xx.xx.xx.xx xxxxx
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.notice] ^MSunOS Release 5.10 Version Generic_142909-17 64-bit
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
Mar 28 09:44:50 reciosys01 genunix: [ID xxxxxx kern.info] Ethernet address = x:xx:xx:xx:xx:xx
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] NOTICE: Kernel Cage is ENABLED
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] mem = 66977792K (0xff8000000)
Mar 28 09:44:50 reciosys01 unix: [ID xxxxxx kern.info] avail mem = 66732310528

We then managed to check the event logs on the SP through the ILOM,
and we found this specific error: "Host Power Failure: MB_DC_POK Fault".
I think this is related to the power supply. The voltage output might not be at its expected level.

-> cd /SP/logs/event
/SP/logs/event

-> show list

/SP/logs/event/list
    Targets:

    Properties:

    Commands:
        cd
        show

ID     Date/Time                 Class     Type      Severity
----- ------------------------ -------- -------- --------
70701 Mon Mar 28 01:41:50 2011 Chassis   Log       major
       Host is running
70700 Mon Mar 28 01:38:20 2011 Fault     Repair    minor
       SP detected fault cleared at time Mon Mar 28 01:38:18 2011. Host Power: M
       B_DC_POK is OK
70699 Mon Mar 28 01:37:14 2011 Chassis   Log       major
       Host has been powered on
70698 Mon Mar 28 01:37:03 2011 Chassis   Log       critical
       Host has been powered off
70697 Mon Mar 28 01:37:03 2011 Chassis   Log       major
       Power cycling Host System. Please wait.
70696 Mon Mar 28 01:37:01 2011 Fault     Fault     critical
       SP detected fault at time Mon Mar 28 01:37:01 2011. Host Power Failure: M
       B_DC_POK Fault
Paused: press any key to continue, or 'q' to quit

We searched the web for related incidents, but there are no cases specific to the T6320.
We found something for the T6340, "False Power Failure Faults Might Be Reported (CR 6895793)", but that one occurs during POST or SunVTS memory testing.

So it does not quite match our case.

Since there was a recent change on this box, it is a good idea to ask our vendor whether the two might be related. We updated the service request for the Emulex replacement with this inquiry.

Hopefully on our next update, we will have a better picture of this problem.

Embedded Lights Out Manager (ELOM)

The ELOM provides a dedicated system of hardware and supporting software that enables you to manage your server independently of the operating system, even in low-power situations. The Sun Fire X4150 and X4450 servers ship with ELOM installed.

The ELOM is composed of four components:
- Web-based interface (requires Java v5 or later)
- Command-line Interface (accessed via serial or ethernet using ssh)
- IPMI v2
- SNMP v3

You can access the ELOM using a web browser, secure shell (SSH), or via the Sun Fire server’s serial port. Your server’s default network setting is configured as DHCP for easy access via a web browser or SSH, and the ELOM output is directed by default to the serial port.

ELOM Common Tasks
- Redirect the system graphical console to a remote client web browser.
- Connect a remote diskette drive to the system as a virtual diskette drive. (Web based interface only)
- Connect a remote CD-ROM drive to the system as a virtual CD-ROM drive. (Web based interface only)
- Monitor system fans, temperatures, and voltages remotely.
- Monitor system BIOS messages remotely. (Not available on SNMP)
- Monitor system operating system messages remotely. (Not available on SNMP)
- Interrogate system components for their IDs and serial numbers. (Not available on Web based interface)
- Redirect the system serial console to a remote client. (IPMI and CLI only)
- Monitor system status (health check) remotely.
- Interrogate system network interface cards remotely for MAC addresses. (Not available on SNMP)
- Manage user accounts remotely. (Not available on SNMP)
- Manage system power status remotely (power on, power off, power reset). (Not available on SNMP)
- Monitor and manage environmental settings for key system components (CPUs, motherboards, fans).

The ELOM is shipped with one preconfigured administrator account:
User name: root
Password: changeme

Fundamental Commands

exit --> Log out of the CLI.

version --> Display the version of the ELOM firmware running on the SP.

help --> Display information about commands and targets.

help show --> Display information about a specific command.

create /SP/users/user1 --> Add a local user.

set /SP/users/username Password=password --> Set or change password.

set /SP/users/username Permission=operator|administrator|user|callback --> Set or change permission.

delete /SP/users/user1 --> Delete a local user named user1.

set /SP/users/user1 Permission=operator --> Change the permission level of a local user named user1.

show /SP/AgentInfo/PET/Destination1 --> Display information about PET alerts Destination1.

set /SP/AgentInfo/PET/Destination[n] IPAddress=ipaddress --> Change alert configuration.

set /SYS/CtrlInfo PowerCtrl=on --> Start the host system.

set /SYS/CtrlInfo PowerCtrl=off --> Stop the host system.

set /SYS/CtrlInfo PowerCtrl=gracefuloff --> Stop the host system gracefully.

set /SYS/CtrlInfo PowerCtrl=reset --> Reset the host system.

start /SP/AgentInfo/Console --> Start a session to connect to the host console.

stop /SP/AgentInfo/Console --> Stop the session connected to the host console.

Thursday, 16 June 2011

Sun Remote System Control (RSC)

The RSC is a server management tool that allows you to monitor and control your server over modem lines and over a network. RSC provides remote system administration for geographically distributed or physically inaccessible systems. The RSC software works with the System Service Processor (SSP) on the main logic board. RSC and the SSP support both serial and Ethernet connections to a remote console.

Once RSC software is installed and configured to manage your server, you can use it to run diagnostic tests, view diagnostic and error messages, reboot your server, and display environmental status information from a remote console.

The RSC firmware on the SSP runs independently, and uses standby power from the server. Therefore, SSP hardware and RSC software continue to be effective when the server operating system goes offline, and can send notification of hardware failures or other events that may be occurring on your server.

RSC has the following features:
- Remote system monitoring and error reporting, including output from power-on self-test (POST) and OpenBoot Diagnostics (OBDiag)
- Remote server reboot, power-on, and power-off on demand
- Ability to monitor the CPU temperature and fan sensors without being near the managed server, even when the server is offline
- Ability to run diagnostic tests from a remote console
- Remote event notification of server problems
- A detailed log of RSC events
- Remote console functions on both the serial and Ethernet ports

RSC sends an alert message whenever any of the following occurs:
- The server system resets.
- Server temperature crosses the lower-fault (high-temperature warning) limit.
- Server temperature crosses the upper-fault (high-temperature shutdown) limit.
- A server redundant power supply fails.
- A power outage occurs at the server site, if an uninterruptible power supply (UPS) is in use and it is configured to send an alert to RSC.
- RSC receives a server-generated alert.
- The server undergoes a hardware watchdog reset.
- RSC detects five unsuccessful RSC login attempts within five minutes.

Server Status and Control Commands
environment
shownetwork
console
break
xir
bootmode [-u] [normal|forth|reset_nvram|diag|skip_diag]
reset
poweroff
poweron

RSC View Log Commands
loghistory [index [+|-]n] [pause n]
consolehistory [boot|run|oboot|orun] [index [+|-]n] [pause n]
consolerestart

RSC Configuration Commands
set variable value
show [variable]
date [[mmdd]HHMM|mmddHHMM[cc]yy][.SS]
password
useradd username
userdel username
usershow [username]
userpassword username
userperm username [c][u][a][r]
resetrsc

Other RSC Commands
help
version [-v]
logout

ILOM Commands

The ILOM is a dedicated system of hardware and supporting software that allows you to manage your Sun server independently of the operating system. ILOM includes:

Service Processor (SP) --> The hardware
Command Line Interface (CLI) --> A dedicated software application that allows you to operate the ILOM
WebGUI --> An easy-to-use browser interface

Command Verbs
Command    Description
cd         Navigates the object namespace.
create     Sets up an object in the namespace.
delete     Removes an object from the namespace.
exit       Terminates a session to the CLI.
help       Displays Help information about commands and targets.
load       Transfers a file from source to target.
reset      Resets the state of the target.
set        Sets target properties to the specified value.
show       Displays information about targets and properties.
start      Starts the target.
stop       Stops the target.
version    Displays the version of ILOM firmware.

Command Options

Not all options are supported for all commands. See a specific command section for the options that are valid with that command. The help option can be used with any command.

Option (Long)  Short  Description
-default              Causes the verb to perform only its default functions.
-destination          Specifies the destination for data.
-display       -d     Shows the data the user wants to display.
-force         -f     Causes an immediate action instead of an orderly shutdown.
-help          -h     Displays Help information.
-level         -l     Executes the command for the current target and all targets contained through the level specified.
-output        -o     Specifies the content and form of command output.
-script               Skips warnings or prompts normally associated with the command.
-source               Indicates the location of a source image.

Command Syntax

To execute most commands, you need to specify the location of the target, then enter the command. You can execute commands individually, or you can combine them on the same command line.

1. To execute commands individually:
a. Navigate to the namespace using the cd command.
For example:
cd /SP/services/http

b. Enter the verb, target, and value.
For example:
set port=80

2. To combine commands, use the form verb path/target=value.
For example:
set /SP/services/http port=80

The following display shows both methods (the notes in parentheses are annotations, not part of the session):
-> cd /SP/services/http            (navigate to the namespace)
/SP/services/http
-> set port=80                     (enter the verb, target, and value)
Set 'port' to '80'
-> set /SP/services/http port=80   (combine the path and the set command)
Set 'port' to '80'
->

Wednesday, 15 June 2011

How to Increase AIX filesystem size

In case your file system reaches its limit and there is no other choice but to increase the space, here is a quick reference for doing so.
Let’s say this is my disk usage.

   /dev/reciofs01      2097152     19720 100%       11     1% /app/recio/fs01

First we have to identify which volume group it belongs to. Listing all my volume groups, I get this list.

# lsvg
rootvg
datavg
apprecvg

I have three volume groups. One of them is the group to which /app/recio/fs01 belongs.

Let’s search for the file system entry in lsvg:

# lsvg -l rootvg | grep -i /app/recio/fs01
# lsvg -l datavg | grep -i /app/recio/fs01
# lsvg -l apprecvg | grep -i /app/recio/fs01
apprecvg:
LV NAME             TYPE       LPs     PPs     PVs LV STATE      MOUNT POINT
reciofs01           jfs2       9       9       2    open/syncd    /app/recio/fs01

It belongs to the apprecvg volume group.

Now let’s check the free PPs (physical partitions) in lsvg.
Note: Here 1 PP is equivalent to 32 MB of disk space.

# lsvg -p apprecvg
apprecvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower1      active             100         0           00..00..00..00..00
hdiskpower2       active            100         0           00..00..00..00..00
hdiskpower3      active             600         321         90..00..80..70..81
hdiskpower4      active             600         255         50..50..50..50..55

From here we can see that it still has a lot of free PPs (321 on hdiskpower3 and 255 on hdiskpower4).
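
To turn those PP counts into megabytes, add up the FREE PPs column and multiply by the PP size: 321 + 255 = 576 PPs, and 576 x 32 MB = 18432 MB (18 GB). Here is a minimal sketch of that arithmetic. The function name free_mb is my own, and it assumes the lsvg -p column layout shown above, with the PP size passed in by the caller.

```shell
#!/bin/sh
# Sketch: sum the FREE PPs column of "lsvg -p" output and convert to MB.
# free_mb is a hypothetical helper; the PP size in MB is the first argument.
free_mb() {
    pp_mb=$1
    # Data rows have "active" in column 2; column 4 is FREE PPs.
    awk -v pp="$pp_mb" '$2 == "active" { free += $4 } END { print free * pp }'
}

# Example: lsvg -p apprecvg | free_mb 32
```

With that much room free, a request of +200MB fits comfortably in the volume group.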

Now here is how we increase file system size:

# chfs -a size=+200MB /app/recio/fs01
Filesystem size changed

Let’s verify how much it gained from the addition.

# df -g /app/recio/fs01
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/reciofs01    2.25       0.27   89%       11     1% /app/recio/fs01

Usage dropped from 100% to 89%, a gain of 11 percentage points.