unix sysadmin archives

Monday, 17 October 2011

Introduction to Veritas Cluster Services

In any organization, every server on the network has a specific purpose, and most of the time these servers provide a stable environment for the software applications that run the organization's business. These applications are usually so critical to the business that the organization cannot afford to have them down even for minutes. For example: a bank's internet banking application.

If an application is not business critical, the organization can run it standalone; in other words, whenever the application goes down it will not impact the actual business.
The application's clients typically connect to the application server using the server name, the server IP, or a dedicated application IP.

Now assume the organization has an application that is critical to its business, where any outage causes a huge loss. In that case, the organization has one option to reduce the impact of an application failure caused by an operating system or hardware fault: purchase a secondary server with the same hardware configuration, install the same kind of OS and database, and configure the same application on it in passive mode. The application is then "failed over" from the primary server to this secondary server whenever there is an issue with the primary server's hardware or operating system. We call this an application server in a highly available configuration.

Whenever an issue on the primary server makes the application unavailable to the client machines, the application should be moved to another available server on the network, either by manual or automatic intervention. Transferring the application from the primary server to the secondary server, and making the secondary server active for the application, is called a "failover" operation. The reverse operation (restoring the application on the primary server) is called "failback". We can therefore call this configuration an application HA (highly available) setup, in contrast to the earlier standalone setup.
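With cluster software such as VCS (discussed below), a failover or failback of this kind is reduced to a single administrative command. A hedged sketch, assuming a service group named appsg running on nodes node1 and node2 (all three names are hypothetical):

```
# Fail over the application's service group from node1 to node2:
hagrp -switch appsg -to node2

# Failback: switch the same group back to the primary node:
hagrp -switch appsg -to node1

# Check where the group is currently online:
hagrp -state appsg
```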

Now the question is: how does this manual failover work when there is an application issue due to hardware or the operating system?

Manual failover basically involves the steps below:

     1. The application IP should fail over to the secondary node.
     2. The same storage and data should be available on the secondary node.
     3. Finally, the application itself should fail over to the secondary node.
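On a Solaris system using Veritas Volume Manager, the three manual steps above might look like the sketch below. This is only an illustration; the interface ce0, the address, the disk group appdg, the volume appvol, the mount point /app, and the appctl start script are all hypothetical names:

```
# 1. Bring up the application IP on the secondary node:
ifconfig ce0 addif 192.168.1.10 netmask 255.255.255.0 up

# 2. Make the shared storage and data available:
vxdg import appdg                     # import the shared disk group
vxvol -g appdg startall               # start its volumes
mount -F vxfs /dev/vx/dsk/appdg/appvol /app

# 3. Start the application itself:
/app/bin/appctl start
```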

Challenges in Manual Failover Configuration

    1. Resources must be monitored continuously.
    2. It is time consuming.
    3. It is technically complex when the application has many dependent components.

On the other hand, we can use automatic failover software, which does this work without human intervention. It groups the application's primary and secondary servers together, continuously watches the primary server for failures, and fails the application over to the secondary server automatically whenever there is an issue with the primary.

Although two different servers support the application, both of them serve the same purpose, and from the application client's perspective they should be treated as a single application cluster server (composed of multiple physical servers in the background).

Now you know that a cluster is nothing but a "group of individual servers working together to serve the same purpose, and appearing as a single machine to the external world".

What cluster software is available in the market today? There are many products, depending on the operating system and the application to be supported. Some are native to the operating system, and others come from third-party vendors.

List of Cluster Software available in the market

    *Sun Cluster – native Solaris cluster
    *Linux cluster solutions (e.g. Red Hat Cluster Suite, Linux-HA) – native Linux clustering
    *Oracle RAC – application-level cluster for the Oracle database that works on different operating systems
    *Veritas Cluster Services – third-party cluster software that works on different operating systems such as Solaris, Linux, AIX, and HP-UX
    *HACMP – IBM's AIX-based cluster technology
    *HP Serviceguard – HP-UX's native cluster technology

Note: In this post we discuss VCS and its operations. This post does not cover a full VCS implementation; it focuses on the concepts behind how VCS makes an application highly available (HA).

Veritas Cluster Services Components
VCS has two types of components: 1. physical components and 2. logical components.

Physical Components:
1. Nodes
VCS nodes host the service groups (managed applications). Each system is connected to networking hardware, and usually also to storage hardware. The systems contain components to provide resilient management of the applications, and start and stop agents.
Nodes can be individual systems, or they can be created with domains or partitions on enterprise-class systems. Individual cluster nodes each run their own operating system and possess their own boot device. Each node must run the same operating system within a single VCS cluster.
Clusters can have from 1 to 32 nodes. Applications can be configured to run on specific nodes within the cluster.

2. Shared storage
Storage is a key resource of most applications services, and therefore most service groups. A managed application can only be started on a system that has access to its associated data files. Therefore, a service group can only run on all systems in the cluster if the storage is shared across all systems. In many configurations, a storage area network (SAN) provides this requirement.
You can use I/O fencing technology for data protection. I/O fencing blocks access to shared storage from any system that is not a current and verified member of the cluster.

3. Networking Components
Networking in the cluster is used for the following purposes:
    *Communication between the cluster nodes and the application clients and external systems.
    *Communication between the cluster nodes themselves, called the heartbeat network.

Logical Components
1. Resources
Resources are hardware or software entities that make up the application. Resources include disk groups and file systems, network interface cards (NIC), IP addresses, and applications.
    1.1. Resource dependencies
    Resource dependencies indicate resources that depend on each other because of application or operating system requirements. Resource dependencies are graphically depicted in a hierarchy, also called a tree, where the resources higher up (parent) depend on the resources lower down (child).
    1.2. Resource types
    VCS defines a resource type for each resource it manages. For example, the NIC resource type can be configured to manage network interface cards. Similarly, all IP addresses can be configured using the IP resource type.
    VCS includes a set of predefined resource types. For each resource type, VCS has a corresponding agent, which provides the logic to control resources.

2. Service groups
A service group is a virtual container that contains all the hardware and software resources that are required to run the managed application. Service groups allow VCS to control all the hardware and software resources of the managed application as a single unit. When a failover occurs, resources do not fail over individually; the entire service group fails over. If there is more than one service group on a system, a group may fail over without affecting the others.

A single node may host any number of service groups, each providing a discrete service to networked clients. If the server crashes, all service groups on that node must be failed over elsewhere.

Service groups can be dependent on each other. For example, a finance application may be dependent on a database application. Because the managed application consists of all components that are required to provide the service, service group dependencies create more complex managed applications. When you use service group dependencies, the managed application is the entire dependency tree.

2.1. Types of service groups

VCS service groups fall in three main categories: failover, parallel, and hybrid.

   * Failover service groups
    A failover service group runs on one system in the cluster at a time. Failover groups are used for most applications, which do not support simultaneous access to the application's data from multiple systems.

   * Parallel service groups
    A parallel service group runs simultaneously on more than one system in the cluster. A parallel service group is more complex than a failover group. Parallel service groups are appropriate for applications that manage multiple application instances running simultaneously without data corruption.

   * Hybrid service groups
    A hybrid service group is for replicated data clusters and is a combination of the failover and parallel service groups. It behaves as a failover group within a system zone and a parallel group across system zones.

3. VCS Agents
Agents are multi-threaded processes that provide the logic to manage resources. VCS has one agent per resource type. The agent monitors all resources of that type; for example, a single IP agent manages all IP resources.
When the agent is started, it obtains the necessary configuration information from VCS. It then periodically monitors the resources, and updates VCS with the resource status.

4.  Cluster Communications and VCS Daemons
Cluster communications ensure that VCS is continuously aware of the status of each system’s service groups and resources. They also enable VCS to recognize which systems are active members of the cluster, which have joined or left the cluster, and which have failed.

4.1. High availability daemon (HAD)
    The VCS high availability daemon (HAD) runs on each system. Also known as the VCS engine, HAD is responsible for:

       * building the running cluster configuration from the configuration files
       * distributing the information when new nodes join the cluster
       * responding to operator input
       * taking corrective action when something fails.

    The engine uses agents to monitor and manage resources. It collects information about resource states from the agents on the local system and forwards it to all cluster members. The local engine also receives information from the other cluster members to update its view of the cluster.

    The hashadow process monitors HAD and restarts it when required.
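The engine's consolidated view of systems and service groups can be inspected with the hastatus command. A sketch of typical output, assuming the hypothetical nodes node1/node2 and service group appsg:

```
# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen
A  node1                RUNNING              0
A  node2                RUNNING              0

-- GROUP STATE
-- Group           System       Probed     AutoDisabled    State
B  appsg           node1        Y          N               ONLINE
B  appsg           node2        Y          N               OFFLINE
```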

4.2.  HostMonitor daemon
    VCS also starts the HostMonitor daemon when the VCS engine comes up. The VCS engine creates a VCS resource VCShm of type HostMonitor and a VCShmg service group. The VCS engine does not add these objects to the main.cf file. Do not modify or delete these components of VCS. VCS uses the HostMonitor daemon to monitor the utilization of CPU and swap, and reports to the engine log if utilization crosses the threshold limits defined for these resources.

4.3.  Group Membership Services/Atomic Broadcast (GAB)
    The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.

    * Cluster Membership
    GAB maintains cluster membership by receiving input on the status of the heartbeat from each node via LLT. When a system no longer receives heartbeats from a peer, it marks the peer as DOWN and excludes it from the cluster. In VCS, memberships are sets of systems participating in the cluster.

    * Cluster Communications
    GAB's second function is reliable cluster communications. GAB provides guaranteed delivery of point-to-point and broadcast messages to all nodes. The VCS engine uses a private IOCTL (provided by GAB) to tell GAB that it is alive.
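GAB membership can be checked with the gabconfig command. In output similar to the sketch below, port a is GAB's own membership and port h is the VCS engine (HAD); the membership column lists the node IDs currently seen (here nodes 0 and 1), and the gen values are arbitrary examples:

```
# gabconfig -a

GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port h gen   fd570002 membership 01
```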

4.4. Low Latency Transport (LLT)
    VCS uses private network communications between cluster nodes for cluster maintenance. Symantec recommends two independent networks between all cluster nodes. These networks provide the required redundancy in the communication path and enable VCS to discriminate between a network failure and a system failure. LLT has two major functions.

    * Traffic Distribution
    LLT distributes (load balances) internode communication across all available private network links. This distribution means that all cluster communications are evenly distributed across all private network links (maximum eight) for performance and fault resilience. If a link fails, traffic is redirected to the remaining links.

    * Heartbeat
    LLT is responsible for sending and receiving heartbeat traffic over network links. The Group Membership Services function of GAB uses this heartbeat to determine cluster membership.
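LLT's view of the nodes and heartbeat links can be checked with the lltstat command; a brief hedged sketch:

```
lltstat -nvv        # verbose state of each node and every private link
lltstat -p          # ports (clients such as GAB) registered with LLT
```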

4.5. I/O fencing module
    The I/O fencing module implements quorum-type functionality to ensure that only one cluster survives a split of the private network. I/O fencing also provides the ability to place SCSI-3 persistent reservations on failover. The shared disk groups thus offer protection against data corruption by nodes that have been excluded from cluster membership.

5. VCS Configuration files.

    5.1. main.cf
    /etc/VRTSvcs/conf/config/main.cf is the key VCS configuration file. The "main.cf" file essentially answers the following questions for the VCS engine and agents:
      What nodes are available in the cluster?
      What service groups are configured on each node?
      What resources does each service group contain, of which types, and with which attributes?
      What dependencies does each resource have on other resources?
      What dependencies does each service group have on other service groups?
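A minimal main.cf might look like the sketch below. The cluster name, node names, and the appsg service group with its disk group, mount, NIC, and IP resources are all hypothetical; the last two lines express the parent/child resource dependencies described earlier:

```
include "types.cf"

cluster demo_clus (
        )

system node1 (
        )

system node2 (
        )

group appsg (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1 }
        )

        DiskGroup appdg (
                DiskGroup = appdg
                )

        IP appip (
                Device = ce0
                Address = "192.168.1.10"
                NetMask = "255.255.255.0"
                )

        Mount appmnt (
                MountPoint = "/app"
                BlockDevice = "/dev/vx/dsk/appdg/appvol"
                FSType = vxfs
                FsckOpt = "-y"
                )

        NIC appnic (
                Device = ce0
                )

        appip requires appnic
        appmnt requires appdg
```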

     5.2. types.cf

    The file types.cf, which is listed in the include statement in the main.cf file, defines the VCS bundled types for VCS resources. It is also located in the folder /etc/VRTSvcs/conf/config.
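For illustration, an abridged sketch of what a bundled type definition in types.cf looks like (the attribute list is shortened here; consult the actual file on your system for the full definition):

```
type IP (
        static str ArgList[] = { Device, Address, NetMask, Options }
        str Device
        str Address
        str NetMask
        str Options
)
```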

    5.3. Other Important files
        /etc/llthosts—lists all the nodes in the cluster
        /etc/llttab—describes the local system’s private network links to the other nodes in the cluster
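Hedged examples of these two files for a two-node cluster (the node names, cluster ID, and ce interfaces are hypothetical):

```
/etc/llthosts:
    0 node1
    1 node2

/etc/llttab:
    set-node node1
    set-cluster 101
    link ce1 /dev/ce:1 - ether - -
    link ce2 /dev/ce:2 - ether - -
```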
