AIX (Advanced Interactive eXecutive) is a series of proprietary Unix operating systems developed and sold by IBM.
POWER7 (Performance Optimization With Enhanced RISC, version 7) processors give the AIX operating system a distinct performance advantage.
POWER7 introduces new capabilities that use multiple cores and multiple hardware threads per core, creating a pool of virtual CPUs.
AIX 7 includes a new built-in clustering capability called Cluster Aware AIX.
AIX POWER7 systems include the Active Memory Expansion feature.

Saturday, September 3, 2011

Introduction to HACMP ...!!

The IBM tool for building UNIX-based mission-critical computing platforms is the HACMP software. The HACMP software ensures that critical resources, such as applications, are available for processing. HACMP has two major components: high availability (HA) and cluster multi-processing (CMP).


High Availability Cluster Multi-Processing for AIX: The primary reason to create HACMP clusters is to provide a highly available environment for mission-critical applications. For example, an HACMP cluster could run a database server program that services client applications. The clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk. In an HACMP cluster, these applications are put under HACMP control to ensure their availability. HACMP takes measures to ensure that the applications remain available to client processes even if a component in the cluster fails. To ensure availability in case of a component failure, HACMP moves the application (along with the resources that ensure access to the application) to another node in the cluster.

High Availability vs. Fault Tolerance


Fault tolerance relies on specialized hardware to detect a hardware fault and instantaneously switch to a redundant hardware component—whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem. Although this cutover is apparently seamless and offers non-stop service, a high premium is paid in both hardware cost and performance because the redundant components do no processing. More importantly, the fault tolerant model does not address software failures, by far the most common reason for downtime.

High availability views availability not as a series of replicated physical components, but rather as a set of system-wide, shared resources that cooperate to guarantee essential services. High availability combines software with industry-standard hardware to minimize downtime by quickly restoring essential services when a system, component, or application fails. While not  instantaneous, services are restored rapidly, often in less than a minute.

The difference between fault tolerance and high availability, then, is this: A fault tolerant environment has no service interruption but a significantly higher cost, while a highly available environment has a minimal service interruption.


Goal of HACMP: Eliminating Scheduled Downtime 

The primary goal of high availability clustering software is to minimize, or ideally, eliminate, the need to take your resources out of service during maintenance and reconfiguration activities.

HACMP software optimizes availability by allowing for the dynamic reconfiguration of running clusters. Most routine cluster maintenance tasks, such as adding or removing a node or changing the priority of nodes participating in a resource group, can be applied to an active cluster without stopping and restarting cluster services.

In addition, you can keep an HACMP cluster online while making configuration changes by using the Cluster Single Point of Control (C-SPOC) facility.

C-SPOC makes cluster management easier, as it allows you to make changes to shared volume groups, users, and groups across the cluster from a single node. The changes are propagated transparently to other cluster nodes.

AIX level and related requirements


Before you install HACMP, you must check the software level requirements.

The following AIX base operating system (BOS) components are prerequisites for HACMP:

- bos.adt.lib
- bos.adt.libm
- bos.adt.syscalls
- bos.net.tcp.client
- bos.net.tcp.server
- bos.rte.SRC
- bos.rte.libc
- bos.rte.libcfg
- bos.rte.libcur
- bos.rte.libpthreads
- bos.rte.odm
- bos.data
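A quick way to confirm these filesets are in place is to query each one with lslpp. A minimal sketch, assuming an AIX system with lslpp available (the helper name check_filesets is mine, not an HACMP tool):

```shell
# Sketch: report any missing prerequisite filesets (helper name is illustrative).
check_filesets() {
    missing=0
    for fileset in "$@"; do
        # lslpp -L exits non-zero when the fileset is not installed
        if ! lslpp -L "$fileset" >/dev/null 2>&1; then
            echo "MISSING: $fileset"
            missing=1
        fi
    done
    return $missing
}

# Usage on an AIX node:
# check_filesets bos.adt.lib bos.adt.libm bos.adt.syscalls bos.net.tcp.client \
#                bos.net.tcp.server bos.rte.SRC bos.rte.libc bos.rte.libcfg \
#                bos.rte.libcur bos.rte.libpthreads bos.rte.odm bos.data
```

Any fileset that turns up missing can be installed from the AIX BOS media before proceeding.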


Disk space requirements:


HACMP requires the following free space in the rootvg volume group for installation:
- /usr requires 82 MB of free space for a full installation of HACMP.
- / (root) requires 710 KB of free space.
- /var and /tmp each require 100 MB of free space.
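These checks can be scripted before running the installer. A minimal sketch (the free_mb helper is my own; the -P flag forces df's portable column layout, since plain df output differs between AIX and other systems):

```shell
# Sketch: report free space in MB for a filesystem (helper name is illustrative).
free_mb() {
    # With -P, column 4 of the second line is the available space in 1K blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Usage, matching the requirements above:
# [ "$(free_mb /usr)" -ge 82 ]  || echo "/usr needs 82 MB free"
# [ "$(free_mb /)" -ge 1 ]      || echo "/ needs 710 KB free"
# [ "$(free_mb /var)" -ge 100 ] || echo "/var needs 100 MB free"
# [ "$(free_mb /tmp)" -ge 100 ] || echo "/tmp needs 100 MB free"
```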

Checking for prerequisites


HACMP 5.1 requires one of the following operating system levels:
- AIX 5L V5.1 ML5 or higher
- AIX 5L V5.2 ML2 or higher
- AIX 5L V5.3 ML2 or higher

HACMP 5.2 requires one of the following operating system levels:
- AIX 5L V5.1 ML5 or higher
- AIX 5L V5.2 ML2 or higher
- AIX 5L V5.3 ML2 or higher

HACMP 5.3 requires the following operating system level:
- AIX 5L V5.3 ML2 or higher

If you are installing directly from the installation media, such as a CD-ROM,
or from a local repository, enter the smitty install_all fast path.
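The same installation can also be scripted non-interactively with installp, which is what the SMIT menu drives underneath. A hedged sketch (the device path and the install_hacmp wrapper name are illustrative):

```shell
# Sketch: unattended HACMP install from a device or directory.
#   -a apply the filesets
#   -g automatically install requisite filesets
#   -X expand filesystems if more space is needed
#   -d the installation source (device or directory)
install_hacmp() {
    installp -agXd "$1" cluster.es cluster.es.cspoc cluster.license
}

# Usage:
# install_hacmp /dev/cd0          # from CD-ROM
# install_hacmp /mnt/hacmp_lpps   # from a local repository directory
```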


Minimum filesets required for an HACMP 5.2 installation:

- cluster.es
- cluster.es.cspoc
- cluster.license

AIX configuration

You should be aware that HACMP makes some changes to the system when it is installed and/or started:

Installation changes

Files modified:

- /etc/inittab
- /etc/rc.net
- /etc/services
- /etc/snmpd.conf
- /etc/snmpd.peers
- /etc/syslog.conf
- /etc/trcfmt
- /var/spool/cron/crontabs/root
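Since installation rewrites these files, it is prudent to keep pre-install copies so you can diff or roll back later. A minimal sketch (the backup_files helper and the .pre_hacmp suffix are my own conventions):

```shell
# Sketch: snapshot the files that HACMP installation will modify.
backup_files() {
    for f in "$@"; do
        # -p preserves ownership, permissions, and timestamps
        [ -f "$f" ] && cp -p "$f" "$f.pre_hacmp"
    done
    return 0
}

# Usage (as root, before installing HACMP):
# backup_files /etc/inittab /etc/rc.net /etc/services /etc/snmpd.conf \
#              /etc/snmpd.peers /etc/syslog.conf /etc/trcfmt \
#              /var/spool/cron/crontabs/root
```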

HACMP configuration data


There are two main components to the cluster configuration:

- Cluster topology: describes the underlying framework - the nodes, the
  networks, and the storage. HACMP uses this framework to keep the other
  main component - the resources - highly available.

- Cluster resources: those components that HACMP can move from node to
  node, for example service IP labels, file systems, and applications.

When the cluster is configured, the cluster topology and resource information is entered on one node. A verification process is then run, and the data is synchronized out to the other nodes defined in the cluster. HACMP keeps this data in its own Object Data Manager (ODM) classes on each node in the cluster.


The following basic steps are recommended for configuring a cluster:

- Define the cluster and the nodes
- Discover the additional information (networks, disks)
- Define the topology
- Verify and synchronize the topology then start the cluster services
- Define the resources and resource groups
- Verify and synchronize the resources
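After the final verify-and-synchronize step, the result can be inspected from any node. A hedged sketch, assuming the HACMP 5.x cltopinfo and clRGinfo utilities under /usr/es/sbin/cluster/utilities (utility names and paths can vary by release; the show_cluster_state wrapper is mine):

```shell
# Sketch: confirm what was synchronized and where resource groups are online.
CLDIR=${CLDIR:-/usr/es/sbin/cluster/utilities}

show_cluster_state() {
    "$CLDIR/cltopinfo"   # configured topology: cluster, nodes, networks
    "$CLDIR/clRGinfo"    # each resource group and its current state/node
}

# Usage (on any cluster node after synchronization):
# show_cluster_state
```

Because the ODM data is synchronized, both commands should report the same configuration regardless of which node you run them on.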

Software components

The following layered model describes the software components of an HACMP cluster:

- Application layer: any application that is made highly available through
  the services provided by HACMP
- HACMP layer: software that responds to changes within the cluster to
  ensure that the controlled applications remain highly available
- RSCT layer: the daemons that monitor node membership, communication
  interface and device health, and advise HACMP accordingly
- AIX layer: provides support for HACMP through the LVM, which manages
  the storage, and the TCP/IP layer, which provides communication
- LVM layer: provides access to storage and status information back to
  HACMP
- TCP/IP layer: provides reliable communication, both node to node
  and node to client

Resource planning

HACMP provides a highly available environment by identifying a set of
cluster-wide resources essential to uninterrupted processing, and then defining relationships among nodes that ensure these resources are available to client processes.

When a cluster node fails or detaches from the cluster for a scheduled outage, the Cluster Manager redistributes its resources among any number of the surviving nodes.

HACMP considers the following as resource types:
  - Volume groups
  - Disks
  - File systems
  - Network file systems
  - Communication adapters and links
  - Service IP labels/addresses
  - Application servers (applications)
  - Tape resources
  - Fast Connect resources
  - WLM integration

Custom Resource Group

What matters most to HACMP implementers and administrators is the resource group's behavior at startup, fallover, and fallback. HACMP 5.1 supports both "custom" and "classic" resource groups; starting with 5.2, only "custom" resource groups are available.

The custom RG behavior options are:

Startup options
These options control the behavior of the RG on initial startup:
1) Online on home node only
2) Online on first available node
3) Online on all available nodes
4) Online using distribution policy

Fallover options
These options control the behavior of the RG should HACMP have to move it to another node in response to an event:
1) Fallover to next priority node in the list
2) Fallover using dynamic node priority
3) Bring offline (on error only)

Fallback options
These options control the behavior of an online RG when a node joins the cluster:
1) Fallback to higher priority node in the list
2) Never fallback
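As an illustration, a common combination for a simple two-node hot-standby cluster is sketched below (a hypothetical example; the exact wording of the policy values as shown in SMIT may differ slightly by release):

```text
Illustrative custom resource group policy combination:
  Startup policy:  Online on home node only
  Fallover policy: Fallover to next priority node in the list
  Fallback policy: Never fallback
```

With "Never fallback", the RG stays on the takeover node after a fallover until an administrator moves it back, which avoids a second service interruption when the failed node rejoins the cluster.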





