Oracle® Database 2 Day + Real Application Clusters Guide
10g Release 2 (10.2)

Part Number B28759-06

5 Administering Oracle Clusterware Components

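This chapter describes how to administer your Oracle Clusterware environment. It describes how to administer the voting disks and the Oracle Cluster Registry (OCR) in the following sections:

  • About Oracle Clusterware

  • Backing Up and Recovering Voting Disks

  • Backing Up and Recovering the Oracle Cluster Registry

  • Changing the Oracle Cluster Registry Configuration

  • Troubleshooting the Oracle Cluster Registry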

About Oracle Clusterware

Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and Listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.

Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.

Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and Oracle RAC database configuration information.

The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select the option for normal redundant copies during the installation process, then Oracle Clusterware automatically maintains redundant copies of these files to prevent the files from becoming single points of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.

Backing Up and Recovering Voting Disks

High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to a backup component.

The voting disk records node membership information. A node must be able to access more than half of the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.

For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks, it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
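The majority rule described above reduces to integer arithmetic. The following is a minimal sketch for illustration only; the function name min_votedisks is hypothetical and is not part of Oracle Clusterware.

```shell
# Minimum number of voting disks a node must be able to access:
# strictly more than half of the configured voting disks.
# Usage: min_votedisks number_of_configured_voting_disks
min_votedisks() {
    echo $(( $1 / 2 + 1 ))
}
```

For example, min_votedisks 5 prints 3, which matches the five-disk case above, and min_votedisks 3 prints 2.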

Backing Up Voting Disks

Because the node membership information does not usually change, you do not need to back up the voting disk every day. However, back up the voting disks at the following times:

  • After installation

  • After adding nodes to or deleting nodes from the cluster

  • After performing voting disk add or delete operations

To make a backup copy of the voting disk, use the Linux dd command. Perform this operation on every voting disk as needed, where voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:

dd if=voting_disk_name of=backup_file_name

If your voting disk is stored on a raw device, use the device name in place of voting_disk_name. For example:

dd if=/dev/sdd1 of=/tmp/voting.dmp

When you use the dd command for making backups of the voting disk, the backup can be performed while the Cluster Ready Services (CRS) process is active; you do not need to stop the crsd.bin process before taking a backup of the voting disk.
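If you back up several voting disks, you might wrap the dd command in a small helper. The following is a sketch under stated assumptions: the function name backup_votedisk, the date-stamped file naming, and the .bak suffix are hypothetical choices, not Oracle conventions.

```shell
# Back up one voting disk with dd and print the name of the backup file.
# Usage: backup_votedisk voting_disk_name backup_dir
backup_votedisk() (
    src=$1
    dir=$2
    # Hypothetical naming: source base name, date stamp, .bak suffix.
    dest="$dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    # Same dd invocation as shown above; dd progress output is discarded.
    dd if="$src" of="$dest" 2>/dev/null || exit 1
    # Treat an empty backup file as a failure.
    [ -s "$dest" ] || exit 1
    echo "$dest"
)
```

You would call the helper once for each voting disk, for example backup_votedisk /dev/sdd1 /tmp.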

Recovering Voting Disks

If a voting disk is damaged and no longer usable by Oracle Clusterware, you can recover the voting disk if you have a backup file. Run the following command to recover a voting disk, where backup_file_name is the name of the voting disk backup file and voting_disk_name is the name of the active voting disk:

dd if=backup_file_name of=voting_disk_name
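Because this command overwrites the voting disk, you may want to check the backup file before running it. The following sketch adds such a check; the function name restore_votedisk is hypothetical.

```shell
# Restore a voting disk from a backup file, refusing to run if the
# backup file is missing or empty.
# Usage: restore_votedisk backup_file_name voting_disk_name
restore_votedisk() (
    backup=$1
    target=$2
    if [ ! -s "$backup" ]; then
        echo "backup file missing or empty: $backup" >&2
        exit 1
    fi
    # Same dd invocation as shown above; dd progress output is discarded.
    dd if="$backup" of="$target" 2>/dev/null
)
```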

Adding and Removing Voting Disks

To add or remove a voting disk, first shut down Oracle Clusterware on all nodes, and then run the following commands as the root user, where path is the fully qualified path of the voting disk that you are adding or removing. If the new voting disk is stored on a Network File System (NFS) server, then create an empty voting disk file with the correct owner and permissions before running the add command.

Caution:

If you use the -force option to add or remove a voting disk while the Oracle Clusterware stack is active, you can corrupt your cluster configuration.

To add a voting disk:

crsctl add css votedisk path

To remove a voting disk:

crsctl delete css votedisk path

Note:

If your cluster is down, then you can use the -force option to modify the voting disk configuration when using either of these commands without interacting with active Oracle Clusterware daemons.
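Because both commands expect a fully qualified path, a small check before you run them can catch a relative path early. The following sketch only prints the command for review and executes nothing; the function name votedisk_cmd is hypothetical.

```shell
# Print the crsctl command for adding or removing a voting disk after
# checking the action and the path. Nothing is executed.
# Usage: votedisk_cmd add|delete path
votedisk_cmd() (
    action=$1
    path=$2
    case $action in
        add|delete) ;;
        *) echo "action must be add or delete" >&2; exit 1 ;;
    esac
    case $path in
        /*) ;;   # fully qualified path
        *) echo "path must be fully qualified: $path" >&2; exit 1 ;;
    esac
    echo "crsctl $action css votedisk $path"
)
```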

Backing Up and Recovering the Oracle Cluster Registry

Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware always retains the latest 3 backup copies of the OCR that are 4 hours old, 1 day old, and 1 week old.

You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides. The default location for generating backups on Red Hat Linux systems is CRS_home/cdata/cluster_name where cluster_name is the name of your cluster and CRS_home is the home directory of your Oracle Clusterware installation.
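If you do not use dedicated backup software, a short script can copy the automatically generated files. The following is a sketch, not an Oracle-supplied tool: the function name copy_ocr_backups is hypothetical, and the script assumes that the backup files in the default location carry an .ocr suffix.

```shell
# Copy the automatically generated OCR backups to another location.
# Usage: copy_ocr_backups CRS_home cluster_name dest_dir
copy_ocr_backups() (
    src="$1/cdata/$2"
    dest=$3
    if [ ! -d "$src" ]; then
        echo "backup directory not found: $src" >&2
        exit 1
    fi
    mkdir -p "$dest" || exit 1
    # Assumption: backup files are named with an .ocr suffix.
    for f in "$src"/*.ocr; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dest"/ || exit 1
    done
)
```

Choose a dest_dir on a device other than the one that holds the primary OCR file. You could schedule the call once a day, for example with cron.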

This section contains the following topics:

  • Viewing Available OCR Backups

  • Backing Up the OCR

  • Recovering the OCR

Viewing Available OCR Backups

To find the most recent backup of the OCR, on any node in the cluster, use the following command:

ocrconfig -showbackup

Backing Up the OCR

Because of the importance of OCR information, Oracle recommends that you use the ocrconfig tool to make copies of the automatically created backup files at least once a day.

In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you restore the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.

To export the contents of the OCR to a file, use the following command, where backup_file_name is the name of the OCR backup file you want to create:

ocrconfig -export backup_file_name

Note:

You must be logged in as the root user to run the ocrconfig command.
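To keep the exports taken before and after a configuration change apart, you might include a label and a date stamp in the file name. The following sketch shows one way to do this; the function name export_ocr, the OCRCONFIG variable, and the file naming scheme are hypothetical.

```shell
# Command to run. Override OCRCONFIG to try the helper without a cluster.
OCRCONFIG=${OCRCONFIG:-ocrconfig}

# Export the OCR to a labeled, date-stamped file and print the file name.
# Run as the root user.
# Usage: export_ocr export_dir label
export_ocr() (
    file="$1/ocr_$2_$(date +%Y%m%d%H%M%S).exp"
    "$OCRCONFIG" -export "$file" || exit 1
    echo "$file"
)
```

For example, you could run export_ocr /backups before_addnode before adding a node and export_ocr /backups after_addnode once the change is complete.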

Recovering the OCR

This section describes two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.

In the event of a failure, before you attempt to restore the OCR, confirm that the OCR is actually unavailable. Run the following command to check the status of the OCR:

ocrcheck 

If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then both the primary OCR and the OCR mirror have failed. You must restore the OCR from a backup.
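The check described above can be scripted by counting the success messages in the ocrcheck output. The following is a minimal sketch; the function name ocr_copies_ok is hypothetical, and the message text is the one quoted above.

```shell
# Count the OCR copies that passed the integrity check.
# Reads ocrcheck output on standard input, prints the count, and
# returns success only if at least one copy passed.
ocr_copies_ok() (
    count=$(grep -c 'Device/File integrity check succeeded' || true)
    echo "$count"
    [ "$count" -gt 0 ]
)
```

You would pipe the command output into the helper, for example ocrcheck | ocr_copies_ok. A nonzero return status indicates that no copy passed the check and that you must restore the OCR from a backup.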

Restoring the Oracle Cluster Registry from Automatically Generated OCR Backups

When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery.

To restore the OCR from an automatically generated backup on a Red Hat Linux system:

  1. Identify the available OCR backups using the ocrconfig command:

    # ocrconfig -showbackup
    

    Note:

    You must be logged in as the root user to run the ocrconfig command.

  2. Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file:

    $ ocrdump -backupfile file_name
    
  3. As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:

    # crsctl stop crs
    

    Repeat this command on each node in your Oracle RAC cluster.

  4. As the root user, restore the OCR by applying an OCR backup file that you identified in step 1 using the following command, where file_name is the name of the OCR backup file that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid before running this command.

    # ocrconfig -restore file_name
    
  5. As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:

    # crsctl start crs
    

    Repeat this command on each node in your Oracle RAC cluster.

  6. Use the Cluster Verify Utility (CVU) to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:

    $ cluvfy comp ocr -n all [-verbose]
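Because steps 3 and 5 must be repeated on every node, it can help to print the full command sequence for review before you begin. The following sketch prints the commands from steps 3 through 6 and executes nothing. The function name ocr_restore_plan and the node labels are hypothetical, and the sketch assumes that the restore and verification commands are run once, from a single node.

```shell
# Print the OCR restore sequence for review. Nothing is executed.
# Usage: ocr_restore_plan backup_file node1 [node2 ...]
ocr_restore_plan() (
    backup=$1
    shift
    # Step 3: stop Oracle Clusterware on every node.
    for node in "$@"; do
        echo "[$node] crsctl stop crs"
    done
    # Step 4: restore the OCR (assumed to run once, on one node).
    echo "[any one node] ocrconfig -restore $backup"
    # Step 5: restart Oracle Clusterware on every node.
    for node in "$@"; do
        echo "[$node] crsctl start crs"
    done
    # Step 6: verify OCR integrity with the CVU.
    echo "[any one node] cluvfy comp ocr -n all"
)
```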
    

Recovering the OCR from an OCR Export File

Using the ocrconfig -export command enables you to restore the OCR using the -import option if your configuration changes cause errors.

To restore the previous configuration stored in the OCR from an OCR export file:

  1. Place the OCR export file that you created previously with the ocrconfig -export command in an accessible directory on disk.

  2. As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:

    crsctl stop crs
    

    Repeat this command on each node in your Oracle RAC cluster.

  3. As the root user, restore the OCR data by importing the contents of the OCR export file using the following command, where file_name is the name of the OCR export file:

    ocrconfig -import file_name
    
  4. As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:

    crsctl start crs
    

    Repeat this command on each node in your Oracle RAC cluster.

  5. Use the CVU to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose]
    

Note:

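You cannot use the ocrconfig -import command to restore an automatically generated OCR backup file. Use the ocrconfig -restore command for those files, as described in the section "Restoring the Oracle Cluster Registry from Automatically Generated OCR Backups".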

Changing the Oracle Cluster Registry Configuration

This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances are running on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.

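This section contains the following topics:

  • Adding an OCR Location

  • Replacing an OCR

  • Repairing an Oracle Cluster Registry Configuration on a Local Node

  • Removing an Oracle Cluster Registry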

Note:

The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. So, you should avoid shutting down nodes while modifying the OCR using the ocrconfig command.

Adding an OCR Location

You can add an OCR location after an upgrade or after completing the Oracle RAC installation. If you already mirror the OCR, then you do not need to add an OCR location; Oracle Clusterware automatically manages two OCRs when you configure normal redundancy for the OCR. Oracle RAC environments do not support more than two OCRs, a primary OCR and a secondary OCR.

Run the following command to add an OCR location using either destination_file or disk to designate the target location of the additional OCR:

ocrconfig -replace ocr destination_file
ocrconfig -replace ocr disk

Run the following command to add an OCR mirror location using either destination_file or disk to designate the target location of the additional OCR:

ocrconfig -replace ocrmirror destination_file 
ocrconfig -replace ocrmirror disk

Note:

You must be logged in as the root user to run the ocrconfig command.

Replacing an OCR

If you need to change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, you can use the following procedure as long as one OCR file remains online.

To change the location of an OCR:

  1. Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online using the following command:

    ocrcheck 
    

    Note:

    The OCR that you are replacing can be either online or offline.

  2. Verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation using the following command:

    crsctl check crs
    
  3. Run the following command to replace the OCR using either destination_file or disk to indicate the target OCR:

    ocrconfig -replace ocr destination_file
    ocrconfig -replace ocr disk
    
  4. Run the following command to replace an OCR mirror location using either destination_file or disk to indicate the target OCR:

    ocrconfig -replace ocrmirror destination_file
    ocrconfig -replace ocrmirror disk
    
  5. If any node that is part of your current Oracle RAC environment is shut down, then run the following command on the stopped node to let that node rejoin the cluster after the node is restarted:

    ocrconfig -repair
    

Repairing an Oracle Cluster Registry Configuration on a Local Node

You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was shut down while you were adding, replacing, or removing an OCR. To repair an OCR configuration, run the following command on the node on which you have stopped the Oracle Clusterware daemon:

ocrconfig -repair ocrmirror device_name

Note:

You cannot perform this operation on a node on which the Oracle Clusterware daemon is running.

This operation changes the OCR configuration only on the node from which you run this command. For example, if the OCR mirror is on a disk named /dev/raw1, then use the command ocrconfig -repair ocrmirror /dev/raw1 on this node to repair its OCR configuration.

Removing an Oracle Cluster Registry

To remove an OCR location, at least one OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).

To remove an OCR location from your Oracle RAC environment:

  1. Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online.

    ocrcheck
    

    Note:

    Do not perform this OCR removal procedure unless there is at least one active OCR online.

  2. Run the following command on any node in the cluster to remove one copy of the OCR:

    ocrconfig -replace ocr
    

    This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.

Troubleshooting the Oracle Cluster Registry

This section includes the following topics on how to troubleshoot the Oracle Cluster Registry (OCR):

  • Using the OCRCHECK Utility

  • Resolving Common Oracle Cluster Registry Problems

Using the OCRCHECK Utility

The OCRCHECK utility displays the data block format version used by the OCR, the free space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file as well as a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output:

Status of Oracle Cluster Registry is as follows :
   Version                  :          2
   Total space (kbytes)     :     262144
   Used space (kbytes)      :      16256
   Available space (kbytes) :     245888
   ID                       : 1918913332
   Device/File Name         : /dev/raw/raw1
                              Device/File integrity check succeeded
   Device/File Name         : /oradata/mirror.ocr
                              Device/File integrity check succeeded

Cluster registry integrity check succeeded
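If you monitor OCR space, you can derive a usage percentage from the total and used figures in this output. The following is a sketch that assumes the field labels appear exactly as in the sample above; the function name ocr_used_pct is hypothetical.

```shell
# Print the percentage of OCR space in use, rounded down.
# Reads ocrcheck output on standard input.
ocr_used_pct() {
    awk -F: '
        /Total space/ { total = $2 + 0 }
        /Used space/  { used  = $2 + 0 }
        END {
            if (total > 0) printf "%d\n", used * 100 / total
            else exit 1
        }'
}
```

For the sample output above, ocrcheck | ocr_used_pct prints 6, because 16256 of 262144 kilobytes are in use.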

The OCRCHECK utility creates a log file in the following directory, where CRS_home is the location of the installed Oracle Clusterware software, and hostname is the name of the local node:

CRS_home/log/hostname/client

The log files have names of the form ocrcheck_nnnnn.log, where nnnnn is the process ID of the operating session that issued the ocrcheck command.
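Because each run writes a separate file, you usually want the most recent one. The following sketch lists the newest log for a node. The function name latest_ocrcheck_log is hypothetical, and the sketch assumes that the log file names begin with ocrcheck_ (the name of the command) and contain no white space.

```shell
# Print the most recently modified OCRCHECK log for a node, if any.
# Usage: latest_ocrcheck_log CRS_home hostname
latest_ocrcheck_log() (
    dir="$1/log/$2/client"
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/ocrcheck_*.log 2>/dev/null | head -n 1
)
```

The helper prints nothing if the directory does not exist or contains no matching log files.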

Resolving Common Oracle Cluster Registry Problems

Table 5-1 describes common OCR problems and their corresponding solutions.

Table 5-1 Common OCR Problems and Solutions

Problem: The OCR is not mirrored.
Solution: Run the ocrconfig command with the -replace option as described in the section "Adding an OCR Location".

Problem: An OCR mirror has failed and you must replace it. Error messages are being reported in Enterprise Manager or the OCR log file.
Solution: Run the ocrconfig command with the -replace option as described in the section "Replacing an OCR".

Problem: An OCR has been incorrectly updated.
Solution: Run the ocrconfig command with the -repair option as described in the section "Repairing an Oracle Cluster Registry Configuration on a Local Node".

Problem: You are experiencing a severe performance effect from OCR processing, or you want to remove an OCR for other reasons.
Solution: Run the ocrconfig command with the -replace option as described in the section "Removing an Oracle Cluster Registry".