Oracle® Database 2 Day + Real Application Clusters Guide 10g Release 2 (10.2) Part Number B28759-06
This chapter describes how to administer your Oracle Clusterware environment, including the voting disks and the Oracle Cluster Registry (OCR).
Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and Listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.
Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.
Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and Oracle RAC database configuration information.
The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select the option for normal redundant copies during the installation process, then Oracle Clusterware automatically maintains redundant copies of these files to prevent the files from becoming single points of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.
High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to a backup component.
The voting disk records node membership information. A node must be able to access more than half of the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.
For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks, it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
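The majority rule described above can be written as a small calculation. The following is a minimal sketch, not an Oracle utility; the function name min_accessible_voting_disks is hypothetical and is used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Clusterware): given the number of
# configured voting disks, print the smallest number a node must be able to
# access, that is, strictly more than half of them.
min_accessible_voting_disks() {
    total=$1
    echo $(( total / 2 + 1 ))
}

min_accessible_voting_disks 5   # prints 3
min_accessible_voting_disks 3   # prints 2
```

Because the requirement is a strict majority, an even number of voting disks tolerates no more failures than the next lower odd number.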
Because the node membership information does not usually change, you do not need to back up the voting disk every day. However, back up the voting disks at the following times:
After installation
After adding nodes to or deleting nodes from the cluster
After performing voting disk add or delete operations
To make a backup copy of the voting disk, use the Linux dd command. Perform this operation on every voting disk as needed, where voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:
dd if=voting_disk_name of=backup_file_name
If your voting disk is stored on a raw device, use the device name in place of voting_disk_name. For example:
dd if=/dev/sdd1 of=/tmp/voting.dmp
When you use the dd command to back up the voting disk, you can perform the backup while the Cluster Ready Services (CRS) process is active; you do not need to stop the crsd.bin process first.
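Because the dd backup must be repeated for every voting disk, it can help to wrap the command in a small function. The following is a minimal sketch under stated assumptions: the function name backup_voting_disk, the dated file naming scheme, and the device names and target directory in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy one voting disk to a dated backup file with dd.
# $1 = voting disk path (file or raw device), $2 = backup directory.
# On success, prints the name of the backup file that was written.
backup_voting_disk() {
    src=$1
    dest_dir=$2
    # Derive the backup file name from the source name and today's date.
    dest="$dest_dir/voting_$(basename "$src")_$(date +%Y%m%d).dmp"
    # dd writes its transfer statistics to stderr; discard them here.
    dd if="$src" of="$dest" 2>/dev/null && echo "$dest"
}

# Example usage (device names and target directory are assumptions):
# for disk in /dev/sdd1 /dev/sde1 /dev/sdf1; do
#     backup_voting_disk "$disk" /backup/voting || echo "backup failed: $disk" >&2
# done
```

Keeping one backup file per voting disk and per day makes it easier to match a backup to the disk it came from when you later need to recover it.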
If a voting disk is damaged and no longer usable by Oracle Clusterware, you can recover the voting disk if you have a backup file. Run the following command to recover a voting disk, where backup_file_name is the name of the voting disk backup file and voting_disk_name is the name of the active voting disk:
dd if=backup_file_name of=voting_disk_name
To add or remove a voting disk, first shut down Oracle Clusterware on all nodes, then use the following commands as the root user, where path is the fully qualified path for the additional voting disk. If the new voting disk is stored on a network file system (NFS), then create an empty voting disk file location with the correct owner and permissions before running this command.
Caution:
If you use the -force option to add or remove a voting disk while the Oracle Clusterware stack is active, you can corrupt your cluster configuration.
To add a voting disk:
crsctl add css votedisk path
To remove a voting disk:
crsctl delete css votedisk path
Note:
If your cluster is down, then you can use the -force option to modify the voting disk configuration when using either of these commands without interacting with active Oracle Clusterware daemons.
Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware always retains the latest 3 backup copies of the OCR that are 4 hours old, 1 day old, and 1 week old.
You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides. The default location for generating backups on Red Hat Linux systems is CRS_home/cdata/cluster_name, where cluster_name is the name of your cluster and CRS_home is the home directory of your Oracle Clusterware installation.
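The daily copy of the automatically generated backup files can be done with ordinary file tools. The following is a minimal sketch, assuming the backup directory contains only regular files; the function name copy_ocr_backups and both paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the automatically generated OCR backup files
# from the backup directory ($1) to a directory on a different device ($2).
copy_ocr_backups() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # -p preserves timestamps so the age of each backup stays visible.
    cp -p "$src_dir"/* "$dest_dir"/
}

# Example usage (both paths are assumptions; substitute your own CRS_home,
# cluster name, and target directory):
# copy_ocr_backups /u01/app/crs/cdata/mycluster /mnt/ocr_backups
```

A function like this could be scheduled once a day, for example from cron, so that the copies do not depend on someone remembering to run it.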
To find the most recent backup of the OCR, on any node in the cluster, use the following command:
ocrconfig -showbackup
Because of the importance of OCR information, Oracle recommends that you use the ocrconfig tool to make copies of the automatically created backup files at least once a day.
In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you restore the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.
To export the contents of the OCR to a file, use the following command, where backup_file_name is the name of the OCR backup file you want to create:
ocrconfig -export backup_file_name
Note:
You must be logged in as the root user to run the ocrconfig command.
This section describes two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.
In the event of a failure, before you attempt to restore the OCR, confirm that the OCR is actually unavailable. Run the following command to check the status of the OCR:
ocrcheck
If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then both the primary OCR and the OCR mirror have failed. You must restore the OCR from a backup.
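The check described above can be scripted by searching the ocrcheck output for the success message. The following is a minimal sketch; the function name ocr_copy_is_intact is hypothetical, and the sketch assumes the message text appears exactly as quoted in this section.

```shell
#!/bin/sh
# Hypothetical helper: read ocrcheck output on standard input and succeed
# (exit status 0) only if at least one OCR copy passed its integrity check.
ocr_copy_is_intact() {
    grep -q 'Device/File integrity check succeeded'
}

# Example usage on a cluster node:
# if ocrcheck | ocr_copy_is_intact; then
#     echo "At least one OCR copy is intact; a restore is not required."
# else
#     echo "No intact OCR copy found; restore the OCR from a backup."
# fi
```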
When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery.
To restore the OCR from an automatically generated backup on a Red Hat Linux system:
Identify the available OCR backups using the ocrconfig command:
# ocrconfig -showbackup
Note:
You must be logged in as the root user to run the ocrconfig command.
Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file:
$ ocrdump -backupfile file_name
As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:
# crsctl stop crs
Repeat this command on each node in your Oracle RAC cluster.
As the root user, restore the OCR by applying an OCR backup file that you identified in step 1 using the following command, where file_name is the name of the OCR that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid before running this command.
# ocrconfig -restore file_name
As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:
# crsctl start crs
Repeat this command on each node in your Oracle RAC cluster.
Use the Cluster Verify Utility (CVU) to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:
$ cluvfy comp ocr -n all [-verbose]
Using the ocrconfig -export command enables you to restore the OCR using the -import option if your configuration changes cause errors.
To restore the previous configuration stored in the OCR from an OCR export file:
Place the OCR export file that you created previously with the ocrconfig -export command in an accessible directory on disk.
As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:
crsctl stop crs
Repeat this command on each node in your Oracle RAC cluster.
As the root user, restore the OCR data by importing the contents of the OCR export file using the following command, where file_name is the name of the OCR export file:
ocrconfig -import file_name
As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:
crsctl start crs
Repeat this command on each node in your Oracle RAC cluster.
Use the CVU to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
Note:
You cannot use the ocrconfig command to import an OCR backup file.
This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances are running on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.
Note:
The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. So, you should avoid shutting down nodes while modifying the OCR using the ocrconfig command.
You can add an OCR location after an upgrade or after completing the Oracle RAC installation. If you already mirror the OCR, then you do not need to add an OCR location; Oracle Clusterware automatically manages two OCRs when you configure normal redundancy for the OCR. Oracle RAC environments do not support more than two OCRs, a primary OCR and a secondary OCR.
Run the following command to add an OCR location using either destination_file or disk to designate the target location of the additional OCR:
ocrconfig -replace ocr destination_file
ocrconfig -replace ocr disk
Run the following command to add an OCR mirror location using either destination_file or disk to designate the target location of the additional OCR:
ocrconfig -replace ocrmirror destination_file
ocrconfig -replace ocrmirror disk
Note:
You must be logged in as the root user to run the ocrconfig command.
If you need to change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, you can use the following procedure as long as one OCR file remains online.
To change the location of an OCR:
Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online using the following command:
ocrcheck
Note:
The OCR that you are replacing can be either online or offline.
Verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation using the following command:
crsctl check crs
Run the following command to replace the OCR using either destination_file or disk to indicate the target OCR:
ocrconfig -replace ocr destination_file
ocrconfig -replace ocr disk
Run the following command to replace an OCR mirror location using either destination_file or disk to indicate the target OCR:
ocrconfig -replace ocrmirror destination_file
ocrconfig -replace ocrmirror disk
If any node that is part of your current Oracle RAC environment is shut down, then run the following command on the stopped node to let that node rejoin the cluster after the node is restarted:
ocrconfig -repair
You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was shut down while you were adding, replacing, or removing an OCR. To repair an OCR configuration, run the following command on the node on which you have stopped the Oracle Clusterware daemon:
ocrconfig -repair ocrmirror device_name
Note:
You cannot perform this operation on a node on which the Oracle Clusterware daemon is running.
This operation changes the OCR configuration only on the node from which you run this command. For example, if the OCR mirror is on a disk named /dev/raw1, then use the command ocrconfig -repair ocrmirror /dev/raw1 on this node to repair its OCR configuration.
To remove an OCR location, at least one OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).
To remove an OCR location from your Oracle RAC environment:
Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online.
ocrcheck
Note:
Do not perform this OCR removal procedure unless there is at least one active OCR online.
Run the following command on any node in the cluster to remove one copy of the OCR:
ocrconfig -replace ocr
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
This section describes how to troubleshoot the Oracle Cluster Registry (OCR).
The OCRCHECK utility displays the data block format version used by the OCR, the free space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file as well as a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output:
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 16256
Available space (kbytes) : 245888
ID : 1918913332
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /oradata/mirror.ocr
Device/File integrity check succeeded
Cluster registry integrity check succeeded
The OCRCHECK utility creates a log file in the following directory, where CRS_home is the location of the installed Oracle Clusterware software, and hostname is the name of the local node:
CRS_home/log/hostname/client
The log files have names of the form ocrcheck_nnnnn.log, where nnnnn is the process ID of the operating session that issued the ocrcheck command.
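The space figures in the sample OCRCHECK output shown earlier in this section can be turned into a usage percentage. The following is a minimal sketch, assuming the field labels appear exactly as in that sample; the function name ocr_percent_used is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read ocrcheck output on standard input and print the
# used OCR space as a whole-number percentage of the total space.
# Returns a nonzero status if no total space figure is found.
ocr_percent_used() {
    awk -F':' '
        /Total space \(kbytes\)/ { total = $2 + 0 }   # "+ 0" drops padding
        /Used space \(kbytes\)/  { used  = $2 + 0 }
        END {
            if (total > 0) printf "%d\n", used * 100 / total
            else exit 1
        }
    '
}

# Example usage on a cluster node:
# ocrcheck | ocr_percent_used
```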
Table 5-1 describes common OCR problems and their corresponding solutions.
Table 5-1 Common OCR Problems and Solutions
Problem | Solution |
---|---|
The OCR is not mirrored. | Run the |
An OCR mirror has failed and you must replace it. Error messages are being reported in Enterprise Manager or the OCR log file. | Run the |
An OCR has been incorrectly updated. | Run the |
You are experiencing a severe performance effect from OCR processing, or you want to remove an OCR for other reasons. | Run the |