A Troubleshooting Enterprise Manager

This appendix describes solutions to common problems and scenarios that you might encounter when installing or upgrading Enterprise Manager.

Installation Issues

This section lists some of the most commonly encountered installation issues, and their resolutions.

Installation Fails with an Abnormal Termination

If a daily cron job that cleans up the /tmp/ directory runs on the system where you are installing Grid Control, the installation might terminate abnormally, and the installActions.err file logs the following error: java.lang.UnsatisfiedLinkError: no nio in java.library.path.

The workaround is to set the TMP and TEMP environment variables to a directory other than the default /tmp, and then execute ./runInstaller.
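
The workaround above might be scripted as follows. This is a minimal sketch, assuming a POSIX shell; the helper name use_private_tmp and the example directory are hypothetical and not part of the installer.

```sh
# Point TMP and TEMP at a private directory that the /tmp cleanup job
# does not touch. use_private_tmp is a hypothetical helper name.
use_private_tmp() {
    dir=$1
    mkdir -p "$dir" || return 1
    TMP=$dir
    TEMP=$dir
    export TMP TEMP
}

# Example (the directory is an assumption; any location outside /tmp works):
#   use_private_tmp "$HOME/em_install_tmp" && ./runInstaller
```

Exporting the variables in the same shell that launches ./runInstaller matters, because a child process only inherits variables that are set before it starts.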

PERL Environment Variable is Forced on the environment During an Enterprise Manager 10g R2 (10.2.0.2) Installation

In a Microsoft Windows environment, if you have an existing PERL5LIB environment variable, the Enterprise Manager Grid Control installation forcibly overwrites this variable, which in turn forces other applications on this host to use the new Perl version that gets installed during the Management Service installation.

To work around this issue, rename the existing Perl variable to PERL5LIB_TMP before the Management Service installation starts. After the installation is complete, you can rename PERL5LIB_TMP back to PERL5LIB.

Note:

If the Perl environment variable is not set, remove this variable from the Environment Variables. To do this, from the Control Panel, go to Environment Variables under System.

Management Agent Installation Fails

If the Management Agent installation fails, look into the emctl status log to diagnose the reason for installation failure. You can view this log by executing the following command:

<AGENT_HOME>/bin/emctl status agent

A sample log file follows and shows some of the typical problem areas shown in bold.

Oracle Enterprise Manager 10g Release 10.2.0.0.0.
Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 10.2.0.2.0
OMS Version       : 10.2.0.2.0
Protocol Version  : 10.2.0.2.0
Agent Home        : /scratch/OracleHomes2/agent10g
Agent binaries    : /scratch/OracleHomes2/agent10g
Agent Process ID  : 9985
Parent Process ID : 29893
Agent URL         : https://foo.abc.com:1831/emd/main/
Repository URL    : https://foo.abc.com:1159/em/upload
Started at        : 2005-09-25 21:31:00
Started by user   : pjohn
Last Reload       : 2005-09-25 21:31:00
Last successful upload                       : (none)
Last attempted upload                        : (none)
Total Megabytes of XML files uploaded so far :     0.00
Number of XML files pending upload           :     2434
Size of XML files pending upload(MB)         :    21.31
Available disk space on upload filesystem    :    17.78%
Last attempted heartbeat to OMS              : 2005-09-26 02:40:40
Last successful heartbeat to OMS             : unknown
---------------------------------------------------------------
Agent is Running and Ready

Prerequisite Check Fails with Directories Not Empty Error During Retry

During an agent installation using Agent Deploy, the installation fails abruptly, displaying the Failure page. On clicking Retry, the installation fails again at the Prerequisite Check phase with an error stating the directories are not empty.

This could be because Oracle Universal Installer (OUI) is still running on the remote host even though the SSH connection has been closed.

To resolve this issue, on the remote host, check if OUI is still running. Execute the following command to verify this:

ps -aef | grep -i ora

If OUI is still running, wait until the OUI processes are complete, and then restart the SSH daemon. Now, you can click Retry to perform the installation.

Note:

For more information on running the prerequisite checks in standalone mode, see Chapter 1, "Running the Prerequisite Check in Standalone Mode".

Agent Deployment on Linux Oracle RAC 10.2 Cluster Fails

Agent deployment on a 10.2 release of an Oracle RAC cluster may fail due to a lost SSH connection during the installation process.

This can happen if the LoginGraceTime value in the sshd_config file is 0 (zero). The zero value gives an indefinite time for SSH authentication.

To resolve this issue, modify the LoginGraceTime value in the /etc/ssh/sshd_config file to be a higher value. The default value is 120 seconds. This means that the server will disconnect after this time if you have not successfully logged in. If the value is set to 0 (zero), there is no definite time limit for authentication.
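
Before editing the file, it can help to see which LoginGraceTime value is currently in effect. The following is a minimal sketch, assuming a POSIX shell with awk; the function name login_grace_time is hypothetical, and the sketch ignores Match blocks and Include directives that a newer sshd may support.

```sh
# Print the LoginGraceTime set in the given sshd_config file.
# Commented-out lines are ignored. sshd uses the first value it finds for
# a keyword, so the first uncommented occurrence wins. If the keyword is
# not set, the default of 120 (seconds) is printed.
login_grace_time() {
    awk '
        tolower($1) == "logingracetime" { value = $2; found = 1; exit }
        END { if (found) print value; else print 120 }
    ' "$1"
}

# Example:
#   login_grace_time /etc/ssh/sshd_config
```

After changing the value, the SSH daemon has to be restarted for the new setting to take effect.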

SSH User Equivalence Verification Fails During Agent Installation

The most common reasons for SSH User Equivalence Verification to fail are the following:

The server settings in the /etc/ssh/sshd_config file do not allow ssh for user $USER.
The server may have disabled the public key-based authentication.
The client public key on the server may be outdated.
You may not have passed the -shared option for shared remote users, or may have passed this option for non-shared remote users.

Verify the server setting and rerun the script to set up SSH User Equivalence successfully.

Note:

For more information on how to set up SSH, see Appendix C, "Set Up SSH (Secure Shell) User Equivalence".

Sample sshd_config File

The following sshd_config file sample is a server-wide configuration file with all the variables.

#    $OpenBSD: sshd_config,v 1.59 2002/09/25 11:17:16 markus Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/local/bin:/bin:/usr/bin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default values where
# possible, but leave them commented out.  Uncommented options change a
# default value.

#Port 22
#Protocol 2,1
#ListenAddress 0.0.0.0
#ListenAddress ::

# HostKey for protocol version 1
#HostKey /etc/ssh/ssh_host_key
# HostKeys for protocol version 2
#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key

# Lifetime and size of ephemeral version 1 server key
#KeyRegenerationInterval 3600
#ServerKeyBits 768

# Logging
#obsoletes QuietMode
#SyslogFacility AUTH
SyslogFacility AUTHPRIV
#LogLevel INFO

# Authentication:
#LoginGraceTime 120
#PermitRootLogin yes
#StrictModes yes

#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile       .ssh/authorized_keys

# rhosts authentication should not be used
#RhostsAuthentication no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#RhostsRSAAuthentication no
# similar for protocol version 2
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# RhostsRSAAuthentication and HostbasedAuthentication
#IgnoreUserKnownHosts no

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#AFSTokenPassing no

# Kerberos TGT Passing only works with the AFS kaserver
#KerberosTgtPassing no

# Set this to 'yes' to enable PAM keyboard-interactive authentication
# Warning: enabling this may bypass the setting of 'PasswordAuthentication'
#PAMAuthenticationViaKbdInt no

#X11Forwarding no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PrintMotd yes
#PrintLastLog yes
#KeepAlive yes
#UseLogin no
#UsePrivilegeSeparation yes
#PermitUserEnvironment no
#Compression yes

#MaxStartups 10
# no default banner path
#Banner /some/path
#VerifyReverseMapping no
#ShowPatchLevel no

# override default of no subsystems
Subsystem       sftp    /usr/libexec/openssh/sftp-server

SSH Setup Fails with "Invalid Port Number" Error

When executed, the SSH User Equivalence script automatically verifies the setup at the end by executing the following command:

ssh -l <user> <remotemachine> 'date'

At the time of verification, you may encounter an "Invalid Port Error" indicating that the SSH setup was not successful.

This can happen if the ssh.exe (sshUserSetupNT.sh script) is not being invoked from the cygwin home directory.

To resolve this issue, ensure the sshUserSetupNT.sh script on the local OMS machine is being executed from within the cygwin (BASH) shell only. The script will fail to execute if done from outside this location.

If there are multiple Cygwin installations, and you want to find out which ssh.exe is being invoked, execute the following command:

C:\Cygwin\bin\which ssh

For example, suppose the command returns a result similar to the following:

\cygdrive\c\WINDOWS\ssh

This indicates that the ssh.exe file from Cygwin is not being invoked, because C:\windows appears before C:\Cygwin\bin in the PATH environment variable.

To resolve this issue, rename this ssh.exe as follows:

C:\cygwin>move c:\WINDOWS\ssh.exe c:\WINDOWS\ssh.exe1
        1 file(s) moved.

Now, execute the C:\Cygwin\bin\which ssh command again.

The result should be similar to "\usr\bin\ssh".

This verifies that ssh.exe file is being invoked from the correct location (that is, from your C:\Cygwin\bin folder).
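
To see every ssh executable that the PATH search could pick up, and in which order, a small helper such as the following can be run from the Cygwin (BASH) shell. It is a minimal sketch; the function name list_on_path is hypothetical, and the optional second argument exists only so that a search path other than PATH can be inspected.

```sh
# List every executable file named "$1" found on a colon-separated search
# path (default: PATH), in search order. The first line printed is the
# copy that the shell would actually run.
# The body runs in a subshell so that the IFS change does not leak out.
list_on_path() (
    name=$1
    IFS=:
    for dir in ${2:-$PATH}; do
        candidate=${dir:-.}/$name
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)

# Example:
#   list_on_path ssh
```

If the first line of the output is not the copy under the Cygwin bin directory, another ssh executable is shadowing it.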

Note:

You must ensure C:\cygwin is the default installation directory for the Cygwin binaries.

If you install Cygwin at a location other than c:\cygwin (default location), it can cause the SSH setup to fail, and in turn, the agent installation will fail too.

To work around this issue, you must either install Cygwin in the default directory (c:\cygwin), or update the ssPaths_msplats.properties file with the correct path to the Cygwin binaries.

You can look into the following remote registry key to find out the correct Cygwin path:

HKEY_LOCAL_MACHINE\SOFTWARE\Cygnus Solutions\Cygwin\mounts v2\

Note:

For more information on how to set up SSH, see Appendix C, "Set Up SSH (Secure Shell) User Equivalence".

sshConnectivity.sh Script Fails

If you are executing the sshConnectivity.sh script on Cygwin version 5.2, the script may fail and result in the following error:

"JAVA.LANG.NOCLASSDEFFOUNDERROR"

To work around this issue, ensure the Oracle home in the Cygwin style path is defined as follows:

ORACLE_HOME="c:/oraclehomes/oms10g/oracle"

You can find out the currently installed Cygwin version by executing the uname command on the Cygwin window.

Note:

For more information on using the sshConnectivity.sh script, see Appendix C, "Setting Up SSH User Equivalence Using sshConnectivity.sh".

Troubleshooting the "command cygrunsrv not found" Error

During the SSH daemon setup, you may encounter a "command cygrunsrv not found" error. This can occur due to one of the following two reasons:

The sshd service is not running.
The Cygwin installation was not successful.

If SSHD Service Is Not Running

Create the sshd service, and then start a new sshd service from the cygwin directory.

To create the SSHD service, you must execute the following command:

ssh-host-config

The Cygwin script that runs when this command is executed will prompt you to answer several questions. Specify yes for the following questions:

privilege separation
install sshd as a service

Specify no when the script prompts you to answer whether or not to "create local user sshd".

When the script prompts you to specify a value for CYGWIN, type ntsec (CYGWIN="binmode tty ntsec").

Now that the SSHD service is created, you can start the service by executing the following command:

cygrunsrv --start sshd

If Your Cygwin Installation Was Unsuccessful

If restarting the SSHD service does not resolve the error, then you must reinstall Cygwin. To do this:

Remove the Keys and Subkeys under Cygnus Solutions using regedit.
Remove the Cygwin directory (C:\cygwin), and all Cygwin icons.
Remove the .ssh directory from the Documents and Settings folder of the domain user.
Reinstall Cygwin.

For detailed instructions on Cygwin installation, see Appendix C, "Setting Up SSH Server (SSHD) on Microsoft Windows".
Execute the following command to start SSH daemon:
```
cygrunsrv --start sshd
```

SSH Setup Verification Fails with "Read from socket failed: Connection reset by peer." Error

After the SSH setup is complete, the script automatically executes the following verification command:

ssh -l <user> <remotemachine> 'date'

If this command returns an error stating "Read from socket failed: Connection reset by peer", this means SSH was incorrectly set up. To resolve this issue, go to the remote machine where you attempted to set up user equivalence and do the following:

Stop the SSHD service (cygrunsrv --stop sshd).
Go to the etc directory (cd /etc).
Change the SSH file owner to the appropriate system (chown <SYSTEM> ssh*).

Go to the Cygwin command prompt and execute the following:

chmod 644 /etc/ssh*
chmod 755 /var/empty
chmod 644 /var/log/sshd.log

Finally, start the SSHD service by running /usr/sbin/sshd or by executing cygrunsrv --start sshd.
Now, execute the verification command again from the Management Service (OMS) machine (ssh -l <user> <remote machine> 'date'). This should display the date correctly, suggesting the SSH setup was successful.

SSHD Service Fails to Start

During SSHD configuration, the SSHD service is created for the local account by default. When you log in as a domain user, this account is not recognized by the service, and the service does not start up.

To resolve this issue, you must change the SSHD service "Log On As" value from LocalSystem to the domain user. To do this, complete the following steps:

Right-click on My Computer and select Manage.
In the Computer Management dialog box that appears, click Services under Services and Applications.
In the right pane, select the Cygwin SSHD service, right-click and go to Properties.
In the Cygwin SSHD Properties window that appears, select This Account.
Now, specify the appropriate domain name and user (in the form of domain\user, for example, FOO-US\pjohn).
Specify the password for this user, and click Apply.

Now, go to the Cygwin command prompt and execute the following:

chmod 644 /etc/ssh*
chmod 755 /var/empty
chmod 644 /var/log/sshd.log

Start SSHD by executing the following command:
```
/usr/sbin/sshd
```

Timezone Prerequisite Check Fails

The timezone prerequisite check (timezone_check) will fail if the TZ environment variable is not set on the SSH daemon of the remote host.

To resolve this issue, you must set the TZ environment variable on the SSH daemon of the remote host. See Appendix C, "Setting Up the Timezone Variable on Remote Hosts" for more information.

Alternatively, do the following:

If you are installing or upgrading the agent from the default software location, set the timezone environment variable by specifying the following in the Additional Parameters section of the Agent Deploy application:
```
-z <timezone>
For example, -z PST8PDT
```
If you are installing the agent from a nondefault software location, you must specify the timezone environment variable using the following command:
```
s_timeZone=<timezone>
For example, s_timezone=PST8PDT
```

OMS Version Is Not Displayed

If the OMS version is not displayed in the log file, it could mean that the installed agent is not registered with a secure and locked Management Service (OMS).

You can verify this by executing the following commands:

emctl status oms
emctl status agent

To resolve this issue, you must manually secure the Management Agent by executing the following command:

<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>

Discrepancy Between Agent and Repository URL Protocols

If the agent installation is successful, the protocols for the agent URL and the repository URL are the same. That is, both URLs start with the https protocol (meaning both are secure).

If the protocol for the agent URL is displayed as http instead of https, this means that the agent is not secure.

To resolve this issue, you must secure the agent manually by executing the following command:

<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>

Last Successful Upload Does Not Have a Time Stamp

If there is no time stamp against this parameter in the log (displays Null), it means that the agent is unable to upload any data.

To resolve this issue, you must perform a manual upload of the data by executing the following command, and then check the log again:

<AGENT_HOME>/bin/emctl upload
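
The three checks described in the preceding sections (OMS version present, agent URL using https, and a time stamp for the last successful upload) can be run against the status output in one pass. The following is a minimal sketch, assuming a POSIX shell with awk and assuming the field labels and the "(none)" placeholder appear exactly as in the sample log; the function name check_agent_status is hypothetical.

```sh
# Read "emctl status agent" output on stdin and print one line for each
# of the three conditions described above that is detected.
# Fields are split on the "<label> : <value>" separator used in the log.
check_agent_status() {
    awk -F' *: ' '
        $1 == "OMS Version"            { oms = $2 }
        $1 == "Agent URL"              { url = $2 }
        $1 == "Last successful upload" { upload = $2 }
        END {
            if (oms == "")
                print "OMS version missing"
            if (url !~ /^https:/)
                print "agent URL is not https"
            if (upload == "" || upload == "(none)")
                print "no successful upload"
        }
    '
}

# Example:
#   "$AGENT_HOME"/bin/emctl status agent | check_agent_status
```

An empty result means none of the three conditions was found. For the sample log shown earlier in this appendix, only the upload condition would be reported.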

emctl status Log File is Empty

If the agent is not ready and running, the emctl status log displays only the copyright information. None of the parameters listed in the sample log is displayed.

The issue can occur due to any of the following reasons:

Agent is not secure: To manually secure the agent, execute the following command:
```
<AGENT_HOME>/bin/emctl secure agent -reg_passwd <password>
```
Agent is not running: Check if the agent is running. If not, you can start the agent manually by executing the following command:
```
<AGENT_HOME>/bin/emctl start agent
```
Agent port is not correct: Verify whether the agent is connecting to the correct port. To verify the port, look into the sysman/config/emd.properties file.

You must also ensure the following are correctly set in the emd.properties file:
1. REPOSITORY_URL: Verify this URL (http://<hostname>:port/em/upload). Here, ensure the host name and port are correct.
2. emdWalletSrcURL: Verify if the host name and port are correct in this URL (http://<hostname>:port/em/wallets/emd).
3. agentTZRegion: Ensure the time zone that is configured is correct.
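
To review the three settings listed above without opening the file in an editor, a helper such as the following can be used. It is a minimal sketch, assuming the file stores each setting as a name=value line with no spaces around the equal sign; the function name show_emd_settings is hypothetical.

```sh
# Print the REPOSITORY_URL, emdWalletSrcURL, and agentTZRegion lines
# from the given emd.properties file.
show_emd_settings() {
    grep -E '^(REPOSITORY_URL|emdWalletSrcURL|agentTZRegion)=' "$1"
}

# Example:
#   show_emd_settings "$AGENT_HOME/sysman/config/emd.properties"
```

A setting that is missing from the output is either absent from the file or written in a form that the simple pattern does not match.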

Configuration Issues

This section lists some of the most commonly encountered configuration issues, and their resolutions.

Configuration Assistants Fail During Enterprise Manager Installation

During the installation, if any of the configuration assistants fails to run successfully, you can choose to run the configuration assistants in standalone mode.

Note:

The individual log files for each configuration tool are available at the following directory:

ORACLE_HOME/cfgtoollogs/cfgfw

Besides the individual configuration logs, this directory also contains cfmLogger_timestamp.log (The timestamp depends on the local time and has a format such as cfmLogger_2005_08_19_01-27-05-AM.log.). This log file contains all the configuration tool logs.

For more information about the installation logs that are created and their locations, see Appendix A, "Troubleshooting Enterprise Manager".

Also see Chapter 3, "Executing the runConfig Tool from the Command Line" to understand how to use the runConfig tool. This tool is used for running configuration assistants as explained below.

Invoking the One-Off Patches Configuration Assistant in Standalone Mode

During the installation process, this configuration assistant is executed before the Management Service Configuration Assistant is run.

This configuration assistant applies the one-off patches that are required for a successful Enterprise Manager 10g Release 2 installation.

To run this configuration assistant in standalone mode, you must execute the following command from the Management Service Oracle home:

<OMS_HOME>/perl/bin/perl <OMS_HOME>/install/oneoffs/applyOneoffs.pl

Invoking the Database Configuration Assistant in Standalone Mode

To run the Database Configuration Assistant, you must invoke the runConfig.sh script as:

<DB_Home>/oui/bin/runConfig.sh ORACLE_HOME=<DB_HOME> ACTION=Configure MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the previously mentioned command.

Invoking the OMS Configuration Assistant in Standalone Mode

To run the OMSConfig Assistant, you must invoke the runConfig.sh as the following:

<OMS_Home>/oui/bin/runConfig.sh ORACLE_HOME=<OMS_HOME> ACTION=Configure MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the previously mentioned command.

Invoking the Agent Configuration Assistant in Standalone Mode

To run the AgentConfig Assistant, you must invoke the runConfig.sh as the following:

<Agent_Home>/oui/bin/runConfig.sh ORACLE_HOME=<AGENT_HOME> ACTION=Configure  MODE=Perform

On Microsoft Windows, replace runConfig.sh with runConfig.bat in the above-mentioned command.

Note:

While the preceding command can be used to execute the agentca script, Oracle recommends you execute the following command to invoke the configuration assistant:

Agent_Home/bin/agentca -f

If you want to run the agentca script on an Oracle RAC, you must execute the following command on each of the cluster nodes:

Agent_Home/bin/agentca -f -c "node1,node2,node3,...."

See Chapter 7, "Agent Reconfiguration and Rediscovery" for more information.

Invoking the OC4J Configuration Assistant in Standalone Mode

If you want to deploy only the Rules Manager, execute the following commands:

/scratch/OracleHomes/oms10g/jdk/bin/java -Xmx512M -DemLocOverride=/scratch/OracleHomes/oms10g -classpath \
/scratch/OracleHomes/oms10g/dcm/lib/dcm.jar:/scratch/OracleHomes/oms10g/jlib/emConfigInstall.jar:/scratch/OracleHomes/oms10g/lib/classes12.zip:/scratch/OracleHomes/oms10g/lib/dms.jar:/scratch/OracleHomes/oms10g/j2ee/home/oc4j.jar:/scratch/OracleHomes/oms10g/lib/xschema.jar:/scratch/OracleHomes/oms10g/lib/xmlparserv2.jar:/scratch/OracleHomes/oms10g/opmn/lib/ons.jar:/scratch/OracleHomes/oms10g/dcm/lib/oc4j_deploy_tools.jar \
oracle.j2ee.tools.deploy.Oc4jDeploy -oraclehome /scratch/OracleHomes/oms10g -verbose -inifile /scratch/OracleHomes/oms10g/j2ee/deploy.master -redeploy


Enterprise Manager Deployment Fails

Enterprise Manager deployment may fail due to the Rules Manager deployment failure.

To resolve this issue, redeploy Enterprise Manager by following these steps:

Move OH/j2ee/deploy.master to OH/j2ee/deploy.master.bak.
Execute the OH/bin/EMDeploy script.
Restore OH/j2ee/deploy.master. That is, execute mv OH/j2ee/deploy.master.bak OH/j2ee/deploy.master.
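
The three steps above can be combined into one helper. This is a minimal sketch, assuming a POSIX shell; the function name redeploy_em is hypothetical, and restoring deploy.master even when EMDeploy fails is a design choice of the sketch, not something the steps above prescribe.

```sh
# Redeploy Enterprise Manager with deploy.master moved out of the way.
# $1 is the Oracle home (OH in the steps above).
redeploy_em() {
    oh=$1
    mv "$oh/j2ee/deploy.master" "$oh/j2ee/deploy.master.bak" || return 1
    "$oh/bin/EMDeploy"
    status=$?
    # Restore the file whether or not EMDeploy succeeded.
    mv "$oh/j2ee/deploy.master.bak" "$oh/j2ee/deploy.master" || return 1
    return "$status"
}

# Example:
#   redeploy_em "$ORACLE_HOME"
```

The function returns the exit status of EMDeploy, so a calling script can tell whether the redeployment itself succeeded.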

Oracle Management Service Configuration Fails

Oracle Management Service configuration may fail due to the following reasons.

Oracle Management Service Fails While Deploying Enterprise Manager AgentPush Application

The cfgfw logs display the following error:

Redeploying application 'EMAgentPush' to OC4J instance 'OC4J_EMPROV'. FAILED! ERROR: Caught exception while deploying 'EMAgentPush' to 'OC4J_EMPROV':java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

This error is due to IPv6 entries in the /etc/hosts file. When prompted to execute root.sh or when configuration fails, do the following:

In <OMS Home>/sysman/install/EMDeployTool.pm, include "-Djava.net.preferIPv4Stack=true" in the command executed in deployEmEar().
In <OMS Home>/opmn/conf/opmn.xml, include "-Djava.net.preferIPv4Stack=true" in java-options of all OC4J processes.

In 'Enterprise Manager with new Database' Install, Oracle Management Service Configuration Fails While Unlocking Passwords

The cfgfw logs display the following error:

Failed to initialize JDBC Connection

This occurs when the listener does not start during NetCA execution. In this case, the installActions log contains the following error:

Listener start failed. Listener may already be running.

To rectify this error, add the following line in <DB Home>/network/admin/listener.ora:

SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF

Then, restart the listener.
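
Adding the parameter can be made repeatable so that a retry does not append the line twice. The following is a minimal sketch, assuming a POSIX shell; the function name disable_node_down_event is hypothetical, and the listener name LISTENER in the example is an assumption that must match the name actually used in listener.ora.

```sh
# Append SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF to listener.ora
# unless a line that sets this parameter is already present.
# $1 = path to listener.ora, $2 = listener name.
disable_node_down_event() {
    file=$1
    param="SUBSCRIBE_FOR_NODE_DOWN_EVENT_$2"
    if grep -q "^[[:space:]]*${param}[[:space:]]*=" "$file"; then
        return 0
    fi
    printf '%s=OFF\n' "$param" >> "$file"
}

# Example (LISTENER is an assumed listener name):
#   disable_node_down_event "$ORACLE_HOME/network/admin/listener.ora" LISTENER
```

The sketch leaves an existing entry untouched, even one with a value other than OFF, so such an entry still has to be corrected by hand.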

Dropping of Repository Hangs If SYSMAN Sessions are Active

While installing Enterprise Manager using an existing database, Oracle Management Service configuration hangs while dropping the repository. This is due to active SYSMAN sessions connected to the database.

To resolve this issue, shut down any existing Enterprise Manager sessions (both Grid Control and Database Control) and any other SQL*Plus SYSMAN sessions.

If Oracle Management Service Configuration is Retried, oracle.sysman.emSDK.svlt.ConsoleServerHost and oracle.sysman.emSDK.svlt.ConsoleServerName in emoms.properties are Swapped and There is an Extra Underscore in ConsoleServerHost

This problem occurs only with a 10.2.0.1.0 Additional Oracle Management Server installation.

To resolve this issue, swap the values and remove the extra underscore in ConsoleServerName in the emoms.properties file, which is located in the <OMS_ORACLE_HOME>/sysman/config directory.

Enterprise Manager Upgrade and Recovery Issues

The Enterprise Manager 10g Release 2 upgrade is an out-of-place upgrade, meaning that Enterprise Manager 10g Release 2 Oracle homes are separate from the old homes. If you decide to abort the upgrade process during the copying phase (copying of the binaries), you can simply revert to your old 10g Release 1 installation.

The upgrade process creates a new OMS home and a new database home. The Upgrade assistants upgrade the datafiles and SYSMAN schema, and then configure the new Oracle homes.

Caution:

Do not abort the upgrade process during the configuration phase, as this will corrupt the installation. You will not be able to revert to the old 10g Release 1 installation either.

Agent Upgrade Issues

This section lists some of the issues that you may encounter during an agent upgrade.

Agent Does Not Start Up After Upgrade

During an agent upgrade from 10.1.0.n to 10.2.0.2, the agent may fail to start up after upgrade if the time zone that is configured for the upgraded agent is different from the originally configured agent.

You can correct this issue by changing the time zone. To do this, execute the following command from the upgraded agent home:

emctl resetTZ agent

This command corrects the agent-side time zone and specifies an additional command that must be run against the repository to correct the value there.

Caution:

Before you change the time zone, check if there are any blackouts that are currently running or scheduled to run on any of the targets that are monitored by the upgraded agent. Do the following to check this:

In the Grid Control console, go to the All Targets page under Targets and locate the Agent in the list of targets. Click the agent name link. The Agent home page appears.
The list of targets monitored by the agent will be listed in the Monitored Targets section.
For each target in the list, click the target name to view the target home page.
Here, in the Related Links section, click Blackouts to check any blackouts that are currently running or may be scheduled to run in the future.
If such blackouts exist, you must stop all the blackouts that are running on all the targets monitored by this agent.
From the console, stop all the blackouts that are scheduled to run on any of these monitored targets.
Now, run the following command from the agent home to reset the time zone:
```
emctl resetTZ agent
```
After the time zone is reset, you can create new blackouts on the targets.

Missing Directories When Upgrading Agent from 10.1.0.5 to 10.2.0.1

In a Windows NT RAC agent upgrade scenario, after the AgentOnly shiphome installer has completed installation, the utility <Upgrade AOH>/oui/bin/upgrade has to be executed on every single node in the RAC to complete the agent upgrade.

Enterprise Manager Recovery

This section provides the instructions to be followed to perform an Enterprise Manager recovery.

Steps to Follow for Agent Recovery

Use the following instructions to perform an agent recovery:

After exiting the installer, you must open a new window and change the directory to the <New_AgentHome>/bin.
Execute the script ./upgrade_recover.
You can then start the old agent and continue using it. If you want to remove the installed binaries of the new agent home, use the Remove Products function of the installer.

Steps to Follow for OMS Recovery

If the schema has been upgraded or the upgrade was incomplete, you must manually restore the database to the backup that was taken prior to executing the OMS upgrade.

You can determine the status of the repository upgrade by looking into the log file at <New_OMSHome>/sysman/log/emrepmgr.log.<proc_id>. The last line of the log file provides the status of the upgrade. If the upgrade was completed without errors, it reads Repository Upgrade Successful. If not, the message Repository Upgrade has errors… is displayed.
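
The status check described above can be scripted. This is a minimal sketch, assuming a POSIX shell and assuming the status message is literally the last line of the log file, with no trailing blank line; the function name repo_upgrade_ok is hypothetical.

```sh
# Return success (0) only if the last line of the given
# emrepmgr.log.<proc_id> file reports a successful repository upgrade.
repo_upgrade_ok() {
    tail -n 1 "$1" | grep -q 'Repository Upgrade Successful'
}

# Example (substitute the actual log file name, including its process ID):
#   if repo_upgrade_ok "$log_file"; then
#       echo "repository upgrade completed without errors"
#   else
#       echo "repository upgrade did not complete successfully"
#   fi
```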

Follow these instructions to perform an OMS recovery:

Note:

Before you attempt to restore the database, you must exit the Upgrade wizard. You must also ensure there are no OMS processes that are running. See Chapter 11, "Shut Down Enterprise Manager Before Upgrade" for more information on shutting down the Enterprise Manager processes.

Caution:

Ensure all OMS processes are completely shut down. If not, the system may become unstable after the upgrade.

Restore the database to the backup. See Oracle Database Administrator's Guide for more information.
After the database is restored, start the database and listener to ensure successful restoration.
Open a new window and change the directory to the <New_OMSHome>/bin.
Now, execute ./upgrade_recover.

Start the old OMS and continue to use it. If you want to remove the binaries of the newly installed OMS home, use the Remove Products function in the installer.

Steps to Re-create the Repository

If the Management Service configuration plugin fails due to the repository creation failure, rerunning the configuration tool from Oracle Universal Installer drops the repository and re-creates it. However, if you want to manually drop the repository, complete the following steps:

Dropping the Repository

Stop the OPMN processes (<OMS_HOME>/opmn/bin/opmnctl stopall), Management Service (<OMS_HOME>/bin/emctl stop oms), and Agent (<AGENT_HOME>/bin/emctl stop agent) before dropping the repository.
Set ORACLE_HOME to OMS_OracleHome.
Execute OMS_Home/sysman/admin/emdrep/RepManager <hostname> <port> <SID> -action drop -output_file <log_file>.

Creating the Repository

Set ORACLE_HOME to OMS_OracleHome.
Execute OMS_Home/sysman/admin/emdrep/RepManager <hostname> <port> <SID> -action create -output_file <log_file>.
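
Because the drop and create commands differ only in the -action value, it can be convenient to build the command line from one place and review it before running it. The following is a minimal sketch, assuming a POSIX shell; the function name repmanager_cmd is hypothetical, and it only prints the command (with ORACLE_HOME set for that invocation) rather than executing it. The RepManager path and arguments are taken from the steps above.

```sh
# Print (do not run) the RepManager command for the given action.
# $1 = OMS Oracle home, $2 = hostname, $3 = port, $4 = SID,
# $5 = drop or create, $6 = log file
repmanager_cmd() {
    case $5 in
        drop|create) ;;
        *) echo "action must be drop or create" >&2; return 1 ;;
    esac
    printf 'ORACLE_HOME=%s %s/sysman/admin/emdrep/RepManager %s %s %s -action %s -output_file %s\n' \
        "$1" "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (all argument values are placeholders):
#   repmanager_cmd "$OMS_HOME" <hostname> <port> <SID> drop <log_file>
```

Printing the command first makes it easy to confirm the host, port, and SID before the repository is dropped.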

Note:

After re-creating the repository, you must run the following command on all the Management Service Oracle homes to reconfigure the emkey:

emctl config emkey -repos -force

This command overwrites the emkey.ora file with the newly generated emkey.

Caution:

While re-creating the repository using the ./RepManager -action create command, you may encounter the following error message:

java.sql.SQLException: ORA-28000: the account is locked

Workaround

This error may occur if there are processes or multiple Management Services that are trying to connect to the database with incorrect SYSMAN credentials. If there are multiple login failures, the database becomes locked up and shuts down the monitoring agent.

You can resolve this issue by shutting down all the Management Services connected to the database, along with the monitoring agent.

Repository Creation Fails

When installing Enterprise Manager using an existing database, the repository creation fails.

This may happen if the profile of the Password Verification resource name in the database has a value that is other than Default. To resolve this issue:

Change the Password Verification profile value to Default.
Create the repository using the RepManager command.

Collection Errors After Upgrade

If you upgrade only the Management Service to 10g Release 2 without upgrading the monitoring agent, you may encounter the following collection errors:

Target: Management Services and Repository
Type: OMS and Repository
Metric: Response
Collection Timestamp: <session_time_stamp>
Error Type: Collection Failure
Message: Target is in Broken State. Reason - Target deleted from agent

To resolve this issue, upgrade the monitoring agent along with the Management Service to 10g Release 2.

Oracle Management Service Upgrade Issues

You may encounter problems during Management Service upgrade where the upgrade process aborts due to the following reasons.

OMS Upgrade Stops at OracleAS Upgrade Assistant Failure

The installation dialog box and the configuration framework log file (located at <New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) list SEVERE messages indicating the reason the Oracle Application Server Upgrade Assistant fails.

If the message displays permission denied on certain files, it means that the user running the installer may not have the correct permissions to run certain iAS configurations.

To resolve this issue, comment out the OracleAS configuration that contains these files and then retry the upgrade. You can reapply the configurations after the upgrade is successfully completed.

OMS Configuration Stops at EMDeploy Failure

The most common reasons for EMDeploy to fail are:

Not all Enterprise Manager processes are shut down completely.

To shut down Enterprise Manager, execute the following commands:
```
<Oracle_Home>/opmn/bin/opmnctl stopall
<Oracle_Home>/bin/emctl stop em
```
See Chapter 11, "Shut Down Enterprise Manager Before Upgrade" for more information.

Symbolic links have been used instead of hard links.

The <Oracle_Home>/Apache/<component> configuration files must be examined to ensure that only hard links (and no symbolic links) are referenced. See Chapter 11, "Check for Symbolic Links" for more information.

After you have successfully resolved these issues, perform the redeploy steps manually and click Retry on the Upgrade wizard.

OMS Configuration Stops at Repository Schema Failure (RepManager)

The most common reason the repository schema configuration fails is that it cannot connect to the listener. The configuration framework log file (<New_OracleHome>/cfgtoollogs/cfgfw/oracle.sysman.top.oms_#date.log) indicates the reason for the repository schema upgrade failure.

To resolve this issue, verify that the listener connecting to the OMS is valid and active.

Also, if you have installed the OMS using the Install Enterprise Manager Using New Database installation type, ensure there are no symbolic links being referenced. After you have successfully established the listener connections, click Retry on the Upgrade wizard.

Monitoring Agent Does Not Discover Upgraded Targets

If you have upgraded an Enterprise Manager Grid Control target (for example, a database) independently (that is, using a regular upgrade mechanism other than the Oracle Universal Installer), the monitoring agent may fail to discover this upgraded target.

This can happen if the Oracle home value that you specified for the upgraded target differs from the one that already existed.

To resolve this issue, either manually edit the targets.xml file of the monitoring agent to reflect the upgraded Oracle home information, or log in to the Enterprise Manager console, select the appropriate target, and modify its configuration parameters to reflect the upgraded target parameters.

CSA Collector Is Not Discovered During Agent Upgrade

When a 10g Release 1 Management Service and its associated (monitoring) agent are upgraded at the same time, the agent upgrade does not discover the CSA Collector target.

To discover this target, you must run the agent configuration assistant (the agentca script) using the rediscovery option. See Chapter 7, "Rediscover and Reconfigure Targets on Standalone Agents" for more information.

ias_admin Password Is Set To welcome1 After Upgrade

To resolve this issue, run the following command:

<New OMS Home>/bin/emctl set password welcome1 <New Password>

Oracle Management Service Upgrade Fails If Older Listener Is Running On A Port Other Than 1521

To resolve this issue, do the following:

Stop the older listener when prompted to execute allroot.sh. The Oracle Management Service upgrade will fail.
Set the listener from the new database to run from the same non-1521 port.
Run the upgrade again.

Network Issues

This section lists network issues you may encounter during Enterprise Manager installation and configuration.

Incorrect Format For Entries In /etc/hosts File

Incorrectly formatted entries cause the installation to hang and OUI-25031 or OUI-10104 errors to appear in the log files.

Entries in the /etc/hosts file should be in the following format:

IP_Address Canonical_Hostname Aliases

For example:

11.22.33.44 abc.xyz.com abc1 xyz2

When creating the /etc/hosts file, follow these rules:

Host name may contain only alphanumeric characters, hyphen, and period. The name must begin with an alphabetic character and end with an alphanumeric character.
Lines cannot start with a blank or tab character.
Fields can have any number of blanks or tab characters separating them.
Comments are allowed and designated by a pound sign (#) preceding the comment text.
Trailing blank and tab characters are allowed.
Blank line entries are allowed.
Only one host entry per line is allowed.
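
The rules above can be checked mechanically before starting the installer. The following Python sketch is an illustration only, not an Oracle tool: the names HOSTNAME_RE and check_hosts_line are hypothetical, and the IP address check through the standard ipaddress module is an added assumption that is not among the listed rules. The "one host entry per line" rule is not tested directly, although a second IP address on a line is reported as an invalid host name because it starts with a digit.

```python
import ipaddress
import re

# Host name rule from the list above: letters, digits, hyphen, and period only;
# must start with a letter and end with a letter or digit.
HOSTNAME_RE = re.compile(r"[A-Za-z]([A-Za-z0-9.-]*[A-Za-z0-9])?")


def check_hosts_line(line):
    """Return a list of rule violations for one hosts-file line (empty if none)."""
    problems = []
    text = line.rstrip("\n")
    if text.strip(" \t") == "":
        return problems  # blank lines are allowed
    if text[0] in " \t":
        problems.append("line starts with a blank or tab")
    # Drop any comment, then split on runs of blanks or tabs.
    # Trailing blanks and tabs disappear in the split, so they are accepted.
    fields = text.split("#", 1)[0].split()
    if not fields:
        return problems  # comment-only line
    try:
        ipaddress.ip_address(fields[0])  # assumption: first field must parse as an IP
    except ValueError:
        problems.append("invalid IP address: " + fields[0])
    if len(fields) < 2:
        problems.append("missing canonical host name")
    for name in fields[1:]:
        if not HOSTNAME_RE.fullmatch(name):
            problems.append("invalid host name: " + name)
    return problems
```

Calling check_hosts_line on each line of the file and printing any non-empty result, together with the line number, gives a quick pre-installation check.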

A forward lookup finds the IP address for a given hostname. A reverse lookup finds the hostname for a given IP address. The results of forward and reverse lookups should be the same. When they differ, the cause is usually a case difference (upper/lower) in hostnames and aliases.

For 10.2.0.1 Enterprise Manager installations, if a host name contains an uppercase letter, securing the Agent will fail.
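
One way to compare forward and reverse lookups is sketched below in Python, using only the standard socket module. The function names names_match and check_lookup are hypothetical, and the sketch compares only the primary name returned by the reverse lookup, not its aliases. It reports a case-only difference separately, because that is the usual cause of a mismatch.

```python
import socket


def names_match(forward_name, reverse_name):
    """Classify two host names as 'match', 'case mismatch', or 'mismatch'."""
    if forward_name == reverse_name:
        return "match"
    if forward_name.lower() == reverse_name.lower():
        return "case mismatch"
    return "mismatch"


def check_lookup(hostname):
    """Resolve hostname forward, resolve the result in reverse, and compare."""
    ip_address = socket.gethostbyname(hostname)         # forward lookup
    reverse_name = socket.gethostbyaddr(ip_address)[0]  # reverse lookup, primary name
    return ip_address, reverse_name, names_match(hostname, reverse_name)
```

Running check_lookup with the host name that you plan to give the installer shows whether the resolver returns the same name, including its case. Both lookups raise an OSError subclass if the name or address cannot be resolved.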

Enterprise Manager Installation on Computers With Multiple Addresses

When you install Enterprise Manager or related components on a multi-homed (multi-IP) machine, that is, a machine with multiple IP addresses, the host name is derived from the ORACLE_HOSTNAME environment variable if it is set; otherwise, the first name in /etc/hosts is used for installation purposes.
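
The selection order described above can be sketched in Python as follows. This is an illustration under assumptions, not the installer's actual code: the function name installer_hostname is hypothetical, and "the first name in /etc/hosts" is interpreted here as the canonical host name of the first non-comment entry.

```python
def installer_hostname(env, hosts_text):
    """Return ORACLE_HOSTNAME if set, else the first host name in hosts_text.

    env is a mapping such as os.environ; hosts_text is the hosts file content.
    Returns None when neither source yields a name.
    """
    name = env.get("ORACLE_HOSTNAME")
    if name:
        return name
    for line in hosts_text.splitlines():
        # Ignore comments, then split into IP address, canonical name, aliases.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            return fields[1]  # assumption: canonical name of the first entry
    return None
```

For example, installer_hostname(os.environ, open("/etc/hosts").read()) shows which name would be picked under this interpretation. If the result is not the name you want, set ORACLE_HOSTNAME before starting the installation.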

Agent Configuration Fails on A Non-Network Computer

To resolve this error, ensure that both the Oracle Management Service host and the target host where the Agent is to be installed are pingable.

Loopback Adapter On Windows and Related Known Issues

If you are installing Enterprise Manager or related components on a DHCP host, you must install a loopback adapter to assign a local IP address to that computer.

Note:

Refer to section 2.4.5 Installing a Loopback Adapter of the Oracle® Database Installation Guide 10g Release 2 (10.2) for Microsoft Windows (32-Bit) Part Number B14316-02 for more information.

Ensure that the following conditions are met:

The /etc/hosts file should contain the following entry:

<loopback IP Address> <hostname.domainname> <hostname>

For example:

127.0.0.1 localhost.localdomain localhost

Ensure that the IP address specified in /etc/hosts is correct; otherwise, allocation of ports will fail.

Other Installation and Configuration Issues

This section lists some of the generic errors that you may encounter during Enterprise Manager installation and configuration.

Storage Data Has Metric Collection Errors

The following Enterprise Manager collection error message may appear from agents installed through silent or agentdownload install mechanisms:

snmhsutl.c:executable nmhs should have root suid enabled.

Perform the required root install actions (using root.sh script on UNIX platforms only) to resolve this issue. It may take up to 24 hours before the resolution is reflected.

Cannot Add Systems to Grid Environment from the Grid Control Console

You cannot add new targets to your grid environment if you do not have an agent already installed.

To install the agent from your Grid Control console:

Log in to the Grid Control console and go to the Deployments page.
Click Install Agent under the Agent Installation section.
In the Agent Deploy home page that appears, select the appropriate installation option that you want to perform. See Chapter 6, "Agent Deploy Installation Prerequisites" for more information.

Error During Deinstallation of Grid Control Targets

After deinstalling certain Grid Control targets, when you try to remove the same targets from the Grid Control console, you may encounter an exception with a message similar to the following:

java.sql.SQLException: ORA-20242: Target <target name> is monitoring other targets. It cannot be deleted.

To resolve this issue, deinstall the Grid Control targets and wait for at least 15 minutes before you attempt to remove the targets from the Grid Control console using the Hosts page. This time is required for the deinstallation information to propagate to the Management Repository.

Need More Help

If this appendix does not solve the problem you encountered, try these other sources:

Oracle Enterprise Manager Release Notes, available on the Oracle Technology Network Web site (http://www.oracle.com/technology/documentation).
OracleMetaLink (http://metalink.oracle.com).

If you do not find a solution for your problem, log a service request.