Install 10.2.0.4 agent

Tuesday, August 5th, 2008

Install OEM agent 10.2.0.4

 


This document outlines the steps to install OEM 10.2.0.4 agents and to troubleshoot errors such as “Suspended on Agent Unreachable”, “Instance Health Check initialization failed” or “Unreachable Start”. See the Troubleshooting section of this document for more detail and possible fixes.

Names used in this document:

OS is HPUX.

Target server where OMS is installed: myoms

Target server where agent is installed: myagent

oma is the OS account that owns the OEM agent.

 

Download agent 10.2.0.4:

From Oracle OTN, download agent 10.2.0.4:

-> Downloads -> Enterprise Manager -> Section: Mass Agent Deployment

Unzip HPUX_Grid_Control_agent_download_10_2_0_4_0.zip into a staging area.

 

1) Un-install previous agent.

I currently have agent 10.2.0.3, which I need to un-install first by following these steps:

vi /var/opt/oracle/oraInst.loc

-> Here, make sure it points to the oraInventory location where agent 10.2.0.3 is registered

(e.g. /u14/app/oracle/product/OEM10.2/oraInventory).
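Before starting the installer it can help to confirm which inventory the pointer file actually names. The following is a minimal sketch, assuming oraInst.loc contains the usual line of the form inventory_loc=<path>; the function name inventory_loc_of is hypothetical and not part of any Oracle tool.

```shell
# Print the inventory_loc value from an oraInst.loc-style file.
# Assumption: the file has a line of the form  inventory_loc=/some/path
inventory_loc_of() (
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 1; }
    # Keep only the text after "inventory_loc=" on the first matching line.
    sed -n 's/^inventory_loc=//p' "$file" | head -n 1
)
```

For example, inventory_loc_of /var/opt/oracle/oraInst.loc should print the directory that holds the 10.2.0.3 agent inventory before you start the un-install.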

 

stop agents:

cd /u14/app/oracle/product/OEM10.2/agent10g/bin

./emctl stop agent

 

Set environment:

TEMPDIR=/u02/temp

export TEMPDIR

TEMP=/u02/temp

export TEMP

TMP=/u04/tmp

export TMP

DISPLAY=172.16.52.156:0.0

 export DISPLAY

unset OBJECT_MODE

(On the workstation that DISPLAY points to, start the X server: Hummingbird -> Exceed.)

unset ORACLE_HOME

ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g

export ORACLE_HOME
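Because the same environment is needed again for the install in step 2, the settings above could be wrapped in one function. This is only a sketch: the function name set_agent_env is hypothetical, the values are the ones from this post, and the directory check is an extra I added rather than something the installer requires.

```shell
# Export the installer environment in one step.
# Values are the ones used in this post; adjust them for your host.
set_agent_env() {
    TEMPDIR=/u02/temp
    TEMP=/u02/temp
    TMP=/u04/tmp
    DISPLAY=172.16.52.156:0.0
    ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g
    export TEMPDIR TEMP TMP DISPLAY ORACLE_HOME
    unset OBJECT_MODE
    # Warn (do not abort) when a temp directory is missing or not writable.
    for d in "$TEMPDIR" "$TMP"; do
        [ -d "$d" ] && [ -w "$d" ] || echo "warning: $d is not a writable directory" >&2
    done
}
```

Call set_agent_env in the login shell of oma (not in a subshell) so the exported variables are still set when ./runInstaller starts.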

 

un-install:

as oma (OS owner of agents)

cd /u14/stage/OMA/agent10204/hpunix/agent

./runInstaller

 

Once OUI starts, click Installed Products and remove agent10g, which is in

/u14/app/oracle/product/OEM10.2/agent10g

-> This ran for 10 minutes and successfully removed agent 10.2.0.3.

 

 

 

 

2) Install agent 10.2.0.4

cd /var/opt/oracle

vi /var/opt/oracle/oraInst.loc

-> Here, make sure it points to a new oraInventory location within the new home where you will be installing the 10.2 agents; create this new oraInventory location if it does not already exist

(/u14/app/oracle/product/OEM10.2/oraInventory).

 

As the OS user that owns the 10.2.0.4 agent, edit .profile (vi .profile) and make sure it contains the following line:

ulimit -n 1024

Then log out and log back in.
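To verify that the new limit really took effect after logging back in, something like the following could be used. It is a sketch, assuming the login shell accepts ulimit -Sn and ulimit -Hn for the soft and hard descriptor limits (ksh and bash do); the function name ensure_nofile is hypothetical.

```shell
# Report the soft and hard open-file limits and raise the soft limit
# to the wanted value when the hard limit allows it.
ensure_nofile() {
    want=${1:-1024}
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    echo "nofile soft=$soft hard=$hard want=$want"
    # Nothing to do when the soft limit is already high enough.
    [ "$soft" = unlimited ] && return 0
    [ "$soft" -ge "$want" ] && return 0
    if [ "$hard" = unlimited ] || [ "$hard" -ge "$want" ]; then
        ulimit -Sn "$want"
    else
        echo "hard limit $hard is below $want; root has to raise it" >&2
        return 1
    fi
}
```

Run ensure_nofile 1024 as oma in the same shell that will start the installer or the agent, since the limit applies only to that shell and its children.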

 

Remove the leftover 10.2.0.3 OS files (check the pwd output before running rm):

cd /u14/app/oracle/product/OEM10.2

pwd

rm -Rf *
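Since rm -Rf * deletes whatever the current directory happens to be, a guarded variant may be safer when this is scripted. The sketch below is my own wrapper (purge_dir is a hypothetical name): it refuses an empty path or /, and it stops if the cd fails, so the rm never runs in the wrong place.

```shell
# Remove everything inside a directory, but only after a successful cd
# into it. Runs in a subshell, so the caller's directory is unchanged.
purge_dir() (
    target=$1
    case $target in
        ''|/) echo "refusing to purge '$target'" >&2; exit 1 ;;
    esac
    cd "$target" || exit 1
    # Same effect as "rm -Rf *" in the post: hidden files are left alone.
    rm -Rf ./*
)
```

Usage would be purge_dir /u14/app/oracle/product/OEM10.2, which leaves the OEM10.2 directory itself in place for the new install.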

 

Set environment:

TEMPDIR=/u02/temp

export TEMPDIR

TEMP=/u02/temp

export TEMP

TMP=/u04/tmp

export TMP

DISPLAY=172.16.52.156:0.0

 export DISPLAY

unset OBJECT_MODE

(On the workstation that DISPLAY points to, start the X server: Hummingbird -> Exceed.)

unset ORACLE_HOME

ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g

export ORACLE_HOME

 

install:

as oma (OS owner of agents)

cd /u14/stage/OMA/agent10204/hpunix/agent

./runInstaller

 

Once OUI starts and prompts you for the parent directory, set it to

/u14/app/oracle/product/OEM10.2

Make sure agent10g does not exist under /u14/app/oracle/product/OEM10.2.

Set Management Service Hostname to myoms.mydomain.com with the default port 4889.

-> This ran for 15 minutes with no errors, and then told me to run root.sh.

Log on as root or as an account with sudo:

cd /u14/app/oracle/product/OEM10.2/agent10g

./root.sh

-> This ran for 3 seconds with no errors.

 

Now connect to every target database instance monitored by this agent and run:

alter user dbsnmp identified by oem_password;

Now open a browser at https://myoms.mydomain.com:1159/em and navigate to Targets -> Database -> database name -> Configure

-> This unlocks or recreates the monitoring account dbsnmp, and the instance starts being monitored by GRID.

-> All successful.

 

If necessary, the following might help clear any errors you get after installation:

cd /u14/app/oracle/product/OEM10.2/agent10g/bin

./emctl clearstate agent

./emctl upload

 

 

Troubleshooting:

Email from GRID: Agent is Unreachable. All jobs also send email with Status=Suspended on Agent Unreachable.

Errors like the ones below appear in

/u14/app/oracle/product/OEM10.2/agent10g/sysman/log/emagent.trc.1 :

2008-08-01 06:22:20,466 Thread-12293 ERROR util.files: ERROR: nmeufos_new: failed in lfiopn on file: /u14/app/oracle/product/OEM10.2/agent10g/sysman/emd/agntstmp.txt.error = 24 (Too many open files)

2008-08-01 06:22:20,466 Thread-12293 ERROR pingManager: Error in updating the agent time stamp file

2008-08-01 06:22:20,467 Thread-12293 ERROR http: snmehl_connect: failed to create socket: Too many open files (error = 24)

2008-08-01 06:22:20,467 Thread-12293 ERROR pingManager: nmepm_pingReposURL: Cannot connect to https://myoms.mydomain.com:1159/em/upload: retStatus=-32

2008-08-01 06:22:20,468 Thread-12293 ERROR util.files: ERROR: nmeufos_new: failed in lfiopn on file: /u14/app/oracle/product/OEM10.2/agent10g/sysman/emd/agntstmp.txt.error = 24 (Too many open files)

2008-08-01 06:22:20,468 Thread-12293 ERROR pingManager: Error in updating the agent time stamp file

2008-08-01 06:22:26,483 Thread-12294 ERROR util.fileops: ERROR: snmeuf_dirlist can’t list directory: /u14/app/oracle/product/OEM10.2/agent10g/sysman/emd/upload: Too many open files (errno=24)

2008-08-01 06:22:26,484 Thread-12295 ERROR engine: Failed when generating a new ECID.

2008-08-01 06:22:26,491 Thread-12295 ERROR fetchlets.healthCheck: GIM-00104: file not found

LEM-00031: file not found; arguments: [lempgmh] [lmserr]

LEM-00033: file not found; arguments: [lempgfm] [Couldn't open message file]

LEM-00031: file not found; arguments: [lempgmh] [lmserr]

2008-08-01 06:22:26,491 Thread-12295 ERROR engine: [oracle_database,mysid.mydomain.com,health_check] : nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.

-> End of paste from log emagent.trc.1
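A quick way to see how often the agent hit the descriptor limit is to count the matching lines in the trace file. This is a small sketch using plain grep; the function name count_emfile_errors is hypothetical.

```shell
# Count the lines in an agent trace file that report errno 24.
count_emfile_errors() (
    trc=$1
    [ -r "$trc" ] || { echo "cannot read $trc" >&2; exit 1; }
    # grep -c prints the number of matching lines. It exits with 1 when
    # there are no matches, so the final ":" keeps the status at 0.
    grep -c 'Too many open files' "$trc"
    :
)
```

For example: count_emfile_errors /u14/app/oracle/product/OEM10.2/agent10g/sysman/log/emagent.trc.1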

 

-> Message: Too many open files

To fix this, shut down or kill the running agent. Then, as oma, run ulimit -a; this showed a current setting of 60 for nofiles(descriptors), while ulimit -aH, which shows the hard limits, reported 1024 for nofiles(descriptors). So I edited oma's .profile and added the following line at the end of the file:

ulimit -n 1024

Then log out, log back in as oma, and restart the agent:

cd /u14/app/oracle/product/OEM10.2/agent10g/bin

ps -aef | grep -i oma

-> Kill any remaining oma processes.

./emctl start agent

./emctl clearstate agent

./emctl upload
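The start / clearstate / upload sequence is used more than once in these notes, so it could be scripted. The following is a sketch only: restart_agent is a hypothetical name, and it assumes that emctl returns a non-zero exit status when a step fails, which I have not verified for every failure case.

```shell
# Run the restart sequence from this post, stopping at the first step
# that fails. $1 is the agent bin directory that contains emctl.
restart_agent() (
    cd "$1" || exit 1
    ./emctl start agent      || exit 1
    ./emctl clearstate agent || exit 1
    ./emctl upload           || exit 1
)
```

Usage: restart_agent /u14/app/oracle/product/OEM10.2/agent10g/bin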

 

Analysis.

Using glance (shift f) on the PID of the oma emagent process, and sar -v, I can see that oma continuously opens files, especially /u02/…/product/10.2.0/dbs/hc_mysid.dat, until it reaches the 1024 limit, and then it crashes. Currently it has over 650 open files and the count is still increasing; the agent was started close to 4 hours ago.
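If lsof happens to be installed on the host (an assumption; I used glance, and lsof is not part of base HP-UX), the growth in hc_<SID>.dat handles could also be tracked from the command line. The sketch below only does the counting: it reads lsof-style output on stdin and counts the lines whose last column is a dbs/hc_*.dat file. The function name count_hc_handles is hypothetical.

```shell
# Count open handles on health-check files in lsof-style output (stdin).
# The file name is expected in the last whitespace-separated column.
count_hc_handles() {
    awk '$NF ~ "/dbs/hc_[^/]*[.]dat$" { n++ } END { print n + 0 }'
}
```

Running lsof -p <emagent pid> | count_hc_handles every few minutes should show whether the count keeps climbing, which is the pattern described in the analysis above.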

 

I opened a TAR and Oracle Support recommended Note 430805.1.

Note 430805.1 says to apply the following patches to fix this problem, which seems to be specific to HPUX.

1) Stop or kill the agent.

2) As oma, the owner of the agent:

cd $AGENT_HOME/rdbms/lib

cp ins_rdbms.mk ins_rdbms.mk.org

vi ins_rdbms.mk

Comment out the $(GENOCCISH) line so that this part of the makefile looks like:

 

client_sharedlib:    

$(GENCLNTSH)    

# $(GENOCCISH)    

$(GENAGTSH) $(LIBAGTSH) 1.0
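The same edit can be made without vi. This is a sketch using sed, assuming the $(GENOCCISH) macro sits alone on its line as shown above; the function name comment_genoccish is hypothetical. It writes a new file rather than editing in place, so the result can be checked with diff first.

```shell
# Write a copy of a makefile with the $(GENOCCISH) line commented out.
comment_genoccish() (
    src=$1
    dst=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; exit 1; }
    # Match a line that holds only $(GENOCCISH), with optional leading
    # and trailing whitespace, and put "# " in front of the macro.
    # All other lines are copied through unchanged.
    sed 's/^\([[:space:]]*\)\$(GENOCCISH)[[:space:]]*$/\1# $(GENOCCISH)/' "$src" > "$dst"
)
```

For example: comment_genoccish ins_rdbms.mk.org ins_rdbms.mk.new, then diff the two files and move ins_rdbms.mk.new over ins_rdbms.mk once the only difference is the commented line.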

 

3) Apply Patch 5854190 in the Agent Oracle Home by following the instructions given in the README.

For 10.2.0.3 / 10.2.0.4 Agent on HP-UX PA-RISC, download the version 10.2.0.3 of Patch 5854190

For 10.2.0.3 Agent on HP-UX Itanium, download the version 10.2.0.2 of Patch 5854190

For 10.2.0.4 Agent on HP-UX Itanium, download the version 10.2.0.3 of Patch 5854190
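The three rules above are easy to mix up, so here they are as a small lookup. It only encodes what the note says as quoted here; the function name patch_5854190_release and the platform labels parisc and itanium are hypothetical names I chose for the sketch.

```shell
# Print which release of Patch 5854190 to download for a given agent
# version ($1) and HP-UX platform ($2: parisc or itanium).
patch_5854190_release() {
    agent=$1
    platform=$2
    case "$platform:$agent" in
        parisc:10.2.0.3|parisc:10.2.0.4) echo 10.2.0.3 ;;
        itanium:10.2.0.3)                echo 10.2.0.2 ;;
        itanium:10.2.0.4)                echo 10.2.0.3 ;;
        *) echo "no rule for agent $agent on $platform" >&2; return 1 ;;
    esac
}
```

For my case, patch_5854190_release 10.2.0.4 itanium prints 10.2.0.3.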

 

Download into staging area.

Make sure /var/opt/oracle/oraInst.loc points to where you have installed agent 10.2.0.4

cd to staging area and unzip p5854190_10203_HP64.zip

cd /u14/stage/OMA/5854190/5854190

ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g

export ORACLE_HOME

/u14/app/oracle/product/OEM10.2/agent10g/OPatch/opatch apply

-> This ran for 3 minutes and finished successfully with the message: OPatch succeeded.

 

4) Relink the Agent by following Note 273189.1

Set ORACLE_HOME to agent home:

ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g

export ORACLE_HOME

Change into the bin directory of the 10g central agent home:

cd $ORACLE_HOME/bin

Make sure the agent is down:

./emctl stop agent

./emctl status agent

cd $ORACLE_HOME/sysman/lib

make -f ins_emagent.mk agent

-> This ran for 2 minutes with no errors.

cd $ORACLE_HOME

pwd

/u14/app/oracle/product/OEM10.2/agent10g

su - root (or sudo)

cd /u14/app/oracle/product/OEM10.2/agent10g

./root.sh

Answer no when asked to overwrite “dbhome”, “oraenv” and “coraenv”.

exit

su - oma

ORACLE_HOME=/u14/app/oracle/product/OEM10.2/agent10g

export ORACLE_HOME

cd $ORACLE_HOME/bin

./emctl start agent

 

5) Patch target databases:

Do the following steps in each monitored ORACLE_HOME:

su - oracle

10.1.0.3, 10.1.0.2: consider patching / upgrading.

10.2.0.2: apply Patch 4559294 in the RDBMS Oracle Home by following the instructions given in the README.

10.2.0.3 (and above): nothing to be done. The fix is already included.

Shut down the monitored database and patch it.

Make sure /var/opt/oracle/oraInst.loc points to where you have installed target database.

cd /u14/stage/patches/db/4559294

unzip p4559294_10202_HP64.zip

cd 4559294

/u02/app/oracle/product/10.2.0/OPatch/opatch apply

-> This ran for 5 minutes with no errors: OPatch succeeded.

 

Rename / delete the health check file $RDBMS_HOME/dbs/hc_<SID>.dat:

cd $ORACLE_HOME/dbs

cp hc_mysid.dat hc_mysid.dat.org

rm hc_mysid.dat
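With several monitored SIDs, the backup-and-remove step can be put into a function. This is a sketch (reset_hc_file is a hypothetical name) that uses a single mv in place of the cp and rm pair, which ends in the same state: a .org copy exists and the original name is gone.

```shell
# Back up and remove the health-check file of one SID.
# $1 is the dbs directory of the database home, $2 is the SID.
reset_hc_file() (
    dbs_dir=$1
    sid=$2
    f="$dbs_dir/hc_$sid.dat"
    [ -f "$f" ] || { echo "no such file: $f" >&2; exit 1; }
    # Rename to hc_<SID>.dat.org so the database recreates a fresh file.
    mv "$f" "$f.org"
)
```

Run it while the database is still down, for example reset_hc_file "$ORACLE_HOME/dbs" mysid, and then continue with the restart.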

 

Restart the monitored database. This will recreate the $RDBMS_HOME/dbs/hc_<SID>.dat file.

Bounce the agents.

-> Applying the agent patch and the target database patch fixed the problem.

-> For PROD, plan to go to either 10.2.0.4 or 11g.

 

6) Disable the Health Check Metric Collection in Grid Control 10.2

If you cannot patch your database at this moment, see Note 379423.1.

This means that the database availability will rely on the Response metric, which is collected by default every 5 minutes.

From the GRID home page https://myoms.mydomain.com:1159/em :

Targets -> Database -> click on the desired database -> at the bottom, click on Metric and Policy Settings -> do a find for status and look for Instance Status -> click on the frequency (default is Every 15 Seconds) -> click Disable -> click Continue

 

Restart agents.

-> Just disabling Health Check Metric Collection on the target database did not fix the problem. I had to patch the 10.2.0.4 agent and the target database, and disable Health Check Metric Collection, as outlined in steps 3, 4, 5 and 6 on this page.

-> Fixed. The problem did not come back, and all jobs started to run successfully again on their own.


