Saturday, December 10, 2011

Nice try

Tired of looking this up... so I'm posting it. I don't use the unix nice command very often, but when I do, the man page seems a little confusing. Here's the format I typically use the command:
nice -n +19
That runs the command at the lowest possible priority, helpful if you're going to do some big file op and don't want to cause anyone else any grief.

Monday, August 22, 2011

What service has oci.dll open?


Attempting to use deployment procedure's to install patch bundles on windows hosts, sometimes you run into issues with services keeping dll's (especially oci.dll) open.

Here's a trick to determining what service is keeping a specific dll open.

From a command prompt:

tasklist /m > tasks.txt
notepad tasks.txt


The look for the specified dll in tasks.txt and find the process. From there it should be somewhat obvious which service to shut down in order to get the patch to continue.

Tuesday, August 2, 2011

ORA-01406: fetched column value was truncated

I just got a dump file of our data from a vendor that is hosting an application for us and I was making it available to my developers and since we will be getting files on a somewhat frequent basis, initially I created an external table to read the data.

I had to get over the first hurdle of version incompatibility of the datapump file, which wasn't too bad, but after I cleared up those issues (lots of references on google, this chart helped the most) I received the following error in Toad:

ORA-01406: fetched column value was truncated


with further details
fetched column value was truncated

Cause: In a host language program, a FETCH operation was forced to truncate a
character string. The program buffer area for this column was not large enough to
contain the entire string. The cursor return code from the fetch was +3.

Action: Increase the column buffer area to hold the largest column value or perform
other appropriate processing.


Google wasn't as helpful with this (hence the post here), but essentially this is a client side setting that needs to be adjusted. In version 10.6.1.3 of Toad there is one setting that needs to be changed. Search for "OCI Array Buffer size" or find it under "Oracle - General". You need to change it to be large enough to fetch the largest column in the query. In my case I had a column of size 4000, so setting this to 4000 returned results.

Wednesday, July 20, 2011

Fun with Logon Triggers

Sometimes a process happens too fast to be traced effectively, but it's happening slowly enough to make your users angry.

Here's an example of a logon trigger setup to make a trace happen, we're using it to diagnose an issue with slow response from an ODBC connection, using user account ADMIN_JOSH:


DROP TRIGGER SYSTEM.JOSH_LOGINTRIGGER;

CREATE OR REPLACE TRIGGER SYSTEM.Josh_LoginTrigger AFTER LOGON ON ADMIN_JOSH.SCHEMA
BEGIN
execute immediate 'ALTER SESSION SET tracefile_identifier = MYTRACEFILE';
execute immediate 'alter session set sql_trace=true';
END;
/



Note if you get the following error, the owner of the trigger needs the alter session priv (in this case system).

ORA-00604: error occurred at recursive SQL level 1
ORA-01031: insufficient privileges
ORA-06512: at line 2

Friday, July 8, 2011

It's the env, stupid

While performing the following...

$ORACLE_HOME/forms/lib32/make -f ins_forms.mk install


Got the following error:


ld: fatal: library -ljvm: not found
ld: fatal: File processing errors. No output written to frmbld
make: *** [frmbld] Error 1


Apparently no one else has ever reported this problem before...

Turns out the user's env didn't include LD_LIBRARY_PATH. Fixed that, and problem was solved.

Thursday, July 7, 2011

It pays to read the documentation

Had an issue where I couldn't get forms builder to work.

Here's the error code:

# frmbld
FRM-91111: Internal Error: window system startup failure.
FRM-10039: Unable to start up the Form Builder.


It should work right out of the box, looks like the build config had changed much, here's the env file used:

ORACLE_HOME=/u01/app/oracle/product/ias10gds
export ORACLE_HOME
LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/jdk/jre/lib/sparc:$ORACLE_HOME/jdk/jre/lib/sparc/native_threads
export LD_LIBRARY_PATH
PATH=$ORACLE_HOME/bin:$ORACLE_HOME/opmn/bin:/usr/local/bin:/usr/ccs/bin:/bin:/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/usr/ucb:/usr/ccs/bin:/usr/sbin:/usr/X/bin:$ORACLE_HOME/OPatch
export PATH
FORMS_PATH=/u01/app/sct/baninbr2/QUAL/forms/fmb
export FORMS_PATH


Eventually found this documentation post about how to solve the problem. Basically since the last time we used forms builder on this server, the other Oracle homes required a change to NLS_LANG with a UTF8 characterset.

Simply added the following to the above forms builder env file fixed the issue with forms builder:

export NLS_LANG=


Problem solved.

Before you set global_names=true

For all the years that I've been working with Oracle I have not had to deal with the oracle streams capture/apply processes. Other DBA's on my team have had to work with it, but I didn't get the hands on experience. That is until past weekend when I had installed a streams backed process for our identity management product for creation of downstream accounts.

Part of the installation process was to set global_names=true, which as I have now found out is the recommendation from oracle. From what little I know about it, it is for additional security as well as preventing bad things from happening during clones (or so I've read).

It turns out that one of the results from enabling global_names is that you end up getting "ORA-02085: database link FOO.DB.COM connects to BAR.DB.COM" when the database link name does not match the target.

I tried "fooling" it with everything I could think of that didn't involve rewriting any of the code that calls the database link, but nothing in fact worked. The name itself has to be changed to match the target database name (which is clearly stated in oracle documentation...I hoped it was a lie).

This was new to me.

I currently have set global_names=false until I can figure out how much code has to be reworked reflecting all the database link name changes.

So the one thing I should have checked into before setting global_names to true was which database links were going to fail after the switch. And had I done that before, I would have realized the answer would be... all of them. At least now I can push for a new standard for everyone to follow when they create database links.

Thursday, June 30, 2011

Fun with ancient clients

User was reporting an error with ODBC after a recent upgrade to 11g:


Unable to connect
SQLState=28000
[Oracle][ODBC][Ora]ORA-01017: invalid username/password; login denied


The user reported, and the tech verified that on this machine they could login with sqlplus but not ODBC on this PC. Multiple accounts were tried.

Interesting test case illuminated the issue: Created user with ALL UPPERCASE PASSWORD. Using this, could connect through ODBC.

The ancient 9i ODBC connection will not send a case sensitive password.

Made the following change, problem solved, user is happy though we probably could use this as an opportunity to force upgrading ODBC clients:


alter system set sec_case_sensitive_logon=FALSE scope=both

Empty variables can be troublesome

Upgrading a 10.2.0.4 ERP test database to 11.2.0.2 today, and shortly after starting the upgrade via DBUA, got the following error:

ORA-16024: parameter LOG_ARCHIVE_DEST_1 cannot be parsed


I had chosen to disable archiving as one of the options for DBUA, but was really surprised to see this error. After a little bit of research I found Metalink Doc 96278.1.

Turns out this error was happening because we had log_archive_dest along with log_archive_dest_1 set -- the kicker was it was set to be blank. Here's the relevant info from the pfile:


*.log_archive_dest=''
*.log_archive_dest_1='LOCATION=/u01/app/oracle/admin//arch'


So the trick was to completely remove the entire *.log_archive_dest='' line from the pfile and DBUA was able to proceed with the upgrade without the ORA-16024 error.

Tuesday, June 21, 2011

Death by a thousand trace files


Found a large number of trace files generated in the /udump admin directory.

Each file looked like the following:
$ cat orcl_ora_12191.trc
/u01/app/oracle/admin/ORCL/udump/orcl_ora_12191.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: SunOS
Node name: hostname
Release: 5.10
Version: Generic_144488-14
Machine: sun4u
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 389
Unix process pid: 12191, image: oracle@hostname

opiino: Attach failed! error=-1 ifvp=0
Not much help on support.oracle.com or google in diagnosing this issue.

Oracle support was helpful in doing some tracing which proved that these were failures to connect rather than existing connections that were lost.

With that information in hand, we correlated the timestamps of the trace files to timestamps of listener connections. This narrowed our issue down to a specific IP address, and by that a specific application.

Our lead application administrator was able to determine the root cause of the failed connections, an outdated library used in development of the application:
From what I can tell CoreLab Oracle is now dotConnect for Oracle from Devart. They resolved this issue with version 5.70.140.

Whew.

UDM for Banner Jobsub and other subsystems

Over the weekend we had a SungardHE Banner subsystem offline. This particular subsystem uses Oracle Pipes to manage user job submission. The downtime could have been prevented if this subsystem had been represented as a target within Grid Control.

So, to prevent future outages, we're going to make this process a User Defined Metric, and have grid control fire off a notification if the service is offline. At the end of every maintenance window, if Grid Control says all systems are online, then my team can have more confidence that all systems are actually online.

Found a great article by Sagar Patil titled Oracle Grid Tracking OS Process Using a Custom User Defined Matrix udmshell script, which is the basis for what follows.

I created the following script that determines if ARG1 has ARG2 processes running and returns either Online or Offline:

#!/usr/local/bin/bash

if [ $# -ne 2 ]
then
echo "Usage: `basename $0` {ORACLE_SID} {Expected Count}"
exit 1
fi
toCheck=$1
testEm=`/bin/ps -ef | /usr/local/bin/grep -v grep | /usr/local/bin/grep -v $0 |
/usr/local/bin/grep $toCheck | /usr/local/bin/wc -l`

if [ $testEm -eq $2 ]
then
echo "em_result=Online"
else
echo "em_result=Offline"
fi
Next I create the UDM.
From the target Host screen, click on "User-Defined Metrics", then click create.

Here's the parameters I used:
Metric Name: GURJOBS DEVL
Metric Type: String
Command Line: /banner/u01/app/oracle/product/agent11g/sysman/emd/custom/checkSubsystems.sh gurjobDEVL 1
User Name: Oracle
Comparison Operator: Match
Critical: Offline

The rest I left as default. After hitting OK, it took about 15 minutes to show that gurjobDEVL was in fact online. Shutting down gurjobs in DEVL returned an error in the UDM screen but did not send a notification.

To enable notifications, I had to do the following:
Grid Control User Preferences -> Rules -> Select "Host Availability and Critical States", then click Edit. In the Metrics tab, click Add, enter "User" in the search and click Go.

To enable the metric, check the box next to User Defined Metric, and then choose "Critical" and "Clear" severity states, then click continue, and then OK.

Whoever is subscribed to that notification rule will be sent a notification per their preferences.

In this simple case, knowing that Banner Jobsub is offline is great. Another example would be with CAPP Pipes, the next step is to build in automatic restart scripts. That will have to wait for another blog post.

Thursday, May 26, 2011

VI trick of the year

Sometimes the faster you type the slower you are.

Yesterday I was working on a configuration file, commenting and uncommenting about 10 different lines. Being a somewhat fast typist, the keyboard gymnastics required to add "--" to the beginning of 10 lines of code wasn't daunting enough to make me consider better ways to get the job done.

Those keyboard gymnastics consisted of the following 6 keypresses per line commented out:
i to insert
-- to comment the line out
ESC to go back to command mode
down arrow
then zero to go to the beginning of the line

I suppose if this had been 25 or 50 or whatever that unknowable mental threshold of "too many lines", I would have used awk or sed to get the job done.

My new coworker saw what I was doing, and asked why I didn't use the repeat command. Que? I asked?

Apparently hitting a period while in command mode will repeat the last insert/append that took place in edit mode. So simply made the change once in edit mode, then hit "." to repeat on the next line.

Ok, maybe not the "trick of the year". Maybe trick of the day is more appropriate, but I'm sure this is one of those tips that I'll be using on a daily basis -- a somewhat year changing event.

Thursday, May 19, 2011

Thanks to this blog post and metalink doc 741004.1, learned some fun stuff about how an Oracle agent determines the amount of free memory available on a Solaris host.




ORACLE_HOME=/u01/app/oracle/product/agent11g
export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH
export PATH

$ORACLE_HOME/bin/nmupm osLoad | awk -F"|" '{print $14}'


Now how to figure out why this would be so wildly different than vmstat...

Friday, May 13, 2011

Insert update and deleting records in a view

Recently a vendor gave me a vague request to grant "write access" to a view. Remembering an error a user had shown me in the past, I replied off the cuff that one can't simply write to a view.

I created the following test to prove that I was right, and you'll see that I proved myself wrong:

drop user u_owner;
create user u_owner identified by pass2pass;
alter user u_owner default tablespace users;
alter user u_owner quota unlimited on users;
grant create session to u_owner;
grant create table to u_owner;
grant create view to u_owner;

drop user u_viewer;
create user u_viewer identified by pass2pass;
grant create session to u_viewer;

conn u_owner/pass2pass;

create table tb_test (v_int1 int);
create view vw_test as select v_int1 from tb_test;
grant select, insert, update, delete on vw_test to u_viewer;

conn u_viewer/pass2pass;

select * from u_owner.vw_test;
insert into u_owner.vw_test values (1);
select * from u_owner.vw_test;
update u_owner.vw_test set v_int1 = 2;
select * from u_owner.vw_test;
delete u_owner.vw_test where v_int1 = 2;
select * from u_owner.vw_test;
conn / as sysdba;
drop user u_owner cascade;
drop user u_viewer;

exit;


So as it turns out you absolutely can insert / update / delete into views. Still trying to make sense of it, but it *is* possible.

Wednesday, March 30, 2011

Grid control claims App Server 10.1.2 is down wen console says it's up

I'm currently going through moving all of my targets from grid control repositories and in the process of doing that I'm cleaning up anything that doesn't need to be there, or anything that reports as being down. For some reason I like to see the fully green pie chart.

I had added one of my oracle mid tiers that had a version of 10.1.2.3 running to support our forms application and grid control reported that the app server was down, but I knew for a fact it was not. All the subcomponents were reporing as up, opmnctl reported everything up, and the iasconsole on the server itself reported everything was up. In summary, it was up.

After much digging through perl code and metalink docs I have resolved the issue and figured out it had to do with Solaris 10 security of /usr/ucb/ps command not being able to report extended process info without being the owner of the process. There was a pretty good metalink document that put me on this trail, but didn't solve the problem itself. Here's a metalink ID roundup for what helped (or didn't but was related).


395013.1 - Application Server shown as Down in Grid Control even though all of its Components are shown as up
276350.1 - How to Enable the Metric Browser/Agent Browser for the Oracle Management Agent [Video]
752944.1 - CMSDK ORACLE_IAS TARGET DOWN WHEN IT IS UP

Enabling the metric browser was essential for this, in addition to running the perl scripts manually from the host.

So the problem ended up being this, the agent checks for forms processes on the host and reports back the sid.
Run this as your agent owner, and then also as root to see the difference.
/usr/ucb/ps -axww | grep oc4j.jar | grep OC4J_BI_Forms | grep <$ORACLE_HOME>


Since the process is owned by a different user than the agent, it can't get the extended process info like root can. So to solve that two things had to be done.

First set up sudo privileges for your agent user to run /usr/ucb/ps as root
gridagent          ALL=(root)NOPASSWD: /usr/ucb/ps


Second, modify the perl code in $AGENT_HOME/sysman/admin/scripts/formsinit.pl to use sudo.
Change
elsif ( $os eq "SunOS" )
{
$PS = "/usr/ucb/ps -axww";

To
elsif ( $os eq "SunOS" )
{
$PS = "sudo /usr/ucb/ps -axww";



Restart the agent force an upload and check all your metrics again. You can do that via the metric browser, the commandline or wait for grid control to register the target again.

If anyone finds a metalink document that covers this exactly please add something in the comments.

Tuesday, March 8, 2011

OCM patch required even though you're not using OCM

While installing OAS 10gR3 patch 8626084 to upgrade OAS to 10.1.3.5, I received an error that step "Run One-off OPatches" had failed.

Looking at the log file, I found the following:


This is a OCM patch.
Home has OCM installed but not configured.


To run in silent mode, OPatch requires a response file for Oracle Configuration Manager (OCM).
Run /u01/app/oracle/product/oas10gr3/OPatch/ocm/bin/emocmrsp to generate an OCM response file. The generated response file can be reused on different platforms and in multiple OPatch silent installs.

To regenerate an OCM response file, rerun /u01/app/oracle/product/oas10gr3/OPatch/ocm/bin/emocmrsp.


ERROR: OPatch failed because of cmd. args. problem.


Doing a bit of digging, I found this recommendation to go ahead and run setupCCR in disconnected mode.

What's frustrating is that I requested that OCM not be installed or configured.

The following did the trick and I was able to complete the patch installation.

/u01/app/oracle/product/oas10gr3/ccr/bin/setupCCR -s -d

Thursday, February 24, 2011

Clean up old xauth entries

In a followup to a previous post on forwarding x sessions and su, here's a quick way to clean up old xauth entries.

> xauth list | cut -f1 -d\  | xargs -i xauth remove {}


You'll recall that cleanup will not happen automatically for the sessions that got su'ed to, so if you have several entries hanging around the one-liner will make quick work of them.

On the subject of support requests

Last night I was working on providing some information to a tech on MOS, and part of their request was screenshots. While I don't mind taking screenshots, it's actually a little quicker to take a screencast.

I used to use CamStudio or Wink, but now that I'm primarily using a Mac, I find that it's much simpler to use the java app at http://screencast-o-matic.com/.

Screencast-o-matic is drop dead simple to use, creates a reasonably small file, and allows you to provide more information to Oracle support with less effort. Edit: It's free for up to 15 minute long files.

Highly recommended.

Wednesday, February 23, 2011

Oracle database file migration script

I recently had a project to migrate datafiles to our new san array, and part of that project I wanted to split apart each of the databases to their own mount points to allow mirror splits to be easier for quick backups. To perform this I pretty much had two options to move all the datafiles (60-70 total/database). I could dump the controlfile to trace, make the modifications then recreate the controlfile or I would have to move the datafiles on the os and perform alter database statements.

I wanted to just go the route of recreating the controlfile because it would have been easier to do mass edits of one file but these particular databases were not registered to an rman catalog, and I did not want to lose backup history even though these were just test environments cloned on a regular basis.

So, the choice I had left was to move files and do the alter databases. After doing a bit of cursory searching I didn't find any scripts to generate the statements for me, so I put together a driver script that made all the necessary steps. Hopefully this will save someone time in the future with a script to start from. This was done against 10gR2.

Some assumptions:

  • The datafile locations assume ofa compliant structure "/uxx/oradata/SID/"
  • For the driver script being created the destination will be all the same, you can change this after the script has been generated.
  • You need to change the source and destination.



set serveroutput on size unlimited for wra;
set linesize 200
set feedback off
set echo off
set timi off
set trimspool on

spool &&SID._generate

CREATE TABLE weg_temp (
step INTEGER,
seq integer,
STATEMENT varchar2(200));

-- Generate move the controlfile script
DECLARE
-- REGEX MATCHING PARMS
source VARCHAR2 (10) := '/u0[1-3]';
destination VARCHAR2 (5) := '/u06';

CURSOR ctrl_file_cur
IS
SELECT name FROM v$controlfile;

CURSOR redo_cur
IS
SELECT MEMBER FROM v$logfile;

CURSOR datafile_cur
IS
SELECT name FROM v$datafile;

CURSOR tempfile_cur
IS
SELECT name FROM v$tempfile;

ctrl_file_rec ctrl_file_cur%ROWTYPE;
redo_rec redo_cur%ROWTYPE;
curholder VARCHAR2 (500);
sqlstatement VARCHAR2 (2000);
unixstatement VARCHAR2 (2000);
loopcount INTEGER;
regexbuffer varchar2(100);
BEGIN
sqlstatement := 'ALTER SYSTEM SET control_files=';
loopcount := 0;

-- Populate the table with the section headings and static steps
insert into weg_temp values (1,0,'-- 1. Change the controlfile locations in the spfile');
insert into weg_temp values (2,0,'-- 2. Shutdown the database');
insert into weg_temp values (2,1, 'shutdown immediate;');
insert into weg_temp values (3,0,'# 3. Move the control files on the os');
insert into weg_temp values (4,0,'# 4. Move the redo log files on the os');
insert into weg_temp values (5,0,'# 5. Move the datafiles on the os');
insert into weg_temp values (6,0,'# 6. Move the tempfiles on the os');
insert into weg_temp values (7,0, '-- 7. Start the database in mount');
insert into weg_temp values (7,1, 'startup mount;');
insert into weg_temp values (8,0, '-- 8. Change redo log files in the database');
insert into weg_temp values (9,0, '-- 9. Change datafiles in the database');
insert into weg_temp values (10,0, '-- 10. Change tempfiles in the database');
insert into weg_temp values (11,0, '-- 11. Open the database and hope it works.');
insert into weg_temp values (11,1, 'alter database open;');

-- Spacing for file
insert into weg_temp values (2,-1,NULL);
insert into weg_temp values (3,-1,NULL);
insert into weg_temp values (4,-1,NULL);
insert into weg_temp values (5,-1,NULL);
insert into weg_temp values (6,-1,NULL);
insert into weg_temp values (7,-1,NULL);
insert into weg_temp values (8,-1,NULL);
insert into weg_temp values (9,-1,NULL);
insert into weg_temp values (10,-1,NULL);
insert into weg_temp values (11,-1,NULL);



OPEN ctrl_file_cur;

LOOP
FETCH ctrl_file_cur INTO ctrl_file_rec;

EXIT WHEN ctrl_file_cur%NOTFOUND;

IF loopcount > 0
THEN
sqlstatement := sqlstatement || ', ';
END IF;

-- Calculate the destination filename
regexbuffer := REGEXP_REPLACE (ctrl_file_rec.name, source, destination);
sqlstatement :=
sqlstatement
|| ''''
|| regexbuffer
|| '''';

insert into weg_temp values (3 ,loopcount+1,'mv ' || ctrl_file_rec.name || ' ' || regexbuffer);
loopcount := loopcount + 1;

END LOOP;

CLOSE ctrl_file_cur;
-- Finish out the sqlstatement
sqlstatement := sqlstatement || ' SCOPE=SPFILE;';
-- Put generated sql statement into the table.
insert into weg_temp values (1,1,sqlstatement);

OPEN redo_cur;

-- Reset loopcount back at 0
loopcount := 0;
LOOP
FETCH redo_cur INTO curholder;

EXIT WHEN redo_cur%NOTFOUND;
regexbuffer := REGEXP_REPLACE (curholder, source, destination);

insert into weg_temp values (4 ,loopcount+1,'mv '|| curholder|| ' '|| regexbuffer);

insert into weg_temp values (8 ,loopcount+1, 'ALTER DATABASE RENAME FILE ''' || curholder || ''' TO ''' || regexbuffer || ''';');

loopcount := loopcount + 1;

END LOOP;

CLOSE redo_cur;

OPEN datafile_cur;

-- Reset loopcount back at 0
loopcount := 0;

LOOP
FETCH datafile_cur INTO curholder;

EXIT WHEN datafile_cur%NOTFOUND;

regexbuffer := REGEXP_REPLACE (curholder, source, destination);
insert into weg_temp values (5 ,loopcount+1,'mv '|| curholder|| ' '|| regexbuffer);

insert into weg_temp values (9 ,loopcount+1, 'ALTER DATABASE RENAME FILE ''' || curholder || ''' TO ''' || regexbuffer || ''';');

loopcount := loopcount + 1;

END LOOP;

CLOSE datafile_cur;


OPEN tempfile_cur;
-- Reset loopcount back at 0
loopcount := 0;

LOOP
FETCH tempfile_cur INTO curholder;

EXIT WHEN tempfile_cur%NOTFOUND;

regexbuffer := REGEXP_REPLACE (curholder, source, destination);
insert into weg_temp values (6 ,loopcount+1,'mv '|| curholder|| ' '|| regexbuffer);

insert into weg_temp values (10 ,loopcount+1, 'ALTER DATABASE RENAME FILE ''' || curholder || ''' TO ''' || regexbuffer || ''';');
loopcount := loopcount + 1;

END LOOP;

CLOSE tempfile_cur;

commit;
END;
/

spool off;

spool &&SID._script.txt

DECLARE
CURSOR weg_temp_cur
IS
SELECT statement
FROM weg_temp
ORDER BY step, seq;

wtr weg_temp_cur%ROWTYPE;
BEGIN
OPEN weg_temp_cur;

LOOP
FETCH weg_temp_cur INTO wtr;

EXIT WHEN weg_temp_cur%NOTFOUND;
DBMS_OUTPUT.put_line (wtr.statement);
END LOOP;

CLOSE weg_temp_cur;
END;
/

spool off;

drop table weg_temp;
exit;

Thursday, February 3, 2011

Grid control 11g... and unable to patch using deployment procedures

Recently upgraded grid control from 10.2.0.5 to 11g, and now unable to apply patches using the standard deployment procedure.

The job bombs out at the initialize step.

Here's the error
Downloading the patches from My Oracle Support.
Error while evaluating expression:[Error evaluating expression: ${doInitialization(step.guid)} : [no oraInstaller in java.library.path]]


Found note 427577.1, installed patches 9495798, 9705138, and 10388644. Still not working.

Couldn't find any additional info, so opened a ticket with Oracle support...

Resolution log:
1) Reran OPatch Update job and Refresh from Metalink, restarted job, still broke
2) Deleted the job, restarted from scratch again and noticed a warning during the "Search and select software updates" screen -- the component already exists in the software library.
So I chose not to overwrite the version in the software library, and instead of failing at the initialize stage, now the job fails a few steps down at the "Stage Patches" step.

Here's that error:

Received null input stream of binary file, null', for the item identified by, 'Components/Oracle Software Updates/p10248523_11.2.0.2.0_46(Revision: 0.1)'. See log file '$ORACLE_HOME/sysman/log/emoms.log on Oracle Management Service host for more details.


At this point my guess is there's something wrong with the software library not getting upgraded correctly from 10g to 11g. We'll have to see what Oracle support recommends...

--- Update 2/8
Oracle support eventually concurred it's likely a problem with the software library. "Trying fix described in Oracle Internal Support Portal: Document 602886.1", which looks to be a pretty straightforward method to rebuild the software library.

Making a cold backup now, will post more info later.
--- Update 2/17

Rebuilding the software library was successful, but did not resolve our issues with patching. Oracle support suggests that either we used the wrong JDK version (we used 1.6.0_24 rather than the required 1.6.0_18), or I did not use the required 64 bit flag when installing the weblogic server.

So -- we're reinstalling grid control in a new location on the same server to try and see if that resolves our patching issues, and if it does migrate all our targets to it.

--- Update 2/17
Found someone else who experienced a similar error to the one above who also had to reinstall.

Seems this is the correct way to start the weblogic installer for grid (where JAVA_HOME is set to your 1.6.0_18 JDK):
$JAVA_HOME/bin/java -d64 -jar wls1032_generic.jar


As for install guides, I think I would recommend this one from CamOra ICT.

Tuesday, February 1, 2011

x forwarding and sudo for oracle installs

From my linux desktop I ssh to my oracle box and forward X packets back to my desktop over ssh.

bill@wgwsvm:~$ ssh -X 192.168.56.101


When I invoke xclock this works like you would expect.

If I wanted to forward xpackets from other users that I might ssh to I have to do two additional steps. Obviously set the DISPLAY and additionally set the x auth token to forward through the connection.

bill@wgdbvm:/home/bill> xauth list
wgdbvm/unix:10 MIT-MAGIC-COOKIE-1 4696e4a4a52fac8af8b85e42195baa84
bill@wgdbvm:/home/bill> echo $DISPLAY
localhost:10.0
bill@wgdbvm:/home/bill> sudo su - oracle
[sudo] password for bill:
oracle@wgdbvm:/home/oracle> xauth add wgdbvm/unix:10 MIT-MAGIC-COOKIE-1 4696e4a4a52fac8af8b85e42195baa84
oracle@wgdbvm:/home/oracle> export DISPLAY=localhost:10.0




One note, when disconnecting from my bill account the xauth file is cleaned up automatically for me. But oracle's xauth will still contain the token, so it needs to be deleted manually.


oracle@wgdbvm:/home/oracle> xauth list
wgdbvm/unix:10 MIT-MAGIC-COOKIE-1 4696e4a4a52fac8af8b85e42195baa84
oracle@wgdbvm:/home/oracle> xauth remove wgdbvm/unix:10