Thursday, June 30, 2011

Fun with ancient clients

User was reporting an error with ODBC after a recent upgrade to 11g:


Unable to connect
SQLState=28000
[Oracle][ODBC][Ora]ORA-01017: invalid username/password; login denied


The user reported, and the tech verified that on this machine they could login with sqlplus but not ODBC on this PC. Multiple accounts were tried.

Interesting test case illuminated the issue: Created user with ALL UPPERCASE PASSWORD. Using this, could connect through ODBC.

The ancient 9i ODBC connection will not send a case sensitive password.

Made the following change, problem solved, user is happy though we probably could use this as an opportunity to force upgrading ODBC clients:


alter system set sec_case_sensitive_logon=FALSE scope=both

Empty variables can be troublesome

Upgrading a 10.2.0.4 ERP test database to 11.2.0.2 today, and shortly after starting the upgrade via DBUA, got the following error:

ORA-16024: parameter LOG_ARCHIVE_DEST_1 cannot be parsed


I had chosen to disable archiving as one of the options for DBUA, but was really surprised to see this error. After a little bit of research I found Metalink Doc 96278.1.

Turns out this error was happening because we had log_archive_dest along with log_archive_dest_1 set -- the kicker was it was set to be blank. Here's the relevant info from the pfile:


*.log_archive_dest=''
*.log_archive_dest_1='LOCATION=/u01/app/oracle/admin//arch'


So the trick was to completely remove the entire *.log_archive_dest='' line from the pfile and DBUA was able to proceed with the upgrade without the ORA-16024 error.

Tuesday, June 21, 2011

Death by a thousand trace files


Found a large number of trace files generated in the /udump admin directory.

Each file looked like the following:
$ cat orcl_ora_12191.trc
/u01/app/oracle/admin/ORCL/udump/orcl_ora_12191.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: SunOS
Node name: hostname
Release: 5.10
Version: Generic_144488-14
Machine: sun4u
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 389
Unix process pid: 12191, image: oracle@hostname

opiino: Attach failed! error=-1 ifvp=0
Not much help on support.oracle.com or google in diagnosing this issue.

Oracle support was helpful in doing some tracing which proved that these were failures to connect rather than existing connections that were lost.

With that information in hand, we correlated the timestamps of the trace files to timestamps of listener connections. This narrowed our issue down to a specific IP address, and by that a specific application.

Our lead application administrator was able to determine the root cause of the failed connections, an outdated library used in development of the application:
From what I can tell CoreLab Oracle is now dotConnect for Oracle from Devart. They resolved this issue with version 5.70.140.

Whew.

UDM for Banner Jobsub and other subsystems

Over the weekend we had a SungardHE Banner subsystem offline. This particular subsystem uses Oracle Pipes to manage user job submission. The downtime could have been prevented if this subsystem had been represented as a target within Grid Control.

So, to prevent future outages, we're going to make this process a User Defined Metric, and have grid control fire off a notification if the service is offline. At the end of every maintenance window, if Grid Control says all systems are online, then my team can have more confidence that all systems are actually online.

Found a great article by Sagar Patil titled Oracle Grid Tracking OS Process Using a Custom User Defined Matrix udmshell script, which is the basis for what follows.

I created the following script that determines if ARG1 has ARG2 processes running and returns either Online or Offline:

#!/usr/local/bin/bash

if [ $# -ne 2 ]
then
echo "Usage: `basename $0` {ORACLE_SID} {Expected Count}"
exit 1
fi
toCheck=$1
testEm=`/bin/ps -ef | /usr/local/bin/grep -v grep | /usr/local/bin/grep -v $0 |
/usr/local/bin/grep $toCheck | /usr/local/bin/wc -l`

if [ $testEm -eq $2 ]
then
echo "em_result=Online"
else
echo "em_result=Offline"
fi
Next I create the UDM.
From the target Host screen, click on "User-Defined Metrics", then click create.

Here's the parameters I used:
Metric Name: GURJOBS DEVL
Metric Type: String
Command Line: /banner/u01/app/oracle/product/agent11g/sysman/emd/custom/checkSubsystems.sh gurjobDEVL 1
User Name: Oracle
Comparison Operator: Match
Critical: Offline

The rest I left as default. After hitting OK, it took about 15 minutes to show that gurjobDEVL was in fact online. Shutting down gurjobs in DEVL returned an error in the UDM screen but did not send a notification.

To enable notifications, I had to do the following:
Grid Control User Preferences -> Rules -> Select "Host Availability and Critical States", then click Edit. In the Metrics tab, click Add, enter "User" in the search and click Go.

To enable the metric, check the box next to User Defined Metric, and then choose "Critical" and "Clear" severity states, then click continue, and then OK.

Whoever is subscribed to that notification rule will be sent a notification per their preferences.

In this simple case, knowing that Banner Jobsub is offline is great. Another example would be with CAPP Pipes, the next step is to build in automatic restart scripts. That will have to wait for another blog post.