Ludovico – DBA survival BLOG

BP and Patch 22652097: set optimizer_adaptive_statistics to FALSE explicitly or it might not work!


According to Nigel’s blog post:

The Oracle 12.1.0.2 October 2017 BP and the Adaptive Optimizer

if you installed the patch 22652097 prior to applying the Bundle Patch 171018, the BP apply in the database should recognize that the patch was already in place and keep it activated. This is done through the fix control 26664361.

When fix_control 26664361 is set to 0, patch 22652097 is not enabled: the parameter optimizer_adaptive_features (OAF) works.

When fix_control 26664361 is set to 1, patch 22652097 is enabled: optimizer_adaptive_features is discarded and the two new parameters take priority: optimizer_adaptive_plans (OAP) and optimizer_adaptive_statistics (OAS).
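
You can check which of the two behaviors is active by querying the fix control (the same check is used in the proof below):

SQL> select value from v$system_fix_control where bugno = 26664361;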

But at my customer's site, I observed a different behavior.

My patching story might be very similar to yours!

When I started upgrading my customer's databases to 12c in early 2015, I soon experienced the infamous problems with SQL Plan Directives (SPD) and Adaptive Dynamic Sampling (ADS) that I described in my paper: ADAPTIVE FEATURES OR: HOW I LEARNED TO STOP WORRYING AND TROUBLESHOOT THE BOMB.

Early fixes

When I was new to the problem, the quick fix for the problematic applications was to set OAF to FALSE.

Later, I discovered some more details and decided to opt for setting:

_optimizer_dsdir_usage_control=0

In other cases, I disabled the specific directives that were causing problems.
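
For reference, a single directive can be disabled with DBMS_SPD (the directive ID here is purely illustrative; take yours from DBA_SQL_PLAN_DIRECTIVES):

SQL> exec dbms_spd.alter_sql_plan_directive(1484026771529551585, 'ENABLED', 'NO');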

But many databases did not have so many problems, and I left the defaults.

Patch 22652097 on top of BP170718 

At some point, my customer and I decided to apply the fix 22652097, on top of BP170718, which was our patch level at that time.

The patch installation on a test database complained about optimizer_adaptive_features being set: this parameter was not used anymore. This issue is nicely explained by Flora in her post Patch 22652097 in 12.1 makes optimizer_adaptive_features parameter obsolete.

In order to apply that patch on the remaining databases, we did:

  • alter system reset optimizer_adaptive_features;
  • alter system reset “_optimizer_dsdir_usage_control”;
  • Applied the patch on the binaries and ran datapatch on the databases (sketched below).
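
A minimal sketch of that last step (the patch staging path is illustrative):

$ cd /u01/stage/22652097
$ opatch apply
$ # then, for each database running from this Home:
$ $ORACLE_HOME/OPatch/datapatch -verbose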

The result at this point was that:

  • optimizer_adaptive_features was not set
  • optimizer_adaptive_plans was set to true
  • optimizer_adaptive_statistics was set to false.

It might seem superfluous to say, but it is not: the SQL Plan Directives were not used anymore, so no Adaptive Dynamic Sampling and no performance problems.

Bundle Patch 180116

Three weeks ago, we installed the latest Bundle Patch in order to fix some Grid Infrastructure problems. The BP, as described in Nigel's note (and by Mike Dietrich and many other bloggers :-)), contains the patch 22652097.

According to Nigel's post, the BP installation should have detected that patch 22652097 was already there and activated it.

And indeed, after we applied the BP, the fix_control 26664361 was set to 1 (meaning that patch 22652097 is enabled). So we went live with this setup, without additional checks.

One week later, we started experiencing performance problems again. I noticed immediately that Adaptive Dynamic Sampling was very aggressive again, and that the SQL Plan Directives were used again.

But the fix was there AND ENABLED!

After a few tests, I realized that the SPDs stop being used only if I set optimizer_adaptive_statistics EXPLICITLY to false.

optimizer_adaptive_statistics must be set explicitly, the default does not work

And here’s the proof:

I use once again the great SPD example by Tim Hall (sorry Tim, it's not the first time that I steal your work 🙂 ). You can find it here:

SQL Plan Directives in Oracle Database 12c Release 1 (12.1)

After applying the BP, the parameter is at its default value, not set explicitly, and the fix_control is enabled:

SQL> select value from v$system_fix_control where bugno = 26664361;

     VALUE
----------
         1

SQL> select name, value, isdefault, ismodified from v$parameter where name='optimizer_adaptive_statistics';  
  
NAME                                     VALUE                          ISDEFAULT ISMODIFIED
---------------------------------------- ------------------------------ --------- ----------
optimizer_adaptive_statistics            FALSE                          TRUE      FALSE

If I run the test statement (again, you can find it here: https://oracle-base.com/articles/12c/sql-plan-directives-12cr1), the directives are used:
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         *
  3  FROM   tab1
  4  WHERE  gender = 'M'
  5  AND    has_y_chromosome = 'Y';

...

10 rows selected.
  
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));  
  
PLAN_TABLE_OUTPUT  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
SQL_ID  5t8y8p5mpb99j, child number 0  
-------------------------------------  
SELECT /*+ GATHER_PLAN_STATISTICS */        * FROM  tab1 WHERE  gender  
= 'M' AND    has_y_chromosome = 'Y'  
  
Plan hash value: 1552452781  
  
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |     10 |00:00:00.01 |       4 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB1            |      1 |     10 |     10 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN                  | TAB1_GENDER_IDX |      1 |     10 |     10 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
  1 - filter("HAS_Y_CHROMOSOME"='Y')  
  2 - access("GENDER"='M')  
  
Note  
-----  
  - dynamic statistics used: dynamic sampling (level=2)  
  - 2 Sql Plan Directives used for this statement  
      
      
26 rows selected.

but then I set the parameter explicitly:
SQL> alter system flush shared_pool;  
  
System altered.  
  
SQL> alter system set optimizer_adaptive_statistics=false;  
  
System altered.  
  
SQL> select name, value, isdefault, ismodified from v$parameter where name='optimizer_adaptive_statistics';  
  
NAME                                     VALUE                          ISDEFAULT ISMODIFIED
---------------------------------------- ------------------------------ --------- ----------
optimizer_adaptive_statistics            FALSE                          TRUE      MODIFIED

and the SPD usage (and, consequently, ADS) is gone:
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */  
       *  
FROM   tab1  
WHERE  gender = 'M'  
AND    has_y_chromosome = 'Y';  
  
  
        ID G H  
---------- - -  
         1 M Y  
         2 M Y  
         3 M Y  
         4 M Y  
         5 M Y  
         6 M Y  
         7 M Y  
         8 M Y  
         9 M Y  
        10 M Y  
  
10 rows selected.  
  
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));  
  
PLAN_TABLE_OUTPUT  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
SQL_ID  5t8y8p5mpb99j, child number 0  
-------------------------------------  
SELECT /*+ GATHER_PLAN_STATISTICS */        * FROM   tab1 WHERE  gender  
= 'M' AND    has_y_chromosome = 'Y'  
  
Plan hash value: 1552452781  
  
-----------------------------------------------------------------------------------------------------------------  
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
-----------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                    |                 |      1 |        |     10 |00:00:00.01 |       4 |  
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB1            |      1 |     25 |     10 |00:00:00.01 |       4 |  
|*  2 |   INDEX RANGE SCAN                  | TAB1_GENDER_IDX |      1 |     50 |     10 |00:00:00.01 |       2 |  
-----------------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - filter("HAS_Y_CHROMOSOME"='Y')  
   2 - access("GENDER"='M')  
      
      
21 rows selected.

Conclusion

Set the parameter EXPLICITLY when you apply the BP that contains the fix.
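
It is the same statement used in the test above; as a minimal reminder (adapt SCOPE and SID to your environment):

SQL> alter system set optimizer_adaptive_statistics=false scope=both;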

And ALWAYS test the behavior!

You can check how many statements use dynamic sampling by following this short blog post by Dominic Brooks:

Which of my sql statements are using dynamic sampling?

HTH


Basic Vagrantfile for multiple groups of VMs


In case you want to quickly prepare multiple sets of machines with Vagrant, ready for different setups, this might be something for you:

## -*- mode: ruby -*-
## vi: set ft=ruby :

require 'ipaddr'

###############################
# CUSTOM CONFIGURATION START
###############################

# lab_name is the name of the lab where all the files will be organized.
lab_name = "lab_bigdata"

# here is where you download your software, so it will be available to the VMs.
sw_path  = "C:\\Users\\ludov\\Downloads\\Software"

# cluster(s) definition
clusters = [
  {
  :prefix  => "hadoop", 				# prefix: VMs will be named prefix01, prefix02, etc
  :domain  => "ludovicocaldara.net",	# domain name
  :box     => "ludodba/ol7.3-base",		# base box, either "ludodba/ol7.3-base" or "ludodba/ubu1604"
  :nodes   => 3,						# number of nodes for this cluster
  :cpu     => 1,
  :mem     => 2048,
  :publan  => IPAddr.new("192.168.56.0/24"), 	# public lan for the cluster
  :publan_start => 121							# starting IP, each VM will increment it by one
  },
  {
  :prefix  => "kafka",							# optionally, continue with another cluster!
  :domain  => "ludovicocaldara.net",
  :box     => "ludodba/ol7.3-base",
  :nodes   => 1,
  :cpu     => 1,
  :mem     => 2048,
  :publan  => IPAddr.new("192.168.56.0/24"),
  :publan_start => 131
  },
  {
  :prefix  => "postgres",
  :domain  => "ludovicocaldara.net",
  :box     => "ludodba/ubu1604",
  :nodes   => 1,
  :cpu     => 1,
  :mem     => 2048,
  :publan  => IPAddr.new("192.168.56.0/24"),
  :publan_start => 141
  }
]

###############################
# CUSTOM CONFIGURATION END
###############################

######################################################
# Extending class IPAddr to add a method that prints the address in CIDR notation
class IPAddr
  def to_cidr_s
    if @addr
      mask = @mask_addr.to_s(2).count('1')
      "#{to_s}/#{mask}"
    else
      nil
    end
  end
end # extend class IPAddr

########
# MAIN #
########

Vagrant.configure(2) do |config|
  config.ssh.username = "root"  	# my boxes are password based for simplicity
  config.ssh.password = "vagrant"
  config.vm.graceful_halt_timeout = 360	# in case you install grid infra... do not force shutdown after a few seconds

  if File.directory?(sw_path)
    # our shared folder for oracle 12c installation files (uid 54320 is grid, uid 54321 is oracle)
    config.vm.synced_folder sw_path, "/media/sw", :mount_options => ["dmode=775","fmode=775","uid=54322","gid=54328"]
  end

  # looping through each cluster
  (0..(clusters.length-1)).each do |cluid|

    # assign variable clu to current cluster, for convenience
    clu = clusters[cluid]
      
    # looping through each node in the cluster
    (1..(clu[:nodes])).each do |nid|

      # let's start from the last node (see RAC Attack automation for the reason) :-)
      nid = clu[:nodes]+1-nid
      config.vm.define vm_name = "#{clu[:prefix]}%02d" % nid do |cnf|
	  
		# set the right box for the VM
		cnf.vm.box = clu[:box]
		if (clu[:box_version]) then
			cnf.vm.box_version = clu[:box_version]
		end #if
		
		# the new vm name
        vm_name = "#{clu[:prefix]}%02d" % nid
        fqdn = "#{vm_name}.#{clu[:domain]}"
        cnf.vm.hostname = "#{fqdn}"

		# incrementing public ip for the cluster
        pubip = clu[:publan].|(clu[:publan_start]+nid-1).to_s

        cnf.vm.provider :virtualbox do |vb|
          #vb.linked_clone = true  # in case you want thin provisioning. read the vagrant doc before setting it
          vb.name = vm_name
          vb.gui = false
          vb.customize ["modifyvm", :id, "--memory", clu[:mem]]
          vb.customize ["modifyvm", :id, "--cpus",   clu[:cpu]]
          vb.customize ["modifyvm", :id, "--groups", "/#{lab_name}/#{clu[:prefix]}"]
        end #config.vm.provider
		
        # Configuring virtualbox network for #{pubip}
        cnf.vm.network :private_network, ip: pubip

      end #config.vm.define
    end #loop nodes
  end  #loop clusters
end #Vagrant.configure

The nice thing (besides speeding up the creation and basic configuration) is the organization of the directories. The configuration at the beginning of the script results in five virtual machines:

your VM directory
        |- lab_bigdata 
                |- hadoop
                        |- hadoop01  (ol7)
                        |- hadoop02  (ol7)
                        |- hadoop03  (ol7)
                |- kafka
                        |- kafka01   (ol7)
                |- postgres
                        |- postgres01  (ubuntu 16.04)
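
As usual with Vagrant, you can then bring up a single VM, a whole group, or everything at once:

$ vagrant up postgres01                  # one VM
$ vagrant up hadoop01 hadoop02 hadoop03  # one group
$ vagrant up                             # all five VMs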

It is based, in part (though heavily modified and simplified), on the RAC Attack automation scripts by Alvaro Miranda.

I have a more complex version that automates all the tasks for a full multi-cluster RAC environment, but if that is your requirement, I would rather point you to the oravirt scripts on GitHub (https://github.com/oravirt). They are much more powerful and complete (and complex…) than my Vagrantfile. 🙂

Cheers

DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL not working on 12c? Here’s why…


It is bad to realize, after a few years, that my customer’s Audit Cleanup procedures are not working properly for every database…

NOTE: The post is based on standard audit, not unified audit.

My customer developed a quite nice procedure for database housekeeping (including diag dest, OS audit trail, recyclebin, DB audit…).

But after some performance problems, I came across the infamous sql_id 4ztz048yfq32s:

SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp, COUNT(username) AS failed_count, TO_CHAR(MIN(timestamp), 'yyyy-mm-dd hh24:mi:ss') AS first_occur_time, TO_CHAR(MAX(timestamp), 'yyyy-mm-dd hh24:mi:ss') AS last_occur_time
FROM sys.dba_audit_session
WHERE returncode != 0 AND timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00')

This SQL comes from the “Failed Logon Attempts” metric in Enterprise Manager.

I checked the specific database, and the table SYS.AUD$ contained way too many rows, dating from before our purge time:

SQL> select min(timestamp) from dba_audit_session;

MIN(TIMESTAMP)
-------------------
04.02.2017 07:01:20

SQL>  select dbid, count(*) from aud$ group by dbid;

      DBID   COUNT(*)
---------- ----------
2416611527   35846477

The cleanup procedure basically does this:

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-31);
  4  end;
  5  /

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> begin
  2  dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    use_last_arch_timestamp => TRUE);
  5  end;
  6  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:38.34

But despite a retention window of 31 days, the rows are still there:

SQL> select min(timestamp) from dba_audit_session;

MIN(TIMESTAMP)
-------------------
04.02.2017 07:01:20

Elapsed: 00:00:29.06

(today is 27.04.2018, so the oldest records are more than 1 year old)

I checked with ASH: the actual delete statement executed by the clean_audit_trail procedure is:

DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2017-02-04 05:01:10', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440

So, the DBID clause is OK, but the NTIMESTAMP# clause is not!

Why?

Long story long (hint: it's a bug, not filed yet):

The cleanup metadata is stored in the view DBA_AUDIT_MGMT_LAST_ARCH_TS. Its structure in 11g was:

SQL> desc dba_audit_mgmt_last_arch_ts
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 AUDIT_TRAIL                                        VARCHAR2(20)
 RAC_INSTANCE                              NOT NULL NUMBER
 LAST_ARCHIVE_TS                                    TIMESTAMP(6) WITH TIME ZONE

But in 12c, there are 2 new columns:

SQL> desc dba_audit_mgmt_last_arch_ts
 Name                                  Null?    Type
 ------------------------------------- -------- ----------------------------
 AUDIT_TRAIL                                    VARCHAR2(20)
 RAC_INSTANCE                          NOT NULL NUMBER
 LAST_ARCHIVE_TS                                TIMESTAMP(6) WITH TIME ZONE
 DATABASE_ID                           NOT NULL NUMBER
 CONTAINER_GUID                        NOT NULL VARCHAR2(33)

When the database is upgraded from 11g to 12c, the two new columns are set to “0” by default.

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL                 RAC_INSTANCE LAST_ARCHIVE_TS                      DATABASE_ID CONTAINER_GUID
--------------------------- ------------ ------------------------------------ ----------- --------------------------------
STANDARD AUDIT TRAIL                   0 04-FEB-17 05.01.10.000000 AM +00:00            0 00000000000000000000000000000000
OS AUDIT TRAIL                         1 04-FEB-17 05.01.15.000000 AM +02:00            0 00000000000000000000000000000000

But when the procedure DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP is executed, the actual DBID is used, and new lines appear:

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL                 RAC_INSTANCE LAST_ARCHIVE_TS                      DATABASE_ID CONTAINER_GUID
--------------------------- ------------ ------------------------------------ ----------- --------------------------------
STANDARD AUDIT TRAIL                   0 04-FEB-17 05.01.10.000000 AM +00:00            0 00000000000000000000000000000000
OS AUDIT TRAIL                         1 04-FEB-17 05.01.15.000000 AM +02:00            0 00000000000000000000000000000000
STANDARD AUDIT TRAIL                   0 27-MAR-18 12.29.55.000000 PM +00:00   2416611527 4A2962517EF2316FE0532296780AE383
OS AUDIT TRAIL                         1 27-MAR-18 12.20.06.000000 PM +02:00   2416611527 4A2962517EF2316FE0532296780AE383

It is clear now that the DELETE statement is not constructed properly. It should get the LAST_ARCHIVE_TS of the actual DBID being purged… but it takes the other one.

According to my tests, it uses neither the correct timestamp for the DBID nor the oldest timestamp: it uses the timestamp of the first record matching the clause "WHERE AUDIT_TRAIL='STANDARD AUDIT TRAIL'". So it depends on the physical location of the row in the table! Clearly a big mess… (PS: not 100% sure, but this is what I suppose.)

So, I have tried to modify the archive time for DBID 0:

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-31
  4                          ,database_id => 0
  5                          ,container_guid => '00000000000000000000000000000000');
  6  end;
  7
  8  /

PL/SQL procedure successfully completed.

SQL> select database_id, audit_trail, last_archive_ts from dba_audit_mgmt_last_arch_ts;

DATABASE_ID AUDIT_TRAIL                   LAST_ARCHIVE_TS
----------- ----------------------------- ----------------------------------------
          0 STANDARD AUDIT TRAIL          27-MAR-18 12.37.22.000000 PM +00:00
          0 OS AUDIT TRAIL                04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL          27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                27-MAR-18 12.20.06.000000 PM +02:00

Trying to execute the cleanup again now leads to a better timestamp:

DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2018-03-27 12:37:22', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440

I have then tried to play a little bit with the DBA_AUDIT_MGMT_LAST_ARCH_TS view (and the underlying table DAM_LAST_ARCH_TS$).

First, I’ve faked the DBID:

SQL> update dba_audit_mgmt_last_arch_ts set database_id=2416611526 where database_id=0;

2 rows updated.

SQL> commit;

Commit complete.
SQL> select database_id, audit_trail, last_archive_ts from DBA_AUDIT_MGMT_LAST_ARCH_TS;

DATABASE_ID AUDIT_TRAIL                                                  LAST_ARCHIVE_TS
----------- ------------------------------------------------------------ ---------------------------------------------------------------------------
 2416611526 STANDARD AUDIT TRAIL                                         27-MAR-18 12.37.22.000000 PM +00:00
 2416611526 OS AUDIT TRAIL                                               04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL                                         27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                                               27-MAR-18 12.20.06.000000 PM +02:00

Then, I have tried to increase the retention timestamp (500 days):

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-500
  4                          ,database_id => 2416611526
  5                          ,container_guid => '00000000000000000000000000000000');
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select database_id, audit_trail, last_archive_ts from dba_audit_mgmt_last_arch_ts;

DATABASE_ID AUDIT_TRAIL                                                  LAST_ARCHIVE_TS
----------- ------------------------------------------------------------ ---------------------------------------------------------------------------
 2416611526 STANDARD AUDIT TRAIL                                         13-DEC-16 12.48.23.000000 PM +00:00
 2416611526 OS AUDIT TRAIL                                               04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL                                         27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                                               27-MAR-18 12.20.06.000000 PM +02:00

Finally, I have tried to purge the audit trail with both DBIDs:

SQL> begin
  2  dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    database_id =>   2416611526,
  5    use_last_arch_timestamp => TRUE);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:45.89

SQL> begin
  2   dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    database_id =>   2416611527,
  5     use_last_arch_timestamp => TRUE);
  6  end
  7  ;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:34.72

As I expected, in both cases the cleanup generated the delete with the timestamp of the fake DBID:

-- clean audit trail for dbid 2416611526 
DELETE FROM SYS.AUD$ WHERE DBID = 2416611526 AND NTIMESTAMP# < to_timestamp('2016-12-13 12:48:23', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440

-- clean audit trail for dbid 2416611527
DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2016-12-13 12:48:23', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440

Is it possible to delete the unwanted records from the view DBA_AUDIT_MGMT_LAST_ARCH_TS?

Not only is it possible, I actually recommend it:

SQL> delete from dba_audit_mgmt_last_arch_ts where database_id=2416611526;

2 rows deleted.

SQL> commit;

Commit complete.

SQL>

Afterwards, the timestamp in the WHERE condition is correct, and it remains correct after subsequent executions of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP.

Conclusions, IMPORTANT FOR DATABASE OPERATIONS:

The upgrade from 11g to 12c causes the unwanted lines with DBID=0 in the DBA_AUDIT_MGMT_LAST_ARCH_TS view.

Moreover, every database duplication changes the DBID: any subsequent execution of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP in the duplicated database will lead to additional lines in the view.

This is what I plan to do now:

  • Whenever I upgrade from 11g to 12c, I clean up the data from DBA_AUDIT_MGMT_LAST_ARCH_TS and schedule the cleanup for DBID 0 as well
  • Whenever I duplicate a database, I execute a DELETE (without clauses) from DBA_AUDIT_MGMT_LAST_ARCH_TS and truncate the table SYS.AUD$ (it is a duplicate, after all!), as sketched below
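
A minimal sketch of that post-duplicate cleanup (run as SYS, and double-check that you are connected to the duplicated database!):

SQL> delete from dba_audit_mgmt_last_arch_ts;
SQL> commit;
SQL> truncate table sys.aud$;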

HTH

Oracle Home Management – part 1: “Patch soon, patch often” vs. reality


With this post, I am starting a new blog series about Oracle Database home management, provisioning and patching: best (and worst) practices, common practices and blueprints from my point of view as a consultant and, sometimes, as an operational DBA.

I hope to find the time to continue (and finish) it 🙂

How often should you upgrade/patch?

Database patching and upgrading is not an easy task, but it is really important.

Many companies do not have a clear patching strategy, for several reasons.

  • Patching is time consuming
  • It is complex
  • It introduces some risks
  • It is not always really necessary
  • It leads to human errors

Oracle, of course, recommends applying the patches quarterly, as soon as they are released. But the reality is that it is (still) very common to find customers that do not apply patches regularly.

Look at this:

$ opatch lspatches
26925218;OCW Patch Set Update : 12.1.0.2.180116 (26925218)
26925263;Database Bundle Patch : 12.1.0.2.180116 (26925263)
22243983;

OPatch succeeded.

$ cd $ORACLE_HOME/inventory
$ grep -r "bug description" * |  wc -l
1883
$ grep -r "bug description" * | grep -i "wrong result" | wc -l
56

With the January 2018 Bundle Patch, you can fix 1883 bugs, including 56 "wrong results" bugs! I hope to talk more about this kind of bug in the future, but for now consider that if you are not patching often, you are taking serious risks, including putting your data consistency at risk.

I will not talk about bugs, upgrade procedures or new releases here. For that, I recommend following Mike Dietrich's blog: Upgrade your Database – NOW!

I would rather talk, as the title of this blog series states, about approaches to maintaining the Oracle Homes across your Oracle server farm.

Common worst practices in maintaining homes

Maintaining a plethora of Oracle Homes across different servers requires thoughtful planning. This is a non-exhaustive list of bad practices that I see from time to time.

  • Installing by hand every new Oracle Home
  • Applying different patch levels on Oracle Homes with the same path
  • Not tracking the installed patches
  • Having Oracle Home paths hard-coded in the operational scripts
  • Not minding the Oracle Home path naming convention
  • Not minding the Oracle Home internal names
  • Copying Oracle Homes without minding the Central Inventory

All these worst practices lead to what I like to call "patching madness"… that monster that makes regular patching very difficult or impossible.

THIS IS A SITUATION THAT YOU NEED TO AVOID:

Server A
/u01/app/oracle/product/12.1.0            -> Home "OraHOme12C", contains clean 12.1.0.2

Server B
/u01/app/oracle/product/12.1.0.2          -> Home "OraHome1",   contains 12.1.0.2.PSU161018
/u01/app/oracle/product/12.1.0.2.BP170117 -> Home "OraHome2",   contains 12.1.0.2.BP170117

Server C
/u01/app/oracle/product/12.1.0            -> Home "OraHome1",   contains clean 12.1.0.1
/u01/app/oracle/product/12.1.0.2          -> Home "DBHome_1",   contains 12.1.0.2.BP170117

A better approach would be to start with some naming conventions, e.g.:

Server A
/u01/app/oracle/product/12.1.0.2           -> Home "Ora12cR2",           contains clean 12.1.0.2

Server B
/u01/app/oracle/product/12.1.0.2.PSU161018 -> Home "Ora12cR2_PSU161018", contains 12.1.0.2.PSU161018
/u01/app/oracle/product/12.1.0.2.BP170117  -> Home "Ora12cR2_BP170117",  contains 12.1.0.2.BP170117

Server C
/u01/app/oracle/product/12.1.0.1           -> Home "Ora12cR1",           contains clean 12.1.0.1
/u01/app/oracle/product/12.1.0.2.BP170117  -> Home "Ora12cR2_BP170117",  contains 12.1.0.2.BP170117

In the next blog post, I will talk about common patching patterns and their pitfalls.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 2: Common patching patterns


Let’s see some common approaches to Oracle Home patching.

First, how patches are applied

No, I will not talk about how to use opatch 🙂 This is an overview of the "high-level" methods… when you have multiple servers and (possibly) multiple databases per server.

Worst approach (big bang)

1. Stop everything

2. In-place binaries patching

3. Database patching, "big bang" mode

4. Start everything

With this approach, you have a big downtime, a maintenance window that is hard to get (all applications are down at the same time), no control over a single database, and no easy rollback in case your binaries get compromised or corrupted by the patch apply.

(diagram: in-place patching)

Another bad approach (new install and out-of-place patching)

1. Re-install binaries manually in a new path

2. Patch the new binaries

3. Stop, change OH, patch databases one by one

4. Decommission old binaries

(diagram: out-of-place patching)

This approach is much better than the previous one, but it still has some pitfalls:

  • If you have many servers and environments, doing it frequently might be a challenge
  • Rollback scripts are not copied automatically: datapatch will fail unless you copy them by hand
  • New installs introduce potential human error, unless you use unattended installs with your own scripts
  • Do you like to run opatch apply all the time, after all?

Better approach (software cloning)

This approach is very close to the previous one, except that the new Oracle Home is not installed from scratch but rather cloned from an existing one. This way, the rollback scripts used by the datapatch binary will be there, and there will be no errors when patching the databases.

The procedure for Oracle Home cloning is described in the Oracle Documentation, here.
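
In short, the cloning procedure boils down to something like this (just a sketch: the paths and the Home name are illustrative, check the documentation for your exact release):

$ # copy an existing Home to the new path (or tar/scp it to another node)
$ cp -rp /u01/app/oracle/product/12.1.0.2 /u01/app/oracle/product/12.1.0.2.BP170117
$ # register the clone in the central inventory
$ cd /u01/app/oracle/product/12.1.0.2.BP170117/clone/bin
$ perl clone.pl ORACLE_HOME=/u01/app/oracle/product/12.1.0.2.BP170117 \
       ORACLE_HOME_NAME=OraHome_BP170117 ORACLE_BASE=/u01/app/oracle
$ # now patch the new Home to the desired level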

Another cool thing is that you can clone Oracle Homes across different nodes, so that you can have the same patch level everywhere without repeating the tedious tasks of upgrading OPatch, patching the binaries, etc.

But still, you have to identify which Oracle Home to clone and keep track of the latest version.

Best approach (Golden Images)

The best approach would consist in having a central repository for your software, where you store every version of your Oracle Homes, one for each patch level.

Having a central repository allows you to install the software ONCE and use a "clone, patch and store it" strategy. You can, for example, use only one server to do all the patching and then distribute your software images to the different database servers.

This is the concept of Golden Images used by Rapid Home Provisioning that will be in the scope of my next blog post.

 

Second, which patches are applied

Now that we have seen some Oracle Home patching approaches, it is worth knowing which patches matter in a patching strategy.

It is worth getting used to the differences between PSU/BP and RU/RUR by reading this valuable post from Mike Dietrich:

Differences between PSU / BP and RU / RUR

I will assume that, in every case, the critical patches should be applied quarterly, or at least once per year, in order to fix security bugs.

The conservative approach (stability and performance over improvements)

Prior to 12.2, in order to guarantee security and stability, the best approach was to apply only PSUs each quarter.

From 12.2 onwards, the most conservative approach is to apply the latest Release Update Revision on top of the oldest possible Release Update. Confusing? Things will be clearer when I write about the 18c New Release Model in a few days…

The cowboy approach (improvements over stability and performance)

Sometimes Bundle Patches and Release Updates contain cool backports from the new releases; sometimes they just contain more bug fixes than the PSUs and RURs; sometimes they fix important stuff, like disabling bad transformations that lead to wrong-result bugs or other annoying issues.

Personally, I prefer to include such improvements in my patching strategy: I regularly apply RUs for releases >= 12.2 and BPs for releases <= 12.1. Don't call me a cowboy, however 🙂

The incumbent approach (or why you cannot avoid one-offs)

Whatever your patching frequency: sometimes you hit a bug, and the only solution is to apply either the one-off patch or the workaround, if available.

If you apply the one-off patch for a specific bug, from an Oracle Home maintenance point of view it is better to either:

  • apply the same one-off everywhere (read: on all your Oracle Homes with the very same release), which keeps your environment homogeneous;

or

  • use a clone of the Oracle Home with the one-off as the basis to apply the release update, and distribute it to the other servers.

Why?

Again, it is a problem of rollback scripts, of patch conflicts and also of the number of versions to maintain. Fewer paths, less error-prone!

There is, however, an alternative to one-offs: implementing the workaround instead of applying the patch. Most of the time the workaround consists in disabling "something" through parameters or, worse, hidden parameters (the underscore parameters that Support says you should never set, but advises to set all the time as a workaround :-))

It might be a good idea to use the workaround instead of applying the patch if you already know that the bug will be fixed in the next Release Update (for example), or if the workaround is so easy to implement that it is not worth creating another Oracle Home version that will require special attention at the next quarter.

If you apply workarounds, anyway, be sure to comment EXACTLY why, when and by whom, so you can decide whether to unset it at the next parameter review or maintenance… e.g.

alter system set "_px_groupby_pushdown"=off
  comment='Ludo, 03.05.16: W/A for bug 18499088' scope=both sid='*';

alter system set "_fix_control"='14033181:0','11843466:off','26664361:7','16732417:1','20243268:1' 
  comment='Ludo, 20.11.17: fixes of BP171017 + W/A bugs 21303294 24499054' scope=spfile sid='*';

Makes sense?

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning


In the previous post I mentioned that having a central repository storing golden images would be the best solution for Oracle Home provisioning.

In this context, Oracle provides Rapid Home Provisioning: a product included in Oracle Grid Infrastructure that automates the provisioning and patching of Oracle Database and Grid Infrastructure Homes, databases and also generic software.

(diagram: RHP concept)

Oracle Rapid Home Provisioning simplifies software provisioning tremendously: you can use it to create golden images from existing installations and then deploy them locally, across different nodes, on local or remote clusters, standalone servers, etc.

Having a central store with enforced naming conventions ensures software standardization across the whole Oracle farm, and makes patching easier and less risky. It also allows patching existing databases by moving them to Oracle Homes with a higher patch level, taking care of service draining and rolling upgrades where RAC or RAC One Node deployments exist. Multiple databases can be patched in a single batch using one single rhpctl command.
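
For example, moving a database from one working copy to a patched one is a one-liner (the working copy and database names here are illustrative):

$ rhpctl move database -sourcewc WC_12102_BP170718 -patchedwc WC_12102_BP180116 -dbname MYDB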

I will not explain the technical details of Rapid Home Provisioning implementation and operation here. I already did a webinar about it a couple of years ago for the RAC SIG:

Burt Clouse, the RHP product manager, also did a presentation about Rapid Home Provisioning 12c Release 2, highlighting some new features that the product was missing in the first release:

More details about the new features can be found here:

https://blogs.oracle.com/db_maintenance/whats-new-in-122-for-rapid-home-provisioning-and-maintenance

Close to being the perfect product, but…

If Rapid Home Provisioning is so powerful, what makes it less appealing to most users?

In my opinion (read: very own personal opinion 🙂 ), there are two main factors:

First: The technology stack RHP is relying on is quite complex

Although Rapid Home Provisioning 12c Release 2 allows Oracle Home deployments on standalone servers (this was not the case with 12c Release 1), the Rapid Home Provisioning server itself relies on Oracle Grid Infrastructure 12cR2. That means the company must have the skills to manage the full stack: Clusterware, ASM, ACFS, NFS, GNS, SCAN, etc., as well as the RHP Server itself.

Second: remote provisioning requires the Lifecycle Management Pack (an extra-cost option) licensed on all the RHP targets

If Oracle Homes are deployed on the same cluster that hosts the RHP Server, the product can be used at no extra cost. But if you have many clusters, or use standalone servers for your Oracle databases, then RHP can become pricey very quickly: the price per processor for the Lifecycle Management Pack is $12,000, plus support (price list April 2018). So buying this management pack just to introduce Rapid Home Provisioning in your company might be an excessive investment.

Of course, depending on your needs, you can evaluate it, leverage its full potential and achieve a bigger return on investment.

Or you might explore whether it is viable to configure each cluster as a Rapid Home Provisioning Server: in this case it would be free, but it would add a layer of complexity on all your clusters.

For small companies, simple architectures, and especially where Standard Edition is deployed (no Management Packs for Standard Edition!), a simpler self-made solution might be a better choice.

In the next post, before going into the details of a hypothetical self-made implementation, I will introduce my thoughts about the New Oracle Database Release Model.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 4: Challenges and Opportunities of the New Release Model


Starting with the upcoming release (18c), the Oracle Database will have a yearly release cycle (18c, 19c, etc.). New yearly releases will contain only new features that are ready to go, possibly including new features for performance improvements (plus bug fixes and security fixes from the previous version).

Quarterly, instead of Patch Set Updates (PSU) and Bundle Patches (BP), there will be the new Release Updates (RU). They will contain critical fixes, optimizer changes, minor functional enhancements, bug fixes and security fixes. The new Release Updates will be equivalent to what we have now with Bundle Patches.

The Release Updates will be released during the whole lifetime of the feature release, according to the roadmap (2 or 5 years, depending on whether the release has Long Term Support (LTS) or not). There will be a Long Term Support release every few years; the first two will probably be Oracle 19c and Oracle 23c (I am deliberately assuming that the "c" will still be relevant 🙂 ).

Besides Release Updates, there will be the new Release Update Revisions (RUR) which, according to what I have read so far, will be released "at least" quarterly. Release Update Revisions will contain only regression fixes for bugs introduced by RUs, plus new security fixes: very close to what we have now with Patch Set Updates.

Release Update Revisions will cover ONLY 6 months; after that, it will be necessary to move to a newer Release Update or to a newer major release. Oracle introduced this change to reduce the complexity of its release management.

This leads to a few important things:

  • There will be no more than two RURs for each RU (e.g. 18.2 will have only 18.2.1 and 18.2.2)
  • If you apply RURs, the DBs must be patched to a higher RU level after 6 months at the latest.
  • Applying the second RUR of each RU (e.g. 18.2.2 -> 18.3.2 -> 18.4.2) is the most conservative approach while keeping up to date with the latest critical fixes.

On top of that, one-off patches will still exist. For more information, please read the note Release Update Introduction and FAQ (Doc ID 2285040.1).

(diagram: the new release model)

How will the new release model impact the patching strategy?

It is clear that keeping the same major-upgrade frequency as today will be hard (I expect it to increase). So far there have been 3 to 5 years between major releases, and switching to yearly releases is a big change.

But the numbering will be easier: 18.3.2 is much more readable/maintainable than 12.2.0.3.BP180719 and, although it does not contain an explicit date, it keeps the "distance" from the latest release easy to understand.

So, on one side we will need to upgrade more frequently; on the other side, the upgrades might be easier than they are now. One thing is sure, however: we will deal with many more Oracle Homes with different patch levels.

The new release model will bring us a unique opportunity to reinvent our procedures and scripts for Oracle Home management, to achieve a standardized and automated way to solve common problems like:

  • Multiple Oracle Homes coexistence (environment, naming conventions)
  • Automated binaries setup (via golden images or other automatic provisioning)
  • Database patches
  • Database upgrades

In the next post, I will show my idea of how Oracle Homes could be managed (with either the current or the new release model), making their coexistence easier for the DBAs.

Bonus: calculating the distance between releases

For a given release YY.x.z, the distance from its first release is (x + z - 1) quarters.

E.g. 18.3.2 will be (3 + 2 - 1) = 4 quarters after the initial release date.

Across versions, assuming that each yearly release comes out in the same quarter, the distance between versions YY1.x1.z1 and YY2.x2.z2 is:

(YY2 - YY1) * 4 + (x2 + z2) - (x1 + z1) quarters

E.g., between 18.4.1 and 20.1.2 the distance will be:

(20 - 18) * 4 + (1 + 2) - (4 + 1) = 6 quarters
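
If you want to play with it, here is a toy shell helper implementing the formula above:

release_distance () {
  # args: YY1 x1 z1 YY2 x2 z2 -> distance in quarters
  echo $(( ($4 - $1) * 4 + ($5 + $6) - ($2 + $3) ))
}

release_distance 18 4 1 20 1 2   # prints 6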

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

About the universe, the infinite big and the infinite small…


The rumors are spreading fast (despite my trying to keep it secret :-)), so I prefer to announce it personally rather than let you learn it from other voices…

I will work for Trivadis until the 6th of June, then I will take three weeks of vacation before starting a new adventure.

I cannot express in words how much I loved Trivadis: the environment, the conditions, the incredibly knowledgeable techies and super friends that I met there.

Nowhere have I felt so much part of a family as I have at Trivadis in the last 6 years. 6 years!

The reason is that I have got a contract at CERN. Yes, THAT CERN.
I was not expecting to be selected, but when I had the confirmation I decided to seize the opportunity: working at CERN, even for a limited period, means a lot to people working in IT. It will be the opportunity to apply what I already know as a consultant, but also to learn many new things that are not easy to see every day.

Wish me good luck!


Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions


Having the capability of managing multiple Oracle Homes is fundamental for the following reasons:

  • Out-of-place patching: cloning and patching a new Oracle Home usually takes less downtime than stopping the DBs and patching in-place
  • Better control of downtime windows: if the databases are consolidated on a single server, having multiple Oracle Homes allows moving and patching one database at a time instead of stopping everything and doing a “big bang” patch.

Make sure that you have a good set of scripts that help you switch correctly from one environment to another. Personally, I recommend TVD-BasEnv, as it is very powerful and supports OFA and non-OFA environments; but for this blog series I will show my personal approach.

Get your Home information from the Inventory!

I wrote a blog post some time ago that shows how to get the Oracle Homes from the Central Inventory (using Bash; OK, not the right tool to query XML files, but you get the idea):

Getting the Oracle Homes in a server from the oraInventory

With the same approach, you can have a script to SET your environment:

setoh ()
{
    SEARCH=${1:-"_foo_"};
    if [ $SEARCH == "ic" ]; then
		# ic is a shortcut for the Instant Client...
        OH=/u01/app/oracle/sbin/instantclient_12_2
        export VERSION=12.2.0.1
        export ORACLE_HOME=$OH
        export LD_LIBRARY_PATH=$ORACLE_HOME
        export OH_NAME=instantclient_12_2
        export ORACLE_VERSION=$VERSION
        export PATH=$ORACLE_HOME:$DEFAULT_PATH
        echo ORACLE_SID = $ORACLE_SID
        echo ORACLE_VERSION = $ORACLE_VERSION
        echo ORACLE_HOME = $ORACLE_HOME
    else
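        # find the location of the Central Inventory from /etc/oraInst.loc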
        CENTRAL_ORAINV=`grep ^inventory_loc /etc/oraInst.loc | awk -F= '{print $2}'`;
        IFS='
';
        found=0;
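        # loop over every <HOME NAME=...> entry in the Central Inventory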
        for line in `grep "<HOME NAME=" ${CENTRAL_ORAINV}/ContentsXML/inventory.xml 2>/dev/null`;
        do
            if [ $found -eq 1 ]; then
                continue;
            fi;
            unset ORACLE_VERSION;
            unset ORAEDITION;
            OH=`echo $line | tr ' ' '\n' | grep ^LOC= | awk -F\" '{print $2}'`;
            OH_NAME=`echo $line | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
            if [ "$SEARCH" == "$OH_NAME" ]; then
                found=1;
                comp_file=$OH/inventory/ContentsXML/comps.xml;
                comp_xml=`grep "COMP NAME" $comp_file | head -1`;
                comp_name=`echo $comp_xml | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
                comp_vers=`echo $comp_xml | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
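                # the first component in comps.xml tells whether this is a Grid, Agent or DBMS home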
                case $comp_name in
                    "oracle.crs")
                        ORACLE_VERSION=$comp_vers;
                        ORAEDITION=GRID
                    ;;
                    "oracle.sysman.top.agent")
                        ORACLE_VERSION=$comp_vers;
                        ORAEDITION=AGT
                    ;;
                    "oracle.server")
                        ORACLE_VERSION=`grep "PATCH NAME=\"oracle.server\"" $comp_file 2>/dev/null | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
                        ORAEDITION="DBMS";
                        if [ -z "$ORACLE_VERSION" ]; then
                            ORACLE_VERSION=$comp_vers;
                        fi;
                        ORAMAJOR=`echo $ORACLE_VERSION |  cut -d . -f 1`;
                        case $ORAMAJOR in
                            11 | 12)
                                ORAEDITION="DBMS "`grep "oracle_install_db_InstallType" $OH/inventory/globalvariables/oracle.server/globalvariables.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                            ;;
                            10)
                                ORAEDITION="DBMS "`grep "s_serverInstallType" $OH/inventory/Components21/oracle.server/*/context.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                            ;;
                        esac
                    ;;
                esac;
                export VERSION=$ORACLE_VERSION;
                export ORACLE_HOME=$OH;
                export LD_LIBRARY_PATH=$ORACLE_HOME/lib;
                export OH_NAME;
                export ORACLE_VERSION;
                export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$DEFAULT_PATH;
                echo ORACLE_SID = $ORACLE_SID;
                echo ORACLE_VERSION = $ORACLE_VERSION;
                echo ORACLE_HOME = $ORACLE_HOME;
                continue;
            fi;
        done;
        if [ $found -eq 0 ]; then
            echo "cannot find Oracle Home $1";
            false;
        else
            true;
        fi;
    fi
}

It uses a different approach from the oraenv script provided by Oracle, where you set the environment based on the ORACLE_SID variable and get the information from the oratab. My setoh function takes the Oracle Home name as input. Although you can easily convert it to set the environment for a specific ORACLE_SID, there are some reasons why I like it:

  • You can set the environment for an Oracle Home that it is not associated to any database (yet)
  • You can set the environment for an upgrade to a new release without changing (yet) the oratab
  • It works for OMS, Grid and Agent homes as well…
  • Most important, it lets you specify the environment correctly when you need to use a fresh install (for patching it as well)

So, this is how it works:

# [ oracle@myserver:/u01/app/oracle [11:23:18] [12.1.0.2.0 SID="not set"] 0 ] #
# lsoh

HOME                        LOCATION                                                VERSION      EDITION
--------------------------- ------------------------------------------------------- ------------ ---------
OraGI12Home1                /u01/app/grid/product/grid                              12.1.0.2.0   GRID
agent12c1                   /u01/app/oracle/product/agent12c/core/12.1.0.5.0        12.1.0.5.0   AGT
OraDb11g_home1              /u01/app/oracle/product/11.2.0.4                        11.2.0.4.0   DBMS EE
OraDB12Home1                /u01/app/oracle/product/12.1.0.2                        12.1.0.2.0   DBMS EE
12_1_0_2_BP170718_RON       /u01/app/oracle/product/12_1_0_2_BP170718_RON           12.1.0.2.0   DBMS EE
12_1_0_2_BP180116_OCW       /u01/app/oracle/product/12_1_0_2_BP180116_OCW           12.1.0.2.0   DBMS EE

# [ oracle@myserver:/u01/app/oracle [11:23:22] [12.1.0.2.0 SID="not set"] 0 ] #
# setoh 12_1_0_2_BP180116_OCW
ORACLE_SID =
ORACLE_VERSION = 12.1.0.2.0
ORACLE_HOME = /u01/app/oracle/product/12_1_0_2_BP180116_OCW

# [ oracle@myserver:/u01/app/oracle [11:23:25] [12.1.0.2.0 SID="not set"] 0 ] #
# opatch lspatches
26925218;OCW Patch Set Update : 12.1.0.2.180116 (26925218)
26925263;Database Bundle Patch : 12.1.0.2.180116 (26925263)
22243983;

OPatch succeeded.

In the previous example, there are two Database Homes installed without a specific naming convention (OraDb11g_home1, OraDB12Home1) and two that follow a specific one (12_1_0_2_BP170718_RON, 12_1_0_2_BP180116_OCW).

Naming conventions play an important role

If you want to achieve effective Oracle Home management, it is important to have the same ORACLE_HOME paths, names and patch levels everywhere.

The Oracle Home path should not include only the release number:

/u01/app/oracle/product/12.1.0.2

If we have many Oracle Homes with the same release, what shall we call the other ones? Several variables might influence the naming convention:

The edition (EE, SE), the RAC option or other options, the patch type (formerly PSU and BP, now RU and RUR), and eventual additional one-off patches.

Some ideas might be:

/u01/app/oracle/product/EE12.1.0.2
/u01/app/oracle/product/EE12.1.0.2_BP171019
/u01/app/oracle/product/EE12.1.0.2_BP171019_v2

The new release model will facilitate a lot the definition of a naming convention as we will have names like:

/u01/app/oracle/product/EE18.1.0
/u01/app/oracle/product/EE18.2.1
/u01/app/oracle/product/EE18.2.1_v2

Of course, the naming convention is not universal and can be adapted to the customer (e.g., if you have only Enterprise Editions, you might omit that information).

Replacing dots with underscores?

You will see, at the end of the series, that I use Oracle Home paths with underscores instead of dots:

/u01/app/oracle/product/EE12_1_0_2
/u01/app/oracle/product/EE12_1_0_2_BP171019
/u01/app/oracle/product/EE12_1_0_2_BP171019_v2

Why?

From a naming perspective, there is no need for the Home name to match the release number. Release, version and product information can be collected through the inventory.

What is really important is to have good naming conventions and good manageability. In my ideal world, the Oracle Home name inside the central inventory and the basename of the Oracle Home path are the same: this facilitates tremendously the scripting of Oracle Home provisioning.

Sadly, the Oracle Home name cannot contain dots (it is a limitation of the Oracle Inventory), which is why I replace them with underscores.

In the next blog post, I will show how to plan a framework for automated Oracle Home provisioning.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 6: Simple Golden Image Blueprint


As I explained in the previous blog posts, from a manageability perspective, you should not change the patch level of a deployed Oracle Home, but rather install and patch a new Oracle Home.

By the same principle, Oracle Homes deployed on different hosts should have an identical patch level for the same name. For example, an Oracle Home /u01/app/oracle/product/EE12_1_0_2_BP171018 should have the same patch level on all the servers.

To guarantee the same binaries and patch levels everywhere, the simple solution that I am showing in this series is to store copies of the Oracle Homes somewhere and use them as golden images. (Another approach, really different and cool, is used by Ilmar Kerm: he explains it here https://ilmarkerm.eu/blog/2018/05/oracle-home-management-using-ansible/ )

For this, we will use a Golden Image store (that could be an NFS share mounted on the Oracle Database servers, a remote host accessible with scp, or something else) and a metadata store.

When all the software is deployed from golden images, there is the guarantee that all the Homes are equal; therefore, the information about patches and bugfixes can be centralized in one place (the golden image metadata).

A typical Oracle Home lifecycle:

  • Install the software manually the first time
  • Automatically create a golden image from the Oracle Home
  • Automatically deploy the golden image on the other servers

When a new patch is needed:

  • Automatically deploy the golden image to a new Oracle Home
  • Patch the new Oracle Home manually (or automatically!)
  • Automatically create the new golden image with the new name
  • Automatically deploy the new golden image to the other servers

The script that automates this lifecycle does just two main actions:

  • Automates the creation of a new golden image
  • Deploys an existing image to an Oracle Home (either with a new path or the default one)
  • (optional: uninstall an existing Home)

Let’s make a graphical example of the previously described steps:

Here, the script ohctl takes two actions: -c (creates a Golden Image) and -i (installs a Golden Image).

The create action does the following steps:

  • Copies the content to a working directory
  • Cleans up logs, audits, etc.
  • Creates the zip file
  • Stores the zip file in a shared NFS repository
  • Inserts the metadata of the new golden image in a repository

The install action does the following steps:

  • Checks if the image is already deployed (plus other security checks)
  • Creates the new path based on the name of the image or the new name passed as argument
  • Unzips the content in the new Oracle Home
  • Runs runInstaller -clone to attach the home in the central inventory and (optionally) set a new Home name
  • (optionally) Relinks the oracle binary with the RAC option
  • Runs setasmgid if found
  • Other environment-specific tasks (e.g. dealing with TNS_ADMIN links)

By following this pattern, Oracle Home names and paths are clean and the same everywhere. This facilitates the deployment and the patching.

You can find the Oracle Home cloning steps in the Oracle Database documentation:

Cloning an Oracle Home

In the next blog post I will explain parts of the ohctl source code and give some examples of how I use it (and publish a link to the full source code :-) )

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (not my work)

 

Oracle Home Management – part 7: Putting all together


Last part of the blog series… let’s see how to put everything together and have a single script that creates and provisions Oracle Home golden images:

Review of the points

The script will:

  • let you create a golden image based on the current Oracle Home
  • save the golden image metadata into a repository (an Oracle schema somewhere)
  • list the available golden images and display whether they are already deployed on the current host
  • let you provision an image locally (pull, not push), either with the default name or a new name

Todo:

  • Run as root in order to run root.sh automatically (or let specify the sudo command or a root password)
  • Manage Grid Infrastructure homes

Assumptions

  • There is an available Oracle schema where the golden image metadata will be stored
  • There is an available NFS share that contains the working copies and golden images
  • Some variables must be set in the script according to the environment
  • The function setoh is defined in the environment (it might be copied inside the script); a minimal sketch follows this list
  • The Instant Client is installed and “setoh ic” correctly sets its environment. This is required because there might be no sqlplus binaries available at the very first deploy
  • Oracle Home name and path’s basename are equal for all the Oracle Homes
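For reference, this is a minimal sketch of a setoh-like helper. The real implementation is in the post “Getting the Oracle Homes in a server from the oraInventory”; this version only assumes the usual HOME NAME/LOC attributes of inventory.xml:

setoh () {
        # Locate the central inventory and the Home path for the given name
        CENTRAL_ORAINV=`grep ^inventory_loc /etc/oraInst.loc | awk -F= '{print $2}'`
        OH_PATH=`grep "<HOME NAME=\"$1\"" $CENTRAL_ORAINV/ContentsXML/inventory.xml | sed 's/.*LOC="\([^"]*\)".*/\1/'`
        if [ -n "$OH_PATH" ] ; then
                export ORACLE_HOME=$OH_PATH
                export PATH=$ORACLE_HOME/bin:$PATH
        else
                echo "Oracle Home $1 not found in the central inventory"
        fi
}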

Repository table

First we need a metadata table. Let’s keep it as simple as possible:

CREATE TABLE "OH_GOLDEN_IMAGES"  (
     NAME VARCHAR2(50 BYTE)
   , FULLPATH VARCHAR2(200 BYTE)
   , CREATED TIMESTAMP (6)
   , CONSTRAINT PK_OH_GOLDEN_IMAGES PRIMARY KEY (NAME)
);

Helpers

The script has some functions that check stuff inside the central inventory.

e.g.

F_OH_Installed () {
        # Locate the central inventory from /etc/oraInst.loc
        CENTRAL_ORAINV=`grep ^inventory_loc /etc/oraInst.loc | awk -F= '{print $2}'`
        # Look for a HOME entry with the given name, skipping removed Homes
        grep "<HOME NAME=\"$1\"" $CENTRAL_ORAINV/ContentsXML/inventory.xml | grep -v "REMOVED=\"T\"" >/dev/null
        if [ $? -eq 0 ] ; then
                echo -e "${colgrn}Installed${colrst}"
        else
                echo "Not installed"
        fi
}

checks if a specific Oracle Home (name) is present in the central inventory. It is helpful to check, for every golden image in the metadata repository, whether it is already provisioned or not:

F_list_OH () {
 
        F_colordef
        echo
        echo "Listing existing golden images:"
 
        RESULT=`$SQLPLUS -S -L ${REPO_CREDENTIALS} <<EOF  | grep ";"
        set line 200 pages 1000
        set feed off head off
        col name format a32
        alter session set nls_timestamp_format='YYYY-MM-DD';
        select name||';'||created||';'||fullpath from oh_golden_images order by created desc;
EOF
`
        echo
        printf "%-35s %-10s %-18s\n" "OH_Name" "Created" "Installed locally?"
        echo "----------------------------------- ---------- ------------------"
 
        for line in $RESULT ; do
                L_GI_Name=`echo $line | awk -F\; '{print $1}'`
                L_GI_Date=`echo $line | awk -F\; '{print $2}'`
                L_GI_Path=`echo $line | awk -F\; '{print $3}'`
                L_Installed=`F_OH_Installed "$L_GI_Name"`
                printf "%-35s %-10s %-18s\n" "$L_GI_Name" "$L_GI_Date" "$L_Installed"
        done
}

Variables

Some variables must be changed, but in general you might want to adapt the whole script to fit your needs.

REPO_OWNER=scott
REPO_PWD=tiger
REPO_CONN="//localhost:1521/ORCL"
REPO_CREDENTIALS=${REPO_OWNER}/${REPO_PWD}@${REPO_CONN}
 
PRODUCT_INSTALL_PATH="/u01/app/oracle/product"
GOLDEN_IMAGE_DEST="/share/oracle/oh_repository/golden_images"
WORKING_COPY_DEST="/share/oracle/oh_repository/working_copies"

Image creation

The image creation would be as easy as creating a zip file, but there are some files that we do not want to include in the golden image, therefore we need to create a staging directory (working copy) to clean up everything:

# Copy to NFS working copy
        echo "Cleaning previous working copy"
        WC=$WORKING_COPY_DEST/$L_New_Name
        [ -d $WC ] && rm -rf $WC
 
        echo "Copying the OH to the working copy"
        mkdir -p  $WC
        cp -rp $ORACLE_HOME/* $WC/ 2>/tmp/ohctl.err
 
        # Cleanup files
        echo "Cleansing files in Working Copy"
        rm -rf $WC/log/$HOSTNAME
        rm -rf $WC/log/diag/rdbms/*
        rm -rf $WC/gpnp/$HOSTNAME
        find $WC/gpnp -type f -exec rm {} \; 2>/dev/null
        rm -rf $WC/cfgtoollogs/*
        rm -rf $WC/crs/init/*
        rm -rf $WC/cdata/*
        rm -rf $WC/crf/*
        rm -rf $WC/admin/*
        rm -rf $WC/network/admin/*.ora
        rm -rf $WC/crs/install/crsconfig_params
        find $WC -name '*.ouibak' -exec rm {} \; 2>/dev/null
        find $WC -name '*.ouibak.1' -exec rm {} \; 2>/dev/null
        # rm -rf $WC/root.sh
        find $WC/rdbms/audit -name '*.aud' -exec rm {} \; 2>/dev/null
        rm -rf $WC/rdbms/log/*
        rm -rf $WC/inventory/backup/*
        rm -rf $WC/dbs/*
 
        # create zip
        echo "Creating the Golden Image zip file"
        [ -f $GOLDEN_IMAGE_DEST/$L_New_Name.zip ] && rm $GOLDEN_IMAGE_DEST/$L_New_Name.zip
        pushd $WC
        zip -r $GOLDEN_IMAGE_DEST/$L_New_Name.zip . >/dev/null
        popd
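One step of the create action, inserting the golden image metadata into the repository, is not shown in the excerpt above. A minimal sketch, reusing the variables and the table defined earlier (and assuming $SQLPLUS points to a sqlplus binary, as in F_list_OH), could be:

# Register the new golden image in the metadata repository (sketch)
$SQLPLUS -S -L ${REPO_CREDENTIALS} <<EOF
INSERT INTO oh_golden_images (name, fullpath, created)
VALUES ('$L_New_Name', '$GOLDEN_IMAGE_DEST/$L_New_Name.zip', SYSTIMESTAMP);
COMMIT;
EOF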

Home provisioning

Home provisioning requires, besides some checks, a runInstaller -clone command, possibly a relink, possibly a setasmgid, possibly some other tasks, but definitely a run of root.sh. This last task is not automated yet in my deployment script.

# ... some checks ...
        # if no new OH name specified, get the golden image name
        ...
        # - check if image to install exists
        ...
        # - check if OH name to install is not already installed
        ...
        # - check if the zip exists
        ...
        # - check if the destination directory exists
        ...

        L_Clone_Command="$RUNINST -clone -waitForCompletion -silent ORACLE_HOME=$ORACLE_HOME ORACLE_BASE=$ORACLE_BASE ORACLE_HOME_NAME=$L_New_Name"
 
        echo $L_Clone_Command
        $L_Clone_Command
 
        if [ $? -eq 0 ] ; then
                echo "Clone command completed successfully."
        else
                echo "There was a problem during the clone command. The script will exit."
                exit 1
        fi
 
        if [ "${L_Link_RAC}" == "yes" ] ; then
                pushd $ORACLE_HOME/rdbms/lib
                make -f ins_rdbms.mk rac_on
                make -f ins_rdbms.mk ioracle
                popd
        fi
 
        # - run setasmgid
        if [ -x /etc/oracle/setasmgid ] ; then
                echo "setasmgid found: running it on Oracle binary"
                /etc/oracle/setasmgid oracle_binary_path=$ORACLE_HOME/bin/oracle
        else
                echo "setasmgid not found: ignoring"
        fi
 
        # - create symlinks for ldap, sqlnet and tnsnames.ora
        TNS_ADMIN=${TNS_ADMIN:-/var/opt/oracle}
        ln -s $TNS_ADMIN/sqlnet.ora   $ORACLE_HOME/network/admin/sqlnet.ora
        ln -s $TNS_ADMIN/tnsnames.ora $ORACLE_HOME/network/admin/tnsnames.ora
        ln -s $TNS_ADMIN/ldap.ora     $ORACLE_HOME/network/admin/ldap.ora
 
# ... other checks ...

 

Usage

Purpose : Management of Golden Images (Oracle Homes)
 
        Usage   : To list the available images:
                    ohctl -l
                  To install an image on the localhost:
                    ohctl -i goldenimage [-n newname] [-r]
                  To create an image based on the current OH:
                    ohctl -c [-n newname] [ -f ]
                  To remove a golden image from the repository:
                    ohctl -d goldenimage [ -f ]
 
        Options : -l                    List the available Oracle Homes in the golden image repository
                  -i goldenimage        Installs locally the specified golden image. (If already deployed, an error is thrown)
                                        if the option -l is given, the list action has the priority over the deploy.
                  -n newname            Specify a new name for the Oracle Home: use it in case you need to patch
                                        and create a new Golden Image from it or if you want to change the Golden Image name
                                        for the current Oracle Home you are converting to Image.
                                        When creating a new Image (-c), it takes the basename of the OH by default, and not the
                                        OHname inside the inventory.
                  -c                    Creates a new Golden Image from the current Oracle Home.
                  -d goldenimage        Removes the golden image from the repository
                  -f                    If the Golden Image to be created exists, force the overwrite.
                  -r                    Relink with RAC option (install only)
 
        Example : ohctl -i DB12_1_0_2_BP170718_home1 -n DB12_1_0_2_BP171018_home1
                                        installs the Oracle Home DB12_1_0_2_BP170718_home1 with new name DB12_1_0_2_BP171018_home1
                                        in order to apply the Bundle Patch 171018 on it
 
                  ohctl -i DB12_1_0_2_BP170718_home1
                                        installs the Oracle Home DB12_1_0_2_BP170718_home1 for normal usage
 
                  ohctl -c -n DB12_1_0_2_BP180116
                                        Creates a new Golden Image named DB12_1_0_2_BP180116 from the current ORACLE_HOME
 
                  ohctl -c -f
                                        Creates a new Golden Image with the name of the current OH basename, overwriting
                                        the eventual existing image.
                                        E.g. if the current OH is /ccv/app/oracle/product/DB12_1_0_2_BP180116, the new GI name
                                         will be "DB12_1_0_2_BP180116"

Examples

List installed homes:

# [ oracle@myserver:/u01/app/oracle/scripts [17:43:04] [12.1.0.2.0 SID=GRID] 0 ] #
# lsoh

HOME                        LOCATION                                                VERSION      EDITION
--------------------------- ------------------------------------------------------- ------------ ---------
OraGI12Home1                /u01/app/grid/product/grid                              12.1.0.2.0   GRID
OraDB12Home1                /u01/app/oracle/product/12.1.0.2                        12.1.0.2.0   DBMS EE
agent12c1                   /u01/app/oracle/product/agent12c/core/12.1.0.5.0        12.1.0.5.0   AGT
OraDb11g_home1              /u01/app/oracle/product/11.2.0.4                        11.2.0.4.0   DBMS EE
OraDB12Home2                /u01/app/oracle/product/12.1.0.2_BP170718               12.1.0.2.0   DBMS EE

Create a golden image 12_1_0_2_BP170718 from the Oracle Home named OraDB12Home2 (the latter having been installed manually without a naming convention):

# [ oracle@myserver:/u01/app/oracle/scripts [17:43:07] [12.1.0.2.0 SID=GRID] 0 ] #
# setoh OraDB12Home2

# [ oracle@myserver:/u01/app/oracle/scripts [17:43:04] [12.1.0.2.0 SID=GRID] 0 ] #
# ohctl -c -n 12_1_0_2_BP170718 -f
Image 12_1_0_2_BP170718 already exists but -f specified. The script will continue.


Creating the new Golden Image 12_1_0_2_BP170718

Cleaning previous working copy
Copying the OH to the working copy
Cleansing files in Working Copy
Creating the Golden Image zip file

# [ oracle@myserver:/u01/app/oracle/scripts [17:52:09] [12.1.0.2.0 SID=GRID] 0 ] #
#

List the new golden image from the metadata repository:

# [ oracle@myserver:/u01/app/oracle/scripts [17:57:46] [12.1.0.2.0 SID=GRID] 0 ] #
# ohctl -l

Listing existing golden images:

OH_Name                             Created    Installed locally?
----------------------------------- ---------- ------------------
12_1_0_2_BP170718                   2018-02-06 Not installed

Reinstalling the same home with the new naming convention:

# ohctl -i 12_1_0_2_BP170718
OK, the image exists.
Zip file exists.
The unzip completed successfully.
/u01/app/oracle/product/12_1_0_2_BP170718/oui/bin/runInstaller -clone -waitForCompletion -silent ORACLE_HOME=/u01/app/oracle/product/12_1_0_2_BP170718 ORACLE_BASE=/u01/app/oracle ORACLE_HOME_NAME=12_1_0_2_BP170718
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2018-02-06_06-04-33PM. Please wait ...Oracle Universal Installer, Version 12.1.0.2.0 Production
Copyright (C) 1999, 2014, Oracle. All rights reserved.

You can find the log of this install session at:
 /u01/app/oracle/oraInventory/logs/cloneActions2018-02-06_06-04-33PM.log
.................................................................................................... 100% Done.



Installation in progress (Tuesday, February 6, 2018 6:04:41 PM CET)
................................................................................                                                80% Done.
Install successful

Linking in progress (Tuesday, February 6, 2018 6:04:44 PM CET)
.                                                                81% Done.
Link successful

Setup in progress (Tuesday, February 6, 2018 6:05:01 PM CET)
..........                                                      100% Done.
Setup successful

Saving inventory (Tuesday, February 6, 2018 6:05:01 PM CET)
Saving inventory complete
Configuration complete

End of install phases.(Tuesday, February 6, 2018 6:05:22 PM CET)
WARNING:
The following configuration scripts need to be executed as the "root" user.
/u01/app/oracle/product/12_1_0_2_BP170718/root.sh
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts

The cloning of 12_1_0_2_BP170718 was successful.
Please check '/u01/app/oracle/oraInventory/logs/cloneActions2018-02-06_06-04-33PM.log' for more details.
Clone command completed successfully.
setasmgid found: running it on Oracle binary
The image 12_1_0_2_BP170718 has been installed and exists in the inventory.

Installation completed. Please run /u01/app/oracle/product/12_1_0_2_BP170718/root.sh as root before using the new home.

# [ oracle@myserver:/u01/app/oracle/scripts [18:05:24] [12.1.0.2.0 SID=GRID] 0 ] #
#

# and manually...
-bash-4.2$ sudo /u01/app/oracle/product/12_1_0_2_BP170718/root.sh
Check /u01/app/oracle/product/12_1_0_2_BP170718/install/root_myserver_2018-02-06_18-06-07.log for the output of root script

Installing the same home in a new path for manual patching from 170718 to 180116:

# [ oracle@myserver:/u01/app/oracle/scripts [12:48:36] [12.1.0.2.0 SID=GRID] 0 ] #
# ohctl -i 12_1_0_2_BP170718 -n 12_1_0_2_BP180116
OK, the image exists.
Zip file exists.
The unzip completed successfully.
/u01/app/oracle/product/12_1_0_2_BP180116/oui/bin/runInstaller -clone -waitForCompletion -silent ORACLE_HOME=/u01/app/oracle/product/12_1_0_2_BP180116 ORACLE_BASE=/u01/app/oracle ORACLE_HOME_NAME=12_1_0_2_BP180116
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2018-02-07_12-49-50PM. Please wait ...Oracle Universal Installer, Version 12.1.0.2.0 Production
Copyright (C) 1999, 2014, Oracle. All rights reserved.

You can find the log of this install session at:
 /u01/app/oracle/oraInventory/logs/cloneActions2018-02-07_12-49-50PM.log
.................................................................................................... 100% Done.



Installation in progress (Wednesday, February 7, 2018 12:49:58 PM CET)
................................................................................                                                80% Done.
Install successful

Linking in progress (Wednesday, February 7, 2018 12:50:00 PM CET)
.                                                                81% Done.
Link successful

Setup in progress (Wednesday, February 7, 2018 12:50:17 PM CET)
..........                                                      100% Done.
Setup successful

Saving inventory (Wednesday, February 7, 2018 12:50:17 PM CET)
Saving inventory complete
Configuration complete

End of install phases.(Wednesday, February 7, 2018 12:50:38 PM CET)
WARNING:
The following configuration scripts need to be executed as the "root" user.
/u01/app/oracle/product/12_1_0_2_BP180116/root.sh
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts

The cloning of 12_1_0_2_BP180116 was successful.
Please check '/u01/app/oracle/oraInventory/logs/cloneActions2018-02-07_12-49-50PM.log' for more details.
Clone command completed successfully.
setasmgid found: running it on Oracle binary
The image 12_1_0_2_BP180116 has been installed and exists in the inventory.

Installation completed. Please run /u01/app/oracle/product/12_1_0_2_BP180116/root.sh as root before using the new home.

New home situation:

# [ oracle@myserver:/u01/app/oracle/scripts [12:50:41] [12.1.0.2.0 SID=GRID] 0 ] #
# lsoh

HOME                        LOCATION                                                VERSION      EDITION                                                          
--------------------------- ------------------------------------------------------- ------------ ---------
OraGI12Home1                /u01/app/grid/product/grid                              12.1.0.2.0   GRID                                                             
OraDB12Home1                /u01/app/oracle/product/12.1.0.2                        12.1.0.2.0   DBMS EE                                                          
agent12c1                   /u01/app/oracle/product/agent12c/core/12.1.0.5.0        12.1.0.5.0   AGT                                                              
OraDb11g_home1              /u01/app/oracle/product/11.2.0.4                        11.2.0.4.0   DBMS EE                                                          
OraDB12Home2                /u01/app/oracle/product/12.1.0.2_BP170718               12.1.0.2.0   DBMS EE
12_1_0_2_BP170718           /u01/app/oracle/product/12_1_0_2_BP170718               12.1.0.2.0   DBMS EE                                                          
12_1_0_2_BP180116           /u01/app/oracle/product/12_1_0_2_BP180116               12.1.0.2.0   DBMS EE

Patch manually the home named  12_1_0_2_BP180116 with the January bundle patch:

# [ oracle@myserver:/u01/app/oracle/scripts [18:07:00] [12.1.0.2.0 SID=GRID] 0 ] #
# setoh 12_1_0_2_BP180116

# [ oracle@myserver:/share/oracle/database/patches/12c/12.1.0.2.BP180116/27010930/26925263 [12:55:58] [12.1.0.2.0 SID=GRID] 0 ] #
# opatch apply
Oracle Interim Patch Installer version 12.2.0.1.12
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/12_1_0_2_BP180116
Central Inventory : /u01/app/oracle/oraInventory
   from           : /u01/app/oracle/product/12_1_0_2_BP180116/oraInst.loc
OPatch version    : 12.2.0.1.12
OUI version       : 12.1.0.2.0
Log file location : /u01/app/oracle/product/12_1_0_2_BP180116/cfgtoollogs/opatch/opatch2018-02-07_12-54-50PM_1.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   26609798  26717470  26925263

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/12_1_0_2_BP180116')


Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying sub-patch '26609798' to OH '/u01/app/oracle/product/12_1_0_2_BP180116'

Patching component oracle.oracore.rsf, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...
Applying sub-patch '26717470' to OH '/u01/app/oracle/product/12_1_0_2_BP180116'
ApplySession: Optional component(s) [ oracle.oid.client, 12.1.0.2.0 ] , [ oracle.has.crs, 12.1.0.2.0 ]  not present in the Oracle Home or a higher version is found.

Patching component oracle.ldap.client, 12.1.0.2.0...

Patching component oracle.rdbms.crs, 12.1.0.2.0...

Patching component oracle.rdbms.deconfig, 12.1.0.2.0...

Patching component oracle.xdk, 12.1.0.2.0...

Patching component oracle.tfa, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...

Patching component oracle.nlsrtl.rsf, 12.1.0.2.0...

Patching component oracle.xdk.parser.java, 12.1.0.2.0...

Patching component oracle.xdk.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...

Patching component oracle.has.deconfig, 12.1.0.2.0...
Applying sub-patch '26925263' to OH '/u01/app/oracle/product/12_1_0_2_BP180116'
ApplySession: Optional component(s) [ oracle.has.crs, 12.1.0.2.0 ]  not present in the Oracle Home or a higher version is found.

Patching component oracle.network.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.crs, 12.1.0.2.0...

Patching component oracle.rdbms.util, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...
Composite patch 26925263 successfully applied.
Sub-set patch [22652097] has become inactive due to the application of a super-set patch [26925263].
Please refer to Doc ID 2161861.1 for any possible further required actions.
Log file location: /u01/app/oracle/product/12_1_0_2_BP180116/cfgtoollogs/opatch/opatch2018-02-07_12-54-50PM_1.log

OPatch succeeded.

# [ oracle@myserver:/share/oracle/database/patches/12c/12.1.0.2.BP180116/27010930/26925263 [12:55:47] [12.1.0.2.0 SID=GRID] 0 ] #
# opatch lspatches
26925263;Database Bundle Patch : 12.1.0.2.180116 (26925263)
22243983;

OPatch succeeded.

Create the new golden image from the home patched with January bundle patch:

# [ oracle@myserver:/u01/app/oracle/scripts [18:07:00] [12.1.0.2.0 SID=GRID] 0 ] #
# setoh 12_1_0_2_BP180116

# [ oracle@myserver:/u01/app/oracle/scripts [12:57:24] [12.1.0.2.0 SID=GRID] 1 ] #
# ohctl -c -f
Creating the new Golden Image 12_1_0_2_BP180116

Cleaning previous working copy
Copying the OH to the working copy
Cleansing files in Working Copy
Creating the Golden Image zip file


# [ oracle@myserver:/u01/app/oracle/scripts [13:04:57] [12.1.0.2.0 SID=GRID] 0 ] #
# ohctl -l
Listing existing golden images:

OH_Name                             Created    Installed locally?
----------------------------------- ---------- ------------------
12_1_0_2_BP180116                   2018-02-07 Installed
12_1_0_2_BP170718                   2018-02-06 Installed

Full source code

Full source code of ohctl

I hope you find it useful! The cool thing is that once the golden images are ready in the golden image repository, provisioning to all the servers is straightforward and requires just a couple of minutes: from nothing to a fully working and patched Oracle Home.

Why apply the patch manually?

If you read everything carefully, I automated the golden image creation and provisioning, but the patching is still done manually.

The aim of this framework is not to patch all the Oracle Homes with the same patch, but to install the patch ONCE and then deploy the patched home everywhere. Because each patch has different conflicts, bugs, etc., it might be convenient to install it manually the first time and then forget about it. At least, this is my opinion 🙂

Of course, patch download, conflict detection, etc. can also be automated (and it is a good idea, if you have the time to implement it carefully and make it bullet-proof).

In the addendum blog post, I will show some scripts made by Hutchison Austria and why I find them really useful in this context.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Database 18c and version numbers


The Oracle New Release Model is very young, and thus suffers from some small inconsistencies in the release naming.
Oracle already announced that 18c was a renaming of what was intended to be 12.2.0.2 in the original roadmap.
I thought that 19c would have been 12.2.0.3, but now I have some doubts when looking at the local inventory contents.

I am consistently using my functions lsoh and setoh, as described in my posts:

Getting the Oracle Homes in a server from the oraInventory

and:

Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions

What I do, basically, is to get the list of attached Oracle Homes from the Central Inventory, and then get some details (like version and edition) from the local inventory of each Oracle Home.

But now that Oracle 18.3 is out, my function shows release 18.0.0.0.0 when I try to get it in the previous way.

# [ oracle@server1:/u01/app/grid/crs1830 [08:53:50] [18.0.0.0.0 [GRID] SID=+ASM1] 0 ] #
# lsoh

HOME                   LOCATION                           VERSION      EDITION
---------------------- ---------------------------------- ------------ ---------
OraGI18Home1           /u01/app/grid/crs1830              18.0.0.0.0   GRID

The fact is that prior to 18c, the component version was showing the actual version (without patches):

<COMP NAME="oracle.rdbms" VER="12.1.0.2.0"  [...]  " ACT_INST_VER="12.1.0.2.0"  [...]>

but now, it shows the “base release” 18.0.0.0.0, whereas the ACT_INST_VER property shows the “Active” version:

<COMP NAME="oracle.rdbms" VER="18.0.0.0.0" [...] ACT_INST_VER="12.2.0.4.0"[...]>

You can see that ACT_INST_VER is 12.2.0.4.0! Does it indicate that 18.3 was planned to be 12.2.0.4?

like …

12.2.0.2 -> 18.1
12.2.0.3 -> 18.2
12.2.0.4 -> 18.3

?

This contrasts with MOS Doc ID 230.1, which states that 18c was a “sort of” 12.2.0.2, so probably I am getting it wrong.

My first reflex was to search the local inventory for where the string 18.3.0 was written down but, to my surprise, it is just a description, not a “real value”:

<ONEOFF REF_ID="28090523" UNIQ_ID="22329768" ROLLBACK="T" XML_INV_LOC="oneoffs/28090523/" ACT_INST_VER="12.2.0.4.0" INSTALL_TIME="2018.Jul.18 20:06:33 CEST">
 <DESC>Database Release Update : 18.3.0.0.180717 (28090523)</DESC>

[...]

# grep '18\.3\.0' comps.xml
   <DESC>Database Release Update : 18.3.0.0.180717 (28090523)</DESC>
   <DESC>OCW RELEASE UPDATE 18.3.0.0.0 (28090553)</DESC>
   <DESC>ACFS RELEASE UPDATE 18.3.0.0.0 (28090557)</DESC>
   <DESC>DBWLM RELEASE UPDATE 18.3.0.0.0 (28090564)</DESC>
   <DESC>TOMCAT RELEASE UPDATE 18.3.0.0.0 (28256701)</DESC>
   <DESC>OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)</DESC>

Again, the ACT_INST_VER property reports 12.2.0.4.0.

So, where can we extract the version we would expect (18.3.0.0.0)?

Oracle 18c provides a new binary oraversion that gives us this information:

# oraversion
This program prints release version information.
These are its possible arguments:
-compositeVersion: Print the full version number: a.b.c.d.e.
-baseVersion: Print the base version number: a.0.0.0.0.
-majorVersion: Print the major version number: a.
-buildStamp: Print the date/time associated with the build.
-buildDescription: Print a description of the build.
-help: Print this message.

# oraversion -compositeVersion
18.3.0.0.0

Note that 18.3.0.0.0 differs from the description:

Database Release Update : 18.3.0.0.180717

which is, as far as I understand, just a bad way of using the old notation to give an idea of the release date of that Release Update.

Also note that baseVersion always has the format <MAJOR>.0.0.0.0.

In the future I expect that ACT_INST_VER will be consistent with the compositeVersion, but I cannot be sure.
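As a consequence, functions like my lsoh need a small adaptation. A minimal sketch (assuming the standard local inventory location) could be:

# Sketch: prefer oraversion (18c and later) for the composite version,
# fall back to the ACT_INST_VER attribute of the local inventory otherwise
if [ -x $ORACLE_HOME/bin/oraversion ] ; then
        OH_VERSION=`$ORACLE_HOME/bin/oraversion -compositeVersion`
else
        OH_VERSION=`grep -o 'ACT_INST_VER="[^"]*"' $ORACLE_HOME/inventory/ContentsXML/comps.xml | head -1 | awk -F\" '{print $2}'`
fi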

Ludo

 

 

Setting Grid Infrastructure 18c Oracle Home name during the install


A colleague has been struggling for some time to get the correct Oracle Home name for Grid Infrastructure 18.3.0 when running gridSetup.sh.

In the graphical Oracle Universal Installer there is no way (as far as we could find) to set the Home name. Moreover, it was our intention to automate the install of Grid Infrastructure.

The complete responsefile ($OH/inventory/response/oracle.crs_Complete.rsp) contains the parameter:

#-------------------------------------------------------------------------------
#Name       : ORACLE_HOME_NAME
#Datatype   : String
#Description: Oracle Home Name. Used in creating folders and services.
#Example: ORACLE_HOME_NAME = "OHOME1"
#-------------------------------------------------------------------------------
ORACLE_HOME_NAME="OraGI18Home1"

However, when using a responsefile with such parameter, gridSetup.sh fails with the error:

Cause - Syntactically incorrect response file.
Either unexpected variables are specified or expected variables are not specified in the response file.
Action - Refer the latest product specific response file template
Summary  - cvc-complex-type.2.4.a: Invalid content was found starting with element 'ORACLE_HOME_NAME'.
One of '{..... long list .....}' is expected.

After some tries (and an SR), the following actually works:

  • strip the ORACLE_HOME_NAME parameter from the responsefile
  • pass it as a double-quoted parameter at the end of the gridSetup.sh command line

./gridSetup.sh -debug -responseFile inventory/response/Grid_Config.rsp "ORACLE_HOME_NAME=YourGIHomeName"

HTH

Converting SQL*Plus calls in shell scripts to ORDS calls


I develop a lot of shell scripts. I would not define myself an old dinosaur that keeps avoiding Python or other modern languages: it is just that most of my scripts automate OS commands that I would normally run in an interactive shell… tar, cp, expdp, rman, dgmgrl, etc… and, of course, some SQL*Plus executions.

For database calls, the shell is not appropriate: no drivers, no connection, no statement, no resultset… that’s why I need to make SQL*Plus executions (with some hacks to make them work correctly), and that’s also why I normally use python or perl for data-related tasks.

Using SQL*Plus in shell scripts

For SQL*Plus executions within shell scripts there are some hacks, as I have said, that allow getting the data correctly.

As example, let’s use this table (that you might have found in my recent posts):

SQL> desc OH_GOLDEN_IMAGES
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                      NOT NULL VARCHAR2(50)
 OH_TYPE                                            VARCHAR2(10)
 VERSION                                            VARCHAR2(10)
 FULLPATH                                           VARCHAR2(200)
 CREATED                                            TIMESTAMP(6)
 DESCRIPTION                                        VARCHAR2(2000)

SQL> insert into OH_GOLDEN_IMAGES values ('18_3_0_cerndb1', 'RDBMS', '18.3.0', '/test/path/18_3_0_cerndb1.zip', sysdate-10, 'First version 18.3.0');

1 row created.

SQL> insert into OH_GOLDEN_IMAGES values ('18_3_0_cerndb2', 'RDBMS', '18.3.0', '/test/path/18_3_0_cerndb2.zip', sysdate-1, '18_3_0_cerndb1 + Patch XXX');

1 row created.

SQL> commit;

Commit complete.

In order to get, as example, the result of this query:

SELECT name, version, fullpath, TO_CHAR(created,'YYYY-MM-DD') as created
FROM oh_golden_images WHERE oh_type='RDBMS' order by created

and assign the values to some variables (in a shell loop), it is common to do something like this:

REPO_CREDENTIALS='scott/tiger@orcl'
RESULT=`$ORACLE_HOME/bin/sqlplus -s $REPO_CREDENTIALS 2>&1 <<EOF | grep ";"
        set line 200 pages 1000
        set echo off feedback off heading off
        alter session set nls_timestamp_format='YYYY-MM-DD';
        SELECT name || ';' ||version || ';' || fullpath || ';' || created
          FROM oh_golden_images
        WHERE oh_type='RDBMS'
           order by created;
        exit;
EOF
`

for line in $RESULT ; do
        L_GI_Name=`echo $line | awk -F\; '{print $1}'`
        L_GI_Version=`echo $line | awk -F\; '{print $2}'`
        L_GI_Path=`echo $line | awk -F\; '{print $3}'`
        L_GI_Date=`echo $line | awk -F\; '{print $4}'`
        echo "doing something with variables $L_GI_Name $L_GI_Date $L_GI_Path $L_GI_Version"
done

As you can see, there are several hacks:

  • The credentials must be defined somewhere (I recommend putting them in a wallet; see the sketch after the output below)
  • All the output goes in a variable (or looping directly)
  • SQL*Plus formatting can be a problem (both sqlplus settings and concatenating fields)
  • Loop and get, for each line, the variables (using awk in my case)

It is not rock solid (unexpected data might compromise the results) and there are dependencies (sqlplus binary, credentials, etc.). But for many simple tasks, that’s more than enough.

Here’s the output:

$ sh sqlplus_test.sh
doing something with values 18_3_0_cerndb1 2018-08-19 /test/path/18_3_0_cerndb1.zip 18.3.0
doing something with values 18_3_0_cerndb2 2018-08-28 /test/path/18_3_0_cerndb2.zip 18.3.0
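As a side note on the credentials: a Secure External Password Store (wallet) removes the need to embed the password in the script. A minimal sketch, with assumed paths:

# Create the wallet and store the credentials for the "orcl" alias (sketch)
mkstore -wrl /home/oracle/wallet -create
mkstore -wrl /home/oracle/wallet -createCredential orcl scott tiger

# sqlnet.ora in TNS_ADMIN must point to the wallet:
#   WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/oracle/wallet)))
#   SQLNET.WALLET_OVERRIDE=TRUE

# then the script can connect without exposing the password:
REPO_CREDENTIALS=/@orcl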

 

Using ORDS instead

Recently I have come across a situation where I had no Oracle binaries but needed to get some data from a table. That is often a situation where I use python or perl, but even in these cases, I need compatible software and drivers!

So I used ORDS instead (which, by chance, was already configured for the databases I wanted to query), and used curl and jq to get the data in the shell script.

First, I have defined the service in the database:

BEGIN
  ORDS.DEFINE_SERVICE(
    p_module_name    => 'ohctl',
    p_base_path      => 'ohctl/',
    p_pattern        => 'list/',
    p_method         => 'GET',
    p_source_type    => ORDS.source_type_collection_feed,
    p_source         => 'SELECT name, version, fullpath, TO_CHAR(created,''YYYY-MM-DD'') as created FROM oh_golden_images WHERE oh_type=''RDBMS'' order by created',
    p_items_per_page => 0);
  COMMIT;
END;
/

At this point, a direct call gives this:

$ curl $rest_ep/ohctl/list/
{"items":[{"name":"18_3_0_cerndb1","version":"18.3.0","fullpath":"/test/path/18_3_0_cerndb1.zip","created":"2018-08-19"},{"name":"18_3_0_cerndb2","version":"18.3.0","fullpath":"/test/path/18_3_0_cerndb2.zip","created":"2018-08-28"}],"hasMore":false,"limit":0,"offset":0,"count":2,"links":[{"rel":"self","href":"https://rest_endpoint/ohctl/list/"},{"rel":"describedby","href":"https://rest_endpoint/metadata-catalog/ohctl/list/"}]}

How to parse the data?

jq is a command-line JSON processor that can be used in a pipeline.

I can get the items:

$ curl -s $rest_ep/ohctl/list/ | jq --raw-output  '.items[]'
{
  "created": "2018-08-19",
  "fullpath": "/test/path/18_3_0_cerndb1.zip",
  "version": "18.3.0",
  "name": "18_3_0_cerndb1"
}
{
  "created": "2018-08-28",
  "fullpath": "/test/path/18_3_0_cerndb2.zip",
  "version": "18.3.0",
  "name": "18_3_0_cerndb2"
}

And I can produce a csv output:

$ curl -s $rest_ep/ohctl/list/ | jq --raw-output  '.items[] | @csv "\([.created]),\([.fullpath]),\([.version]),\([.name])"'
"2018-08-19","/test/path/18_3_0_cerndb1.zip","18.3.0","18_3_0_cerndb1"
"2018-08-28","/test/path/18_3_0_cerndb2.zip","18.3.0","18_3_0_cerndb2"

But the best is the shell formatter (@sh), which returns strings properly escaped for use in shell commands:

$ curl -s $rest_ep/ohctl/list/ | jq --raw-output  '.items[] | @sh "L_GI_Date=\([.created]); L_GI_Path=\([.fullpath]); L_GI_Version=\([.version]); L_GI_Name=\([.name])"'
L_GI_Date='2018-08-19'; L_GI_Path='/test/path/18_3_0_cerndb1.zip'; L_GI_Version='18.3.0'; L_GI_Name='18_3_0_cerndb1'
L_GI_Date='2018-08-28'; L_GI_Path='/test/path/18_3_0_cerndb2.zip'; L_GI_Version='18.3.0'; L_GI_Name='18_3_0_cerndb2'

At this point, the call to eval is a natural step 🙂

IFS="
"
for line in `curl -s $rest_ep/ohctl/list/ | jq --raw-output  '.items[] | @sh "L_GI_Date=\([.created]); L_GI_Path=\([.fullpath]); L_GI_Version=\([.version]); L_GI_Name=\([.name])"'` ; do
        eval $line
        echo "doing something with values $L_GI_Name $L_GI_Date $L_GI_Path $L_GI_Version"
done

The output:

$ sh ords_test.sh
doing something with values 18_3_0_cerndb1 2018-08-19 /test/path/18_3_0_cerndb1.zip 18.3.0
doing something with values 18_3_0_cerndb2 2018-08-28 /test/path/18_3_0_cerndb2.zip 18.3.0

😉

Ludovico

Grid Infrastructure 18c: changes in gridSetup.sh -applyRU and -createGoldImage


Starting with release 12cR2, Grid Infrastructure binaries are no longer shipped as an installer, but as a zip file that is uncompressed directly in the Oracle Home path.
This opened a few new possibilities, including patching the software before the Grid Infrastructure configuration.
My former colleague Markus Flechtner wrote an excellent blog post about it, here: https://www.markusdba.net/?p=294

Now, with 18c, a couple of things have changed compared to Markus' post.

The -applyRU switch replaces the -applyPSU

While it is possible to apply several sub-patches of a PSU one by one:

./gridSetup.sh -silent -applyOneOffs <path to sub-patch>
e.g.

./gridSetup.sh -silent -applyOneOffs /work/p28659165_180000_Linux-x86-64/28659165/28547619
./gridSetup.sh -silent -applyOneOffs /work/p28659165_180000_Linux-x86-64/28659165/28655784
./gridSetup.sh -silent -applyOneOffs /work/p28659165_180000_Linux-x86-64/28659165/28655916
...

it was possible to do all at once with:

./gridSetup.sh -silent -applyPSU <path to PSU>

Now the switch is called, for consistency with the patch naming, -applyRU.

E.g.:

# [ oracle@server:/u01/app/grid/crs1840 [16:38:40] [18.4.0.0.0 [GRID] SID=GRID] 255 ] #
$ ./gridSetup.sh -silent -applyRU /u01/app/oracle/stage/p28659165_180000_Linux-x86-64/28659165
Preparing the home to patch...
Applying the patch  /u01/app/oracle/stage/p28659165_180000_Linux-x86-64/28659165...
Successfully applied the patch.
The log can be found at: /u01/app/oraInventory/logs/GridSetupActions2018-11-02_04-39-54PM/installerPatchActions_2018-11-02_04-39-54PM.log
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-40426] Grid installation option has not been specified.
   ACTION: Specify the valid installation option.

There is still no option to avoid running the Setup Wizard, but it is safe to ignore the error, as the patch has been applied successfully.

The -createGoldImage does not work anymore if the Home is not attached

I have tried to create the golden image as per Markus' post, but I get this error:

# [ oracle@server:/u01/app/grid/crs1840 [09:43:39] [18.4.0.0.0 [GRID] SID=GRID] 0 ] #
$ ./gridSetup.sh -createGoldImage -destinationlocation  /u01/app/oracle/stage/golden_images/crs1840 -silent
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-32715] The source home (/u01/app/grid/crs1840) is not registered in the central inventory.
   ACTION: Ensure that the source home is registered in the central inventory.

To work around the issue, there are two ways:

  1. Create a zip file manually (a minimal sketch follows this list), as all the content needed to install the patched version is right there. No need to touch anything, as the software is not configured yet.
  2. Configure the software with CRS_SWONLY before creating the gold image:
    $ cat grid1840_swonly.rsp
    oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v18.0.0
    INVENTORY_LOCATION=/u01/app/oraInventory
    oracle.install.option=CRS_SWONLY
    ORACLE_BASE=/u01/app/oracle
    oracle.install.asm.OSDBA=dba
    oracle.install.asm.OSASM=asmdba
    oracle.install.crs.config.scanType=LOCAL_SCAN
    oracle.install.crs.config.gpnp.configureGNS=false
    oracle.install.crs.config.autoConfigureClusterNodeVIP=false
    oracle.install.crs.config.gpnp.gnsOption=CREATE_NEW_GNS
    oracle.install.crs.config.clusterNodes=server1,server2
    oracle.install.asm.configureGIMRDataDG=false
    oracle.install.crs.config.useIPMI=false
    oracle.install.asm.storageOption=ASM
    oracle.install.asmOnNAS.configureGIMRDataDG=false
    oracle.install.asm.diskGroup.name=OCRVOT
    oracle.install.asm.diskGroup.AUSize=1
    oracle.install.asm.gimrDG.AUSize=1
    oracle.install.asm.configureAFD=false
    oracle.install.crs.configureRHPS=false
    oracle.install.crs.config.ignoreDownNodes=false
    oracle.install.config.managementOption=NONE
    oracle.install.config.omsPort=0
    oracle.install.crs.rootconfig.executeRootScript=false
    
    $ ./gridSetup.sh -silent -responseFile grid1840_swonly.rsp ORACLE_HOME_NAME=crs1840
    Launching Oracle Grid Infrastructure Setup Wizard...
    
    The response file for this session can be found at:
     /u01/app/grid/crs1840/install/response/grid_2018-11-05_01-18-28PM.rsp
    
    You can find the log of this install session at:
     /u01/app/oraInventory/logs/GridSetupActions2018-11-05_01-18-28PM/gridSetupActions2018-11-05_01-18-28PM.log
    
    As a root user, execute the following script(s):
            1. /u01/app/grid/crs1840/root.sh
    
    Execute /u01/app/grid/crs1840/root.sh on the following nodes:
    [server1, server2]
    
    [root@server1 dbs01]# /u01/app/grid/crs1840/root.sh
    Check /u01/app/grid/crs1840/install/root_server1.cern.ch_2018-11-05_14-13-58-835084539.log for the output of root script
    
    [root@server2 dbs01]# /u01/app/grid/crs1840/root.sh
    Check /u01/app/grid/crs1840/install/root_server2.cern.ch_2018-11-05_14-15-18-835087641.log for the output of root script
    
    $ ./gridSetup.sh -createGoldImage -destinationlocation  /u01/app/oracle/stage/golden_images/crs1840 -silent
    Launching Oracle Grid Infrastructure Setup Wizard...
    
    Successfully Setup Software.
    Gold Image location: /u01/app/oracle/stage/golden_images/crs1840/grid_home_2018-11-05_02-25-52PM.zip
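For the first workaround, a minimal sketch (with assumed paths) that mirrors what -createGoldImage would produce:

# Sketch: zip the freshly unzipped and patched (not yet configured) Home
$ cd /u01/app/grid/crs1840
$ zip -r /u01/app/oracle/stage/golden_images/crs1840/grid_home_manual.zip . >/dev/null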

 

HTH

Ludo


Port conflict with “Oracle Remote Method Invocation (ORMI)” during Grid Infrastructure install


After years of installing Grid Infrastructure, today I got, for the first time, an error about something new:

$ /u01/app/grid/crs1840/gridSetup.sh -silent -responseFile /u01/app/grid/crs1840/inventory/response/CERNDB_Grid_Config.rsp ORACLE_HOME_NAME=crs1840
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-13013] Target environment does not meet some mandatory requirements.
   CAUSE: Some of the mandatory prerequisites are not met. See logs for details. /tmp/GridSetupActions2018-11-13_12-40-03PM/gridSetupActions2018-11-13_12-40-03PM.log
   ACTION: Identify the list of failed prerequisite checks from the log: /tmp/GridSetupActions2018-11-13_12-40-03PM/gridSetupActions2018-11-13_12-40-03PM.log. Then either from the log file or from installation manual find the appropriate configuration to meet the prerequisites and fix it manually.

Looking at the logs (which I do not have now as I removed them as part of the failed install cleanup 🙁 ), the error is generated by the cluster verification utility (CVU) on this check:

Verifying Port Availability for component "Oracle Remote Method Invocation (ORMI)"

The components verified by the CVU can be found inside $ORACLE_HOME/cv/cvdata/. In my case, precisely:

$ grep -i ORMI $ORACLE_HOME/cv/cvdata/18/crsinst_prereq.xml
         <PORT NAME="Oracle Remote Method Invocation (ORMI)" VALUE="23791" PROTOCOL="TCP" NETWORK_TYPE="PUBLIC"/>
         <PORT NAME="Oracle Remote Method Invocation (ORMI)" VALUE="23792" PROTOCOL="TCP" NETWORK_TYPE="PUBLIC"/>

This check is critical, so the install fails.

In my case the port was used by mcollectived.

[root@server1 work]# netstat -anp | grep 23791

[root@server1 work]# netstat -anp | grep 23792
tcp 0 0 x.x.x.x:23792 x.x.x.x:61613 ESTABLISHED 2298/ruby

[root@server1 work]# ps -eaf | grep 2298
root 2298 1 0 11:16 ? 00:00:02 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/mcollectived --config=/etc/puppetlabs/mcollective/server.cfg --pidfile=/var/run/puppetlabs/mcollective.pid --daemonize
root 47116 4114 0 12:50 pts/0 00:00:00 grep --color=auto 2298

The port has been taken dynamically, and previous runs of CVU did not encounter the problem.
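A quick pre-check sketch (it only assumes the crsinst_prereq.xml layout shown above) can spot such conflicts before launching the installer:

# Extract the ports that CVU will verify and report those already in use
for port in `grep '<PORT NAME' $ORACLE_HOME/cv/cvdata/18/crsinst_prereq.xml | grep -o 'VALUE="[0-9]*"' | awk -F\" '{print $2}' | sort -un` ; do
        netstat -an | grep -q ":${port} " && echo "Port $port is already in use"
done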

A rare port conflict that might happen when configuring GI 🙂

Ludo

Oracle Grid Infrastructure 18c patching part 1: Some history


Down memory lane

Although I sometimes think I have been working with Oracle Grid Infrastructure since it first existed, my memory does not always serve me well. I still like to go through the Oracle RAC family history from time to time:

  • 8i -> no Oracle cluster existed. RAC was leveraging 3rd-party clusters (like Tru Cluster, AIX HACMP, Sun Cluster)…
  • 9i -> if I remember correctly, Oracle hired some Tru Cluster developers after the acquisition of Compaq by HP. Oracle CRS was born and was quite similar to Tru Cluster. (The commands were almost the same: crs_stat instead of caa_stat, etc.)
  • 10g -> Oracle re-branded CRS to Clusterware
  • 11g -> With the addition of ASM (and other components), Oracle created the concept of “Grid Infrastructure”, composed of Clusterware and additional products. All the new versions still use the name Grid Infrastructure, and new products have been added through the years (ACFS, RHP, QoS …)

But some memories are missing: for example, I cannot remember ever upgrading an Oracle cluster from 9i to 10g or from 10g to 11g. At that time I was working for several customers, and every new release was installed on new hardware.

My first, real upgrade (as far as I can remember) was from 11gR2 to 12c, where the upgrade process was a nice, OUI-driven, out-of-place install.

The process was (still is 🙂 ) nice and smooth:

  • The installer copies, prepares and links the binaries on all the nodes in a new Oracle Home
  • The upgrade process is rolling: the first node puts the cluster in upgrade mode
  • The last node does the final steps and exits the cluster from the upgrade mode.

This is about Upgrading to a new release. But what about patching?

In-place patching

Patching of Grid Infrastructure has always been in-place and, I will not hide it, quite painful.

If you wanted to patch a Grid Infrastructure before release 12cR2, you had to:

  • read the documentation carefully and check for possible conflicts
  • backup the Grid Home
  • copy the patch on the host
  • evacuate all the services and databases from the cluster node that you want to patch
  • patch the binaries (depending on the versions and patches, this might be easy with opatchauto or quite painful with manual unlocking/locking and manual opatch steps)
  • restart/relocate the services back on the node
  • repeat the tasks for every node

The disadvantages of in-place patching are many:

  • Need to stage the patch on every node
  • Need to repeat the patching process for every node
  • No easy rollback (some bad problems might lead to deconfiguring the cluster from one node and then adding it back to the cluster)

Out-of-place patching

Out-of-place patching has proven to be a much better solution. I have been doing it regularly for a while for Oracle Database homes and I am very satisfied with it. I am implementing it at CERN as well, and it will unlock new levels of server consolidation 🙂

I have written a blog series here, and presented about it a few times.

But out-of-place patching for Grid Infrastructure is VERY recent.

12cR2: opatchauto 

Oracle 12cR2 introduced out-of-place patching as a new feature of opatchauto.

This MOS document explains it in quite some detail:

Grid Infrastructure Out of Place ( OOP ) Patching using opatchauto (Doc ID 2419319.1)

The process is the following:

  • a preparation process clones the active Oracle Home on the current node and patches it
  • a switch process switches the active Oracle Home from the old one to the prepared clone
  • those two phases are repeated for each node

[Figure: 12cR2 out-of-place patching with opatchauto]
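From memory, and to be double-checked against Doc ID 2419319.1, the two phases map to opatchauto calls along these lines (the patch path is illustrative):

# Prepare phase (per node): clone the active GI home and patch the clone
opatchauto apply /stage/GI_RU -prepare-clone

# Switch phase (per node): switch the active GI home to the patched clone
opatchauto apply -switch-clone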

The good thing is that the preparation can be done in advance on all the nodes and the switch can be triggered only if all the clones are patched successfully.

However, the staging of the patch, the cloning and patching must still happen on every node, making the concept of golden images quite useless for patching.

It is worth mentioning, at this point, that Grid Infrastructure golden images ARE A THING, and that they were introduced by Rapid Home Provisioning release 12cR2, where automatic cluster provisioning was included as a new feature.

These Grid Infrastructure golden images have already been mentioned here and here.

I have discussed Rapid Home Provisioning itself here, but I will add a couple of thoughts in the next paragraph.

18c and the brand new Independent local-mode Automaton

I have been an early tester of the Rapid Home Provisioning product, since its release with Oracle 12.1.0.2. I have presented it at UKOUG and as a RAC SIG webinar.
https://www.youtube.com/watch?v=vaB4RWjYPq0
http://www.ludovicocaldara.net/dba/rhp-presentation/

I liked the product A LOT, despite a few bugs in the initial release. The concept of out-of-place patching that RHP uses is, in my opinion, the best one to cope with frequent patches and upgrades.

Now, with Oracle 18c, the Rapid Home Provisioning Independent Local-mode Automaton comes into play. There is not much documentation about it, even in the Oracle documentation, but a few things are clear:

  • The Independent local-mode automaton comes without additional licenses as it is not part of the RHP Server/Client infrastructure
  • It is 100% local to the cluster where it is used
  • Its main “job” is to allow moving Grid Infrastructure Homes from a non-patched version to an out-of-place patched one.

$ rhpctl move gihome -sourcehome Oracle_home_path -destinationhome Oracle_home_path

I will not disclose more here, as the rest of this blog series is focused on this new product 🙂

Stay tuned for details, examples and feedback from its usage at CERN 😉

Ludo

Oracle Grid Infrastructure 18c patching part 2: Independent Local-mode Automaton architecture and activation


The first important step before starting using the new Independent Local-mode Automaton is understanding which are its components inside a cluster.

Resources

Here’s the list of services that you will find when you install Grid Infrastructure 18c:

# [ oracle@server1:/u01/app/oracle/home [15:14:41] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.MGMT.GHCHKPT.advm
               OFFLINE OFFLINE      server1                STABLE
               OFFLINE OFFLINE      server2                STABLE
ora.MGMT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.OCRVOT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.chad
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.helper
               OFFLINE OFFLINE      server1                IDLE,STABLE
               OFFLINE OFFLINE      server2                STABLE
ora.mgmt.ghchkpt.acfs
               OFFLINE OFFLINE      server1                STABLE
               OFFLINE OFFLINE      server2                STABLE
ora.net1.network
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.ons
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.proxy_advm
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       server2                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       server1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       server1                169.254.17.12 10.30.
                                                             200.73,STABLE
ora.asm
      1        ONLINE  ONLINE       server1                Started,STABLE
      2        ONLINE  ONLINE       server2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       server1                STABLE
ora.server1.vip
      1        ONLINE  ONLINE       server1                STABLE
ora.server2.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       server1                Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       server1                STABLE
ora.rhpserver
      1        OFFLINE OFFLINE                               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       server1                STABLE
--------------------------------------------------------------------------------

As you can see, there are 4 components that are OFFLINE by default:

Three local resources (that are present on each node):

  • ora.MGMT.GHCHKPT.advm
  • ora.mgmt.ghchkpt.acfs
  • ora.helper

One cluster resource (active on only one server at a time; it can relocate):

  • ora.rhpserver

If you have ever worked with 12c Rapid Home Provisioning, those names should sound familiar.

The GHCHKPT filesystem (and its underlying volume) is used to store data about the ongoing operations across the cluster during the GI home move.

The ora.helper is the process that actually executes the operations. It is local because each node needs it to run some actions at some point.

The rhpserver is the server process that coordinates the operations and delegates them to the helpers.

All those services compose the Independent Local-mode Automaton, which is the default deployment. The full RHP framework (RHP Server and RHP Client) can be configured instead, with some additional work.
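
For the sake of completeness, converting to a full RHP Server boils down to something like the following minimal sketch (hedged: it requires the proper licensing, and the storage path and disk group name here are illustrative assumptions):

# as the grid owner; the -storage and -diskgroup values are assumptions
$ srvctl add rhpserver -storage /rhp_storage -diskgroup MGMT
$ srvctl start rhpserver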

Important note: just a few weeks ago Oracle changed the name of Rapid Home Provisioning (RHP) to Fleet Patching and Provisioning (FPP). The name is definitely more appealing now, but it creates some confusion again about product names and acronyms, so be aware that in this series I sometimes refer to RHP and sometimes to FPP, but it is the same thing.

Tomcat?

You might have noticed that Tomcat is now deployed in the GI home, as there are patches specific to it (here I paste the 18.4 version):

$ opatch lspatches
28655963;DBWLM RELEASE UPDATE 18.4.0.0.0 (28655963)
28655784;Database Release Update : 18.4.0.0.181016 (28655784)
28655916;ACFS RELEASE UPDATE 18.4.0.0.0 (28655916)
28656071;OCW RELEASE UPDATE 18.4.0.0.0 (28656071)
28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619)
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)


Indeed, Tomcat is registered in the inventory and patched just like any other product inside the OH:

<COMP NAME="oracle.tomcat.crs" VER="18.0.0.0.0" BUILD_NUMBER="0" BUILD_TIME="20180207.193003" REP_VER="0.0.0.0.0" RELEASE="Production" INV_LOC="Components/oracle.tomcat.crs/18.0.0.0.0/1/" LANGS="ALL_LANGS" XML_INV_LOC="Components21/oracle.tomcat.crs/18.0.0.0.0/" ACT_INST_VER="12.2.0.4.0" DEINST_VER="11.2.0.0.0" INSTALL_TIME="2018.Nov.05 13:27:32 CET" INST_LOC="/u01/app/grid/crs1840/tomcat">
   <EXT_NAME>Tomcat Container</EXT_NAME>
   <DESC>Packages files from the Tomcat Container.</DESC>
   <DESCID>COMPONENT_DESC</DESCID>
   <STG_INFO OSP_VER="10.2.0.0.0"/>
   <CMP_JAR_INFO>
      <INFO NAME="filemapObj" VAL="Components/oracle/tomcat/crs/v18_0_0_0_0/filemap.xml"/>
      <INFO NAME="helpDir" VAL="Components/oracle/tomcat/crs/v18_0_0_0_0/help/"/>
      <INFO NAME="actionsClass" VAL="Components.oracle.tomcat.crs.v18_0_0_0_0.CompActions"/>
      <INFO NAME="resourceClass" VAL="Components.oracle.tomcat.crs.v18_0_0_0_0.resources.CompRes"/>
      <INFO NAME="identifiersXML" VAL="Components/oracle/tomcat/crs/v18_0_0_0_0/identifiers.xml"/>
      <INFO NAME="contextClass" VAL="Components.oracle.tomcat.crs.v18_0_0_0_0.CompContext"/>
      <INFO NAME="fastCopyLogXML" VAL="Components/oracle/tomcat/crs/v18_0_0_0_0/fastCopyLog.xml"/>
   </CMP_JAR_INFO>
   <LOC_INFO INST_DFN_LOC="../Scripts" JAR_NAME="install1.jar"/>
   <BOOK NAME="oracle.tomcat.crs.hs"/>
   <PRE_REQ DEF="F"/>
   <PROD_HOME DEF="F"/>
   <LANG_IDX_MAP>
      <LANG LIST="en fr ar bn pt_BR bg fr_CA ca hr cs da nl ar_EG en_GB et fi de el iw hu is in it ja ko es lv lt ms es_MX no pl pt ro ru zh_CN sk sl es_ES sv th zh_TW tr uk vi"/>
      <LANGSET IDX="1" BITSET="{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44}"/>
   </LANG_IDX_MAP>
   <PLAT_IDX_MAP>
      <PLAT LIST="46"/>
      <PLATSET IDX="1" BITSET="{0}"/>
   </PLAT_IDX_MAP>
   <DST_IDX_MAP>
      <DST LIST="%ORACLE_HOME% %INVENTORY_LOCATION%"/>
   </DST_IDX_MAP>
   <DEP_GRP_LIST/>
   <DEP_LIST/>
   <REF_LIST>
      <REF NAME="oracle.crs" VER="18.0.0.0.0" HOME_IDX="3"/>
   </REF_LIST>
   <INST_TYPE_LIST>
      <INST_TYPE NAME="Complete" NAME_ID="Maximum" DESC_ID=""/>
   </INST_TYPE_LIST>
   <FILESIZEINFO>
      <DEST VOLUME="%ORACLE_HOME%" SPACE_REQ="3375301"/>
      <DEST VOLUME="%INVENTORY_LOCATION%" SPACE_REQ="2000"/>
   </FILESIZEINFO>
</COMP>


# [ oracle@server2:/u01/app/grid/crs1830/inventory/Components21/oracle.tomcat.crs/18.0.0.0.0 [08:56:06] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ vi context.xml

<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2018, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<COMP_CONTEXT>
   <VAR_LIST SIZE="0">
      <VAR NAME="PROD_HOME" TYPE="String" DESC_RES_ID="" SECURE="F" VAL="/u01/app/grid/crs1830/tomcat" ADV="F" CLONABLE="T" USER_INPUT="DEFAULT"/>
   </VAR_LIST>
   <CONST_LIST SIZE="2">
      <CONST NAME="COMPONENT_DESC" PLAT_SP="F" TYPE="String" TRANS="T" VAL="COMPONENT_DESC_ALL"/>
      <CONST NAME="COMPONENT_NAME" PLAT_SP="F" TYPE="String" TRANS="F" VAL="Tomcat Container"/>
   </CONST_LIST>
</COMP_CONTEXT>

Out of the box, Tomcat is used by Quality of Service Management (the ora.qosmserver resource):

$ ps -eaf | grep tomcat
oracle    58746 142151  0 13:10 pts/1    00:00:00 grep --color=auto tomcat
oracle   108610      1  0 Dec04 ?        00:25:33 /CRS/dbs01/crs1830/jdk/bin/java -server -Xms128M -Xmx384M -Djava.awt.headless=true -Ddisable.checkForUpdate=true -Djava.util.logging.config.file=/ORA/dbs01/oracle/crsdata/itrac1602/qos/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -DTRACING.ENABLED=false -Djava.rmi.server.hostname=itrac1602.cern.ch -Doracle.http.port=8888 -Doracle.jmx.port=23792 -Doracle.tls.enabled=false -Doracle.jwc.tls.http.enabled=false -Djava.security.manager -Djava.security.policy=/ORA/dbs01/oracle/crsdata/itrac1602/qos/conf/catalina.policy -Djava.security.egd=file:/dev/urandom -Dcatalina.home=/CRS/dbs01/crs1840/tomcat -Dcatalina.base=/ORA/dbs01/oracle/crsdata/itrac1602/qos -Djava.io.tmpdir=/ORA/dbs01/oracle/crsdata/itrac1602/qos/temp -Doracle.home=/CRS/dbs01/crs1840 -classpath /CRS/dbs01/crs1840/tomcat/lib/tomcat-juli.jar:/CRS/dbs01/crs1840/tomcat/lib/bootstrap.jar:/CRS/dbs01/crs1840/jlib/jwc-logging.jar org.apache.catalina.startup.Bootstrap start

But it is also used by the Independent Local-mode Automaton, once the latter is started.

Enabling and starting the independent local-mode automaton

The resources are started using the following commands (as root, the order is quite important):

# /u01/app/grid/crs1830/bin/srvctl enable volume -volume GHCHKPT  -diskgroup mgmt
# /u01/app/grid/crs1830/bin/srvctl enable filesystem -volume GHCHKPT -diskgroup mgmt
# /u01/app/grid/crs1830/bin/srvctl start filesystem -volume GHCHKPT -diskgroup mgmt

Before continuing with the rhpserver resource, you might want to check if the filesystem is mounted:

$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.MGMT.GHCHKPT.advm
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.MGMT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.OCRVOT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.chad
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.helper
               OFFLINE OFFLINE      server1                IDLE,STABLE
               OFFLINE OFFLINE      server2                STABLE
ora.mgmt.ghchkpt.acfs
               ONLINE  ONLINE       server1                mounted on /opt/orac
                                                             le/rhp_images/chkbas
                                                             e,STABLE
               ONLINE  ONLINE       server2                mounted on /opt/orac
                                                             le/rhp_images/chkbas
                                                             e,STABLE
ora.net1.network
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.ons
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.proxy_advm
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       server2                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       server1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       server1                169.254.17.12 10.30.
                                                             200.73,STABLE
ora.asm
      1        ONLINE  ONLINE       server1                Started,STABLE
      2        ONLINE  ONLINE       server2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       server1                STABLE
ora.server1.vip
      1        ONLINE  ONLINE       server1                STABLE
ora.server2.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       server1                Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       server1                STABLE
ora.rhpserver
      1        OFFLINE OFFLINE                               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       server1                STABLE
--------------------------------------------------------------------------------


[root@server2 dbs01]# df -k | grep ghchkpt
/dev/asm/ghchkpt-213              1572864   499572   1073292  32% /opt/oracle/rhp_images/chkbase

Now the rhpserver should start without problems, as oracle:

# [ oracle@server1:/u01/app/oracle/home [17:00:49] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ srvctl start rhpserver

Please note that if you do not activate the filesystem first, the rhpserver will fail to start.

As you can see, now both rhpserver and the helper are online:

# [ oracle@server1:/u01/app/oracle/home [17:02:39] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.MGMT.GHCHKPT.advm
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.MGMT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.OCRVOT.dg
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.chad
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.helper
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.mgmt.ghchkpt.acfs
               ONLINE  ONLINE       server1                mounted on /opt/orac
                                                             le/rhp_images/chkbas
                                                             e,STABLE
               ONLINE  ONLINE       server2                mounted on /opt/orac
                                                             le/rhp_images/chkbas
                                                             e,STABLE
ora.net1.network
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.ons
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
ora.proxy_advm
               ONLINE  ONLINE       server1                STABLE
               ONLINE  ONLINE       server2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       server2                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       server1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       server1                169.254.17.12 10.30.
                                                             200.73,STABLE
ora.asm
      1        ONLINE  ONLINE       server1                Started,STABLE
      2        ONLINE  ONLINE       server2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       server1                STABLE
ora.server1.vip
      1        ONLINE  ONLINE       server1                STABLE
ora.server2.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       server1                Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       server1                STABLE
ora.rhpserver
      1        ONLINE  ONLINE       server2                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       server2                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       server1                STABLE
--------------------------------------------------------------------------------

# [ oracle@server2:/u01/app/grid/crs1830 [08:59:43] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ ps -eaf | grep tomca
oracle   132330      1 15 08:48 ?        00:01:39 /u01/app/grid/crs1830/jdk/bin/java -server -Xms128M -Xmx384M -Djava.awt.headless=true -Ddisable.checkForUpdate=true -Djava.util.logging.config.file=/u01/app/oracle/crsdata/server2/rhp/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -DTRACING.ENABLED=false -Djava.rmi.server.hostname=server2.cern.ch -Doracle.http.port=8894 -Doracle.jmx.port=23795 -Doracle.tls.enabled=false -Doracle.jwc.tls.http.enabled=true -Doracle.rhp.storagebase=/opt/oracle/rhp_images -Djava.security.egd=file:/dev/urandom -Dcatalina.home=/u01/app/grid/crs1830/tomcat -Dcatalina.base=/u01/app/oracle/crsdata/server2/rhp -Djava.io.tmpdir=/u01/app/oracle/crsdata/server2/rhp/temp -Doracle.home=/u01/app/grid/crs1830 -classpath /u01/app/grid/crs1830/tomcat/lib/tomcat-juli.jar:/u01/app/grid/crs1830/tomcat/lib/bootstrap.jar:/u01/app/grid/crs1830/jlib/jwc-logging.jar org.apache.catalina.startup.Bootstrap start

Now all is set to start using it!

We’ll see how to use it in the next posts.

Ludo


Oracle Grid Infrastructure 18c patching part 3: Executing out-of-place patching with the local-mode automaton


I wish I had more time to blog in the recent weeks. Sorry for the delay in this blog series 🙂

If you have not read the two previous blog posts, please do it now. I suppose here that you have the Independent Local-Mode Automaton already enabled.

What does the Independent Local-mode Automaton do?

The automaton automates the process of moving the active Grid Infrastructure Oracle Home from the current one to a new one. The new one can be at either a higher or a lower patch level. Of course, you will most likely want to patch your Grid Infrastructure, thus moving to a higher patch level.

Preparing the new Grid Infrastructure Oracle Home

Starting from 12.2, the GI home is just a zip file that is extracted directly into the new Oracle Home path. In this blog post I assume that you want to patch your Grid Infrastructure from an existing 18.3 to a brand new 18.4 (18.5 will be released very soon).

So, if your current OH is /u01/app/grid/crs1830, you might want to prepare the new home in /u01/app/grid/crs1840 by unzipping the software and then patching it using the steps described here.

If you already have a golden image with the correct version, you can unzip it directly.
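
For reference, here is a minimal sketch of the preparation on one node; the staging path and the zip file name are illustrative assumptions:

# as the grid owner; /stage and the zip name are assumptions, adapt them to your setup
$ mkdir -p /u01/app/grid/crs1840
$ cd /u01/app/grid/crs1840
$ unzip -q /stage/LINUX.X64_180000_grid_home.zip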

Beware of four important things: 

  1. You have to register the new Oracle home in the Central Inventory using the SW_ONLY install, as described here (a minimal sketch follows after this list).
  2. You must do it for all the nodes in the cluster prior to upgrading.
  3. The response file must contain the same groups (DBA, OPER, etc.) as the current active Home, otherwise errors will appear.
  4. You must relink your Oracle binaries by hand with the RAC option:
    $ cd /u01/app/grid/crs1840/rdbms/lib
    $ make -f ins_rdbms.mk rac_on ioracle

In fact, after every attach to the Central Inventory the binaries are relinked without the RAC option, so it is important to activate RAC again to avoid serious problems when upgrading the ASM with the new automaton.
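
As a reference for point 1, this is a minimal sketch of a silent SW_ONLY registration; the response file location and its values are illustrative assumptions, and the groups must match those of the current active home:

# run as the grid owner on each node; paths and groups are assumptions
$ /u01/app/grid/crs1840/gridSetup.sh -silent -responseFile /tmp/gi_swonly.rsp
# where /tmp/gi_swonly.rsp contains at least:
#   oracle.install.option=CRS_SWONLY
#   ORACLE_BASE=/u01/app/oracle
#   oracle.install.asm.OSDBA=dba    (same groups as the current active home)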

Executing the move gihome

If everything is correct, you should have now the current and new Oracle Homes, correctly registered in the Central Inventory, with the RAC option activated.

You can now do a first eval to check if everything looks good:

# [ oracle@server1:/u01/app/oracle/home [12:01:52] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ rhpctl move gihome -sourcehome /u01/app/grid/crs1830 -desthome /u01/app/grid/crs1840 -eval
server2.cern.ch: Audit ID: 4
server2.cern.ch: Evaluation in progress for "move gihome" ...
server2.cern.ch: verifying versions of Oracle homes ...
server2.cern.ch: verifying owners of Oracle homes ...
server2.cern.ch: verifying groups of Oracle homes ...
server2.cern.ch: Evaluation finished successfully for "move gihome".

My personal suggestion, at least for your first experiences with the automaton, is to move the Oracle Home one node at a time. This way, YOU control the relocation of the services and resources before doing the actual move operation.
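
For example, you might drain node 1 before the move with something like this minimal sketch (database, service and instance names are illustrative assumptions):

$ srvctl relocate service -db hrdev_site1 -service SERVICE_NUMBER_01 -oldinst HRDEV1 -newinst HRDEV2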

Here is the execution for the first node:

# [ oracle@server1:/u01/app/oracle/home [15:17:26] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ rhpctl move gihome -sourcehome /u01/app/grid/crs1830 -desthome /u01/app/grid/crs1840 -node server1
server2.cern.ch: Audit ID: 4
server2.cern.ch: verifying versions of Oracle homes ...
server2.cern.ch: verifying owners of Oracle homes ...
server2.cern.ch: verifying groups of Oracle homes ...
server2.cern.ch: starting to move the Oracle Grid Infrastructure home from "/u01/app/grid/crs1830" to "/u01/app/grid/crs1840" on server cluster "CRSTEST-RAC16"
server2.cern.ch: Executing prepatch and postpatch on nodes: "server1".
server2.cern.ch: Executing root script on nodes [server1].
server2.cern.ch: Successfully executed root script on nodes [server1].
server2.cern.ch: Executing root script on nodes [server1].
Using configuration parameter file: /u01/app/grid/crs1840/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/server1/crsconfig/crs_postpatch_server1_2018-11-14_03-27-43PM.log
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [70732493].
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server1'
CRS-2673: Attempting to stop 'ora.crsd' on 'server1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'server1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'server1'
CRS-2673: Attempting to stop 'ora.mgmt.ghchkpt.acfs' on 'server1'
CRS-2673: Attempting to stop 'ora.helper336.hlp' on 'server1'
CRS-2673: Attempting to stop 'ora.chad' on 'server1'
CRS-2673: Attempting to stop 'ora.chad' on 'server2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server1'
CRS-2673: Attempting to stop 'ora.OCRVOT.dg' on 'server1'
CRS-2673: Attempting to stop 'ora.MGMT.dg' on 'server1'
CRS-2673: Attempting to stop 'ora.helper' on 'server1'
CRS-2673: Attempting to stop 'ora.cvu' on 'server1'
CRS-2673: Attempting to stop 'ora.qosmserver' on 'server1'
CRS-2677: Stop of 'ora.helper336.hlp' on 'server1' succeeded
CRS-2677: Stop of 'ora.OCRVOT.dg' on 'server1' succeeded
CRS-2677: Stop of 'ora.MGMT.dg' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'server1'
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'server1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.scan2.vip' on 'server1'
CRS-2677: Stop of 'ora.helper' on 'server1' succeeded
CRS-2677: Stop of 'ora.cvu' on 'server1' succeeded
CRS-2677: Stop of 'ora.scan2.vip' on 'server1' succeeded
CRS-2677: Stop of 'ora.asm' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'server1'
CRS-2677: Stop of 'ora.mgmt.ghchkpt.acfs' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.MGMT.GHCHKPT.advm' on 'server1'
CRS-2677: Stop of 'ora.MGMT.GHCHKPT.advm' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'server1'
CRS-2677: Stop of 'ora.chad' on 'server2' succeeded
CRS-2677: Stop of 'ora.chad' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.mgmtdb' on 'server1'
CRS-2677: Stop of 'ora.qosmserver' on 'server1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'server1' succeeded
CRS-2677: Stop of 'ora.proxy_advm' on 'server1' succeeded
CRS-2677: Stop of 'ora.mgmtdb' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'server1'
CRS-2677: Stop of 'ora.MGMTLSNR' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.server1.vip' on 'server1'
CRS-2677: Stop of 'ora.server1.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'server2'
CRS-2672: Attempting to start 'ora.qosmserver' on 'server2'
CRS-2672: Attempting to start 'ora.scan2.vip' on 'server2'
CRS-2672: Attempting to start 'ora.cvu' on 'server2'
CRS-2672: Attempting to start 'ora.server1.vip' on 'server2'
CRS-2676: Start of 'ora.cvu' on 'server2' succeeded
CRS-2676: Start of 'ora.server1.vip' on 'server2' succeeded
CRS-2676: Start of 'ora.MGMTLSNR' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.mgmtdb' on 'server2'
CRS-2676: Start of 'ora.scan2.vip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'server2'
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'server2' succeeded
CRS-2676: Start of 'ora.qosmserver' on 'server2' succeeded
CRS-2676: Start of 'ora.mgmtdb' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.chad' on 'server2'
CRS-2676: Start of 'ora.chad' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'server1'
CRS-2677: Stop of 'ora.ons' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'server1'
CRS-2677: Stop of 'ora.net1.network' on 'server1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'server1' has completed
CRS-2677: Stop of 'ora.crsd' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'server1'
CRS-2673: Attempting to stop 'ora.crf' on 'server1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'server1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'server1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'server1' succeeded
CRS-2677: Stop of 'ora.crf' on 'server1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'server1' succeeded
CRS-2677: Stop of 'ora.asm' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'server1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'server1'
CRS-2673: Attempting to stop 'ora.evmd' on 'server1'
CRS-2677: Stop of 'ora.ctssd' on 'server1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server1'
CRS-2677: Stop of 'ora.cssd' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'server1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'server1'
CRS-2677: Stop of 'ora.gipcd' on 'server1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'server1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2018/11/14 15:30:10 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'server1'
CRS-2672: Attempting to start 'ora.evmd' on 'server1'
CRS-2676: Start of 'ora.mdnsd' on 'server1' succeeded
CRS-2676: Start of 'ora.evmd' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'server1'
CRS-2676: Start of 'ora.gpnpd' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'server1'
CRS-2676: Start of 'ora.gipcd' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'server1'
CRS-2672: Attempting to start 'ora.crf' on 'server1'
CRS-2676: Start of 'ora.cssdmonitor' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'server1'
CRS-2672: Attempting to start 'ora.diskmon' on 'server1'
CRS-2676: Start of 'ora.diskmon' on 'server1' succeeded
CRS-2676: Start of 'ora.crf' on 'server1' succeeded
CRS-2676: Start of 'ora.cssd' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'server1'
CRS-2672: Attempting to start 'ora.ctssd' on 'server1'
CRS-2676: Start of 'ora.ctssd' on 'server1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server1'
CRS-2676: Start of 'ora.asm' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'server1'
CRS-2676: Start of 'ora.storage' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'server1'
CRS-2676: Start of 'ora.crsd' on 'server1' succeeded
CRS-6017: Processing resource auto-start for servers: server1
CRS-2673: Attempting to stop 'ora.server1.vip' on 'server2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'server2'
CRS-2672: Attempting to start 'ora.ons' on 'server1'
CRS-2672: Attempting to start 'ora.chad' on 'server1'
CRS-2677: Stop of 'ora.server1.vip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.server1.vip' on 'server1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'server2'
CRS-2677: Stop of 'ora.scan1.vip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'server1'
CRS-2676: Start of 'ora.chad' on 'server1' succeeded
CRS-2676: Start of 'ora.server1.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'server1'
CRS-2676: Start of 'ora.scan1.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'server1'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'server1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'server1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'server1' succeeded
CRS-2681: Clean of 'ora.asm' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server1'
CRS-2676: Start of 'ora.ons' on 'server1' succeeded
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server1' failed
CRS-2672: Attempting to start 'ora.asm' on 'server1'
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server1' failed
CRS-2679: Attempting to clean 'ora.proxy_advm' on 'server1'
CRS-2681: Clean of 'ora.proxy_advm' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.proxy_advm' on 'server1'
CRS-2676: Start of 'ora.proxy_advm' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server1'
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server1' failed
CRS-2672: Attempting to start 'ora.MGMT.GHCHKPT.advm' on 'server1'
CRS-2676: Start of 'ora.MGMT.GHCHKPT.advm' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.mgmt.ghchkpt.acfs' on 'server1'
CRS-2676: Start of 'ora.mgmt.ghchkpt.acfs' on 'server1' succeeded
===== Summary of resource auto-start failures follows =====
CRS-2807: Resource 'ora.asm' failed to start automatically.
CRS-6016: Resource auto-start has completed for server server1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [70732493].
2018/11/14 15:35:23 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2018/11/14 15:37:11 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
2018/11/14 15:37:13 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
server2.cern.ch: Successfully executed root script on nodes [server1].
server2.cern.ch: Updating inventory on nodes: server1.
========================================
server2.cern.ch:
Starting Oracle Universal Installer...

The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
server2.cern.ch: Updated inventory on nodes: server1.
server2.cern.ch: Updating inventory on nodes: server1.
========================================
server2.cern.ch:
Starting Oracle Universal Installer...

The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
server2.cern.ch: Updated inventory on nodes: server1.
server2.cern.ch: Continue by running 'rhpctl move gihome -destwc <workingcopy_name> -continue [-root | -sudouser <sudo_username> -sudopath <path_to_sudo_binary>]'.
server2.cern.ch: completed the move of Oracle Grid Infrastructure home on server cluster "CRSTEST-RAC16"

From this output you can see that the cluster upgrade state is NORMAL at the beginning, then the clusterware stack is stopped on node 1, the active version is replaced in the oracle-ohasd.service file (CLSRSC-329), and the stack is started back with the new version. At that point the cluster upgrade state is ROLLING PATCH. Finally, TFA and the node list are updated.

Before continuing with the other node(s), make sure that all the resources are up & running:

# [ oracle@server1:/u01/app/oracle/home [15:37:26] [18.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ crss
HA Resource                                   Targets                          States
-----------                                   -----------------------------    ----------------------------------------
ora.ASMNET1LSNR_ASM.lsnr                      ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.LISTENER.lsnr                             ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.LISTENER_SCAN1.lsnr                       ONLINE                           ONLINE on server1
ora.LISTENER_SCAN2.lsnr                       ONLINE                           ONLINE on server2
ora.MGMT.GHCHKPT.advm                         ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.MGMT.dg                                   ONLINE,ONLINE                    OFFLINE,ONLINE on server2
ora.MGMTLSNR                                  ONLINE                           ONLINE on server2
ora.OCRVOT.dg                                 OFFLINE,ONLINE                   OFFLINE,ONLINE on server2
ora.asm                                       ONLINE,ONLINE,OFFLINE            OFFLINE,ONLINE on server2,OFFLINE
ora.chad                                      ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.cvu                                       ONLINE                           ONLINE on server2
ora.helper                                    ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.helper336.hlp                             ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.server1.vip                               ONLINE                           ONLINE on server1
ora.server2.vip                               ONLINE                           ONLINE on server2
ora.mgmt.ghchkpt.acfs                         ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.mgmtdb                                    ONLINE                           ONLINE on server2
ora.net1.network                              ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.ons                                       ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.proxy_advm                                ONLINE,ONLINE                    ONLINE on server1,ONLINE on server2
ora.qosmserver                                ONLINE                           ONLINE on server2
ora.rhpserver                                 ONLINE                           ONLINE on server2
ora.scan1.vip                                 ONLINE                           ONLINE on server1
ora.LISTENER_LEAF.lsnr
ora.scan2.vip                                 ONLINE                           ONLINE on server2



# [ oracle@server1:/u01/app/oracle/home [15:52:10] [18.4.0.0.0 [GRID] SID=GRID] 1 ] #
$ crsctl query crs releasepatch
Oracle Clusterware release patch level is [59717688] and the complete list of patches [27908644 27923415 28090523 28090553 28090557 28256701 28547619 28655784 28655916 28655963 28656071 ] have been applied on the local node. The release patch string is [18.4.0.0.0].
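
To double-check the cluster-wide state at this point, crsctl can also report the upgrade state together with the active patch level:

$ crsctl query crs activeversion -f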

You might also want to manually relocate your resources back to node 1 before continuing with node 2.

After that, node 2 can follow the very same procedure:

# [ oracle@server1:/u01/app/oracle/home [15:54:30] [18.4.0.0.0 [GRID] SID=GRID] 130 ] #
$ rhpctl move gihome -sourcehome /u01/app/grid/crs1830 -desthome /u01/app/grid/crs1840 -node server2
server2.cern.ch: Audit ID: 51
server2.cern.ch: Executing prepatch and postpatch on nodes: "server2".
server2.cern.ch: Executing root script on nodes [server2].
server2.cern.ch: Successfully executed root script on nodes [server2].
server2.cern.ch: Executing root script on nodes [server2].
Using configuration parameter file: /u01/app/grid/crs1840/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/server2/crsconfig/crs_postpatch_server2_2018-11-14_03-58-21PM.log
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [70732493].
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'server2'
CRS-2673: Attempting to stop 'ora.crsd' on 'server2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'server2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'server2'
CRS-2673: Attempting to stop 'ora.cvu' on 'server2'
CRS-2673: Attempting to stop 'ora.rhpserver' on 'server2'
CRS-2673: Attempting to stop 'ora.OCRVOT.dg' on 'server2'
CRS-2673: Attempting to stop 'ora.MGMT.dg' on 'server2'
CRS-2673: Attempting to stop 'ora.qosmserver' on 'server2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'server2'
CRS-2673: Attempting to stop 'ora.chad' on 'server1'
CRS-2673: Attempting to stop 'ora.chad' on 'server2'
CRS-2673: Attempting to stop 'ora.helper336.hlp' on 'server2'
CRS-2673: Attempting to stop 'ora.helper' on 'server2'
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.scan2.vip' on 'server2'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'server2' succeeded
CRS-2677: Stop of 'ora.chad' on 'server1' succeeded
CRS-2677: Stop of 'ora.chad' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.mgmtdb' on 'server2'
CRS-2677: Stop of 'ora.OCRVOT.dg' on 'server2' succeeded
CRS-2677: Stop of 'ora.MGMT.dg' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'server2'
CRS-2677: Stop of 'ora.helper336.hlp' on 'server2' succeeded
CRS-2677: Stop of 'ora.helper' on 'server2' succeeded
CRS-2677: Stop of 'ora.scan2.vip' on 'server2' succeeded
CRS-2677: Stop of 'ora.asm' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'server2'
CRS-2677: Stop of 'ora.cvu' on 'server2' succeeded
CRS-2677: Stop of 'ora.qosmserver' on 'server2' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'server2' succeeded
CRS-2677: Stop of 'ora.mgmtdb' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'server2'
CRS-2677: Stop of 'ora.MGMTLSNR' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.server2.vip' on 'server2'
CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'server1'
CRS-2677: Stop of 'ora.server2.vip' on 'server2' succeeded
CRS-2676: Start of 'ora.MGMTLSNR' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.mgmtdb' on 'server1'
CRS-2676: Start of 'ora.mgmtdb' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.chad' on 'server1'
CRS-2676: Start of 'ora.chad' on 'server1' succeeded
Stop JWC
CRS-5014: Agent "ORAROOTAGENT" timed out starting process "/u01/app/grid/crs1830/bin/ghappctl" for action "stop": details at "(:CLSN00009:)" in "/u01/app/oracle/diag/crs/server2/crs/trace/crsd_orarootagent_root.trc"
CRS-2675: Stop of 'ora.rhpserver' on 'server2' failed
CRS-2679: Attempting to clean 'ora.rhpserver' on 'server2'
CRS-2681: Clean of 'ora.rhpserver' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.mgmt.ghchkpt.acfs' on 'server2'
CRS-2677: Stop of 'ora.mgmt.ghchkpt.acfs' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.MGMT.GHCHKPT.advm' on 'server2'
CRS-2677: Stop of 'ora.MGMT.GHCHKPT.advm' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'server2'
CRS-2677: Stop of 'ora.proxy_advm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.qosmserver' on 'server1'
CRS-2672: Attempting to start 'ora.scan2.vip' on 'server1'
CRS-2672: Attempting to start 'ora.cvu' on 'server1'
CRS-2672: Attempting to start 'ora.server2.vip' on 'server1'
CRS-2676: Start of 'ora.cvu' on 'server1' succeeded
CRS-2676: Start of 'ora.server2.vip' on 'server1' succeeded
CRS-2676: Start of 'ora.scan2.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'server1'
CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'server1' succeeded
CRS-2676: Start of 'ora.qosmserver' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'server2'
CRS-2677: Stop of 'ora.ons' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'server2'
CRS-2677: Stop of 'ora.net1.network' on 'server2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'server2' has completed
CRS-2677: Stop of 'ora.crsd' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'server2'
CRS-2673: Attempting to stop 'ora.crf' on 'server2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'server2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'server2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'server2' succeeded
CRS-2677: Stop of 'ora.crf' on 'server2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'server2' succeeded
CRS-2677: Stop of 'ora.asm' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'server2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'server2'
CRS-2673: Attempting to stop 'ora.evmd' on 'server2'
CRS-2677: Stop of 'ora.ctssd' on 'server2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'server2'
CRS-2677: Stop of 'ora.cssd' on 'server2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'server2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'server2'
CRS-2677: Stop of 'ora.gpnpd' on 'server2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'server2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'server2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2018/11/14 16:01:42 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'server2'
CRS-2672: Attempting to start 'ora.evmd' on 'server2'
CRS-2676: Start of 'ora.mdnsd' on 'server2' succeeded
CRS-2676: Start of 'ora.evmd' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'server2'
CRS-2676: Start of 'ora.gpnpd' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'server2'
CRS-2676: Start of 'ora.gipcd' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'server2'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'server2'
CRS-2676: Start of 'ora.cssdmonitor' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'server2'
CRS-2672: Attempting to start 'ora.diskmon' on 'server2'
CRS-2676: Start of 'ora.diskmon' on 'server2' succeeded
CRS-2676: Start of 'ora.crf' on 'server2' succeeded
CRS-2676: Start of 'ora.cssd' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'server2'
CRS-2672: Attempting to start 'ora.ctssd' on 'server2'
CRS-2676: Start of 'ora.ctssd' on 'server2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server2'
CRS-2676: Start of 'ora.asm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'server2'
CRS-2676: Start of 'ora.storage' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'server2'
CRS-2676: Start of 'ora.crsd' on 'server2' succeeded
CRS-6017: Processing resource auto-start for servers: server2
CRS-2673: Attempting to stop 'ora.server2.vip' on 'server1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'server1'
CRS-2672: Attempting to start 'ora.ons' on 'server2'
CRS-2672: Attempting to start 'ora.chad' on 'server2'
CRS-2677: Stop of 'ora.server2.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.server2.vip' on 'server2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'server1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'server1'
CRS-2677: Stop of 'ora.scan1.vip' on 'server1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'server2'
CRS-2676: Start of 'ora.server2.vip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'server2'
CRS-2676: Start of 'ora.chad' on 'server2' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'server2'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'server2' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'server2'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'server2' succeeded
CRS-2681: Clean of 'ora.asm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server2'
CRS-2676: Start of 'ora.ons' on 'server2' succeeded
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server2' failed
CRS-2672: Attempting to start 'ora.asm' on 'server2'
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server2' failed
CRS-2679: Attempting to clean 'ora.proxy_advm' on 'server2'
CRS-2681: Clean of 'ora.proxy_advm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.proxy_advm' on 'server2'
CRS-2676: Start of 'ora.proxy_advm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'server2'
ORA-15150: instance lock mode 'EXCLUSIVE' conflicts with other ASM instance(s)
CRS-2674: Start of 'ora.asm' on 'server2' failed
CRS-2672: Attempting to start 'ora.MGMT.GHCHKPT.advm' on 'server2'
CRS-2676: Start of 'ora.MGMT.GHCHKPT.advm' on 'server2' succeeded
CRS-2672: Attempting to start 'ora.mgmt.ghchkpt.acfs' on 'server2'
CRS-2676: Start of 'ora.mgmt.ghchkpt.acfs' on 'server2' succeeded
===== Summary of resource auto-start failures follows =====
CRS-2807: Resource 'ora.asm' failed to start automatically.
CRS-6016: Resource auto-start has completed for server server2
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [59717688].

SQL Patching tool version 18.0.0.0.0 Production on Wed Nov 14 16:09:01 2018
Copyright (c) 2012, 2018, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_181222_2018_11_14_16_09_01/sqlpatch_invocation.log

Connecting to database...OK
Gathering database info...done

Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)

Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of interim SQL patches:
Interim patch 27923415 (OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)):
  Binary registry: Installed
  PDB CDB$ROOT: Applied successfully on 13-NOV-18 04.35.06.794463 PM
  PDB GIMR_DSCREP_10: Applied successfully on 13-NOV-18 04.43.16.948526 PM
  PDB PDB$SEED: Applied successfully on 13-NOV-18 04.43.16.948526 PM

Current state of release update SQL patches:
  Binary registry:
    18.4.0.0.0 Release_Update 1809251743: Installed
  PDB CDB$ROOT:
    Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 13-NOV-18 04.35.06.791214 PM
  PDB GIMR_DSCREP_10:
    Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 13-NOV-18 04.43.16.940471 PM
  PDB PDB$SEED:
    Applied 18.3.0.0.0 Release_Update 1806280943 successfully on 13-NOV-18 04.43.16.940471 PM

Adding patches to installation queue and performing prereq checks...done
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED GIMR_DSCREP_10
    No interim patches need to be rolled back
    Patch 28655784 (Database Release Update : 18.4.0.0.181016 (28655784)):
      Apply from 18.3.0.0.0 Release_Update 1806280943 to 18.4.0.0.0 Release_Update 1809251743
    No interim patches need to be applied

Installing patches...
Patch installation complete.  Total patches installed: 3

Validating logfiles...done
Patch 28655784 apply (pdb CDB$ROOT): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/28655784/22509982/28655784_apply__MGMTDB_CDBROOT_2018Nov14_16_11_00.log (no errors)
Patch 28655784 apply (pdb PDB$SEED): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/28655784/22509982/28655784_apply__MGMTDB_PDBSEED_2018Nov14_16_11_51.log (no errors)
Patch 28655784 apply (pdb GIMR_DSCREP_10): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/28655784/22509982/28655784_apply__MGMTDB_GIMR_DSCREP_10_2018Nov14_16_11_50.log (no errors)
SQL Patching tool complete on Wed Nov 14 16:12:50 2018
2018/11/14 16:13:40 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2018/11/14 16:15:28 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
2018/11/14 16:17:48 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
server2.cern.ch: Updating inventory on nodes: server2.
========================================
server2.cern.ch:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16367 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
server2.cern.ch: Updated inventory on nodes: server2.
server2.cern.ch: Updating inventory on nodes: server2.
========================================
server2.cern.ch:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16367 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
server2.cern.ch: Updated inventory on nodes: server2.
server2.cern.ch: Completed the 'move gihome' operation on server cluster.

As you can see, there are two differences here: the second node was in this case the last one, so the cluster upgrade state goes back to NORMAL, and the GIMR is patched with datapatch (see the SQL Patching tool output above).

At this point, the cluster has been patched. After some testing, you can safely remove the inactive version of Grid Infrastructure using the deinstall binary ($OLD_OH/deinstall/deinstall).
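
For example (the old home path here is the illustrative one used throughout this post):

$ /u01/app/grid/crs1830/deinstall/deinstall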

Quite easy, huh?

If you combine the Independent Local-mode Automaton with a home-developed solution for the creation and the provisioning of Grid Infrastructure Golden Images, you can easily achieve automated Grid Infrastructure patching of a big, multi-cluster environment.

Of course, Fleet Patching and Provisioning remains the Rolls-Royce: if you can afford it, GI patching and much more is completely automated and developed by Oracle, so you will have no headaches when new versions are released. But the local-mode automaton might be enough for your needs.

— 

Ludo

Oracle Clusterware Services Status at a glance, fast!


If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.

If you have many databases, services and nodes, it might be hard, when doing maintenance or service relocation, to get a quick overview of how the services are distributed across the nodes and what their status is.

With srvctl (the official tool for that), it is a per-database operation:

$ srvctl status service
PRKO-2082 : Missing mandatory option -db

If you have many databases, you have to run it database by database.
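
A quick (and still slow) workaround is looping over the configured databases, something like:

$ for db in $(srvctl config database); do srvctl status service -d "$db"; done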

It is also slow! For example, this database has 20 services. Getting the status takes 27 seconds:

# [ oracle@server1:/home/oracle/ [15:52:00] [11.2.0.4.0 [DBMS EE] SID=HRDEV1] 1 ] #
$ time srvctl status service -d hrdev_site1
Service SERVICE_NUMBER_01 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_02 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_03 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_04 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_05 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_06 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_07 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_08 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_09 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_10 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_11 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_12 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_13 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_14 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_15 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_16 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_17 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_18 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_19 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_20 is running on instance(s) HRDEV4

real    0m27.858s
user    0m1.365s
sys     0m1.143s

Instead of operating row by row (getting the status of each service), why not rely on the cluster resources with crsctl and get the big picture at once?

$ time crsctl stat res -f -w "(TYPE = ora.service.type)"
...
...

real    0m0.655s
user    0m0.169s
sys     0m0.098s

crsctl stat res -f returns a list of ATTRIBUTE_NAME=value pairs for each service, possibly more than once if the service is not singleton/single-instance but uniform/multi-instance.
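
You can eyeball the attributes of interest with a simple filter, for example:

$ crsctl stat res -f -w "(TYPE = ora.service.type)" | egrep "^(NAME|STATE|TARGET|INTERNAL_STATE)="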

Parsing them with some awk code can provide nice results!

STATE, INTERNAL_STATE and TARGET are useful in this case and might be used to display colours as well.

  • Green: Status ONLINE, Target ONLINE, STABLE
  • Black: Status OFFLINE, Target OFFLINE, STABLE
  • Red: Status ONLINE, Target OFFLINE, STABLE
  • Yellow: all other cases

Here’s the code:

if [ -f /etc/oracle/olr.loc ] ; then
        export ORA_CLU_HOME=`cat /etc/oracle/olr.loc 2>/dev/null | grep crs_home | awk -F= '{print $2}'`
        export CRS_EXISTS=1
        export CRSCTL=$ORA_CLU_HOME/bin/crsctl
else
        export CRS_EXISTS=0
fi

svcstat ()
{
    if [ $CRS_EXISTS -eq 1 ]; then
        ${CRSCTL} stat res -f -w "(TYPE = ora.service.type)" | awk -F= '
function print_row() {
        dbbcol="";
        dbecol="";
        instbcol="";
        instecol="";
        instances=res["INSTANCE_COUNT 1"];
        for(i=1;i<=instances;i++) {
                # if at least one of the services is online, the service is online (then I paint it green)
                if (res["STATE " i] == "ONLINE" ) {
                        dbbcol="\033[0;32m";
                        dbecol="\033[0m";
                }
        }
        # db unique name is always the second part of the resource name
        # because it does not change, I can get it once from the resource name
        res["DB_UNIQUE_NAME"]=substr(substr(res["NAME"],5),1,index(substr(res["NAME"],5),".")-1);

        # same for service name
        res["SERVICE_NAME"]=substr(res["NAME"],index(substr(res["NAME"],5),".")+5,length(substr(res["NAME"],index(substr(res["NAME"],5),".")+5))-4);

        #starting printing the first part of the information
        printf ("%s%-24s %-30s%s",dbbcol, res["DB_UNIQUE_NAME"], res["SERVICE_NAME"], dbecol);

        # here, instance need to map to the correct server.
        # the mapping is node by attribute TARGET_SERVER (not last server)
        for ( n in node ) {
                node_name=node[n];
                status[node_name]="";
                for (i=1; i<=instances; i++) {
                        # we are on the instance that matches the server
                        if (node_name == res["TARGET_SERVER " i]) {
                                res["SERVER_NAME " i]=node_name;
                                if (status[node_name] !~ "ONLINE") {
                                        # when a service relocates, both instances get the surviving target_server
                                        # but just one is ONLINE... so we always need to keep the ONLINE one.
                                        #printf("was::%s:", status[node_name]);
                                        status[node_name]=res["STATE " i];
                                }

                                # colors modes
                                if ( res["STATE " i] == "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # online and stable: GREEN
                                        status[node_name]=sprintf("\033[0;32m%-14s\033[0m", status[node_name]);
                                }
                                else if ( res["STATE " i] != "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # offline and stable
                                        if ( res["TARGET " i] == "OFFLINE" ) {
                                                # offline, stable, target offline: BLACK
                                                status[node_name]=sprintf("%-14s", status[node_name]);
                                        }
                                        else {
                                                # offline, stable, target online: RED
                                                status[node_name]=sprintf("\033[0;31m%-14s\033[0m", status[node_name]);
                                        }
                                }
                                else {
                                        # all other cases: offline and starting, online and stopping, cleaning, etc.: YELLOW
                                        status[node_name]=sprintf("\033[0;33m%-14s\033[0m", status[node_name]);
                                }
                                #printf("%s %s %s %s\n", status[node_name], node[n], res["STATE " i], res["INTERNAL_STATE " i]);
                        }
                }
               printf(" %-14s", status[node_name]);
        }
        printf("\n");
}
function pad (string, len, char) {
        ret = string;
        for ( i = length(string); i<len ; i++) {
                ret = sprintf("%s%s",ret,char);
        }
        return ret;
}
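# pad() builds fixed-width fillers, e.g. pad("", 14, "-") returns 14 dashes;
# it is used in BEGIN to draw the separator line under the header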
BEGIN {
        debug = 0;
        first = 1;
        afterempty=1;
        # this loop should set:
        # node[1]=server1; node[2]=server2; nodes=2;
        nodes=0;
        while ("olsnodes" | getline a) {
                nodes++;
                node[nodes] = a;
        }
        fmt="%-24s %-30s";
        printf (fmt, "DB_Unique_Name", "Service_Name");
        for ( n in node ) {
                printf (" %-14s", node[n]);
        }
        printf ("\n");
        printf (fmt, pad("",24,"-"), pad("",30,"-"));
        for ( n in node ) {
                printf (" %s", pad("",14,"-"));
        }
        printf ("\n");

}
# MAIN awk block: every input line is an ATTRIBUTE=value pair (split on "=").
# A NAME line repeating the previous value starts a secondary instance block
# of the same resource; a new value means a new resource, so the previous
# row is printed first.
{
        if ( $1 == "NAME" ) {
                if ( first != 1 && res["NAME"] == $2 ) {
                        if ( debug == 1 ) print "Secondary instance";
                        instance++;
                }
                else {
                        if ( first != 1 ) {
                                print_row();
                        }
                        first = 0;
                        instance=1;
                        delete res;
                        res["NAME"] = $2;
                }
        }
        else  {
                res[$1 " " instance] = $2 ;

        }
}
END {
        #if ( debug == 1 ) for (key in res) { print key ": " res[key] }
        print_row();
}
';
    else
        echo "svcstat not available on non-clustered environments";
        false;
    fi
}
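
Since svcstat writes to stdout line by line, you can filter it like any other command; for instance, to focus on a single database (the name is illustrative):

$ svcstat | grep hrdev_site1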

Here’s what you can expect for 92 services distributed over 4 nodes and a dozen databases (the output is snipped and the names are masked):

$ time svcstat
DB_Unique_Name     Service_Name       server1  server2  server3  server4
------------------ ------------------ -------- -------- -------- --------
hrdev_site1        SERVICE_NUMBER_01                             ONLINE
hrdev_site1        SERVICE_NUMBER_02                             ONLINE
...
hrdev_site1        SERVICE_NUMBER_20                             ONLINE
hrstg_site1        SERVICE_NUMBER_21                    ONLINE  
hrstg_site1        SERVICE_NUMBER_22                    ONLINE  
...
hrstg_site1        SERVICE_NUMBER_41                    ONLINE  
hrtest_site1       SERVICE_NUMBER_42           ONLINE           
hrtest_site1       SERVICE_NUMBER_43           ONLINE           
...
hrtest_site1       SERVICE_NUMBER_62           ONLINE           
hrtest_site1       SERVICE_NUMBER_63           ONLINE           
hrtest_site1       SERVICE_NUMBER_64           ONLINE           
hrtest_site1       SERVICE_NUMBER_65           ONLINE           
hrtest_site1       SERVICE_NUMBER_66           ONLINE           
erpdev_site1       SERVICE_NUMBER_67  ONLINE                    
erptest_site1      SERVICE_NUMBER_68  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_69  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_70  ONLINE                    
...
cmsstg_site1       SERVICE_NUMBER_74  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_75  ONLINE                    
cmstest_site1      SERVICE_NUMBER_76  ONLINE                    
...
cmstest_site1      SERVICE_NUMBER_81  ONLINE                    
kbtest_site1       SERVICE_NUMBER_82                    ONLINE           
...
kbtest_site1       SERVICE_NUMBER_84                    ONLINE           
reporting_site1    SERVICE_NUMBER_85  ONLINE                    
paydev_site1       SERVICE_NUMBER_86           ONLINE           
payrep_site1       SERVICE_NUMBER_87           ONLINE           
...
paytest_site1      SERVICE_NUMBER_90           ONLINE           
paytest_site1      SERVICE_NUMBER_91           ONLINE           
crm_site1          SERVICE_NUMBER_92                             ONLINE

real    0m0.358s
user    0m0.232s
sys     0m0.134s

I’d be curious to know if it works well in your environment, so please comment here. 🙂

Thanks

Ludo
