Ludovico – DBA survival BLOG

Autumn: a season of conferences and travels


It is not news that autumn is the busiest season for people involved in the Oracle Community. Thanks to the OTN Nordic Tour, this year I am setting a new personal record 🙂

In the next 2 months I will give 13 presentations in 8 distinct countries and in 3 distinct languages (Italian, French, English).

If you are based in one of those countries, you can join and say hello 🙂

  • 11/10/2016, 11:00 am - 12:00 pm: Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [Nordic Tour 2016 - Denmark] (Oracle Denmark, Ballerup)
  • 11/10/2016, 2:10 pm - 3:10 pm: Migrating to 12c: 300 DBs in 300 days. What we learned [Nordic Tour 2016 - Denmark] (Oracle Denmark, Ballerup)
  • 12/10/2016, 11:15 am - 12:00 pm: Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Norway] (Felix Conference Center, Oslo)
  • 12/10/2016, 1:00 pm - 1:45 pm: Self-Service Database Operations made easy with APEX [Nordic Tour 2016 - Norway] (Felix Conference Center, Oslo)
  • 12/10/2016, 3:00 pm - 3:45 pm: Database Migration Assistant for Unicode (DMU): a Real Customer Case [Nordic Tour 2016 - Norway] (Felix Conference Center, Oslo)
  • 13/10/2016, 3:10 pm - 4:00 pm: Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Finland] (Accenture Finland, Helsinki)
  • 14/10/2016, 9:00 am - 9:45 am: Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Sweden] (Stockholm)
  • 14/10/2016, 10:00 am - 10:45 am: Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb. [Nordic Tour 2016 - Sweden] (Stockholm)
  • 11/11/2016, 9:30 am - 10:15 am: Migrating to 12c: 300 DBs in 300 days. What we learned. [ITOUG Tech Day 2016] (UNA Hotel Century, Milano)
  • 11/11/2016, 12:00 pm - 12:45 pm: Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb. (UNA Hotel Century, Milano)
  • 16/11/2016, 11:00 am - 11:45 am: Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [DOAG 2016] (DOAG Konferenz 2016, Nürnberg)
  • 23/11/2016, 9:00 am - 12:00 pm: Migrating to Oracle Database 12c: 300 Databases in 300 Days [Oracle Tech Breakfast] (Oracle Business Breakfast, Oracle Suisse SA, Geneva)
  • 07/12/2016, 12:30 pm - 1:15 pm: Upgrading 300 Databases to 12c in 300 Days. What Can Go Wrong? [UKOUG_Tech16] (International Convention Centre, Birmingham)

The updated list of upcoming events can be found here.


How do Adaptive Plans work with SQL Plan Baselines?


Disclaimer: after writing this post (but before publishing it), I noticed that other people had already blogged about this topic, so I am a bit ashamed to publish it anyway… but that’s a blogger’s life 🙂

On Wednesday I got a nice question after my presentation about Adaptive Features at the DOAG16 conference:

What happens when you load an adaptive plan in a SQL Plan Baseline?
Does it load only the final plan or does it load the whole plan including the inactive operations? Will the plan be evaluated again using the inflection point?

I decided to run some tests in order to give the best possible answer. I did not spend time reinventing a way to produce an adaptive plan: Tim Hall already published an excellent test case that creates and alters an adaptive plan on his blog, so I have heavily reused his code. Thanks Tim 🙂

I will not post all the code (you can find it in Tim’s post); I will go straight to the plans.
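
For orientation, here is a rough, hypothetical sketch of the kind of schema the plans below refer to; the object names are taken from the execution plans, while the real DDL and data load are in Tim Hall’s test case and may differ.

-- Hypothetical reconstruction of the test schema (see Tim Hall's post for the real one)
CREATE SEQUENCE tab1_seq;

CREATE TABLE tab1 (
  id   NUMBER       CONSTRAINT tab1_pk PRIMARY KEY,
  code VARCHAR2(10) NOT NULL,
  data NUMBER
);

CREATE TABLE tab2 (
  tab1_id NUMBER NOT NULL CONSTRAINT tab2_tab1_fk REFERENCES tab1 (id),
  data    NUMBER
);

-- The two indexes visible in the plans: one on the filter column, one on the foreign key
CREATE INDEX tab1_code     ON tab1 (code);
CREATE INDEX tab2_tab1_fki ON tab2 (tab1_id);

-- TAB1 initially holds few rows with CODE = 'ONE', so the optimizer starts with NESTED
-- LOOPS; later the test inserts many more matching rows to push the plan towards HASH JOIN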

First: I have an adaptive plan that resolves to NESTED LOOPS:

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2    a.data AS tab1_data,
  3    b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

...
 
30 rows selected.

SQL> SET LINESIZE 200 PAGESIZE 100
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'adaptive'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
SQL_ID  4r3harjun4dvz, child number 0
-------------------------------------
SELECT a.data AS tab1_data,        b.data AS tab2_data FROM   tab1 a
    JOIN tab2 b ON b.tab1_id = a.id WHERE  a.code = 'ONE'

Plan hash value: 2672205743

-----------------------------------------------------------------------------------------------------------
|   Id  | Operation                               | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                        |               |       |       |     3 (100)|          |
|- *  1 |  HASH JOIN                              |               |    25 |   425 |     3   (0)| 00:00:01 |
|     2 |   NESTED LOOPS                          |               |    25 |   425 |     3   (0)| 00:00:01 |
|     3 |    NESTED LOOPS                         |               |    25 |   425 |     3   (0)| 00:00:01 |
|-    4 |     STATISTICS COLLECTOR                |               |       |       |            |          |
|     5 |      TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |     1 |    11 |     2   (0)| 00:00:01 |
|  *  6 |       INDEX RANGE SCAN                  | TAB1_CODE     |     1 |       |     1   (0)| 00:00:01 |
|  *  7 |     INDEX RANGE SCAN                    | TAB2_TAB1_FKI |    25 |       |     0   (0)|          |
|     8 |    TABLE ACCESS BY INDEX ROWID          | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
|-    9 |   TABLE ACCESS FULL                     | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"."TAB1_ID"="A"."ID")
   6 - access("A"."CODE"='ONE')
   7 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)


33 rows selected.

Second: I load the plan (the lazy way: using baseline capture at the session level)

SQL> ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = TRUE;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> r 
  1  SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6* WHERE  a.code = 'ONE'

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = FALSE;

Session altered.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE           PLAN_NAME                          SQL_TEXT                                   ENA ACC FIX
-------------------- ---------------------------------- ------------------------------------------ --- --- ---
SQL_6c4c6680810dd01a SQL_PLAN_6sm36h20hvn0u55a25f73     SELECT /*+ GATHER_PLAN_STATISTICS */       YES YES NO
                                                                       a.data AS tab1_data,
                                                                       b.data A

Third: I re-run the statement and check the plan

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(format=>'+adaptive'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  1km5kczcgr0fr, child number 3
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
a.id WHERE  a.code = 'ONE'

Plan hash value: 2672205743

-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |       |       |     3 (100)|          |
|   1 |  NESTED LOOPS                         |               |    25 |   425 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                        |               |    25 |   425 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |     1 |    11 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | TAB1_CODE     |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                   | TAB2_TAB1_FKI |    25 |       |     0   (0)|          |
|   6 |   TABLE ACCESS BY INDEX ROWID         | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CODE"='ONE')
   5 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - SQL plan baseline SQL_PLAN_6sm36h20hvn0u55a25f73 used for this statement


30 rows selected.

It does not look adaptive, but I can also check with the function DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE:

SQL> select * from table (DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE('SQL_6c4c6680810dd01a', format=>'+adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SQL handle: SQL_6c4c6680810dd01a
SQL text: SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
            b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
          a.id WHERE  a.code = 'ONE'
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6sm36h20hvn0u55a25f73         Plan id: 1436704627
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 2672205743

---------------------------------------------------------------------------------------------------------
|   Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                      |               |    25 |   425 |     3   (0)| 00:00:01 |
|     1 |  NESTED LOOPS                         |               |    25 |   425 |     3   (0)| 00:00:01 |
|     2 |   NESTED LOOPS                        |               |    25 |   425 |     3   (0)| 00:00:01 |
|     3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |     1 |    11 |     2   (0)| 00:00:01 |
|  *  4 |     INDEX RANGE SCAN                  | TAB1_CODE     |     1 |       |     1   (0)| 00:00:01 |
|  *  5 |    INDEX RANGE SCAN                   | TAB2_TAB1_FKI |    25 |       |     0   (0)| 00:00:01 |
|     6 |   TABLE ACCESS BY INDEX ROWID         | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CODE"='ONE')
   5 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

37 rows selected.

Again, despite the Note section saying it is an adaptive plan, it does not look like one.

Can I trust this information? Of course not: I checked the plan with and without the baseline after increasing the number of rows to force a plan switch to HASH JOIN (again using Tim’s example):

SQL> INSERT /*+ APPEND */ INTO tab1
  2  SELECT tab1_seq.nextval,
  3         'ONE',
  4         level
  5  FROM   dual
  6  CONNECT BY level <= 10000;

  10000 rows created.

SQL> COMMIT;

Commit complete.

SQL> alter session set optimizer_use_sql_plan_baselines=false;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last adaptive'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1km5kczcgr0fr, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
a.id WHERE  a.code = 'ONE'

Plan hash value: 1599395313

------------------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                               | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                        |               |      1 |        |     30 |00:00:00.01 |     142 |       |       |          |
|  *  1 |  HASH JOIN                              |               |      1 |     25 |     30 |00:00:00.01 |     142 |  2261K|  2261K| 2180K (0)|
|-    2 |   NESTED LOOPS                          |               |      1 |     25 |  20001 |00:00:00.01 |     124 |       |       |          |
|-    3 |    NESTED LOOPS                         |               |      1 |     25 |  20001 |00:00:00.01 |     124 |       |       |          |
|-    4 |     STATISTICS COLLECTOR                |               |      1 |        |  20001 |00:00:00.01 |     124 |       |       |          |
|     5 |      TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |      1 |      1 |  20001 |00:00:00.01 |     124 |       |       |          |
|  *  6 |       INDEX RANGE SCAN                  | TAB1_CODE     |      1 |      1 |  20001 |00:00:00.01 |      74 |       |       |          |
|- *  7 |     INDEX RANGE SCAN                    | TAB2_TAB1_FKI |      0 |     25 |      0 |00:00:00.01 |       0 |       |       |          |
|-    8 |    TABLE ACCESS BY INDEX ROWID          | TAB2          |      0 |     25 |      0 |00:00:00.01 |       0 |       |       |          |
|     9 |   TABLE ACCESS FULL                     | TAB2          |      1 |     25 |    100 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"."TAB1_ID"="A"."ID")
   6 - access("A"."CODE"='ONE')
   7 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)


34 rows selected.

SQL> alter session set optimizer_use_sql_plan_baselines=true;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1km5kczcgr0fr, child number 2
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
a.id WHERE  a.code = 'ONE'

Plan hash value: 2672205743

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |      1 |        |     30 |00:00:00.01 |   30889 |
|   1 |  NESTED LOOPS                         |               |      1 |     25 |     30 |00:00:00.01 |   30889 |
|   2 |   NESTED LOOPS                        |               |      1 |     25 |     30 |00:00:00.01 |   30886 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |      1 |      1 |  20001 |00:00:00.01 |     125 |
|*  4 |     INDEX RANGE SCAN                  | TAB1_CODE     |      1 |      1 |  20001 |00:00:00.01 |      75 |
|*  5 |    INDEX RANGE SCAN                   | TAB2_TAB1_FKI |  20001 |     25 |     30 |00:00:00.02 |   30761 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | TAB2          |     30 |     25 |     30 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CODE"='ONE')
   5 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - SQL plan baseline SQL_PLAN_6sm36h20hvn0u55a25f73 used for this statement


30 rows selected.

After changing the rows:

  • when I do not use the baseline, the plan resolves to HASH JOIN
  • when I use it, the baseline forces NESTED LOOPS.

So the plan in the baseline is not adaptive: it forces whatever has been loaded. Is it the final plan or the original one? I capture it again to see whether a new baseline appears:

SQL>  ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = TRUE;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> r
  1  SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6* WHERE  a.code = 'ONE'

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL>  ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = FALSE;

Session altered.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE           PLAN_NAME                      SQL_TEXT                                                                         ENABLED   ACCEPTED  FIXED
-------------------- ------------------------------ -------------------------------------------------------------------------------- --------- --------- ---------
SQL_6c4c6680810dd01a SQL_PLAN_6sm36h20hvn0u55a25f73 SELECT /*+ GATHER_PLAN_STATISTICS */                                             YES       YES       NO
                                                           a.data AS tab1_data,
                                                           b.data A

A new baseline does not appear, so it looks like the capture process considers the original plan, not the resolved one! To be 100% sure, let’s drop the existing baseline and redo the test:

SQL> connect / as sysdba
Connected
SQL> DECLARE
  2   v_dropped_plans number;
  3 BEGIN
  4   v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
  5      sql_handle => 'SQL_6c4c6680810dd01a'
  6 );
  7 END;
  8 /
PL/SQL procedure successfully completed.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

no rows selected

SQL> conn ludo/ludo
Connected.
SQL>  ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = TRUE;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL> r
  1  SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6* WHERE  a.code = 'ONE'

 TAB1_DATA  TAB2_DATA
---------- ----------
...

30 rows selected.

SQL>  ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = FALSE;

Session altered.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE           PLAN_NAME                      SQL_TEXT                                                                         ENABLED   ACCEPTED  FIXED
-------------------- ------------------------------ -------------------------------------------------------------------------------- --------- --------- ---------
SQL_6c4c6680810dd01a SQL_PLAN_6sm36h20hvn0u55a25f73 SELECT /*+ GATHER_PLAN_STATISTICS */                                             YES       YES       NO
                                                           a.data AS tab1_data,
                                                           b.data A

SQL> select * from table (DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE('SQL_6c4c6680810dd01a', format=>'+adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SQL handle: SQL_6c4c6680810dd01a
SQL text: SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
            b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
          a.id WHERE  a.code = 'ONE'
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6sm36h20hvn0u55a25f73         Plan id: 1436704627
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 2672205743

---------------------------------------------------------------------------------------------------------
|   Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                      |               |    25 |   425 |     3   (0)| 00:00:01 |
|     1 |  NESTED LOOPS                         |               |    25 |   425 |     3   (0)| 00:00:01 |
|     2 |   NESTED LOOPS                        |               |    25 |   425 |     3   (0)| 00:00:01 |
|     3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |     1 |    11 |     2   (0)| 00:00:01 |
|  *  4 |     INDEX RANGE SCAN                  | TAB1_CODE     |     1 |       |     1   (0)| 00:00:01 |
|  *  5 |    INDEX RANGE SCAN                   | TAB2_TAB1_FKI |    25 |       |     0   (0)| 00:00:01 |
|     6 |   TABLE ACCESS BY INDEX ROWID         | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CODE"='ONE')
   5 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

37 rows selected.

So, despite having an adaptive plan that switches from NESTED LOOPS to HASH JOIN, only the NESTED LOOPS operations are captured in the baseline: I can infer that only the original plan is loaded as a SQL Plan Baseline.
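
If you want to double-check what is physically stored for that baseline, one possible approach (a sketch, not part of this test, assuming the internal SQL Management Base tables sys.sqlobj$ and sys.sqlobj$plan keep their usual layout) is to look at the stored plan lines directly:

-- Plan operations stored in the SQL Management Base for the captured plan
-- (sys.sqlobj$ holds the plan metadata, sys.sqlobj$plan the plan lines)
SELECT p.id, p.operation, p.options, p.object_name
  FROM sys.sqlobj$ o
  JOIN sys.sqlobj$plan p
    ON p.signature = o.signature
   AND p.category  = o.category
   AND p.obj_type  = o.obj_type
   AND p.plan_id   = o.plan_id
 WHERE o.name = 'SQL_PLAN_6sm36h20hvn0u55a25f73'
 ORDER BY p.id;

With only the original plan captured, this should list just the NESTED LOOPS operations, with no trace of the HASH JOIN alternative.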


Loading resolved Adaptive Plans into SQL Plan Management


In my previous post, I showed that loading Adaptive Plans into a SQL Plan Baseline leads to using the original plan. Well, actually, this is true only when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I realized it was a good idea to also show the case where the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and the resolved plan can be loaded manually into the baseline with DBMS_SPM.

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will load the original plan)
  • Load the plan manually (I will specify the alternative plan, if resolved)
  • Drop the plan captured automatically
  • Use the newly accepted baseline

SQL> connect / as sysdba
Connected.

SQL> DECLARE
  2    v_dropped_plans number;
  3  BEGIN
  4    v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
  5       sql_handle => 'SQL_6c4c6680810dd01a'
  6  );
  7    DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

no rows selected

SQL> alter system flush shared_pool;

System altered.

SQL> select sql_id, plan_hash_value, child_number from v$sql where sql_id='1km5kczcgr0fr';

no rows selected

SQL> connect ludo/ludo
Connected.
SQL> ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = TRUE;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...
30 rows selected.

SQL> r
  1  SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6* WHERE  a.code = 'ONE'

 TAB1_DATA  TAB2_DATA
---------- ----------
...
30 rows selected.

SQL> ALTER SESSION SET OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = FALSE;

Session altered.

SQL> select sql_id, plan_hash_value, child_number from v$sql where sql_id='1km5kczcgr0fr';

SQL_ID                                  PLAN_HASH_VALUE CHILD_NUMBER
--------------------------------------- --------------- ------------
1km5kczcgr0fr                                2672205743            1

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...
30 rows selected.

SQL> select sql_id, plan_hash_value, child_number from v$sql where sql_id='1km5kczcgr0fr';

SQL_ID                                  PLAN_HASH_VALUE CHILD_NUMBER
--------------------------------------- --------------- ------------
1km5kczcgr0fr                                2672205743            1
1km5kczcgr0fr                                2672205743            2

SQL> ALTER SESSION SET OPTIMIZER_USE_SQL_PLAN_BASELINES = FALSE;

Session altered.

SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...
30 rows selected.

SQL> select sql_id, plan_hash_value, child_number from v$sql where sql_id='1km5kczcgr0fr';

SQL_ID                                  PLAN_HASH_VALUE CHILD_NUMBER
--------------------------------------- --------------- ------------
1km5kczcgr0fr                                1599395313            0
1km5kczcgr0fr                                2672205743            1
1km5kczcgr0fr                                2672205743            2


SQL> connect / as sysdba
Connected.
SQL> VARIABLE cnt NUMBER
SQL> EXECUTE :cnt := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE( sql_id => '1km5kczcgr0fr',plan_hash_value => '1599395313');

PL/SQL procedure successfully completed.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                PLAN_NAME                           SQL_TEXT                               ENABLED   ACCEPTED  FIXED
------------------------- ----------------------------------- -------------------------------------- --------- --------- ---------
SQL_6c4c6680810dd01a      SQL_PLAN_6sm36h20hvn0u55a25f73      SELECT /*+ GATHER_PLAN_STATISTICS */   YES       YES       NO
                                                                     a.data AS tab1_data,
                                                                     b.data A

SQL_6c4c6680810dd01a      SQL_PLAN_6sm36h20hvn0ud64ac9be      SELECT /*+ GATHER_PLAN_STATISTICS */   YES       YES       NO
                                                                     a.data AS tab1_data,
                                                                     b.data A


SQL> select * from table (DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE('SQL_6c4c6680810dd01a', format=>'+adaptive'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SQL_6c4c6680810dd01a
SQL text: SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
            b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
          a.id WHERE  a.code = 'ONE'
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6sm36h20hvn0u55a25f73         Plan id: 1436704627
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 2672205743

---------------------------------------------------------------------------------------------------------
|   Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                      |               |    25 |   425 |     3   (0)| 00:00:01 |
|     1 |  NESTED LOOPS                         |               |    25 |   425 |     3   (0)| 00:00:01 |
|     2 |   NESTED LOOPS                        |               |    25 |   425 |     3   (0)| 00:00:01 |
|     3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TAB1          |     1 |    11 |     2   (0)| 00:00:01 |
|  *  4 |     INDEX RANGE SCAN                  | TAB1_CODE     |     1 |       |     1   (0)| 00:00:01 |
|  *  5 |    INDEX RANGE SCAN                   | TAB2_TAB1_FKI |    25 |       |     0   (0)| 00:00:01 |
|     6 |   TABLE ACCESS BY INDEX ROWID         | TAB2          |    25 |   150 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CODE"='ONE')
   5 - access("B"."TAB1_ID"="A"."ID")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6sm36h20hvn0ud64ac9be         Plan id: 3595225534
Enabled: YES     Fixed: NO      Accepted: YES     Origin: MANUAL-LOAD
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 1599395313

----------------------------------------------------------------------------------------------------
|   Id  | Operation                            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                     |           |       |       |     3 (100)|          |
|  *  1 |  HASH JOIN                           |           |    25 |   425 |     3   (0)| 00:00:01 |
|     2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TAB1      |     1 |    11 |     2   (0)| 00:00:01 |
|  *  3 |    INDEX RANGE SCAN                  | TAB1_CODE |     1 |       |     1   (0)| 00:00:01 |
|     4 |   TABLE ACCESS FULL                  | TAB2      |    25 |   150 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"."TAB1_ID"="A"."ID")
   3 - access("A"."CODE"='ONE')

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

65 rows selected.


SQL> VARIABLE cnt NUMBER
SQL>  EXECUTE :cnt := DBMS_SPM.DROP_SQL_PLAN_BASELINE( SQL_HANDLE => 'SQL_6c4c6680810dd01a',plan_name => 'SQL_PLAN_6sm36h20hvn0u55a25f73');

PL/SQL procedure successfully completed.

SQL> select sql_handle, plan_name, sql_text, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                PLAN_NAME                           SQL_TEXT                               ENABLED   ACCEPTED  FIXED
------------------------- ----------------------------------- -------------------------------------- --------- --------- ---------
SQL_6c4c6680810dd01a      SQL_PLAN_6sm36h20hvn0ud64ac9be      SELECT /*+ GATHER_PLAN_STATISTICS */   YES       YES       NO
                                                                     a.data AS tab1_data,
                                                                     b.data A

SQL> conn ludo/ludo
Connected.
SQL> SELECT /*+ GATHER_PLAN_STATISTICS */
  2         a.data AS tab1_data,
  3         b.data AS tab2_data
  4  FROM   tab1 a
  5         JOIN tab2 b ON b.tab1_id = a.id
  6  WHERE  a.code = 'ONE';

 TAB1_DATA  TAB2_DATA
---------- ----------
...
30 rows selected.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last adaptive'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1km5kczcgr0fr, child number 2
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */        a.data AS tab1_data,
b.data AS tab2_data FROM   tab1 a        JOIN tab2 b ON b.tab1_id =
a.id WHERE  a.code = 'ONE'

Plan hash value: 1599395313

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |           |      1 |        |     24 |00:00:00.01 |      71 |       |       |          |
|*  1 |  HASH JOIN                           |           |      1 |     25 |     24 |00:00:00.01 |      71 |  1888K|  1888K| 1921K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TAB1      |      1 |      1 |  10001 |00:00:00.01 |      62 |       |       |          |
|*  3 |    INDEX RANGE SCAN                  | TAB1_CODE |      1 |      1 |  10001 |00:00:00.01 |      37 |       |       |          |
|   4 |   TABLE ACCESS FULL                  | TAB2      |      1 |    100 |    100 |00:00:00.01 |       9 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"."TAB1_ID"="A"."ID")
   3 - access("A"."CODE"='ONE')

Note
-----
   - SQL plan baseline SQL_PLAN_6sm36h20hvn0ud64ac9be used for this statement


28 rows selected.

To recap:

  • The capture process will always load the original plan
  • It is possible to manually load either the original plan or the alternative one (if resolved)
  • Using automatic capture is a bad idea
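
As a compact recap of the manual path shown above (a sketch reusing the sql_id, plan_hash_value, SQL handle and plan name from this test; adapt them to your own statement):

-- 1) Find the resolved plan in the cursor cache (its hash value differs from the original)
SELECT sql_id, plan_hash_value, child_number
  FROM v$sql
 WHERE sql_id = '1km5kczcgr0fr';

-- 2) Load that specific plan into the SQL plan baseline
VARIABLE cnt NUMBER
EXECUTE :cnt := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '1km5kczcgr0fr', plan_hash_value => 1599395313);

-- 3) Optionally drop the auto-captured (original) plan so only the resolved one remains
EXECUTE :cnt := DBMS_SPM.DROP_SQL_PLAN_BASELINE(sql_handle => 'SQL_6c4c6680810dd01a', plan_name => 'SQL_PLAN_6sm36h20hvn0u55a25f73');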

HTH

Ludo

Getting the Oracle Homes in a server from the oraInventory


The information contained in the oratab should always be up to date, but it is not always reliable. If you want to know which Oracle installations you have on a server, it is better to get the list from the Oracle Universal Installer inventory or, if you want a shortcut, to do some grep magic inside the inventory with the shell.

The following diagram shows a simplified structure of the inventory: the entries present in the central inventory (one per server) and in the local inventories (one per Oracle Home).

[Figure: inventory structure]

You can use this simple function to get some content out of it, including the edition (that information sits one level deeper, in the local inventory).

# [ oracle@testlab:/u01/app/oracle/ [17:53:48] [12.1.0.2.0 EE SID=theludot] 0 ] #
# type lsoh
lsoh is a function
lsoh ()
{
    CENTRAL_ORAINV=`grep ^inventory_loc /etc/oraInst.loc | awk -F= '{print $2}'`;
    IFS='
';
    echo;
    printf "%-22s %-55s %-12s %-9s\n" HOME LOCATION VERSION EDITION;
    echo ---------------------- ------------------------------------------------------- ------------ ---------;
    for line in `grep "<HOME NAME=" ${CENTRAL_ORAINV}/ContentsXML/inventory.xml 2>/dev/null`;
    do
        unset ORAVERSION;
        unset ORAEDITION;
        OH=`echo $line | tr ' ' '\n' | grep ^LOC= | awk -F\" '{print $2}'`;
        OH_NAME=`echo $line | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
        comp_file=$OH/inventory/ContentsXML/comps.xml;
        comp_xml=`grep "COMP NAME" $comp_file | head -1`;
        comp_name=`echo $comp_xml | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
        comp_vers=`echo $comp_xml | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
        case $comp_name in
            "oracle.crs")
                ORAVERSION=$comp_vers;
                ORAEDITION=GRID
            ;;
            "oracle.sysman.top.agent")
                ORAVERSION=$comp_vers;
                ORAEDITION=AGT
            ;;
            "oracle.server")
                ORAVERSION=`grep "PATCH NAME=\"oracle.server\"" $comp_file 2>/dev/null | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
                ORAEDITION="DBMS";
                if [ -z "$ORAVERSION" ]; then
                    ORAVERSION=$comp_vers;
                fi;
                ORAMAJOR=`echo $ORAVERSION |  cut -d . -f 1`;
                case $ORAMAJOR in
                    11 | 12)
                        ORAEDITION="DBMS "`grep "oracle_install_db_InstallType" $OH/inventory/globalvariables/oracle.server/globalvariables.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                    ;;
                    10)
                        ORAEDITION="DBMS "`grep "s_serverInstallType" $OH/inventory/Components21/oracle.server/*/context.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                    ;;
                esac
            ;;
        esac;
        [[ -n $ORAEDITION ]] && printf "%-22s %-55s %-12s %-9s\n" $OH_NAME $OH $ORAVERSION $ORAEDITION;
    done;
    echo
}
# [ oracle@testlab:/u01/app/oracle/sbin [17:53:48] [12.1.0.2.0 EE SID=theludot] 0 ] #
# lsoh

HOME                   LOCATION                                                VERSION      EDITION
---------------------- ------------------------------------------------------- ------------ ---------
OraHome12C             /u01/app/oracle/product/12.1.0.2                        12.1.0.2.0   DBMS EE
OraDb11g_home1         /u01/app/oracle/product/11.2.0.4                        11.2.0.4.0   DBMS EE
OraGI12Home1           /u01/app/grid/product/grid                              12.1.0.2.0   GRID
agent12c1              /u01/app/oracle/product/agent12c/core/12.1.0.5.0        12.1.0.5.0   AGT

HTH

DBMS_QOPATCH, datapatch, rollback, apply force


I am working for a customer on quite a big implementation of Cold Failover Clusters with Oracle Grid Infrastructure on Linux. I hope to have some material to publish about it soon! In this post, however, I will talk about patching the database in a cold-failover environment.

DISCLAIMER: I make heavy use of the scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases onto other nodes
  • patch the node binaries
  • move the databases back and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But this is the situation today, and we have to live with it.

Initial situation

  • Servers 1, 2 and 3: one-off patch 20139391 applied
  • New database created

When the DBCA creates a new database in 12.1.0.2, it does not run datapatch by default; thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false):

SQL> -- Patches installed in the oracle home
SQL> r
  1   with a as (select dbms_qopatch.get_opatch_lsinventory patch_output from dual)
  2   select x.patch_id, x.patch_uid, x.description, x.sql_patch
  3     from a,
  4          xmltable('InventoryInstance/patches/*'
  5             passing a.patch_output
  6             columns
  7                patch_id number path 'patchID',
  8                patch_uid number path 'uniquePatchID',
  9                description varchar2(80) path 'patchDescription',
 10                sql_patch varchar2(8) path 'sqlPatch'
 11*         ) x

  PATCH_ID  PATCH_UID DESCRIPTION               SQL_PATCH
---------- ---------- ------------------------- ---------
  20139391   18466820                           false

SQL> -- Patches installed in the database
SQL> select s.patch_id, s.patch_uid, s.description from dba_registry_sqlpatch s;
no rows selected

SQL>

and datapatch runs without touching the database:

oracle1> $ORACLE_HOME/OPatch/datapatch -verbose
SQL Patching tool version 12.2.0.0.0 on Wed Nov  2 13:34:10 2016
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done

Current state of SQL patches:

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  Nothing to apply

SQL Patching tool complete on Wed Nov  2 13:34:13 2016
oracle1>

Next step: I evacuate server 2 and patch it, then I relocate my database onto it:


oracle2> $ORACLE_HOME/OPatch/opatch lspatches
24340679;DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)

OPatch succeeded.
oracle2>
oracle2> crsctl relocate res theludot.db -n oracle2
CRS-2673: Attempting to stop 'theludot.db' on 'oracle1'
CRS-2677: Stop of 'theludot.db' on 'oracle1' succeeded
CRS-2672: Attempting to start 'theludot.db' on 'oracle2'
CRS-2676: Start of 'theludot.db' on 'oracle2' succeeded
oracle2>

Now the database is not at the same level as the binaries and needs to be patched:

SQL> -- Patches installed in the oracle home
SQL> r
  1  with a as (select dbms_qopatch.get_opatch_lsinventory patch_output from dual)
  2   select x.*
  3     from a,
  4   xmltable('InventoryInstance/patches/*'
  5   passing a.patch_output
  6   columns
  7      patch_id number path 'patchID',
  8      patch_uid number path 'uniquePatchID',
  9      description varchar2(80) path 'patchDescription',
 10    constituent number path 'constituent',
 11    patch_type varchar2(20) path 'patchType',
 12    rollbackable varchar2(20) path 'rollbackable',
 13    sql_patch varchar2(8) path 'sqlPatch',
 14    DBStartMode varchar2(10) path 'sqlPatchDatabaseStartupMode'
 15*  ) x

  PATCH_ID  PATCH_UID DESCRIPTION                                        CONSTITUENT PATCH_TYPE           ROLLBACKABLE SQL_PATC DBSTARTMOD
---------- ---------- -------------------------------------------------- ----------- -------------------- ------------ -------- ----------
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)     24340679 singleton            true         true     normal
  23144544   20247727 DATABASE BUNDLE PATCH: 12.1.0.2.160719 (23144544)     24340679 singleton            true         true     normal
  22806133   19983161 DATABASE BUNDLE PATCH: 12.1.0.2.160419 (22806133)     24340679 singleton            true         true     normal
  21949015   19576071 DATABASE BUNDLE PATCH: 12.1.0.2.160119 (21949015)     24340679 singleton            true         true     normal
  21694919   19338504 DATABASE BUNDLE PATCH: 12.1.0.2.13 (21694919)         24340679 singleton            true         true     normal
  21527488   19238856 DATABASE BUNDLE PATCH: 12.1.0.2.12 (21527488)         24340679 singleton            true         true     normal
  21359749   19147148 DATABASE BUNDLE PATCH: 12.1.0.2.11 (21359749)         24340679 singleton            true         true     normal
  21125181   18992109 DATABASE BUNDLE PATCH: 12.1.0.2.10 (21125181)         24340679 singleton            true         true     normal
  20950328   18903184 DATABASE BUNDLE PATCH: 12.1.0.2.9 (20950328)          24340679 singleton            true         true     normal
  20788771   18810992 DATABASE BUNDLE PATCH: 12.1.0.2.8 (20788771)          24340679 singleton            true         true     normal
  20594149   18687526 DATABASE BUNDLE PATCH: 12.1.0.2.7 (20594149)          24340679 singleton            true         true     normal
  20415006   18565812 DATABASE BUNDLE PATCH: 12.1.0.2.6 (20415006)          24340679 singleton            true         true     normal
  20243804   18468778 DATABASE BUNDLE PATCH: 12.1.0.2.5 (20243804)          24340679 singleton            true         true     normal

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column we have to check when we want to know whether a patch has been applied to the database.
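
Building on that, here is a minimal sketch (not part of the original post) that reuses the same XMLTABLE extraction to list the parent patches that are present in the binaries, require SQL changes, and have no successful APPLY recorded in dba_registry_sqlpatch:

with oh_patches as (
  select x.patch_id, x.constituent, x.sql_patch
    from (select dbms_qopatch.get_opatch_lsinventory patch_output from dual) a,
         xmltable('InventoryInstance/patches/*'
            passing a.patch_output
            columns
               patch_id    number      path 'patchID',
               constituent number      path 'constituent',
               sql_patch   varchar2(8) path 'sqlPatch') x
)
-- parent patch_ids (CONSTITUENT) in the Oracle Home that need SQL changes
-- but have not yet been applied successfully inside the database
select distinct p.constituent as missing_in_db
  from oh_patches p
 where p.sql_patch = 'true'
   and not exists (select 1
                     from dba_registry_sqlpatch r
                    where r.patch_id = p.constituent
                      and r.action   = 'APPLY'
                      and r.status   = 'SUCCESS');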

oracle2> $ORACLE_HOME/OPatch/datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Nov  2 13:47:49 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_63956_2016_11_02_13_47_49/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Bundle series DBBP:
  ID 161018 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 1

Validating logfiles...
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20646358/24340679_apply_THELUDOT_2016Nov02_13_48_03.log (no errors)
SQL Patching tool complete on Wed Nov  2 13:49:51 2016
oracle2>

Now the patch is visible inside the dba_registry_sqlpatch:

SQL> r
  1* select patch_id, patch_uid, description, action_time, action, status, bundle_series, bundle_id  from dba_registry_sqlpatch

  PATCH_ID  PATCH_UID DESCRIPTION                                        ACTION_TIME                    ACTION          STATUS   BUNDLE_SERIES  BUNDLE_ID
---------- ---------- -------------------------------------------------- ------------------------------ --------------- -------- ------------- ----------
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)  02-NOV-16 01.49.51.664800 PM   APPLY           SUCCESS  DBBP              161018

Notice that the child patches are not listed in this view.

Rolling back

Now, one node is patched, but the others are not. What happens if I relocate the patched database to a non-patched node?


oracle1> crsctl relocate res theludot.db -n oracle1
CRS-2673: Attempting to stop 'theludot.db' on 'oracle2'
CRS-2677: Stop of 'theludot.db' on 'oracle2' succeeded
CRS-2672: Attempting to start 'theludot.db' on 'oracle1'
CRS-2676: Start of 'theludot.db' on 'oracle1' succeeded
oracle1>

The patch is applied inside the database but not in the binaries!

SQL>  select patch_id, patch_uid, description, action_time, action, status, bundle_series, bundle_id
  2   from dba_registry_sqlpatch;

  PATCH_ID  PATCH_UID DESCRIPTION                                        ACTION_TIME                    ACTION          STATUS   BUNDLE_SERIES  BUNDLE_ID
---------- ---------- -------------------------------------------------- ------------------------------ --------------- -------- ------------- ----------
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)  02.11.16 13:49:51.664800       APPLY           SUCCESS  DBBP              161018

SQL> r
  1  with a as (select dbms_qopatch.get_opatch_lsinventory patch_output from dual)
  2   select x.*
  3     from a,
  4   xmltable('InventoryInstance/patches/*'
  5   passing a.patch_output
  6   columns
  7      patch_id number path 'patchID',
  8      patch_uid number path 'uniquePatchID',
  9      description varchar2(80) path 'patchDescription',
 10    constituent number path 'constituent',
 11    patch_type varchar2(20) path 'patchType',
 12    rollbackable varchar2(20) path 'rollbackable',
 13    sql_patch varchar2(8) path 'sqlPatch',
 14    DBStartMode varchar2(10) path 'sqlPatchDatabaseStartupMode'
 15* ) x

  PATCH_ID  PATCH_UID DESCRIPTION                                        CONSTITUENT PATCH_TYPE           ROLLBACKABLE SQL_PATC DBSTARTMOD
---------- ---------- -------------------------------------------------- ----------- -------------------- ------------ -------- ----------
  20139391   18466820                                                                singleton            true         false

If I run datapatch again, the patch is rolled back:

oracle1> $ORACLE_HOME/OPatch/datapatch -verbose
SQL Patching tool version 12.2.0.0.0 on Wed Nov  2 14:48:50 2016
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done

Current state of SQL patches:

Adding patches to installation queue and performing prereq checks...
Installation queue:
  The following patches will be rolled back:
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))
  Nothing to apply

catcon: ALL catcon-related output will be written to /tmp/sqlpatch_catcon__catcon_24776.lst
catcon: See /tmp/sqlpatch_catcon_*.log files for output generated by scripts
catcon: See /tmp/sqlpatch_catcon__*.lst files for spool files, if any
Installing patches...
Patch installation complete.  Total patches installed: 1

Validating logfiles...
Patch 24340679 rollback: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20646358/24340679_rollback_THELUDOT_2016Nov02_14_48_53.log (no errors)
SQL Patching tool complete on Wed Nov  2 14:48:53 2016
oracle1>

The patch has been rolled back according to datapatch, and the action is shown in dba_registry_sqlpatch:

SQL> r
  1   select patch_id, patch_uid, description, action_time, action, status, bundle_series, bundle_id
  2*  from dba_registry_sqlpatch

  PATCH_ID  PATCH_UID DESCRIPTION                                        ACTION_TIME                    ACTION          STATUS   BUNDLE_SERIES  BUNDLE_ID
---------- ---------- -------------------------------------------------- ------------------------------ --------------- -------- ------------- ----------
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)  02.11.16 13:49:51.664800       APPLY           SUCCESS  DBBP              161018
  24340679   20646358                                                    02.11.16 14:48:53.760632       ROLLBACK        SUCCESS

But if I look at the logfile, the rollback had some errors:

oracle1> grep "ORA-\|PLS-" /tmp/sqlpatch_catcon_0.log
ORA-20001: set_patch_metadata not called
ORA-06512: a "SYS.DBMS_SQLPATCH", ligne 621
ORA-06512: a ligne 2
IGNORABLE ERRORS: ORA-02303
IGNORABLE ERRORS: ORA-01418
IGNORABLE ERRORS: ORA-01435
IGNORABLE ERRORS: ORA-01435
IGNORABLE ERRORS: ORA-01435
IGNORABLE ERRORS: ORA-01435
IGNORABLE ERRORS: ORA-01435
IGNORABLE ERRORS: ORA-01435
ORA-01555: cliches trop vieux : rollback segment no , nomme "", trop petit
ORA-22924: cliche trop ancien
ORA-06512: a "SYS.DBMS_SQLPATCH", ligne 102
ORA-06512: a "SYS.DBMS_SQLPATCH", ligne 663
ORA-06512: a ligne 1

Indeed, the patch seems to still be there:

SQL> r
  1  SELECT dbms_sqlpatch.sql_registry_state
  2* FROM dual

SQL_REGISTRY_STATE
--------------------------------------------------------------------------------
<sql_registry_state>
  <!-- Non bundle patches -->
  <!-- Bundle patches -->
  <patch bundle="yes" id="24340679" uid="20646358" action="APPLY" status="SUCCES
S" bundle_series="DBBP" bundle_id="161018">DBBP bundle patch 161018 (DATABASE BU
NDLE PATCH: 12.1.0.2.161018 (24340679))</patch>
</sql_registry_state>

If I try to roll it back again, datapatch does nothing; if I force the rollback, it fails saying the patch is not there:

oracle1> $ORACLE_HOME/OPatch/datapatch -rollback 24340679
SQL Patching tool version 12.2.0.0.0 on Wed Nov  2 16:10:49 2016
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  Nothing to roll back
  Nothing to apply

SQL Patching tool complete on Wed Nov  2 16:10:51 2016

oracle1> $ORACLE_HOME/OPatch/datapatch -rollback 24340679 -force
SQL Patching tool version 12.2.0.0.0 on Wed Nov  2 16:11:01 2016
Copyright (c) 2014, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done

Error: prereq checks failed!
  patch 24340679: Could not determine unique patch ID for patch 24340679 because it is not present in the SQL registry
Prereq check failed, exiting without installing any patches.

Please refer to MOS Note 1609718.1 for information on how to resolve the above errors.

SQL Patching tool complete on Wed Nov  2 16:11:01 2016

What does it say on the patched node?

oracle2> crsctl relocate res theludot.db -n oracle2
CRS-2673: Attempting to stop 'theludot.db' on 'oracle1'
CRS-2677: Stop of 'theludot.db' on 'oracle1' succeeded
CRS-2672: Attempting to start 'theludot.db' on 'oracle2'
CRS-2676: Start of 'theludot.db' on 'oracle2' succeeded
oracle2>
oracle2> $ORACLE_HOME/OPatch/datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Nov  2 16:15:36 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_7878_2016_11_02_16_15_36/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Bundle series DBBP:
  ID 161018 in the binary registry and ID 161018 in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  Nothing to apply

SQL Patching tool complete on Wed Nov  2 16:15:49 2016

Whaaat? datapatch there says that the patch IS in the registry and there is nothing to do. Let’s try to force the apply again:

oracle2> $ORACLE_HOME/OPatch/datapatch -verbose -apply 24340679 -force
SQL Patching tool version 12.1.0.2.0 on Wed Nov  2 16:17:40 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_12726_2016_11_02_16_17_40/sqlpatch_invocation.log

Connecting to database...OK
Determining current state...done

Current state of SQL patches:
Bundle series DBBP:
  ID 161018 in the binary registry and ID 161018 in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 1

Validating logfiles...
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20646358/24340679_apply_THELUDOT_2016Nov02_16_17_40.log (no errors)
SQL Patching tool complete on Wed Nov  2 16:18:50 2016

SQL> r
  1  select patch_id, patch_uid, description, action_time, action, status, bundle_series, bundle_id
  2* from dba_registry_sqlpatch

  PATCH_ID  PATCH_UID DESCRIPTION                                        ACTION_TIME                    ACTION          STATUS   BUNDLE_SERIES  BUNDLE_ID
---------- ---------- -------------------------------------------------- ------------------------------ --------------- -------- ------------- ----------
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)  02-NOV-16 01.49.51.664800 PM   APPLY           SUCCESS  DBBP              161018
  24340679   20646358                                                    02-NOV-16 02.48.53.760632 PM   ROLLBACK        SUCCESS
  24340679   20646358 DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)  02-NOV-16 04.18.50.320745 PM   APPLY           SUCCESS  DBBP              161018

Conclusion

I’m not sure whether it is safe to run the patched database in a non-patched Oracle Home. I guess it is time for a new SR 🙂

Meanwhile, we will try hard not to relocate the databases once they have been patched.

Cheers

Ludo

trivadis sessions at UKOUG Tech16


UKOUG Tech16 will start in less than a week. Trivadis will be there with many speakers, 10 sessions in total 🙂
If you are a delegate, come along and have a chat with us!

Super Sunday

Monday 05/12

Tuesday 06/12

Wednesday 07/12

See you there 🙂

Souvenirs from 2016


2016 is ending, at least from the Oracle Community point of view. It has been tiring and exciting at the same time, so I would like to put some good memories together.

This post is mostly for me, sorry 🙂

February: Another nice Tech Event

Trivadis Tech Event is a great conference, sadly not open for everyone, but still a great one… Got two (or three?) talks there.


March: a good beer in good company

Near CERN, in Geneva, with a few good friends and great technologists 🙂

March again: That ACE Director tweet

May: The DOAG Datenbank 2016

One speech there… the first of many about “upgrading 300 databases  in 300 days”. It was my first time speaking in Germany. 🙂

May again: The Italian leg of the OTN EMEA Tour

The OTN tour has been a great starter for the activities of the Italian Oracle User Group (of which I am one of the founders). It was great to discover that the interest for Oracle Database in Italy is still high (we got almost 60 people: that is huge for a first event, IMO).

We had Mark Rittman (before he became famous :-D), Christian Antognini, Frits Hoogland and Mike Dietrich!

September: the ACED briefing, Oracle Open World and three spare days at Yosemite

It was my first time at the ACED briefing (key word: #cloud 😉 ) and also the first at Oracle HQ. It's like going to Disney World, but the attractions are a little more scary 😀

Yosemite was also incredible. In a single day of trekking, I scored 42k steps, 31 km, +1200 vertical meters…

October: the great OTN Nordic Tour

That was fun, but incredibly tiring. 4 days in a row, 4 countries, 4 flights, 4 different currencies, 4 ACE Directors and now 4 friends 🙂

I did not know Joel and Martin very well, and I did not know John at all. They are great people and I really enjoyed the time spent with them (and the beers :-D).

Copenhagen

Oslo

Helsinki

Stockholm

Stockholm was the last leg; I did it with John only. There I spent the rest of the weekend (the event was on Friday). I love Stockholm so much! Perhaps my favorite city (for sure in the top 5). I also got a good whisky as a speaker gift 😀

November: the second Italian Oracle User Group event

We had 60 people again. In November I was also a speaker, along with Christian Antognini, Mauro Pagano, Francesco Renne and Francesco Tisiot.

November again: the DOAG

Definitely the best conference in Europe 🙂 It was my second time there and the first as a speaker.

November again: the Swiss Data Forum

It was a great single-day event in Lausanne, not database-centric but DATA-centric: Data, IoT, Big Data, Data Science, Deep Learning… I gave one talk there.


December: the UKOUG Tech 16

Two final talks at UKOUG in Birmingham. It was fun again, but on the last day I fell sick 🙁 (and somehow I am still recovering).

Plans for 2017

I got accepted for IOUG Collaborate, but because of my many duties and all the recent travel I have not confirmed my sessions (ouch, it is the first time I do this; next time I will submit more carefully), so Open World will likely be my only US trip next year.

I look forward to submitting for DOAG events again, speaking at SOUG (already planned: 18th and 19th of May), and organizing at least two more events for the Italian Oracle User Group.

Happy New Year! 🙂

RMAN Catalog Housekeeping: how to purge the old incarnations


First, let me apologize because every post in my blog starts with a disclaimer… but sometimes it is really necessary. 😉

Disclaimer: this blog post contains PL/SQL code that deletes incarnations from your RMAN recovery catalog. Please DON’T use it unless you deeply understand what you are doing, as it can compromise your backup and recovery strategy.

Small introduction

You may have a central RMAN catalog that stores all the backup metadata for your databases. If that is the case, you will have a database entry for each of your databases and a new incarnation entry for each duplicate, incomplete recovery or flashback (or whatever).

You should also have a delete strategy that deletes the obsolete backups from either your DISK or SBT_TAPE media. If you have old incarnations, however, you will notice after some time that their information never goes away from your catalog, and sooner or later you will end up doing some housekeeping. But there is nothing more tedious than checking and deleting the incarnations one by one, especially if you have big numbers like this catalog:

SQL> SELECT count(*) FROM db;

  COUNT(*)
----------
      1843

SQL> SELECT count(*) FROM dbinc;

  COUNT(*)
----------
      3870

SQL> SELECT count(*) FROM bdf;

  COUNT(*)
----------
   4130959

SQL> SELECT count(*) FROM brl;


  COUNT(*)
----------
  14876291

Where db, dbinc, bdf and brl contain respectively the registered databases, incarnations, datafile backups and archivelog backups.

Different incarnations?

Consider the following query:

col dbinc_key_ for a60
set pages 100 lines 200
SELECT lpad(' ',2*(level-1))
  || TO_CHAR(DBINC_KEY) AS DBINC_KEY_,
  db_key,
  db_name,
  TO_CHAR(reset_time,'YYYY-MM-DD HH24:MI:SS'),
  dbinc_status
FROM rman.dbinc
  START WITH PARENT_DBINC_KEY IS NULL
  CONNECT BY prior DBINC_KEY   = PARENT_DBINC_KEY
ORDER BY db_name , db_key, level
;

You can run it safely: it returns the list of incarnations hierarchically connected to their parent, by database name, key and level.

Then you have several types of behaviors:

  • Normal databases (created once, never restored or flashed back) will have just one or two incarnations (it depends on how they are created):

DBINC_KEY                      DB_KEY DB_NAME  TO_CHAR(RESET_TIME, DBINC_ST
-------------------------- ---------- -------- ------------------- --------
104547357                   104546534 VxxxxxxP 2010-09-05 05:49:10 PARENT
  104546535                 104546534 VxxxxxxP 2012-01-18 09:31:01 CURRENT

They are usually the ones that you want to keep in your catalog, unless the database no longer exists: in that case, perhaps you forgot to remove it from the catalog when you dropped the database?

  • Flashed back databases (flashed back multiple times) will have as many incarnations as the number of flashbacks, but all connected with the incarnation prior to the flashback:

DBINC_KEY                                                        DB_KEY DB_NAME  TO_CHAR(RESET_TIME, DBINC_ST
------------------------------------------------------------ ---------- -------- ------------------- --------
1164696351                                                   1164696336 VxxxxxxD 2014-07-07 05:38:47 PARENT
  1164696337                                                 1164696336 VxxxxxxD 2014-12-10 07:39:14 PARENT
    1328815631                                               1164696336 VxxxxxxD 2016-05-12 08:22:23 PARENT
      1329299866                                             1164696336 VxxxxxxD 2016-05-13 13:02:35 PARENT
        1329299867                                           1164696336 VxxxxxxD 2016-05-13 14:05:53 PARENT
          1329299833                                         1164696336 VxxxxxxD 2016-05-13 18:26:59 PARENT
            1331239226                                       1164696336 VxxxxxxD 2016-05-17 08:09:04 PARENT
              1331395102                                     1164696336 VxxxxxxD 2016-05-17 13:20:17 PARENT
                1331815030                                   1164696336 VxxxxxxD 2016-05-18 07:32:13 PARENT
                  1331814966                                 1164696336 VxxxxxxD 2016-05-18 10:57:45 PARENT
                    1387023006                               1164696336 VxxxxxxD 2016-07-13 09:33:05 PARENT
                      1407484366                             1164696336 VxxxxxxD 2016-09-09 13:25:31 PARENT
                        1419007163                           1164696336 VxxxxxxD 2016-10-17 14:32:59 PARENT
                          1436430179                         1164696336 VxxxxxxD 2016-12-12 15:13:55 PARENT
                            1436430034                       1164696336 VxxxxxxD 2016-12-12 16:28:57 PARENT
                              1437118959                     1164696336 VxxxxxxD 2016-12-14 14:48:53 PARENT
                                1437365509                   1164696336 VxxxxxxD 2016-12-15 09:45:00 PARENT
                                  1437365456                 1164696336 VxxxxxxD 2016-12-15 11:11:06 PARENT
                                    1437484026               1164696336 VxxxxxxD 2016-12-15 13:26:37 PARENT
                                      1437483983             1164696336 VxxxxxxD 2016-12-15 17:21:11 PARENT
                                        1437822754           1164696336 VxxxxxxD 2016-12-16 12:07:46 CURRENT

Here, although you have several incarnations, they all belong to the same database (same DB_KEY and DBID), so you must also keep it inside the recovery catalog.

  • Non-production databases that are frequently refreshed from the production database (via duplicate) will have several incarnations with different DBIDs and DB_KEY:

DBINC_KEY                   DB_KEY DB_NAME  TO_CHAR(RESET_TIME, DBINC_ST
----------------------- ---------- -------- ------------------- --------
1173852671              1173852633 VxxxxxxV 2014-07-07 05:38:47 PARENT
  1173852635            1173852633 VxxxxxxV 2015-01-16 07:29:01 PARENT
    1188550385          1173852633 VxxxxxxV 2015-03-16 16:06:00 CURRENT
1220896058              1220896027 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1220896028            1220896027 VxxxxxxV 2015-07-17 08:06:00 CURRENT
1233975755              1233975724 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1233975725            1233975724 VxxxxxxV 2015-09-10 11:23:53 CURRENT
1244346785              1244346754 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1244346755            1244346754 VxxxxxxV 2015-10-23 07:46:34 CURRENT
1281775847              1281775816 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1281775817            1281775816 VxxxxxxV 2016-02-08 09:44:03 CURRENT
1317447322              1317447257 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1317447258            1317447257 VxxxxxxV 2016-04-07 12:20:56 CURRENT
1323527351              1323527316 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1323527317            1323527316 VxxxxxxV 2016-04-29 10:09:12 CURRENT
1437346753              1437346718 VxxxxxxV 2015-02-27 16:25:13 PARENT
  1437346719            1437346718 VxxxxxxV 2016-12-12 13:33:31 CURRENT

This is usually the most frequent case: here you want to delete the old incarnations, but only as long as there are no backups attached to them that are still in the recovery window.

  • You may also have orphaned incarnations:

DBINC_KEY                                                        DB_KEY DB_NAME  TO_CHAR(RESET_TIME, DBINC_ST
------------------------------------------------------------ ---------- -------- ------------------- --------
1262827482                                                   1262827435 TxxxxxxT 2014-07-07 05:38:47 PARENT
  1262827436                                                 1262827435 TxxxxxxT 2016-01-05 12:15:22 PARENT
    1267262014                                               1262827435 TxxxxxxT 2016-01-19 09:15:47 PARENT
      1267290962                                             1262827435 TxxxxxxT 2016-01-19 11:09:05 PARENT
        1284933855                                           1262827435 TxxxxxxT 2016-02-11 11:18:52 PARENT
          1299685647                                         1262827435 TxxxxxxT 2016-02-23 13:40:18 ORPHAN
          1299837528                                         1262827435 TxxxxxxT 2016-02-23 17:08:36 CURRENT
          1299767977                                         1262827435 TxxxxxxT 2016-02-23 15:34:13 ORPHAN
          1298269640                                         1262827435 TxxxxxxT 2016-02-22 13:16:46 ORPHAN
            1299517249                                       1262827435 TxxxxxxT 2016-02-23 10:37:29 ORPHAN

In this case, again, it depends on whether the DBID and DB_KEY are the same as the current incarnation or not.

What do you need to delete?

Basically:

  • Incarnations of databases that no longer exist
  • Incarnations of existing databases where the database has a more recent current incarnation, only if there are no backups still in the retention window

How to do it?

In order to be 100% sure that you can delete an incarnation, you have to verify that there are no recent backups (for instance, no backups more recent than the current recovery window for that database). If the database does not have a specified recovery window but rather the default "CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default", it is a bit more problematic… in this case let's assume that we consider "old" an incarnation that has not been backed up for 1 year (365 days), ok?
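
For instance, a quick look at which retention policy each registered database has can be taken directly from the base catalog tables (a sketch, to be run as the catalog owner; a NULL value most likely means that nothing is configured explicitly, i.e. the default REDUNDANCY 1 is in place):

SELECT d.db_key,
       d.reg_db_unique_name,
       c.value AS retention_policy   -- NULL: no explicit configuration stored
  FROM db d
  LEFT OUTER JOIN conf c
    ON (c.db_key = d.db_key AND c.name = 'RETENTION POLICY')
 ORDER BY d.reg_db_unique_name;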

Getting the last backup of each database

Sadly, there is not a single table where you can verify that. You have to collect the information from several tables. I think bdf, al, cdf, bs would suffice in most cases.

When you delete an incarnation you specify a db_key: you have to get the last backup for each db_key, with queries like this:

SELECT dbinc_key
     ,max(completion_time) max_al_time
  FROM al
    GROUP by dbinc_key;

Putting together all the tables:

WITH
   incs AS (
      SELECT lpad(' ',2*(level-1))|| to_char(dbinc_key) AS dbinc_key_
	     ,dbinc_key
         ,db_key
	     ,db_name
	     ,reset_time
	     ,dbinc_status
      FROM rman.dbinc
        START WITH parent_dbinc_key IS NULL
      CONNECT BY PRIOR dbinc_key   = parent_dbinc_key
        ORDER BY db_name, db_key, level
    ),
   mbdf AS (
      SELECT dbinc_key
	     ,max(completion_time) max_bdf_time
	  FROM bdf
	     GROUP by dbinc_key
   ),
   mbrl AS (
      SELECT dbinc_key
	     ,max(next_time) max_brl_time
	  FROM brl
	     GROUP by dbinc_key
   ),
   mal AS (
      SELECT dbinc_key
	     ,max(completion_time) max_al_time
	  FROM al
	     GROUP by dbinc_key
   ),
   mcdf AS (
      SELECT dbinc_key
	     ,max(completion_time) max_cdf_time
	  FROM cdf
	     GROUP by dbinc_key
   ),
   mbs AS (
      SELECT db_key
	     ,max(completion_time) max_bs_time
	  FROM bs
	     GROUP by db_key
   )
SELECT incs.db_key, db.db_id, db.REG_DB_UNIQUE_NAME AS db_uq_name , incs.db_name, dbinc_status
  ,greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_al_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
	 ) AS last_bck
FROM incs
  JOIN db ON (db.db_key=incs.db_key)
  LEFT OUTER JOIN mbdf ON (incs.dbinc_key=mbdf.dbinc_key)
  LEFT OUTER JOIN mcdf ON (incs.dbinc_key=mcdf.dbinc_key)
  LEFT OUTER JOIN mbrl ON (incs.dbinc_key=mbrl.dbinc_key)
  LEFT OUTER JOIN mal ON (incs.dbinc_key=mal.dbinc_key)
  LEFT OUTER JOIN mbs ON (incs.db_key=mbs.db_key)
;

Getting the recovery window

The configuration information for each database is stored inside the conf table, but the retention information is stored in a VARCHAR2, either 'TO RECOVERY WINDOW OF % DAYS' or 'TO REDUNDANCY %'.

You need to convert it to a number when the retention policy is a recovery window, and default it to 365 days when redundancy is used. You can add a column and a join to the query:

-- new column in the projection
,nvl(to_number(substr(c.value,23,length(c.value)-27)),365) as days

-- new join in the "from"
LEFT OUTER JOIN conf c ON (c.db_key=incs.db_key AND c.NAME = 'RETENTION POLICY' AND value LIKE 'TO RECOVERY WINDOW OF %')

and eventually, either display whether the incarnation is still used, or filter by usage:

-- display if the incarnation is still used
,CASE WHEN
     greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_al_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
	 ) < (sysdate - nvl(to_number(substr(c.value,23,length(c.value)-27)),365))
	 THEN 'OLD ONE!'
	 ELSE 'USED'
  END AS USED

-- or filter it
WHERE greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_al_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
	 nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
	 ) < (sysdate - nvl(to_number(substr(c.value,23,length(c.value)-27)),365))

Delete the incarnations!

You can delete the incarnations with this procedure:

BEGIN
  dbms_rcvcat.unregisterdatabase(DB_KEY=>:db_key, DB_ID=>:db_id);
END;

This procedure will raise an exception (-20001, ‘Database not found’) when a database does not exist anymore (either already deleted by this procedure or by another session), so you need to handle it.

Putting all together:

col db_uq_name for a12
col dbinc_key_ for a30
set pages 100 lines 200
set serveroutput on
DECLARE

  e_dbatabase_not_found EXCEPTION;
  PRAGMA EXCEPTION_INIT (e_dbatabase_not_found, -20001);

  CURSOR c_old_incarnations IS
  WITH
   incs AS (
      SELECT lpad(' ',2*(level-1))|| to_char(dbinc_key) AS dbinc_key_
             ,dbinc_key
         ,db_key
             ,db_name
             ,reset_time
             ,dbinc_status
      FROM rman.dbinc
        START WITH parent_dbinc_key IS NULL
      CONNECT BY PRIOR dbinc_key   = parent_dbinc_key
        ORDER BY db_name, db_key, level
    ),
   mbdf AS (
      SELECT dbinc_key
             ,max(completion_time) max_bdf_time
          FROM bdf
             GROUP by dbinc_key
   ),
   mbrl AS (
      SELECT dbinc_key
             ,max(next_time) max_brl_time
          FROM brl
             GROUP by dbinc_key
   ),
   mal AS (
      SELECT dbinc_key
             ,max(completion_time) max_al_time
          FROM al
             GROUP by dbinc_key
   ),
   mcdf AS (
      SELECT dbinc_key
             ,max(completion_time) max_cdf_time
          FROM cdf
             GROUP by dbinc_key
   ),
   mbs AS (
      SELECT db_key
             ,max(completion_time) max_bs_time
          FROM bs
             GROUP by db_key
   )
  SELECT distinct incs.db_key, db.db_id, db.REG_DB_UNIQUE_NAME AS db_uq_name , incs.db_name
  ,greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_al_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
         ) AS last_bck
  ,CASE WHEN
     greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_al_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
         ) < (sysdate - nvl(to_number(substr(c.value,23,length(c.value)-27)),365))
         THEN 'OLD ONE!'
         ELSE 'USED'
  END AS USED
FROM incs
  JOIN db ON (db.db_key=incs.db_key)
  LEFT OUTER JOIN mbdf ON (incs.dbinc_key=mbdf.dbinc_key)
  LEFT OUTER JOIN mcdf ON (incs.dbinc_key=mcdf.dbinc_key)
  LEFT OUTER JOIN mbrl ON (incs.dbinc_key=mbrl.dbinc_key)
  LEFT OUTER JOIN mal ON (incs.dbinc_key=mal.dbinc_key)
  LEFT OUTER JOIN mbs ON (incs.db_key=mbs.db_key)
  LEFT OUTER JOIN conf c ON (c.db_key=incs.db_key AND c.NAME = 'RETENTION POLICY' AND value LIKE 'TO RECOVERY WINDOW OF %')
 WHERE 1=1
 AND greatest(
     nvl(max_bdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_brl_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_al_time ,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_cdf_time,to_date('1970-01-01','YYYY-MM-DD')),
         nvl(max_bs_time,to_date('1970-01-01','YYYY-MM-DD'))
         ) < (sysdate - nvl(to_number(substr(c.value,23,length(c.value)-27)),365))
  order by 4,3, 5
  ;

   r_old_incarnation    c_old_incarnations%ROWTYPE;

   BEGIN
        OPEN c_old_incarnations;
        LOOP
                FETCH c_old_incarnations INTO r_old_incarnation;
                EXIT WHEN  c_old_incarnations%NOTFOUND;

                dbms_output.put('Purging db: ' || r_old_incarnation.db_name);
                dbms_output.put('       IncKey: ' || r_old_incarnation.db_key);
                dbms_output.put('       DBID: ' || r_old_incarnation.db_id);
                dbms_output.put_line('  Last BCK: ' || to_char(r_old_incarnation.last_bck,'YYYY-MM-DD'));
                BEGIN
                   dbms_rcvcat.unregisterdatabase(DB_KEY => r_old_incarnation.db_key, DB_ID => r_old_incarnation.db_id);
                EXCEPTION
                    WHEN e_dbatabase_not_found THEN
                    dbms_output.put_line('Database already unregistered');
                END;
        END LOOP;

        CLOSE c_old_incarnations;
	
END;
/

I have used this procedure today for the first time and it worked like a charm.

However, if you have any adjustment or suggestion, don’t hesitate to comment it 🙂

HTH


Another problem with “KSV master wait” and “ASM file metadata operation”


My customer today tried to do a duplicate on a cluster. When preparing the auxiliary instance, she noticed that the startup nomount was hanging forever: nothing in the alert log, nothing in the trace files.

Because the database and the spfile were stored inside ASM, I became quite suspicious…

The ASM trace files had the following entries:

kfgbDiscoverNow: called for group 1/0x9f5bfe53 (ACFS)

*** 2017-03-24 12:42:13.327
2017-03-24 12:42:13.327: [    GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.DiscoveryString='/dev/mapper/asm_*,/dev/asm_*'

*** 2017-03-24 12:42:15.386
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:18.387
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:21.393
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:24.398
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:27.403
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

The ASM instance had the following sessions waiting:

SQL>  select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time , program, machine from gv$session where wait_class!='Idle' order by sid;

INST_ID  SID SERIAL# STATUS  EVENT                        WAIT_CLASS WAIT_TIME LOGON_TIME          PROGRAM                             MACHINE
------- ---- ------- ------- ---------------------------- ---------- --------- ------------------- ----------------------------------- --------
      2   36   41916 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:47:28 oracle@clusrv02 (O001)              clusrv02
      2  266   64885 ACTIVE  KSV master wait              Other              0 24.03.2017 13:47:25 oracletorcl01v@clusrv02 (TNS V1-V3) clusrv02
      1  483   63446 ACTIVE  KSV master wait              Other              0 24.03.2017 13:31:14 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
      1  497   31202 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:39:07 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
      3  708     484 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 12:38:56 OMS                                 omssrv01

OMS?

Around 12:38:56, another colleague in the office added a disk to one of the disk groups, through Enterprise Manager 12c!

But there were no rebalance operations:

SQL> select * from gv$asm_operation;

no rows selected

It’s not the first time that I hit this type of problems. Sadly, sometimes it requires a full restart of the cluster or of ASM (because of different bugs).

This time, however, I tried to kill only the foreground sessions waiting on "ASM file metadata operation", starting with the one coming from the OMS.

Surprisingly, after killing that session, everything was fine again:

-- on +ASM3
SQL> alter system kill session '708,484';

System altered.

SQL>

SQL>  select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time , program, machine from gv$session where wait_class!='Idle' order by sid;

no rows selected

SQL>

I never add disks via OMS (I'm a sqlplus guy ;-)), so I wonder what went wrong with it 🙂

Ludovico

Which Oracle Databases use most CPU on my server?


Assumptions

  • You have many (hundreds of) instances and more than a couple of servers
  • One of your servers has a high CPU load
  • You have Enterprise Manager 12c but the Database Load does not filter by server
  • You want a historical representation of the user CPU utilization, per instance

Getting the data from the EM Repository

With the following query, connected to the SYSMAN schema of your EM repository, you can get the hourly max() and/or avg() of user CPU by instance and time.

SELECT entity_name,
  ROUND(collection_time,'HH') AS colltime,
  ROUND(avg_value,2)/16*100   AS avgv, -- 16 is my number of CPU
  ROUND(max_value,2)/16*100   AS maxv  -- same here
FROM gc$metric_values_hourly mv
JOIN em_targets t
ON (t.target_name         =mv.entity_name)
WHERE t.host_name         ='myserver1'  -- myserver1 is the server that has high CPU Usage
AND mv.metric_column_name = 'user_cpu_time_cnt' -- let's get the user cpu time
AND collection_time>sysdate-14  -- for the last 14 days
ORDER BY entity_name,
  ROUND(collection_time,'HH');

Suppose you select just the max value: the result will be similar to this:

ENTITY_ COLLTIME          MAXV
------- ----------------  ------
mydbone	10.05.2017 16:00  0.3125
mydbone	10.05.2017 17:00  0.1875
mydbone	10.05.2017 18:00  0.1875
mydbone	10.05.2017 19:00  0.1875
mydbone	10.05.2017 20:00  0.25
mydbone	10.05.2017 21:00  0.125
mydbone	10.05.2017 22:00  0.125
mydbone	10.05.2017 23:00  0.125
mydbone	11.05.2017 00:00  0.1875
mydbone	11.05.2017 01:00  0.125
mydbone	11.05.2017 02:00  0.1875
mydbone	11.05.2017 03:00  0.1875
....                      
mydbone	23.05.2017 20:00  0.125
mydbone	23.05.2017 21:00  0.125
mydbone	23.05.2017 22:00  0.125
mydbone	23.05.2017 23:00  0.0625
mydbtwo	10.05.2017 16:00  0.3125
mydbtwo	10.05.2017 17:00  0.25
mydbtwo	10.05.2017 18:00  0.1875
mydbtwo	10.05.2017 19:00  0.1875
mydbtwo	10.05.2017 20:00  0.3125
mydbtwo	10.05.2017 21:00  0.125
mydbtwo	10.05.2017 22:00  0.125
mydbtwo	10.05.2017 23:00  0.125
.....                     
mydbtwo	14.05.2017 19:00  0.125
mydbtwo	14.05.2017 20:00  0.125
mydbtwo	14.05.2017 21:00  0.125
mydbtwo	14.05.2017 22:00  0.125
mydbtwo	14.05.2017 23:00  0.125
dbthree	10.05.2017 16:00  1.1875
dbthree	10.05.2017 17:00  0.6875
dbthree	10.05.2017 18:00  0.625
dbthree	10.05.2017 19:00  0.5625
dbthree	10.05.2017 20:00  0.8125
dbthree	10.05.2017 21:00  0.5
dbthree	10.05.2017 22:00  0.4375
dbthree	10.05.2017 23:00  0.4375
...

 

Putting it into Excel

There are one million ways to do something more reusable than Excel (rrdtool scripts, gnuplot, R, you name it), but Excel is just right for most people out there (including me when I feel lazy).
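
If you prefer to stay in SQL for a first look, a ranking query like this one (a sketch reusing the same repository tables and the same CPU-count assumption as above) already shows the top consumers for the period:

SELECT entity_name,
  ROUND(AVG(avg_value)/16*100,2) AS avg_user_cpu_pct, -- 16 is my number of CPU
  ROUND(MAX(max_value)/16*100,2) AS max_user_cpu_pct
FROM gc$metric_values_hourly mv
JOIN em_targets t
ON (t.target_name         =mv.entity_name)
WHERE t.host_name         ='myserver1'
AND mv.metric_column_name = 'user_cpu_time_cnt'
AND collection_time>sysdate-14
GROUP BY entity_name
ORDER BY avg_user_cpu_pct DESC;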

  • Configure an Oracle Client and add the ODBC data source to the EM repository:


  • Open Excel, go to “Data” – “Connections” and add a new connection:
    • Search…
    • New Source
    • DSN ODBC
  • Select your new ODBC data source, user, password
  • Uncheck “Connection to a specific table”
  • Give a name and click Finish
  • On the DSN -> Properties -> Definition, enter the SQL text I have provided previously


The result should be something similar (but much longer :-)):

Pivoting the results

Create a new sheet and name it "pivot", click on "Create Pivot Table", and select your data and your dimensions:

The result:

Creating the Graph

Now that the data is correctly formatted, it's easy to add a graph:

just select the entire pivot table and create a new stacked area graph.

The result will be similar to this:


With such a graph, it is easy to spot which databases consumed the most CPU on the system in a defined period, and to track the progress if you start a "performance campaign".

For example, you can see that the "green" and "red" databases were constantly consuming some CPU up to 17.05.2017, and then some magic solved the CPU problem for those instances.

It is also quite convenient for checking the results of new instance caging settings…

The resulting CPU will not necessarily add up to 100%: the SYS CPU time is not included, nor is the user CPU of all the other processes that either do not belong to a database or are not monitored with Enterprise Manager.

HTH

Ludovico

Italian Oracle User Group: from nothing to something


I am pretty sure that every user group has had its own difficulties when starting. Because we started the ITOUG just a couple of years ago, I think it is worth telling the story 🙂

I had been told about a team of Italian DBAs willing to create a new user group, back in 2013. I decided to join the team because I was starting my journey in the Oracle Community.

We were coming from different cities (Lausanne, Vicenza, Milano, Roma)… the only solution was to meet up online, through Google Hangouts.

The first meetings were incredibly boring and inconclusive (sorry ITOUG guys if you read this :-)):

You cannot start something if you do not know what you want to achieve

Of course, everybody agreed that it would have been nice to become the new UKOUG of southern Europe. But realistically: no budget, no spare time, nothing at all; we needed a starting point. We thought that the starting point was a website. But even something easy like a basic website spawns a lot of additional questions:

  • Should it allow to publish content?
  • Should it have a bulletin board?
  • What about user subscription?
  • What should be the content? Articles, webinars?

We created something with the idea of publishing our content, but after a while it was mostly an empty container.

It took me a while to learn a first big lesson:

Democracy does not work for small User Groups

The original founder of the group decided to quit the ITOUG because nothing concrete was happening. Everybody was putting ideas on the table but nothing was happening, really.

When my friend Björn Rost proposed that I candidate Milano for the OTN Tour, I jumped on the train: it was the best way to start something concrete. I somehow "forced" the OTN Tour candidacy to happen, saying to my peers: "I will do it; if you support me, thanks; otherwise I will do it alone".

And the response was great!

Do not wait for something to happen. If you want it, take the lead and make it.

This is the biggest lesson I have learned from my involvement with user groups. People sometimes cannot contribute because they do not know what to do or how to do it, or simply because they do not have time: there are a gazillion things more important than user groups: family, work, health… even hobbies are sometimes more important 🙂

If you have an idea, it's up to YOU to transform it into something concrete.

 

After the acceptance, we had to prepare the event. We proposed a few dates and Björn prepared the calendar taking into account the other user groups.

Organizing an event is not easy but not so complex either

  • Set a reasonable target number of participants
  • Fix a date when most contributors are available
  • Get offers from different venues (pricing for the venue and the catering)
  • Get an idea of the budget
  • Ask different companies to sponsor the event
  • Eventually ask Oracle 🙂
  • Once the sponsors are committed to pay, block the venue
  • Prepare and publish the Call for Paper
  • Eventually start some advertising
  • Select the papers
  • Prepare the agenda
  • Ask the speakers for confirmation about the proposed date/time
  • Prepare the event registration form
  • Publish the agenda
  • Broadcast it everywhere 🙂 (Social media, contacts, website, Oracle through their channels)
  • Interact with the hotel and the sponsor to have the proper setup, billing addresses, invoices, etc.
  • Host the event
  • Relax

Finding the sponsors is the most difficult part

It was easy to find a sponsor for the first two events (just a database stream, event held in Milano), but it was not the same for the last one.

Our aim was to do a double event (Milano + Roma) with two streams in each location (DB + BI & Analytics). In Roma we were unable to find a sponsor (if you read this AND your company might be interested in sponsoring such an event in Roma, please contact me :-)), so we decided to continue with the event in Milano only.

Finding the speakers is easier than you can imagine

Unless you want non-English sessions held by native speakers, there is a huge community of speakers willing to share their knowledge. For Oracle, the most obvious source of speakers is the ACE Program, and Twitter is probably the best channel for finding them.

We have now organized an event three times, and every time we have been surprised by the good attendance and feedback.

A few images from the last event


Time for an additional RDBMS platform in this blog?


Since its creation (9 years ago), this blog has been almost exclusively Oracle-oriented. But during my career I have worked a lot with other RDBMS technologies… SQL Server, MySQL (and forks), Sybase, PostgreSQL, Progress. Some posts in this blog prove it.

Especially in the last two years, I have worked a lot with PostgreSQL. In the last few months I have seen many friends and technologists growing curious about this product. So I think that I will, gently, start blogging about my experiences with PostgreSQL as well.

Stay tuned if you are interested!

PostgreSQL Large Objects and space usage (part 1)


PostgreSQL uses a nice, non-standard mechanism for big columns called TOAST (hopefully I will blog about it in the future) that can be compared to extended data types in Oracle (TOAST rows, by the way, can be much bigger). But traditional large objects exist and are still used by many customers.

If you are new to large objects in PostgreSQL, read here. For TOAST, read here.

Inside the application tables, the columns for large objects are defined as OIDs that point to data chunks inside the pg_largeobject table.
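
As a minimal sketch (the table, column and file names here are just hypothetical), an application table referencing a large object looks like this:

-- hypothetical application table: the oid column references chunks in pg_largeobject
CREATE TABLE documents (
    id      integer PRIMARY KEY,
    content oid
);

-- the server-side lo_import() creates the large object and returns its OID
INSERT INTO documents VALUES (1, lo_import('/path/to/some/file'));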


Because large objects are created independently of the table columns that reference them, when you delete a row from the table that points to a large object, the large object itself is not deleted.

Moreover, pg_largeobject stores by design all the large objects that exist in the database.

This makes housekeeping and maintenance of this table crucial for database administration (we will see it in a future post).
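
For a quick overview of how much housekeeping might be needed, something like this sketch does the job:

-- how many large objects exist, and how much space pg_largeobject uses (indexes included)
SELECT (SELECT count(*) FROM pg_largeobject_metadata)           AS large_objects,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS pg_largeobject_size;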

How is space organized for large objects?

We will see it by examples. Let’s start with an empty database with empty pg_largeobject:

lob_test=# select count(*) from pg_largeobject;
 count
-------
     0
(1 row)

lob_test=# vacuum full pg_largeobject;
VACUUM

lob_test=# select pg_total_relation_size('pg_largeobject');
 pg_total_relation_size
------------------------
                   8192
(1 row)

Just one block. Let’s see its file on disk:

lob_test=# SELECT pg_relation_filepath('pg_largeobject');
 pg_relation_filepath
----------------------
 base/16471/16487
(1 row)

# ls -l base/16471/16487
-rw------- 1 postgres postgres 0 Jul 26 16:58 base/16471/16487

First evidence: the file is empty, meaning that the first block is not created physically until there’s some data in the table (like deferred segment creation in Oracle, except that the file exists).

Now, let’s create two files big 1MB for our tests, one zero-padded and another random-padded:

$ dd if=/dev/zero    of=/tmp/zeroes  bs=1024 count=1024
$ dd if=/dev/urandom of=/tmp/randoms bs=1024 count=1024
$ ls -l /tmp/zeroes /tmp/randoms
-rw-r--r-- 1 postgres postgres 1048576 Jul 26 16:56 /tmp/randoms
-rw-r--r-- 1 postgres postgres 1048576 Jul 26 16:23 /tmp/zeroes

Let’s import the zero-padded one:

lob_test=# \lo_import '/tmp/zeroes';
lo_import 16491
lob_test=# select count(*) from pg_largeobject_metadata;
 count
-------
     1
(1 row)

lob_test=# select count(*) from pg_largeobject;
 count
-------
   512
(1 row)

The large objects are split into chunks of 2048 bytes each, hence we have 512 pieces. What about the physical size?

lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
            40960
(1 row)


bash-4.1$ ls -l 16487*
-rw------- 1 postgres postgres 40960 Jul 26 17:18 16487

Just 40k! This means that the chunks are compressed (like TOAST pages). PostgreSQL uses the pglz_compress function; its algorithm is well explained in the source code src/common/pg_lzcompress.c.
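
We can double-check the chunking with a query like this (a sketch; 16491 is the OID returned by \lo_import above). Decompression is transparent, so octet_length(data) reports the logical chunk size:

-- each chunk holds (at most) 2048 bytes of logical data
SELECT loid, pageno, octet_length(data) AS chunk_bytes
  FROM pg_largeobject
 WHERE loid = 16491
 ORDER BY pageno
 LIMIT 5;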

What happens when we insert the random-padded file?

lob_test=# \lo_import '/tmp/randoms';
lo_import 16492

lob_test=# select count(*) from pg_largeobject where loid=16492;
 count
-------
   512
(1 row)

lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          1441792
(1 row)

$ ls -l 16487
-rw------- 1 postgres postgres 1441792 Jul 26 17:24 16487

The segment increased by much more than 1MB! Precisely, 1441792-40960 = 1400832 bytes. Why?

The large object is again split into 512 data chunks of 2048 bytes each, and again PostgreSQL tries to compress them. But because a random string cannot be compressed, the pieces remain (on average) 2048 bytes big.

Now, a database block is 8192 bytes. If we subtract the size of the block header, there is not enough space for 4 chunks of 2048 bytes, so every block will contain just 3 non-compressed chunks.

So, the 512 chunks will be distributed over 171 blocks (CEIL(512/3.0)), which gives:

lob_test=# select ceil(1024*1024/2048/3.0)*8192;
 ?column?
----------
  1400832
(1 row)

1400832 bytes!

Depending on the compression rate that we can apply to our large objects, we might expect much more or much less space used inside the pg_largeobject table.
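
To get a rough idea of the compression effectiveness, one can compare the logical size of the stored chunks with the physical size of the table (a sketch; again, octet_length() returns the uncompressed chunk size):

-- logical (uncompressed) size per large object...
SELECT loid, pg_size_pretty(sum(octet_length(data))) AS logical_size
  FROM pg_largeobject
 GROUP BY loid;

-- ...versus the physical size of the underlying table
SELECT pg_size_pretty(pg_relation_size('pg_largeobject')) AS physical_size;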

PostgreSQL Large Objects and space usage (part 2)


In my previous post I showed how large objects use space inside the table pg_largeobject when inserted.

Let’s see something more:

The table had 2 large objects (for a total of 1024 records):

lob_test=# select pg_relation_size('pg_largeobject');
pg_relation_size
------------------
          1441792
(1 row)

Let’s try to add another random-padded file:
lob_test=# \lo_import '/tmp/randoms';
lo_import 16493
lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          2842624
(1 row)

lob_test=# select oid, * from  pg_largeobject_metadata;
  oid  | lomowner | lomacl
-------+----------+--------
 16491 |       10 |
 16492 |       10 |
 16493 |       10 |
(3 rows)

As expected, because a random sequence of characters cannot be compressed, the size increased again by 171 blocks (see my previous post for the explanation).

If you read this nice series of blog posts by Frits Hoogland, you should know about the pageinspect extension and the t_infomask 16-bit mask.

Let’s install it and check the content of the pg_largeobjects pages:
lob_test=# select * from page_header(get_raw_page('pg_largeobject',0));
     lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-------------+----------+-------+-------+-------+---------+----------+---------+-----------
 18/38004C10 |        0 |     0 |   452 |   488 |    8192 |     8192 |       4 |         0
(1 row)

-- same result (lower 452, upper 488) for blocks 1...3

lob_test=# select * from page_header(get_raw_page('pg_largeobject',4));
     lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-------------+----------+-------+-------+-------+---------+----------+---------+-----------
 18/380179F8 |        0 |     0 |   360 |  2144 |    8192 |     8192 |       4 |         0
(1 row)


lob_test=# select * from page_header(get_raw_page('pg_largeobject',5));
     lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-------------+----------+-------+-------+-------+---------+----------+---------+-----------
 18/381386E0 |        0 |     0 |    36 |  1928 |    8192 |     8192 |       4 |         0
(1 row)

-- same result for the remaining blocks

We already know the mathematics, but we love having all the pieces come together 🙂

We know that the page header is 24 bytes and that the line pointers use 4 bytes for each tuple.

The first 4 pages have the lower offset at 452 bytes, which means that we have (452-24)/4 = 107 tuples.

The 5th page (page number 4) has the lower offset at 360: (360-24)/4 = 84 tuples.

The remaining pages have the lower offset at 36: (36-24)/4 = 3 tuples.

Let’s check if we are right:

lob_test=# select generate_series as page,
 (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series)))  as tuples
 from generate_series(0,5);
 page | tuples
------+--------
    0 |    107
    1 |    107
    2 |    107
    3 |    107
    4 |     84
    5 |      3
(6 rows)

🙂
Now, let’s delete the 1Mb file and check the space again:
lob_test=# \lo_unlink 16492
lo_unlink 16492


lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          2842624
(1 row)

lob_test=# select oid, * from  pg_largeobject_metadata;
  oid  | lomowner | lomacl
-------+----------+--------
 16491 |       10 |
 16493 |       10 |
(2 rows)

lob_test=# select generate_series as pageno, (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))) from generate_series(0,12);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |    84
      5 |     3
      6 |     3
      7 |     3
      8 |     3
      9 |     3
     10 |     3
     11 |     3
     12 |     3

The space is still used and the tuples are still there.

However, we can check that the tuples are no longer used by checking the validity of their t_xmax. In fact, according to the documentation, if the XMAX is invalid the row is at the latest version:

[…] a tuple is the latest version of its row iff XMAX is invalid or t_ctid points to itself (in which case, if XMAX is valid, the tuple is either locked or deleted). […]
 (from htup_details.h lines 87-89).
We have to check the infomask against the 12th bit (2048, or 0x0800):
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
lob_test=# select generate_series as pageno, 
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,12);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |    84
      5 |     0
      6 |     0
      7 |     0
      8 |     0
      9 |     0
     10 |     0
     11 |     0
     12 |     0

Here we go. The large objects are split in compressed chunks that internally behave the same way as regular rows!

If we import another LOB we will see that the space is not reused:

lob_test=# \lo_import '/tmp/randoms';
lo_import 16520
lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          4235264
(1 row)

Flagging the tuples as reusable is the vacuum’s job:

lob_test=# vacuum pg_largeobject;
VACUUM

lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          4235264
(1 row)

The normal vacuum does not release the empty space, but it can be reused now:

lob_test=# select generate_series as pageno,
 (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
 where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,12);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |    84
      5 |     0
      6 |     0
      7 |     0
      8 |     0
      9 |     0
     10 |     0
     11 |     0
     12 |     0

lob_test=# \lo_import '/tmp/randoms';
lo_import 16521
lob_test=#

lob_test=#  select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          4235264
(1 row)

-- same size as before!

lob_test=#  select generate_series as pageno, 
(select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series)) 
 where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,12);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |    84
      5 |     3
      6 |     3
      7 |     3
      8 |     3
      9 |     3
     10 |     3
     11 |     3
     12 |     3

If we unlink the LOB again and we do a vacuum full, the empty space is released:

lob_test=# \lo_unlink 16521
lo_unlink 16521
lob_test=#  select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          4235264
(1 row)

lob_test=# vacuum full pg_largeobject;
VACUUM
lob_test=#  select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          2842624
(1 row)

PostgreSQL Large Objects and space usage (part 3)


A blog post series would not be complete without a final post about vacuumlo.

In the previous post we have seen that large objects are split into tuples containing 2048 bytes each, and that each chunk behaves in the very same way as regular tuples.

What distinguishes large objects?
NOTE: in PostgreSQL, IT IS possible to store a large amount of data along with the table, thanks to the TOAST technology. Read about TOAST here.

Large objects are not inserted in application tables, but are treated in a different way. The application using large objects usually has a table with columns of type OID. When the application creates a new large object, a new OID number is assigned to it, and this number is inserted into the application table.
Now, a common mistake for people who come from other RDBMS (e.g. Oracle) is to think that a large object is unlinked automatically when the row that references it is deleted. It is not, and we need to unlink it explicitly from the application.

Let’s see it with a simple example, starting with an empty pg_largeobject table:

lob_test=# vacuum full pg_largeobject;
VACUUM
lob_test=# select count(*) from pg_largeobject_metadata;
 count
-------
     0
(1 row)

lob_test=# select pg_relation_size('pg_largeobject')/8192 as pages;
 pages
-------
     0
(1 row)

Let’s insert a new LOB and reference it in the table t:

lob_test=# CREATE TABLE t (id integer, file oid);
CREATE TABLE
lob_test=# \lo_import /tmp/zeroes
lo_import 16546
lob_test=# INSERT INTO t VALUES  (1, 16546);
INSERT 0 1

lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,4);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |    84

Another one:

lob_test=# \lo_import /tmp/zeroes
lo_import 16547
lob_test=# INSERT INTO t VALUES  (2, 16547);
INSERT 0 1

lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,9);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |   107
      5 |   107
      6 |   107
      7 |   107
      8 |   107
      9 |    61
(10 rows)

lob_test=# select * from t;
 id | file
----+-------
  1 | 16546
  2 | 16547
(2 rows)

If we delete the first one, the chunks of its LOB are still there, valid:

lob_test=# DELETE FROM t WHERE id=1;
DELETE 1
lob_test=# select * from t;
 id | file
----+-------
  2 | 16547
(1 row)

lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,9);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |   107
      5 |   107
      6 |   107
      7 |   107
      8 |   107
      9 |    61
(10 rows)

If we want to get rid of the LOB, we have to unlink it, either explicitly or by using triggers that unlink the LOB when a record in the application table is deleted.
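A minimal sketch of such a trigger, assuming the table t(id, file) used in this post (note that the contrib module lo also provides a ready-made lo_manage() trigger function for the same purpose):

-- sketch: unlink the large object when the referencing row is deleted
CREATE OR REPLACE FUNCTION unlink_lob() RETURNS trigger AS $$
BEGIN
  PERFORM lo_unlink(OLD.file);   -- server-side counterpart of \lo_unlink
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_unlink_lob
  AFTER DELETE ON t
  FOR EACH ROW EXECUTE PROCEDURE unlink_lob();

Of course, such a trigger unlinks the LOB even if other rows still reference the same OID, so it only fits the one-row-per-LOB design shown here.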
Another way is to use the vacuumlo binary included in PostgreSQL.
It scans pg_largeobject_metadata and searches through the tables that have OID columns to find out whether there are any references to the LOBs. The LOBs that are not referenced are unlinked.
ATTENTION: this means that if you reference LOBs by means other than OID columns, vacuumlo might unlink LOBs that are still needed!

# vacuumlo -U postgres lob_test

# p_ lob_test
psql.bin (9.6.2)
Type "help" for help.

lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,9);
 pageno | count
--------+-------
      0 |     0
      1 |     0
      2 |     0
      3 |     0
      4 |    23
      5 |   107
      6 |   107
      7 |   107
      8 |   107
      9 |    61
(10 rows)

vacuumlo has indeed unlinked the first LOB, but the deleted tuples are not freed until a vacuum is executed:

lob_test=# \lo_import /tmp/zeroes
lo_import 16551
lob_test=# INSERT INTO t VALUES  (3, 16551);
INSERT 0 1
lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,14);
 pageno | count
--------+-------
      0 |     0
      1 |     0
      2 |     0
      3 |     0
      4 |    23
      5 |   107
      6 |   107
      7 |   107
      8 |   107
      9 |   107
     10 |   107
     11 |   107
     12 |   107
     13 |   107
     14 |    38
(15 rows)

lob_test=# vacuum pg_largeobject;
VACUUM
lob_test=# \lo_import /tmp/zeroes
lo_import 16552
lob_test=# INSERT INTO t VALUES  (4, 16552);
INSERT 0 1
lob_test=# select generate_series as pageno,
  (select count(*) from heap_page_items(get_raw_page('pg_largeobject',generate_series))
  where t_infomask::bit(16) & x'0800'::bit(16) = x'0800'::bit(16)) from generate_series(0,14);
 pageno | count
--------+-------
      0 |   107
      1 |   107
      2 |   107
      3 |   107
      4 |   107
      5 |   107
      6 |   107
      7 |   107
      8 |   107
      9 |   107
     10 |   107
     11 |   107
     12 |   107
     13 |   107
     14 |    38
(15 rows)

So vacuumlo does not do any vacuuming on the pg_largeobject table.


trivadis sessions at Oracle Open World 2017


This year Trivadis will be again at Oracle Open World (and Oak Table World!) in San Francisco, with a few sessions (including mine!)

If you are going to Oracle Open World and you want to say hello to the Trivadis speakers, make sure you attend them!

Get the Most Out of Oracle Data Guard
Ludovico Caldara – ACE Director, Senior Consultant – Trivadis
When: Sunday, Oct 01, 12:45 PM
Where: Marriott Marquis (Yerba Buena Level) – Nob Hill A/B

EOUC Database ACES Share Their Favorite Database Things
Christian Antognini – ACE Director, OAK Table Member, Senior Principal Consultant, Partner – Trivadis
When: Sunday, Oct 01, 10:45 AM
Where: Marriott Marquis (Golden Gate Level) – Golden Gate C1/C2

Application Containers: Multitenancy for Database Applications
Markus Flechtner – Principal Consultant – Trivadis
When: Sunday, Oct 01, 2:45 PM
Where: Marriott Marquis (Yerba Buena Level) – Nob Hill A/B

TBA
Christian Antognini – ACE Director, OAK Table Member, Senior Principal Consultant, Partner – Trivadis
When: Monday Oct 02, 1:00 PM
Where: Oak Table World, Children Creativity Museum

Apache Kafka: Scalable Message Processing and More
Guido Schmutz – ACE Director, Senior Principal Consultant, Partner – Trivadis
When: Monday Oct 02, 4:30 PM
Where: Moscone West – Room 2004

You can find trivadis’s sessions in the session catalog here.

See you there!

12.1.0.2 Bundle Patch 170718 breaks Data Guard and Duplicate from active database


Recently my customer patched its 12.1.0.2 databases with the Bundle Patch 170718 on the new servers (half of the customer’s environment). The old servers are still on 161018 Bundle Patch.

We realized that we could no longer move the databases from the old servers to the new ones, because the duplicate from active database was failing with this error:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/11/2017 15:59:32
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on prmy1 channel at 09/11/2017 15:59:32
ORA-17629: Cannot connect to the remote database server
ORA-17630: Mismatch in the remote file protocol version client 2 server 3

The last lines show the same error that Franck blogged about some months ago.

Oracle 12.2 introduced an incompatibility with previous releases in remote file transfer via SQL*Net. At least, that is how it seems. According to Oracle, this is due to a bugfix present in Oracle 12.2.

Now, the bundle patch that we installed (BP 170718) contains the same bugfix (the patch for bug 18633374).

So, the incompatibility happens now between databases of the same “Major Release” (12.1.0.2).
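A quick way to check whether a given Oracle Home already contains that fix (just a sketch, using the bug number from this case) is:

$ $ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed | grep 18633374

If the bug shows up on one side only, you are in the mismatch situation described above.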

There are two possible workarounds:

  1. Apply the same patch level on both sides (BP170718 in my case)
  2. Apply just the patch 18633374 on top of your current PSU/DBBP (a merge might be necessary); a minimal opatch sketch follows below.
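For the second option, the steps look roughly like this (a sketch only: the staging directory is hypothetical and, as for any one-off patch, the instances using the Oracle Home must be stopped and the patch README checked first):

oracle@oldserver $ cd /u01/stage/18633374      # hypothetical directory where the one-off patch is unzipped
oracle@oldserver $ $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
oracle@oldserver $ $ORACLE_HOME/OPatch/opatch apply
oracle@oldserver $ $ORACLE_HOME/OPatch/opatch lspatches

If the conflict check reports a conflict with the installed PSU/DBBP, a merge patch from Oracle Support is needed.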

We used the second approach, and now we can set up Data Guard again to move our databases without downtime:

oracle@oldserver $ opatch lspatches
18633374;   <<<<<< FIX!
24340679;DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)

oracle@newserver $ opatch lspatches
22652097;
22243983;
25869760;DATABASE BUNDLE PATCH: 12.1.0.2.170718 (25869760)

HTH

Ludovico


Get the Most out of Oracle Data Guard – The material

$
0
0

Here we go: as usual, the question that I get after my talks (specifically, after the POUG High Five conference) is whether I will share my demo scripts and material.

Sadly, the demos I am doing for my presentation "Get the most out of Oracle Data Guard" are quite tied to an environment built specifically for them. So, do not expect scripts that are easy to use as is, but rather some ideas beyond the demos themselves.

I hope they will help to get the whole picture.

Of course, if you need to implement a cloning strategy based on Data Guard or any other solution that I describe in this post, please feel free to contact me; I will be glad to help you implement it in your environment.

Slides

Demo 1

Video:

Scripts:

#!/bin/bash

function tt () {
  title=$@
  pad=$(printf '%0.1s' "-"{1..60})
  echo
  echo
  echo $pad
  echo "- $title"
  echo $pad
}

. .bash_profile

PAUSE=/home/oracle/pause.sh
SYSPWD=Vagrant1_

clear

sid sour_ludo


sudo sed -i -e '/sour-s/d' /var/named/trivadistraining.com
sudo sed -i '$ a\
sour-s1 IN CNAME ludo01\
sour-s2 IN CNAME ludo01' /var/named/trivadistraining.com

sudo systemctl reload named.service

tt "Naming resolution"
tnsping sour_smart

nslookup sour-s1
nslookup sour-s2


$PAUSE

tt "Connect to sour_smart in another terminal"

$PAUSE
clear

tt "Creating Data Guard Configuration resolution"
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  show configuration;
EOF

$PAUSE
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  create configuration sour as primary database is sour_ludo connect identifier is sour_ludo.trivadistraining.com;
  add database sour_vico as connect identifier is sour_vico.trivadistraining.com;
  enable database sour_vico;
  enable configuration;
  host sleep 5;
  show configuration;
EOF

$PAUSE
clear

tt "Modifying the DNS configuration"

sudo sed -i -e '/sour-s2/d' /var/named/trivadistraining.com

sudo sed -i '$ a\
sour-s2 IN CNAME vico01' /var/named/trivadistraining.com

sudo systemctl reload named.service

tt "Naming resolution"
tnsping sour_smart

nslookup sour-s1
nslookup sour-s2

$PAUSE
clear
tt "Switchover to sour_vico"
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  switchover to sour_vico;
EOF

$PAUSE
tt "Did the session fail over?"
$PAUSE

clear

tt "Modifying the DNS configuration"

sudo sed -i -e '/sour-s1/d' /var/named/trivadistraining.com

sudo sed -i '$ a\
sour-s1 IN CNAME vico01' /var/named/trivadistraining.com

sudo systemctl reload named.service

tt "Naming resolution"
tnsping sour_smart

nslookup sour-s1
nslookup sour-s2

$PAUSE

tt "Removing Data Guard configuration"

dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  remove configuration;
  show configuration;
EOF


Demo 2

Video:


Scripts:

#!/bin/bash

function tt () {
  title=$@
  pad=$(printf '%0.1s' "-"{1..60})
  echo
  echo
  echo $pad
  echo "- $title"
  echo $pad
}

. .bash_profile

clear

sid stout_vico
SYSPWD=Vagrant1_

PAUSE=/home/oracle/pause.sh

tt "Current configuration"
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  show configuration;
EOF

$PAUSE

clear

tt "Instance and redo apply status"
sqlplus / as sysdba <<EOF
  select instance_name, status from v\$instance;
  select db_unique_name, database_role from v\$database;
  select process, status, client_process, sequence#, block#, delay_mins from v\$managed_standby order by process;
EOF

$PAUSE
clear 
tt "Inserting something in the primary"
sqlplus ludo/ludo@stout_ludo <<EOF
  DROP TABLE demo1;
  CREATE TABLE demo1 ( id NUMBER GENERATED AS IDENTITY 
     , foo DATE DEFAULT (sysdate)
     , CONSTRAINT demo1_pk PRIMARY KEY (id)
  );

  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  COMMIT;
  ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
  SELECT * FROM demo1 ORDER BY id;
  exit
EOF


$PAUSE
clear
tt "Converting physical standby to snapshot standby"
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  show configuration;
  convert database stout_vico to snapshot standby;
  show configuration;
EOF


$PAUSE
tt "Let's check the alert log (another window)"

$PAUSE
clear
tt "Instance and redo apply status"
sqlplus / as sysdba <<EOF
  SELECT instance_name, status FROM v\$instance;
  SELECT db_unique_name, database_role FROM v\$database;
  set lines 180
  col name for a80
  SELECT scn, name FROM v\$restore_point;
  SELECT process, status, client_process, sequence#, block#, delay_mins FROM v\$managed_standby ORDER BY process;
  set feedback off
  SELECT process, status, client_process, sequence#, block#, delay_mins FROM v\$managed_standby WHERE client_process='LGWR';
  EXEC dbms_lock.sleep(2);
  SELECT process, status, client_process, sequence#, block#, delay_mins FROM v\$managed_standby WHERE client_process='LGWR';
  EXEC dbms_lock.sleep(2);
  SELECT process, status, client_process, sequence#, block#, delay_mins FROM v\$managed_standby WHERE client_process='LGWR';
EOF


$PAUSE
clear
tt "Let's do something in the PRIMARY database!"
sqlplus ludo/ludo@stout_ludo <<EOF
  ALTER TABLE demo1 ADD test VARCHAR(20) DEFAULT ('PRIMARY'); 
  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  COMMIT;
  ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
  SELECT * FROM demo1 ORDER BY id;
  exit
EOF


$PAUSE
clear
tt "Let's do something in the snapshot standby!"
sqlplus ludo/ludo@stout_vico <<EOF
  ALTER TABLE demo1 ADD test VARCHAR(20) DEFAULT ('STANDBY'); 
  INSERT INTO demo1 (foo) VALUES(sysdate);
  INSERT INTO demo1 (foo) VALUES(sysdate);
  COMMIT;
  ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
  SELECT * FROM demo1 ORDER BY id;
  exit
EOF

$PAUSE
clear

tt "Convert back to physical standby"
dgmgrl -echo <<EOF
  connect sys/$SYSPWD
  show configuration;
  convert database stout_vico to physical standby;
  show configuration;
EOF

$PAUSE
clear
tt "Instance and redo apply status"
sqlplus / as sysdba <<EOF
  SELECT instance_name, status FROM v\$instance;
  SELECT db_unique_name, database_role FROM v\$database;
  set lines 180
  col name for a80
  SELECT scn, name FROM v\$restore_point;
  SELECT process, status, client_process, sequence#, block#, delay_mins FROM v\$managed_standby ORDER BY process;
EOF


Demo 3

Video:

Scripts:

Preparation:

#!/bin/bash

NUM=`echo $$ | cut -c 1-4`
export NEWNAME=${1:-poug$NUM}
export ORACLE_SID=$NEWNAME

export ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1

[[ -L /u02/$NEWNAME ]] && rm /u02/$NEWNAME
ln -s /u02/acfs/.ACFS/snaps/$NEWNAME /u02/$NEWNAME

set -x
$ORACLE_HOME/bin/srvctl add database -db $NEWNAME -oraclehome $ORACLE_HOME -dbtype SINGLE -instance $NEWNAME -spfile /u02/$NEWNAME/spfile$NEWNAME.ora -dbname $NEWNAME -policy MANUAL -acfspath "/u02/acfs,/u02/fra" -node $HOSTNAME

set +x

snap_acfs.pl

#!/u01/app/oracle/tvdtoolbox/tvdperl-Linux-x86-64-02.04.00-05.08.04/bin/tvd_perl
#
# Purpose..........: Create a new snapshot with rotating name
# 
# snap_acfs.pl 
#        -p <parent> : name of the parent snapshot
#        -n <name>   : prefix of the snapshot
#        -s <suffix> : optional, use "weekday" to have the day name as suffix (Sun - Sat)
#
# e.g. snap_acfs.pl -p stout -n stout  -s "weekday"
#      will clone from /u02/acfs/.ACFS/snaps/stout
#                   to /u02/acfs/.ACFS/snaps/stout.Tue (or whatever the day is)
#      
# e.g. snap_acfs.pl -n stout -p stout.Mon 
#      will clone from /u02/acfs/.ACFS/snaps/stout.Mon
#                   to /u02/acfs/.ACFS/snaps/stout
#      
# e.g. snap_acfs.pl -n stout2 -p stout
#      will clone from /u02/acfs/.ACFS/snaps/stout
#                   to /u02/acfs/.ACFS/snaps/stout2
#      
# EXISTING SNAPSHOT WILL BE DROPPED!!
#
#
#

use strict;
use File::Copy;
use Net::SMTP;
use Sys::Hostname;
use Getopt::Std 'getopts';
use File::Basename;

my $CloneDIR;                             # predefine rootDir variable
BEGIN {
  use FindBin qw($Bin);                   # get the current path of script
  use Cwd 'abs_path';
  $CloneDIR    = abs_path("$Bin/..");     # get the absolute root path to the clone directory
}

my $CloneLOGDir = $CloneDIR."/log";       # LOG Directory
my $baseACFS = "/u02/acfs/";
my $ORA_CRS_HOME = "/u01/app/grid/12.2.0.1";
my $acfsutil = "/usr/sbin/acfsutil";
my $basename    = basename($0, ".pl");
my $ParentSnapName;
my $ParentSnap=0; ## no parent snapshots by default
my $PrefixName;
my $NewName;
my $SuffixName;
my %opts;
my $MountPoint;
my $SnapCreate;

################################################################################
#  Main
################################################################################
my $StartDate = localtime;
&DoMsg ("Start of $basename.pl");
unless ( open (MAINLOG, ">>$CloneLOGDir/$basename.log") ) {
	&DoMsg ("Can't open Main Logfile $CloneLOGDir/$basename.log");
    exit 1;
}

# Process command line arguments
if ( ! @ARGV ) { &Usage; exit 1; }
getopts('n:p:s:b:', \%opts);

if ($opts{"p"}) {
   $ParentSnapName    = lc($opts{"p"});
} else {
   &DoMsg ("Parent snapshot name not given!");
   &Usage;
   exit 1;
}
if ($opts{"n"}) {
   $PrefixName    = lc($opts{"n"});
} else {
   &DoMsg ("New snapshot prefix not given! Defaults to ${ParentSnapName}");
   $PrefixName    = "${ParentSnapName}";
}

if ($opts{"s"}) {
   $SuffixName    = lc($opts{"s"});
   if ( $SuffixName eq "weekday" ) {
      $SuffixName    = lc(&getWeekDay);
   }
   $SuffixName  = "." . $SuffixName;
} else {
   $SuffixName = "";
}

$NewName = "${PrefixName}${SuffixName}";


&DoMsg ("Parent: $ParentSnapName");
&DoMsg ("Prefix: $PrefixName");
&DoMsg ("Suffix: $SuffixName");
&DoMsg ("New Name: $NewName");


$MountPoint = $baseACFS;
$SnapCreate = "$acfsutil snap create -w -p $ParentSnapName $NewName $MountPoint";
&DoMsg ("Create Command: $SnapCreate ");


my $cmd = "$acfsutil snap info $NewName $MountPoint";
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("", <CMD>));
close CMD;
if ( $? != 0 ) {
   &DoMsg ("Snapshot $NewName does not exist inside mount point $MountPoint. Continuing.");
} else {
   &DoMsg ("Snapshot $NewName already exists inside mount point $MountPoint. Now it will be deleted.");
   $cmd = "$acfsutil snap delete $NewName $MountPoint";
   &DoMsg ($cmd);
   open( CMD, $cmd . " |");
   &DoMsg (join("", <CMD>));
   close CMD;
   if ( $? != 0 ) {
      &DoMsg ("Cannot delete Snapshot $NewName in mount point $MountPoint. Script will exit.");
      exit 1;
   }
}

&DoMsg ("Creating the new snapshot:");
&DoMsg ($SnapCreate);
open( CMD, $SnapCreate . " |");
&DoMsg (join("", <CMD>));
close CMD;
if ( $? != 0 ) {
   &DoMsg ("Cannot create Snapshot $NewName in mount point $MountPoint. Script will exit.");
   exit 1;
} #else {
   #&DoMsg ("Current snapshots:");
   #open( CMD, "$acfsutil snap info $MountPoint |");
   #&DoMsg (join("", <CMD>));
   #close CMD;
#}



#-------------------------------------------------------------------------------
# DoMsg
#
# PURPOSE    : echo with timestamp YYYY-MM-DD_H24:MI:SS
# PARAMS     : $*: the messages
# GLOBAL VARS: none
#-------------------------------------------------------------------------------   
sub DoMsg {

   my $msg = shift;
   my $timestamp = &getTimestamp;
   
   print ("$timestamp $msg\n");
   if (fileno(MAINLOG)) {print MAINLOG "$timestamp $msg\n";}
}


#-------------------------------------------------------------------------------
# getTimestamp
#
# PURPOSE    : returns timestamp in different formats
# PARAMS     : format_parm
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getTimestamp {
   #
   # Format 1:  dd-mm-yyyy_hh24:mi:ss
   # Format 2:  dd.mm.yyyy_hh24miss
   # Format 3:  dd.mm.yyyy
   # Format 4:  hh24:mi:ss
   # Rest:      dd.mm.yyyy hh24:mi:ss  (default)
   #
   my $Parm = shift;
   my $date;
   my $date2;
   my $heure;
   my $heure2;
   my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);

   if ( length($Parm) > 1 ) {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($Parm);
   }
   else {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
   }
   
   $date = (sprintf "%2.0d",($mday)).".".(sprintf "%2.0d",($mon+1)).".".($year+1900);
   $date =~ s/ /0/g;
   $date2 = (sprintf "%2.0d",($mday))."-".(sprintf "%2.0d",($mon+1))."-".($year+1900);
   $date2 =~ s/ /0/g;
   $heure = (sprintf "%2.0d",($hour)).":".(sprintf "%2.0d",($min)).":".(sprintf "%2.0d",($sec));
   $heure =~ s/ /0/g;
   $heure2 = (sprintf "%2.0d",($hour)).(sprintf "%2.0d",($min)).(sprintf "%2.0d",($sec));
   $heure2 =~ s/ /0/g;
   
   if    ($Parm eq "1") { return ($date2."_".$heure) }
   elsif ($Parm eq "2") { return ($date."_".$heure2) }
   elsif ($Parm eq "3") { return ($date) }
   elsif ($Parm eq "4") { return ($heure) }
   else { return ($date." ".$heure) };

}


#-------------------------------------------------------------------------------
# getWeekDay
#
# PURPOSE    : returns weekday (Sun - Sat)
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getWeekDay{
   my @date = split(" ", localtime(time));
   my $day = $date[0];
   return ($day);
}


#-------------------------------------------------------------------------------
# Usage
#
# PURPOSE    : print the Usage
# PARAMS     : none
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub Usage {

   print <<EOF
   
Usage:  $basename -b <base>  [Optional Arguments]
          -p <parent> : name of the parent snapshot
       
           Optional Arguments:
          -n <prefix_name> : prefix of the new snapshot name (defaults to the parent snapshot name)
          -s <suffix>      : use "weekday" to have the day name as suffix (Sun - Sat)


 e.g. snap_acfs.pl -p stout -n stout  -s "weekday"
      will clone from /u02/acfs/.ACFS/snaps/stout
                   to /u02/acfs/.ACFS/snaps/stout.Tue (or whatever the day is)
      
 e.g. snap_acfs.pl -n stout -p stout.Mon 
      will clone from /u02/acfs/.ACFS/snaps/stout.Mon
                   to /u02/acfs/.ACFS/snaps/stout
      
 e.g. snap_acfs.pl -n stout2 -p stout
      will clone from /u02/acfs/.ACFS/snaps/stout
                   to /u02/acfs/.ACFS/snaps/stout2
           
  EXISTING SNAPSHOT WILL BE DROPPED!!
EOF

}


snap_database.pl

#!/u01/app/oracle/tvdtoolbox/tvdperl-Linux-x86-64-02.04.00-05.08.04/bin/tvd_perl
#
# Purpose..........: Create a new snapshot of a standby database by apply-off, backup controlfile to trace, copy init, acfs snap, apply-on
# 
# snap_database.pl 
#        -b <base>
#        -n <name>   : prefix of the snapshot
#        -s <suffix> : optional, use "weekday" to have the day name as suffix (Sun - Sat)
#
# e.g. snap_database.pl -b stout -n stout_save  -s "weekday"
#      will clone from /u02/acfs/.ACFS/snaps/stout
#                   to /u02/acfs/.ACFS/snaps/stout_save.Tue (or whatever the day is)
#      
# EXISTING SNAPSHOT WILL BE DROPPED!!
#

#use strict;
use File::Copy;
use Net::SMTP;
use Sys::Hostname;
use Getopt::Std 'getopts';
use File::Basename;
use DBI;
use DBD::Oracle qw(:ora_session_modes);

my $CloneDIR;                             # predefine rootDir variable
BEGIN {
  use FindBin qw($Bin);                   # get the current path of script
  use Cwd 'abs_path';
  $CloneDIR    = abs_path("$Bin/..");     # get the absolute root path to the clone directory
}

my $CloneLOGDir = $CloneDIR."/log";       # LOG Directory
my $baseACFS = "/u02/acfs";
my $basename    = basename($0, ".pl");
my $PrefixName;
my $BaseDB;
my $SuffixName;
my $SnapshotName;
my %opts;
my $dbh;
my $db_create_file_dest;
my $db_unique_name;
my $cmd;
my $syspwd="Vagrant1_";
my $SnapError=0;
my $SnapDir;
my $ControlfileTrace = "control.trc";
my $ORACLE_HOME = "/u01/app/oracle/product/12.2.0.1/dbhome_1";
my $InitName = "init.ora";
my $warnings = 0;

################################################################################
#  Main
################################################################################
my $StartDate = localtime;
&DoMsg ("Start of $basename.pl");
unless ( open (MAINLOG, ">>$CloneLOGDir/$basename.log") ) {
	&DoMsg ("Can't open Main Logfile $CloneLOGDir/$basename.log");
    exit 1;
}

# Process command line arguments
if ( ! @ARGV ) { &Usage; exit 1; }
getopts('b:n:s:', \%opts);

if ($opts{"b"}) {
   $BaseDB = lc($opts{"b"});
} else {
   &DoMsg ("Base DB not given!");
   &Usage;
   exit 1;
}
if ($opts{"n"}) {
   $PrefixName    = lc($opts{"n"});
} else {
   $PrefixName    = "${BaseDB}_save";
}
if ($opts{"s"}) {
   $SuffixName    = lc($opts{"s"});
   if ( $SuffixName eq "weekday" ) {
      $SuffixName    = lc(&getWeekDay);
   }
   $SuffixName  = "." . $SuffixName;
} else {
   $SuffixName = "";
}

$SnapshotName = "${PrefixName}${SuffixName}";


&DoMsg ("Base: $BaseDB");
&DoMsg ("SnapshotName: $SnapshotName");

&ConnectDB ;

### checking that the database is mounted and physical standby

my $DBstatus= &QueryOneValue('select status from v$instance');
unless ( $DBstatus eq "MOUNTED" ) {
   &DoMsg ("Database is not in MOUNTED status, this is unexpected. Exiting.");
   exit 1
}

my $DBrole= &QueryOneValue('SELECT database_role FROM v$database');
unless ( $DBrole eq "PHYSICAL STANDBY" ) {
   &DoMsg ("Database role is not PHYSICAL STANDBY, this is unexpected. Exiting.");
   exit 1
}


$db_create_file_dest= &QueryOneValue(qq{SELECT value FROM v\$parameter2 WHERE name='db_create_file_dest'});
 &DoMsg ("db_create_file_dest: $db_create_file_dest");

$db_unique_name= &QueryOneValue(qq{SELECT value FROM v\$parameter2 WHERE name='db_unique_name'});
 &DoMsg ("db_unique_name: $db_unique_name");

#unless ($dbh->do(qq{ALTER SESSION SYNC WITH PRIMARY}) ) {
#   &DoMsg ("Error in syncing the session with the primary");
#   $warnings++;
#}

$cmd = qq{dgmgrl -echo sys/$syspwd "edit database $db_unique_name set state=\\\"APPLY-OFF\\\";"};
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("",<CMD>));
close (CMD);
my $a=$?;
#if ( $? != 0 ) {
#   &DoMsg ("Error in stopping apply on standby $BaseDB. Exiting.");
#   exit 1
#}


$cmd = $CloneDIR."/bin/snap_acfs.pl -p $BaseDB -n $SnapshotName";
&DoMsg($cmd);
open( CMD, $cmd . " |");
print (join("", <CMD>)); ## only print here as it logs and echoes its time as well
close CMD;
#if ( $? != 0 ) {
#   # track if error in creating the snapshot: we continue and do the apply-on anyway!
#   $SnapError=1;
#}

$SnapDir = $baseACFS . "/.ACFS/snaps/" . $SnapshotName;
$ControlfileTrace = $SnapDir . "/" . $ControlfileTrace;
$InitName = $SnapDir . "/" . $InitName;

unless ($dbh->do(qq{ ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '$ControlfileTrace' REUSE RESETLOGS}) ) {
   &DoMsg ("Error in taking the controlfile trace $ControlfileTrace.");
   $warnings++;
}

unless ($dbh->do(qq{ CREATE PFILE='$InitName' FROM SPFILE }) ) {
   &DoMsg ("Error in creating the pfile $InitName.");
   $warnings++;
}

$cmd = qq{dgmgrl -echo sys/$syspwd "edit database $db_unique_name set state=\"APPLY-ON\""};
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("", <CMD>));
close CMD;
#if ( $? != 0 ) {
#   &DoMsg ("Error in starting apply on standby $BaseDB. MANUAL INTERVENTION REQUIRED");
#   exit 1
#}

if ( $SnapError == 1 ) {
	&DoMsg ("There was an error in creating the snapshot. Exiting.");
        exit 1;
}



if ( $warnings != 0 ) {
   &DoMsg("There have been some warnings, but the procedure completed.");
} else {
   &DoMsg("The procedure completed successfully.");
}

&DisconnectDB ;


#-------------------------------------------------------------------------------
# DoMsg
#
# PURPOSE    : echo with timestamp YYYY-MM-DD_H24:MI:SS
# PARAMS     : $*: the messages
# GLOBAL VARS: none
#-------------------------------------------------------------------------------   
sub DoMsg {

   my $msg = shift;
   my $timestamp = &getTimestamp;
   
   print ("$timestamp $msg\n");
   if (fileno(MAINLOG)) {print MAINLOG "$timestamp $msg\n";}
}


#-------------------------------------------------------------------------------
# getTimestamp
#
# PURPOSE    : returns timestamp in different formats
# PARAMS     : format_parm
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getTimestamp {
   #
   # Format 1:  dd-mm-yyyy_hh24:mi:ss
   # Format 2:  dd.mm.yyyy_hh24miss
   # Format 3:  dd.mm.yyyy
   # Format 4:  hh24:mi:ss
   # Rest:      dd.mm.yyyy hh24:mi:ss  (default)
   #
   my $Parm = shift;
   my $date;
   my $date2;
   my $heure;
   my $heure2;
   my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);

   if ( length($Parm) > 1 ) {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($Parm);
   }
   else {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
   }
   
   $date = (sprintf "%2.0d",($mday)).".".(sprintf "%2.0d",($mon+1)).".".($year+1900);
   $date =~ s/ /0/g;
   $date2 = (sprintf "%2.0d",($mday))."-".(sprintf "%2.0d",($mon+1))."-".($year+1900);
   $date2 =~ s/ /0/g;
   $heure = (sprintf "%2.0d",($hour)).":".(sprintf "%2.0d",($min)).":".(sprintf "%2.0d",($sec));
   $heure =~ s/ /0/g;
   $heure2 = (sprintf "%2.0d",($hour)).(sprintf "%2.0d",($min)).(sprintf "%2.0d",($sec));
   $heure2 =~ s/ /0/g;
   
   if    ($Parm eq "1") { return ($date2."_".$heure) }
   elsif ($Parm eq "2") { return ($date."_".$heure2) }
   elsif ($Parm eq "3") { return ($date) }
   elsif ($Parm eq "4") { return ($heure) }
   else { return ($date." ".$heure) };

}


#-------------------------------------------------------------------------------
# getWeekDay
#
# PURPOSE    : returns weekday (Sun - Sat)
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getWeekDay{
   my @date = split(" ", localtime(time));
   my $day = $date[0];
   return ($day);
}



#-------------------------------------------------------------------------------
# Usage
#
# PURPOSE    : print the Usage
# PARAMS     : none
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub Usage {

   print <<EOF
   
Usage:  $basename -b <base>  [Optional Arguments]
           -b <base>       : name of the base database
       
        Purpose:
          Create a new snapshot of a standby database by apply-off, acfs snap, backup controlfile to trace, copy init, apply-on.

        Optional Arguments:
          -n <prefix_name> : prefix of the new snapshot name
          -s <suffix>      : use "weekday" to have the day name as suffix (Sun - Sat)

        examples:
            snap_database.pl -b stout -n stout.18h  -s "weekday"
            will clone from /u02/acfs/.ACFS/snaps/stout
                         to /u02/acfs/.ACFS/snaps/stout.18h.Tue (or whatever the day is)

      
            $basename -b stout -s "weekday"
            will clone from /u02/acfs/.ACFS/snaps/stout
                         to /u02/acfs/.ACFS/snaps/stout_save.Wed  (or whatever the day is)
      
  EXISTING SNAPSHOT WILL BE DROPPED!!

EOF

}


sub ConnectDB {

   # DB connection #
   $ENV{ORACLE_SID}=$BaseDB;
   $ENV{ORACLE_HOME}=$ORACLE_HOME;
   delete $ENV{TWO_TASK};

   &DoMsg ("Connecting to DB $BaseDB");
   unless ($dbh = DBI->connect('dbi:Oracle:', "sys", $syspwd, {PrintError=>0, AutoCommit => 0, ora_session_mode => ORA_SYSDBA}))  {
      &DoMsg ("Error connecting to DB: ". $DBI::errstr);
      exit(1);
   }

   #&DoMsg ("Connected to DB $BaseDB");

}

sub QueryOneValue {

   my $sth;
   my $query = shift;

   unless ($sth = $dbh->prepare ($query)) {
      &DoMsg ("Error preparing statement $query: ".$dbh->errstr);
   }
   $sth->execute;
   my ($result) = $sth->fetchrow_array;

   return $result;
}

sub DisconnectDB {
   $dbh->disconnect;
}

clone_from_snap.pl

#!/u01/app/oracle/tvdtoolbox/tvdperl-Linux-x86-64-02.04.00-05.08.04/bin/tvd_perl

use File::Copy;
use File::Path qw(mkpath rmtree);
use Net::SMTP;
use Sys::Hostname;
use Getopt::Std 'getopts';
use File::Basename;
use DBI;
use DBD::Oracle qw(:ora_session_modes);

my $CloneDIR;                             # predefine rootDir variable
BEGIN {
  use FindBin qw($Bin);                   # get the current path of script
  use Cwd 'abs_path';
  $CloneDIR    = abs_path("$Bin/..");     # get the absolute root path to the clone directory
}

my $CloneLOGDir = $CloneDIR."/log";       # LOG Directory
my $baseACFS = "/u02/acfs";
my $basename    = basename($0, ".pl");
my $BaseDB;
my $SnapshotName;
my $DestDB;
my $DestPath; # contains the final snapshot destination
my $oraenv = '/usr/local/bin/oraenv';
my $crsctl = '/u01/app/grid/12.2.0.1/bin/crsctl';
my $ORACLE_HOME = '/u01/app/oracle/product/12.2.0.1/dbhome_1';
my %opts;
my $dbh;
my $db_create_file_dest;
my $db_unique_name;
my $cmd;
my $SnapError=0;
my $SnapDir;
my $ControlfileTrace = "control.trc";
my $InitName = "init.ora";
my $warnings = 0;
my $foo;
my $dbUniqueName;

################################################################################
#  Main
################################################################################
my $StartDate = localtime;
&DoMsg ("Start of $basename.pl");
unless ( open (MAINLOG, ">>$CloneLOGDir/$basename.log") ) {
	&DoMsg ("Can't open Main Logfile $CloneLOGDir/$basename.log");
    exit 1;
}

# b: base db
# u: source database db_unique_name. if empty, will try to get it dynamically
# s: snapshot name
# d: destination name

# Process command line arguments
if ( ! @ARGV ) { &Usage; exit 1; }
getopts('b:s:d:u:', \%opts);

if ($opts{"b"}) {
   $BaseDB = $opts{"b"};
} else {
   &DoMsg ("Base DB not given!");
   &Usage;
   exit 1;
}
if ($opts{"s"}) {
   $SnapshotName = $opts{"s"};
} else {
   &DoMsg ("Snapshot Name not given!");
   &ListSnapshots;
   exit 1;
}
if ($opts{"d"}) {
   $DestDB = $opts{"d"};
} else {
   &DoMsg ("Dest DB not given!");
   &Usage;
   exit 1;
}


if ($opts{"u"}) {
   $dbUniqueName = $opts{"u"};
} else {
   &DoMsg ("db_unique_name not given, try to get it dynamically");
   
   &ConnectDB ;
   $dbUniqueName= &QueryOneValue(qq{SELECT value FROM v\$parameter2 WHERE name='db_unique_name'});
   &DisconnectDB ;
}

# show the parameters
&DoMsg ("Base: $BaseDB");
&DoMsg ("SnapshotName: $SnapshotName");
&DoMsg ("Dest: $DestDB");
&DoMsg ("db_unique_name: $dbUniqueName");


# try to get the ORACLE_HOME of the resource
my $cmd = "$crsctl status resource ora.".$DestDB.".db -f";
&DoMsg ($cmd);
open( CMD, $cmd . " |");
my @output = <CMD>;
close CMD;
#if ( $? != 0 ) {
#   &DoMsg ("Destination database does not exist, please configure it with srvctl");
#   exit 1;
#} 
foreach (@output) {
   chomp($_);
   if ($_ =~ /^ORACLE_HOME=/) {
      ($foo, $ORACLE_HOME) = split (/=/);
      $ENV{ORACLE_HOME}=$ORACLE_HOME;
      &DoMsg ("OH = $ORACLE_HOME");
   }
} 

# try to get the status of the resource using srvctl
my $cmd = "$ORACLE_HOME/bin/srvctl status database -d $DestDB";
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("", <CMD>));
close CMD;
#if ( $? != 0 ) {
#   &DoMsg ("Destination database does not exist, please configure it");
#   exit 1;
#} 

# try to stop the dest db (will ignore errors)
my $cmd = "$ORACLE_HOME/bin/srvctl stop database -d $DestDB -o abort -f";
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("", <CMD>));
close CMD;


# drop/recreate the snapshot using snap_acfs.pl
$cmd = "tvd_perl ".$CloneDIR."/bin/snap_acfs.pl -p $SnapshotName -n $DestDB";
&DoMsg($cmd);
open( CMD, $cmd . " |");
print (join("", <CMD>)); ## only print here as it logs and echoes its time as well
close CMD;
#if ( $? != 0 ) {
#   &DoMsg("Error creating the new snapshot for $DestDB. Exiting.");
#   exit(1);
#}

$DestPath = $baseACFS . '/.ACFS/snaps/' . $DestDB;
$ControlfileTrace = $DestPath.'/'.$ControlfileTrace;
$InitName = $DestPath.'/'.$InitName;

&DoMsg("Control file trace: $ControlfileTrace");
&DoMsg("Init file: $InitName");

### remove old archives, redo_logs and control files!
rmtree($baseACFS . '/fra/' . $DestDB , 1, 1 );
mkpath($baseACFS . '/fra/' . $DestDB );

## HERE WE HAVE THE CONTROL AND INIT READY TO BE MODIFIED

open(FILE, "<$ControlfileTrace");
my @ControlLines = <FILE>;
close(FILE);

# sed controlfile
my @NewControlLines;
push(@NewControlLines,"SET ECHO ON;\n");
push(@NewControlLines,"WHENEVER SQLERROR EXIT FAILURE;\n");
push(@NewControlLines,"CREATE SPFILE FROM PFILE='$InitName';\n");

foreach(@ControlLines) {
   # change the snapshot name in the paths
   $_ =~ s/u02\/$BaseDB/u02\/$DestDB/gi;
   # change the db_unique_name in the REDO paths
   $_ =~ s/fra\/$dbUniqueName/fra\/$DestDB/gi;


   # change the dbname in the create controlfile line
   $_ =~ s/CREATE CONTROLFILE.*$/CREATE CONTROLFILE REUSE SET DATABASE "$DestDB" RESETLOGS NOARCHIVELOG/;
   # everything after and including "recover database" can be skipped
   if ($_ =~ /^RECOVER DATABASE /) {
      last;
   }
   print ($_);
   push(@NewControlLines, $_);
}
push(@NewControlLines,"ALTER DATABASE OPEN RESETLOGS;\n");
push(@NewControlLines,"ALTER TABLESPACE TEMP ADD TEMPFILE SIZE 1G;\n");
push(@NewControlLines,"SELECT status FROM v\$instance;\n");
push(@NewControlLines,"QUIT;\n");

# write the new controlfile:
open(FILE, ">$ControlfileTrace");
print FILE @NewControlLines;
close(FILE);

# delete old controlfile
# no more necessary, deleted above  unlink ($DestPath.'/control01.ctl');

# sed init file
open(FILE, "<$InitName");
my @InitLines = <FILE>;
close(FILE);

@InitLines = grep(!/^$BaseDB/i, @InitLines);
@InitLines = grep(!/^\*\.db_name/, @InitLines);
@InitLines = grep(!/^\*\.db_unique_name/, @InitLines);
@InitLines = grep(!/^\*\.dispatchers/, @InitLines);
@InitLines = grep(!/^\*\.audit_file_dest/, @InitLines);
@InitLines = grep(!/^\*\.fal_server/, @InitLines);
@InitLines = grep(!/^\*\.fal_client/, @InitLines);
@InitLines = grep(!/^\*\.log_archive_config/, @InitLines);
@InitLines = grep(!/^\*\.log_archive_dest/, @InitLines);
@InitLines = grep(!/^\*\.memory_target/, @InitLines);
@InitLines = grep(!/^\*\.sga_target/, @InitLines);
@InitLines = grep(!/^\*\.pga_aggregate_target/, @InitLines);
@InitLines = grep(!/^\*\.service_names/, @InitLines);
@InitLines = grep(!/^\*\.dg_broker_start/, @InitLines);

my @NewInitLines;
foreach(@InitLines ) {
   # change only the snapshot name in the paths
   $_ =~ s/u02\/$BaseDB/u02\/$DestDB/gi;
   $_ =~ s/fra\/$dbUniqueName/fra\/$DestDB/gi;
   print ($_);
   push(@NewInitLines, $_);
}   

push(@NewInitLines, "*.db_name='$DestDB'\n");
push(@NewInitLines, "*.db_unique_name='$DestDB'\n");
push(@NewInitLines, "*.dispatchers='(PROTOCOL=TCP)(SERVICE=${DestDB}XDB)'\n");
push(@NewInitLines, "*.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST'\n");
push(@NewInitLines, "*.sga_target=1G\n");
push(@NewInitLines, "*.pga_aggregate_target=100M\n");
push(@NewInitLines, "*.service_names='$DestDB'\n");
#push(@NewInitLines, "*.\n");

# write the new init file
open(FILE, ">$InitName");
print FILE @NewInitLines;
close(FILE);

$ENV{ORACLE_SID}=$DestDB;
$cmd = "$ORACLE_HOME/bin/sqlplus / as sysdba \@$ControlfileTrace";
&DoMsg($cmd);
open( CMD, $cmd . " |");
print (join("", <CMD>)); ## only print here as it logs and echoes its time as well
close CMD;
#if ( $? != 0 ) {
#   &DoMsg("Error creating the new snapshot for $DestDB. Exiting.");
#   exit(1);
#}

&DoMsg("New database snapshot $DestDB created successfully!");
&DoMsg("Starting using srvctl:");

my $cmd = "$ORACLE_HOME/bin/srvctl start database -d $DestDB";
&DoMsg ($cmd);
open( CMD, $cmd . " |");
&DoMsg (join("", <CMD>));
close CMD;
#if ( $? != 0 ) {
#   &DoMsg ("Destination database cannot be started using srvctl");
#   exit 1;
#} 

# 

#-------------------------------------------------------------------------------
# DoMsg
#
# PURPOSE    : echo with timestamp YYYY-MM-DD_H24:MI:SS
# PARAMS     : $*: the messages
# GLOBAL VARS: none
#-------------------------------------------------------------------------------   
sub DoMsg {

   my $msg = shift;
   my $timestamp = &getTimestamp;
   
   print ("$timestamp $msg\n");
   if (fileno(MAINLOG)) {print MAINLOG "$timestamp $msg\n";}
}


#-------------------------------------------------------------------------------
# getTimestamp
#
# PURPOSE    : returns timestamp in different formats
# PARAMS     : format_parm
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getTimestamp {
   #
   # Format 1:  dd-mm-yyyy_hh24:mi:ss
   # Format 2:  dd.mm.yyyy_hh24miss
   # Format 3:  dd.mm.yyyy
   # Format 4:  hh24:mi:ss
   # Rest:      dd.mm.yyyy hh24:mi:ss  (default)
   #
   my $Parm = shift;
   my $date;
   my $date2;
   my $heure;
   my $heure2;
   my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);

   if ( length($Parm) > 1 ) {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($Parm);
   }
   else {
      ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
   }
   
   $date = (sprintf "%2.0d",($mday)).".".(sprintf "%2.0d",($mon+1)).".".($year+1900);
   $date =~ s/ /0/g;
   $date2 = (sprintf "%2.0d",($mday))."-".(sprintf "%2.0d",($mon+1))."-".($year+1900);
   $date2 =~ s/ /0/g;
   $heure = (sprintf "%2.0d",($hour)).":".(sprintf "%2.0d",($min)).":".(sprintf "%2.0d",($sec));
   $heure =~ s/ /0/g;
   $heure2 = (sprintf "%2.0d",($hour)).(sprintf "%2.0d",($min)).(sprintf "%2.0d",($sec));
   $heure2 =~ s/ /0/g;
   
   if    ($Parm eq "1") { return ($date2."_".$heure) }
   elsif ($Parm eq "2") { return ($date."_".$heure2) }
   elsif ($Parm eq "3") { return ($date) }
   elsif ($Parm eq "4") { return ($heure) }
   else { return ($date." ".$heure) };

}


#-------------------------------------------------------------------------------
# getWeekDay
#
# PURPOSE    : returns weekday (Sun - Sat)
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub getWeekDay{
   my @date = split(" ", localtime(time));
   my $day = $date[0];
   return ($day);
}


#-------------------------------------------------------------------------------
# callSQLPLUS
#
# PURPOSE    : calls the rman utility
# PARAMS     : rman script name
# GLOBAL VARS: ReturnStatus, LogFile
#-------------------------------------------------------------------------------
#sub callSQLPLUS {
#    my $script = shift;
#	open( SQL, "$ORACLE_HOME/bin/sqlplus /nolog  \@$script |");  
#    &DoMsg (join("", <SQL>));
#    if ( $? != 0 ) { $rc = 1; } # RC if last call create an error
#    close SQL;
#}



#-------------------------------------------------------------------------------
# Usage
#
# PURPOSE    : print the Usage
# PARAMS     : none
# GLOBAL VARS: none
#-------------------------------------------------------------------------------
sub Usage {

   print <<EOF
   
Usage:  $basename -b <base>  [Optional Arguments]
           -b <base>       : db_name of the source database 
           -d <base>       : name of the destination database
           -s <snapshot>   : name of the snapshot to be used

        Purpose:
          Clone a new database from an existing ACFS snapshot: drop and recreate the destination snapshot, adapt the init and controlfile scripts, re-create the controlfile, open the clone with RESETLOGS and start it with srvctl.


        Optional Arguments:
           -u <db_unique_name>   : name of the db_unique_name of the source database. if not specified, it will be taken from the source db, but it must be mounted!
                                   this parameter is used only for pattern replacement inside control file trace and init file.

        examples:
            $basename -b stout -s stout_save.Wed -d poug2648
            will clone stout from snapshot $baseACFS/.ACFS/snaps/stout_save.Wed to poug2648 
      
  THE EXISTING DESTINATION DATABASE SNAPSHOT WILL BE DROPPED!!
EOF

}


sub ConnectDB {

   # DB connection #
   $ENV{ORACLE_HOME}=$ORACLE_HOME;
   $ENV{ORACLE_SID}=$BaseDB;
   delete $ENV{TWO_TASK};

   &DoMsg ("Connecting to DB $BaseDB");
   &DoMsg ("OH: $ORACLE_HOME");
   &DoMsg ("SID: $BaseDB");
   unless ($dbh = DBI->connect('dbi:Oracle:', "sys", "Vagrant1_", {PrintError=>0, AutoCommit => 0, ora_session_mode => ORA_SYSDBA}))  {
      &DoMsg ("Error connecting to DB: ". $DBI::errstr);
      exit(1);
   }

   #&DoMsg ("Connected to DB $BaseDB");

}

sub QueryOneValue {

   my $sth;
   my $query = shift;

   unless ($sth = $dbh->prepare ($query)) {
      &DoMsg ("Error preparing statement $query: ".$dbh->errstr);
   }
   $sth->execute;
   my ($result) = $sth->fetchrow_array;

   return $result;
}

sub DisconnectDB {
   $dbh->disconnect;
}

Cheers

Ludovico

My own Dbvisit Replicate integration with Grid Infrastructure


I am helping my customer with a PoC of Dbvisit Replicate as a logical replication tool. I will not discuss (at least, not in this post) the capabilities of the tool itself, its configuration or the caveats that you should beware of when you do logical replication. Instead, I will concentrate on how we will likely integrate it into the current environment.

My role in this PoC is to make sure that the tool will be easy to operate from the operational point of view, and the database operations, here, are supported by Oracle Grid Infrastructure and cold failover clusters.

Note: there are official Dbvisit online resources about how to configure Dbvisit Replicate in a cluster. I aim to complement that information, not copy it.

Quick overview

If you know Dbvisit replicate, skip this paragraph.

There are three main components of Dbvisit Replicate: the FETCHER, the MINE and the APPLY processes. The FETCHER gets the redo stream from the source and sends it to the MINE process. The MINE process parses the redo stream and converts it into proprietary transaction log files (named plogs). The APPLY process gets the plog files and applies the transactions to the destination database.

From an architectural point of view, MINE and APPLY do not need to run close to the databases that are part of the configuration. The FETCHER process, by contrast, needs to be local to the source database online log files (and archived logs).

Because the MINE process is the most resource intensive, it is not convenient to run it where the databases reside, as it might consume precious CPU resources that are licensed for Oracle Database. So, first step in this PoC: the FETCHER processes will run on the cluster, while MINE and APPLY will run on a dedicated Virtual Machine.

(Figure: dbvisit_gi_overview, an overview of the Dbvisit Replicate processes across the cluster and the dedicated VM)

Clustering considerations

  • the FETCHER does NOT need to run on the server of the source database: having access to the online logs through the ASM instance is enough
  • to avoid SPoF, the fetcher should be a cluster resource that can relocate without problems
  • to simplify the configuration, the FETCHER configuration and the Dbvisit binaries should be on a shared filesystem (the FETCHER does not persist any data, just the logs)
  • the destination database might be literally anywhere: the APPLY connects via SQL*Net, so a correct name resolution and routing to the destination database are enough

so the implementation steps are:

  1. create a shared filesystem
  2. install dbvisit in the shared filesystem
  3. create the Dbvisit Replicate configuration on the dedicated VM
  4. copy the configuration files on the cluster
  5. prepare an action script
  6. configure the resource
  7. test!

Convention over configuration: the importance of a strong naming convention

Before starting the implementation, I decided to put all the caveats related to the FETCHER resource relocation on paper:

  • Where will the configuration files reside? Dbvisit has an important variable: the Configuration Name. All the operations are done by passing a configuration file named /{PATH}/{CONFIG_NAME}/{CONFIG_NAME}-{PROCESS_TYPE}.ddc to the dbvrep binary. So, I decided to put ALL the configuration directories under the same path: given the Configuration Name, I will always be able to get the configuration file path.
  • How will the configuration files relocate from one node to the other? Easy here: they won’t. I will use an ACFS filesystem
  • How can I link the cluster resource with its configuration name? Easy again: I call my resources dbvrep.CONFIGNAME.PROCESS_TYPE. e.g. dbvrep.FROM_A_TO_B.fetcher
  • How will I manage the need to use a new version of dbvisit in the future? Old and new versions must coexist: Instead of using external configuration files, I will just use a custom resource attribute named DBVREP_HOME inside my resource type definition. (see later)
  • What port number should I use? Of course, many fetchers started on different servers should not have conflicts. This is something that might be either planned or made dynamic. I will opt for the first one. But instead of getting the port number inside the Dbvisit configuration, I will use a custom resource attribute: DBVREP_PORT.

Considerations on the FETCHER listen address

This requires a dedicated paragraph. The Dbvisit documentation suggests creating a VIP, binding on the VIP address and creating a dependency between the FETCHER resource and the VIP. This is where my configuration differs.

Having a separate VIP per FETCHER resource might, potentially, lead to dozens of VIPs in the cluster. Everything will depend on the success of the PoC and on how many internal clients will decide to ask for such an implementation. Many VIPs == many interactions with network admins for address reservation, DNS configurations, etc. Long story short, it might slow down the creation and maintenance of new configurations.

Instead, each FETCHER will listen to the local server address, and the action script will take care of:

  • getting the current host name
  • getting the current ASM instance
  • changing the settings of the specific Dbvisit Replicate configuration (ASM instance and FETCHER listen address)
  • starting the FETCHER

Implementation

Now that all the caveats and steps are clear, I can show how I implemented it:

Create a shared filesystem

asmcmd volcreate -G ACFS -s 10G dbvisit --column 1
/sbin/mkfs -t acfs /dev/asm/dbvisit-293
sudo /u01/app/grid/product/12.1.0.2/grid/bin/srvctl add filesystem -d /dev/asm/dbvisit-293 -m /u02/data/oracle/dbvisit -u oracle -fstype ACFS -autostart ALWAYS
srvctl start filesystem -d /dev/asm/dbvisit-293
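A quick sanity check that the new filesystem is registered and mounted (not part of the procedure, just what I usually run afterwards):

srvctl status filesystem -d /dev/asm/dbvisit-293
/usr/sbin/acfsutil info fs /u02/data/oracle/dbvisit
df -h /u02/data/oracle/dbvisit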

Install dbvisit in the shared filesystem

out of scope!

Create the Dbvisit Replicate configuration on the dedicated VM

out of scope!

Copy the configuration files from the Dbvisit VM to the cluster

scp /u02/data/oracle/dbvisit/FROM_A_TO_B/FROM_A_TO_B-FETCHER.ddc \ 
 cluster-scan:/u02/data/oracle/dbvisit/FROM_A_TO_B

Prepare an action script

$ cat dbvrep.ksh
#!/bin/ksh
########################################
# Name   : dbvrep.ksh
# Author : Ludovico Caldara, Trivadis AG

# the DBVISIT FETCHER process needs to know 2 attributes: DBVREP_HOME and DBVREP_PORT.
# If you want to call the action script directly set:
# _CRS_NAME=<resource name in format dbvrep.CONFIGNAME.fetcher>
# _CRS_DBVREP_HOME=<dbvrep installation path>
# _CRS_DBVREP_PORT=<listening port>

DBVREP_RES_NAME=${_CRS_NAME}
DBVREP_CONFIG_NAME=`echo $DBVREP_RES_NAME | awk -F. '{print $2}'`

# MINE, FETCHER or APPLY?
DBVREP_PROCESS_TYPE=`echo $DBVREP_RES_NAME | awk -F. '{print toupper($3)}'`

DBVREP_HOME=${_CRS_DBVREP_HOME}
DBVREP=${DBVREP_HOME}/dbvrep
DBVREP_PORT=${_CRS_DBVREP_PORT}
DBVREP_CONFIG_PATH=/u02/data/oracle/dbvisit

DBVREP_CONFIG_FILE=${DBVREP_CONFIG_PATH}/${DBVREP_CONFIG_NAME}/${DBVREP_CONFIG_NAME}-${DBVREP_PROCESS_TYPE}.ddc

function F_verify_dbvrep_up {
        ps -eaf | grep "[d]bvrep ${DBVREP_PROCESS_TYPE} $DBVREP_CONFIG_NAME" > /dev/null
        if [ $? -eq 0 ] ; then
                echo "OK"
        else
                echo "KO"
                exit 1
        fi
}

ACTION="${1}"
case "$ACTION" in

        'start')
        LOCAL_ASM="+"`ps -eaf | grep [a]sm_pmon | awk -F+ '{print $NF}'`;

        if [ "${DBVREP_PROCESS_TYPE}" == "FETCHER" ] ; then
                $DBVREP --daemon --ddcfile ${DBVREP_CONFIG_FILE} --silent <<EOF
set FETCHER.FETCHER_REMOTE_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.FETCHER_LISTEN_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.MINE_ASM=${LOCAL_ASM}
start FETCHER
EOF
        fi
;;

        'stop')
        $DBVREP --daemon --ddcfile ${DBVREP_CONFIG_FILE} shutdown ${DBVREP_PROCESS_TYPE}

;;

        'check')
        F_verify_dbvrep_up
;;

        'clean')
        sleep 1
        exit 0
;;

        *)
        echo "Usage: $0 {start|stop|check|clean}"
        exit 1
;;

esac

Configure the resource

$ cat dbvrep.type
ATTRIBUTE=ACTION_SCRIPT
DEFAULT_VALUE=/path_to_action_script/dbvrep.ksh
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=SCRIPT_TIMEOUT
DEFAULT_VALUE=120
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=DBVREP_PORT
DEFAULT_VALUE=
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=DBVREP_HOME
DEFAULT_VALUE=/u02/data/oracle/dbvisit/replicate
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=SERVER_POOLS
DEFAULT_VALUE=*
TYPE=STRING
FLAGS=CONFIG|HOTMOD

ATTRIBUTE=START_DEPENDENCIES
DEFAULT_VALUE=hard() weak(type:ora.listener.type,global:type:ora.scan_listener.type) pullup()
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=STOP_DEPENDENCIES
DEFAULT_VALUE=hard()
TYPE=STRING
FLAGS=CONFIG


ATTRIBUTE=RESTART_ATTEMPTS
DEFAULT_VALUE=2
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=CHECK_INTERVAL
DEFAULT_VALUE=60
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=FAILURE_THRESHOLD
DEFAULT_VALUE=2
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=UPTIME_THRESHOLD
DEFAULT_VALUE=8h
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=FAILURE_INTERVAL
DEFAULT_VALUE=3600
TYPE=INT
FLAGS=CONFIG

$ crsctl add type dbvrep.type -basetype cluster_resource -file dbvrep.type
$ crsctl add resource dbvrep.FROM_A_TO_B.fetcher -type dbvrep.type \
  -attr "START_DEPENDENCIES=hard(db.source) pullup:always(db.source),STOP_DEPENDENCIES=hard(db.source),DBVREP_PORT=7901"

Test!


$ crsctl start res dbvrep.FROM_A_TO_B.fetcher
CRS-2672: Attempting to start 'dbvrep.FROM_A_TO_B.fetcher' on 'server1'
CRS-2676: Start of 'dbvrep.FROM_A_TO_B.fetcher' on 'server1' succeeded

..in the logs..
2017-10-30 15:24:34.992478 :    AGFW:1127589632: {1:30181:30166} Agent received the message: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912
2017-10-30 15:24:34.992512 :    AGFW:1127589632: {1:30181:30166} Preparing START command for: dbvrep.FROM_A_TO_B.fetcher 1 1
2017-10-30 15:24:34.992521 :    AGFW:1127589632: {1:30181:30166} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: OFFLINE to: STARTING
2017-10-30 15:24:34.993195 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Executing action script: dbvrep.ksh[start]
2017-10-30 15:24:41.254703 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable FETCHER_REMOTE_INTERFACE set to server1:7901 for process
2017-10-30 15:24:41.254726 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] FETCHER.
2017-10-30 15:24:41.354916 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable FETCHER_LISTEN_INTERFACE set to server1:7901 for process
2017-10-30 15:24:41.354935 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] FETCHER.
2017-10-30 15:24:41.405052 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable MINE_ASM set to +ASM1 for process FETCHER.
2017-10-30 15:24:41.605423 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Starting process FETCHER...started
2017-10-30 15:24:41.655660 :    AGFW:1106577152: {1:30181:30166} Command: start for resource: dbvrep.FROM_A_TO_B.fetcher 1 1 completed with status: SUCCESS
2017-10-30 15:24:41.656100 :CLSDYNAM:1081362176: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [check] Executing action script: dbvrep.ksh[check]
2017-10-30 15:24:41.658242 :    AGFW:1127589632: {1:30181:30166} Agent sending reply for: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912
2017-10-30 15:24:41.908256 :CLSDYNAM:1081362176: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [check] OK
2017-10-30 15:24:41.908440 :    AGFW:1127589632: {1:30181:30166} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: STARTING to: ONLINE
2017-10-30 15:24:41.908486 :    AGFW:1127589632: {1:30181:30166} Started implicit monitor for [dbvrep.FROM_A_TO_B.fetcher 1 1] interval=60000 delay=60000
2017-10-30 15:24:41.908696 :    AGFW:1127589632: {1:30181:30166} Agent sending last reply for: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912


$ crsctl stop res dbvrep.FROM_A_TO_B.fetcher
CRS-2673: Attempting to stop 'dbvrep.FROM_A_TO_B.fetcher' on 'server1'
CRS-2677: Stop of 'dbvrep.FROM_A_TO_B.fetcher' on 'server1' succeeded

..in the logs..
2017-10-30 15:22:14.891730 :    AGFW:1127589632: {1:30181:30156} Agent received the message: RESOURCE_STOP[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4099:5175818
2017-10-30 15:22:14.891762 :    AGFW:1127589632: {1:30181:30156} Preparing STOP command for: dbvrep.FROM_A_TO_B.fetcher 1 1
2017-10-30 15:22:14.891772 :    AGFW:1127589632: {1:30181:30156} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: ONLINE to: STOPPING
2017-10-30 15:22:14.892400 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Executing action script: dbvrep.ksh[stop]
2017-10-30 15:22:20.957375 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] DDC loaded from database (458 variables).
2017-10-30 15:22:21.007939 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Dbvisit Replicate version 2.9.04
2017-10-30 15:22:21.007963 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Copyright (C) Dbvisit Software Limited. All rights reserved.
2017-10-30 15:22:21.007976 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] DDC file
2017-10-30 15:22:21.007994 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] /u02/data/oracle/dbvisit/FROM_A_TO_B/FROM_A_TO_B
2017-10-30 15:22:21.008005 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] -FETCHER.ddc loaded.
2017-10-30 15:22:21.108340 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Dbvisit Replicate FETCHER process shutting down.
2017-10-30 15:22:21.108361 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] OK-0: Completed successfully.
2017-10-30 15:22:45.747531 :    AGFW:1091868416: {1:30181:30156} Command: stop for resource: dbvrep.FROM_A_TO_B.fetcher 1 1 completed with status: SUCCESS
2017-10-30 15:22:45.747898 :    AGFW:1127589632: {1:30181:30156} Agent sending reply for: RESOURCE_STOP[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4099:5175818
2017-10-30 15:22:45.747902 :CLSDYNAM:1123387136: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [check] Executing action script: dbvrep.ksh[check]
2017-10-30 15:22:45.949702 :CLSDYNAM:1123387136: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [check] KO
2017-10-30 15:22:45.949913 :    AGFW:1127589632: {1:30181:30156} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: STOPPING to: OFFLINE
2017-10-30 15:22:45.950014 :    AGFW:1127589632: {1:30181:30156} Agent sending last reply for: RESOURCE_STOP[dbvrep.dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175818

Also the relocation worked as expected: when the settings are modified through:

set FETCHER.FETCHER_REMOTE_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.FETCHER_LISTEN_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.MINE_ASM=${LOCAL_ASM}

The MINE process gets the change dynamically, so there is no need to restart it.
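The relocation itself can be triggered explicitly; something along these lines is enough for a test (server2 is just the assumed second cluster node; depending on the dependencies in place, the -f flag might be required):

$ crsctl relocate resource dbvrep.FROM_A_TO_B.fetcher -n server2 -f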

Last consideration

Adding a hard dependency between the DB and the FETCHER requires stopping the DB with the force option, or always stopping the fetcher before the database. Also, the start of the DB will pull up the FETCHER (pullup:always) and vice versa. We will consider further whether we will keep this dependency or manage it differently (e.g. through the action script).

The hard dependency, declared without the global keyword, will always start the fetcher on the server where the database runs. This is not strictly required, but it might be nice to have the fetcher on the same node. Again, a consideration that we will discuss further.
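For reference, this is roughly how the dependency would look with the global modifier, so that the fetcher would be allowed to run on a node different from the one hosting the database (a sketch; we have not decided yet whether to use it):

$ crsctl modify resource dbvrep.FROM_A_TO_B.fetcher \
  -attr "START_DEPENDENCIES=hard(global:db.source) pullup:always(db.source),STOP_DEPENDENCIES=hard(global:db.source)"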

HTH

Ludovico

The story of ACME and its CRM with serious SQL injections problems


Preface/Disclaimer

This story is real, but I had to mask some names and introduce some minor changes so that real people are not easy to recognize and the whole story does not sound offensive to anyone. This post is not technical, so my non-technical English will be fully exposed. Sorry for the many errors 🙂

ACME, The Company

ACME is a big, global company. It has a huge revenue and there are almost no competitors on the market that get close to it in terms of fame and earnings.

Its core business is heavily supported by its CRM system, that holds all the customers, contracts, prospects, suppliers…

FOOBAR CRM, The CRM system

Although the CRM is not ACME's core business, the data in there is really, really precious. Without prospect and customer data, the sales cannot close their deals.

The CRM application (let’s call it FOOBAR CRM) runs on a legacy architecture and it is as old as the company itself.

The architecture is the "good old style" web application that was common in the early 2000s: browser front-end (OK, you might think that it is not so old, huh?), PHP application backed by Apache, MySQL database.

As you can see, quite old but not so uncommon.

One of the big concerns, as in every application lifecycle, is to maintain good code quality. At the beginning of the PHP era, when PHP was still popular, there was a lack of good frameworks (I'm not even sure there are any now; Zend Framework was a cool MVC framework, but it came out many years later). The result is that now the code maintenance of the application is literally a pain in the a**.

The customer is a noob in development, so when it was founded and needed a CRM system, the management delegated the development to an external company (let's call it FOOBAR).

FOOBAR, The software house

The company FOOBAR is as old as the ACME company. The respective founders were relatives: they started the business together and, now that the founders have left, the partnership is working so well that FOOBAR is also one of the biggest resellers of ACME products (even though its business is only loosely related to ACME's). FOOBAR is at the same time a partner and a customer, and some members of its board are also part of ACME's board.

What is important here, is that the advices coming from the “common board members” are considered much more important than the advices coming from ACME’s employees, customers and marketing department.

The code maintainability

ACME started small, with a small "oldish" CRM system. But some years later ACME experienced a huge increase in customers, product portfolio, employees, revenues, etc.

In order to cope with the increasing workload of the application, they scaled everything up/out: there are now tens of web servers nicely load balanced, some webcache servers, and they introduced Galera cluster in conjunction with some replicated servers to scale out the database workload.

The global business of ACME also required to open the FOOBAR CRM application to the internet, exposing it to a wide range of potential attacks.

In order to cope with the increasing needs, FOOBAR proposed an increasing number of modules, pieces of code and tools to expand the CRM system. To maximize profits, FOOBAR decided to employ only junior developers, inexperienced, not familiar at all with developing applications on top of big RDBMS systems, and with a very scarce sense of secure programming.

That’s not all!

In order to develop new features faster, ACME and FOOBAR have an agreement that lets the end users develop their own modules in PHP and plug them into the application, most of the time directly in production (you may think: that's completely crazy, this should NEVER happen in a serious company! You know what? I agree 100%).

Uh, I forgot to mention: the employees that use the CRM application and have some development skills are VERY, VERY happy to have the permission to code on their own, because they can develop features or fix bugs by themselves, depending on their needs.

Result: the code is completely out of control: few or no unit tests, no integration tests at all, poor security, tons of bugs.

The big SQL Injection problem

Among the many bugs, SQL injection is the most common. It started with some malicious users playing around with injection techniques, but now the attacks are happening more and more frequently:

  • The attacks come from many hackers (not related to each other)
  • Some hackers try to get money out of it, some just steal data, and some just want to mess things up and drag down ACME’s reputation…

Every time an attack is successful, ACME loses more and more contracts (and money).

The fix, up to now, has been to track the hacker’s IP address AFTER the attack and add it to the firewall blacklist (not so clever, huh?).
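To make the problem concrete, here is a hypothetical sketch (not actual FOOBAR CRM code: host, credentials, table and column names are all made up for illustration) of the string-concatenated query style that makes these attacks trivial:

```php
<?php
// Hypothetical sketch, NOT real FOOBAR CRM code: connection details,
// table and column names are invented for illustration only.
$mysqli = new mysqli('db.example.com', 'crm_user', 'secret', 'crm');

// Legacy style: the user-supplied value is pasted straight into the SQL text.
// With ?name=ACME the query is harmless...
$customerName = $_GET['name'];

// ...but with ?name=ACME' OR '1'='1 the quote breaks out of the string
// literal, the WHERE clause matches every row, and the whole table leaks.
$sql = "SELECT id, name, contract_value FROM customers WHERE name = '" . $customerName . "'";

$result = $mysqli->query($sql);
while ($row = $result->fetch_assoc()) {
    echo $row['id'] . ' ' . $row['name'] . "\n";
}
```

Blacklisting the attacker’s IP after the fact does nothing here: the vulnerable code is still in production, ready for the next attacker.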

Possible Solutions (according to the security experts)

ACME hired an external company to do an assessment. The external company proposed a few options:

  • SOLUTION 1: Completely replace the CRM software with something more modern, modular, secure and developed by a company that hires top talent. There are tons of cloud vendors offering CRM software as a service, and other big companies with proven on-premises CRM solutions.
  • SOLUTION 2: Keep the current solution, but with a few caveats:
    • All the code accessing the database must be reviewed to avoid injections (a sketch of what that means is shown after this list)
    • Only experienced developers should have the right to write new code (ideally employees of the software house, who will be accountable for new vulnerabilities)
  • SOLUTION 3: Install content-aware firewalls and IDS that detect SQL injection patterns and block them before they reach the web server and/or the database layer.
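As a minimal sketch of what the code review in SOLUTION 2 would push for (same made-up schema and credentials as above, so again an assumption, not the actual FOOBAR code), here is the same lookup rewritten with a prepared statement and bound parameters:

```php
<?php
// Hypothetical reviewed version: the SQL text is fixed at prepare time and
// the value travels separately as a bound parameter, so user input can
// never change the structure of the statement.
$pdo = new PDO(
    'mysql:host=db.example.com;dbname=crm',
    'crm_user',
    'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$customerName = isset($_GET['name']) ? $_GET['name'] : '';

$stmt = $pdo->prepare(
    'SELECT id, name, contract_value FROM customers WHERE name = :name'
);
$stmt->execute([':name' => $customerName]);

// "ACME' OR '1'='1" is now just a (non-matching) customer name.
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['id'] . ' ' . $row['name'] . "\n";
}
```

The design choice is the whole point of the review: the query structure is decided by the developer, never by the user, no matter how creative the input is.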

What the CRM users think

User ALPHA (the shadow IT guy): “We cannot afford to implement any of the solutions: we, as users, need the agility to develop new things for ourselves! And what if there is a bug? If I have to wait for a fix from the software house, I might lose customers or contracts before the CRM is available again!”

User BRAVO (the skeptical): “SQL injection is a complex problem, you cannot solve it just by fixing the current bugs and revoking the non-developers’ right to write new code”

User CHARLIE (the lawyer): “When I was hired, I was told that I had the right to drink coffee and develop my own modules. I would never work for a company that would not allow me to drink coffee! Drinking coffee and creating vulnerabilities are both rights!”

User DELTA (the average non-sense): “The problem is not the vulnerable code, but all those motherf****** of hackers who try to inject malicious code. We should cure the mental illness of geeks so that they do not turn into hackers.”

User ECHO (the hacker specialist): “If we ask stackoverflow to provide the IP addresses of the people who search for SQL injection code examples, we might preventively block their IP addresses on our external firewall!”

User FOXTROT (the false realist): “Hacker attacks happen, and there’s not much we can do against them. If we fix the code and implement security constraints, there will always be hackers trying to find vulnerabilities. You are missing the real problem! We must cure this geek/hacker insanity first!”

User GOLF (the non-sense paragon): “You concentrate on contracts lost because of SQL injections, but the food in our restaurant sucks, and our sales people also lose contracts because they struggle to fight stomach aches”.

User HOTEL (the denier): “I’ve never seen the logs that show the SQL injections; I am sure it is a plot by the no-code organizations meant to sell us some WYSIWYG products”.

User INDIA (the unheard): “Why can’t we just follow what the Security Experts suggest and see if it fixes the problem?”

What the management thinks

“We send thoughts and prayers to all our sales people, you are not alone and you never will be. (… and thanks for the amazing party, FOOBAR, the wine was delicious!)”

What ACME did to solve the problem

Absolutely nothing.

Forecast

More SQL Injections.

 

UPDATE 20.02.2018

Many people asked me which real customer is behind ACME and its SQL injection problem. None. It is an analogy to the US mass shootings that happen more and more frequently, the most recent at the time of writing: https://en.wikipedia.org/wiki/Stoneman_Douglas_High_School_shooting

This post is intended to show that, if it were explained as an IT problem, the solution would sound so easy that nobody would have any doubts about the steps that must be taken.

Unfortunately, that is not the case, and the US is condemned to have more and more mass shootings because nobody wants to fix the problem. 🙁
