Ludovico – DBA survival BLOG

Querying the dba_hist_sys_time_model to get historical data


This quick post is mainly for myself… I will certainly use it for reference in the future.

Debugging problems due to adaptive dynamic sampling, and to adaptive features in general, sometimes requires historical data about, e.g., parse time.

In order to get this information you may need to query the view DBA_HIST_SYS_TIME_MODEL (take care: it requires the Diagnostic Pack license!).
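Before querying it, it may be worth double-checking that the Diagnostic Pack is enabled at all. A minimal sketch, assuming SYSDBA access:

sqlplus -s / as sysdba <<EOF
-- AWR views are only usable when this is DIAGNOSTIC or DIAGNOSTIC+TUNING
show parameter control_management_pack_access
EOF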

You can use this query as an example.

with h as (
select s.snap_id, s.BEGIN_INTERVAL_TIME,
        --s.END_INTERVAL_TIME,
        g.STAT_ID,
        g.stat_name,
        nvl(
          decode(
            greatest(
              VALUE,
              nvl(lag(VALUE) over (partition by s.dbid, s.instance_number, g.stat_name order by s.snap_id),0)
             ),
            VALUE,
            VALUE - lag(VALUE)
               over (partition by s.dbid,
                                    s.instance_number,
                                    g.stat_name
                    order by s.snap_id
                ),
            VALUE
           ),
           0
        ) VALUE
from DBA_HIST_SNAPSHOT s,
    DBA_HIST_SYS_TIME_MODEL g,
    v$instance i
where s.SNAP_ID=g.SNAP_ID
and s.BEGIN_INTERVAL_TIME >=
    trunc(to_timestamp(nvl('&startdate',to_char(sysdate,'YYYYMMDD')),'YYYYMMDD'))
and s.BEGIN_INTERVAL_TIME <=
    trunc(to_timestamp(nvl('&enddate',to_char(sysdate,'YYYYMMDD')),'YYYYMMDD')+1)
and s.instance_number=i.instance_number
and s.instance_number=g.instance_number
)
select p.begin_interval_time, p.value as "parse time elapsed", t.value as "DB time",
round(p.value/t.value,2)*100 as "parse pct", par.value as opt_adapt_feat
from h p, h t , dba_hist_parameter par
where p.snap_id=t.snap_id
and p.snap_id=par.snap_id
and p.stat_name='parse time elapsed'
and t.stat_name='DB time'
and par.parameter_name='optimizer_adaptive_features'
and t.value>0
order by p.begin_interval_time
/

 

In this specific example, it shows the “parse time elapsed”, the “DB time” and the parse/DB time percentage, along with the value of the parameter “optimizer_adaptive_features”. The nvl/decode/greatest construct takes care of instance restarts: when the cumulative counter resets, the raw value is reported instead of a negative delta. You can use the query to check whether changing the parameters related to adaptive dynamic sampling improves the parse time or not.
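The query expects &startdate and &enddate in YYYYMMDD format (both default to today when left empty). A hypothetical invocation, assuming it is saved as parse_time_hist.sql:

sqlplus -s / as sysdba <<EOF
define startdate=20151023
define enddate=20151023
@parse_time_hist.sql
EOF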

The output will be something like this:

BEGIN_INTERVAL_TIME       parse time elapsed    DB time    parse pct OPT_ADAPT_FEAT
------------------------- ------------------ ---------- ------------ --------------
23-OCT-15 03.00.36.569 AM            3235792   57030479         5.67 TRUE
23-OCT-15 03.30.38.712 AM            3438093   60262996         5.71 TRUE
23-OCT-15 04.00.40.709 AM            4622998   69813760         6.62 TRUE
23-OCT-15 04.30.42.776 AM            4590463   56441202         8.13 TRUE
23-OCT-15 05.00.44.735 AM           13772357  113741371        12.11 TRUE
23-OCT-15 05.30.46.722 AM            3448944   49807800         6.92 TRUE
23-OCT-15 06.00.48.664 AM            4792886   54235691         8.84 TRUE
23-OCT-15 06.30.50.713 AM            8527305   58775613        14.51 TRUE
23-OCT-15 07.00.52.667 AM            8518273   75248056        11.32 TRUE
23-OCT-15 07.30.54.622 AM            9800048   17381081         1.07 TRUE
23-OCT-15 08.00.56.609 AM            6986551 1629027583          .43 TRUE
23-OCT-15 08.30.58.568 AM            8414695 2493025822          .34 TRUE
23-OCT-15 09.00.00.457 AM           13648260 2412333113          .57 TRUE
23-OCT-15 09.30.02.384 AM           15186610 4635080356          .33 TRUE
23-OCT-15 10.00.04.298 AM           23465769   39080849         3.17 FALSE
23-OCT-15 10.30.06.421 AM           12152991 2654461964          .46 FALSE
23-OCT-15 11.00.08.444 AM           24901111  549936076         4.53 FALSE
23-OCT-15 11.30.10.485 AM            8080236  354568317         2.28 FALSE
23-OCT-15 12.00.12.453 PM            4291839   91028268         4.71 FALSE
23-OCT-15 12.30.14.430 PM            3675163  177312397         2.07 FALSE
23-OCT-15 01.00.16.468 PM            9184841  231138367         3.97 FALSE
23-OCT-15 01.30.18.438 PM            8132397  162607229            5 FALSE
23-OCT-15 02.00.20.707 PM           13375709  210251458         6.36 FALSE
23-OCT-15 02.30.23.740 PM           10116413  285114368         3.55 FALSE
23-OCT-15 03.00.25.699 PM            8067777  123864339         6.51 FALSE
23-OCT-15 03.30.27.641 PM            5787931  110621767         5.23 FALSE

HTH

Ludo


Get information about Cursor Sharing for a SQL_ID


Yesterday I got a weird problem with Adaptive Cursor Sharing. I am not sure yet about the root cause, but it seems to be related to the cursor sharing histograms. Hopefully, one day I will blog about what I have learned from this experience.

To better monitor the problem on that specific query, I’ve prepared this script (tested on 12.1.0.2):

COLUMN Shareable HEADING 'S|H|A|R|E|A|B|L|E'
COLUMN "Bind-Aware" HEADING 'B|I|N|D| |A|W|A|R|E'
COLUMN Sensitive HEADING 'S|E|N|S|I|T|I|V|E'
COLUMN Reoptimizable HEADING 'R|E|O|P|T|I|M|I|Z|A|B|L|E'
BREAK on child_number on Execs on "Gets/Exec" on "Ela/Exec" on "Sensitive" on "Shareable" on "Bind-Aware" on bucket0 on bucket1 on bucket2 on cnt on "Reoptimizable" on is_resolved_adaptive_plan

select * from (select *
  from (
select 
s.child_number,
  s.plan_hash_value,
  executions as Execs, 
  round(buffer_gets/executions) as "Gets/Exec",
  round(elapsed_time/executions) as "Ela/Exec",
  is_bind_sensitive as "Sensitive",
  is_shareable as "Shareable",
  is_bind_aware as "Bind-Aware",
  to_char(h.bucket_id) as bucket, h.count as cnt,
  is_reoptimizable as "Reoptimizable",
  is_resolved_adaptive_plan,
  "UNBOUND_CURSOR",  "SQL_TYPE_MISMATCH",  "OPTIMIZER_MISMATCH",
  "OUTLINE_MISMATCH", "STATS_ROW_MISMATCH", "LITERAL_MISMATCH",
  "FORCE_HARD_PARSE", "EXPLAIN_PLAN_CURSOR", "BUFFERED_DML_MISMATCH",
  "PDML_ENV_MISMATCH", "INST_DRTLD_MISMATCH", "SLAVE_QC_MISMATCH",
  "TYPECHECK_MISMATCH", "AUTH_CHECK_MISMATCH", "BIND_MISMATCH",
  "DESCRIBE_MISMATCH", "LANGUAGE_MISMATCH", "TRANSLATION_MISMATCH",
  "BIND_EQUIV_FAILURE", "INSUFF_PRIVS", "INSUFF_PRIVS_REM",
  "REMOTE_TRANS_MISMATCH", "LOGMINER_SESSION_MISMATCH", "INCOMP_LTRL_MISMATCH",
  "OVERLAP_TIME_MISMATCH", "EDITION_MISMATCH", "MV_QUERY_GEN_MISMATCH",
  "USER_BIND_PEEK_MISMATCH", "TYPCHK_DEP_MISMATCH", "NO_TRIGGER_MISMATCH",
  "FLASHBACK_CURSOR", "ANYDATA_TRANSFORMATION", "PDDL_ENV_MISMATCH",
  "TOP_LEVEL_RPI_CURSOR", "DIFFERENT_LONG_LENGTH", "LOGICAL_STANDBY_APPLY",
  "DIFF_CALL_DURN", "BIND_UACS_DIFF", "PLSQL_CMP_SWITCHS_DIFF",
  "CURSOR_PARTS_MISMATCH", "STB_OBJECT_MISMATCH", "CROSSEDITION_TRIGGER_MISMATCH",
  "PQ_SLAVE_MISMATCH", "TOP_LEVEL_DDL_MISMATCH", "MULTI_PX_MISMATCH",
  "BIND_PEEKED_PQ_MISMATCH", "MV_REWRITE_MISMATCH", "ROLL_INVALID_MISMATCH",
  "OPTIMIZER_MODE_MISMATCH", "PX_MISMATCH", "MV_STALEOBJ_MISMATCH",
  "FLASHBACK_TABLE_MISMATCH", "LITREP_COMP_MISMATCH", "PLSQL_DEBUG",
  "LOAD_OPTIMIZER_STATS", "ACL_MISMATCH", "FLASHBACK_ARCHIVE_MISMATCH",
  "LOCK_USER_SCHEMA_FAILED", "REMOTE_MAPPING_MISMATCH", "LOAD_RUNTIME_HEAP_FAILED",
  "HASH_MATCH_FAILED", "PURGED_CURSOR", "BIND_LENGTH_UPGRADEABLE",
  "USE_FEEDBACK_STATS"
from v$sql s
  join v$sql_cs_histogram h
    on (s.sql_id=h.sql_id and
	s.child_number=h.child_number and
	s.con_id=h.con_id
	)
  join v$sql_shared_cursor shc
    on (shc.sql_id=h.sql_id and 
	shc.child_number=h.child_number and
	s.con_id=shc.con_id
	)
	where s.sql_id='&sql_id'
)
pivot (sum(cnt) for (bucket) IN ('0' AS Bucket0,'1' AS Bucket1,'2' AS Bucket2))
)
unpivot (result FOR reason_type IN ("UNBOUND_CURSOR",
  "SQL_TYPE_MISMATCH", "OPTIMIZER_MISMATCH",
  "OUTLINE_MISMATCH", "STATS_ROW_MISMATCH", "LITERAL_MISMATCH",
  "FORCE_HARD_PARSE", "EXPLAIN_PLAN_CURSOR", "BUFFERED_DML_MISMATCH",
  "PDML_ENV_MISMATCH", "INST_DRTLD_MISMATCH", "SLAVE_QC_MISMATCH",
  "TYPECHECK_MISMATCH", "AUTH_CHECK_MISMATCH", "BIND_MISMATCH",
  "DESCRIBE_MISMATCH", "LANGUAGE_MISMATCH", "TRANSLATION_MISMATCH",
  "BIND_EQUIV_FAILURE", "INSUFF_PRIVS", "INSUFF_PRIVS_REM",
  "REMOTE_TRANS_MISMATCH", "LOGMINER_SESSION_MISMATCH", "INCOMP_LTRL_MISMATCH",
  "OVERLAP_TIME_MISMATCH", "EDITION_MISMATCH", "MV_QUERY_GEN_MISMATCH",
  "USER_BIND_PEEK_MISMATCH", "TYPCHK_DEP_MISMATCH", "NO_TRIGGER_MISMATCH",
  "FLASHBACK_CURSOR", "ANYDATA_TRANSFORMATION", "PDDL_ENV_MISMATCH",
  "TOP_LEVEL_RPI_CURSOR", "DIFFERENT_LONG_LENGTH", "LOGICAL_STANDBY_APPLY",
  "DIFF_CALL_DURN", "BIND_UACS_DIFF", "PLSQL_CMP_SWITCHS_DIFF",
  "CURSOR_PARTS_MISMATCH", "STB_OBJECT_MISMATCH", "CROSSEDITION_TRIGGER_MISMATCH",
  "PQ_SLAVE_MISMATCH", "TOP_LEVEL_DDL_MISMATCH", "MULTI_PX_MISMATCH",
  "BIND_PEEKED_PQ_MISMATCH", "MV_REWRITE_MISMATCH", "ROLL_INVALID_MISMATCH",
  "OPTIMIZER_MODE_MISMATCH", "PX_MISMATCH", "MV_STALEOBJ_MISMATCH",
  "FLASHBACK_TABLE_MISMATCH", "LITREP_COMP_MISMATCH", "PLSQL_DEBUG",
  "LOAD_OPTIMIZER_STATS", "ACL_MISMATCH", "FLASHBACK_ARCHIVE_MISMATCH",
  "LOCK_USER_SCHEMA_FAILED", "REMOTE_MAPPING_MISMATCH", "LOAD_RUNTIME_HEAP_FAILED",
  "HASH_MATCH_FAILED", "PURGED_CURSOR", "BIND_LENGTH_UPGRADEABLE",
  "USE_FEEDBACK_STATS"))
where result='Y'
order by child_number;

The result is something similar to this (in my case there are 27 child cursors, numbered 0 to 26):

R
                                                                    E
                                                                    O
                                                                  B P
                                                              S S I T
                                                              E H N I
                                                              N A D M
                                                              S R   I
                                                              I E A Z
                                                              T A W A
                                                              I B A B
                                                              V L R L
CHILD_NUMBER PLAN_HASH_VALUE      EXECS  Gets/Exec   Ela/Exec E E E E I    BUCKET0    BUCKET1    BUCKET2 REASON_TYPE                   R
------------ --------------- ---------- ---------- ---------- - - - - - ---------- ---------- ---------- ----------------------------- -
           0      2293695281        455       2466      14464 Y Y Y N            0        455          0 ROLL_INVALID_MISMATCH         Y
                  2293695281                                                                             BIND_EQUIV_FAILURE            Y
           1      1690560038         99      13943     103012 Y Y Y N            0         99          0 ROLL_INVALID_MISMATCH         Y
                  1690560038                                                                             BIND_EQUIV_FAILURE            Y
           2      3815006743        541      43090     230245 Y Y Y N            0        541          0 BIND_EQUIV_FAILURE            Y
                  3815006743                                                                             ROLL_INVALID_MISMATCH         Y
           3      1483632464        251       4111      18940 Y Y Y N           49        202          0 ROLL_INVALID_MISMATCH         Y
                  1483632464                                                                             BIND_EQUIV_FAILURE            Y
           4      3815006743       1152      42632     220730 Y Y Y N            0       1000          0 BIND_EQUIV_FAILURE            Y
                  3815006743                                                                             ROLL_INVALID_MISMATCH         Y
           5      3922835573        150      39252     184176 Y Y Y N            0        150          0 ROLL_INVALID_MISMATCH         Y
                  3922835573                                                                             BIND_EQUIV_FAILURE            Y
           6       767857637          3       4731     124707 Y Y Y N            0          3          0 ROLL_INVALID_MISMATCH         Y
                   767857637                                                                             BIND_EQUIV_FAILURE            Y
           7       767857637         11       4739      71119 Y Y Y N            0         11          0 BIND_EQUIV_FAILURE            Y
           8      2800467281          1        307     249727 Y Y Y N            0          1          0 BIND_EQUIV_FAILURE            Y
           9      3123241890        536       2982      14428 Y Y Y N            6        530          0 ROLL_INVALID_MISMATCH         Y
                  3123241890                                                                             BIND_EQUIV_FAILURE            Y
          10      3125518635         17        315      16492 Y Y Y N           16          1          0 ROLL_INVALID_MISMATCH         Y
                  3125518635                                                                             BIND_EQUIV_FAILURE            Y
          11      2184442252        130       4686      40188 Y Y Y N            0        130          0 ROLL_INVALID_MISMATCH         Y
                  2184442252                                                                             BIND_EQUIV_FAILURE            Y
          12      3815006743        553      42765     231391 Y Y Y N            0        553          0 ROLL_INVALID_MISMATCH         Y
                  3815006743                                                                             BIND_EQUIV_FAILURE            Y
          13      1166983254         47      14193     111256 Y Y Y N            0         47          0 BIND_EQUIV_FAILURE            Y
                  1166983254                                                                             ROLL_INVALID_MISMATCH         Y
          14      2307602173          2         38      45922 Y Y Y N            2          0          0 BIND_EQUIV_FAILURE            Y
                  2307602173                                                                             ROLL_INVALID_MISMATCH         Y
          15       767857637         11       4304      59617 Y Y Y N            0         11          0 BIND_EQUIV_FAILURE            Y
                   767857637                                                                             ROLL_INVALID_MISMATCH         Y
          16      3108045525          2      34591     176749 Y N N N            1          1          0 ROLL_INVALID_MISMATCH         Y
                  3108045525                                                                             LOAD_OPTIMIZER_STATS          Y
                  3108045525                                                                             BIND_EQUIV_FAILURE            Y
          17      3108045525          6       1794      33335 Y Y Y N            4          2          0 BIND_EQUIV_FAILURE            Y
                  3108045525                                                                             ROLL_INVALID_MISMATCH         Y
          18      2440443365        470       2009      13361 Y Y Y N            0        470          0 ROLL_INVALID_MISMATCH         Y
                  2440443365                                                                             BIND_EQUIV_FAILURE            Y
          19      4079924956         15       2032      19773 Y Y Y N            8          7          0 ROLL_INVALID_MISMATCH         Y
                  4079924956                                                                             BIND_EQUIV_FAILURE            Y
          20       777919270         32       2675      18260 Y Y Y N           11         21          0 BIND_EQUIV_FAILURE            Y
                   777919270                                                                             ROLL_INVALID_MISMATCH         Y
          21      1428146033         63      13929     111116 Y Y Y N            0         63          0 ROLL_INVALID_MISMATCH         Y
                  1428146033                                                                             BIND_EQUIV_FAILURE            Y
          22      3815006743        218      43673     234642 Y Y Y N            0        218          0 BIND_EQUIV_FAILURE            Y
                  3815006743                                                                             ROLL_INVALID_MISMATCH         Y
          23       277802667          1         62      99268 Y Y Y N            1          0          0 BIND_EQUIV_FAILURE            Y
                   277802667                                                                             ROLL_INVALID_MISMATCH         Y
          24      3898025231          3       2364     111231 Y Y Y N            0          3          0 BIND_EQUIV_FAILURE            Y
                  3898025231                                                                             ROLL_INVALID_MISMATCH         Y
          25       767857637          2       6495     169363 Y Y Y N            0          2          0 ROLL_INVALID_MISMATCH         Y
                   767857637                                                                             BIND_EQUIV_FAILURE            Y
          26      3690167092        100       2998      20138 Y Y Y N            0        100          0 BIND_EQUIV_FAILURE            Y
                  3690167092                                                                             ROLL_INVALID_MISMATCH         Y

It’s a quick way to get the relevant information in a single result.
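If you do not know yet which SQL_ID to inspect, statements with an unusually high number of child cursors are good suspects. A quick sketch (the threshold of 10 is arbitrary):

sqlplus -s / as sysdba <<EOF
select sql_id, count(*) as children, sum(executions) as execs
from v\$sql
group by sql_id
having count(*) > 10
order by children desc;
EOF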

Of course, if you need deeper details, you should consider something more powerful like SQLd360 from Mauro Pagano.

Credits: I got the unpivot idea (and copied that part of the code) from this post by Timur Akhmadeev.

Ludo

Migrating Oracle RAC from SuSE to OEL (or RHEL) live


I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.

I know, I know, there is a paper from Dell and Oracle named:

How Dell Migrated from SUSE Linux to Oracle Linux

That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:

– backup the configuration of the nodes
– then, for each node, one at a time:
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart

What I want to achieve instead is:
– add one OEL node to the SuSE cluster as a new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
– repeat (for the remaining nodes)

because the customer will also migrate to new hardware.

In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will work for two or more).

oracle@sles01:~> crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       sles01                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       sles01                   STABLE
ora.asm
               ONLINE  ONLINE       sles01                   Started,STABLE
ora.net1.network
               ONLINE  ONLINE       sles01                   STABLE
ora.ons
               ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       sles01                   STABLE
ora.cvu
      1        ONLINE  ONLINE       sles01                   STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       sles01                   STABLE
ora.sles01.vip
      1        ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
oracle@sles01:~> cat /etc/issue

Welcome to SUSE Linux Enterprise Server 11 SP4  (x86_64) - Kernel \r (\l).

I have to prepare the new node addition carefully, mostly as I would do for a traditional node addition:

  • Add new ip addresses (public, private, vip) to the DNS/hosts
  • Install the new OEL server
  • Keep the same user and groups (uid, gid, etc)
  • Verify the network connectivity and setup SSH equivalence
  • Check that the multicast connection is ok
  • Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same
  • The network cards also must be the very same
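A quick loop like this can help with some of the checks above, comparing users, disks and network interfaces between the two nodes (a sketch: hostnames and the udev naming convention are placeholders for my environment):

for h in sles01 rhel01; do
  echo "### $h"
  ssh $h 'id oracle'                                 # same uid/gid on both nodes
  ssh $h 'ls -l /dev/mapper | grep -i asm'           # same disk names (udev/multipath)
  ssh $h '/sbin/ip -o link show | cut -d" " -f2'     # same interface names
done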

Once the new host is ready, the cluvfy stage -pre nodeadd will likely fail due to:

  • Kernel release mismatch
  • Package mismatch

Here’s an example of output:

oracle@sles01:~> cluvfy stage -pre nodeadd -n rhel01

Performing pre-checks for node addition

Checking node reachability...
Node reachability check passed from node "sles01"


Checking user equivalence...
User equivalence check passed for user "oracle"
Package existence check passed for "cvuqdisk"

Checking CRS integrity...

CRS integrity check passed

Clusterware version consistency passed.

Checking shared resources...

Checking CRS home location...
Location check passed for: "/u01/app/12.1.0/grid"
Shared resources check for node addition passed


Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity using interfaces on subnet "192.168.56.0"
Node connectivity passed for subnet "192.168.56.0" with node(s) sles01,rhel01
TCP connectivity check passed for subnet "192.168.56.0"


Check: Node connectivity using interfaces on subnet "172.16.100.0"
Node connectivity passed for subnet "172.16.100.0" with node(s) rhel01,sles01
TCP connectivity check passed for subnet "172.16.100.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.56.0".
Subnet mask consistency check passed for subnet "172.16.100.0".
Subnet mask consistency check passed.

Node connectivity check passed

Checking multicast communication...

Checking subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251"...
Check of subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251" passed.

Check of multicast communication passed.
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "sles01:/usr,sles01:/var,sles01:/etc,sles01:/u01/app/12.1.0/grid,sles01:/sbin,sles01:/tmp"
Free disk space check passed for "rhel01:/usr,rhel01:/var,rhel01:/etc,rhel01:/u01/app/12.1.0/grid,rhel01:/sbin,rhel01:/tmp"
Check for multiple users with UID value 1101 passed
User existence check passed for "oracle"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed

WARNING:
PRVF-7524 : Kernel version is not consistent across all the nodes.
Kernel version = "3.0.101-63-default" found on nodes: sles01.
Kernel version = "3.8.13-16.2.1.el6uek.x86_64" found on nodes: rhel01.
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make"
Package existence check passed for "libaio"
Package existence check passed for "binutils"
Package existence check passed for "gcc(x86_64)"
Package existence check passed for "gcc-c++(x86_64)"
Package existence check passed for "glibc"
Package existence check passed for "glibc-devel"
Package existence check passed for "ksh"
Package existence check passed for "libaio-devel"
Package existence check failed for "libstdc++33"
Check failed on nodes:
        rhel01
Package existence check failed for "libstdc++43-devel"
Check failed on nodes:
        rhel01
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check failed for "libstdc++46"
Check failed on nodes:
        rhel01
Package existence check failed for "libgcc46"
Check failed on nodes:
        rhel01
Package existence check passed for "sysstat"
Package existence check failed for "libcap1"
Check failed on nodes:
        rhel01
Package existence check failed for "nfs-kernel-server"
Check failed on nodes:
        rhel01
Check for multiple users with UID value 0 passed
Current group ID check passed

Starting check for consistency of primary group of root user

Check for consistency of root user's primary group passed
Group existence check passed for "asmadmin"
Group existence check passed for "asmoper"
Group existence check passed for "asmdba"

Checking ASMLib configuration.
Check for ASMLib configuration passed.

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed


User "oracle" is not part of "root" group. Check passed
Checking integrity of file "/etc/resolv.conf" across nodes

"domain" and "search" entries do not coexist in any  "/etc/resolv.conf" file
All nodes have same "search" order defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: sles01,rhel01

Check for integrity of file "/etc/resolv.conf" failed


Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed


Pre-check for node addition was unsuccessful on all the nodes.

So the point is not whether the check succeeds (it will not), but what exactly fails.

Solving all the problems not related to the SuSE-OEL difference is crucial, because addNode.sh would fail with the very same errors. I then need to run it with the -ignorePrereq and -ignoreSysPrereqs switches. Let’s see how it works:

oracle@sles01:/u01/app/12.1.0/grid/addnode> ./addnode.sh -silent "CLUSTER_NEW_NODES={rhel01}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rhel01-vip}" -ignorePrereq -ignoreSysPrereqs
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB.   Actual 27479 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 2032 MB    Passed

Prepare Configuration in progress.

Prepare Configuration successful.
..................................................   9% Done.
You can find the log of this install session at:
 /u01/app/oraInventory/logs/addNodeActions2015-11-09_09-57-16PM.log

Instantiate files in progress.

Instantiate files successful.
..................................................   15% Done.

Copying files to node in progress.

Copying files to node successful.
..................................................   79% Done.

Saving cluster inventory in progress.
..................................................   87% Done.

Saving cluster inventory successful.
The Cluster Node Addition of /u01/app/12.1.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/12.1.0/grid/root.sh

Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes:
[rhel01]
Execute /u01/app/12.1.0/grid/root.sh on the following nodes:
[rhel01]

The scripts can be executed in parallel on all the nodes. If there are any policy managed databases managed by cluster, proceed with the addnode procedure without executing the root.sh script. Ensure that root.sh script is executed after all the policy managed databases managed by clusterware are extended to the new nodes.
..........
Update Inventory in progress.
..................................................   100% Done.

Update Inventory successful.
Successfully Setup Software.

Then, as instructed by addNode.sh, I run root.sh and expect it to work:

[oracle@rhel01 install]$ sudo /u01/app/12.1.0/grid/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/12.1.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2015/11/09 23:18:42 CLSRSC-363: User ignored prerequisites during installation

OLR initialization - successful
2015/11/09 23:19:08 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rhel01'
CRS-2672: Attempting to start 'ora.evmd' on 'rhel01'
CRS-2676: Start of 'ora.mdnsd' on 'rhel01' succeeded
CRS-2676: Start of 'ora.evmd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rhel01'
CRS-2676: Start of 'ora.gpnpd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rhel01'
CRS-2676: Start of 'ora.gipcd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rhel01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rhel01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhel01'
CRS-2676: Start of 'ora.diskmon' on 'rhel01' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rhel01'
CRS-2676: Start of 'ora.cssd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rhel01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rhel01'
CRS-2676: Start of 'ora.ctssd' on 'rhel01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rhel01'
CRS-2676: Start of 'ora.asm' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rhel01'
CRS-2676: Start of 'ora.storage' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rhel01'
CRS-2676: Start of 'ora.crsd' on 'rhel01' succeeded
CRS-6017: Processing resource auto-start for servers: rhel01
CRS-2672: Attempting to start 'ora.ons' on 'rhel01'
CRS-2676: Start of 'ora.ons' on 'rhel01' succeeded
CRS-6016: Resource auto-start has completed for server rhel01
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2015/11/09 23:22:06 CLSRSC-343: Successfully started Oracle clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
2015/11/09 23:22:23 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Bingo! Let’s check if everything is up and running:

[oracle@rhel01 ~]$ /u01/app/12.1.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.asm
               ONLINE  ONLINE       rhel01                   Started,STABLE
               ONLINE  ONLINE       sles01                   Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.ons
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       sles01                   STABLE
ora.cvu
      1        ONLINE  ONLINE       sles01                   STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.rhel01.vip
      1        ONLINE  ONLINE       rhel01                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       sles01                   STABLE
ora.sles01.vip
      1        ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------

[oracle@rhel01 ~]$ olsnodes -s
sles01  Active
rhel01  Active

[oracle@rhel01 ~]$ ssh rhel01 uname -r
3.8.13-16.2.1.el6uek.x86_64
[oracle@rhel01 ~]$ ssh sles01 uname -r
3.0.101-63-default

[oracle@rhel01 ~]$ ssh rhel01 cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[oracle@rhel01 ~]$ ssh sles01 cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 SP4  (x86_64) - Kernel \r (\l).

So yes, it works, but remember that it’s not a supported long-term configuration.

In my case I expect to migrate the whole cluster from SLES to OEL in one day.

NOTE: using OEL6 as the new target is easy because the interface names do not change. OEL7 introduces a new interface naming scheme, so if you need to migrate without cluster downtime you have to set up the new OEL7 nodes with the old-style interface names, following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html

Otherwise, you need to configure a new interface name for the cluster with oifcfg.
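For reference, registering a different interface name would look roughly like this (a sketch; the interface names and subnets below are placeholders):

# as the grid infrastructure owner, on a node where the clusterware is up
oifcfg getif                                                # current configuration
oifcfg setif -global ens3/192.168.56.0:public
oifcfg setif -global ens4/172.16.100.0:cluster_interconnect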

HTH

Ludovico

Oracle Database on ACFS: a perfect marriage?


This presentation got very poor scores in the selection process of several conferences (no OOW, no DOAG, no UKOUG), but people liked it very much at the Paris Oracle Meetup. Databases on ACFS are mainstream now, thanks to the new ODA releases, so having some knowledge about why and how you should (or should not) run databases on ACFS is definitely worth a read.

Comments are, as always, very appreciated :-)

Ludo

Oracle Active Data Guard and Global Data Services in Action!


In a few days I will give a presentation at UKOUG Tech15 about Global Data Services; it will be the first time that I present this session.

I usually like to give the link to the material to my audience, so here we go:

Credits

I have to give special credit to my colleague Robert Bialek. I got a late confirmation for this session and my slide deck was not ready at all, so I have used a big part of his original work. Most of the content included in the slides has been created by Robert, not me. (Thank you for your help! :-))

Slides

Demo recording

Demo script

clear

function db {
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
}

function gsm {
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/gsmhome_1
export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
}

db

echo "#### CURRENT CONFIGURATION: CLASSIC DATA GUARD, 3 DATABASES ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
show configuration
EOF
echo "next: GSM config"
read -p ""

gsm
echo "#### GSM CONFIGURATION ####"
echo "GDS COMMAND:
config"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
config
exit
EOF
echo "next: ADD GDSPOOL"
read -p ""


echo "#### ADD GDSPOOL ####"
echo "GDS COMMAND:
add gdspool -gdspool sales"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
add gdspool -gdspool sales
exit
EOF
echo "next: ADD BROKERCONFIG"
read -p ""


echo "#### ADD BROKERCONFIG ####"
echo "GDS COMMAND:
add brokerconfig -connect gsm02.trivadistraining.com:1521/oltp_de -pwd password1 -gdspool sales -region germany"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
add brokerconfig -connect gsm02.trivadistraining.com:1521/oltp_de -pwd password1 -gdspool sales -region germany
exit
EOF
echo "next: config databases"
read -p ""


echo "#### CONFIG DATABASES ####"
echo "GDS COMMAND:
config database"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
config database
exit
EOF
echo "next: modify databases"
read -p ""

echo "#### MODIFY DATABASES ####"
echo "GDS COMMAND: 
modify database -database oltp_ch1 -region switzerland
modify database -database oltp_ch2 -region switzerland
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
modify database -database oltp_ch1 -region switzerland
modify database -database oltp_ch2 -region switzerland
config database
exit
EOF
echo "next: add service read/write"
read -p ""


echo "#### ADD SERVICE R/W ####"
echo "GDS COMMAND: 
add service -gdspool sales -service gsales_rw -role primary -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover
start service -service gsales_rw
services"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
add service -gdspool sales -service gsales_rw -role primary -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover
start service -service gsales_rw
services
exit
EOF
echo "next: ADD SERVICE R/O"
read -p ""

echo "#### ADD SERVICE R/O ####"
echo "GDS COMMAND: 
add service -gdspool sales -service gsales_ro -role PHYSICAL_STANDBY -failover_primary -lag 20 -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover
start service -service gsales_ro
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
add service -gdspool sales -service gsales_ro -role PHYSICAL_STANDBY -failover_primary -lag 20 -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover
start service -service gsales_ro
services
exit
EOF
echo "next: stop apply ch1 (run cli_ro_short.sh first)"
read -p ""

db
echo "#### STOP APPLY DATA GUARD ON OLTP_CH1 ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
edit database oltp_ch1 set state='apply-off';
EOF
echo "next: gds services"
read -p ""


gsm
echo "#### GDS SERVICES ####"
echo "GDS COMMAND: 
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
services
exit
EOF
echo "next: stop apply ch2 (run cli_ro_short.sh first)"
read -p ""

db
echo "#### STOP APPLY DATA GUARD ON OLTP_CH2 ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
edit database oltp_ch2 set state='apply-off';
EOF
echo "next: gds services"
read -p ""

gsm
echo "#### GDS SERVICES ####"
echo "GDS COMMAND: 
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
services
exit
EOF
echo "next: gds services"
read -p ""

gsm
echo "#### GDS SERVICES ####"
echo "GDS COMMAND: 
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
services
exit
EOF
echo "next: start apply ch1  and ch2"
read -p ""

db
echo "#### START APPLY DATA GUARD ON OLTP_CH1 and OLTP_CH2 ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
edit database oltp_ch1 set state='apply-on';
EOF
echo "sleeping 5"
sleep 5
dgmgrl -echo sys/password1@oltp_de <<EOF
edit database oltp_ch2 set state='apply-on';
EOF
echo "next: gds services"
read -p ""

gsm
echo "#### GDS SERVICES ####"
echo "GDS COMMAND: 
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
services
exit
EOF
echo "next: gds services"
read -p ""

gsm
echo "#### GDS SERVICES ####"
echo "GDS COMMAND: 
services
"
gdsctl <<EOF
connect gsm_admin/password1@gsm1
services
exit
EOF
echo "next: switchover to CH1 (run cli_ro_long.sh and cli_rw_long.sh first)"
read -p ""


db
echo "#### VALIDATE DATABASE OLTP_CH1 ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
validate database oltp_ch1;
EOF
echo "next: switchover"
read -p ""
echo "#### SWITCHOVER TO OLTP_CH1 ####"
dgmgrl -echo sys/password1@oltp_de <<EOF
switchover to oltp_ch1;
EOF
echo "next: gds services"
read -p ""
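The cli_* test clients mentioned in the script are not shown here, but a GDS client necessarily connects through the GSM listener rather than the local ones; a connection would look similar to this sketch (credentials, host and port are placeholders for the GSM endpoint, and the service name follows the default GDS naming service.gdspool.oradbcloud):

sqlplus app_user/password1@"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=gsm01.trivadistraining.com)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=gsales_rw.sales.oradbcloud)))"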

And the script to revert the demo:

clear

function db {
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
}

function gsm {
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/gsmhome_1
export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
}

db
dgmgrl -echo sys/password1@oltp_de <<EOF
switchover to oltp_de;
EOF

gsm
echo "#### STOP and DELETE SERVICE, REMOVE BROKERCONFIG, REMOVE POOL ####"
 
gdsctl <<EOF
connect gsm_admin/password1@gsm1
stop service -service gsales_ro
stop service -service gsales_rw
remove service -service gsales_ro
remove service -service gsales_rw
remove brokerconfig
remove gdspool -gdspool sales
config
exit
EOF

db
dgmgrl -echo sys/password1@oltp_de <<EOF
show configuration
EOF

echo "DEMO reverted."
read -p ""

Cheers

Ludovico

 

Rapid Home Provisioning


In a few days I will give a presentation at UKOUG Tech15 about Rapid Home Provisioning; it will be the first time that I present this session in public.

I usually like to give the link to the material to my audience, so here we go:

Slides:

Demo:

Enjoy

Ludovico

Recording of “Rapid Home Provisioning” webinar for the RAC SIG


Yesterday I presented the Oracle Rapid Home Provisioning technology to the RAC SIG; you can find the recording on YouTube:

Cheers

Ludo

Configuring the MySQL Database Plug-In for Oracle Enterprise Manager 12c


I have blogged in the past about MySQL Enterprise Monitor 3.0, and I was quite happy with it at the very beginning, but after a while I had to admit that I was missing many of the Oracle Enterprise Manager 12c features.

In particular, MEM 3.0 does not have a usable repository database: all the tables are encrypted, so it is not possible to list, for example, all the monitored targets; nor is it possible to do so via API or REST web services, because MEM 3.0 lacks these features.

What makes EM12c a GREAT product compared to MEM are features like blackouts, a usable command-line interface (emcli), integrated reporting, the scheduler, automatic groups… the list is just huge.
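As a trivial example of this openness, the EM12c repository itself can be queried with plain SQL, e.g. to list all monitored targets (a sketch; the repository credentials and connect string are placeholders):

sqlplus -s "sysman/***@emrep" <<EOF
select target_name, target_type
from sysman.mgmt\$target
order by target_type, target_name;
EOF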

Luckily, Oracle has officially released a MySQL plugin for EM12c; it requires EM version 12.1.0.4 or later.

So I upgraded (a while ago) my customer’s EM12c to 12.1.0.5 and decided to try the plugin.

The first step is to download the latest version of the plugin for MySQL.

I can verify that I have the latest version by going to

Setup -> Extensibility -> Self-Update -> Plugins.

The plugin has been downloaded, but in order to make it available on the targets, I first need to deploy it on the management servers (2 OMSes in my case).

Check the plugin name and version.

Verify the prerequisites check (here I have one column per OMS).

Specify the credentials for the Management Repository.
Execute the deploy.

If everything went OK, I’m able to check the status of the deployment.

Now that the plugin is correctly deployed on the OMSes, I can do the same for the agents.

I must select, one by one, the agents running on the hosts where MySQL is running. I could select all agents as well, but it’s better to be neat…

Again, there are prerequisite checks and confirmations.

The plugin deployment went well.
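By the way, the same deployment can be scripted with emcli instead of clicking through the console. A hedged sketch (the plugin id and the agent name below are placeholders; look up the exact id first):

emcli login -username=sysman
emcli list_plugins_on_server
emcli deploy_plugin_on_server -plugin="<mysql_plugin_id>"
emcli deploy_plugin_on_agent -agent_names="myhost:3872" -plugin="<mysql_plugin_id>"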

Now I can run the target discovery on the agent.

But the discovery does not find my MySQL targets. What went wrong?

Each agent has a default list of “discovery modules” used for the discovery, but the MySQL one is not enabled by default after the plugin installation, so it is necessary to activate it and deactivate the discovery modules I do not need.

Tada! At the next discovery, I have my target available.

The target name is automatically set to hostname:mysqlport.

As with all discovered targets, I need to promote it to make it available for monitoring with EM12c.

The target is available; now I can use most of the EM12c features to monitor my MySQL environment.

HTH

Ludovico


2015 in numbers…


I have been having a good time since I moved to Switzerland 3 years ago, and 2015 has been as good as 2014. Now that January is over, I am officially late in publishing this information, but it’s MY blog, so who cares? 😛

[Image: Google Analytics traffic, 2013-2015]

A few numbers:

25 blog posts (+1)
~52000 page views (+40%), ~43000 visits (+48%)
Speaker at 2 major conferences (#C15LV, #UKOUG_TECH15)
Speaker at 2 Trivadis internal conferences
Speaker at 1 local user group event
Delegate of EOUC at DOAG 2015
A total of 14 public speeches (same as 2014)
I’ve been elected RAC SIG Vice President
2 RAC Attack workshops organized
1 roundtable as organizer, 1 panel
2 T-shirt designed as gifts for 2 RAC Attack workshops
2 articles published
Launched the RAC SIG website and the new session agenda
Countless new friends and/or contacts

I hope that 2016 will be as good as 2015

Getting the DBID and Incarnation from the RMAN Catalog


Using the RMAN catalog is an option: there is a long-standing discussion among DBAs on whether you should use the catalog or not.

But because I like the RMAN catalog (a lot) and I generally use it, I assume that most of you do too 😉

When you want to restore from the RMAN catalog, you need to get the DBID of the database you want to restore and, sometimes, also the incarnation key.

The DBID identifies the database you want to restore. It is different for every newly created or duplicated database, but beware: if you duplicate your database manually (using restore/recover), you actually need to change the DBID with the nid tool, otherwise you will end up having more than one database registered in the catalog with the very same DBID. This is evil! The DB_NAME is also something that you should make sure is unique within your database farm.

The Incarnation Key changes whenever you do an “open resetlogs”, following for example a flashback database, an incomplete recovery, or just an “open resetlogs” without any specific need.

[Image: database incarnations branching before and after an open resetlogs]

In the image, you can see that you may want to restore to a point in time after the open resetlogs (blue incarnation) or before it (red incarnation). Depending on which one you need to restore, you may need to use the command RESET DATABASE TO INCARNATION.

https://docs.oracle.com/database/121/RCMRF/rcmsynta2007.htm#RCMRF148
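As a side note, for the database you are currently connected to, the same incarnation history is available without the catalog (a quick sketch):

sqlplus -s / as sysdba <<EOF
select incarnation#, resetlogs_change#, resetlogs_time, status
from v\$database_incarnation
order by incarnation#;
EOF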

If you have a dynamic and big environment, you probably script your restore procedures; that is why getting the DBID and incarnation key through RMAN commands may be more complex than just querying the catalog with SQL*Plus.

How do I get the history of my database incarnations?

You can get it easily for all your databases using the handy hierarchical queries on the RMAN catalog (db names and ids are obfuscated for obvious reasons):

SQL> SELECT lpad(' ',2*(level-1))
  || TO_CHAR(DBINC_KEY) AS DBINC_KEY,
  db_key,
  db_name,
  TO_CHAR(reset_time,'YYYY-MM-DD HH24:MI:SS'),
  dbinc_status
FROM rman.dbinc
  START WITH PARENT_DBINC_KEY IS NULL
  CONNECT BY prior DBINC_KEY   = PARENT_DBINC_KEY ;

DBINC_KEY                     DB_KEY DB_NAME    TO_CHAR(RESET_TIME, DBINC_ST
------------------------- ---------- ---------- ------------------- --------
356247416                  356247380 A9EE272A   2011-09-24 18:22:58 PARENT
  356247387                356247380 A9EE272A   2012-10-24 08:41:41 PARENT
    1149458631             356247380 A9EE272A   2014-10-10 08:30:57 CURRENT
360319357                  360319322 F5FD787F   2011-10-14 15:39:19 PARENT
  360319323                360319322 F5FD787F   2012-11-08 18:57:26 PARENT
    547928008              360319322 F5FD787F   2013-09-10 10:57:44 PARENT
      576592237            360319322 F5FD787F   2013-11-20 14:54:05 ORPHAN
      576613820            360319322 F5FD787F   2013-11-20 15:57:03 ORPHAN
      584503796            360319322 F5FD787F   2013-11-27 13:57:53 CURRENT
364099232                  364099231 25E64A7F   2012-11-20 08:01:49 PARENT
  415031968                364099231 25E64A7F   2013-02-15 12:16:15 PARENT
    456099512              364099231 25E64A7F   2013-05-03 12:19:52 CURRENT
366065362                  366065336 3AE45141   2011-09-24 18:22:58 PARENT
  366065337                366065336 3AE45141   2012-11-26 17:14:14 CURRENT
394067322                  394067321 C34FFA7E   2013-01-10 17:18:11 CURRENT
402469086                  402469073 D164DDB8   2011-09-24 18:22:58 PARENT
  402469074                402469073 D164DDB8   2013-01-29 11:20:19 CURRENT
410147332                  410147283 27984513   2011-09-24 18:22:58 PARENT
  410147284                410147283 27984513   2013-02-08 11:12:38 CURRENT
...
...

What about getting the correct DBID/DBINC_KEY pair for a specific database/time?

You can get the time windows for each incarnation using the lead() analytical function:

SQL> WITH dbids AS
  (SELECT TO_CHAR(dbinc.DBINC_KEY) AS DBINC_KEY,
    dbinc.db_key,
    dbinc.db_name,
    dbinc.reset_time,
    dbinc.dbinc_status,
    db.db_id
  FROM rman.dbinc dbinc
  JOIN rman.db db
  ON ( 
  dbinc.db_key   =db.db_key)
  )
select * from (
SELECT DBINC_KEY,
  db_name,
  db_id,
  reset_time,
  nvl(lead (reset_time) over (partition BY db_name order by reset_time),sysdate) AS next_reset
FROM dbids
)
ORDER BY db_name ,
  reset_time ;  

DBINC_KEY                 DB_NAME         DB_ID RESET_TIME          NEXT_RESET
------------------------- ---------- ---------- ------------------- -------------------
1173852671                1DF63C30   2507085371 2014-07-07 05:38:47 2015-01-16 07:29:01
1173852635                1DF63C30   2507085371 2015-01-16 07:29:01 2015-02-27 16:25:13
1244346785                1DF63C30   2531796824 2015-02-27 16:25:13 2015-02-27 16:25:13
1281775847                1DF63C30   2541221473 2015-02-27 16:25:13 2015-02-27 16:25:13
1233975755                1DF63C30   2528008262 2015-02-27 16:25:13 2015-02-27 16:25:13
1220896058                1DF63C30   2523244390 2015-02-27 16:25:13 2015-03-16 16:06:00
1188550385                1DF63C30   2507085371 2015-03-16 16:06:00 2015-07-17 08:06:00
1220896028                1DF63C30   2523244390 2015-07-17 08:06:00 2015-09-10 11:23:53
1233975725                1DF63C30   2528008262 2015-09-10 11:23:53 2015-10-23 07:46:34
1244346755                1DF63C30   2531796824 2015-10-23 07:46:34 2016-02-08 09:44:03
1281775817                1DF63C30   2541221473 2016-02-08 09:44:03 2016-02-15 10:13:49
1201139592                1D0776F6   2025503263 2014-07-07 05:38:47 2015-05-04 17:08:50
1201139578                1D0776F6   2025503263 2015-05-04 17:08:50 2015-06-02 08:48:07
1213295265                1D0776F6   2029287211 2015-06-02 08:48:07 2015-06-02 08:48:07
1256000477                1D0776F6   2044568865 2015-06-02 08:48:07 2015-06-02 08:48:07
1235940868                1D0776F6   2037421528 2015-06-02 08:48:07 2015-06-17 12:14:38
1213295230                1D0776F6   2029287211 2015-06-17 12:14:38 2015-09-18 15:46:34
1235940852                1D0776F6   2037421528 2015-09-18 15:46:34 2015-12-08 09:08:52
1256000461                1D0776F6   2044568865 2015-12-08 09:08:52 2016-02-15 10:13:49
1173653066                2D828C2C   1656607497 2014-07-07 05:38:47 2015-01-15 14:06:04
1173653052                2D828C2C   1656607497 2015-01-15 14:06:04 2015-06-02 08:48:07
1247872446                2D828C2C   1682603029 2015-06-02 08:48:07 2015-06-02 08:48:07
1218354231                2D828C2C   1671898993 2015-06-02 08:48:07 2015-06-02 08:48:07
1278227063                2D828C2C   1690479985 2015-06-02 08:48:07 2015-06-02 08:48:07
1219084145                2D828C2C   1672155073 2015-06-02 08:48:07 2015-06-02 08:48:07
1228714578                2D828C2C   1675699280 2015-06-02 08:48:07 2015-06-02 08:48:07
1211451469                2D828C2C   1669565762 2015-06-02 08:48:07 2015-06-02 08:48:07
1235422982                2D828C2C   1678113471 2015-06-02 08:48:07 2015-06-02 08:48:07
1228713810                2D828C2C   1675697673 2015-06-02 08:48:07 2015-06-02 08:48:07
1240749487                2D828C2C   1680107003 2015-06-02 08:48:07 2015-06-02 08:48:07
1255743496                2D828C2C   1685361979 2015-06-02 08:48:07 2015-06-10 13:37:08
1211451453                2D828C2C   1669565762 2015-06-10 13:37:08 2015-07-06 13:44:20
1218354215                2D828C2C   1671898993 2015-07-06 13:44:20 2015-07-09 12:52:19
1219084129                2D828C2C   1672155073 2015-07-09 12:52:19 2015-08-19 12:55:40
1228713794                2D828C2C   1675697673 2015-08-19 12:55:40 2015-08-19 13:22:27
1228714562                2D828C2C   1675699280 2015-08-19 13:22:27 2015-09-16 11:58:58
1235422966                2D828C2C   1678113471 2015-09-16 11:58:58 2015-10-08 13:44:29
1240749471                2D828C2C   1680107003 2015-10-08 13:44:29 2015-11-06 11:04:55
1247872430                2D828C2C   1682603029 2015-11-06 11:04:55 2015-12-07 09:27:27
1255743480                2D828C2C   1685361979 2015-12-07 09:27:27 2016-02-04 15:07:29
1278227047                2D828C2C   1690479985 2016-02-04 15:07:29 2016-02-15 10:13:49

With this query, you can see that every incarnation has a reset time and a “next reset time”.

It’s easy then to get exactly what you need by adding a couple of where clauses:

SQL> WITH dbids AS
  (SELECT TO_CHAR(dbinc.DBINC_KEY) AS DBINC_KEY,
    dbinc.db_key,
    dbinc.db_name,
    dbinc.reset_time,
    dbinc.dbinc_status,
    db.db_id
  FROM rman.dbinc dbinc
  JOIN rman.db db
  ON ( --dbinc.dbinc_key=db.CURR_DBINC_KEY
    --AND
    dbinc.db_key =db.db_key)
  )
SELECT *
FROM
  (SELECT DBINC_KEY,
    db_name,
    db_id,
    reset_time,
    NVL(lead (reset_time) over (partition BY db_name order by reset_time),sysdate) AS next_reset
  FROM dbids
  )
WHERE TO_DATE ('2016-01-20 00:00:00','YYYY-MM-DD HH24:MI:SS') BETWEEN reset_time AND next_reset
AND db_name='1465419F'
ORDER BY db_name ,
  reset_time ; 

DBINC_KEY                 DB_NAME         DB_ID RESET_TIME          NEXT_RESET
------------------------- ---------- ---------- ------------------- -------------------
1256014297                1465419F   1048383773 2015-12-08 11:03:55 2016-02-08 07:55:05

So, if I need to restore the database 1465419F until time 2016-01-20 00:00:00, I need to set DBID=1048383773 and reset the database to incarnation 1256014297.
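Translated into RMAN commands, the restore would look more or less like this (just a sketch, assuming the instance is started in NOMOUNT and connected to the RMAN catalog; the catalog connect string is made up):

$ rman target / catalog rman/***@rcat
RMAN> SET DBID 1048383773;
RMAN> RESTORE CONTROLFILE;
RMAN> ALTER DATABASE MOUNT;
RMAN> RESET DATABASE TO INCARNATION 1256014297;
RMAN> RUN {
2>   SET UNTIL TIME "TO_DATE('2016-01-20 00:00:00','YYYY-MM-DD HH24:MI:SS')";
3>   RESTORE DATABASE;
4>   RECOVER DATABASE;
5> }
RMAN> ALTER DATABASE OPEN RESETLOGS;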

Cheers

Ludo

How cold incremental recovery saved me once


UPDATE: In the original version I was missing a few keywords: “incremental level 0” for the base backup and “resetlogs” at the database open. Thanks Gregorz for your comments.

Sorry for this “memories” post, but the technical solution at the end is worth the read, I hope 😉

Back in 2010, I was in charge of a quite complex project and faced some difficulties that led me to recover a database in a different manner. A few years have passed, but I have used the same procedure many more times with full satisfaction… I think it’s worth publishing it now.

But first, let me introduce the project details and the problem.

 

Scope of the project

Transport a >1TB RAC database from AIX 5 on P6 to AIX 6 on P7, from a third-party datacenter in southern Italy to our main datacenter in northern Italy.
The Database featured >1000 datafiles and a huge table (800GB) partitioned by range and sub-partitioned by list (or the opposite, can’t remember).

 

Challenges

For budget containment, the project owner asked to avoid the use of HACMP (and thus, the use of shared JFS2). I then decided to take the risk and migrate from JFS2 to ASM.

In order to avoid a few platform-related ASM bugs, I also had to upgrade from Oracle 10.2.0.3 to Oracle 10.2.0.4.

 

Constraints

I had no access to the source database, which was 800 km away from our datacenter; all I was allowed to do was ask for RMAN backups.

The accepted service disruption was quite short (<30 minutes) considering the size and the distance of the database, and there was no direct connectivity between the sites (for political reasons).

Globally, the network throughput for sharing files over ftp was very poor.

 

First solution

This kind of move was very familiar to me, and because I was not allowed to set up a temporary Data Guard configuration, the easy solution was to ask for:

1 – one RMAN ONLINE full backup physically sent on disk

2 – many RMAN archive backups sent over network (via ftp)

Then, on my side: restore the full backup, recover the archives sent over time and, at date X, ask for a final archive backup, ask to close the db and send the online redo logs, do a complete recovery on my side, then startup open upgrade.

 

Problem

I did a first "dry run" open resetlogs in order to test the procedure and make it faster, and I also asked to test the application against the destination database.

The very bad surprise was that the source database was doing a huge amount of nologging inserts leading to monster index corruptions after the recovery on the destination database.

ORA-26040: Data block was loaded using the NOLOGGING option

According to the current database maintainer, setting force logging on the source database was NOT an option because the SAN was not able to cope with the high redo rates.

 

Solution

Knowing the Oracle recovery mechanisms, I proposed to the remote maintainer a change in the recovery strategy, even though this solution was not clearly stated in the Oracle documentation:

1 – Take a first online incremental backup from the begin SCN of the base full backup (thank God block change tracking was in place) and send it physically over disk

2 – Take other smaller online incremental backups, send them over ftp and apply them on the destination with "noredo"

3 – At the date X, shutdown the source, mount it and take a last incremental in mount state

4 – Recover noredo the last incremental and open resetlogs the database.

According to the documentation, the “cold incremental strategy” applies if you take “cold full backups”. But from a technical point of view, taking a cold incremental and recovering it on top of a fuzzy online backup is 100% equivalent to taking a full consistent backup in mount state.
Because all the blocks are consistent to a specific SCN, there are no fuzzy datafiles: they are recovered from an incremental taken from a mounted database! This allows us to do an incremental recovery and open the database without applying a single archived log, and by shutting down the source database only once.

 

Technical steps

First, take a full ONLINE backup on the source:

-- SOURCE
SQL> alter database backup controlfile to '/tmp/source/ludo.cf' reuse;

Database altered.

SQL> exit
$ rman target /
RMAN> backup incremental level 0 database as compressed backupset format '/tmp/source/%U';

# SOURCE
scp -rp /tmp/source/ destsrv:/tmp/dest/
ludo.cf              100% |*************************************| 40944 KB    00:00
...

Then restore it on the destination (with no recovery):

# DEST
RMAN> restore controlfile from '/tmp/ludo.cf';

Starting restore at 11-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1058 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/.../control01.ctl
output file name=/.../control02.ctl
Finished restore at 11-AUG-15

RMAN> alter database mount;

Statement processed
released channel: ORA_DISK_1

RMAN> catalog start with '/tmp/dest/';
...
RMAN> run
2> {
3> set newname for database to '+DATA';
4>
5> restore database;
6> }
...
Finished restore at 11-AUG-15
RMAN>

Then, run a COLD incremental backup on the source:

-- SOURCE
SQL> shutdown immediate;
...
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.
...
Database mounted.
SQL> exit
$ rman target /
RMAN>  BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1 
2> CUMULATIVE DATABASE format '/tmp/source/incr%U';
...
Finished backup at 11-AUG-15
RMAN> exit
$ scp -rp /tmp/source/incr* destsrv:/tmp/dest/

And run the incremental recovery on the destination (without redo):

# DEST
RMAN> catalog start with '/tmp/dest/incr';
...
RMAN> run {
2> recover database noredo;
3> }
...
channel ORA_DISK_1: starting incremental datafile backup set restore
...
Finished recover at 11-AUG-15
RMAN> exit
$ sqlplus / as sysdba
...
SQL> alter database disable block change tracking;
Database altered.
SQL> alter database flashback off;
Database altered.
SQL> alter database flashback on;
Database altered.
SQL> create restore point PREUPG guarantee flashback database;
Restore point created.
SQL> -- open resetlogs can be avoided if I copy the online redo logs
SQL> alter database open resetlogs upgrade;
Database altered.
...
-- run catupgrd here

That’s all!

This solution gave me the opportunity to physically move the whole >1TB nologging database from one region to the other with minimal service disruption and without touching the source database at all.

I have used it many times since, even for bigger databases and on several platforms (yes, also Windows, sigh): it works like a charm.

HTH

Ludovico

Bash tips & tricks [ep. 1]: Deal with personal accounts and file permissions


This is the first episode of a mini series of Bash tips for Linux (in case you are wondering, yes, they are respectively my favorite shell and my favorite OS 😉 ).

Episode 1: Deal with personal accounts and file permissions
Episode 2: Have a smart environment for personal accounts

Description:

Nowadays it is mandatory at many companies to log in on Linux servers with a personal account (integrated with LDAP, Kerberos or whatever else) to comply with strict auditing rules.

I need to be sure that I have an environment where my modifications do not conflict with my colleagues' environments.

BAD:

-bash-4.1$ id
uid=20928(ludo) gid=200(dba) groups=200(dba)
-bash-4.1$ ls -lia
total 8
8196 drwxrwxr-x   2 oracle dba  4096 Mar 15 15:14 .
   2 drwxrwxrwt. 14 root   root 4096 Mar 15 15:15 ..
-bash-4.1$ vi script.sh
... edit here...
-bash-4.1$ ls -l
total 4
-rw-r--r-- 1 ludo  dba 8 Mar 15 15:15 script.sh
-bash-4.1$

The script has been created by me, but my colleagues may need to modify it! So I need to change the ownership:

$ chown oracle:dba script.sh
chown: changing ownership of `script.sh': Operation not permitted
$

But I can only change the permissions:

$ chmod 775 script.sh
$

If I really want to change the owner, I have to ask someone who has root privileges, or delete the file with my account and recreate it with the correct one (oracle or something else).

GOOD:

  • Set the setgid bit at the directory level
  • Define an alias for my favorite editor that uses sudoedit instead:

$ chmod 2751 .
$ ls -lia
total 4
8196 drwxr-s--x 2 oracle dba  4096 Mar 15 15:26 .
$ alias vi='SUDO_EDITOR=/usr/bin/vim sudoedit -u oracle '
$ vi script.sh
[sudo] password for ludo:
... edit here ...
$ ls -l script.sh
total 8
-rw-r--r-- 1 oracle dba 6 Mar 15 15:24 script.sh
$

In case I need to modify other files with MY account, I can either use the full path (/usr/bin/vim) or define another alias:

alias vime="/usr/bin/vim"
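Why the setgid bit helps: files created inside a directory that has the setgid bit set inherit the group of the directory, whatever the primary group of the creator is. A quick check (the uid/gid values here are just an example):

$ id
uid=20928(ludo) gid=100(users) groups=100(users),200(dba)
$ touch check_sgid.sh
$ ls -l check_sgid.sh
-rw-r--r-- 1 ludo dba 0 Mar 15 15:32 check_sgid.sh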

Bash tips & tricks [ep. 2]: Have a smart environment for personal accounts


This is the second episode of a small series.

Description:

The main technical account (oracle here) usually has the smart environment, with aliases, scripts available at your fingertips, correct environment variables and functions.

When working with personal accounts, it may be boring to set the new environment at each login, copy it from a golden copy or reinvent the wheel every time.

BAD:

Login: ludo
Password:

-bash-4.1$  env
HOSTNAME=testsrv
TERM=xterm
SHELL=/bin/bash
SSH_CLIENT=w.x.y.z 65373 22
OLDPWD=/home/ludo
SSH_TTY=/dev/pts/0
USER=ludo
LS_COLORS=...
MAIL=/var/spool/mail/ludo
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
PWD=/home/ludo
LANG=en_US.UTF-8
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/ludo
LOGNAME=ludo
LESSOPEN=||/usr/bin/lesspipe.sh %s
_=/bin/env

-bash-4.1$ typeset -f | grep '()'
_module ()
    COMPREPLY=();
_module_avail ()
_module_long_arg_list ()
_module_not_yet_loaded ()
module ()

-bash-4.1$ vi .bash_profile
... damn, let's make this environment smarter
...

 

GOOD:

Distribute a standard .bash_profile that calls a central profile script valid for all the users:

# [ ludo@testsrv:/home/ludo [15:53:18] [12.1.0.2 env:orcl12c] 0 ] #
# cat .bash_profile
# .bash_profile

#################################################
# WARNING: This script is controlled by puppet.
# If you need to override or add something
# please use ~/.bash_profile_local
#################################################

if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# load oracle common environment
. /u01/app/oracle/scripts/sbin/ora_profile

[ -f $HOME/.bash_profile_local ] && . $HOME/.bash_profile_local

# [ ludo@testsrv:/home/ludo [15:53:21] [12.1.0.2 env:orcl12c] 0 ] #
#

Make your common environment as smart as possible. If any commands need to be run differently depending on the user (oracle or not oracle), just use a simple if:

if [ $USER != "oracle" ] ; then
        alias vioratab='sudoedit -u oracle $ORATAB'
else
        alias vioratab='vi $ORATAB'
fi

The goal, of course, is to avoid as much typing as you can, and let all your colleagues profit from the smart environment.

Bash tips & tricks [ep. 3]: Colour your terminal!


This is the third episode of a small series.

Description:

The days of monochrome green-on-black screens are over: in a remote shell terminal you can have something fancier!

BAD:

[screenshot: a plain, colorless bash prompt]

GOOD:

Define a series of variables as shortcuts for the color escape codes; there are plenty of examples on the internet.

colblk='\033[0;30m'  # Black - Regular
colred='\033[0;31m'  # Red
colgrn='\033[0;32m'  # Green
colylw='\033[0;33m'  # Yellow
colblu='\033[0;34m'  # Blue
colpur='\033[0;35m'  # Purple
colcyn='\033[0;36m'  # Cyan
colwht='\033[0;37m'  # White
colbblk='\033[1;30m' # Black - Bold
colbred='\033[1;31m' # Red
colbgrn='\033[1;32m' # Green
colbylw='\033[1;33m' # Yellow
colbblu='\033[1;34m' # Blue
colbpur='\033[1;35m' # Purple
colbcyn='\033[1;36m' # Cyan
colbwht='\033[1;37m' # White
colublk='\033[4;30m' # Black - Underline
colured='\033[4;31m' # Red
colugrn='\033[4;32m' # Green
coluylw='\033[4;33m' # Yellow
colublu='\033[4;34m' # Blue
colupur='\033[4;35m' # Purple
colucyn='\033[4;36m' # Cyan
coluwht='\033[4;37m' # White
colbgblk='\033[40m'  # Black - Background
colbgred='\033[41m'  # Red
colbggrn='\033[42m'  # Green
colbgylw='\033[43m'  # Yellow
colbgblu='\033[44m'  # Blue
colbgpur='\033[45m'  # Purple
colbgcyn='\033[46m'  # Cyan
colbgwht='\033[47m'  # White
colrst='\033[0m'     # Text Reset

Use them whenever you need to highlight the output of a script, and eventually integrate them in a smart prompt (like the one I’ve blogged about some time ago).

[screenshot: a colored bash prompt]

The echo builtin command requires -e in order to make the colours work. When reading files, cat works; less requires -r. vi may work with some hacking, but it’s not worth spending too much time on it, IMHO.
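For example, to highlight a message and read it back from a logfile (assuming the colour variables above are loaded in the current shell; the file path is made up):

$ echo -e "${colbred}ERROR${colrst} --- something went ${colylw}wrong${colrst}"
$ echo -e "${colbred}ERROR${colrst} --- logged message" > /tmp/colored.log
$ less -r /tmp/colored.log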

Bash tips & tricks [ep. 4]: Use logging levels


This is the fourth episode of a small series.

Description:

Support different logging levels natively in your scripts so that your code will be more stable and maintainable.

BAD:

#!/bin/bash -l
...
# for debug only, comment out when OK
echo $a 
do_something $a

# echo $? # sometimes does not work?

 

 GOOD:

Nothing to invent, there are already a few blog posts around about the best practices for log messages. I personally like the one from Michael Wayne Goodman:

http://www.goodmami.org/2011/07/04/Simple-logging-in-BASH-scripts.html

I have reused his code in my scripts with very few modifications to fit my needs:

### verbosity levels
silent_lvl=0
crt_lvl=1
err_lvl=2
wrn_lvl=3
ntf_lvl=4
inf_lvl=5
dbg_lvl=6

## esilent prints output even in silent mode
function esilent () { verb_lvl=$silent_lvl elog "$@" ;}
function enotify () { verb_lvl=$ntf_lvl elog "$@" ;}
function eok ()    { verb_lvl=$ntf_lvl elog "SUCCESS - $@" ;}
function ewarn ()  { verb_lvl=$wrn_lvl elog "${colylw}WARNING${colrst} - $@" ;}
function einfo ()  { verb_lvl=$inf_lvl elog "${colwht}INFO${colrst} ---- $@" ;}
function edebug () { verb_lvl=$dbg_lvl elog "${colgrn}DEBUG${colrst} --- $@" ;}
function eerror () { verb_lvl=$err_lvl elog "${colred}ERROR${colrst} --- $@" ;}
function ecrit ()  { verb_lvl=$crt_lvl elog "${colpur}FATAL${colrst} --- $@" ;}
function edumpvar () { for var in $@ ; do edebug "$var=${!var}" ; done }
function elog() {
        if [ $verbosity -ge $verb_lvl ]; then
                datestring=`date +"%Y-%m-%d %H:%M:%S"`
                echo -e "$datestring - $@"
        fi
}

edumpvar is handy to get the status of several variables at once:

#!/bin/bash -l
# code
#...

verbosity=6

edumpvar ORACLE_SID ORACLE_HOME

<output>
2016-03-15 23:06:10 - DEBUG --- ORACLE_SID=orcl12c
2016-03-15 23:06:10 - DEBUG --- ORACLE_HOME=/u01/app/oracle/product/12.1.0.2
</output>

If you couple the verbosity level with input parameters you can have something quite clever (e.g. -s for silent, -V for verbose, -G for debug). I’m putting everything into one single snippet just as an example, but as you can imagine, you should seriously put all the fixed variables and functions inside an external file that you will systematically include in your scripts:

#!/bin/bash -l

colblk='\033[0;30m' # Black - Regular
colred='\033[0;31m' # Red
colgrn='\033[0;32m' # Green
colylw='\033[0;33m' # Yellow
colpur='\033[0;35m' # Purple
colrst='\033[0m'    # Text Reset

verbosity=4

### verbosity levels
silent_lvl=0
crt_lvl=1
err_lvl=2
wrn_lvl=3
ntf_lvl=4
inf_lvl=5
dbg_lvl=6

## esilent prints output even in silent mode
function esilent () { verb_lvl=$silent_lvl elog "$@" ;}
function enotify () { verb_lvl=$ntf_lvl elog "$@" ;}
function eok ()    { verb_lvl=$ntf_lvl elog "SUCCESS - $@" ;}
function ewarn ()  { verb_lvl=$wrn_lvl elog "${colylw}WARNING${colrst} - $@" ;}
function einfo ()  { verb_lvl=$inf_lvl elog "${colwht}INFO${colrst} ---- $@" ;}
function edebug () { verb_lvl=$dbg_lvl elog "${colgrn}DEBUG${colrst} --- $@" ;}
function eerror () { verb_lvl=$err_lvl elog "${colred}ERROR${colrst} --- $@" ;}
function ecrit ()  { verb_lvl=$crt_lvl elog "${colpur}FATAL${colrst} --- $@" ;}
function edumpvar () { for var in $@ ; do edebug "$var=${!var}" ; done }
function elog() {
        if [ $verbosity -ge $verb_lvl ]; then
                datestring=`date +"%Y-%m-%d %H:%M:%S"`
                echo -e "$datestring - $@"
        fi
}

OPTIND=1
while getopts ":sVG" opt ; do
        case $opt in
        s)
                verbosity=$silent_lvl
                edebug "-s specified: Silent mode"
                ;;
        V)
                verbosity=$inf_lvl
                edebug "-V specified: Verbose mode"
                ;;
        G)
                verbosity=$dbg_lvl
                edebug "-G specified: Debug mode"
                ;;
        esac
done

ewarn "this is a warning"
eerror "this is an error"
einfo "this is an information"
edebug "debugging"
ecrit "CRITICAL MESSAGE!"
edumpvar ORACLE_SID

Example:

$ example.sh -s

$ example.sh

[screenshot: output at the default verbosity]

$ example.sh -V

[screenshot: output in verbose mode (-V)]

$ example.sh -G

[screenshot: output in debug mode (-G)]

This does not take the output logfile into account. That will be part of the next tip :-)


Bash tips & tricks [ep. 5]: Write the output to a logfile


This is the fifth episode of a small series.

Description:

Logging the output of the scripts to a file is very important. There are several ways to achieve it; I will just show one of my favorites.

BAD:

You can log badly either from the script to a log file:

#!/bin/bash -l

TODAY=`date +"%Y%m%d"`
LOGDIR='/path/to/log'
OUTPUT="${LOGDIR}/output_${TODAY}.log"

# create the empty file or overwrite the existing one
> $OUTPUT

echo "Writing to the logfile" | tee -a $OUTPUT
command | tee -a $OUTPUT

echo "ops, this message and command will not be logged"
command
exit $?

or by redirecting badly the standard output of the script:

$ crontab -l
0 * * * * /path/to/script.sh > /path/to/always_the_same_log.out 2>&1

 GOOD:

My favorite solution is to automatically open a pipe that receives from the standard output and redirects to the logfile. With this solution, I can programmatically define my logfile name inside the script (based on the script name and input parameters, for example) and forget about redirecting the output every time I run a command.

export LOGDIR=/path/to/logfiles
export DATE=`date +"%Y%m%d"`
export DATETIME=`date +"%Y%m%d_%H%M%S"`

ScriptName=`basename $0`
Job=`basename $0 .sh`"_whatever_I_want"
JobClass=`basename $0 .sh`

function Log_Open() {
        if [ $NO_JOB_LOGGING ] ; then
                einfo "Not logging to a logfile because -Z option specified." #(*)
        else
                [[ -d $LOGDIR/$JobClass ]] || mkdir -p $LOGDIR/$JobClass
                Pipe=${LOGDIR}/$JobClass/${Job}_${DATETIME}.pipe
                mkfifo -m 700 $Pipe
                LOGFILE=${LOGDIR}/$JobClass/${Job}_${DATETIME}.log
                exec 3>&1                    # save the original stdout on fd 3
                tee ${LOGFILE} <$Pipe >&3 &  # tee reads the pipe, writes both to the logfile and to fd 3
                teepid=$!
                exec 1>$Pipe                 # from now on, stdout goes into the pipe
                PIPE_OPENED=1
                enotify Logging to $LOGFILE  # (*)
                [ $SUDO_USER ] && enotify "Sudo user: $SUDO_USER" #(*)
        fi
}

function Log_Close() {
        if [ ${PIPE_OPENED} ] ; then
                exec 1<&3    # restore the original stdout from fd 3
                sleep 0.2
                ps --pid $teepid >/dev/null
                if [ $? -eq 0 ] ; then
                        # a wait $teepid would be better but some
                        # commands leave file descriptors open
                        sleep 1
                        kill  $teepid
                fi
                rm $Pipe
                unset PIPE_OPENED
        fi
}

OPTIND=1
while getopts ":Z" opt ; do
        case $opt in
                Z)
                        NO_JOB_LOGGING="true"
                        ;;
        esac
done

Log_Open
echo "whatever I execute here will be logged to $LOGFILE"
command
Log_Close

(*) the functions edebug, einfo, etc, have to be created using the guidelines I have used in this post: Bash tips & tricks [ep. 4]: Use logging levels

The -Z parameter can be used to intentionally avoid logging.

Again, all this stuff (function definitions and variables) should be put in a global include file.

If I execute it:

# [ ludo@testsrv:/scripts [21:10:17] [not set env:"not set"] 0 ] #
# sudo -u oracle ./myscript.sh
2016-03-16 21:10:20 - Logging to /path/to/logfiles/myscript/myscript_whatever_I_want_20160316_211020.log
2016-03-16 21:10:20 - Sudo user: ludo
whatever I execute here will be logged to /path/to/logfiles/myscript/myscript_whatever_I_want_20160316_211020.log

# [ ludo@testsrv:/scripts [21:10:20] [not set env:"not set"] 0 ] #
# sudo -u oracle ./myscript.sh -Z
2016-03-16 21:15:18 - INFO ---- Not logging to a logfile because -Z option specified.
whatever I execute here will be logged to

# [ ludo@testsrv:/scripts [21:10:20] [not set env:"not set"] 0 ] #
# cat /path/to/logfiles/myscript/myscript_whatever_I_want_20160316_211020.log
2016-03-16 21:10:20 - Logging to /path/to/logfiles/myscript/myscript_whatever_I_want_20160316_211020.log
2016-03-16 21:10:20 - Sudo user: ludo
whatever I execute here will be logged to /path/to/logfiles/myscript/myscript_whatever_I_want_20160316_211020.log

 

Bash tips & tricks [ep. 6]: Check the exit code


This is the sixth episode of a small series.

Description:

Every command in a script may fail due to external reasons. Bash programming is not functional programming! :-)

After running a command, make sure that you check the exit code and either raise a warning or exit with an error, depending on how a failure can impact the execution of the script.

BAD:

The worst example is not to check the exit code at all:

#!/bin/bash -l

recover -a -f -c ${NWCLIENT} -d ${DEST_FILE_PATH} $BASEBCK_FILENAME
# what if recover fails?

do_something_with_recovered_files

Next one is better, but you may have a lot of additional code to type:

#!/bin/bash -l

recover -a -f -c ${NWCLIENT} -d ${DEST_FILE_PATH} $BASEBCK_FILENAME

#---------
# the following piece of code is frequently copied&pasted 
ERR=$?
if [ $ERR -ne 0 ] ; then
    # I've got an error with the recovery
    eerror "The recovery failed with exit code $ERR"
    Log_Close
    exit $ERR
else
    eok "The recovery succeeded."
fi
#---------

do_something_with_recovered_files

Again, Log_Close, eok, eerror, etc are functions defined using the previous Bash Tips & Tricks in this series.

GOOD:

Define once the check functions that you will use after every command:

# F_check_warn will eventually raise a warning but let the script continue
function F_check_warn() {
        EXITCODE=$1
        shift
        if [ $EXITCODE -eq 0 ] ; then
                eok $@ succeeded with exit code $EXITCODE
        else
                ewarn $@ failed with exit code $EXITCODE. The script will continue.
        fi
        # return the same code so other checks can follow this one inside the script
        return $EXITCODE
}

# F_check_exit will eventually raise an error and exit
function F_check_exit() {
        EXITCODE=$1
        shift
        if [ $EXITCODE -eq 0 ] ; then
                eok $@ succeeded with exit code $EXITCODE
        else
                eerror $@ failed with exit code $EXITCODE. The script will exit.
                Log_Close
                exit $EXITCODE
        fi
}

CMD="recover -a -f -c ${NWCLIENT} -d ${DEST_FILE_PATH} $BASEBCK_FILENAME"
enotify "Recover command: $CMD"
eval $CMD
F_check_exit $? "Recovery from networker"

do_something_with_the_recovered_files
F_check_warn $? "Non-blocking operation with recovered files"

 

Bash tips & tricks [ep. 7]: Cleanup on EXIT with a trap


This is the seventh episode of a small series.

Description:

Pipes, temporary files, lock files, processes spawned in background, rows inserted in a status table that need to be updated… Everything needs to be cleaned up when the script exits, even when the exit condition is not triggered inside the script.

BAD:

The worst practice is, of course, forgetting to clean up the tempfiles, leaving the output and temporary directories full of *.tmp, *.pipe, *.lck files, etc. I will not show the code because the list of bad practices is quite long…

Better than forgetting to clean up, but still very bad, is to clean up everything just before triggering the exit command (in the following example, F_check_exit is the function that exits the script if the first argument is non-zero, as defined in the previous episode):

...
some_command_that_must_succeed
EXITCODE=$?
if [ $EXITCODE -ne 0 ] ; then
    # Need to exit here, but F_check_exit function does not cleanup correctly
    [[ $TEMPFILE ]] && [[ -f $TEMPFILE ]] && rm $TEMPFILE
    [[ $EXP_PIPE ]] && [[ -f $EXP_PIPE ]] && rm $EXP_PIPE
    if [ $CHILD_PID ] ; then
        ps --pid $CHILD_PID >/dev/null
        if [ $? -eq 0 ] ; then
            kill $CHILD_PID # or wait, or what?
        fi
    fi
    F_check_exit $EXITCODE "Some command that must succeed"
fi

A better approach would be to put all the cleanup tasks in a Cleanup() function and then call this function instead of duplicating all the code everywhere:

...
some_command_that_must_succeed
EXITCODE=$?
[[ $EXITCODE -eq 0 ]] || Cleanup
F_check_exit $EXITCODE "Some command that must succeed"

But still, I need to make sure that I insert this piece of code everywhere. Not optimal yet.

I may include the Cleanup function inside the F_check_exit function, but then I have two drawbacks:
1 – I need to define the Cleanup function in every script that includes my include file
2 – there will still be exit conditions that are not trapped

GOOD:

The good approach would be to trap the EXIT signal with the Cleanup function:

Cleanup() {
  # cleanup your stuff here
}

trap Cleanup EXIT

do_something
F_check_exit $? "Something"

Much better! But what if my include script has some logic that also creates some temporary files?

I can create a global F_cleanup function that executes the local Cleanup function, if defined. Let me show this:

Include script:

# this is the include file (e.g. $BASEBIN/Init_Env.sh)
function F_cleanup() {
        EXITCODE=$?
        if [ `typeset -F Cleanup` ] ; then
                edebug "Cleanup function defined. Executing it..."
                Cleanup $EXITCODE
                edebug "Cleanup function executed with return code $?"
        else
                edebug "No cleanup function defined."
        fi
        # do other global cleanups
}

### Register the cleanup function
trap F_cleanup EXIT

Main script:

# Cleanup: If any function named Cleanup is defined, it will automatically be executed
# upon the EXIT signal.
Cleanup () {
    if [ $1 -eq 0 ] ; then
        : # exit 0 trapped: nothing special to do
    else
        : # exit !0 trapped
        # report the error here
    fi
    # remove pipes, temporary files etc.
}

. $BASEBIN/Init_Env.sh

do_something
F_check_exit $? "Something"

The Cleanup function will be executed only if defined.

No Cleanup function? No worries: the F_cleanup function can still do some global cleanup not specific to the main script.
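Putting the two pieces together, this is a minimal self-contained sketch of the pattern (no include file and no logging functions, so it can be tested as-is):

#!/bin/bash

TEMPFILE=$(mktemp)

Cleanup () {
        rm -f "$TEMPFILE"
        echo "Cleanup executed after exit code $1"
}

function F_cleanup() {
        EXITCODE=$?
        if [ `typeset -F Cleanup` ] ; then
                Cleanup $EXITCODE
        fi
}

### Register the cleanup function
trap F_cleanup EXIT

echo "working in $TEMPFILE"
exit 3    # the trap fires here: Cleanup reports exit code 3 and removes the tempfile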

The short story of two ACE Directors, competitors and friends


Well, this is a completely different post from what I usually publish. I like to blog about technology, personal interests and achievements.

This time I really would like to spend a few words to praise a friend.

I met Franck Pachot for the first time back in 2012: it was my first month in Trivadis and, believe it or not, Franck was working for it as well. I have the evidence here 😉

It was the first time in years that I met someone at least as smart as me on the Oracle stack (later, it happened many more times to meet smarter people, but that’s another story).

A few months later, he left Trivadis to join its sworn enemy dbi services. But established friendships and like-mindedness don’t disappear: we continued to meet whenever an opportunity came up, and we started almost simultaneously to boost our blogging activities, do public presentations and expand our presence on social media (mostly Twitter).

After I’ve got my Oracle ACE status in 2014, we went together at the Oracle Open World. I used to know many folks there and I can say that I helped Franck to meet many smart people inside and outside the ACE Program. A month after the OOW, he became an Oracle ACE.

Franck’s energy, passion and devotion to the Oracle Community are endless. What he’s doing, including his last big effort, is just great, and all the people in the Oracle Community respect him. I can say that he is now far more active than me in the Oracle Community (at least regarding “public” activities ;-))

We both had the target of becoming Oracle ACE Directors, and I spent a bad month in April, when I became an ACE Director and his nomination was still pending.

I said: “If you become ACE Director by the end of April I will write a blog post about you.” And that’s where this post comes from.

Congratulations ACE Director Franck, perfect timing! :-)


Ludo

 

 

How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR


I love my job because I always have surprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL customers, I am glad to share the solution on my blog 😀

Symptom of the problem: High CPU usage on the server

My customer’s DBA team had spotted consistently high CPU utilisation on its servers:

[screenshot: sar output showing the recurring CPU peaks]

Every day, at the same time and for 20-40 minutes, the servers hosting the Oracle databases literally ran out of CPU.

[screenshot: Enterprise Manager host CPU chart]

 

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump to the end of this post. But what I like more is to explain how I came to it.

First, I took a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I found:

[screenshot: CPU usage by process, with the m001 process on top for every database]

It seems that the source of the problem is not a single database, but all of them. And I see another pattern here: the CPU usage always comes from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

[screenshot: Instance Top Activity]

Ok, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE
FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST
WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA_DATA AS
(SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME
FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR
WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID )
SELECT STAT_NAME ,ROUND(SUM(DELTA_TIME/1000000),2) AS SECS
FROM DELTA_DATA GROUP BY STAT_NAME

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

[screenshot: SQL Monitor showing the two failing PL/SQL executions]

The DBMS_FEATURE_AWR function internally calls the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

[screenshot: search results pointing to Franck Pachot’s blog]

Oh no, not Franck again! He always discovers new stuff and blogs about it before I do :-)

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id is to run sqld360 by Mauro Pagano, but this time the resulting zip file did not contain anything useful, because there were actually no executions and no plans.

SQL> select sql_id,  executions, loads, cpu_time from v$sqlstats where sql_id='auyf8px9ywc6j';

SQL_ID        EXECUTIONS      LOADS   CPU_TIME
------------- ---------- ---------- ----------
auyf8px9ywc6j          0         11          0

SQL> select sql_id,  child_number from v$sql where sql_id='auyf8px9ywc6j';

no rows selected

SQL>

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plan associated with it:

SQL> select sql_id, child_number from v$sql where sql_id='auyf8px9ywc6j';

SQL_ID        CHILD_NUMBER
------------- ------------
auyf8px9ywc6j            0

SQL> select * from table (dbms_xplan.display_cursor('auyf8px9ywc6j',0, 'ALL +NOTE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  auyf8px9ywc6j, child number 0

WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER
,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP
,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS
MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE FROM
DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST WHERE SN.BEGIN_INTERVAL_TIME >
TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND
SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY
SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME )
,DELTA_DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE
WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE
+ (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS
DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2
,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER =
SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID =
SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.

NOTE: cannot fetch plan for SQL_ID: auyf8px9ywc6j, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


22 rows selected.

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it with DBMS_XPLAN.DISPLAY_CURSOR.

I then decided to trace it with these two statements:

SQL> alter system set events 'sql_trace [sql:auyf8px9ywc6j]';

SQL> alter system set events 'trace[rdbms.SQL_Optimizer.*][sql:auyf8px9ywc6j]';

At the next execution I could indeed see the Adaptive Dynamic Sampling in the trace file, the error due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$3877D5D0 (#3)
Applicable DS directives:
   dirid = 17707367266596005344, state = 5, flags = 1, loc = 1 {CJ(8694)[1, 2]}
   dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
   dirid = 10027833930063681981, state = 1, flags = 5, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]; (8436)[1, 5]; (8436)[1, 5]}
Checking valid directives for the query block
  SPD: Directive valid: dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
  SPD: Return code in qosdDSDirSetup: EXISTS, estType = GROUP_BY
  SPD: Return code in qosdDSDirSetup: NODIR, estType = HAVING
  SPD: Return code in qosdDSDirSetup: NODIR, estType = QUERY_BLOCK

 

PARSING IN CURSOR #139834781881608 len=1106 dep=4 uid=0 oct=3 lid=0 tim=3349661181783 hv=4280474888 ad='95770310' sqlid='8w3h8fvzk5r88'
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT /*+ FULL ("ST") */ "WRM$_SNAPSHOT"."DBID" "DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER" "INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME" "STARTUP_TIME","ST"."STAT_ID" "STAT_ID","ST"."STAT_NAME" "STAT_NAME",MIN("WRM$_SNAPSHOT"."SNAP_ID") "MIN_SNAP",MAX("WRM$_SNAPSHOT"."SNAP_ID") "MAX_SNAP",MIN(CAST("WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME" AS DATE)) "MIN_DATE",MAX(CAST("WRM$_SNAPSHOT"."END_INTERVAL_TIME" AS DATE)) "MAX_DATE" FROM SYS."WRM$_SNAPSHOT" "WRM$_SNAPSHOT","WRH$_STAT_NAME" "ST" WHERE "WRM$_SNAPSHOT"."DBID"="ST"."DBID" AND ("ST"."STAT_NAME"='DB CPU' OR "ST"."STAT_NAME"='DB time') AND "WRM$_SNAPSHOT"."STATUS"=0 AND "WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME">TRUNC(SYSDATE@!)-7 AND "WRM$_SNAPSHOT"."END_INTERVAL_TIME"<TRUNC(SYSDATE@!) GROUP BY "WRM$_SNAPSHOT"."DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME","ST"."STAT_ID","ST"."STAT_NAME") "VW_DIS_1") innerQuery
END OF STMT
...
>> Query Blk Card adjusted from 3.000000 to 2.000000 due to adaptive dynamic sampling

 

*** KEWUXS - encountered error: (ORA-12751: cpu time or run time policy violation
ORA-06512: at "SYS.DBMS_FEATURE_AWR", line 14
ORA-06512: at "SYS.DBMS_FEATURE_AWR", line 92
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_SQL", line 1707
ORA-06512: at "SYS.DBMS_FEATURE_USAGE_INTERNAL", line 312
ORA-06512: at "SYS.DBMS_FEATURE_USAGE_INTERNAL", line 522
ORA-06512: at "SYS.DBMS_FEATURE_USAGE_INTERNAL", line 694
ORA-06512: at "SYS.DBMS_FEATURE_USAGE_INTERNAL", line 791
ORA-06512: at line 1
)

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

SQL> select TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED, LAST_USED from dba_sql_plan_directives where directive_id in (10027833930063681981, 17707367266596005344, 17748238338555778238);

TYPE             ENA STATE      AUT REASON                               CREATED
---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED                                                               LAST_USED
--------------------------------------------------------------------------- ---------------------------------------------------------------------------
DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     03-JUN-16 02.10.41.000000 PM
03-JUN-16 04.14.32.000000 PM

DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 27-MAR-16 09.01.20.000000 AM
17-APR-16 09.13.01.000000 AM                                                17-APR-16 09.13.01.000000000 AM

DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     13-FEB-16 06.07.36.000000 AM
27-FEB-16 06.03.09.000000 AM                                                03-JUN-16 02.10.41.000000000 PM

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

SQL> r
  1  select distinct d.directive_id, TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED
  2  from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
  3*     (d.directive_id=o.directive_id) where o.owner='SYS' and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT')

DIRECTIVE_ID TYPE             ENA STATE      AUT REASON                               CREATED
------------ ---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED
---------------------------------------------------------------------------
  8.8578E+18 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7748E+19 DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     19-MAR-16 02.15.17.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7170E+19 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7707E+19 DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 13-MAR-16 08.04.38.000000 AM
06-JUN-16 01.57.35.000000 PM

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

BEGIN
  FOR rec in (select d.directive_id as did 
    from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
    (d.directive_id=o.directive_id) where o.owner='SYS'
      and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT'))
  LOOP
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'ENABLED','NO');
  END LOOP;
END;
/

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.
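If many databases run on the same server, a small loop over the oratab can do the job (just a sketch, assuming OS authentication and a standard /etc/oratab; it does not filter out pre-12c homes, where the block would simply fail):

#!/bin/bash
# run the directive-disabling block on every database listed in /etc/oratab
while IFS=: read -r sid ohome rest ; do
        # skip comments and empty lines
        [[ -z $sid || $sid == \#* ]] && continue
        export ORACLE_SID=$sid ORACLE_HOME=$ohome
        $ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<'EOF'
BEGIN
  FOR rec in (select d.directive_id as did
    from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
    (d.directive_id=o.directive_id) where o.owner='SYS'
      and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT'))
  LOOP
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'ENABLED','NO');
  END LOOP;
END;
/
EOF
done < /etc/oratab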

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

-- export from a source where the directives exist and have been disabled
SET SERVEROUTPUT ON
DECLARE
  my_list  DBMS_SPD.OBJECTTAB := DBMS_SPD.ObjectTab();
  dir_cnt  NUMBER;
BEGIN
  DBMS_SPD.CREATE_STGTAB_DIRECTIVE  (table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM' );
  my_list.extend(3);
 
  -- TAB table
  my_list(1).owner := 'SYS';
  my_list(1).object_name := 'WRH$_SYS_TIME_MODEL';
  my_list(1).object_type := 'TABLE';
  my_list(2).owner := 'SYS';
  my_list(2).object_name := 'WRH$_STAT_NAME';
  my_list(2).object_type := 'TABLE';
  my_list(3).owner := 'SYS';
  my_list(3).object_name := 'WRM$_SNAPSHOT';
  my_list(3).object_type := 'TABLE';

  dir_cnt := DBMS_SPD.PACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM', obj_list => my_list);
   DBMS_OUTPUT.PUT_LINE('dir_cnt = ' || dir_cnt);
END;
/

expdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=expdp_AUYF8PX9YWC6J_DIRECTIVES.log tables=system.AUYF8PX9YWC6J_DIRECTIVES

-- import into the freshly upgraded/created 12c database
impdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=impdp_AUYF8PX9YWC6J_DIRECTIVES.log

SELECT DBMS_SPD.UNPACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM') FROM DUAL;

It goes without saying that the next execution was very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico

 
