If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.
If you have many databases, services and nodes, it can be frustrating during maintenance or service relocation not to have a quick overview of how the services are distributed across the nodes and what their status is.
With srvctl (the official tool for that), it is a per-database operation:
$ srvctl status service
PRKO-2082 : Missing mandatory option -db
If you have many databases, you have to run it database by database.
It is also slow! For example, this database has 20 services. Getting the status takes 27 seconds:
# [ oracle@server1:/home/oracle/ [15:52:00] [11.2.0.4.0 [DBMS EE] SID=HRDEV1] 1 ] #
$ time srvctl status service -d hrdev_site1
Service SERVICE_NUMBER_01 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_02 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_03 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_04 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_05 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_06 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_07 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_08 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_09 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_10 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_11 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_12 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_13 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_14 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_15 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_16 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_17 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_18 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_19 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_20 is running on instance(s) HRDEV4

real 0m27.858s
user 0m1.365s
sys 0m1.143s
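To cover every database with srvctl alone, you would have to loop over the registered databases yourself. Here is a minimal sketch of such a loop. The function name `status_all_services` and the `SRVCTL` override variable are my own hypothetical additions, and I use the short `-d` option as in the example above (newer releases also accept `-db`).

```shell
# SRVCTL is a hypothetical override, so the loop can be dry-run with a stub.
SRVCTL=${SRVCTL:-srvctl}

# Hypothetical helper: print the service status of every registered database.
status_all_services() {
    # "srvctl config database" lists one database unique name per line.
    "$SRVCTL" config database | while read -r db; do
        [ -n "$db" ] || continue          # skip empty lines
        echo "== $db"
        "$SRVCTL" status service -d "$db"
    done
}

# Usage (on a cluster node, with srvctl in the PATH):
#   status_all_services
```

Each iteration pays the full srvctl start-up cost again, so the total time grows with the number of databases.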
Instead of operating row-by-row (getting the status for each service), why not rely on the cluster resources with crsctl and get the big picture in one go?
$ time crsctl stat res -f -w "(TYPE = ora.service.type)"
...
...

real 0m0.655s
user 0m0.169s
sys 0m0.098s
crsctl stat res -f returns a list of ATTRIBUTE_NAME=value pairs for each service, possibly more than one set if the service is not singleton (single instance) but uniform (multi-instance).
Parsing them with some awk code can give nice results!
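To make the record structure concrete, here is a small sketch that groups ATTRIBUTE_NAME=value lines into records and prints one summary line per record. The sample input is a hypothetical, heavily abridged imitation of the crsctl output (real output carries many more attributes), and the function names `sample_crsctl_output` and `summarize_records` are mine.

```shell
# Hypothetical, abridged sample: one uniform service with two instances.
sample_crsctl_output() {
    cat <<'EOF'
NAME=ora.hrdev_site1.service_number_01.svc
TYPE=ora.service.type
STATE=ONLINE
TARGET=ONLINE
INTERNAL_STATE=STABLE
TARGET_SERVER=server1

NAME=ora.hrdev_site1.service_number_01.svc
TYPE=ora.service.type
STATE=OFFLINE
TARGET=ONLINE
INTERNAL_STATE=STABLE
TARGET_SERVER=server2
EOF
}

# Print "name target_server state target" for each record read from stdin.
# A new record starts at every NAME= line.
summarize_records() {
    awk -F= '
        function emit() {
            printf "%s %s %s %s\n", name, attr["TARGET_SERVER"], attr["STATE"], attr["TARGET"]
            delete attr
        }
        $1 == "NAME" { if (name != "") emit(); name = $2; next }
        NF >= 2      { attr[$1] = $2 }
        END          { if (name != "") emit() }
    '
}

sample_crsctl_output | summarize_records
```

With the sample above, this prints two lines for the same resource name, one per instance. Because the field separator is `=`, a value that itself contains `=` would be truncated; that is fine for the attributes used here.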
STATE, INTERNAL_STATE and TARGET are useful in this case and might be used to display colours as well.
- Green: Status ONLINE, Target ONLINE, STABLE
- Black: Status OFFLINE, Target OFFLINE, STABLE
- Red: Status OFFLINE, Target ONLINE, STABLE
- Yellow: all other cases
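The colour decision can be sketched on its own as a tiny shell function. This follows the branch order of the svcstat code: anything ONLINE and STABLE is green, and red marks a service that is OFFLINE and STABLE while its target is ONLINE. The function name `colour_for` is hypothetical and only serves the illustration.

```shell
# Hypothetical helper: map STATE, TARGET and INTERNAL_STATE to a colour name.
colour_for() {
    state=$1 target=$2 internal=$3
    if [ "$internal" != "STABLE" ]; then
        echo yellow        # starting, stopping, cleaning, ...
    elif [ "$state" = "ONLINE" ]; then
        echo green         # online and stable
    elif [ "$target" = "OFFLINE" ]; then
        echo black         # offline on purpose
    else
        echo red           # should be online, but is not
    fi
}

colour_for ONLINE ONLINE STABLE
colour_for OFFLINE ONLINE STABLE
```

Red is the case that deserves attention: the cluster wants the service up, and it is not.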
Here’s the code:
if [ -f /etc/oracle/olr.loc ] ; then
    export ORA_CLU_HOME=`cat /etc/oracle/olr.loc 2>/dev/null | grep crs_home | awk -F= '{print $2}'`
    export CRS_EXISTS=1
    export CRSCTL=$ORA_CLU_HOME/bin/crsctl
else
    export CRS_EXISTS=0
fi

svcstat () {
    if [ $CRS_EXISTS -eq 1 ]; then
        ${CRSCTL} stat res -f -w "(TYPE = ora.service.type)" | awk -F= '
function print_row() {
    dbbcol="";
    dbecol="";
    instbcol="";
    instecol="";
    instances=res["INSTANCE_COUNT 1"];
    for(i=1;i<=instances;i++) {
        # if at least one of the services is online, the service is online (then I paint it green)
        if (res["STATE " i] == "ONLINE" ) {
            dbbcol="\033[0;32m";
            dbecol="\033[0m";
        }
    }
    # db unique name is always the second part of the resource name
    # because it does not change, I can get it once from the resource name
    res["DB_UNIQUE_NAME"]=substr(substr(res["NAME"],5),1,index(substr(res["NAME"],5),".")-1);
    # same for service name
    res["SERVICE_NAME"]=substr(res["NAME"],index(substr(res["NAME"],5),".")+5,length(substr(res["NAME"],index(substr(res["NAME"],5),".")+5))-4);

    #starting printing the first part of the information
    printf ("%s%-24s %-30s%s",dbbcol, res["DB_UNIQUE_NAME"], res["SERVICE_NAME"], dbecol);

    # here, instance need to map to the correct server.
    # the mapping is node by attribute TARGET_SERVER (not last server)
    for ( n in node ) {
        node_name=node[n];
        status[node_name]="";
        for (i=1; i<=instances; i++) {
            # we are on the instance that matches the server
            if (node_name == res["TARGET_SERVER " i]) {
                res["SERVER_NAME " i]=node_name;
                if (status[node_name] !~ "ONLINE") {
                    # when a service relocates both instances get the survival target_server
                    # but just one is ONLINE... so we need to get always the ONLINE one.
                    #printf("was::%s:", status[node_name]);
                    status[node_name]=res["STATE " i];
                }
                # colors modes
                if ( res["STATE " i] == "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                    # online and stable: GREEN
                    status[node_name]=sprintf("\033[0;32m%-14s\033[0m", status[node_name]);
                } else if ( res["STATE " i] != "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                    # offline and stable
                    if ( res["TARGET " i] == "OFFLINE" ) {
                        # offline, stable, target offline: BLACK
                        status[node_name]=sprintf("%-14s", status[node_name]);
                    } else {
                        # offline, stable, target online: RED
                        status[node_name]=sprintf("\033[0;31m%-14s\033[0m", status[node_name]);
                    }
                } else {
                    # all other cases: offline and starting, online and stopping, cleaning, etc.: YELLOW
                    status[node_name]=sprintf("\033[0;33m%-14s\033[0m", status[node_name]);
                }
                #printf("%s %s %s %s\n", status[node_name], node[n], res["STATE " i], res["INTERNAL_STATE " i]);
            }
        }
        printf(" %-14s", status[node_name]);
    }
    printf("\n");
}

function pad (string, len, char) {
    ret = string;
    for ( i = length(string); i<len ; i++) {
        ret = sprintf("%s%s",ret,char);
    }
    return ret;
}

BEGIN {
    debug = 0;
    first = 1;
    afterempty=1;
    # this loop should set:
    # node[1]=server1; node[2]=server2; nodes=2;
    nodes=0;
    while ("olsnodes" | getline a) {
        nodes++;
        node[nodes] = a;
    }
    fmt="%-24s %-30s";
    printf (fmt, "DB_Unique_Name", "Service_Name");
    for ( n in node ) {
        printf (" %-14s", node[n]);
    }
    printf ("\n");
    printf (fmt, pad("",24,"-"), pad("",30,"-"));
    for ( n in node ) {
        printf (" %s", pad("",14,"-"));
    }
    printf ("\n");
}

# MAIN awk svcstat
{
    if ( $1 == "NAME" ) {
        if ( first != 1 && res["NAME"] == $2 ) {
            if ( debug == 1 ) print "Secondary instance";
            instance++;
        } else {
            if ( first != 1 ) {
                print_row();
            }
            first = 0;
            instance=1;
            delete res;
            res["NAME"] = $2;
        }
    } else {
        res[$1 " " instance] = $2 ;
    }
}

END {
    #if ( debug == 1 ) for (key in res) { print key ": " res[key] }
    print_row();
}
';
    else
        echo "svcstat not available on non-clustered environments";
        false;
    fi
}
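The nested substr/index calls that extract the database and service names are the hardest part to read, so here is the same idea as a standalone sketch. It assumes resource names of the form ora.<db_unique_name>.<service_name>.svc; the function name `split_service_resource` is hypothetical.

```shell
# Hypothetical helper: read service resource names on stdin and print
# "db_unique_name service_name" for each of them.
split_service_resource() {
    awk '{
        # drop the leading "ora." (4 chars) and the trailing ".svc" (4 chars)
        body = substr($0, 5, length($0) - 8)
        # the db unique name ends at the first dot; the rest is the service name
        dot = index(body, ".")
        printf "%s %s\n", substr(body, 1, dot - 1), substr(body, dot + 1)
    }'
}

echo "ora.hrdev_site1.service_number_01.svc" | split_service_resource
```

Splitting at the first dot only, as the code above does, keeps service names that themselves contain dots in one piece.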
Here’s what you can expect for 92 services distributed across 4 nodes and a dozen databases (the output is snipped and the names are masked):
$ time svcstat
DB_Unique_Name Service_Name server1 server2 server3 server4
------------------ ------------------ -------- -------- -------- --------
hrdev_site1 SERVICE_NUMBER_01 ONLINE
hrdev_site1 SERVICE_NUMBER_02 ONLINE
...
hrdev_site1 SERVICE_NUMBER_20 ONLINE
hrstg_site1 SERVICE_NUMBER_21 ONLINE
hrstg_site1 SERVICE_NUMBER_22 ONLINE
...
hrstg_site1 SERVICE_NUMBER_41 ONLINE
hrtest_site1 SERVICE_NUMBER_42 ONLINE
hrtest_site1 SERVICE_NUMBER_43 ONLINE
...
hrtest_site1 SERVICE_NUMBER_62 ONLINE
hrtest_site1 SERVICE_NUMBER_63 ONLINE
hrtest_site1 SERVICE_NUMBER_64 ONLINE
hrtest_site1 SERVICE_NUMBER_65 ONLINE
hrtest_site1 SERVICE_NUMBER_66 ONLINE
erpdev_site1 SERVICE_NUMBER_67 ONLINE
erptest_site1 SERVICE_NUMBER_68 ONLINE
cmsstg_site1 SERVICE_NUMBER_69 ONLINE
cmsstg_site1 SERVICE_NUMBER_70 ONLINE
...
cmsstg_site1 SERVICE_NUMBER_74 ONLINE
cmsstg_site1 SERVICE_NUMBER_75 ONLINE
cmstest_site1 SERVICE_NUMBER_76 ONLINE
...
cmstest_site1 SERVICE_NUMBER_81 ONLINE
kbtest_site1 SERVICE_NUMBER_82 ONLINE
...
kbtest_site1 SERVICE_NUMBER_84 ONLINE
reporting_site1 SERVICE_NUMBER_85 ONLINE
paydev_site1 SERVICE_NUMBER_86 ONLINE
payrep_site1 SERVICE_NUMBER_87 ONLINE
...
paytest_site1 SERVICE_NUMBER_90 ONLINE
paytest_site1 SERVICE_NUMBER_91 ONLINE
crm_site1 SERVICE_NUMBER_92 ONLINE

real 0m0.358s
user 0m0.232s
sys 0m0.134s
I’d be curious to know whether it works well in your environment. Please comment here.
Thanks
—
Ludo