Quantcast
Channel: Ludovico – DBA survival BLOG
Viewing all articles
Browse latest Browse all 119

Oracle Clusterware Services Status at a glance, fast!

$
0
0

If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.

If you have many databases, services and nodes, it might be annoying, when doing maintenance or service relocation, to have a quick overview about how services are distributed across the nodes and what’s their status.

With srvctl (the official tool for that), it is a per-database operation:

$ srvctl status service
PRKO-2082 : Missing mandatory option -db

If you have many databases, you have to run db by db.

It is also slow! For example, this database has 20 services. Getting the status takes 27 seconds:

# [ oracle@server1:/home/oracle/ [15:52:00] [11.2.0.4.0 [DBMS EE] SID=HRDEV1] 1 ] #
$ time srvctl status service -d hrdev_site1
Service SERVICE_NUMBER_01 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_02 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_03 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_04 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_05 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_06 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_07 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_08 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_09 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_10 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_11 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_12 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_13 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_14 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_15 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_16 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_17 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_18 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_19 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_20 is running on instance(s) HRDEV4

real    0m27.858s
user    0m1.365s
sys     0m1.143s

Instead of operating row-by-row (get the status for each service), why not relying on the cluster resources with crsctl and get the big picture once?

$ time crsctl stat res -f -w "(TYPE = ora.service.type)"
...
...

real    0m0.655s
user    0m0.169s
sys     0m0.098s

crsctl stat res -f  returns a list of ATTRIBUTE_NAME=value for each service, eventually more than one if the service is not singleton/single instance  but uniform/multi instance.

By parsing them with some awk code can provide nice results!

STATE, INTERNAL_STATE and TARGET are useful in this case and might be used to display colours as well.

  • Green: Status ONLINE, Target ONLINE, STABLE
  • Black: Status OFFLINE, Target OFFLNE, STABLE
  • Red: Status ONLINE, Target OFFLINE, STABLE
  • Yellow: all other cases

Here’s the code:

if [ -f /etc/oracle/olr.loc ] ; then
        export ORA_CLU_HOME=`cat /etc/oracle/olr.loc 2>/dev/null | grep crs_home | awk -F= '{print $2}'`
        export CRS_EXISTS=1
        export CRSCTL=$ORA_CLU_HOME/bin/crsctl
else
        export CRS_EXISTS=0
fi

svcstat ()
{
    if [ $CRS_EXISTS -eq 1 ]; then
        ${CRSCTL} stat res -f -w "(TYPE = ora.service.type)" | awk -F= '
function print_row() {
        dbbcol="";
        dbecol="";
        instbcol="";
        instecol="";
        instances=res["INSTANCE_COUNT 1"];
        for(i=1;i<=instances;i++) {
                # if at least one of the services is online, the service is online (then I paint it green)
                if (res["STATE " i] == "ONLINE" ) {
                        dbbcol="\033[0;32m";
                        dbecol="\033[0m";
                }
        }
        # db unique name is always the second part of the resource name
        # because it does not change, I can get it once from the resource name
        res["DB_UNIQUE_NAME"]=substr(substr(res["NAME"],5),1,index(substr(res["NAME"],5),".")-1);

        # same for service name
        res["SERVICE_NAME"]=substr(res["NAME"],index(substr(res["NAME"],5),".")+5,length(substr(res["NAME"],index(substr(res["NAME"],5),".")+5))-4);

        #starting printing the first part of the information
        printf ("%s%-24s %-30s%s",dbbcol, res["DB_UNIQUE_NAME"], res["SERVICE_NAME"], dbecol);

        # here, instance need to map to the correct server.
        # the mapping is node by attribute TARGET_SERVER (not last server)
        for ( n in node ) {
                node_name=node[n];
                status[node_name]="";
                for (i=1; i<=instances; i++) {
                        # we are on the instance that matches the server
                        if (node_name == res["TARGET_SERVER " i]) {
                                res["SERVER_NAME " i]=node_name;
                                if (status[node_name] !~ "ONLINE") {
                                        # when a service relocates both instances get the survival target_server
                                        # but just one is ONLINE... so we need to get always the ONLINE one.
                                        #printf("was::%s:", status[node_name]);
                                        status[node_name]=res["STATE " i];
                                }

                                # colors modes
                                if ( res["STATE " i] == "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # online and stable: GREEN
                                        status[node_name]=sprintf("\033[0;32m%-14s\033[0m", status[node_name]);
                                }
                                else if ( res["STATE " i] != "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # offline and stable
                                        if ( res["TARGET " i] == "OFFLINE" ) {
                                                # offline, stable, target offline: BLACK
                                                status[node_name]=sprintf("%-14s", status[node_name]);
                                        }
                                        else {
                                                # offline, stable, target online: RED
                                                status[node_name]=sprintf("\033[0;31m%-14s\033[0m", status[node_name]);
                                        }
                                }
                                else {
                                        # all other cases: offline and starting, online and stopping, clearning, etc.: YELLOW
                                        status[node_name]=sprintf("\033[0;33m%-14s\033[0m", status[node_name]);
                                }
                                #printf("%s %s %s %s\n", status[node_name], node[n], res["STATE " i], res["INTERNAL_STATE " i]);
                        }
                }
               printf(" %-14s", status[node_name]);
        }
        printf("\n");
}
function pad (string, len, char) {
        ret = string;
        for ( i = length(string); i<len ; i++) {
                ret = sprintf("%s%s",ret,char);
        }
        return ret;
}
BEGIN {
        debug = 0;
        first = 1;
        afterempty=1;
        # this loop should set:
        # node[1]=server1; node[2]=server2; nodes=2;
        nodes=0;
        while ("olsnodes" | getline a) {
                nodes++;
                node[nodes] = a;
        }
        fmt="%-24s %-30s";
        printf (fmt, "DB_Unique_Name", "Service_Name");
        for ( n in node ) {
                printf (" %-14s", node[n]);
        }
        printf ("\n");
        printf (fmt, pad("",24,"-"), pad("",30,"-"));
        for ( n in node ) {
                printf (" %s", pad("",14,"-"));
        }
        printf ("\n");

}
# MAIN awk svcstat
{
        if ( $1 == "NAME" ) {
                if ( first != 1 && res["NAME"] == $2 ) {
                        if ( debug == 1 ) print "Secondary instance";
                        instance++;
                }
                else {
                        if ( first != 1 ) {
                                print_row();
                        }
                        first = 0;
                        instance=1;
                        delete res;
                        res["NAME"] = $2;
                }
        }
        else  {
                res[$1 " " instance] = $2 ;

        }
}
END {
        #if ( debug == 1 ) for (key in res) { print key ": " res[key] }
        print_row();
}
';
    else
        echo "svcstat not available on non-clustered environments";
        false;
    fi
}

Here’s what you can expect, for 92 services distributed on 4 nodes and a dozen of databases (the output is snipped and the names are masked):

$ time svcstat
DB_Unique_Name     Service_Name       server1  server2  server3  server4
------------------ ------------------ -------- -------- -------- --------
hrdev_site1        SERVICE_NUMBER_01                             ONLINE
hrdev_site1        SERVICE_NUMBER_02                             ONLINE
...
hrdev_site1        SERVICE_NUMBER_20                             ONLINE
hrstg_site1        SERVICE_NUMBER_21                    ONLINE  
hrstg_site1        SERVICE_NUMBER_22                    ONLINE  
...
hrstg_site1        SERVICE_NUMBER_41                    ONLINE  
hrtest_site1       SERVICE_NUMBER_42           ONLINE           
hrtest_site1       SERVICE_NUMBER_43           ONLINE           
...
hrtest_site1       SERVICE_NUMBER_62           ONLINE           
hrtest_site1       SERVICE_NUMBER_63           ONLINE           
hrtest_site1       SERVICE_NUMBER_64           ONLINE           
hrtest_site1       SERVICE_NUMBER_65           ONLINE           
hrtest_site1       SERVICE_NUMBER_66           ONLINE           
erpdev_site1       SERVICE_NUMBER_67  ONLINE                    
erptest_site1      SERVICE_NUMBER_68  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_69  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_70  ONLINE                    
...
cmsstg_site1       SERVICE_NUMBER_74  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_75  ONLINE                    
cmstest_site1      SERVICE_NUMBER_76  ONLINE                    
...
cmstest_site1      SERVICE_NUMBER_81  ONLINE                    
kbtest_site1       SERVICE_NUMBER_82                    ONLINE           
...
kbtest_site1       SERVICE_NUMBER_84                    ONLINE           
reporting_site1    SERVICE_NUMBER_85  ONLINE                    
paydev_site1       SERVICE_NUMBER_86           ONLINE           
payrep_site1       SERVICE_NUMBER_87           ONLINE           
...
paytest_site1      SERVICE_NUMBER_90           ONLINE           
paytest_site1      SERVICE_NUMBER_91           ONLINE           
crm_site1          SERVICE_NUMBER_92                             ONLINE

real    0m0.358s
user    0m0.232s
sys     0m0.134s

I’d be curious to know if it works well for your environment, please comment here. 🙂

Thanks

Ludo


Viewing all articles
Browse latest Browse all 119

Trending Articles