Channel: Ludovico – DBA survival BLOG

Checking usage of HugePages by Oracle databases in Linux environments

Yesterday several databases on one server started logging errors in the alert log:

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2

That means there was not enough contiguous free memory in the OS. The first thing I checked was, of course, the overall memory and the huge page usage:

# [ oracle@oraserver1:/home/oracle [10:45:46] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ free
              total        used        free      shared  buff/cache   available
Mem:      528076056   398142940     3236764   119855448   126696352     5646964
Swap:      16760828    11615324     5145504

# [ oracle@oraserver1:/home/oracle [10:46:47] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ cat /proc/meminfo | grep Huge
HugePages_Total:   180000
HugePages_Free:    86029
HugePages_Rsvd:    11507
HugePages_Surp:        0
Hugepagesize:       2048 kB

The available memory (last column of the free output) was indeed quite low, but there was still plenty of space in the huge pages (86k pages free out of 180k).

The usage by Oracle instances:

# [ oracle@oraserver1:/home/oracle [10:45:39] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ sh mem.sh
DB12 : 54081544
DB22 : 37478820
DB32 : 67970828
DB42 : 14846552
DB52 : 16326380
DB62 : 15122048
DB82 : 56900472
DB92 : 14401080
DBA2 : 12622736
DBB2 : 14379916
DBC2 : 46078336
DBD2 : 46137728
DB72 : 37351336
total :  433697776

You can get the code of mem.sh in this post.
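For reference, here is a minimal sketch of what such a script could look like (the real mem.sh is in the linked post; the parsing below is an assumption): it derives each SID from the ora_pmon_&lt;SID&gt; process name, then sums the resident set size of that instance's background processes.

```shell
#!/bin/sh
# Hypothetical sketch of mem.sh (the original is in the linked post):
# sum the RSS (in kB) of each instance's background processes, keyed
# on the SID embedded in the ora_pmon_<SID> process name.
# Note: SIDs containing regex metacharacters (e.g. +ASM) would need escaping.
total=0
for sid in $(ps -eo cmd= | awk '/^ora_pmon_/ { sub(/^ora_pmon_/, ""); print $1 }'); do
  mem=$(ps -eo rss=,cmd= | awk -v s="_${sid}$" '$2 ~ s { sum += $1 } END { print sum + 0 }')
  echo "$sid : $mem"
  total=$((total + mem))
done
echo "total :  $total"
```

On a host with no running instances the loop simply prints `total :  0`; shadow processes (named oracle&lt;SID&gt;) are not counted in this sketch.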

Regarding pure shared memory usage, the situation was what I was expecting:

$ ipcs -m | awk 'BEGIN{a=0} {a+=$5} END{print a}'
369394520064

Roughly 344 GiB of shared memory in use, much more than what was backed by the huge pages.
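As a sanity check, the two figures can be compared directly: ipcs reports shared memory in bytes, while the memory backed by used huge pages is (HugePages_Total − HugePages_Free) × 2 MiB:

```shell
# Cross-check with the numbers above: total shared memory (ipcs)
# vs. memory actually backed by huge pages (/proc/meminfo).
shm_bytes=369394520064                   # sum of the ipcs -m segment sizes
used_pages=$((180000 - 86029))           # HugePages_Total - HugePages_Free
hp_bytes=$((used_pages * 2048 * 1024))   # Hugepagesize is 2048 kB
echo "shared memory: $((shm_bytes / 1024 / 1024 / 1024)) GiB"
echo "huge pages:    $((hp_bytes / 1024 / 1024 / 1024)) GiB"
```

This prints roughly 344 GiB of shared memory against 183 GiB backed by huge pages, so a large part of the SGAs had to be sitting in regular 4k pages.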

I compared the situation with the other node in the cluster: it had more memory allocated by the databases (because of the higher load on it), higher huge page usage, and lower 4k page consumption overall.

$ sh mem.sh
DB12 : 78678000
DB22 : 14220000
DB32 : 14287528
DB42 : 12369352
DB52 : 14868596
DB62 : 14633984
DB82 : 54316104
DB92 : 86148332
DBA2 : 61473288
DBB2 : 68678788
DBC2 : 9831288
DBD2 : 64759352
DB72 : 68114604
total :  562379216

$ free
              total        used        free      shared  buff/cache   available
Mem:      528076056   402288800    17100464     5818032   108686792   114351784
Swap:      16760828       47360    16713468

$ cat /proc/meminfo | grep Huge
AnonHugePages:     10240 kB
HugePages_Total:   176654
HugePages_Free:    15557
HugePages_Rsvd:    15557
HugePages_Surp:        0
Hugepagesize:       2048 kB

So I wondered whether all the databases were properly allocating their SGA in huge pages or not.

This Red Hat page was quite useful for putting together a quick snippet that checks the huge page allocation per process:

# [ oracle@oraserver1:/home/oracle [10:55:27] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ cat /proc/707/numa_maps | grep -i hug
60000000 default file=/SYSV00000000\040(deleted) huge dirty=1 mapmax=57 N0=1 kernelpagesize_kB=2048
70000000 default file=/SYSV00000000\040(deleted) huge dirty=1525 mapmax=57 N0=743 N1=782 kernelpagesize_kB=2048
c60000000 interleave:0-1 file=/SYSV0b46df00\040(deleted) huge dirty=1 mapmax=57 N0=1 kernelpagesize_kB=2048


# [ oracle@oraserver1:/home/oracle [10:56:39] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ function pshugepage () {
> HUGEPAGECOUNT=0
> for num in `grep 'huge.*dirty=' /proc/$@/numa_maps | awk '{print $5}' | sed 's/dirty=//'` ; do
> HUGEPAGECOUNT=$((HUGEPAGECOUNT+num))
> done
> echo process $@ using $HUGEPAGECOUNT huge pages
> }
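The function above assumes dirty= is always the fifth field of numa_maps, which holds for the output shown but is not guaranteed in general. A slightly more defensive variant (a sketch) scans every field, and takes an optional file argument so it can be exercised against saved numa_maps output:

```shell
# Sketch: robust rewrite of pshugepage that scans all fields of
# numa_maps instead of assuming dirty= is the 5th column.
pshugepage() {
  # $1 = pid; $2 (optional) = alternate maps file, useful for testing
  file="${2:-/proc/$1/numa_maps}"
  awk -v pid="$1" '
    /huge.*dirty=/ {
      for (i = 1; i <= NF; i++)
        if (sub(/^dirty=/, "", $i)) n += $i   # keep only the page count
    }
    END { print "process", pid, "using", n + 0, "huge pages" }
  ' "$file"
}
```

Fed the three numa_maps lines shown above for pid 707, it reports the same 1527 huge pages (1 + 1525 + 1).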

# [ oracle@oraserver1:/home/oracle [10:57:09] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ pshugepage 707
process 707 using 1527 huge pages


# [ oracle@oraserver1:/home/oracle [10:57:11] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ for pid in `ps -eaf | grep [p]mon | awk '{print $2}'` ; do pshugepage $pid ; done
process 707 using 1527 huge pages
process 3685 using 2409 huge pages
process 16092 using 3056 huge pages
process 55718 using 0 huge pages
process 58490 using 0 huge pages
process 70583 using 0 huge pages
process 94479 using 1135 huge pages
process 98216 using 0 huge pages
process 98755 using 0 huge pages
process 100245 using 0 huge pages
process 100265 using 0 huge pages
process 100270 using 0 huge pages
process 101681 using 0 huge pages
process 179079 using 1699 huge pages
process 189585 using 14566 huge pages

It was easy to spot the databases not using huge pages at all:

# [ oracle@oraserver1:/home/oracle [10:58:26] [19.3.0.0.0 [GRID] SID=GRID] 0 ] #
$ ps -eaf | grep [p]mon
oracle      707      1  0 Sep30 ?        00:23:55 ora_pmon_DB12
oracle     3685      1  0 Nov01 ?        00:09:17 ora_pmon_DB22
oracle    16092      1  0 Oct15 ?        00:04:15 ora_pmon_DB32
oracle    55718      1  0 Aug12 ?        00:08:25 asm_pmon_+ASM2
oracle    58490      1  0 Aug12 ?        00:08:24 apx_pmon_+APX2
oracle    70583      1  0 Aug12 ?        00:57:55 ora_pmon_DB42
oracle    94479      1  0 Oct02 ?        00:32:03 ora_pmon_DB52
oracle    98216      1  0 Aug12 ?        00:58:36 ora_pmon_DB62
oracle    98755      1  0 Aug12 ?        00:59:27 ora_pmon_DB82
oracle   100245      1  0 Aug12 ?        00:56:52 ora_pmon_DB92
oracle   100265      1  0 Aug12 ?        00:51:54 ora_pmon_DBA2
oracle   100270      1  0 Aug12 ?        00:54:57 ora_pmon_DBB2
oracle   101681      1  0 Aug12 ?        00:56:55 ora_pmon_DBC2
oracle   179079      1  0 Sep10 ?        00:35:17 ora_pmon_DBD2
oracle   189585      1  0 Nov01 ?        00:09:34 ora_pmon_DB72

Indeed, after stopping them, the huge page usage did not change:

# [ oracle@oraserver1:/home/oracle [11:01:52] [11.2.0.4.0 [DBMS EE] SID=DB62] 1 ] #
$ srvctl stop instance -d DB6_SITE1 -i DB62

# [ oracle@oraserver1:/home/oracle [11:02:24] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl stop instance -d DB4_SITE1 -i DB42

# [ oracle@oraserver1:/home/oracle [11:03:29] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl stop instance -d DB8_SITE1 -i DB82

# [ oracle@oraserver1:/home/oracle [11:06:36] [11.2.0.4.0 [DBMS EE] SID=DB62] 130 ] #
$ srvctl stop instance -d DB9_SITE1 -i DB92

# [ oracle@oraserver1:/home/oracle [11:07:16] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl stop instance -d DBA_SITE1 -i DBA2

# [ oracle@oraserver1:/home/oracle [11:07:56] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl stop instance -d DBB_SITE1 -i DBB2

# [ oracle@oraserver1:/home/oracle [11:08:42] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl stop instance -d DBC_SITE1 -i DBC2

# [ oracle@oraserver1:/home/oracle [11:09:16] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ cat /proc/meminfo | grep Huge
HugePages_Total:   180000
HugePages_Free:    86029
HugePages_Rsvd:    11507
HugePages_Surp:        0
Hugepagesize:       2048 kB

But after starting them back up, I could see new huge pages being reserved and allocated:

# [ oracle@oraserver1:/home/oracle [11:10:35] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DB6_SITE1 -i DB62

# [ oracle@oraserver1:/home/oracle [11:12:14] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DB4_SITE1 -i DB42

# [ oracle@oraserver1:/home/oracle [11:12:54] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DB8_SITE1 -i DB82

# [ oracle@oraserver1:/home/oracle [11:13:41] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DB9_SITE1 -i DB92

# [ oracle@oraserver1:/home/oracle [11:14:43] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DBA_SITE1 -i DBA2

# [ oracle@oraserver1:/home/oracle [11:15:25] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DBB_SITE1 -i DBB2

# [ oracle@oraserver1:/home/oracle [11:15:54] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ srvctl start instance -d DBC_SITE1 -i DBC2

# [ oracle@oraserver1:/home/oracle [11:17:49] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ cat /proc/meminfo | grep Huge
HugePages_Total:   180000
HugePages_Free:    72820
HugePages_Rsvd:    68961
HugePages_Surp:        0
Hugepagesize:       2048 kB

# [ oracle@oraserver1:/home/oracle [11:17:54] [11.2.0.4.0 [DBMS EE] SID=DB62] 0 ] #
$ free
              total        used        free      shared  buff/cache   available
Mem:      528076056   392011828   123587116     5371848    12477112   126250868
Swap:      16760828      587308    16173520

The root cause: the server had initially been booted without huge pages configured, a few instances were started at that point, and the huge pages were set only afterwards. Those instances had allocated their SGA in regular 4k pages and kept it there until they were restarted.
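Combining the pieces above, a quick check after any maintenance (a sketch built from the same numa_maps parsing) can flag instances whose pmon maps no huge pages at all:

```shell
# Sketch: warn about any running instance whose pmon process maps
# zero huge pages (i.e. its SGA likely sits in regular 4k pages).
for pid in $(ps -eo pid=,cmd= | awk '/[p]mon/ { print $1 }'); do
  n=$(awk '/huge.*dirty=/ { for (i = 1; i <= NF; i++) if (sub(/^dirty=/, "", $i)) s += $i }
           END { print s + 0 }' "/proc/$pid/numa_maps" 2>/dev/null)
  [ "${n:-0}" -eq 0 ] && echo "WARNING: pid $pid maps no huge pages"
done
```

The `[p]mon` bracket trick keeps the awk process itself out of the match, the same way the grep in the transcripts above excludes itself.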

HTH

Ludovico

 

