
Moving Clusterware Interconnect from single NIC/Bond to HAIP

Very recently I had to move a customer's RAC private interconnect from bonding to HAIP in order to get the benefit of both NICs.

So I would like to recap here what the steps would look like if you need to do the same.

In this example I’ll switch from a single-NIC interconnect (eth1) rather than from a bond configuration, so if you are familiar with the RAC Attack! environment you can try to put everything in place on your own.

First, you need to plan the new network configuration in advance, keeping in mind that there are a couple of important restrictions:

  1. Your interconnect interface naming must be uniform on all nodes in the cluster. The interconnect uses the interface name in its configuration and it doesn't support different names on different hosts.
  2. You must bind the different private interconnect interfaces to different subnets (see Note 1481481.1 – 11gR2 CSS Terminates/Node Eviction After Unplugging one Network Cable in Redundant Interconnect Environment if you need an explanation). A quick check for both points is sketched right after this list.
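Here is a minimal sketch of such a check; it assumes root SSH equivalence between the nodes and the Grid home path of the RAC Attack lab, so adapt both to your environment:

[root@collabn1 ~]# for n in collabn1 collabn2; do echo "### $n"; ssh $n /u01/app/11.2.0/grid/bin/oifcfg iflist -p -n; done

The interface names returned must be identical on both nodes, and each private interface must sit in its own subnet.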

 

Implementation 

The RAC Attack book uses one interface per node for the interconnect (eth1, on network 172.16.100.0).

To make things a little more complex, we'll not use eth1 in the new HAIP configuration, so we'll also test the deletion of the old interface.

What you need to do is add two new interfaces (host-only in VirtualBox) and configure them as eth2 and eth3, e.g. on networks 172.16.101.0 and 172.16.102.0:

eth2      Link encap:Ethernet  HWaddr 08:00:27:32:76:DD
          inet addr:172.16.101.51  Bcast:172.16.101.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe32:76dd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2044 (1.9 KiB)  TX bytes:1714 (1.6 KiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:2E:05:4B
          inet addr:172.16.102.61  Bcast:172.16.102.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe2e:54b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1140 (1.1 KiB)  TX bytes:720 (720.0 b)
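For reference, this is roughly what the configuration file behind eth2 on collabn1 could look like (a sketch: device name, address and the optional HWADDR must match your own VirtualBox NICs):

# /etc/sysconfig/network-scripts/ifcfg-eth2 on collabn1 (sketch)
DEVICE=eth2
BOOTPROTO=static
IPADDR=172.16.101.51
NETMASK=255.255.255.0
ONBOOT=yes

The same applies to eth3 (network 172.16.102.0) and to the corresponding files on collabn2.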

 

Modify /var/named/racattack in order to add the new addresses (RAC doesn't care about logical names, it's just for our convenience):

collabn1 A 192.168.78.51
collabn1-vip A 192.168.78.61
collabn1-priv A 172.16.100.51
collabn1-priv1 A 172.16.101.51
collabn1-priv2 A 172.16.102.61
collabn2 A 192.168.78.52
collabn2-vip A 192.168.78.62
collabn2-priv A 172.16.100.52
collabn2-priv1 A 172.16.101.52
collabn2-priv2 A 172.16.102.62

Also add the reverse lookup entries in in-addr.arpa:

51.101.16.172 PTR collabn1-priv1.racattack.
61.102.16.172 PTR collabn1-priv2.racattack.
52.101.16.172 PTR collabn2-priv1.racattack.
62.102.16.172 PTR collabn2-priv2.racattack.
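One detail that is easy to forget: bump the serial number in the SOA record of the zones you touch, otherwise a secondary copy of the zone (if your setup keeps one) won't pick up the change. A minimal sketch of the relevant record (the actual SOA of your racattack zone will look different):

@ IN SOA collabn1.racattack. root.racattack. (
        2014020201 ; serial - increment at every change
        3600       ; refresh
        900        ; retry
        604800     ; expire
        86400 )    ; negative cache TTL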

 

Restart named on the first node and check that both nodes can ping all the new names correctly:

[root@collabn1 named]# ping collabn2-priv1
PING collabn2-priv1.racattack (172.16.101.52) 56(84) bytes of data.
64 bytes from 172.16.101.52: icmp_seq=1 ttl=64 time=1.27 ms
64 bytes from 172.16.101.52: icmp_seq=2 ttl=64 time=0.396 ms
^C
--- collabn2-priv1.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1293ms
rtt min/avg/max/mdev = 0.396/0.835/1.275/0.440 ms
[root@collabn1 named]# ping collabn2-priv2
PING collabn2-priv2.racattack (172.16.102.62) 56(84) bytes of data.
64 bytes from 172.16.102.62: icmp_seq=1 ttl=64 time=0.924 ms
64 bytes from 172.16.102.62: icmp_seq=2 ttl=64 time=0.251 ms
^C
--- collabn2-priv2.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1480ms
rtt min/avg/max/mdev = 0.251/0.587/0.924/0.337 ms
[root@collabn1 named]# ping collabn1-priv2
PING collabn1-priv2.racattack (172.16.102.61) 56(84) bytes of data.
64 bytes from 172.16.102.61: icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from 172.16.102.61: icmp_seq=2 ttl=64 time=0.032 ms
^C
--- collabn1-priv2.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1240ms
rtt min/avg/max/mdev = 0.019/0.025/0.032/0.008 ms
[root@collabn1 named]# ping collabn1-priv1
PING collabn1-priv1.racattack (172.16.101.51) 56(84) bytes of data.
64 bytes from 172.16.101.51: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 172.16.101.51: icmp_seq=2 ttl=64 time=0.060 ms
^C
--- collabn1-priv1.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1224ms
rtt min/avg/max/mdev = 0.017/0.038/0.060/0.022 ms

check the nodes that compose the cluster:

[root@collabn1 network-scripts]# olsnodes -s
collabn1 Active
collabn2 Active

On all nodes, make a copy of the gpnp profile.xml (just in case; the oifcfg tool does the copy automatically):

$ cd $GRID_HOME/gpnp/`hostname`/profiles/peer/
$ cp -p profile.xml profile.xml.bk

List the available networks:

[root@collabn1 bin]# ./oifcfg iflist -p -n
eth0 192.168.78.0 PRIVATE 255.255.255.0
eth1 172.16.100.0 PRIVATE 255.255.255.0
eth1 169.254.0.0 UNKNOWN 255.255.0.0
eth2 172.16.101.0 PRIVATE 255.255.255.0
eth3 172.16.102.0 PRIVATE 255.255.255.0

Get the current IP configuration for the interconnect:

[root@collabn1 bin]# ./oifcfg getif
eth0 192.168.78.0 global public
eth1 172.16.100.0 global cluster_interconnect

On one node only, set the new interconnect interfaces:

[root@collabn1 network-scripts]# oifcfg setif -global eth2/172.16.101.0:cluster_interconnect
[root@collabn1 network-scripts]# oifcfg setif -global eth3/172.16.102.0:cluster_interconnect
[root@collabn1 network-scripts]# oifcfg getif
eth0 192.168.78.0 global public
eth1 172.16.100.0 global cluster_interconnect
eth2 172.16.101.0 global cluster_interconnect
eth3 172.16.102.0 global cluster_interconnect

Check that the other node has received the new configuration:

[root@collabn2 bin]# ./oifcfg getif
eth0 192.168.78.0 global public
eth1 172.16.100.0 global cluster_interconnect
eth2 172.16.101.0 global cluster_interconnect
eth3 172.16.102.0 global cluster_interconnect

Before deleting the old interface, it would be sensible to stop your cluster resources (in some cases, one of the nodes may be evicted). In any case, the cluster must be restarted completely in order to get the new interfaces working.
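For example, a minimal way to stop the database beforehand (a sketch, assuming the database name orcl used in the RAC Attack lab):

[oracle@collabn1 ~]$ srvctl stop database -d orcl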

Note: having three interfaces in a HAIP interconnect works perfectly well; HAIP supports from two to four interfaces. I'm showing how to delete eth1 just for information! :-)

[root@collabn1 network-scripts]# oifcfg delif -global eth1/172.16.100.0
[root@collabn1 network-scripts]# oifcfg getif
eth0 192.168.78.0 global public
eth2 172.16.101.0 global cluster_interconnect
eth3 172.16.102.0 global cluster_interconnect

On all nodes, shut down the CRS:

[root@collabn1 network-scripts]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'collabn1'
...

Now you can disable the old interface:

[root@collabn1 network-scripts]# ifdown eth1

and set ONBOOT=no in the configuration script of the eth1 interface.
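Something like this would do it (a sketch, assuming the standard location of the network scripts); the same applies to the second node:

[root@collabn1 ~]# sed -i 's/^ONBOOT=yes/ONBOOT=no/' /etc/sysconfig/network-scripts/ifcfg-eth1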

Start the cluster again:

[root@collabn1 network-scripts]# crsctl start crs

And check that the resources are up & running:

# crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.LISTENER.lsnr
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.asm
ONLINE ONLINE collabn1 Started
ONLINE ONLINE collabn2 Started
ora.gsd
OFFLINE OFFLINE collabn1
OFFLINE OFFLINE collabn2
ora.net1.network
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.ons
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE collabn2
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE collabn1
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE collabn1
ora.collabn1.vip
1 ONLINE ONLINE collabn1
ora.collabn2.vip
1 ONLINE ONLINE collabn2
ora.cvu
1 ONLINE ONLINE collabn1
ora.oc4j
1 ONLINE ONLINE collabn1
ora.orcl.db
1 ONLINE ONLINE collabn1 Open
2 ONLINE ONLINE collabn2 Open
ora.scan1.vip
1 ONLINE ONLINE collabn2
ora.scan2.vip
1 ONLINE ONLINE collabn1
ora.scan3.vip
1 ONLINE ONLINE collabn1
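Once the stack is back, you can also verify that the HAIP addresses (169.254.x.x) are now bound to the new interfaces; a minimal check (a sketch):

[root@collabn1 ~]# ifconfig | grep -B1 "169.254"

You should see one 169.254.x.x alias on each of eth2 and eth3 (GV$CLUSTER_INTERCONNECTS reports the same addresses from the database side).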

 

Testing the high availability

Disconnect the cable from one of the two interfaces (virtually, if you're in VirtualBox :-) ).
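If you want to script the "cable pull" from the VirtualBox host instead of using the GUI, something like this should work (a sketch: the adapter number depends on the order of the NICs in your VM settings; set the link state back to on to "reconnect the cable"):

$ VBoxManage controlvm collabn1 setlinkstate3 off
$ VBoxManage controlvm collabn1 setlinkstate3 on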

Pay attention to the NO-CARRIER status (on eth2 in this example):

# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:07:33:94 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 08:00:27:7f:b4:88 brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
link/ether 08:00:27:51:1d:78 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:39:86:f2 brd ff:ff:ff:ff:ff:ff

check that the CRS is still up & running:

# crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.LISTENER.lsnr
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.asm
ONLINE ONLINE collabn1 Started
ONLINE ONLINE collabn2 Started
ora.gsd
OFFLINE OFFLINE collabn1
OFFLINE OFFLINE collabn2
ora.net1.network
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
ora.ons
ONLINE ONLINE collabn1
ONLINE ONLINE collabn2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE collabn2
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE collabn1
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE collabn1
ora.collabn1.vip
1 ONLINE ONLINE collabn1
ora.collabn2.vip
1 ONLINE ONLINE collabn2
ora.cvu
1 ONLINE ONLINE collabn1
ora.oc4j
1 ONLINE ONLINE collabn1
ora.orcl.db
1 ONLINE ONLINE collabn1 Open
2 ONLINE ONLINE collabn2 Open
ora.scan1.vip
1 ONLINE ONLINE collabn2
ora.scan2.vip
1 ONLINE ONLINE collabn1
ora.scan3.vip
1 ONLINE ONLINE collabn1

 

The virtual interface eth2:1 has failed over to the second interface as eth3:2:

eth3:1    Link encap:Ethernet  HWaddr 08:00:27:39:86:F2
          inet addr:169.254.185.134  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3:2    Link encap:Ethernet  HWaddr 08:00:27:39:86:F2
          inet addr:169.254.104.52  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

 

After the cable is reconnected, the virtual interface is back on eth2:

eth2:1    Link encap:Ethernet  HWaddr 08:00:27:51:1D:78
          inet addr:169.254.104.52  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

 

Further information

For this post I've used RAC version 11.2, but RAC 12c uses the very same procedure.

You can read more about HAIP here:

http://docs.oracle.com/cd/E11882_01/server.112/e10803/config_cw.htm#HABPT5279 

And here about how to set it up (besides this post!):

https://docs.oracle.com/cd/E11882_01/rac.112/e41959/admin.htm#CWADD90980

https://docs.oracle.com/cd/E11882_01/rac.112/e41959/oifcfg.htm#BCGGEFEI

 

Cheers

Ludo

