Chapter 7. Oracle 10gR2 Clusterware
Although this procedure is documented in the Oracle installation manuals, in Metalink notes, and elsewhere, it is consolidated here so that this manual can be used as the main reference for a successful installation. A good supplementary Oracle article on RAC installations can be found here:
http://www.oracle.com/technology/pub/articles/smiley_rac10g_install.html
All four RAC nodes need to be up and running, and in the CS4 cluster. All GFS volumes that will be used for this Oracle install should be mounted on all four nodes. At a minimum, the GFS volume (/mnt/ohome) that will contain the shared installation must be mounted:
Filesystem                     1K-blocks      Used  Available Use% Mounted on
/dev/mapper/redo1-log1           4062624        20    4062604   1% /mnt/log1
/dev/mapper/redo2-log2           4062368        20    4062348   1% /mnt/log2
/dev/mapper/redo3-log3           4062624        20    4062604   1% /mnt/log3
/dev/mapper/redo4-log4           4062368        20    4062348   1% /mnt/log4
/dev/mapper/common-ohome         6159232        20    6159212   1% /mnt/ohome
/dev/mapper/oradata-datafiles   50193856        40   50193816   1% /mnt/datafiles
The certified configuration of Oracle 10g on GFS requires that the two Clusterware files be located on shared raw partitions that are visible to all RAC nodes in the cluster. The GULM lock server nodes do not need access to these files. These partitions are usually located on a small LUN that is not used for any other purpose.
The LUN /dev/sda should be large enough to hold two 256MB partitions. Using fdisk on /dev/sda, create two primary partitions:
rac1 # fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content will not be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1011, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1011, default 1011): +256M

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         483      250405   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (484-1011, default 484):
Using default value 484
Last cylinder or +size or +sizeM or +sizeK (484-1011, default 1011):
Using default value 1011

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         483      250405   83  Linux
/dev/sda2             484        1011      273768   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
If the other nodes were already up and running while you created these partitions, those nodes must re-read the partition table from disk (blockdev --rereadpt /dev/sda).
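For example, a minimal way to do this (assuming rac2, rac3, and rac4 were already running) is to execute the following as root on each of those nodes:

blockdev --rereadpt /dev/sda
fdisk -l /dev/sda          # confirm that /dev/sda1 and /dev/sda2 are now visible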
Make sure the rawdevices service is enabled on all four RAC nodes for the run level that will be used. This example enables it for run levels 3 and 5. Run:
rac1 # chkconfig --level 35 rawdevices on
The raw-to-block device mapping is defined in the file /etc/sysconfig/rawdevices:
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 /dev/sda2
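To apply the new bindings immediately without a reboot, the rawdevices service can simply be restarted as root on each RAC node:

rac1 # service rawdevices restart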
These raw device files must always be owned by the oracle user that installs the software (in this example, oracle). A 10 second delay is needed to ensure that the rawdevices service has had a chance to configure the /dev/raw directory. Add the following lines to the /etc/rc.local file, which is symbolically linked to /etc/rc?.d/S99local:
echo "Sleep a bit first and then set the permissions on raw" sleep 10 chown oracle:dba /dev/raw/raw?
If, after installing Clusterware, you see a set of three /tmp/crsctl.<pid> trace files, then Clusterware did not start; these files will contain an error message, usually complaining about permissions. Make sure the /dev/raw/raw? files are owned by the oracle user (in this example, oracle:dba).
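A quick way to check both the bindings and the ownership on each node (the exact output will vary) is:

rac1 $ raw -qa                               # lists the current raw-to-block bindings
rac1 $ ls -l /dev/raw/raw1 /dev/raw/raw2     # both should show oracle:dba ownership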
All four RAC nodes should have the same kernel tuning settings in /etc/sysctl.conf:
#
# Oracle specific settings
# x86 Huge Pages are 2MB
#
#vm.hugetlb_pool = 3000
#
kernel.shmmax = 4047483648
kernel.shmmni = 4096
kernel.shmall = 1051168
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 65536
#
# This is for Oracle RAC core GCS services
#
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
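After editing /etc/sysctl.conf, the settings can be applied on a running node without a reboot, for example:

rac1 # sysctl -p               # reload all settings from /etc/sysctl.conf
rac1 # sysctl kernel.shmmax    # spot-check a single value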
The parameter that most often needs to be modified to support larger SGAs is the shared memory setting, kernel.shmmax. Typically 75% of the memory in a node should be allocated to the SGA. This assumes a modest number of Oracle foreground processes, which consume physical memory when allocating the PGA (Oracle Process Global Area). The PGA is typically used for sorting. On a 4GB system, a 3GB SGA is recommended. The amounts of memory consumed by the SGA and the PGA are very workload-dependent.
The maximum size of the SGA on a 64-bit version of RHEL4 is currently slightly less than 128GB. The maximum size of the SGA on a 32-bit version of RHEL4 varies a bit. The standard size is 1.7GB. If the oracle binary is lower mapped, this maximum can be increased to 2.5GB on -smp kernels and 3.7GB on -hugemem kernels. Lower mapping is an Oracle-approved linking technique that changes the address where the SGA attaches in the user address space. When the attach address is lowered, there is more space available for attaching a larger shared memory segment. See Metalink Doc 260152.1.
Another strategy for extending the SGA to 8GB and higher in a 32-bit environment is the /dev/shm in-memory filesystem, although this is not recommended. If you need that much SGA, using the 64-bit versions of Oracle and RHEL4 is a better approach.
The net.core.* parameters establish the UDP buffers that will be used by the Oracle Global Cache Services (GCS) for heartbeats and inter-node communication (including the movement of Oracle buffers). For large SGAs (more than 16GB), the use of HugeTLBs is recommended.
TLBs, or Translation Lookaside Buffers, are the working end of a Page Table Entry (PTE). The hardware speaks in physical addresses, whereas processes running in user mode, including the SGA, speak only in process virtual addresses (PVAs). These addresses have to be translated, and modern CPUs provide TLB register space so that the translation does not cause extra memory references on every memory load.
By default, the page size on x86 hardware is 4K. When configuring a large SGA (16GB or more), roughly 4,000,000 4K PTEs (or TLB slots) are required just to map the SGA into a user process's address space. HugeTLBs are a mechanism in RHEL that permits the use of 2MB hardware pages, which greatly reduces the number of PTEs required to map the SGA. The performance improvement increases with the size of the SGA, but is typically between 10% and 30%.
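As a rough illustration of the arithmetic (the 16GB SGA here is only an example), and a simple check of the huge page pool once it has been configured:

# Mapping a 16GB SGA with 4K pages:  16GB / 4KB = 4,194,304 PTEs
# Mapping a 16GB SGA with 2MB pages: 16GB / 2MB =     8,192 PTEs
# After the huge page pool has been set up, verify the totals:
grep -i huge /proc/meminfo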
During RHEL installation, 4GB of swap was set up and the Oracle Installer will check for this minimum.
You have to create a user (typically oracle or oinstall). The user name is somewhat arbitrary, but the DBAs might insist that it be one of these two; the group, however, must be dba (a minimal user-creation sketch follows the sudoers example below). Configure the /etc/sudoers file so that the oracle admin users can safely execute root commands, which is required during and after the install:
# User alias specification
User_Alias SYSAD=oracle, oinstall
User_Alias USERADM=oracle, oinstall

# User privilege specification
SYSAD    ALL=(ALL) ALL
USERADM  ALL=(root) NOPASSWD:/usr/local/etc/yanis.client
root     ALL=(ALL) ALL
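If the oracle account does not already exist, a minimal sketch for creating it on each node (using the user and group names from this example) is shown below. The numeric UID and GID should be identical on all four nodes so that files on the shared GFS volumes show consistent ownership.

rac1 # groupadd dba
rac1 # useradd -g dba -m oracle
rac1 # passwd oracle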
You have to ensure that whenever Clusterware talks to other nodes in the cluster, the ssh commands proceed unimpeded and without extraneous session dialog. To confirm that a connection pathway is set up correctly, run:
rac1 $ ssh rac2 date
Wed May 10 21:48:02 PDT 2006
The command should return only the date, and not any extra strings or prompts, such as:
rac1 $ ssh rac2 date
oracle@rac2's password:

OR

The authenticity of host 'rac2 (192.168.1.151)' can't be established.
RSA key fingerprint is 48:e5:e0:84:63:62:03:84:c7:57:05:6b:58:7d:12:07.
Are you sure you want to continue connecting (yes/no)?
Create a ~/.ssh/authorized_keys file, distribute it to all four nodes, and then execute ssh hostname date to every host in the RAC cluster, in all combinations over both the primary and heartbeat interfaces. If you miss any one of them, the Oracle Clusterware installer will fail at the node verification step.
On rac1, log in as the oracle user and make sure $HOME/.ssh is empty. Do not supply a passphrase for the keygen command; just press Return. Run:
rac1 $ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
9e:98:88:5c:17:bc:1f:dc:05:33:21:cf:04:99:23:e1 oracle@rac1
Repeat this step on all four RAC nodes (it is not required on the GULM lock servers), collect all of the ~/.ssh/id_dsa.pub files, including rac1's own, into one ~/.ssh/authorized_keys file, and distribute it to the other three nodes:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac3 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac4 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys rac2:~/.ssh
scp ~/.ssh/authorized_keys rac3:~/.ssh
scp ~/.ssh/authorized_keys rac4:~/.ssh
Run all combinations from all nodes for both PUBLIC and PRIVATE networks (including the node where you are currently executing):
rac1 $ ssh rac1 date
rac1 $ ssh rac1-priv date
rac1 $ ssh rac2 date
rac1 $ ssh rac2-priv date
rac1 $ ssh rac3 date
rac1 $ ssh rac3-priv date
rac1 $ ssh rac4 date
rac1 $ ssh rac4-priv date
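A small loop (hostnames assumed from this example) can save some typing; repeat it as the oracle user on each of the four nodes so that every combination has been accepted at least once:

for h in rac1 rac1-priv rac2 rac2-priv rac3 rac3-priv rac4 rac4-priv
do
    echo "== $h =="
    ssh $h date
done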
Download the Clusterware and Database installation materials from OTN (Oracle Technology Network), as this is where the current base releases for all platforms are located. These are gzip-compressed cpio archives. Create a local installer directory on node 1 (/home/oracle/inst) and then expand the archives:
gunzip -c 10201_clusterware_linux_x86_64.cpio.gz | cpio -ivdm &>log1 &
gunzip -c 10201_database_linux_x86_64.cpio.gz | cpio -ivdm &>log2 &
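A sketch of the surrounding steps on node 1 (paths as used in this example; the extracted directory names are the usual ones for these archives):

rac1 $ mkdir -p /home/oracle/inst
rac1 $ cd /home/oracle/inst
# ...run the two gunzip | cpio commands shown above...
rac1 $ ls          # the clusterware/ and database/ directories should now exist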
The installer can be run from any filesystem mounted on node1.
The Clusterware can be installed on each node locally or on the shared home. This is a production maintenance decision. A single shared Clusterware home is clearly less complex, but it requires the entire cluster to shut down when you do a Clusterware upgrade. Node-local Clusterware gives you the ability to do rolling upgrades, but with some added maintenance cost. This sample cluster will perform a single shared Clusterware install, so the directories should be created and given the correct ownership prior to running the installer:
rac1 $ sudo mkdir /mnt/ohome/oracle
rac1 $ sudo chown oracle:dba /mnt/ohome/oracle
In this example, the remote host where the X windows will appear is called adminws. For X11, xhost + must be executed on adminws from any session running on that system. A shell window on adminws will log in to rac1 and must have the DISPLAY environment variable set either upon login or in some profile:
rac1 $ export DISPLAY=adminws:0.0
Run xclock to make sure that the X11 clock program appears on the adminws desktop.
Although you can have ORACLE_BASE and ORACLE_HOME pre-set in the oracle user profile prior to running the installer, it is not mandatory. In our case, they point to the shared Oracle home location on a 6GB GFS volume. The installer will detect these values if they are set:
export ORACLE_BASE=/mnt/ohome/oracle/1010
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
The script /home/oracle/inst/clusterware/rootpre/rootpre.sh checks whether a previous version of Clusterware has been installed. Once this script executes successfully, it is safe to start the Clusterware installer:
/home/oracle/inst/clusterware/runInstaller
********************************************************************************
Please run the script rootpre.sh as root on all machines/nodes. The script can
be found at the top level of the local installer directory. Once you have run
the script, please type Y to proceed

Answer 'y' if root has run 'rootpre.sh' so you can proceed with Oracle
Clusterware installation.
Answer 'n' to abort installation and then ask root to run 'rootpre.sh'.
********************************************************************************

Has 'rootpre.sh' been run by root? [y/n] (n)
y

Starting Oracle Universal Installer...
Verify that $ORACLE_BASE/oraInventory is located on the shared GFS volume (/mnt/ohome). If you want an inventory on each node for CRS or the RDBMS, you would need to type in a node-local directory (/opt/oracle/1010/oraInventory), but you have to ensure the directory is created and owned by the oracle user before you click Next.
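For example, if you did opt for a node-local inventory, the directory mentioned above would have to be prepared on every node first, along these lines:

rac1 $ sudo mkdir -p /opt/oracle/1010/oraInventory
rac1 $ sudo chown oracle:dba /opt/oracle/1010/oraInventory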
This screen's default path needs to be changed, because it wants to put the CRS home inside ORACLE_HOME. This install is a single, shared CRS install, so the path is on the shared GFS volume. The name was simplified to just crs. Click Next.
The prerequisite checks run, and since the preparation work in /etc/sysctl.conf has already been done, no errors or warnings are expected.
Click Next.
Click Next.
Next, the other three nodes need to be added to the cluster configuration. All of these hosts must be defined in /etc/hosts on all nodes.
Click OK.
The completed configuration screen should contain all four nodes.
Click Next.
This is the step that fails if any part of the ssh hostname date setup was not performed correctly.
If /etc/hosts, ~/.ssh/authorized_keys, and ~/.ssh/known_hosts are all properly set up, the installer should proceed to the next screen. Fully qualified hostnames can sometimes cause confusion, so the public network hostnames entered into the Clusterware installer must match the string returned by the hostname command. Otherwise, go back and verify the entire matrix of ssh hostname date calls to make sure all of these paths are clean. Often the self-referential ones are missed (ssh rac1 date from rac1 itself).
Edit the eth0 fabric, change the interface type to Public, and click Next.
Click OK.
Click Next.
Assign the quorum voting and registry files. The external redundancy option is chosen because the files reside on a storage array that implements redundancy.
The quorum vote disk will be located on /dev/raw/raw2. Once again, external redundancy is chosen. Click Next.
The next screen is the Install Summary screen. Click Install.
The installer starts to install, link and copy. This process typically takes less than 10 minutes depending on the performance of the CPU and the filesystem.
This screen prompts for two sets of scripts to be run on all four nodes. Run the orainstRoot.sh script first on each node, in order.
rac1 $ sudo /mnt/ohome/oracle/1010/oraInventory/orainstRoot.sh
Password:
Changing permissions of /mnt/ohome/oracle/1010/oraInventory to 770.
Changing groupname of /mnt/ohome/oracle/1010/oraInventory to dba.
The execution of the script is complete
The script /mnt/ohome/oracle/1010/product/crs/root.sh must be run on every node, one at a time, starting with rac1. You must wait until this script completes successfully on a given node before you execute it on the next node. The script can take several minutes to complete per node, so be patient. It will initialize the Clusterware files, configure RHEL to run the Oracle Clusterware services (including appending entries to /etc/inittab), and then start these services. Only the first execution of this script will initialize the registry and quorum disk files.
rac1 $ sudo /mnt/ohome/oracle/1010/product/crs/root.sh
WARNING: directory '/mnt/ohome/oracle/1010/product' is not owned by root
WARNING: directory '/mnt/ohome/oracle/1010' is not owned by root
WARNING: directory '/mnt/ohome/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/mnt/ohome/oracle/1010/product' is not owned by root
WARNING: directory '/mnt/ohome/oracle/1010' is not owned by root
WARNING: directory '/mnt/ohome/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
node 3: rac3 rac3-priv rac3
node 4: rac4 rac4-priv rac4
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        rac1
CSS is inactive on these nodes.
        rac2
        rac3
        rac4
Local node checking complete.
Run /mnt/ohome/oracle/1010/product/crs/root.sh on the remaining nodes. As this script executes on the other nodes, the last few lines should change to indicate that more nodes are active. These last few lines are from the command crsctl check install:
CSS is active on these nodes.
        rac1
        rac2
CSS is inactive on these nodes.
        rac3
        rac4
Local node checking complete.
If successful, the completion of the script on the fourth node should indicate that CSS is running on all four nodes:
CSS is active on these nodes.
        rac1
        rac2
        rac3
        rac4
CSS is active on all nodes.
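As an optional check on each node, you can confirm that the Clusterware entries were appended to /etc/inittab and that the daemons are running (the process names shown are the usual ones for 10gR2):

rac1 $ grep -E 'init.(cssd|crsd|evmd)' /etc/inittab
rac1 $ ps -ef | grep -E 'crsd.bin|ocssd.bin|evmd.bin' | grep -v grep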
Return to the main installer screen and click OK. Most of the verification and installation checks should pass.
If not, or if this pop-up occurs, then it is likely that the CRS application registration failed to start up. This is usually because the tool was not found in the path, but it can be fixed by running the vipca utility from rac1 once you quit the installer. Click OK on the pop-up and Next on the Configuration Assistants screen.
The crs_stat command will display any registered CRS resources. There are currently none, so the vipca utility will need to be executed next.
rac1 $ crs_stat -t
CRS-0202: No resources are registered.
The environment variable ORA_CRS_HOME should be added to the oracle user profile, and vipca must run as root.
rac1 $ export ORA_CRS_HOME=/mnt/ohome/oracle/1010/product/crs
rac1 $ sudo $ORA_CRS_HOME/bin/vipca
Click Next on this window and the next one. Then the hostnames mapping window appears:
Fill in the first IP Alias name and press Tab. The tool should fill in the rest.
Click Next and a summary screen appears. Click OK.
The final window should be:
Click Exit and then rerun the status command.
rac1 $ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
ora.rac3.gsd   application    ONLINE    ONLINE    rac3
ora.rac3.ons   application    ONLINE    ONLINE    rac3
ora.rac3.vip   application    ONLINE    ONLINE    rac3
ora.rac4.gsd   application    ONLINE    ONLINE    rac4
ora.rac4.ons   application    ONLINE    ONLINE    rac4
ora.rac4.vip   application    ONLINE    ONLINE    rac4
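As a final sanity check, an individual resource can also be queried by name, for example:

rac1 $ crs_stat ora.rac1.vip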