Thursday, March 7, 2013

Part 2: Command Categories, Configuration, and Basic Commands

by Arup Nanda

In Part 1, we learned about the composition of the Oracle Exadata Database Machine and its various components. Figure 1 shows the different components again and what types of commands are used in each.
Figure 1 Command categories
  • Linux commands - Let’s start with the lowest-level component – the physical disk. The physical disk, as you learned in the previous installment, is the actual disk drive. It has to be partitioned before it can be used for ASM and regular filesystems. Normal disk management commands, e.g. fdisk, belong here. The storage cells are Linux servers, so all the regular server administration commands – shutdown, ps, etc. – are relevant here. (For a refresher on Linux commands, you can check out my five-part series on advanced Linux commands.)
  • CellCLI - Let’s move on to the next layer in the software stack: the Exadata Storage Server. To manage it, Oracle provides a command-line tool: CellCLI (Cell Command Line Interpreter). All cell-related commands are entered through CellCLI.
  • DCLI - A CellCLI command applies only to the cell where it is run, not to other cells. Sometimes you may want to execute a command across multiple cells from one command prompt, e.g. to shut down multiple nodes. There is another command-line tool for that: DCLI.
  • SQL – Once the cell disks are made available to the database nodes, the rest of the work is similar to what happens in a typical Oracle RAC database, in the language you use every day: SQL. SQL*Plus is an interface many DBAs use. You can also use other interfaces such as Oracle SQL Developer. If you have Grid Control, there are lots of commands you don’t even need to remember; they will be GUI based.
  • ASMCMD – ASMCMD is the command-line interface for managing ASM resources such as diskgroups, backups, etc.
  • SRVCTL – SRVCTL is a command-line interface for managing Oracle Database 11.2 RAC clusters. At the database level, most cluster-related commands, e.g. starting and stopping cluster resources, checking status, etc., can be run through this interface.
  • CRSCTL – CRSCTL is another tool to manage clusters. As of 11.2, the need to use this tool has dwindled to near zero. But there is at least one command in this category.

These are the basic categories of the commands. Of these, only CellCLI and DCLI are Exadata-specific. The rest, especially SQL, should be very familiar to DBAs.

Now that you know how narrow the scope of the commands is, do you feel a bit more relaxed? In the next sections we will see how these commands are used. (Note: Since CellCLI and DCLI are Exadata-specific commands, most DBAs making the transition to DMA are not expected to know about them. The next installment of the series – Part 3 – focuses on these two command categories exclusively.)

Configuration

Let’s start with the most exciting part: Your shiny new Exadata Database Machine is here, uncrated, mounted on the floorboards and connected to power. Now what?
Fortunately, the machine comes pre-imaged with all the necessary OS, software and drivers. There is no reason to tinker with the software installation. In fact, it’s not only unnecessary but dangerous as well, since it may void the warranty. You should not install any software on storage cells at all, and only the following on the database servers themselves:
  • Grid Control Agent (required for management through Grid Control, explained in Part 4)
  • RMAN Media Management Library (to back up to tape)
  • Security Agent (if needed)
You are itching to push that button, aren’t you? But wait; before you start the configuration you have to have the following information handy:
  • Network – you should decide what names you will use for the servers, decide on IP addresses, have them in DNS, etc.
  • SMTP and SNMP information - for sending mails, alerts, etc.
  • Storage layout to address your specific requirements – for instance do you want Normal or High Redundancy, how many diskgroups do you want, what do you want to name them, etc.?
Once all these are done, here are the rough steps:
  1. Storage configuration
  2. OS configuration
  3. Creation of userids in Linux or Oracle Solaris
  4. ASM configuration
  5. Clusterware installation
  6. Database creation
Let’s examine the steps. Please note that, with several models, capacity classes, and types of hardware available, it is not possible to provide details about every possible combination. Your specific environment may be unique as well.

The following section shows a sample configuration and should be followed as an illustration only. For simplicity, the OS covered here is Oracle Linux.

Configuration Worksheet

Oracle provides a detailed configuration worksheet that lets you enter the specific details of your implementation and decide on the exact configuration. This worksheet is found on the Exadata storage server in the following directory:

/opt/oracle/cell/doc/doc
The exact file you want to open is e16099.pdf, which contains all the worksheets that guide you through the configuration. Here is an excerpt from the worksheet:
Figure 2 Worksheet excerpt

The configuration worksheet creates the following files in the directory /opt/oracle.SupportTools/onecommand. Here is a listing of that directory:
# ls
all_group           cell_group     config.dat    patches
all_ib_group        cell_ib_group  dbs_group     priv_ib_group
all_nodelist_group  checkip.sh     dbs_ib_group  tmp

These files are very important. Here is a brief description of each file:
File Name Description
all_group – List of database nodes and storage cells in this Exadata Database Machine. Here is an excerpt:

proldb01
proldb02
proldb03
proldb04


These are the database server nodes.
all_ib_group – All host names of the private interconnects, both of cell servers and database nodes. Here is an excerpt from this file:

proldb01-priv
proldb02-priv
proldb03-priv
proldb04-priv
proldb05-priv
all_nodelist_group – All host names (public, hosts, private interconnects) of both storage and database nodes. Here is an excerpt from this file:

proldb07
proldb08
prolcel01
prolcel02
prolcel03
cell_group – Host names of all cell servers. Here is an excerpt from this file:

prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
cell_ib_group – Host names of the private interconnects of all cell servers. Here is an excerpt from this file:

prolcel01-priv
prolcel02-priv
prolcel03-priv
prolcel04-priv
prolcel05-priv
config.dat – The data file that is created from the configuration worksheet and is used to create the various scripts. Here is an excerpt from this file:

customername=AcmeBank
dbmprefix=prol
cnbase=db
cellbase=cel
machinemodel=X2-2 Full rack
dbnodecount=8
cellnodecount=14
dbs_group – Host names of the database nodes, similar to the file for the cell servers. Here is an excerpt from the file:

proldb01
proldb02
proldb03
proldb04
dbs_ib_group – Host names of the private interconnects of the database nodes, similar to the file for the cell servers. Here is an excerpt from the file:

proldb01-priv
proldb02-priv
proldb03-priv
proldb04-priv
priv_ib_group – All private interconnect host names and their corresponding IP addresses are listed in this file. It is used to populate the /etc/hosts file. Here is an excerpt from the file:

### Compute Node Private Interface details
172.32.128.1    proldb01-priv.test.prol proldb01-priv
172.32.128.2    proldb02-priv.test.prol proldb02-priv
172.32.128.3    proldb03-priv.test.prol proldb03-priv
172.32.128.4    proldb04-priv.test.prol proldb04-priv
checkip.sh – This is a shell script to validate the accuracy of the network configuration. This is one of the most important files. As you will see, the checkip script is called at several points, with different parameters, to perform validation.
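Since every later step reads these group files, it can be worth cross-checking them before you go further. The sketch below is only an illustration: the function name missing_priv_names is hypothetical, and it assumes, as the excerpts above suggest, that each private interconnect name is simply the host name with a -priv suffix. It works on throwaway sample files, not on the real ones.

```shell
#!/bin/sh
# Hypothetical helper: print every host in a group file that has no
# matching "<host>-priv" entry in the corresponding *_ib_group file.
# Assumes the "-priv" naming convention seen in the excerpts above.
missing_priv_names() {
    group_file=$1
    ib_file=$2
    while IFS= read -r host; do
        # Skip blank lines.
        [ -n "$host" ] || continue
        # -x matches the whole line, -F treats the name as a fixed string.
        grep -qxF "${host}-priv" "$ib_file" || echo "$host"
    done < "$group_file"
}

# Demo with throwaway copies of sample data (not the real files).
tmp=$(mktemp -d)
printf '%s\n' proldb01 proldb02 proldb03 > "$tmp/dbs_group"
printf '%s\n' proldb01-priv proldb03-priv > "$tmp/dbs_ib_group"
missing_priv_names "$tmp/dbs_group" "$tmp/dbs_ib_group"   # prints proldb02
rm -rf "$tmp"
```

An empty result means every host has a private interconnect name; anything printed deserves a second look at the worksheet.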


Hardware Profile

The next thing to do is to check the hardware profile. Oracle provides a tool for that as well. This is the command you should use:
# /opt/oracle.SupportTools/CheckHWnFWProfile
The output should be:
[SUCCESS] The hardware and firmware profile matches one of the supported profiles
If you see something different here, the message should be self-explanatory. The right thing to do at this point is to call up Exadata installation support since some hardware/software combination is not as expected.
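If you run this check on many servers, you may prefer a yes/no answer to reading each message. Here is a minimal sketch, assuming only that a good result contains the [SUCCESS] marker shown above; the function name profile_ok is made up for this example, and the demo feeds it a captured sample line rather than calling the real tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the text on stdin contains the
# "[SUCCESS]" marker shown in the expected output above.
profile_ok() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -qF '[SUCCESS]'
}

# Demo against a captured sample line; on a real server you would pipe
# the output of the CheckHWnFWProfile command into the function instead.
sample='[SUCCESS] The hardware and firmware profile matches one of the supported profiles'
if printf '%s\n' "$sample" | profile_ok; then
    echo "profile check passed"
else
    echo "profile check did not report SUCCESS; contact installation support"
fi
```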

Physical Disks

Next, you should check the disks to make sure they are up and online. Online does not mean they are available to ASM; it simply means the disks are visible to the server. To check that the disks are visible and online, use this command:
# /opt/MegaRAID/MegaCli/MegaCli64 Pdlist -aAll |grep "Slot \|Firmware"
Here is truncated output:
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
… Output truncated …
Slot Number: 11
Firmware state: Online, Spun Up

If a disk is not online, you may want to replace it or at least understand the reason.
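With twelve slots per cell and many cells, scanning this output by eye gets tedious. The following sketch shows one way to print only the slots that are not online. It assumes the two-line Slot Number / Firmware state format shown above; the function name offline_slots and the "Failed" sample line are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "Slot Number" / "Firmware state" line pairs
# (the filtered MegaCli64 output shown above) from stdin and print the
# slots whose firmware state does not begin with "Online".
offline_slots() {
    awk -F': ' '
        /^Slot Number/    { slot = $2 }
        /^Firmware state/ { if ($2 !~ /^Online/) print slot ": " $2 }
    '
}

# Demo on sample text; the "Failed" line is invented for illustration.
offline_slots <<'EOF'
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Failed
EOF
# prints: 1: Failed
```

On a storage cell you would pipe the output of the MegaCli64 command above into offline_slots; no output means every disk reported Online.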

Flash Disks

After checking physical disks you should check flash disks. The Linux command for that is lsscsi, shown below.

# lsscsi |grep -i marvel
[1:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdm
[1:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn
[1:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo
[1:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp
[2:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq
[2:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdr
[2:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sds
[2:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdt
[3:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdu
[3:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdv
[3:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdw
[3:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdx
[4:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdy
[4:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdz
[4:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdaa
[4:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdab

By the way, you can also check the flash disks from the CellCLI tool. The CellCLI tool is explained in detail in the next installment in this series.
# cellcli
CellCLI: Release 11.2.2.2.0 - Production on Sun Mar 13 12:57:24 EDT 2011
Copyright (c) 2007, 2009, Oracle.  All rights reserved.
Cell Efficiency Ratio: 627M

CellCLI> list lun where disktype=flashdisk
         1_0     1_0     normal
         1_1     1_1     normal
         1_2     1_2     normal
         1_3     1_3     normal
         2_0     2_0     normal
         2_1     2_1     normal
         2_2     2_2     normal
         2_3     2_3     normal
         4_0     4_0     normal
         4_1     4_1     normal
         4_2     4_2     normal
         4_3     4_3     normal
         5_0     5_0     normal
         5_1     5_1     normal
         5_2     5_2     normal
         5_3     5_3     normal

To make sure the numbering of the flash disks is correct, use the following command in CellCLI. Note that there is a hyphen (“-”) at the end of the first line: the command is too long to fit on one line, and “-” is the continuation character.
CellCLI> list physicaldisk attributes name, id, slotnumber -
> where disktype="flashdisk" and status != "not present"

         [1:0:0:0]       5080020000f21a2FMOD0    "PCI Slot: 4; FDOM: 0"
         [1:0:1:0]       5080020000f21a2FMOD1    "PCI Slot: 4; FDOM: 1"
         [1:0:2:0]       5080020000f21a2FMOD2    "PCI Slot: 4; FDOM: 2"
         [1:0:3:0]       5080020000f21a2FMOD3    "PCI Slot: 4; FDOM: 3"
         [2:0:0:0]       5080020000f131aFMOD0    "PCI Slot: 1; FDOM: 0"
         [2:0:1:0]       5080020000f131aFMOD1    "PCI Slot: 1; FDOM: 1"
         [2:0:2:0]       5080020000f131aFMOD2    "PCI Slot: 1; FDOM: 2"
         [2:0:3:0]       5080020000f131aFMOD3    "PCI Slot: 1; FDOM: 3"
         [3:0:0:0]       5080020000f3ec2FMOD0    "PCI Slot: 5; FDOM: 0"
         [3:0:1:0]       5080020000f3ec2FMOD1    "PCI Slot: 5; FDOM: 1"
         [3:0:2:0]       5080020000f3ec2FMOD2    "PCI Slot: 5; FDOM: 2"
         [3:0:3:0]       5080020000f3ec2FMOD3    "PCI Slot: 5; FDOM: 3"
         [4:0:0:0]       5080020000f3e16FMOD0    "PCI Slot: 2; FDOM: 0"
         [4:0:1:0]       5080020000f3e16FMOD1    "PCI Slot: 2; FDOM: 1"
         [4:0:2:0]       5080020000f3e16FMOD2    "PCI Slot: 2; FDOM: 2"
         [4:0:3:0]       5080020000f3e16FMOD3    "PCI Slot: 2; FDOM: 3"
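In the listing above, each PCI slot shows four flash modules (FDOM 0 through 3), so a card with fewer would stand out. As a rough sketch, you could tally the modules per slot like this; the function name fdoms_per_slot is hypothetical, and the parsing assumes the "PCI Slot: n; FDOM: m" text appears exactly as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: count flash modules (FDOMs) per PCI slot from the
# "list physicaldisk" output shown above, read on stdin.
fdoms_per_slot() {
    awk '
        match($0, /PCI Slot: [0-9]+/) {
            # Skip the 10-character "PCI Slot: " prefix to keep the number.
            slot = substr($0, RSTART + 10, RLENGTH - 10)
            count[slot]++
        }
        END { for (s in count) print "PCI slot " s ": " count[s] " FDOMs" }
    ' | sort
}

# Demo on three lines copied from the output above.
fdoms_per_slot <<'EOF'
         [1:0:0:0]       5080020000f21a2FMOD0    "PCI Slot: 4; FDOM: 0"
         [1:0:1:0]       5080020000f21a2FMOD1    "PCI Slot: 4; FDOM: 1"
         [2:0:0:0]       5080020000f131aFMOD0    "PCI Slot: 1; FDOM: 0"
EOF
# prints:
# PCI slot 1: 1 FDOMs
# PCI slot 4: 2 FDOMs
```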

Auto-configuration

While it is possible to configure Exadata Database Machine manually, you don’t need to. In fact, you may not want to. Oracle provides three shell scripts for automatic configuration in the directory /opt/oracle.SupportTools/onecommand (these steps may change in later versions):
  • checkip.sh – for checking the configuration at various stages
  • applyconfig.sh – to change the configuration
  • deploy112.sh – for final deployment

First, you should check the configuration for validity. To do that, execute:

# ./checkip.sh -m pre_applyconfig
Exadata Database Machine Network Verification version 1.9
Network verification mode pre_applyconfig starting ...
Saving output file from previous run as dbm.out_17739
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN  : SUCCESS
Processing section NAME    : SUCCESS
Processing section NTP     : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN    : ERROR - see dbm.out for details
Processing section COMPUTE : ERROR - see dbm.out for details
Processing section CELL    : ERROR - see dbm.out for details
Processing section ILOM    : ERROR - see dbm.out for details
Processing section SWITCH  : ERROR - see dbm.out for details
Processing section VIP     : ERROR - see dbm.out for details
Processing section SMTP    : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS

One or more checks report ERROR. Review dbm.out for details
If you check the file dbm.out, you can see the exact error messages.

Running in mode pre_applyconfig
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups

Processing section DOMAIN
test.prol

Processing section NAME
GOOD : xx.xxx.59.21 responds to resolve request for proldb01.test.prol
GOOD : xx.xxx.59.22 responds to resolve request for proldb01.test.prol

Processing section NTP
GOOD : xx.xxx.192.1 responds to time server query (/usr/sbin/ntpdate -q)

Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully
GOOD : xx.xxx.18.1 pings successfully

Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.32
GOOD : xx.xxx.18.32 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.32 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.33
GOOD : xx.xxx.18.33 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.33 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.34
GOOD : xx.xxx.18.34 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.34 pings

Processing section COMPUTE
GOOD : proldb01.test.prol forward resolves to xx.xxx.192.16
GOOD : xx.xxx.192.16 reverse resolves to proldb01.test.prol.
ERROR : xx.xxx.192.16 pings
GOOD : proldb02.test.prol forward resolves to xx.xxx.192.17
GOOD : xx.xxx.192.17 reverse resolves to proldb02.test.prol.
ERROR : xx.xxx.192.17 pings
GOOD : proldb03.test.prol forward resolves to xx.xxx.192.18
GOOD : xx.xxx.192.18 reverse resolves to proldb03.test.prol.
ERROR : xx.xxx.192.18 pings
… output truncated …
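The dbm.out file can be long, and the ERROR lines are scattered among the GOOD ones. A small filter such as the one sketched below can pull out just the errors and label each with its section. It assumes the "Processing section NAME" and "ERROR : ..." layout shown above; the function name errors_by_section is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read dbm.out-style text on stdin and print each
# ERROR line prefixed with the section it belongs to.
errors_by_section() {
    awk '
        /^Processing section/ { section = $3; next }
        /^ERROR/              { print section ": " $0 }
    '
}

# Demo on a few lines taken from the excerpt above.
errors_by_section <<'EOF'
Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully

Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
ERROR : xx.xxx.18.32 pings
EOF
# prints: SCAN: ERROR : xx.xxx.18.32 pings
```

Against the real file, the call would be errors_by_section < dbm.out.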

The dbm.out file lists every issue that must be addressed. After addressing all of them, execute the actual configuration:

# applyconfig.sh
After it completes, connect the Exadata Database Machine to your network and check the configuration again:

# ./checkip.sh -m post_applyconfig
The script reports its results in the same manner as in pre_applyconfig mode, including any issues found. After fixing the issues, run the deployment script. That script executes a series of numbered steps – 0 through 29 in the list below. The most prudent thing to do is to list all the steps first so that you can become familiar with them. The option -l (that’s the letter “l”, not the numeral “1”) displays the list of steps.

# deploy112.sh -l
To run all the steps, issue:

# deploy112.sh -i
If you prefer, you can run the steps one by one, or in groups. To run steps 1 through 3, issue:

# deploy112.sh -i -r 1-3
Or, to run only step 1:

# deploy112.sh -i -s 1
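Running one step at a time lets you inspect each step's log before moving on. The sketch below only prints the commands for a range of steps so that you can review them first; it does not run anything. The function name plan_steps is hypothetical, and it uses only the -i and -s options described above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one deploy112.sh command per step,
# using only the -i and -s options described above.
plan_steps() {
    first=$1
    last=$2
    step=$first
    while [ "$step" -le "$last" ]; do
        echo "./deploy112.sh -i -s $step"
        step=$((step + 1))
    done
}

plan_steps 1 3
# prints:
# ./deploy112.sh -i -s 1
# ./deploy112.sh -i -s 2
# ./deploy112.sh -i -s 3
```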
The steps are listed here. (Please note: the steps can change without notice. The most up-to-date list will always be found in the release notes that come with an Exadata box.)
Step  Description
0     Validate this server setup.
1     Set up SSH for the root user.
2     Validate all nodes.
3     Unzip files.
4     Update the /etc/hosts file.
5     Create the cellip.ora and cellinit.ora files.
6     Validate the hardware.
7     Validate the InfiniBand network.
8     Validate the cells.
9     Check RDS using the ping command.
10    Run the CALIBRATE command.
11    Validate the time and date.
12    Update the configuration.
13    Create the user accounts for celladmin and cellmonitor.
14    Set up SSH for the user accounts.
15    Create the Oracle home directories.
16    Create the grid disks.
17    Install the grid software.
18    Run the grid root scripts.
19    Install the Oracle Database software.
20    Create the listener.
21    Run Oracle ASM configuration assistant to configure Oracle ASM.
22    Unlock the Oracle Grid Infrastructure home directory.
23    Relink Reliable Data Socket (RDS) protocol.
24    Lock Oracle Grid Infrastructure.
25    Set up e-mail alerts for Exadata Cells.
26    Run Oracle Database Configuration Assistant.
27    Set up Oracle Enterprise Manager Grid Control.
28    Apply any security fixes.
29    Secure Oracle Exadata Database Machine.


Here is the output of the script (amply truncated at places to conserve space):

# ./deploy112.sh -i
Script started, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
=========== 0 ValidateThisNodeSetup Begin ===============
Validating first boot...
This step will validate DNS, NTS, params.sh, dbmachine.params, and all the
files generated by the DB Machine Configurator
In Check and Fix Hosts...

INFO: This nslookup could take upto ten seconds to resolve if the host isn't in DNS, please wait..
INFO: Running /usr/bin/nslookup prol-scan...
INFO: Running /usr/bin/nslookup proldb02...
SUCCESS: SCAN and VIP found in DNS...
Looking up nodes in dbmachine.params and dbs_group...
SUCCESS: proldb01 has ip address of xx.xxx.192.16..A_OK
SUCCESS: proldb02 has ip address of xx.xxx.192.17..A_OK
… output truncated …
SUCCESS: proldb08 has ip address of xx.xxx.192.23..A_OK
SUCCESS: prol01-vip has ip address of xx.xxx.18.24..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
SUCCESS: prol02-vip has ip address of xx.xxx.18.25..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
… output truncated …
SUCCESS: prol08-vip has ip address of xx.xxx.18.31..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
Checking blocksizes...
SUCCESS: DB blocksize is 16384 checks out
checking patches
checking patches and version = 11202
SUCCESS: Located patch# 10252487 in /opt/oracle.SupportTools/onecommand/patches...
INFO: Checking zip files
INFO: Validating zip file /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip...
Archive:  /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  11-16-10 03:10   database/
        0  11-16-10 03:03   database/install/
      182  11-16-10 03:03   database/install/detachHome.sh
… output truncated …
    41092  11-16-10 03:03   database/doc/install.112/e17212/concepts.htm
     1892  11-16-10 03:03   database/doc/install.112/e17212/contents.js
    44576  11-16-10 03:03   database/doc/install.112/e17212/crsunix.htm
ERROR: /usr/bin/unzip -l /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip did not complete successfully: Return Status: 80 Step# 1
Exiting...
Time spent in step 1  = 1 seconds
INFO: Going to run /opt/oracle.cellos/ipconf /opt/oracle.SupportTools/onecommand/preconf-11-2-1-2-2.csv -verify -ignoremismatch -verbose to validate first boot...
INFO: Running /opt/oracle.cellos/ipconf -verify -ignoremismatch -verbose on this node...
Verifying of configuration for /opt/oracle.cellos/cell.conf
Config file exists                                                : PASSED
Load configuration                                                : PASSED
Config version defined                                            : PASSED
Config version 11.2.2.1.1 has valid value                         : PASSED
Nameserver xx.xxx.59.21 has valid IP address syntax               : PASSED
Nameserver xx.xxx.59.22 has valid IP address syntax               : PASSED
Canonical hostname defined                                        : PASSED
Canonical hostname has valid syntax                               : PASSED
Node type defined                                                 : PASSED
Node type db is valid                                             : PASSED
This node type is db                                              : PASSED
Timezone defined                                                  : PASSED
Timezone found in /usr/share/zoneinfo                             : PASSED
NTP server xx.xxx.192.1 has valid syntax                          : PASSED
NTP drift file defined                                            : PASSED
Network eth0 interface defined                                    : PASSED
IP address defined for eth0                                       : PASSED
IP address has valid syntax for eth0                              : PASSED
Netmask defined for eth0                                          : PASSED
Netmask has valid syntax for eth0                                 : PASSED
Gateway has valid syntax for eth0                                 : PASSED
Gateway is inside network for eth0                                : PASSED
Network type defined for eth0                                     : PASSED
Network type has proper value for eth0                            : PASSED
Hostname defined for eth0                                         : PASSED
Hostname for eth0 has valid syntax                                : PASSED
Network bondeth0 interface defined                                : PASSED
IP address defined for bondeth0                                   : PASSED
IP address has valid syntax for bondeth0                          : PASSED
Netmask defined for bondeth0                                      : PASSED
Netmask has valid syntax for bondeth0                             : PASSED
Gateway has valid syntax for bondeth0                             : PASSED
Gateway is inside network for bondeth0                            : PASSED
Network type defined for bondeth0                                 : PASSED
Network type has proper value for bondeth0                        : PASSED
Hostname defined for bondeth0                                     : PASSED
Hostname for bondeth0 has valid syntax                            : PASSED
Slave interfaces for bondeth0 defined                             : PASSED
Two slave interfaces for bondeth0 defined                         : PASSED
Master interface ib0 defined                                      : PASSED
Master interface ib1 defined                                      : PASSED
Network bondib0 interface defined                                 : PASSED
IP address defined for bondib0                                    : PASSED
IP address has valid syntax for bondib0                           : PASSED
Netmask defined for bondib0                                       : PASSED
Netmask has valid syntax for bondib0                              : PASSED
Network type defined for bondib0                                  : PASSED
Network type has proper value for bondib0                         : PASSED
Hostname defined for bondib0                                      : PASSED
Hostname for bondib0 has valid syntax                             : PASSED
Slave interfaces for bondib0 defined                              : PASSED
Two slave interfaces for bondib0 defined                          : PASSED
At least 1 configured Eth or bond over Eth interface(s) defined   : PASSED
2 configured Infiniband interfaces defined                        : PASSED
1 configured bond over ib interface(s) defined                    : PASSED
ILOM hostname defined                                             : PASSED
ILOM hostname has valid syntax                                    : PASSED
ILOM short hostname defined                                       : PASSED
ILOM DNS search defined                                           : PASSED
ILOM full hostname matches short hostname and DNS search          : PASSED
ILOM IP address defined                                           : PASSED
ILOM IP address has valid syntax                                  : PASSED
ILOM Netmask defined                                              : PASSED
ILOM Netmask has valid syntax                                     : PASSED
ILOM Gateway has valid syntax                                     : PASSED
ILOM Gateway is inside network                                    : PASSED
ILOM nameserver has valid IP address syntax                       : PASSED
ILOM use NTP servers defined                                      : PASSED
ILOM use NTP has valid syntax                                     : PASSED
ILOM first NTP server has non-empty value                         : PASSED
ILOM first NTP server has valid syntax                            : PASSED
ILOM timezone defined                                             : PASSED
Done. Config OK
INFO: Printing group files....
######################################################
This is the list of Database nodes...
proldb01
… output truncated …
proldb08
This is the list of Cell nodes...
prolcel01
… output truncated …
prolcel14

This is the list of Database Private node names...
proldb01-priv
… output truncated …
proldb08-priv

This is the list of Cell Private node names...
prolcel01-priv
… output truncated …
prolcel14-priv

This is the list all node names...
proldb01
… output truncated …
prolcel14

This is the list all private node names...
proldb01-priv
… output truncated …
prolcel14-priv

This is the template /etc/hosts file for private nodes...
### Compute Node Private Interface details
172.32.128.1    proldb01-priv.test.prol proldb01-priv
… output truncated …
172.32.128.8    proldb08-priv.test.prol proldb08-priv

### CELL Node Private Interface details
172.32.128.9    prolcel01-priv.test.prol        prolcel01-priv
… output truncated …
172.32.128.22   prolcel14-priv.test.prol        prolcel14-priv

### Switch details
# The following 5 IP addresses are for reference only. You may
# not be able to reach these IP addresses from this machine
# xx.xxx.192.60 prolsw-kvm.test.prol    prolsw-kvm
# xx.xxx.192.61 prolsw-ip.test.prol     prolsw-ip
# xx.xxx.192.62 prolsw-ib1.test.prol    prolsw-ib1
# xx.xxx.192.63 prolsw-ib2.test.prol    prolsw-ib2
# xx.xxx.192.64 prolsw-ib3.test.prol    prolsw-ib3
Creating work directories and validating  required files
ERROR: Please review and fix all ERROR's, we appear to have 1 errors...
Exiting...
Time spent in step 0 ValidateThisNodeSetup = 1 seconds
Script done, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
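In a log this long, the one line that matters is easy to miss; in the run above, the ERROR from the unzip check is buried among hundreds of SUCCESS and PASSED lines. Here is a rough sketch of a filter for a step log. It assumes the two line shapes seen above ("ERROR: ..." messages and "description : STATUS" checks); the function name log_problems is invented, and FAILED is a made-up status used only to exercise the second rule.

```shell
#!/bin/sh
# Hypothetical helper: read a step log on stdin and print
#   - every line that starts with "ERROR:", and
#   - every "description : STATUS" check whose status is not PASSED.
log_problems() {
    awk '
        /^ERROR:/                       { print; next }
        / : [A-Z]+$/ && $NF != "PASSED" { print }
    '
}

# Demo on invented sample lines in the same shape as the log above.
log_problems <<'EOF'
SUCCESS: DB blocksize is 16384 checks out
ERROR: sample failure message
Config file exists                                                : PASSED
Load configuration                                                : FAILED
EOF
# prints the ERROR line and the FAILED line
```

You would feed it the STEP log file named at the top and bottom of the script output.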

Finally, check the post-deployment configuration of the IP addresses:
# ./checkip.sh -m post_deploy112

Exadata Database Machine Network Verification version 1.9

Network verification mode post_deploy112 starting ...
Saving output file from previous run as dbm.out_772
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN  : SUCCESS
Processing section NAME    : SUCCESS
Processing section NTP     : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN    : SUCCESS
Processing section COMPUTE : SUCCESS
Processing section CELL    : SUCCESS
Processing section ILOM    : SUCCESS
Processing section SWITCH  : SUCCESS
Processing section VIP     : SUCCESS
Processing section SMTP    : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS

If everything comes back OK, your installation and configuration were successful.

Basic Commands

Power

Let’s start with the very first commands you will need: those for powering servers on and off. The tool for powering on is ipmitool. To power on a cell or database server, issue this from another server:

# ipmitool -H prolcel01-ilom -U root chassis power on
IPMI – short for Intelligent Platform Management Interface – is an interface standard that allows one server to be managed remotely from another through a standardized interface. The servers in the Exadata Database Machine follow that standard. ipmitool is not an Exadata command but a general Linux one. To get all the options available, execute:

# ipmitool -h
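If you need to power on several servers, you can generate the commands from a list of host names instead of typing each one. The sketch below prints the commands without running them. The function name power_on_commands is hypothetical, and the -ilom suffix is an assumption based on the single example above (prolcel01-ilom); check your own ILOM host names before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an ipmitool power-on command for
# each host named on stdin. The "-ilom" suffix is an assumption based on
# the single example above (prolcel01-ilom).
power_on_commands() {
    while IFS= read -r host; do
        # Skip blank lines.
        [ -n "$host" ] || continue
        echo "ipmitool -H ${host}-ilom -U root chassis power on"
    done
}

printf '%s\n' prolcel01 prolcel02 | power_on_commands
# prints:
# ipmitool -H prolcel01-ilom -U root chassis power on
# ipmitool -H prolcel02-ilom -U root chassis power on
```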
To stop a server, use the shutdown command. To stop immediately and keep it down, i.e. not reboot, execute:

# shutdown -h -y now
To shut down after 10 minutes (the users will get a warning message):

# shutdown -h -y 10
To reboot the server (the “-r” option is for reboot):

# shutdown -r -y now
Or, a simple:

# reboot
Sometimes you may want to shut down multiple servers. The DCLI command comes in handy then. To shut down all the cells, execute the command:

# dcli -l root -g all_cells shutdown -h -y now
The -g option lets you specify a file that lists the target servers. For instance, all_cells is a file like the one shown below:

# cat all_cells
prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
prolcel06
prolcel07
prolcel08

You could use a similar file for all database servers and name it all_nodes. To shut down all database servers:

# dcli -l root -g all_nodes shutdown -h -y now
You will learn the DCLI command in detail in the next installment.
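Because the host names are numbered sequentially, a file like all_cells does not have to be typed by hand. As a small sketch, assuming the two-digit numbering used above (the function name make_group is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: print sequentially numbered host names, one per
# line, in the two-digit style used above (prolcel01, prolcel02, ...).
make_group() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

make_group prolcel 8    # prints prolcel01 through prolcel08
# To save the list for use with dcli -g:  make_group prolcel 8 > all_cells
```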

Maintenance

From time to time you will need to maintain the servers. (Remember, you are the DMA now, not the DBA.) One of the most common tasks is installing new software images. Let’s see some of the related commands.

To learn what software image is installed, use the following:

# imageinfo
Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_101206.2
Cell rpm version: cell-11.2.2.2.0_LINUX.X64_101206.2-1

Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback: Impossible
Cell boot usb partition: /dev/sdac1
Cell boot usb version: 11.2.2.2.0.101206.2

Inactive image version: undefined
Rollback to the inactive partitions: Impossible

You can glean some important information from the output above. Note the line Active image version: 11.2.2.2.0.101206.2, which indicates the specific Exadata Storage Server version. The output also shows the date and time the software image was activated, which can help in troubleshooting: if you see problems starting at a specific date and time, you may be able to correlate them with an image activation.
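If you want to compare image versions across servers, it helps to extract a single field rather than read the whole report. Here is a minimal sketch, assuming the "label: value" layout shown above; the function name image_field is hypothetical, and the demo parses captured text instead of calling imageinfo.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "label: value" field from
# imageinfo-style text on stdin.
image_field() {
    label=$1
    awk -v label="$label" '
        # Match only lines that begin with the exact label and ": ".
        index($0, label ": ") == 1 { print substr($0, length(label) + 3) }
    '
}

# Demo on lines copied from the output above.
image_field "Active image version" <<'EOF'
Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
EOF
# prints: 11.2.2.2.0.101206.2
```

Matching from the start of the line keeps "Inactive image version" from being picked up when you ask for "Active image version".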

On the heels of the above, the next logical question is: if a new image was installed (activated), what was the version before it? To find out the history of all the image changes, use the imagehistory command.

# imagehistory
Version                              : 11.2.2.2.0.101206.2
Image activation date                : 2011-01-21 14:09:21 -0800
Imaging mode                         : fresh
Imaging status                       : success

This is a fresh install, so you don’t see much history.

Managing InfiniBand

For the newly minted DMA nothing is as rattling as the networking commands. It’s like being given a stick-shift car when all you have ever driven is an automatic. As a DBA you probably didn’t have to execute anything other than ifconfig and netstat. Well, they still apply, so don’t forget them. But let’s see how to extend that knowledge to InfiniBand.

Status

The first step is to check the status of the InfiniBand devices. To do that, use the ibstatus command.

# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01a0:fd45
        base lid:        0x1a
        sm lid:          0xc
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01a0:fd46
        base lid:        0x1c
        sm lid:          0xc
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
… output truncated …
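
With several ports per server, scanning this output by eye gets tedious. Here is a minimal sketch that prints only the ports whose logical state is something other than ACTIVE. It assumes the output layout shown above, and ib_ports_not_active is a hypothetical helper name.

```shell
# Hypothetical helper: read ibstatus output on stdin and print one line for
# every port whose logical state is anything other than ACTIVE.
ib_ports_not_active() {
    awk '
        # Remember which device and port the following lines belong to.
        /^Infiniband device/ { port = $3 " port " $5 }
        # Match the "state:" line only, not the "phys state:" line.
        $1 == "state:" && $NF != "ACTIVE" { print port, "state=" $NF }
    '
}
```

Run as ibstatus | ib_ports_not_active, it prints nothing when every port is healthy, which makes it easy to use in a quick check.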

If it comes out OK, the next step is to check the status of the InfiniBand links, using the iblinkinfo command. The output below is truncated to save space.

# iblinkinfo
Switch 0x0021286cd6ffa0a0 Sun DCS 36 QDR switch prolsw-ib1.test.prol:
           1    1[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1    2[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
           1   17[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   18[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   19[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      12   32[  ] "Sun DCS 36 QDR switch localhost" ( )
           1   20[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   21[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   32[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
… output truncated …
           1   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
Switch 0x0021286cd6eba0a0 Sun DCS 36 QDR switch localhost:
          12    1[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      43    2[  ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
          12   11[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          12   12[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      17    2[  ] "proldb04 S 172.32.128.4 HCA-1" ( )
… output truncated …
          12   18[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   17[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
          12   19[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      20    1[  ] "prolcel13 C 172.32.128.21 HCA-1" ( )
… output truncated …
          12   29[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          12   30[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       6    1[  ] "proldb05 S 172.32.128.5 HCA-1" ( )
          12   31[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   31[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
          12   32[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       1   19[  ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
          12   33[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
          12   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
Switch 0x0021286ccc72a0a0 Sun DCS 36 QDR switch prolsw-ib2.test.prol:
          11    1[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      42    1[  ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
          11   10[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      14    1[  ] "proldb02 S 172.32.128.2 HCA-1" ( )
          11   11[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
          11   28[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       3    2[  ] "proldb07 S 172.32.128.7 HCA-1" ( )
          11   29[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   30[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       7    2[  ] "proldb05 S 172.32.128.5 HCA-1" ( )
          11   31[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      12   31[  ] "Sun DCS 36 QDR switch localhost" ( )
          11   32[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       1   21[  ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
          11   33[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   34[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   35[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
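
The full iblinkinfo listing has one line per switch port, so a per-switch summary can be easier to read. The sketch below counts the links reported as Active under each switch. It assumes the line layout shown above (a "Switch ..." header followed by that switch's ports), and active_links_per_switch is a hypothetical helper name.

```shell
# Hypothetical helper: read iblinkinfo output on stdin and print each
# switch name followed by the number of its links reported as Active.
active_links_per_switch() {
    awk '
        /^Switch/ {
            name = $NF
            sub(/:$/, "", name)        # drop the trailing colon
            order[++n] = name          # keep switches in the order seen
            count[name] = 0
        }
        /Active\// { count[name]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}
```

Comparing the counts from iblinkinfo | active_links_per_switch against the number of cabled ports you expect is a quick way to notice a link that has dropped.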

Topology

To get the topology of the InfiniBand network inside Exadata, use the Oracle-supplied tool verify-topology, available in the directory /opt/oracle.SupportTools/ibdiagtools.

# ./verify-topology
        [ DB Machine Infiniband Cabling Topology Verification Tool ]
                [Version 11.2.1.3.b]

Looking at 1 rack(s).....
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch connections............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
Leaf switch check: Correct number of leaf-spine connections.......[SUCCESS]
Check if all hosts have 2 CAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]

Cluster Operations

To manage the Oracle Clusterware you use the same commands as you would in a traditional Oracle 11g Release 2 RAC database cluster. The commands are:
  • CRSCTL – for a few cluster related commands
  • SRVCTL – for most cluster related commands

CRSCTL is not used much, but you need it on some occasions, mostly to shut down the cluster and to start it up (if it is not started automatically during machine startup). Remember, you have to be root to issue this command. However, the root user may not have the location of this tool in its path, so you should use its fully qualified path when issuing the command. Here is the command to stop the cluster on all nodes:

# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -all

You don’t need to shut down the cluster on all nodes; sometimes all you need is to shut it down on only one node. To do that, use:

# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -n <HostName>

Similarly, to start the cluster on a node where it was previously stopped, use:

# <OracleGridInfrastructureHome>/bin/crsctl start cluster -n <HostName>

Finally, you may want to make sure all the cluster resources are running. Here is the command for that. The status command does not need to be issued by root.

# <OracleGridInfrastructureHome>/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DBFS_DG.dg
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                    
ora.PRODATA.dg
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                    
ora.PRORECO.dg
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                    
ora.LISTENER.lsnr
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                    
ora.asm
               ONLINE  ONLINE       proldb01                 Started            
               ONLINE  ONLINE       proldb02                 Started            
               ONLINE  ONLINE       proldb03                 Started            
               ONLINE  ONLINE       proldb04                 Started            
               ONLINE  ONLINE       proldb05                 Started            
               ONLINE  ONLINE       proldb06                 Started            
               ONLINE  ONLINE       proldb07                 Started            
               ONLINE  ONLINE       proldb08                                    
ora.gsd
               OFFLINE OFFLINE      proldb01                                    
               OFFLINE OFFLINE      proldb02                                    
               OFFLINE OFFLINE      proldb03                                    
               OFFLINE OFFLINE      proldb04                                    
               OFFLINE OFFLINE      proldb05                                    
               OFFLINE OFFLINE      proldb06                                    
               OFFLINE OFFLINE      proldb07                                    
               OFFLINE OFFLINE      proldb08                                    
ora.net1.network
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                    
ora.ons
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                    
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                
               ONLINE  ONLINE       proldb08                                    
ora.registry.acfs
               ONLINE  ONLINE       proldb01                                    
               ONLINE  ONLINE       proldb02                                
               ONLINE  ONLINE       proldb03                                    
               ONLINE  ONLINE       proldb04                                    
               ONLINE  ONLINE       proldb05                                    
               ONLINE  ONLINE       proldb06                                    
               ONLINE  ONLINE       proldb07                                    
               ONLINE  ONLINE       proldb08                                
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       proldb07                                    
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       proldb02                                    
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       proldb05                                
ora.cvu
      1        ONLINE  ONLINE       proldb02                                    
ora.proldb01.vip
      1        ONLINE  ONLINE       proldb01                                    
ora.proldb02.vip
      1        ONLINE  ONLINE       proldb02                                    
ora.proldb03.vip
      1        ONLINE  ONLINE       proldb03                                    
ora.proldb04.vip
      1        ONLINE  ONLINE       proldb04                                
ora.proldb05.vip
      1        ONLINE  ONLINE       proldb05                                    
ora.proldb06.vip
      1        ONLINE  ONLINE       proldb06                                    
ora.proldb07.vip
      1        ONLINE  ONLINE       proldb07                                    
ora.proldb08.vip
      1        ONLINE  ONLINE       proldb08                                    
ora.prolrd.db
      1        ONLINE  ONLINE       proldb01                 Open            
      2        ONLINE  ONLINE       proldb02                 Open               
      3        ONLINE  ONLINE       proldb03                 Open               
      4        ONLINE  ONLINE       proldb04                 Open               
      5        ONLINE  ONLINE       proldb05                 Open               
      6        ONLINE  ONLINE       proldb06                 Open               
      7        ONLINE  ONLINE       proldb07                 Open               
      8        ONLINE  ONLINE       proldb08                 Open               
ora.oc4j
      1        ONLINE  ONLINE       proldb01                                    
ora.scan1.vip
      1        ONLINE  ONLINE       proldb07                                    
ora.scan2.vip
      1        ONLINE  ONLINE       proldb02                                    
ora.scan3.vip
      1        ONLINE  ONLINE       proldb05 

                    
This output clearly shows the status of the various resources. A complete explanation of every CRSCTL option is beyond the scope of this article; an abbreviated list appears below. To learn the exact parameters required for an option, simply call it with the -h option. For instance, to learn about the backup option, execute:

# crsctl backup -h
Usage:
  crsctl backup css votedisk
     Backup the voting disk.


Here is the list of the options for CRSCTL:

       crsctl add        - add a resource, type or other entity
       crsctl backup    - back up voting disk for CSS
       crsctl check     - check a service, resource or other entity
       crsctl config    - output autostart configuration
       crsctl debug     - obtain or modify debug state
       crsctl delete    - delete a resource, type or other entity
       crsctl disable   - disable autostart
       crsctl discover  - discover DHCP server
       crsctl enable    - enable autostart
       crsctl get       - get an entity value
       crsctl getperm   - get entity permissions
       crsctl lsmodules - list debug modules
       crsctl modify    - modify a resource, type or other entity
       crsctl query     - query service state
       crsctl pin       - Pin the nodes in the nodelist
       crsctl relocate  - relocate a resource, server or other entity
       crsctl replace   - replaces the location of voting files
       crsctl release   - release a DHCP lease
       crsctl request   - request a DHCP lease
       crsctl setperm   - set entity permissions
       crsctl set       - set an entity value
       crsctl start     - start a resource, server or other entity
       crsctl status    - get status of a resource or other entity
       crsctl stop      - stop a resource, server or other entity
       crsctl unpin     - unpin the nodes in the nodelist
       crsctl unset     - unset a entity value, restoring its default
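
Going back to the crsctl status resource -t output shown earlier: on an eight-node cluster it runs to well over a hundred lines, so a filter that shows only the rows where STATE differs from TARGET can save time. The following is a sketch, assuming the tabular layout shown above; crs_state_mismatches is a hypothetical helper name. Note that when a resource is not running, the SERVER column may be empty in real output, in which case the server field printed here will not be meaningful.

```shell
# Hypothetical helper: read "crsctl status resource -t" output on stdin and
# print every row whose STATE differs from its TARGET.
crs_state_mismatches() {
    awk '
        # Lines such as "ora.asm" name the resource for the rows below.
        /^ora\./ { res = $1; next }
        {
            # Cluster resource rows start with an instance number.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            target = $i; state = $(i + 1); server = $(i + 2)
            if ((target == "ONLINE" || target == "OFFLINE") && target != state)
                print res, server, "target=" target, "state=" state
        }
    '
}
```

A resource such as ora.gsd, whose target and state are both OFFLINE, is not reported, because it is in the state it is supposed to be in.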

Another command, SRVCTL, performs most of the server-based operations, including relocation of resources such as services. It is no different from the tool on a traditional Oracle RAC 11g Release 2 cluster. To learn more about the options in this tool, execute this command:

# srvctl -h
Usage: srvctl [-V]
Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-c {RACONENODE | RAC | SINGLE}
[-e <server_list>] [-i <instname>] [-w <timeout>]] [-m <domain_name>] [-p <spfile>] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}]
[-s <start_options>] [-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-g "<serverpool_list>"] [-x <node_name>] [-a "<diskgroup_list>"]
[-j "<acfs_path_list>"]
Usage: srvctl config database [-d <db_unique_name> [-a] ] [-v]
Usage: srvctl start database -d <db_unique_name> [-o <start_options>] [-n <node>]
Usage: srvctl stop database -d <db_unique_name> [-o <stop_options>] [-f]
Usage: srvctl status database -d <db_unique_name> [-f] [-v]
… output truncated …

IPMI Tool

Earlier in this article you saw a reference to the IPMI tool. We used it to power the servers on. But that is not the only thing you can do with this tool; there are plenty more options. If you want to find out what options are available, issue the command without any arguments.

# ipmitool
No command provided!
Commands:
        raw           Send a RAW IPMI request and print response
        i2c           Send an I2C Master Write-Read command and print response
        spd           Print SPD info from remote I2C device
        lan           Configure LAN Channels
        chassis       Get chassis status and set power state
        power         Shortcut to chassis power commands
        event         Send pre-defined events to MC
        mc            Management Controller status and global enables
        sdr           Print Sensor Data Repository entries and readings
        sensor        Print detailed sensor information
        fru           Print built-in FRU and scan SDR for FRU locators
        sel           Print System Event Log (SEL)
        pef           Configure Platform Event Filtering (PEF)
        sol           Configure and connect IPMIv2.0 Serial-over-LAN
        tsol          Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
        isol          Configure IPMIv1.5 Serial-over-LAN
        user          Configure Management Controller users
        channel       Configure Management Controller channels
        session       Print session information
        sunoem        OEM Commands for Sun servers
        kontronoem    OEM Commands for Kontron devices
        picmg         Run a PICMG/ATCA extended cmd
        fwum          Update IPMC using Kontron OEM Firmware Update Manager
        firewall      Configure Firmware Firewall
        shell         Launch interactive IPMI shell
        exec          Run list of commands from file
        set           Set runtime variable for shell and exec
        hpm           Update HPM components using PICMG HPM.1 file

It’s not possible to explain each option here, so let’s examine one of the most used ones. The sel option shows the System Event Log, and it is one of the key commands you will need to use.

# ipmitool sel
SEL Information
Version          : 2.0 (v1.5, v2 compliant)
Entries          : 96
Free Space       : 14634 bytes
Percent Used     : 9%
Last Add Time    : 02/27/2011 20:23:44
Last Del Time    : Not Available
Overflow         : false
Supported Cmds   : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 909
Alloc Unit Size  : 18
# Free Units     : 813
Largest Free Blk : 813
Max Record Size  : 18

This output is only a summary. To see the details of the event log, add the parameter list.

# ipmitool sel list
   1 | 01/21/2011 | 07:05:39 | System ACPI Power State #0x26 | S5/G2: soft-off | Asserted
   2 | 01/21/2011 | 08:59:43 | System Boot Initiated | System Restart | Asserted
   3 | 01/21/2011 | 08:59:44 | Entity Presence #0x54 | Device Present
   4 | 01/21/2011 | 08:59:44 | System Boot Initiated | Initiated by hard reset | Asserted
   5 | 01/21/2011 | 08:59:44 | System Firmware Progress | Memory initialization | Asserted
   6 | 01/21/2011 | 08:59:44 | System Firmware Progress | Primary CPU initialization | Asserted
   7 | 01/21/2011 | 08:59:49 | Entity Presence #0x58 | Device Present
   8 | 01/21/2011 | 08:59:52 | Entity Presence #0x57 | Device Present
   9 | 01/21/2011 | 08:59:53 | System Boot Initiated | Initiated by warm reset | Asserted
   a | 01/21/2011 | 08:59:53 | System Firmware Progress | Memory initialization | Asserted
   b | 01/21/2011 | 08:59:53 | System Firmware Progress | Primary CPU initialization | Asserted
   c | 01/21/2011 | 08:59:54 | System Boot Initiated | Initiated by warm reset | Asserted
   d | 01/21/2011 | 08:59:55 | System Firmware Progress | Memory initialization | Asserted
   e | 01/21/2011 | 08:59:55 | System Firmware Progress | Primary CPU initialization | Asserted
   f | 01/21/2011 | 09:00:01 | Entity Presence #0x55 | Device Present
... truncated ...

The output has been shown partially to conserve space. This is one of the key commands you should be aware of. In a troubleshooting episode, you should check the system event log to make sure no components have failed. If any have, you will have to replace them before going further. If you get a clean bill of health from IPMITOOL, go to the next step: make sure you have no issues with the cluster, then none with the RAC database, and so on.
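
When you already know the day a problem started, you can narrow the event log to that day. Here is a minimal sketch, assuming the pipe-separated layout of ipmitool sel list shown above, with the date in the second column; sel_events_on is a hypothetical helper name.

```shell
# Hypothetical helper: read "ipmitool sel list" output on stdin and print
# only the events logged on the given date (MM/DD/YYYY, as in the log).
sel_events_on() {
    awk -F'|' -v day="$1" '
        { d = $2; gsub(/ /, "", d) }   # date column without padding
        d == day                       # print the original line unchanged
    '
}
```

For example, ipmitool sel list | sel_events_on 01/21/2011 would show only the entries from the day the image was activated.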

ASMCMD Tool

You can manage the ASM instance in three different ways:
  • SQL – traditional SQL commands are enough for ASM but may not be the best for scripting and quick checks such as checking for free space
  • ASMCMD – a command line tool for the ASM operations. It’s very user-friendly, especially for the SysAdmin-turned-DMA since it does not need knowledge of SQL
  • ASMCA – ASM Configuration Assistant; has limited functionality
Of these, ASMCMD is the most widely used. Let’s see how it works. You invoke the tool by executing asmcmd at the Linux command prompt.

$ asmcmd -p

The -p parameter merely shows the current directory in the prompt. At the ASMCMD prompt, you can enter commands. To know the free space in diskgroups, issue the lsdg command.

ASMCMD [+PRORECO] > lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4172528           379578         1896475              0             N  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  67436544  64932284          6130594        29400845              0             N  PRODATA/
MOUNTED  HIGH    N         512   4096  4194304  23374656  21800824          4249936         5850296              0             Y  PRORECO/
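
The raw megabyte figures are hard to compare at a glance, so a quick script can turn them into percentages. This is a sketch that assumes the column order shown above (Total_MB in the seventh column, Free_MB in the eighth, the name last); dg_free_pct is a hypothetical helper name. It reports raw free space only. The Usable_file_MB column remains the figure to trust for how much you can safely allocate with mirroring taken into account.

```shell
# Hypothetical helper: read lsdg output on stdin and print each mounted
# diskgroup with its raw free space as a percentage of Total_MB.
dg_free_pct() {
    awk '$1 == "MOUNTED" && $7 > 0 { printf "%s %.1f\n", $NF, 100 * $8 / $7 }'
}
```

ASMCMD also accepts a command directly on the command line, so asmcmd lsdg | dg_free_pct works without entering the interactive prompt.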

Commands such as ls and cd work just like their namesakes in the Linux world.

ASMCMD [+] > ls
DBFS_DG/
PRODATA/
PRORECO/
ASMCMD [+] > cd PRORECO


To know the space consumed by each file, issue the ls -ls command.

ASMCMD [+PRORECO/PROLRD/ONLINELOG] > ls -ls
Type       Redund  Striped  Time             Sys  Block_Size   Blocks       Bytes        Space  Name
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_1.257.744724579
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_xx.277.744725199
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_11.278.744725207
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_12.279.744725215
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_13.270.744725161
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_14.272.744725169
… output truncated …

To get a complete listing of all the ASMCMD commands, use help.

ASMCMD [+] > help
       commands:
        --------

        md_backup, md_restore
        lsattr, setattr
        cd, cp, du, find, help, ls, lsct, lsdg, lsof, mkalias
        mkdir, pwd, rm, rmalias

        chdg, chkdg, dropdg, iostat, lsdsk, lsod, mkdg, mount
        offline, online, rebal, remap, umount

        dsget, dsset, lsop, shutdown, spbackup, spcopy, spget
        spmove, spset, startup

        chtmpl, lstmpl, mktmpl, rmtmpl
        chgrp, chmod, chown, groups, grpmod, lsgrp, lspwusr, lsusr
        mkgrp, mkusr, orapwusr, passwd, rmgrp, rmusr

        volcreate, voldelete, voldisable, volenable, volinfo
        volresize, volset, volstat

To get help about a specific command, use help <Command>:

ASMCMD [+] > help chkdg

        chkdg
        Checks or repairs the metadata of a disk group.
        chkdg [--repair] diskgroup
        The options for the chkdg command are described below.
        --repair        - Repairs the disk group.
        diskgroup       - Name of disk group to check or repair.

        chkdg checks the metadata of a disk group for errors and optionally
        repairs the errors.

        The following is an example of the chkdg command used to check and
        repair the DATA disk group.

        ASMCMD [+] > chkdg --repair data

Task(s)                                                           Command Category
Manage the operating system and servers – nodes as well as cells  Linux commands such as shutdown, fdisk, etc.
Power off and check status of components                          IPMITOOL (Linux tool)
Manage the Exadata Storage Server and cell-related commands       CellCLI tool
Manage multiple cells at one time                                 DCLI
Manage ASM resources like diskgroups                              SQL commands (can be SQL*Plus) or ASMCMD
Manage Clusterware                                                CRSCTL
Manage cluster components                                         SRVCTL
Manage the database                                               SQL commands (can be SQL*Plus)


Next Steps

Now that you know the different categories of commands, you should know about the specific ones.

Enjoy Reading !!! :)
