The Oracle Cluster Registry (OCR) records cluster
configuration information. If the OCR is lost or corrupted, the entire
Oracle 11g RAC clustered environment is adversely affected and an outage
may result.
OCR is the central repository for CRS. It stores the
metadata, configuration and state information for all cluster resources defined
in Clusterware. It is a cluster registry used to maintain application resources
and their availability within the RAC environment. It also stores configuration
information for CRS daemons and Clusterware-managed applications.
What is stored in OCR?
§ - Node membership information i.e. which nodes
are part of the cluster
§ - Software active version
§ - the location of the 11g voting disk.
§ - Serverpools
§ - Status for the cluster resources such as RAC
databases, listeners, instances, and services
. Server up/down
. Network up/down
. Database up/down
. Instance up/down
. Listener up/down …
§ - configuration for the cluster resources such
as RAC databases, listeners, instances, and services.
. Dependencies
. Management policy (automatic/manual)
. Callout scripts
. Retries
. cluster database instance to node mapping
§ - ASM instance, Diskgroups etc.
§ - CRS application resource profiles such as
VIP addresses, services etc.
§ - Database services' characteristics, e.g.
preferred/available nodes, TAF policy, load balancing goal etc.
§ - Information about clusterware processes
§ - Information about interaction and management
of third party applications controlled by CRS
§ - Details of the network interfaces held by
the cluster network
§ - Communication settings where the Clusterware
daemons or background processes listen
§ - Information about OCR backups
Let’s take a peek at the OCR backup …
[root@host01 ~]# ocrconfig -manualbackup
host02 2015/01/18 01:03:40 /u01/app/11.2.0/grid/cdata/cluster01/backup_20150118_010340.ocr
[root@host02 ~]# strings /u01/app/11.2.0/grid/cdata/cluster01/backup_20150118_010340.ocr | grep -v type | grep ora!
ora!LISTENER!lsnr
ora!host02!vip
rora!host01!vip
;ora!oc4j
6ora!LISTENER_SCAN3!lsnr
ora!LISTENER_SCAN2!lsnr
ora!LISTENER_SCAN1!lsnr
ora!scan3!vip
ora!scan2!vip
ora!scan1!vip
ora!gns
ora!gns!vip
ora!registry!acfs
ora!DATA!dg
dora!asm
_ora!eons
ora!ons
ora!gsd
ora!net1!network
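The raw strings output carries stray leading bytes in front of some keys (rora, ;ora, 6ora). A minimal sketch for tidying such a listing follows. It assumes that the `!` in these keys corresponds to the `.` separator in the resource names reported by crsctl (ora.asm, ora.DATA.dg), and ocr_resource_names is a hypothetical helper name, not an Oracle utility.

```shell
#!/usr/bin/env bash
# ocr_resource_names: hypothetical helper, not part of Oracle Clusterware.
# Reads `strings` output on stdin, keeps the "ora!..." token found on each
# line (dropping any stray bytes in front of it), converts "!" to "." and
# prints the unique names in sorted order.
ocr_resource_names() {
    grep -o 'ora![A-Za-z0-9_!-]*' | tr '!' '.' | LC_ALL=C sort -u
}
```

It could be used as `strings <backup file> | grep -v type | ocr_resource_names`. Names containing characters outside letters, digits, `_` and `-` would be cut short by this pattern.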
Who updates OCR ?
————————————————–
OCR, which contains information about the high-availability
components of the RAC cluster, is maintained and updated by several client
applications:
- CSSd, during cluster setup, to update the status of servers
- CSSd, during node addition/deletion, to add/delete node names
- CRSd, to record the status of nodes during failure/reconfiguration
- Oracle Universal Installer (OUI)
- SRVCTL (used to manage clusters and RAC databases/instances)
- Cluster control utility, CRSCTL (to manage cluster/local resources)
- Enterprise Manager (EM)
- Database Configuration Assistant (DBCA)
- Database Upgrade Assistant (DBUA)
- Network Configuration Assistant (NETCA)
- ASM Configuration Assistant (ASMCA)
Each node in the cluster maintains a copy of the OCR in memory
for better performance, and each node can submit updates to the OCR as
required. Oracle uses this distributed shared cache architecture during
cluster management to optimize queries against the cluster repository.
The CRSd process is responsible for reading and writing the OCR files, but
only one CRSd process in the cluster (designated as the master) performs any
disk read/write activity. Once the master CRSd process reads new information,
it refreshes its local OCR cache and the OCR caches on the other nodes in the
cluster. Because the OCR cache is distributed across all nodes, OCR clients
(srvctl, crsctl etc.) communicate directly with the local CRSd process on
their node to obtain the information they need. Clients also go through the
local CRSd process for any updates to the physical OCR binary file.
However, the ocrconfig command cannot modify OCR
configuration information for nodes that are shut down or on which
Oracle Clusterware is not running. So, avoid shutting down nodes
while modifying the OCR with the ocrconfig command. If a node is shut down
while the OCR is being modified with ocrconfig, you will need to perform a
repair on the stopped node before it can be brought online to join the
cluster.
The ocrconfig -repair command changes the OCR configuration only
on the node from which you run this command. For example, if the OCR mirror was
relocated to a disk named /dev/raw/raw2 from racnode1 while the node racnode2
was down, then use the command ocrconfig -repair ocrmirror /dev/raw/raw2 on
racnode2 while the CRS stack is down on that node to repair its OCR
configuration.
Purpose of OCR
———————–
- Oracle Clusterware reads the ocr.loc file for the location of the registry,
and reads the registry to determine which application resources need to be
started and the nodes on which to start them.
- It is used to bootstrap CSS with port information, the nodes in the
cluster and similar information.
- The function of CRSd, the Oracle Clusterware daemon, is to define
and manage the resources under Clusterware control. Resources have profiles
that define metadata about them. This metadata is stored in the OCR. CRSd
reads the OCR and
. manages the application resources: starts, stops,
monitors and manages their failover
. maintains and tracks information pertaining to
the definition, availability, and current state of the services.
. implements the workload balancing and continuous
availability features of services
. generates events during cluster state changes;
. maintains configuration profiles of resources in the OCR.
. records the currently known state of the
cluster on a regular basis and provides the same when queried (using srvctl,
crsctl etc.)
How is the info stored in OCR
—————————–
The OCR uses a file-based repository to store configuration
information as a series of key-value pairs arranged in a directory tree-like
structure. It contains information pertaining to all tiers of the clustered
database. Various parameters are stored as name-value pairs that are used and
maintained at different levels of the architecture.
Each tier is managed and administered by daemon processes with
the appropriate privileges. For example,
. all SYSTEM level resource or application definitions
require root, or superuser, privileges to start,
stop, and execute resources defined at this level.
. those defined at the DATABASE level require dba
privileges to execute.
Where and how should OCR be stored?
————————————————-
§ - You can find the location of the OCR in
a file on each individual node of the cluster. This location varies by platform
but on Linux the location of the OCR is stored in the file /etc/oracle/ocr.loc
§ - The OCR must reside on shared disk(s) that
are accessible by all of the nodes in the cluster. In prior releases of
Oracle, the Oracle Cluster Registry (OCR) was placed on raw devices. Since raw
devices have been deprecated, the choice now is between a cluster filesystem
and an ASM diskgroup. The OCR and voting disk must be on a shared device, so a
local filesystem is not going to work. Clustered filesystems may not be an
option due to high cost. Other options include network filesystems, but they
are usually slow and unreliable. So, ASM remains the best choice. The OCR and
voting disks can be on any available ASM diskgroup; it need not be one created
exclusively for them.
§ - The OCR is striped and mirrored (if the diskgroup has
a redundancy other than external), like ordinary database files. So we
can now use the mirroring capabilities of ASM to mirror the OCR as well,
without having to use multiple raw devices for that purpose only.
§ - The OCR is replicated across all the
underlying disks of the diskgroup, so the failure of one disk does not cause
the failure of the diskgroup.
§ - Considering the criticality of the OCR
contents to the cluster functionality, Oracle strongly recommends that you
multiplex the OCR file. In 11g R2, you can have up to five OCR copies.
§ - Because the OCR is in a shared location, all the components running on
all nodes and instances of Oracle can be administered from a single
location, irrespective of the node on which the registry was created.
§ - A small disk of around 300 MB to 500 MB
is a good choice.
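The ocr.loc lookup mentioned in the first point above can be sketched as follows. The key names matched here (ocrconfig_loc, ocrmirrorconfig_loc and numbered variants such as ocrconfig_loc3) are an assumption about what such a file typically contains, and ocr_locations is a hypothetical helper name, so check the file on your own platform before relying on it.

```shell
#!/usr/bin/env bash
# ocr_locations: hypothetical helper that prints the OCR locations listed in
# an ocr.loc-style file. The default path is the Linux one named in the text.
ocr_locations() {
    local file="${1:-/etc/oracle/ocr.loc}"
    # Assumed key names: ocrconfig_loc, ocrmirrorconfig_loc, ocrconfig_loc3...
    # Any other key (for example local_only) is skipped.
    awk -F= '/^ocr(mirror)?config_loc[0-9]*=/ { print $2 }' "$file"
}
```

Running `ocr_locations` with no argument reads /etc/oracle/ocr.loc and prints one location per line, for example an ASM diskgroup name such as +DATA.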
Various utilities used to manage OCR
————————————————–
Add an OCR file
—————-
Add an OCR file to an ASM diskgroup called +DATA
ocrconfig -add +DATA
Moving the OCR
————–
Move an existing OCR file to another location :
ocrconfig -replace /u01/app/oracle/ocr -replacement +DATA
Removing an OCR location
————————
Removing an OCR location requires that at least one other OCR location
remain online.
ocrconfig -delete +DATA
Migrating to ASM
—————-
Oracle Clusterware 11g Release 2 supports the storage of OCR
files on ASM. Clusterware makes it easy to migrate your OCR files to ASM.
Simply follow these instructions:
1.Check the active version of Clusterware and make sure that it
is 11.2.0.1 or greater
#crsctl query crs activeversion
2.Make sure that ASM is running on all nodes.
3.Create a new disk group for the OCR file. It should have a
minimum of 1GB of space on it.
4.Use the ocrconfig command to add the OCR file to the new ASM
disk group
#ocrconfig -add +NEW_DISKGROUP
5.Remove any OCR storage locations that you no longer wish to
use with the ocrconfig command as seen here:
#ocrconfig -delete /u01/shared/OCR1
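Step 1 above can be automated along the following lines. This is a sketch rather than an Oracle-supplied check: it assumes that the command prints the version inside square brackets, as in "Oracle Clusterware active version on the cluster is [11.2.0.1.0]", and it relies on GNU sort -V for the comparison. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# active_version: reads the output of `crsctl query crs activeversion` on
# stdin and prints the version found inside square brackets (assumed format).
active_version() {
    sed -n 's/.*\[\([0-9.]*\)\].*/\1/p'
}

# version_at_least HAVE WANT: succeeds when HAVE is not lower than WANT.
# Puts WANT first, version-sorts both, and checks that WANT is still first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_at_least "$(crsctl query crs activeversion | active_version)" 11.2.0.1 || echo "Clusterware too old for OCR on ASM"`.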
Migrating Off ASM
—————–
If you do not want to store OCR on ASM you can migrate
your OCR files from ASM to other shared storage. Simply follow these
instructions:
1.Check the active version of Clusterware and make sure that it
is 11.2.0.1 or greater with the crsctl command as seen here:
#crsctl query crs activeversion
2.Create the shared file for the OCR. Make sure that it is owned by root,
that its group is oinstall and that its permissions are 640. The mount should
have at least 300MB of free space.
3.Add any OCR storage locations that you wish with the ocrconfig
command as seen here:
#ocrconfig -add /u01/shared/OCR1
4.Use the ocrconfig command to remove the OCR file from the ASM disk
group
#ocrconfig -delete +OLD_DISKGROUP
OCR repair
———-
If the OCR becomes damaged (which might be evidenced by cluster
failures, or error messages in the Clusterware logs) then you may need to
repair the OCR. You may also need to repair the OCR configuration of a node if
the cluster configuration was changed while that node was down. For example,
if another OCR location was added while a node was down, repair the OCR
configuration on that node with the ocrconfig command as seen here:
#ocrconfig -repair -add /u01/app/oracle/ocr
This command changes the OCR configuration only on the node on which it is
executed. Thus, if you stopped a node and made some cluster adjustments on
another node, you might need to execute the ocrconfig -repair command on the
stopped node, while the CRS stack is down on it, before it rejoins the
cluster.
OCR Backups
————
Oracle Clusterware 11g Release 2 backs up the OCR automatically
every four hours on a schedule that depends on when the node started (not
clock time). OCR backups are written to the GRID_HOME/cdata/<cluster name>
directory on the node performing the backups. One node, known as the master
node, is dedicated to these backups, but if the master node is down, some other
node may become the master. Hence, backups could be spread across nodes due to
outages. These backups are named as follows:
- 4-hour backups (3 max): backup00.ocr, backup01.ocr, and backup02.ocr
- Daily backups (2 max): day.ocr and day_.ocr
- Weekly backups (2 max): week.ocr and week_.ocr
It is recommended that OCR backups be placed on a shared
location, which can be configured using the ocrconfig -backuploc <new
location> command.
Oracle Clusterware maintains the last three 4-hour backups, overwriting
the older ones. Thus, you will have three 4-hour backups: the current one, one
four hours old and one eight hours old.
Therefore no additional clean-up tasks are required of the DBA.
Oracle Clusterware also takes a backup at the end of the day. The last two
of these backups are retained. Finally, at the end of each week Oracle
performs another backup, and again the last two of these backups are retained.
You should make sure that your routine file system backups back up the OCR
backup location.
Note that RMAN does not back up the OCR.
You can use the ocrconfig command to view the current OCR backups:
#ocrconfig -showbackup auto
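To pick the most recent backup out of that listing, something like the following could be used. It assumes that each backup line has the four whitespace-separated fields seen in the manual backup output earlier (node, date, time, path) and that dates are printed as YYYY/MM/DD; latest_ocr_backup is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# latest_ocr_backup: reads `ocrconfig -showbackup` output on stdin and prints
# the path of the newest backup. Lines that do not look like
# "node date time path.ocr" are ignored.
latest_ocr_backup() {
    awk 'NF == 4 && $4 ~ /\.ocr$/ { print $2, $3, $4 }' \
        | LC_ALL=C sort \
        | tail -n 1 \
        | awk '{ print $3 }'
}
```

Because the date and time fields sort correctly as plain text, a simple sort is enough here. Usage would be along the lines of `ocrconfig -showbackup | latest_ocr_backup`.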
If your cluster is shut down, the automatic backups will not
occur (nor will the purging). The timer restarts from the beginning when the
cluster is restarted, so a backup is not taken immediately when you start the
cluster back up. Hence, if you are stopping and starting your cluster, you
can affect the OCR backups, and the interval between backups can stretch well
beyond 4 hours.
If you feel that you need to back up the OCR immediately (for
example, you have made a number of cluster-related changes) then you can use
the ocrconfig command to perform a manual backup:
#ocrconfig -manualbackup
You can list the manual backups with the ocrconfig command too:
#ocrconfig -showbackup manual
ocrconfig also supports the creation of a logical backup of the OCR as seen here:
#ocrconfig -export <filename>
It is recommended that the OCR backup location be on a shared
file system and that the cluster be configured to write the backups to that
file system. To change the location of the OCR backups, you can use the
ocrconfig command as seen in this example:
#ocrconfig -backuploc /u01/app/oracle/ocrloc
Note that the ASM Cluster File System (ACFS) does not support
storage of OCR backups.
Restoring the OCR
—————–
If you back it up, there might come a time to restore it. Recovering
the OCR from the physical backups is fairly straightforward. Just
follow these steps:
1.Locate the OCR backup using the ocrconfig command.
#ocrconfig -showbackup
2.Stop CRS on all nodes one by one
#crsctl stop crs
If the above command fails due to OCR corruption, stop CRS on all nodes one by one using the following command:
#crsctl stop crs -f
3.Start CRS on one node in exclusive mode
#crsctl start crs -excl
Check if crsd is running. If it is, stop it:
#crsctl stop resource ora.crsd -init
4.Restore the OCR
If you want to restore OCR to an Oracle ASM disk group, then you
must first create a disk group using SQL*Plus that has the same name as the
disk group you want to restore and mount it on the local node.
If you cannot mount the disk group locally, then run the
following SQL*Plus command:
SQL> drop diskgroup disk_group_name force including contents;
Restore OCR from its physical backup
#ocrconfig -restore {path_to_backup/backup_file_to_restore}
5. Verify the integrity of OCR
#ocrcheck
6. Stop CRS on the node where you had started it in exclusive mode
#crsctl stop crs
If CRS does not stop normally, stop it with force option
#crsctl stop crs -f
7.Start CRS on all nodes one by one
#crsctl start crs
8.Check the integrity of the newly restored OCR:
#cluvfy comp ocr -n all -verbose
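Steps 3 to 6 above, the part that runs on the single node started in exclusive mode, could be wrapped in a script such as the sketch below. It uses only the commands already shown; run and restore_ocr_local are hypothetical helper names, and DRY_RUN defaults to 1 so that the commands are printed rather than executed. Stopping and starting CRS on the other nodes, and recreating an ASM disk group, are deliberately left out.

```shell
#!/usr/bin/env bash
# Dry-run by default: set DRY_RUN=0 (as root) to really execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_ocr_local BACKUP_FILE: steps 3 to 6 on the local node only.
restore_ocr_local() {
    local backup="$1"
    if [ -z "$backup" ]; then
        echo "usage: restore_ocr_local <backup_file>" >&2
        return 2
    fi
    run crsctl start crs -excl || return 1
    # crsd may or may not be running in exclusive mode, so a failure to
    # stop it is tolerated here.
    run crsctl stop resource ora.crsd -init || true
    run ocrconfig -restore "$backup" || return 1
    run ocrcheck || return 1
    # Fall back to a forced stop only if the normal stop fails.
    run crsctl stop crs || run crsctl stop crs -f
}
```

Calling `restore_ocr_local /path/to/backup00.ocr` with the default DRY_RUN=1 lists the commands in order, which is a cheap way to review the sequence before running it for real.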
I hope you found this information useful. Your
comments/suggestions are welcome.