Saturday, February 14, 2015

Critical Oracle Database Recovery without Archive Logs - Redo Logs / Force Open Database in an Inconsistent State

Recently, we had a situation where we restored Production backup (Database is 11.2.0.3 RAC on Exadata/ASM) of 14th Jan, 2015 through Netvault tape backup due to some major issue in finance to get the status/correction of critical tables  (AP, XLA, GL Schemas) of our eBusiness Suite Database. 

We had successfully restored all datafiles through Netvault tape backup however we found that some archives were missing which were required for recovery due to which we unable to open database in resetlogs.

There was no way to generate required archive logs again or to take incremental backup from the needed SCN from production database to roll forward the clone database until the issue occurred.

But the above feature is still not there in Oracle Database that to take backup from SCN until SCN. We can  only create incremental backups from SCN until current time. (which we used to roll forward standby database using RMAN Incremental Backup)

Although there is an enhancement request to get this feature implemented in newer releases, it is still under consideration: Bug 20327651 CREATING INCREMENTAL BACKUP WITH STARTING SCN AND ENDING SCN


We had no other option but to open database in an inconsistent state. So to open database in inconsistent state, we have followed below steps. (PLEASE REMEMBER THAT THE  INSTRUCTIONS HERE ARE DESTRUCTIVE. YOU ARE STRONGLY ADVISED TO BACKUP THE DATABASE BEFORE PROCEEDING. IF YOU DO NOT DO THIS YOU MAY LOSE THE CHANCE TO TRY OTHER OPTIONS AND I'll NOT BE RESPONSIBLE FOR ANY DAMAGE)

IF THE STEPS BELOW DO ALLOW YOU TO OPEN YOUR DATABASE THEN IT IS ESSENTIAL THAT THE DATABASE BE REBUILT AS IT IS NO LONGER SUPPORTED. FAILURE TO DO SO MAY LEAD TO DATA DICTIONARY INCONSISTENCIES, INTERNAL ERRORS AND CORRUPTIONS. 

Steps to attempt to force the database open:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


1) Backup the database while the database is closed.

2) Disable this database from EM if running. We also need to disable RAC to avoid Auto restart.

3) If your datafiles are from different points in time, it is best to try to
use system tablespace datafiles at a similar timestamp to the OLDEST files
you have online in the database. This reduces the chance that you will get
problems during the bootstrap phase of opening the database.

4) Edit your init.ora file to change undo_management and add two parameters.

* Change UNDO_MANAGEMENT=AUTO to

UNDO_MANAGEMENT=MANUAL

* Remove or comment out UNDO_TABLESPACE and UNDO_RETENTION.

* Add

CLUSTER_DATABASE=FALSE
JOB_QUEUE_PROCESSES=0
_ALLOW_RESETLOGS_CORRUPTION = TRUE
_CORRUPTED_ROLLBACK_SEGMENTS =(comma separated list of Automatic Undo segments)

Example:

_CORRUPTED_ROLLBACK_SEGMENTS = (_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$,
_SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$)

Note, sometimes the alert log will tell you what Automatic Undo segments are in
use. Search the alert log for SYSS. If the alert log does not contain that
information then use _SYSSMU1$ through _SYSSMU10$ as shown in the example above.

From 11g onwards the names of the UNDO-segments are a little different and have an extension:

_SYSSMU< nn>_< mmmmm>$) ( example _SYSSMU10_1221075320$)

Be sure to include the additional numbers in the UNDO-segment name as well.

In UNIX you can issue this command on SYSTEM datafile(s) to get the undo segment names:

$ strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u

If files are in ASM, you can use RMAN command to copy them to filesystem before using strings command, e.g. :

RMAN> backup as copy datafile 1 format '/tmp/system01.dbf' ;

You should run above command on all SYSTEM datafiles as undo segments may reside in more than one SYSTEM datafile.
To list the name of SYSTEM datafiles, run below command in at least MOUNT mode:

SQL> select file#, f.name from v$datafile f, v$tablespace t where f.ts#=t.ts# and t.name='SYSTEM' order by file# ;

From the output of the strings command above, add a $ to end of each _SYSSMU undo segment name.

* If you only have a spfile available, you can from the closed, nomount or the
mount stage create an init.ora file as follows:

SQL> CREATE PFILE FROM SPFILE;

Do NOT edit the SPFILE.

5) Invoke SQL*Plus, startup mount, check that correct init.ora was used and
all datafiles are in the status of online or system.

$ sqlplus "/as sysdba"

SQL> startup mount pfile = (full path / file name to init.ora)
Confirm that the hidden parameters from step 3 were used:

SQL> show parameters corrupt

You should see both hidden parameters listed. If not, the wrong init.ora
may have been modified. Do not continue until "show parameters corrupt" shows
both hidden parameters.

SQL> show parameters undo

You should see undo management set to manual. If not, the wrong init.ora
may have been modified. Do not continue until "show parameters undo" shows
undo management as manual.

Check that all files you want to open with are listed as ONLINE or as SYSTEM.

SQL> select name, file#, status from v$datafile where status not in
('SYSTEM', 'ONLINE');

If any rows are returned from the query above, bring the file(s) online with:

SQL> ALTER DATABASE DATAFILE file# ONLINE;

and repeat until there are no files in an OFFLINE status. If any file remains or
changes into "recover" status after you try to online the file proceed to step 5.

6) Perform a fake incomplete recovery then open the database with resetlogs.

SQL> recover database using backup controlfile until cancel;

WHEN PROMPTED FOR AN ARCHIVELOG FILE TYPE cancel THEN PRESS ENTER.

SQL> ALTER DATABASE OPEN RESETLOGS;

7) If the database opens try selecting from a table. For example:

SQL> SELECT SYSDATE FROM DUAL; 


With the database open and functional you should attempt to export the database
IMMEDIATELY. Since database is unstable, don't try another shutdown/startup unless needed.
Once you have an export the database MUST be recreated from scratch.
This means dropping and deleting ALL datafiles and creating a new database from
scratch.

A database which has been opened in this way but not rebuilt will not be
supported by Oracle. Any delay in exporting the contents or any attempt to
use the system may cause irreparable damage.

NOTE: BE SURE TO REVERSE / REMOVE THE INIT.ORA PARAMETERS ADDED IN STEP 3
OTHERWISE YOU MAY ACCIDENTALLY CORRUPT ANY NEW DATABASE CREATED USING THE SAME
INIT.ORA FILE. 


In our case, OPEN RESETLOGS was failing with below,

SQL> ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 135 with name
"_SYSSMU135_2708413105$" too small
Process ID: 67764
Session ID: 627 Serial number: 3 


Also RESETLOGS Clause made datafiles header updated and controlfile type from BACKUP to CURRENT, so we had to restore Controlfile from backup again along with CATALOGing all datafiles again.

Database was been forced open. All files were not in sync. File having block scn higher than database scn The issue was due to when forcing open the ora-1555 means one of the undo segments had a higher SCN. Thus needing the _MINIMUM_GIGA_SCN.  

===== Below is content from Metalink Note ID to identify how much to set _MINIMUM_GIGA_SCN ======


If no trace file is available for Ora-1555 please do the following to generate a trace file for 704 And ora-1555 :-

SQL>Startup mount ;

SQL>Alter session set tracefile_identifier=new1555 ;

SQL>Alter session set events '1555 trace name errorstack forever, level 3';

Alternatively you can set the event in the init.ora file (This would need database to be restarted and mounted for parameter to come into effect)

 Event = "1555 trace name errorstack level 3"

Now try open resetlogs

SQL>Alter database open resetlogs ;

It will error out with ORA-1092

Alert log would show 1555 error message.

Go to udump or trace directory

ls -lrt *new1555*

Case one undo segment name is reported in alert log

ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small

In the above case the Ora-1555 was reported on _SYSSMU11$ which is undo segment number 11.

Hex value of same is 11 --> b

Search the trace file from Table or index block header dump whose transaction layer has an undo segment 11 been used in the ITL.Scroll up to get the Buffer header dump of that block
BH (0xacff8e48) file#: 1 rdba: 0x0040003e (1/62) class: 1 ba: 0xacf66000
set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
dbwrid: 0 obj: 18 objn: 18 tsn: 0 afn: 1
hash: [d0e04c98,d0e04c98] lru: [acff8fd8,acff8db8]
ckptq: [NULL] fileq: [NULL] objq: [cc9a3d00,cc9a3d00]
use: [ce699910,ce699910] wait: [NULL]
st: CR md: EXCL tch: 0
cr: [scn: 0x0.4124e1e],[xid: 0x6.0.c28d],[uba: 0x820075.ea1.23],[cls: 0x0.46b5261],[sfl: 0x1]
flags:
Using State Objects
----------------------------------------
SO: 0xce6998d0, type: 24, owner: 0xd04439e8, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0xd02fa378 FLG: 0x500000
class bit: (nil)
kcbbfbp: [BH: 0xacff8e48, LINK: 0xce699910]
where: kdswh02: kdsgrp, why: 0
buffer tsn: 0 rdba: 0x0040003e (1/62)
scn: 0x0000.046b527a seq: 0x00 flg: 0x00 tail: 0x527a0600
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000000ACF66000 to 0x00000000ACF68000
0ACF67FF0 FFFF02C1 01FF8001 02C10280 527A0600 [..............zR]
.
.
.

Block header dump: 0x0040003e
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.46b519e itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000b.00b.00000e7a 0x00802042.00db.1a C--- 0 scn 0x0000.04624228

So here we see ITL allocated is 0x01.

Transaction identifier --> <undo no>.<slot>.<wrap> ---> 0x000b.00b.00000e7a

Undo segment no ---> 0x000b --> 11 in decimal .

This block belong to 0x0040003e --(1/62)

Find the SCN of this block from the BH(Buffer header) for 0x0040003e --(1/62)

So in this case its the one highlighted above in trace scn: 0x0000.046b527a

Now set _mimimum_giga_scn value based on this SCN

scn: 0x0000.046b527a 

Convert 0x0000 --> Decimal --> 0
Convert 046b527a --> Decimal -->74142330

Combine both these value and find the value for _minimum_giga_scn
Convert --> 074142330 /1024/1024/1024 =0.069050
Add + 2G to the above value and round it up

_minimum_giga_scn = 3  

Set this parameter in the pfile along with the other force open parameter

This parameter has been deprecated starting in CPUJAN2012 which is included in the next PSU versions or above:

10.2.0.4.11
10.2.0.5.6
11.1.0.7.10
11.2.0.2.5
11.2.0.3.1
 
SQL>Startup mount pfile=<> ;

SQL>recover database using backup controlfile until cancel ;
Cancel

SQL>Alter database open resetlogs ;

Try to shutdown and startup DB several times, sometimes the SCN increases in DB bounce are enough to bypass the issue.  ### in case if you still face issue while opening a database.

sql>shutdown; 
sql>startup; 

Do it many times (10-15) and lets see if we can bypass the issue... 
====================================================================

so you might be required to change compatible parameter as well followed by restoring controlfiles / datafiles again.

So key parameters are to force open database in an inconsistent state as below,

_minimum_giga_scn
adjust_scn event or _smu_debug_mode=268435456 (deprecated) ### Not Used
_allow_resetlogs_corruption
_corrupted_rollback_segments
compatible ### If required



If you still face any issue, while doing this critical recovery of database then contact me on my mail id: manish.nashikkar@hotmail.com or give me call on +91 - 9920083054 at support cost.

Thanks,
Manish Nashikkar

2 comments:

  1. Oracle Golden Gate 11g --- "
    Oracle Golden Gate 11g Online Training

    Send ur Enquiry to contact@21cssindia.com
    Introduction to Oracle Golden Gate 11g: Fundamentals for Oracle
    Objectives
    Oracle Golden Gate 11g
    Key Capabilities & Technolo" more… Online Training- Corporate Training- IT Support U Can Reach Us On +917386622889 - +919000444287 http://www.21cssindia.com/courses/oracle-golden-gate-11g-online-training-232.html

    ReplyDelete