Friday, November 30, 2018

Recovery after killing a large transaction

Summary 

After killing a large running transaction, either by killing the process or aborting the database, the database may appear to hang and/or the SMON background process is taking all the available cpu. Basically SMON is in the progress of rolling back this terminated large transaction, which can be time consuming. As a result of this users often try to restart the database, but because a shutdown immediate will hang, the database is often aborted. Please note that stopping the database will actually make things worse and will not decrease the amount of work SMON needs to perfom to complete the rollback. 

The message 'Waiting for smon to disable tx recovery' will posted in the alert log as well. 

First we need to determine the progress SMON is making. It is very possible the SMON process to work with one rollback segment. You can find it using the following query:

SELECT r.NAME "RB Segment Name", dba_seg.size_mb,
DECODE(TRUNC(SYSDATE - LOGON_TIME), 0, NULL, TRUNC(SYSDATE - LOGON_TIME) || ' Days' || ' + ') || 
TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), 'SSSSS'), 'HH24:MI:SS') LOGON, 
v$session.SID, v$session.SERIAL#, p.SPID, v$session.process,
v$session.USERNAME, v$session.STATUS, v$session.OSUSER, v$session.MACHINE, 
v$session.PROGRAM, v$session.module, action 
FROM v$lock l, v$process p, v$rollname r, v$session, 
(SELECT segment_name, ROUND(bytes/(1024*1024),2) size_mb FROM dba_segments 
WHERE segment_type = 'TYPE2 UNDO' ORDER BY bytes DESC) dba_seg 
WHERE l.SID = p.pid(+) AND 
v$session.SID = l.SID AND 
TRUNC (l.id1(+)/65536)=r.usn AND 
l.TYPE(+) = 'TX' AND 
l.lmode(+) = 6 
AND r.NAME = dba_seg.segment_name
--AND v$session.username = 'SYSTEM'
--AND status = 'INACTIVE'
ORDER BY size_mb DESC;
After killing the shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk how the rollback procedure proceeds. 

The following queries are available to monitor the progress of the transaction recovery:

SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'; 

SELECT usn, state, undoblockstotal "Total", undoblocksdone "Done", 
undoblockstotal-undoblocksdone "ToDo", 
DECODE(cputime,0,'unknown',SYSDATE+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) 
"Finish at" FROM v$fast_start_transactions; 

       USN STATE                 Total       Done       ToDo Finish at
---------- ---------------- ---------- ---------- ---------- --------------------
        88 RECOVERED             36591      36591          0 17-NOV-2018 17:02:44
        36 RECOVERING              464        100        364 10-DEC-2018 03:32:06
        71 RECOVERING              309        193        116 20-NOV-2018 19:19:23
         8 RECOVERING          2350785     189363    2161422 18-NOV-2018 00:51:51

Run the above query several times in a row, this will give you a good idea on how SMON is progressing. 

- In some versions the cputime does not work (always 0), hence the estimated completion time will not be displayed 
- In some cases the v$fast_start_transactions view will not work. If this is the case then you can query the internal data dictionary 
view x$ktuxe, the 'ktuxesiz' column represents the remaining number of undo blocks required for rollback:

   select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta
   from x$ktuxe
   where ktuxecfl = 'DEAD';