How To Repair UNDO log corruption

May 18th, 2009 No Comments   Posted in Oracle How-To

Today I had a corrupted segment in my UNDO tablespace. While I have had datafile corruption before I have not seen UNDO segment corruption. This fix could have been a more drastic, but as it was it ended up being pretty easy.

I first found out there was a problem when one of my databases emailed me that “ORA-01034: ORACLE not available , ORA-27101: shared memory realm does not exist”. Well that is never good. I logged into the server and sure enough the database was down. I logged into SQLPlus and started the database backup. Everything start fine so I went to look for the cause. The first place I went look is the alert log. As I expected I saw errors like this one.

ORA-00600: internal error code, arguments: [kdourp_inorder2], [44], [0], [48], [44], [], [], []

No DBA ever likes to see ORA-00600. It’s too generic tell exactly what is going on and generally means something simple broke or something is terribly wrong. Now that the database was backup I started to tail the alert log and noticed more ORA 600 errors, then followed by ORA-00474: SMON process terminated with error and PMON: terminating instance due to error 474. Ouch database crashed again. It was time to really dig into the issue.

Continuing to review the alert log I went back to the earliest point where the ORA-00600 errors started and found these errors scattered in the alert log.

ORA-00600: internal error code, arguments: [kdourp_inorder2],
[44], [0], [48], [44], [], [], []
ORA-08007: Further changes to this block by this transaction
not allowed
Doing block recovery for file 21 block 456408
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial

It looked like I had corruption somewhere. So using this SQL I was able to determine where my issue was:

select
   segment_name,
   status
from
   dba_rollback_segs
where
   tablespace_name='undotbs_corrupt'
and
   status = ‘NEEDS RECOVERY’;

My UNDO table space has the corruption. Because I was able to open my database things were looking pretty good. Is I had pending transaction in my UNDO table space things would have been a bit more tricky. My action pla was to get a second UNDO table space up as quickly as possible and take offline the current UNDO tablespace. This will stop the database from crashing.

I created the tablespace with this SQL

create undo tablespace undotbs2 datafile
‘/u02/oracle/oradata/test/undotbs2.dbf’ size 500m;

Alter the database to use the new UNDO tablespace.

alter system set undo_tablespace=undotbs2 scope=both;

Then take the original UNDO tablespace offline

alter tablespace undotbs1 offline; 

Next drop the corrupted segment.

drop rollback segment "_SYSSMU22$";

Then finally drop the old UNDO tablespace

alter tablespace undotbs1 offline;