Skip to content

Observation of Oracle automatic Fast Start Failover on user configurable conditions – archiver stuck

Oracle 11g DB with Fast Start Failover (FSFO) supports automatic failover base on user configuration conditions. Conditions include health conditions and Oracle error conditions

Health conditions, by default would trigger DB FSF failover, are
Stuck Archiver (archive destination is out of space)
Corrupted Controlfile/Dictionary
Logfile unaccesable
Data file offline

Oracle Error conditions have no default. Error condition is easy to set. for example
dgmgrl> ENABLE FAST_START FAILOVER CONDITION 27102;
This specifies that a fast-start failover should be done if an ORA-27102 error is raised.

It is worth mentioning that in the case of failover triggered by user-configurable conditions violation, just as a fast-start failover initiated by application calling the DBMS_DG.INITIATE_FS_FAILOVER function. data guard observer would NOT honor ( skip ) FastStartFailoverThreshold configuration property before attempting a fast-start failover. It would not wait for the time specified by FastStartFailoverThreshold to expire.

dgmgrl provides a quick way to check
DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

Threshold: 60 seconds
Target: abc123
Observer: abc123-monitor.amazon.com
Lag Limit: 0 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE

Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile YES
Stuck Archiver YES
Datafile Offline YES

Oracle Error Conditions:
ORA-27102: out of memory

Curious about how Oracle does it, dgbroker trace file and alert log would satisfy the curious mind a bit.

Using a case of archiver in primary stuck and failover happens on a FSFO with physical standby
Time 1: Primary archiver stuck
SFSO observer ping detected bad health – Stuck Archiver
Immediately, Oracle would disable the service, shutdown the primary, start a failover
——-broker would record these event, usually within a minute ——
Fore: FSFO observer ping detected bad health – Stuck Archiver
set oFoCond Stuck Archiver (0)
Fore: Notifying Oracle Clusterware to disable services and monitoring for Primary Shutdown on Failover
Fore: A user-configurable Fast-Start Failover condition was
detected. The primary is shutting down.
This database will not be automatically reinstated.
Fore: Becoming the shutdown master
Fore: persisting FSFO state flags=, version=, obid= (), threshold=, laglimit
Setting GIMH_STC_SHUT_DOWN
GIMH_STC_SHUT_DOWN set for shutdown master
————————–
Time 2 on standby database, which is immediately, (skipped FastStartFailoverTheshold), after Time1. Gap between time 1 and time 2 is expected to be a few seconds on a healthy system

DMON received failover and start to failover

DMON: Entered rfm_get_chief_lock() for CTL_FAIL, reason 0
DMON: start task execution: FAILOVER

DMON: posting standby instances for FAILOVER phase 1
DMON: status from rfi_post_instances() for CTL_FAIL = ORA-00000
DMON: dispersing message to standbys for FAILOVER phase BEGIN
INSV: Received message for inter-instance publication
req ID 2.1.1234232891321, opcode CTL_FAIL, phase BEGIN, flags 5
DMON: Entered rfmfoexinst() for phase BEGIN
INSV: Reply received for message with
req ID 2.1.1234232891321, opcode CTL_FAIL, phase BEGIN
DMON: posting standby instances for FAILOVER phase 2
INSV: Received message for inter-instance publication
DMON: status from rfi_post_instances() for CTL_FAIL = ORA-00000
req ID 2.1.1234232891321, opcode CTL_FAIL, phase TEARDOWN, flags 5
DMON: dispersing message to standbys for FAILOVER phase TEARDOWN
DMON: Entered rfmfoexinst() for phase TEARDOWN
RSM0: Received Set State Request: rid=0×02031001, sid=0, phid=1, econd=6, sitehndl=0×02001000
Redo Apply Resource: SetState OFFLINE, phase TEAR-DOWN, External Cond GRACEFUL-FAIL-OVER-PHYS_STBY
RSM0: Received Set State Request: rid=0×02012001, sid=9, phid=1, econd=6, sitehndl=0×02001000
Database Resource[IAM=PHYSICAL]: SetState READ-WRITE-XPTON, phase TEAR-DOWN, External Cond GRACEFUL-FAIL-OV

Executing SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL]
SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL] Executed successfully
Executing SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE]
SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE] Executed successfully
Executing SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN]
SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN] Executed successfully

Oracle Data Guard broker behavior when an observer is unreachable

DMON would notice observer heartbeat missing immediately if observer is not available.

Once that happens,
First, DMON does healthcheck; RSM0 will report “ORA-16820: fast-start failover observer is no longer observing this database”

Then, DMON does validation of observer by calling rfm_release_chief_lock() with MON_VERIFY; the validation happens multiple times, would take 3~4 minutes, even validation keeps failing.
During this period, dgmgrl reports “ORA-16820: fast-start failover observer is no longer observing this database” if you check the configuration.
If you restart the observer from the same host, the observer would register just fine.
If you start a new observer, broker would not take the new observer as it has not “de-register” the old server.

At last, if old observer is still unresponsive, DMON would call STOP_OBSVR and release rfm_release_chief_lock().
DG brokers would report “ORA-16819: fast-start failover observer not started”.
The new observer’s location does not matter anymore.

Partitioning exchange between compressed and non-compressed tables with and without indexes

It might come to the time that you want to compress a large partitioned table one partition a time. Without indexes, you can exchange compressed standalone table into a non-compressed table; and Vice Versa. With indexes, you’d be careful about indexes compression attribute. If any index on either partitioned table or stand-alone table is compressed, you’d also be sure that indexes on the partner is compressed too. Moreover, you’d be sure the prefix degree of compress should be the same on both tables.

Preparation:
create table liang_t1 ( cyear number, OWNER VARCHAR2(30), OBJECT_NAME VARCHAR2(128), SUBOBJECT_NAME VARCHAR2(30), OBJECT_ID NUMBER )
partition by range (cyear)
( partition p2010 values less than (2011),
partition p2011 values less than (2012));

insert into liang_t1 select ’2010′, owner, object_name, subobject_name, object_id from dba_objects;
insert into liang_t1 select ’2011′, owner, object_name, subobject_name, object_id from dba_objects;
Commit;

Case 1: without indexes, you can exchange compressed standalone table into a non-compressed table.

select table_name, partitioned, compression, compress_for from dba_tables where table_name = ‘LIANG_T1′;

TABLE_NAME PAR COMPRESS COMPRESS_FOR
—————————— — ——– ————
LIANG_T1 YES

select table_name, partition_name, compression, compress_for from dba_tab_partitions where table_name = ‘LIANG_T1′;

TABLE_NAME PARTITION_NAME COMPRESS COMPRESS_FOR
—————————— —————————— ——– ————
LIANG_T1 P2010 DISABLED
LIANG_T1 P2011 DISABLED

create table liang_t1_comp compress for oltp as select * from liang_t1 partition (p2010) order by owner, object_name;

select table_name, partitioned, compression, compress_for from dba_tables where table_name = ‘LIANG_T1_COMP’;

TABLE_NAME PAR COMPRESS COMPRESS_FOR
—————————— — ——– ————
LIANG_T1_COMP NO ENABLED OLTP

alter table liang_t1 exchange partition p2010 with table liang_t1_comp with validation;

Vola, you get a compressed partition into a non-compressed table.

select table_name, partitioned, compression, compress_for from dba_tables where table_name = ‘LIANG_T1_COMP’;

TABLE_NAME PAR COMPRESS COMPRESS_FOR
—————————— — ——– ————
LIANG_T1_COMP NO DISABLED

select table_name, partition_name, compression, compress_for from dba_tab_partitions where table_name = ‘LIANG_T1′;

TABLE_NAME PARTITION_NAME COMPRESS COMPRESS_FOR
—————————— —————————— ——– ————
LIANG_T1 P2010 ENABLED OLTP
LIANG_T1 P2011 DISABLED

You can exchange a compressed partition out with non-compressed table.

alter table liang_t1 exchange partition p2010 with table liang_t1_comp with validation;

Case 2: Partition exchange fails if one of exchange partner has compression indexes and the other does not.

create index idx_liang_t1_first3 on liang_t1 (cyear, owner, OBJECT_NAME) local;
select index_name, partition_name, compression from dba_ind_partitions where index_name = ‘IDX_LIANG_T1_FIRST3′;

INDEX_NAME PARTITION_NAME COMPRESS
—————————— —————————— ——–
IDX_LIANG_T1_FIRST3 P2010 DISABLED

IDX_LIANG_T1_FIRST3 P2011 DISABLED

create index idx_liang_t1_comp_first3 on liang_t1_comp (cyear, owner, OBJECT_NAME) compress 1;

select index_name, compression, prefix_length from dba_indexes where index_name = ‘IDX_LIANG_T1_COMP_FIRST3′;

INDEX_NAME COMPRESS PREFIX_LENGTH
—————————— ——– ————-
IDX_LIANG_T1_COMP_FIRST3 ENABLED 1

alter table liang_t1 exchange partition p2010 with table liang_t1_comp including indexes with validation;

alter table liang_t1 exchange partition p2010 with table liang_t1_comp including indexes with validation
*
ERROR at line 1:
ORA-28665: table and partition must have same compression attribute

Case 3: Partition exchange fails if one of exchange partner has compression indexes with different prefix degree.

drop index idx_liang_t1_first3;
create index idx_liang_t1_first3 on liang_t1 (cyear, owner, OBJECT_NAME) compress 1 local;

drop index idx_liang_t1_comp_first3;
create index idx_liang_t1_comp_first3 on liang_t1_comp (cyear, owner, OBJECT_NAME) compress 2;

alter table liang_t1 exchange partition p2010 with table liang_t1_comp including indexes with validation;
alter table liang_t1 exchange partition p2010 with table liang_t1_comp including indexes with validation
*
ERROR at line 1:
ORA-28665: table and partition must have same compression attribute

Case 4: Only both indexes are compressed, partition exchange “including indexes with validation” would succeed.

drop index idx_liang_t1_first3;
create index idx_liang_t1_first3 on liang_t1 (cyear, owner, OBJECT_NAME) compress 1 local;

select index_name, compression, prefix_length from dba_indexes where index_name = ‘IDX_LIANG_T1_FIRST3′;

INDEX_NAME COMPRESS PREFIX_LENGTH
—————————— ——– ————-
IDX_LIANG_T1_FIRST3 ENABLED 1

drop index idx_liang_t1_comp_first3;

create index idx_liang_t1_comp_first3 on liang_t1_comp (cyear, owner, OBJECT_NAME) compress 1;

alter table liang_t1 exchange partition p2010 with table liang_t1_comp including indexes with validation;

Table altered.

Tiny tip – “clear logfile group” to create redo logs on Standby

If the redo log files are missing from Standby, e.g. creating a standby from backup, you can just do “clear logfile group”.

On standby DB, if you clear logfile group without log_file_name_convert being set, you will see
SQL> alter database clear logfile group 1;
alter database clear logfile group 1
*
ERROR at line 1:
ORA-19527: physical standby redo log must be renamed
ORA-00312: online log 1 thread 1:
‘/path/filename’

With the right log_file_name_convert setting, you will see

SQL> alter database clear logfile group 1;
Database altered.

Data files will be created on standby by Oracle.

Using bulk binds for fewer redo records

Redo logs are made of redo records; each redo record is composed by redo change vectors, each of which describes a DB block level change.

Insert with bulk binds, aka bulk insert, can boost your program’s performance by processing many rows from a cursor in one call instead of one row at a time. It also generates fewer redo change vectors therefore less amount of redo logs records.

create table t1 (
a number,
b varchar2(10)
);

[Case 1] Insert one row at a time.

DECLARE
v_Counter BINARY_INTEGER := 1;
BEGIN
WHILE v_Counter <= 10 LOOP
INSERT INTO t1 (a, b) VALUES (v_Counter, v_counter);
v_Counter := v_Counter + 1;
END LOOP;
END;
/
commit;

I dump the redo logs about the changes, after finding redo log name and DBA information.
ALTER SYSTEM DUMP LOGFILE '/…/redolog-1554-1-1232324.arc' DBA MIN 1 39898 DBA MAX 1 39898;

What I see is 10 redo change records. 1 redo record per insert. I paste 1 of 10 here.

REDO RECORD – Thread:1 RBA: 0×000612.0000014d.0140 LEN: 0x016c VLD: 0×01

CHANGE #1 TYP:0 CLS:19 AFN:2 DBA:0×00800099 OBJ:4294967295 SCN:0×0000.00d3ccef SEQ: 1 OP:5.2 ENC:0

CHANGE #2 TYP:0 CLS:20 AFN:2 DBA:0x0081b92d OBJ:4294967295 SCN:0×0000.00d3ccee SEQ: 3 OP:5.1 ENC:0

Tablespace Undo: No
0×00000000 prev ctl uba: 0x0081b92d.03bb.07
..
CHANGE #3 TYP:0 CLS: 1 AFN:1 DBA:0x00409bda OBJ:21674 SCN:0×0000.00d3cd01 SEQ: 3 OP:11.2 ENC:0
KTB Redo

null: –
col 0: [ 2] c1 02
col 1: [ 1] 31

[Case 2] Bulk insert with 10 rows at a time. To get an individual redo log, I switch logfile first.

declare
type testnumberarray is table of number index by binary_integer;
type testchararray is table of varchar2(10) index by binary_integer;

v_a testnumberarray;
v_b testchararray;

cursor cu_cursor is select a,b from t1_copy; — (t1_copy has 10 records.)
begin
open cu_cursor;

loop
fetch cu_cursor bulk collect into v_a, v_b LIMIT 10;

forall i in 1 .. v_a.count
insert into t1
values
( v_a(i), v_b(i) );

exit when cu_cursor%notfound;
end loop;
close cu_cursor;
end;
/

Then I dump the new redo log for the change to blocks.

REDO RECORD – Thread:1 RBA: 0x00061b.00000043.0140 LEN: 0x01dc VLD: 0×01

CHANGE #1 TYP:0 CLS:23 AFN:2 DBA:0x008000b9 OBJ:4294967295 SCN:0×0000.00d41e1a SEQ: 1 OP:5.2 ENC:0

CHANGE #2 TYP:0 CLS:24 AFN:2 DBA:0x0080c891 OBJ:4294967295 SCN:0×0000.00d41e19 SEQ: 1 OP:5.1 ENC:0

KTB Redo

tabn: 0 lock: 0 nrow: 10
slot[0]: 0
slot[1]: 1
slot[2]: 2
slot[3]: 3
slot[4]: 4
slot[5]: 5
slot[6]: 6
slot[7]: 7
slot[8]: 8
slot[9]: 9
CHANGE #3 TYP:0 CLS: 1 AFN:1 DBA:0x00409bda OBJ:21691 SCN:0×0000.00d41e38 SEQ: 3 OP:11.11 ENC:0
KTB Redo

tabn: 0 lock: 1 nrow: 10
slot[0]: 0
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 02
col 1: [ 1] 31
slot[1]: 1
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 03
col 1: [ 1] 32
slot[2]: 2
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 04
col 1: [ 1] 33
slot[3]: 3
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 05
col 1: [ 1] 34
slot[4]: 4
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 06
col 1: [ 1] 35
slot[5]: 5
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 07
col 1: [ 1] 36
slot[6]: 6
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 08
col 1: [ 1] 37
slot[7]: 7
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 09
col 1: [ 1] 38
slot[8]: 8
tl: 8 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 0a
col 1: [ 1] 39
slot[9]: 9
tl: 9 fb: –H-FL– lb: 0×0 cc: 2
col 0: [ 2] c1 0b
col 1: [ 2] 31 30
END OF REDO DUMP

With bulk binds, Oracle generates 1 redo change records for all 10 rows, since only 1 insert sql is fired. As an result, given the same data to insert, with bulk binds generates much less amount of redo logs.

My bloginality – INTP

As an INTP, you are Introverted, iNtuitive, Thinking, Perceiving.
This makes your primary focus on Introverted Thinking with an Extraverted Intution.

This is defined as a NT personality, which is part of Carl Jung’s Rational (Knowledge Seeking) type, and more specifically the Architect or Thinker.

As a weblogger, you might not be as concerned about popularity, but more with the ideas and theories that you strive to understand. Because routines aren’t your strong point, you might be more likely to work on the concept of how to do a blog, but not be as excited to keep it up.

http://bloginality.love-productions.com/index2.php

So it is said. …

Netflix sharing “Availabile-Network Partition Tolerant” Storage practice

In May, I read the news that Netflix had been using Amazon Web Service for a year. Now, in October, Sid Anand from Netflix revealed “Netflix’s transition to high-availability storage system” practice. The paper has 2 attractive sections to me.
1. eBay’s best practice to achieve “Availabile-Network Partition Tolerant” (AP) with Oracle.
2. Netflix’s Best practice of “leaving RDBMS behind” and on SimpleDB.

Good stuff.

Follow

Get every new post delivered to your Inbox.