Oracle 11g SecureFile System-Part 1- Deduplication of LOBs
Oracle provides SecureFile System in 11g as an improved solution for semi–strunctured data, for example xml and images files stored as CLOB or BLOB. Oracle SecureFiles is backward compatible with Oracle 10g LOB APS and has a few impressive features, including Compression/Encryption/De-duplication. Here we do some experiences regarding de-dupication.
With de-duplication enabled SecureFiles, Oracle server will automatically detects duplicate SecureFiles, and stores only a single physical copy, therefore minizing storage usage.
First, I create two tables T1 and T2.
create table t1(c1 int, c2 clob)
LOB(c2) STORE AS SECUREFILE (
DEDUPLICATE
CACHE
) tablespace users;
create table t2(c1 int, c2 clob)
LOB(c2) STORE AS SECUREFILE (
KEEP_DUPLICATES
CACHE
) tablespace users;
Each table has a CLOB column C2, while T1 stores CLOB in De-Duplicate enabled SecureFile and T2 stores otherwise.
Before we move forward, let’s review how to calculate space used by LOB in the database.
SELECT SEG.BYTES
FROM DBA_SEGMENTS SEG JOIN DBA_LOBS LOB ON SEG.SEGMENT_NAME = LOB.SEGMENT_NAME
WHERE LOB.TABLE_NAME = ‘T1′
AND SEG.OWNER = ‘SYSTEM’;
Now, ready to load some semi-structed data into LOBs. Here, the lob is a XML file books.xml with 9076 bytes in size. I load it 100 times into T1 and T2 respectively.
declare
l_clob clob;
l_file bfile;
begin
for i in 1 .. 100
loop
insert into t1 values ( i, EMPTY_CLOB() )
return c2 into l_clob;
l_file := bfilename( ‘DATA_LOAD_DIR’, ‘books.xml’);
dbms_lob.fileopen( l_file );
dbms_lob.loadfromfile( l_clob, l_file, dbms_lob.getlength( l_file ) );
dbms_lob.fileclose( l_file );
end loop;
for i in 1 .. 100
loop
insert into t2 values ( i, EMPTY_CLOB() )
return c2 into l_clob;
l_file := bfilename( ‘DATA_LOAD_DIR’, ‘books.xml’);
dbms_lob.fileopen( l_file );
dbms_lob.loadfromfile( l_clob, l_file, dbms_lob.getlength( l_file ) );
dbms_lob.fileclose( l_file );
end loop;
end;
/
COMMIT;
First, we use dbms_lob.getlength to see how big in size is the over-all lobs.
Because there are 100 rows of same 9076 byte XML files, we get the same total amount of LOB data size in T1 and T2.
select sum(dbms_lob.getlength (C2)) from SYSTEM.T1;
– Return: 907600
select sum(dbms_lob.getlength (C2)) from SYSTEM.T2;
– Return: 907600
Then, we use DBA_SEGMENTS view’s BYTES column to see the space allocated to LOBs. This data including actual space occupied by LOBs, free spaces, and lob undo spaces.
SELECT LOB.TABLE_NAME, SEG.BYTES
FROM DBA_SEGMENTS SEG JOIN DBA_LOBS LOB ON SEG.SEGMENT_NAME = LOB.SEGMENT_NAME
WHERE LOB.TABLE_NAME in (‘T1′, ‘T2′)
AND SEG.OWNER = ‘SYSTEM’;
The result is
T1 327680
T2 3407872
As you can see, T1 table with deduplication enabled used 327K to store 907K byte data, while T2 without deduplication used 3407K to store that amount of data. Deduplication saves space.
As a matter of fact, we modify the testing steps of loading into De-Duplication enable LOB SecureFile. We load 100 rows one-by-one and exam the expansion of LOB segment size. We notice that LOB segment actually stops growing after initially expands to 327680 after the first 5 inserts. Next 95 inserts of duplicated LOBs do not consume more LOB spaces.
With deduplication enabled, Oracle doesn’t materialize the duplicated data. The speed for insert should be faster. Oracle uses different process for different SecureFile options. Comparing the insert into T1 and T2 in the 10046 trace, insert into deduplicate T1 is much faster than insert into T2 without dedups.
PARSING IN CURSOR #4 len=62 dep=1 uid=5 oct=2 lid=5 tim=24020009098 hv=3095865875 ad=’25f98840′ sqlid=’12j6jh6w8fahm’
INSERT INTO T1 VALUES ( 5 , EMPTY_CLOB() ) RETURN C2 INTO :O0
END OF STMT
PARSE #4:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=24020009094
BINDS #4:
Bind#0
oacdty=112 mxl=4000(4000) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=206001 frm=01 csi=178 siz=4000 off=0
kxsbbbfp=0cb09054 bln=4000 avl=00 flg=05
EXEC #4:c=0,e=442,p=0,cr=1,cu=5,mis=0,r=1,dep=1,og=1,tim=24020009614
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=0 op=’LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=0 us)’
WAIT #3: nam=’BFILE open’ ela= 1277 =0 =0 =0 obj#=-1 tim=24020011061
WAIT #3: nam=’BFILE get length’ ela= 57 =0 =0 =0 obj#=-1 tim=24020011170
WAIT #3: nam=’BFILE get length’ ela= 53 =0 =0 =0 obj#=-1 tim=24020011382
WAIT #4: nam=’BFILE internal seek’ ela= 6 =0 =0 =0 obj#=-1 tim=24020011453
WAIT #4: nam=’BFILE read’ ela= 42 =0 =0 =0 obj#=71301 tim=24020011660
WAIT #4: nam=’BFILE read’ ela= 8 =0 =0 =0 obj#=71301 tim=24020011694
WAIT #3: nam=’BFILE closure’ ela= 14 =0 =0 =0 obj#=71301 tim=24020011788
EXEC #3:c=0,e=2825,p=0,cr=3,cu=7,mis=0,r=1,dep=0,og=1,tim=24020011829
WAIT #3: nam=’SQL*Net message to client’ ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=71301 tim=24020011885
WAIT #3: nam=’SQL*Net message from client’ ela= 2979 driver id=1413697536 #bytes=1 p3=0 obj#=71301 tim=24020014897
PARSING IN CURSOR #4 len=62 dep=1 uid=5 oct=2 lid=5 tim=24115003777 hv=1312081138 ad=’25f807fc’ sqlid=’26wjtux739j7k’
INSERT INTO T2 VALUES ( 5, EMPTY_CLOB() ) RETURN C2 INTO :O0
END OF STMT
PARSE #4:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=24115003770
BINDS #4:
Bind#0
oacdty=112 mxl=4000(4000) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=206001 frm=01 csi=178 siz=4000 off=0
kxsbbbfp=0ca79054 bln=4000 avl=00 flg=05
EXEC #4:c=0,e=619,p=0,cr=1,cu=5,mis=0,r=1,dep=1,og=1,tim=24115004513
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=0 op=’LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=0 us)’
WAIT #3: nam=’BFILE open’ ela= 3550 =0 =0 =0 obj#=-1 tim=24115008320
WAIT #3: nam=’BFILE get length’ ela= 97 =0 =0 =0 obj#=-1 tim=24115008584
WAIT #3: nam=’BFILE get length’ ela= 91 =0 =0 =0 obj#=-1 tim=24115008949
WAIT #4: nam=’BFILE internal seek’ ela= 8 =0 =0 =0 obj#=-1 tim=24115009069
WAIT #4: nam=’BFILE read’ ela= 52 =0 =0 =0 obj#=-1 tim=24115009214
WAIT #4: nam=’BFILE read’ ela= 16 =0 =0 =0 obj#=-1 tim=24115009264
WAIT #3: nam=’BFILE closure’ ela= 22 =0 =0 =0 obj#=-1 tim=24115009879
EXEC #3:c=15625,e=6504,p=0,cr=3,cu=12,mis=0,r=1,dep=0,og=1,tim=24115009951
WAIT #3: nam=’SQL*Net message to client’ ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=24115010038
WAIT #3: nam=’SQL*Net message from client’ ela= 4423 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=24115014510