Friday, December 3, 2010

Parallel Direct Path Inserts Into Partitioned Tables

Combining the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle database with direct path inserts into partitioned tables is a useful pairing. Another flexibility aspect used here is interval partitioning - so with this design, not only is the ETL loading into the partitions in an efficient, scalable manner, but the database is managing the partitions in the table too. Here I'll show the raw first cut of how it's done, then go about generalizing it into the tools.

The figure below shows chunks of source data being processed by a pool of jobs writing into a partitioned target table in direct path. The parallel execute package has some tables that define the tasks and chunks to be processed by the parallel job pool; you can query these via the data dictionary views (*_PARALLEL_EXECUTE_TASKS and *_PARALLEL_EXECUTE_CHUNKS). The package also supports resuming a task to reprocess failed chunks, which is useful. There is an introductory article on DBMS_PARALLEL_EXECUTE worth checking out in the May/June Oracle Magazine from Steven Feuerstein.

In the Oracle SQL syntax, the partition key value of the partition extension clause in the INSERT DML provides the critical information that will enable us to build a design for parallel direct path loads into partitioned tables.

So if we make the chunking column from DBMS_PARALLEL_EXECUTE useful for identifying the partition key value above, then we have a winner. The parallel execute chunking identifier is a numeric value - in the example below the SALES table is partitioned by month, so we can define the chunking identifier using YYYYMM (i.e. 200812 for December 2008) to represent a month in numeric form, and this is converted to a date for the partition key value in the INSERT SQL clause using something like TO_DATE(200812, 'YYYYMM').
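
For illustration, a direct path insert targeting a single month's partition could look something like the sketch below (using the SALES and SRC_SALES tables defined later in this post; the literal 200812 stands in for the chunk identifier):

insert /*+ APPEND */ into sales partition for (to_date(200812, 'YYYYMM'))
select prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold
from   src_sales
where  to_number(to_char(time_id, 'YYYYMM')) = 200812;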

The illustration here will load a partitioned SALES table that uses interval partitioning, so we get a table where the database will manage the addition of partitions.

CREATE TABLE sales
( prod_id NUMBER(6)
, cust_id NUMBER
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMBER(6)
, quantity_sold NUMBER(3)
, amount_sold NUMBER(10,2)
)
PARTITION BY RANGE (time_id)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2008', 'DD-MM-YYYY'))
);

The source table I'm using mirrors the target table, without the partitions, and I've added some simple data here for a demo - each batch of rows added will be processed in a chunk (imagine it was a lot of data in that partition).

CREATE TABLE src_sales
( prod_id NUMBER(6)
, cust_id NUMBER
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMBER(6)
, quantity_sold NUMBER(3)
, amount_sold NUMBER(10,2)
);
begin
  for c in 1..1000000 loop
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-FEB-10');
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-MAR-10');
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-APR-10');
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-MAY-10');
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-JUN-10');
    insert into src_sales (prod_id,cust_id,time_id) values (1,1,'01-JUL-10');
    commit;
  end loop;
end;
/

To create the task and its chunks for execution we can use the DBMS_PARALLEL_EXECUTE APIs. In the call below we define the task with a name and, in this case, a SQL statement to identify the chunks (a demo example - be careful about performance here, typically an indexed numeric column is used):

begin
  begin
    DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'TASK_NAME');
  exception when others then null;
  end;
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'TASK_NAME');
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(task_name => 'TASK_NAME',
    sql_stmt => 'select distinct to_number(to_char(time_id,''YYYYMM'')) startid, to_number(to_char(time_id,''YYYYMM'')) endid from src_sales',
    by_rowid => false);
end;
/

Then we have to define the meat of the task and the number of jobs to process the chunks. Note I am using dynamic SQL since the partition key value cannot be a bind variable - and its value will change in each child task that the parallel execute engine executes (the engine itself will pass start_id and end_id as bind variables to this block).
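
The RUN_TASK call itself isn't reproduced here, but a minimal sketch along the following lines shows the idea (assuming the TASK_NAME task created above and a parallel level of 2; the column list and WHERE clause are illustrative). The chunk id arrives in :start_id/:end_id and is concatenated into the PARTITION FOR clause with dynamic SQL:

declare
  l_chunk_sql clob;
begin
  -- PL/SQL block run once per chunk; the engine binds :start_id and :end_id
  l_chunk_sql := q'[
    declare
      l_start number := :start_id;
      l_end   number := :end_id;
      l_sql   varchar2(4000);
    begin
      -- the PARTITION FOR key value cannot be a bind variable,
      -- so the YYYYMM chunk id is concatenated into the statement
      l_sql := 'insert /*+ APPEND */ into sales partition for (to_date('
            || l_start || ', ''YYYYMM''))'
            || ' select prod_id, cust_id, time_id, channel_id, promo_id,'
            || ' quantity_sold, amount_sold from src_sales'
            || ' where to_number(to_char(time_id, ''YYYYMM'')) between '
            || l_start || ' and ' || l_end;
      execute immediate l_sql;
      commit;
    end;]';
  DBMS_PARALLEL_EXECUTE.RUN_TASK(task_name => 'TASK_NAME',
    sql_stmt => l_chunk_sql,
    language_flag => DBMS_SQL.NATIVE,
    parallel_level => 2);
end;
/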

Whilst the above is running, the parallel execute package spawns 2 jobs (since we indicated a parallel level of 2). If we quickly look at the USER_PARALLEL_EXECUTE_CHUNKS view we see 6 chunks, since we had 6 distinct months of data. We can see below that the first two chunks are in ASSIGNED status and are being processed.
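
A simple query against the chunk view shows the progress (using the task name created above):

select chunk_id, status, start_id, end_id
from   user_parallel_execute_chunks
where  task_name = 'TASK_NAME'
order  by chunk_id;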

Checking the view again we see 2 are now in PROCESSED status and 2 are ASSIGNED - note the start_id and end_id columns here, these are the bind variables passed to my PL/SQL block in the RUN_TASK routine above, and it is these we use for the PARTITION FOR key value.

Finally all chunks are processed and we have loaded the data in parallel, writing direct path into the partitioned target table.

I mentioned the resume capability earlier; this is also very useful. There is another state, PROCESSED_WITH_ERROR, that will be flagged when the chunk being processed has failed for whatever reason. The RESUME_TASK procedures let you retry the task and reprocess just those chunks that failed.
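
As a sketch, resuming a task that finished with errors could look something like this (TASK_STATUS and the FINISHED_WITH_ERROR constant are part of the package):

begin
  -- reprocess only the chunks left in PROCESSED_WITH_ERROR status
  if DBMS_PARALLEL_EXECUTE.TASK_STATUS('TASK_NAME') = DBMS_PARALLEL_EXECUTE.FINISHED_WITH_ERROR then
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('TASK_NAME');
  end if;
end;
/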

Interesting stuff, combining a few capabilities in 11gR2 of the database to speed up processing and leverage those CPUs! Hopefully this sparks some other ideas out there. Next up I'll take this into the data integration tools and show how it can be commoditized to avoid the programming.
