I have a vendor solution that captures user data and processes the whole data set in a daily batch. Which users get processed, and for what, depends on user behavior; almost all rows are modified each day.
The table structure is as follows:
batch_capture
-------------------
user_id -- varchar2 -- ~99% unique
process_type -- varchar2 -- 4 distinct values
status -- varchar2 -- 2 values: idle, processing
process_state -- varchar2 -- 2 values: failed, successful
next_date -- next processing date -- 10-12 distinct values
updated_date -- unique in most cases
server -- name of the server -- 20 distinct servers
-- original indexes
PK = user_id, process_type
index-1: process_type, next_date
index-2: process_type, status, server
INITRANS 32 for the table and 64 for the indexes.
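For reference, the structure described above amounts to roughly this DDL (datatypes, lengths, and object names are my guesses, not the vendor's):

create table batch_capture (
  user_id        varchar2(50)  not null,  -- ~99% unique
  process_type   varchar2(20)  not null,  -- 4 distinct values
  status         varchar2(20)  not null,  -- 'idle' / 'processing'
  process_state  varchar2(20),            -- 'failed' / 'successful'
  next_date      date,                    -- 10-12 distinct values
  updated_date   date,                    -- unique in most cases
  server         varchar2(50)             -- 20 distinct servers
) initrans 32;

alter table batch_capture add constraint batch_capture_pk
  primary key (user_id, process_type) using index initrans 64;

create index batch_capture_ix1 on batch_capture (process_type, next_date) initrans 64;
create index batch_capture_ix2 on batch_capture (process_type, status, server) initrans 64;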
The table currently contains 6 million rows and is growing, and almost all rows are processed each day, so there is heavy block contention.
The solution is hosted on a 3-node Oracle 12.2 RAC cluster with an 8K block size.
Top 5 wait events
==============
enq: TX - row lock contention
gc cr block 2-way
gc block busy
gc current block busy
etc.
So in general, cluster waits are at the top, and they add their overhead to the batch process.
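A quick way to see where that time is going, per instance, is a sketch against gv$active_session_history (ASH requires the Diagnostics Pack):

select inst_id, event, count(*) as samples
from   gv$active_session_history
where  sample_time > systimestamp - interval '1' hour
and    (wait_class = 'Cluster' or event = 'enq: TX - row lock contention')
group  by inst_id, event
order  by samples desc;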
All 20 app servers hit the table and use an update statement (step-1) to allocate a 100-row work unit to themselves. At any given time the app servers all work on the same workload, selected via the "process_type", so they all fight over the same type of rows in the table.
Once a server has acquired its 100-row workload, it fetches the rows (step-2), processes each one, and updates it with the step-3 statement.
step-1
-----------
update batch_capture
set    status       = 'processing',
       server       = ?,
       updated_date = ?
where  user_id in (
         select user_id
         from   (select user_id
                 from   batch_capture
                 where  process_type = ?
                 and    status = 'idle'
                 and    next_date <= ?
                 order  by next_date asc)
         where  rownum <= 100
       )
and    process_type = ?
and    status <> 'processing';
-- step-2: fetch all 100 allocated records and process them
------------------------------------------------------------
select *
from   batch_capture
where  process_type = ?
and    status = 'processing'
and    server = ?;
-- step-3: update the next state of each record processed during the batch
------------------------------------------------------------
update batch_capture
set    process_state = ?,
       status        = ?,
       updated_date  = ?,
       next_date     = ?
where  user_id = ?
and    process_type = ?
and    server = ?;
Since this is a vendor solution, my options for change are limited. I thought of converting the single table into a partitioned table.
I tried HASH partitioning the indexes on "user_id,process_type" and then the table itself, but it did not help: the allocation statement (step-1) drives off process_type, status, and next_date, which say nothing about the hash key, so it still does a full index/table scan across every partition.
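Expressed with the 12.2 online conversion, that attempt was roughly the following (the partition count is arbitrary, and batch_capture_pk is the index name from my sketch above):

alter table batch_capture modify
  partition by hash (user_id, process_type) partitions 64
  online
  update indexes (batch_capture_pk local);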
I am seeing two distinct access patterns combined in this solution, and since the table can only be physically organized one way, that is the challenge.
My latest attempt is LIST-LIST composite partitioning.
I have also changed the index structure:
-- the original PK was removed; the new indexes are local non-prefixed
new index-1: process_type, status, next_date, user_id (4-column covering index to support the step-1 update)
new index-2: server, process_type, user_id
new index-3: server, process_type, status
I partitioned the table using 12.2 automatic LIST partitioning, level 1 on "server + process_type", sub-partitioned on status (idle/processing), and enabled row movement.
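Roughly, the DDL for that layout looks like this (I glue server and process_type together with a virtual column; names, lengths, and the seed partition value are placeholders):

create table batch_capture (
  user_id        varchar2(50)  not null,
  process_type   varchar2(20)  not null,
  status         varchar2(20)  not null,
  process_state  varchar2(20),
  next_date      date,
  updated_date   date,
  server         varchar2(50),
  part_key       varchar2(80) generated always as (server || '|' || process_type) virtual
)
partition by list (part_key) automatic
subpartition by list (status)
subpartition template (
  subpartition sp_idle       values ('idle'),
  subpartition sp_processing values ('processing')
)
(partition p_seed values ('appsrv01|TYPE1'))
enable row movement;

create index bc_ix1 on batch_capture (process_type, status, next_date, user_id) local;
create index bc_ix2 on batch_capture (server, process_type, user_id) local;
create index bc_ix3 on batch_capture (server, process_type, status) local;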
The step-2 and step-3 SQL statements are performing very well, but the step-1 update is performing badly.
The step-1 allocation update is now fighting with multiple instances of itself issued by the different app servers; the statements are timing out and batch processing is hindered.
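If the vendor SQL were changeable, I know the textbook fix for this self-contention would be to allocate with SELECT ... FOR UPDATE SKIP LOCKED instead of a blind UPDATE, so each server locks only rows nobody else holds. For reference, something like this (bind names are placeholders):

declare
  cursor c is
    select rowid as rid
    from   batch_capture
    where  process_type = :ptype
    and    status = 'idle'
    and    next_date <= :run_date
    order  by next_date
    for update skip locked;  -- rows locked by other servers are skipped, not waited on
  l_taken pls_integer := 0;
begin
  for r in c loop
    update batch_capture
    set    status = 'processing', server = :srv, updated_date = sysdate
    where  rowid = r.rid;
    l_taken := l_taken + 1;
    exit when l_taken >= 100;  -- stop after claiming a 100-row work unit
  end loop;
  commit;
end;
/

But since the application SQL is fixed, the physical design has to absorb the contention instead.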
Any ideas on how I can partition this table so that all three SQL statements above perform well? Invisible virtual columns, perhaps?
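What I have in mind there is something like this (alloc_key is my name; because the column is invisible, the vendor's inserts and the SELECT * in step-2 never see it, and the table could then be list-partitioned on it):

alter table batch_capture add (
  alloc_key varchar2(80) invisible
    generated always as (server || '|' || process_type) virtual
);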
Thanks!