Our multi-terabyte DW system typically shows high PX qref latch waits during parallel direct path loads. Below is a sample statement we're running as a benchmark. The source table has 100 million rows, and the target table is truncated just before each load.
The database is 11.2.0.3.8 running on a RHEL6 VM with 512 GB of RAM and 16 CPUs. When we run the benchmark on an otherwise idle server, about 50% of the wait time is on PX qref latch and we get roughly 275,000 inserts per second (tps). That was with parallel_execution_message_size (PEMS) at its default of 16 KB. By increasing PEMS to its maximum of 64 KB we reduced the PX qref latch waits and raised throughput to about 320,000 tps. But if another parallel job with 32 slaves is running, with no other active sessions, we see a significant increase in PX qref latch waits and roughly a third less throughput from the benchmark.
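For reference, the PEMS change was along these lines (a sketch only; it assumes an spfile is in use, and since the parameter is static the instance has to be restarted for it to take effect):

-- set PEMS to its maximum (static parameter, so SCOPE=SPFILE plus a restart)
ALTER SYSTEM SET parallel_execution_message_size = 65536 SCOPE = SPFILE;
-- then bounce the instance
SHUTDOWN IMMEDIATE
STARTUP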
With 16 CPUs I'd expect that running 40 concurrent parallel slaves (8 for the benchmark plus 32 for the other job) would not hurt performance, but the increase in PX qref latch waits and the drop in throughput suggest that it does.
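One way we can see which slave sets are accumulating the waits while both jobs run is a query along these lines (a rough sketch, assuming access to the standard v$ views):

-- PX qref latch waits per active parallel slave, worst first
SELECT ps.qcsid,
       ps.server_group,
       ps.server_set,
       ps.sid,
       se.total_waits,
       ROUND(se.time_waited_micro / 1e6, 1) AS seconds_waited
FROM   v$px_session ps
       JOIN v$session_event se ON se.sid = ps.sid
WHERE  se.event = 'PX qref latch'
ORDER  BY se.time_waited_micro DESC;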
We've reviewed sar disk data while the tests run and I/O looks fine: average svctm stays under 3 ms on the busy disks.
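As a cross-check from inside the database (a sketch; the numbers are cumulative since instance startup, so it needs to be snapshotted before and after a load):

-- direct path I/O wait totals and average wait per event
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6, 1)                            AS seconds_waited,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 1)  AS avg_ms
FROM   v$system_event
WHERE  event IN ('direct path read', 'direct path write', 'direct path write temp')
ORDER  BY time_waited_micro DESC;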
Any suggestions on how to reduce the PX qref latch waits, or any other tuning advice?
INSERT /*+ append */ INTO ops$ORACLE.TEST_TP
NOLOGGING (select /*+ parallel(t,8) */ * from ops$ORACLE.TEST_TP2 t)
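As a sanity check on the producer/consumer message flow for that statement, something like the following can be run from the coordinator session immediately after the insert completes (a sketch; v$pq_tqstat only reflects that session's most recent parallel statement):

-- rows, bytes and waits per table queue for the last parallel statement
SELECT dfo_number, tq_id, server_type, process, num_rows, bytes, waits, timeouts
FROM   v$pq_tqstat
ORDER  BY dfo_number, tq_id, server_type, process;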
init.ora values
parallel_adaptive_multi_user | boolean | TRUE |
parallel_automatic_tuning | boolean | TRUE |
parallel_degree_limit | string | CPU |
parallel_degree_policy | string | AUTO |
parallel_execution_message_size | integer | 65536 |
parallel_force_local | boolean | FALSE |
parallel_instance_group | string |
parallel_io_cap_enabled | boolean | FALSE |
parallel_max_servers | integer | 160 |
parallel_min_percent | integer | 0 |
parallel_min_servers | integer | 0 |
parallel_min_time_threshold | string | AUTO |
parallel_server | boolean | FALSE |
parallel_server_instances | integer | 1 |
parallel_servers_target | integer | 64 |
parallel_threads_per_cpu | integer | 1 |
recovery_parallelism | integer | 0 |
pga_aggregate_target | big integer | 225G |
sga_target | big integer | 275G |
memory_target | big integer | 0 |
filesystemio_options | string | SETALL |
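For completeness, the values above can be re-checked at runtime with something like:

-- current parallel-related and memory/IO parameter values
SELECT name, value
FROM   v$parameter
WHERE  name LIKE 'parallel%'
   OR  name IN ('recovery_parallelism', 'pga_aggregate_target', 'sga_target',
                'memory_target', 'filesystemio_options')
ORDER  BY name;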