Hi all,
I need help analysing and solving the problem described below.
The long story:
I'm running into massive performance problems on two 8-way HP ProLiant DL385 G5 servers with 14 GB RAM and a ZFS storage pool in raidz configuration. The servers are running Solaris 10 x86 10/09.
The configuration of the two servers is pretty much the same, so the problem seems to be generic to the setup.
Within a non-global zone I'm running a Tomcat application (an institutional repository) that connects via localhost to a PostgreSQL database (the version provided with the OS). The processor load is typically not very high, as seen below:
NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
   49 postgres  749M  669M   4,7%   7:14:38  13%
    1 jboss    2519M 2536M    18%  50:36:40 5,9%
We are not 100% sure why we run into performance problems, but when it happens the application slows down and gets swapped out (see the output below). Once things settle, everything seems to return to normal. While the problem is acute, the application is totally unresponsive.
NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
    1 jboss    3104M  913M   6,4%   0:22:48 0,1%
# sar -g 5 5
SunOS vbn-back 5.10 Generic_142901-03 i86pc 05/28/2010
07:49:08  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
07:49:13    27.67   316.01   318.58 14854.15     0.00
07:49:18    61.58   664.75   668.51 43377.43     0.00
07:49:23   122.02  1214.09  1222.22 32618.65     0.00
07:49:28   121.19  1052.28  1065.94  5000.59     0.00
07:49:33    54.37   572.82   583.33  2553.77     0.00
Average     77.34   763.71   771.43 19680.67     0.00
Making more memory available to Tomcat seemed to worsen the problem, or at least didn't have any positive effect.
My suspicion is currently focused on PostgreSQL. Turning off fsync boosted performance and made the problem appear less often.
An informal test running VACUUM ANALYZE on the database took 19 minutes on the server but only 1 minute on a desktop PC. That is horrific considering the hardware.
The short story:
I'm trying different steps but running out of ideas. We've read that the database block size and the file system block size should match. PostgreSQL uses 8 KB blocks and ZFS defaults to 128 KB. I didn't find much information on the matter, so if anyone can help, please recommend how to make this change.
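For reference, this is roughly what I think the change would look like, pieced together from the zfs man page. It is only a sketch: the dataset name tank/pgdata is made up and would have to be replaced with the dataset that actually holds the PostgreSQL data directory. As far as I understand, recordsize only applies to files written after the change, so the existing database files would have to be copied or restored afterwards. Corrections are welcome.

```shell
# Hypothetical dataset name; find the real one with "zfs list".
DATASET=tank/pgdata

# Show the current record size (the ZFS default is 128K).
zfs get recordsize "$DATASET"

# Match the 8 KB PostgreSQL block size. This affects newly written files only,
# so existing database files keep their old record size until rewritten.
zfs set recordsize=8k "$DATASET"

# Confirm the new setting.
zfs get recordsize "$DATASET"
```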
Any other recommendations and ideas we could follow? We know from other installations that the above setup runs without a single problem on Linux on much smaller hardware without specific tuning. What makes Solaris in this configuration so darn slow?
Any help is appreciated, and I will try to provide additional information on request.
Thanks in advance,
Kasper