Hi all,
on our workgroup server, we run Nextcloud 13.0.4, PHP 7.1.17, and Apache 2.4.33 in a non-global zone of our Solaris 11.3 SRU 34 host. Apache runs with the (default) "event" MPM, and PHP is accessed via PHP-FPM. From time to time (irregularly, about once every two weeks), Apache freezes: its port 443 is still open, but any newly opened connection just hangs. When this happens, we typically see more than a hundred TCP connections between a client and our server in the "CLOSE_WAIT" state, and they stay there forever.
We don't see anything of particular interest in Apache's or Nextcloud's logs. The connections stuck in CLOSE_WAIT were caused by simple requests for Nextcloud resources (e.g. "PROPFIND /nextcloud/remote.php/dav/files/... HTTP/1.1"), most probably by the Nextcloud client software (which uses WebDAV to synchronize with the server). Restarting the PHP-FPM daemon doesn't solve the problem: the connections between client and Apache stay in CLOSE_WAIT. Only restarting Apache solves it.
Interestingly, sending Apache's processes a SIGKILL doesn't remove the CLOSE_WAIT TCP connections immediately: "netstat -aun" shows them for a few minutes longer (still referencing Apache's PID, which no longer exists).
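To track how the CLOSE_WAIT count develops over time, the netstat output can be tallied per TCP state. The following Python sketch is only an illustration: the function names, the set of state names, and the use of "netstat -an" are assumptions, and the parser deliberately ignores column positions because the exact netstat layout may differ between systems.

```python
import subprocess
from collections import Counter

# Assumed set of TCP state names as they may appear in netstat output;
# both common spellings of the SYN-received state are included.
TCP_STATES = {
    "ESTABLISHED", "CLOSE_WAIT", "TIME_WAIT", "FIN_WAIT_1", "FIN_WAIT_2",
    "SYN_SENT", "SYN_RCVD", "SYN_RECV", "LAST_ACK", "CLOSING", "LISTEN",
    "CLOSED", "IDLE", "BOUND",
}


def count_tcp_states(netstat_output: str) -> Counter:
    """Count lines per TCP state, independent of the column layout."""
    counts = Counter()
    for line in netstat_output.splitlines():
        for token in line.split():
            if token in TCP_STATES:
                counts[token] += 1
                break  # at most one state per connection line
    return counts


def current_tcp_states(command=("netstat", "-an")) -> Counter:
    """Run netstat (the command is an assumption) and tally its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return count_tcp_states(result.stdout)
```

Calling current_tcp_states()["CLOSE_WAIT"] from a cron job or a small loop and logging the number with a timestamp should show whether the connections pile up gradually or appear all at once.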
I have enabled a "mod_status" page in Apache, which I save every few seconds so that I can debug this further the next time it occurs. I suspect that the freeze is caused by Apache reaching a limit on the number of connections. But why do the CLOSE_WAITs occur in the first place? The problem didn't start with SRU 34. We have been experiencing it for a few months, but (most probably due to increasing load) we have seen it a bit more often recently. We also experience a similar problem in another non-global zone on the same machine, which is primarily used for serving the learning management system Moodle 3.1.12+ on PHP 5.6.36 with the same Apache version.
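For reference, saving the mod_status page periodically could look roughly like the Python sketch below. It is not our exact setup: the function names, the snapshot file naming, the interval, and the timeout are assumptions, and the URL has to be supplied by the caller. It relies on the machine-readable "?auto" variant of the status page, which returns "Key: Value" lines such as BusyWorkers, IdleWorkers, and Scoreboard.

```python
import time
import urllib.request
from pathlib import Path


def parse_status_auto(text: str) -> dict:
    """Parse the 'Key: Value' lines of a mod_status '?auto' response."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not in 'Key: Value' form
            fields[key.strip()] = value.strip()
    return fields


def fetch_status(url: str, timeout: float = 5.0) -> str:
    """Fetch the status page; the timeout matters once Apache hangs."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def save_snapshots(url: str, out_dir: str, interval: float = 5.0, count=None):
    """Write one timestamped snapshot file per interval (forever if count is None)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    taken = 0
    while count is None or taken < count:
        stamp = time.strftime("%Y%m%dT%H%M%S")
        try:
            body = fetch_status(url)
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses; record
            # the failure so the moment of the freeze is visible in the files.
            body = f"ERROR: {exc}\n"
        (out / f"status-{stamp}.txt").write_text(body)
        taken += 1
        time.sleep(interval)
```

The explicit timeout is the important design choice here: once Apache stops answering, the status request itself would otherwise block, and the last successful snapshot before the first ERROR file is the one worth inspecting.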
Does anybody have an idea how to debug this further? Has anybody else experienced it? Our Apache configuration is quite close to the Solaris/Apache default.
Thank you very much in advance for any help!
Kind regards,
Steffen