Hello gurus,
I have a Solaris 10 zone running on an M4000:
SunOS spresapp011 5.10 Generic_150400-48 sun4u sparc SUNW,SPARC-Enterprise
A Splunk agent (the Splunk forwarder) runs in this zone as an SMF service:
rxiang@spresapp011$ svcs forwarder
STATE STIME FMRI
online 11:45:27 svc:/application/splunk/forwarder:default
Recently I noticed that this SMF service has been restarting frequently, and I don't believe any user or cron job is responsible.
The service log /var/svc/log/application-splunk-forwarder:default.log contains entries like the following:
[ Oct 17 03:10:05 Method "stop" exited with status 0 ]
[ Oct 17 03:10:05 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:10:23 Method "start" exited with status 0 ]
[ Oct 17 03:15:01 Stopping because service restarting. ]
[ Oct 17 03:15:01 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.
Stopping splunk helpers...
Done.
[ Oct 17 03:15:08 Method "stop" exited with status 0 ]
[ Oct 17 03:15:08 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:15:26 Method "start" exited with status 0 ]
[ Oct 17 03:20:01 Stopping because service restarting. ]
[ Oct 17 03:20:01 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
Stopping splunk helpers...
Done.
[ Oct 17 03:20:05 Method "stop" exited with status 0 ]
[ Oct 17 03:20:05 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:20:23 Method "start" exited with status 0 ]
[ Oct 17 03:25:01 Stopping because service restarting. ]
[ Oct 17 03:25:01 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.
Stopping splunk helpers...
Done.
[ Oct 17 03:25:06 Method "stop" exited with status 0 ]
[ Oct 17 03:25:06 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:25:24 Method "start" exited with status 0 ]
[ Oct 17 03:30:01 Stopping because service restarting. ]
[ Oct 17 03:30:01 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.
Stopping splunk helpers...
Done.
[ Oct 17 03:30:08 Method "stop" exited with status 0 ]
[ Oct 17 03:30:08 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:30:26 Method "start" exited with status 0 ]
[ Oct 17 03:35:00 Stopping because service restarting. ]
[ Oct 17 03:35:00 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
Stopping splunk helpers...
Done.
[ Oct 17 03:35:05 Method "stop" exited with status 0 ]
[ Oct 17 03:35:05 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:35:22 Method "start" exited with status 0 ]
[ Oct 17 03:40:00 Stopping because service restarting. ]
[ Oct 17 03:40:00 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.
Stopping splunk helpers...
Done.
[ Oct 17 03:40:06 Method "stop" exited with status 0 ]
[ Oct 17 03:40:06 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:40:23 Method "start" exited with status 0 ]
[ Oct 17 03:45:01 Stopping because service restarting. ]
[ Oct 17 03:45:01 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.
Stopping splunk helpers...
Done.
[ Oct 17 03:45:08 Method "stop" exited with status 0 ]
[ Oct 17 03:45:08 Executing start method ("/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes") ]
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking mgmt port [9089]: Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _blocksignature _internal _thefishbucket firedalerts history main os summary
Done
open
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ Oct 17 03:45:26 Method "start" exited with status 0 ]
[ Oct 17 03:50:00 Stopping because service restarting. ]
[ Oct 17 03:50:00 Executing stop method ("/opt/splunkforwarder/bin/splunk stop --accept-license --answer-yes") ]
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
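To make the restart pattern easier to see, a small sketch like the one below could pull out just the restart timestamps and the gap between them. It assumes the log format shown above and reads the log on stdin; the function name `restart_times` is only illustrative, and on Solaris 10 it may need `nawk` or `/usr/xpg4/bin/awk` in place of `awk`.

```shell
# Print the time of each "service restarting" event in an SMF log,
# plus the gap in seconds since the previous one.
# Assumes lines of the form:
#   [ Oct 17 03:15:01 Stopping because service restarting. ]
# Gaps are computed from HH:MM:SS only, so a gap that crosses
# midnight will come out negative.
restart_times() {
    awk '/Stopping because service restarting/ {
        split($4, t, ":")                       # $4 is HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (seen)
            printf "%s %s %s gap=%ds\n", $2, $3, $4, now - prev
        else
            printf "%s %s %s\n", $2, $3, $4
        prev = now
        seen = 1
    }'
}
```

It would be run as `restart_times < /var/svc/log/application-splunk-forwarder:default.log`. On the excerpt above the gaps come out at roughly 300 seconds each, i.e. the service is being restarted about every five minutes.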
I raised a case with Splunk. Their support team reviewed the diag output I uploaded and responded that the restarts appear to be triggered externally rather than by Splunk itself.
I also noticed that the maximum number of file descriptors in the zone is still at the default of 256, which Splunk recommends raising to 8192:
splunk@spresapp011: ~ $ ulimit -n
256
My question is:
What action does SMF take against a service when a system resource it uses exceeds its limit? I know that the last field of a resource control in a project definition can be 'deny', as below:
projadd -K "process.max-file-descriptor=(basic,8192,deny)" proj.files
Is there a mechanism in Solaris SMF by which, in such a scenario, a daemon like svc.startd would restart the service?
Could the restarts be caused by some other reason?
Any ideas would be much appreciated.
Thanks in advance
Richard