Old Linux server running RHEL 4.7 (kernel 2.6.9-78.ELsmp) and Oracle 11.2.0.2 XE. (Because it is XE, I cannot log an SR.)
External programs (shell scripts) are executed via DBMS_SCHEDULER. This works well, for a while. The scripts are executed successfully, with no errors, as the user and group specified in /u01/app/oracle/product/11.2.0/xe/rdbms/admin/externaljob.ora. I can confirm execution of the scripts via the log files they create.
An strace (including child processes) shows how the job process sets the current user and group ids for execution. The following is from a successful external scheduler execution:
[pid 15778] munmap(0x2a95557000, 4096) = 0
[pid 15778] setgroups(1, [500]) = 0
[pid 15778] setregid(500, 500) = 0
[pid 15778] getgid() = 500
[pid 15778] getegid() = 500
[pid 15778] setreuid(501, 501) = 0
The group parameter in externaljob.ora refers to group id 500. Likewise, the user parameter in externaljob.ora refers to user id 501.
Some hours later, all scheduled external program jobs start to fail. An strace then shows the setreuid() call failing:
[pid 10979] munmap(0x2a95557000, 4096) = 0
[pid 10979] setgroups(1, [500]) = 0
[pid 10979] setregid(500, 500) = 0
[pid 10979] getgid() = 500
[pid 10979] getegid() = 500
[pid 10979] setreuid(501, 501) = -1 EAGAIN (Resource temporarily unavailable)
[pid 10979] rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN}, 8) = 0
No errors are reported (by the kernel or by Oracle), apart from the failed job inside the database. There are no other errors in the straces either; only this specific call fails. Having gone through every call made by an XE job queue process when starting an external process, I am pretty sure that the failing setreuid() call is the direct cause of the problem.
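For anyone wanting to repeat that check on their own trace, here is a rough sketch of how the strace output could be scanned for failing calls. It only assumes the line format shown above; the function name failed_calls and the regular expression are my own illustration, not part of strace or Oracle.

```python
import re

# Matches strace lines for calls that returned -1, for example:
#   [pid 10979] setreuid(501, 501) = -1 EAGAIN (Resource temporarily unavailable)
# The "[pid N]" prefix is optional (strace only adds it when following children).
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>[A-Z0-9]+)\b"
)


def failed_calls(lines):
    """Yield (pid, syscall, args, errno) for every traced call that returned -1."""
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            pid = int(match.group("pid")) if match.group("pid") else None
            yield pid, match.group("call"), match.group("args"), match.group("errno")


# Sample input: the failing trace quoted in this post.
sample = """\
[pid 10979] munmap(0x2a95557000, 4096) = 0
[pid 10979] setgroups(1, [500]) = 0
[pid 10979] setregid(500, 500) = 0
[pid 10979] getgid() = 500
[pid 10979] getegid() = 500
[pid 10979] setreuid(501, 501) = -1 EAGAIN (Resource temporarily unavailable)
[pid 10979] rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN}, 8) = 0
""".splitlines()

for pid, call, args, err in failed_calls(sample):
    print(f"pid {pid}: {call}({args}) failed with {err}")
```

On the excerpt above this reports only the setreuid(501, 501) call. A full trace would normally also contain harmless failures (ENOENT on file lookups, for instance), so the output still needs reading by eye.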
Shut down the database and start it up again, nothing else, and external jobs work just fine again (with successful setreuid() calls) for some hours, before failing again.
Any ideas as to what can cause intermittent failures of the setreuid() call?
I am planning to look at upgrading the o/s to a later release level. But if this is a known issue that can easily be resolved some other way, so much the better (and hopefully easier).
Thanks.
PS. I neglected to mention explicitly that the XE processes run as o/s user oracle and group dba. The externaljob.ora file is configured with the user and group that external programs are to be executed as. The actual executable (in $ORACLE_HOME/bin) has the superuser (setuid) bits set so that it can run the external process as the configured user and group. Oracle SE and EE versions support the same concept and features.
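To double-check the permission bits on that executable, something along these lines could be used. It is a minimal sketch and makes two assumptions: that the "superuser bits" are the setuid/setgid mode bits, and that the file is called extjob (I have not named it above). The fallback ORACLE_HOME is inferred from the externaljob.ora path in this post, and setuid_info is just an illustrative helper name.

```python
import os
import stat


def setuid_info(path):
    """Return (is_setuid, is_setgid, owner_uid, group_gid) for the given file."""
    st = os.stat(path)
    return (
        bool(st.st_mode & stat.S_ISUID),  # setuid bit: run with the owner's uid
        bool(st.st_mode & stat.S_ISGID),  # setgid bit: run with the group's gid
        st.st_uid,
        st.st_gid,
    )


# Assumption: executable name and fallback ORACLE_HOME are guesses for
# illustration; adjust both to match the actual installation.
oracle_home = os.environ.get("ORACLE_HOME", "/u01/app/oracle/product/11.2.0/xe")
candidate = os.path.join(oracle_home, "bin", "extjob")

if os.path.exists(candidate):
    is_suid, is_sgid, uid, gid = setuid_info(candidate)
    print("%s: setuid=%s setgid=%s owner uid=%d gid=%d"
          % (candidate, is_suid, is_sgid, uid, gid))
else:
    print("%s not found; pass the real path to setuid_info()" % candidate)
```

If the setuid bit is set and the owner uid is 0, the binary starts with root privileges and can then switch to the user and group from externaljob.ora, which matches the setgroups/setregid/setreuid sequence in the traces.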